utf8_encode

(PHP 4, PHP 5, PHP 7, PHP 8)

utf8_encode — Converts a string from ISO-8859-1 to UTF-8

Warning

This function has been deprecated as of PHP 8.2.0. Relying on this function is highly discouraged.

Description

utf8_encode(string $string): string

This function converts the string string from the ISO-8859-1 encoding to UTF-8.

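Note:
This function does not attempt to guess the current encoding of the provided string; it assumes that it is encoded as ISO-8859-1 (also known as "Latin 1") and converts it to UTF-8. Since every sequence of bytes is a valid ISO-8859-1 string, this never results in an error, but it will not produce a useful string if a different encoding was intended.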

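Many web pages marked as using the ISO-8859-1 character encoding actually use the similar Windows-1252 encoding, and web browsers will interpret ISO-8859-1 web pages as Windows-1252. Windows-1252 adds printable characters, such as the Euro sign (€) and curly quotes (“ ”), in place of certain ISO-8859-1 control characters. This function will not convert such Windows-1252 characters correctly. Use a different function if Windows-1252 conversion is required.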
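The symptom is easy to spot in practice. Under the ISO-8859-1 rule, bytes 0x80 to 0x9F become invisible C1 control characters, which in UTF-8 are 0xC2 followed by a byte from 0x80 to 0x9F. The sketch below uses that as a heuristic; it assumes iconv() is installed and accepts the name 'CP1252' (encoding names depend on the iconv implementation). The helper looks_like_cp1252() is hypothetical, and legitimate text that really contains C1 controls would also trigger it.

```php
<?php
// Curly quotes as Windows-1252 bytes: 0x93 = “ and 0x94 = ”.
$cp1252 = "\x93quoted\x94";

// Treating the bytes as ISO-8859-1 (what utf8_encode() does) maps them to
// the invisible C1 controls U+0093 and U+0094.
$as_latin1 = iconv('ISO-8859-1', 'UTF-8', $cp1252);

// Hypothetical heuristic: C1 controls are rare in real text, so their UTF-8
// form (0xC2 followed by 0x80..0x9F) suggests the input was Windows-1252.
function looks_like_cp1252(string $utf8): bool
{
    return preg_match('/\xC2[\x80-\x9F]/', $utf8) === 1;
}

$fixed = looks_like_cp1252($as_latin1)
    ? iconv('CP1252', 'UTF-8', $cp1252)  // reinterpret the original bytes
    : $as_latin1;

echo bin2hex($as_latin1), "\n"; // c29371756f746564c294
echo bin2hex($fixed), "\n";     // e2809c71756f746564e2809d
```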

Parameters

string: An ISO-8859-1 string.

Return Values

Returns the UTF-8 translation of string.

Changelog

Version	Description
8.2.0	This function has been deprecated.
7.2.0	This function was moved from the XML extension to the PHP core. In earlier versions, it was only available if the XML extension was installed.

Examples

Example #1 Basic example

<?php
// Convert the string 'Zoë' from ISO 8859-1 to UTF-8
$iso8859_1_string = "\x5A\x6F\xEB";
$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:

5a6fc3ab
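The bytes in that output follow directly from the encoding rules: ISO-8859-1 byte values are identical to the Unicode code points U+0000 to U+00FF, so bytes below 0x80 are copied as-is and every other byte becomes two UTF-8 bytes, with 0xEB ('ë') becoming 0xC3 0xAB. The following sketch, using the hypothetical name latin1_to_utf8(), reproduces the example in plain PHP without the deprecated function or any extension; it illustrates the rule rather than mirroring PHP's internal implementation.

```php
<?php
// Hypothetical pure-PHP equivalent of utf8_encode(): each byte is read as an
// ISO-8859-1 code point (U+0000..U+00FF) and written out as UTF-8.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                  // ASCII: one byte, unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))    // lead byte: 0xC2 or 0xC3
                  . chr(0x80 | ($b & 0x3F)); // continuation byte 10xxxxxx
        }
    }
    return $out;
}

echo bin2hex(latin1_to_utf8("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```

Because every byte value has a mapping, the loop has no error path, which matches the note above that this conversion never fails.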

Notes

Note: This function is deprecated; alternatives are described below.

This function is deprecated as of PHP 8.2.0 and will be removed in a future version. Existing uses should be checked and replaced with appropriate alternatives.

Similar functionality can be achieved with mb_convert_encoding(), which supports ISO-8859-1 as well as many other character encodings.
<?php
$iso8859_1_string = "\xEB"; // 'ë' (e with diaeresis) in ISO-8859-1
$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$iso8859_7_string = "\xEB"; // the same string in ISO-8859-7 represents 'λ' (Greek lower-case lambda)
$utf8_string = mb_convert_encoding($iso8859_7_string, 'UTF-8', 'ISO-8859-7');
echo bin2hex($utf8_string), "\n";

$windows_1252_string = "\x80"; // '€' (Euro sign) in Windows-1252, but not in ISO-8859-1
$utf8_string = mb_convert_encoding($windows_1252_string, 'UTF-8', 'Windows-1252');
echo bin2hex($utf8_string), "\n";
?>

The above example will output:
c3ab
cebb
e282ac
Other options, which may be available depending on the extensions installed, are UConverter::transcode() and iconv().
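Which of these extensions is installed can differ between servers. When migrating away from utf8_encode(), one option is a small wrapper that uses whichever converter is present. This is a sketch: the name convert_latin1_to_utf8() and the order of preference (mbstring, then intl, then iconv) are illustrative choices, not part of PHP.

```php
<?php
// Hypothetical replacement for utf8_encode() that uses whichever
// converter is available; it throws if none of them is installed.
function convert_latin1_to_utf8(string $s): string
{
    if (function_exists('mb_convert_encoding')) {
        return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
    }
    if (class_exists('UConverter')) {
        return UConverter::transcode($s, 'UTF-8', 'ISO-8859-1');
    }
    if (function_exists('iconv')) {
        // Every ISO-8859-1 byte is convertible, so iconv() should not fail here.
        return iconv('ISO-8859-1', 'UTF-8', $s);
    }
    throw new RuntimeException('No converter available: install mbstring, intl or iconv');
}

echo bin2hex(convert_latin1_to_utf8("\x5A\x6F\xEB")), "\n"; // 5a6fc3ab
```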

The following all give the same result:
<?php
$iso8859_1_string = "\x5A\x6F\xEB"; // 'Zoë' in ISO-8859-1

$utf8_string = utf8_encode($iso8859_1_string);
echo bin2hex($utf8_string), "\n";

$utf8_string = mb_convert_encoding($iso8859_1_string, 'UTF-8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = UConverter::transcode($iso8859_1_string, 'UTF8', 'ISO-8859-1');
echo bin2hex($utf8_string), "\n";

$utf8_string = iconv('ISO-8859-1', 'UTF-8', $iso8859_1_string);
echo bin2hex($utf8_string), "\n";
?>

The above example will output:
5a6fc3ab
5a6fc3ab
5a6fc3ab
5a6fc3ab

See Also

utf8_decode() - Converts a string from UTF-8 to ISO-8859-1, replacing invalid or unrepresentable characters
mb_convert_encoding() - Converts a string from one character encoding to another
UConverter::transcode() - Converts a string from one character encoding to another
iconv() - Converts a string from one character encoding to another


User Contributed Notes 24 notes

deceze at gmail dot com

13 years ago

Please note that utf8_encode only converts a string encoded in ISO-8859-1 to UTF-8. A more appropriate name for it would be "iso88591_to_utf8". If your text is not encoded in ISO-8859-1, you do not need this function. If your text is already in UTF-8, you do not need this function. In fact, applying this function to text that is not encoded in ISO-8859-1 will most likely simply garble that text.

If you need to convert text from any encoding to any other encoding, look at iconv() instead.
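To see the garbling described above, convert text that is already UTF-8 with the ISO-8859-1 rule a second time. The sketch uses iconv() (assumed to be installed) in place of utf8_encode(), since for this input both apply the same byte-by-byte ISO-8859-1 mapping.

```php
<?php
$utf8 = "Zo\xC3\xAB"; // 'Zoë', already valid UTF-8

// Wrongly treating the UTF-8 bytes as ISO-8859-1 encodes each of them again.
$double = iconv('ISO-8859-1', 'UTF-8', $utf8);

echo $double, "\n";          // ZoÃ« (the familiar "mojibake")
echo bin2hex($double), "\n"; // 5a6fc383c2ab

// Reversing one layer (UTF-8 back to ISO-8859-1) recovers the original bytes.
$repaired = iconv('UTF-8', 'ISO-8859-1', $double);
var_dump($repaired === $utf8); // bool(true)
```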

Aidan Kehoe <php-manual at parhasard dot net>

20 years ago

Here's some code that addresses the issue Steven describes in the previous comment, namely text that is really Windows-1252 rather than ISO-8859-1:

<?php

/* This structure encodes the difference between ISO-8859-1 and Windows-1252,
   as a map from the UTF-8 encoding of some ISO-8859-1 control characters to
   the UTF-8 encoding of the non-control characters that Windows-1252 places
   at the equivalent code points. */

$cp1252_map = array(
    "\xc2\x80" => "\xe2\x82\xac", /* EURO SIGN */
    "\xc2\x82" => "\xe2\x80\x9a", /* SINGLE LOW-9 QUOTATION MARK */
    "\xc2\x83" => "\xc6\x92",     /* LATIN SMALL LETTER F WITH HOOK */
    "\xc2\x84" => "\xe2\x80\x9e", /* DOUBLE LOW-9 QUOTATION MARK */
    "\xc2\x85" => "\xe2\x80\xa6", /* HORIZONTAL ELLIPSIS */
    "\xc2\x86" => "\xe2\x80\xa0", /* DAGGER */
    "\xc2\x87" => "\xe2\x80\xa1", /* DOUBLE DAGGER */
    "\xc2\x88" => "\xcb\x86",     /* MODIFIER LETTER CIRCUMFLEX ACCENT */
    "\xc2\x89" => "\xe2\x80\xb0", /* PER MILLE SIGN */
    "\xc2\x8a" => "\xc5\xa0",     /* LATIN CAPITAL LETTER S WITH CARON */
    "\xc2\x8b" => "\xe2\x80\xb9", /* SINGLE LEFT-POINTING ANGLE QUOTATION */
    "\xc2\x8c" => "\xc5\x92",     /* LATIN CAPITAL LIGATURE OE */
    "\xc2\x8e" => "\xc5\xbd",     /* LATIN CAPITAL LETTER Z WITH CARON */
    "\xc2\x91" => "\xe2\x80\x98", /* LEFT SINGLE QUOTATION MARK */
    "\xc2\x92" => "\xe2\x80\x99", /* RIGHT SINGLE QUOTATION MARK */
    "\xc2\x93" => "\xe2\x80\x9c", /* LEFT DOUBLE QUOTATION MARK */
    "\xc2\x94" => "\xe2\x80\x9d", /* RIGHT DOUBLE QUOTATION MARK */
    "\xc2\x95" => "\xe2\x80\xa2", /* BULLET */
    "\xc2\x96" => "\xe2\x80\x93", /* EN DASH */
    "\xc2\x97" => "\xe2\x80\x94", /* EM DASH */

    "\xc2\x98" => "\xcb\x9c",     /* SMALL TILDE */
    "\xc2\x99" => "\xe2\x84\xa2", /* TRADE MARK SIGN */
    "\xc2\x9a" => "\xc5\xa1",     /* LATIN SMALL LETTER S WITH CARON */
    "\xc2\x9b" => "\xe2\x80\xba", /* SINGLE RIGHT-POINTING ANGLE QUOTATION*/
    "\xc2\x9c" => "\xc5\x93",     /* LATIN SMALL LIGATURE OE */
    "\xc2\x9e" => "\xc5\xbe",     /* LATIN SMALL LETTER Z WITH CARON */
    "\xc2\x9f" => "\xc5\xb8"      /* LATIN CAPITAL LETTER Y WITH DIAERESIS*/
);

function cp1252_to_utf8($str) {
        global $cp1252_map; 
        return  strtr(utf8_encode($str), $cp1252_map);
}

?>

Pini

9 years ago

My version of utf8_encode_deep, in case you need one that returns a value without changing the original.

        /**
        * Convert Anything To UTF-8
        * @param mixed $var The variable you want to convert.
        * @param boolean $deep Deep conversion? (*Default: TRUE).
        * @return mixed
        */
        function anything_to_utf8($var,$deep=TRUE){
            if(is_array($var)){
                foreach($var as $key => $value){
                    if($deep){
                        $var[$key] = anything_to_utf8($value,$deep);
                    }elseif(is_string($value) && !mb_detect_encoding($value,'utf-8',true)){
                         $var[$key] = utf8_encode($value);
                    }
                }
                return $var;
            }elseif(is_object($var)){
                // Objects are passed by handle, so this branch also modifies
                // the caller's object, not a copy.
                foreach($var as $key => $value){
                    if($deep){
                        $var->$key = anything_to_utf8($value,$deep);
                    }elseif(is_string($value) && !mb_detect_encoding($value,'utf-8',true)){
                         $var->$key = utf8_encode($value);
                    }
                }
                return $var;
            }else{
                return (is_string($var) && !mb_detect_encoding($var,'utf-8',true))?utf8_encode($var):$var;
            }
        }

a dot rueedlinger at gmail dot com

11 years ago

If you need a function which converts a string array into a utf8 encoded string array then this function might be useful for you:

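<?php
function utf8_string_array_encode(&$array){
    // Keys cannot be renamed in place (array_walk() passes them by value),
    // so build a new array with converted keys and values instead.
    $result = array();
    foreach ($array as $key => $value) {
        if (is_string($key)) {
            $key = utf8_encode($key);
        }
        if (is_string($value)) {
            $value = utf8_encode($value);
        } elseif (is_array($value)) {
            utf8_string_array_encode($value);
        }
        $result[$key] = $value;
    }
    $array = $result;
    return $array;
}
?>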
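For the common case where only string values need converting, PHP's array_walk_recursive() already descends into nested arrays and passes each non-array value to the callback, by reference if the callback asks for it. A minimal sketch, assuming iconv() is available as the converter; keys are left unchanged, and objects are handed to the callback as leaves, so the is_string() check skips them.

```php
<?php
$data = [
    'name'   => "Zo\xEB",                          // 'Zoë' in ISO-8859-1
    'nested' => ['word' => "caf\xE9", 'id' => 42], // 'café', plus a non-string
];

// array_walk_recursive() visits every non-array value, passing it by
// reference; only strings are converted, and keys are never touched.
array_walk_recursive($data, function (&$value) {
    if (is_string($value)) {
        $value = iconv('ISO-8859-1', 'UTF-8', $value);
    }
});

echo bin2hex($data['name']), "\n";           // 5a6fc3ab
echo bin2hex($data['nested']['word']), "\n"; // 636166c3a9
```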

bisqwit at iki dot fi

19 years ago

For reference, it may be insightful to point out that:
  utf8_encode($s)
is actually identical to:
  recode_string('latin1..utf8', $s)
and:
  iconv('iso-8859-1', 'utf-8', $s)
That is, utf8_encode is a specialized case of character set conversions.

If your string to be converted to UTF-8 is something other than ISO-8859-1 (such as ISO-8859-2, used for Polish and Croatian), you should use recode_string() or iconv() rather than trying to devise complex str_replace statements. (Note that recode_string() belongs to the Recode extension, which was unbundled from PHP in 7.4.)
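A small illustration of why the source encoding matters, assuming iconv() is installed: byte 0xB3 is 'ł' in ISO-8859-2 but '³' (superscript three) in ISO-8859-1, so converting it with the ISO-8859-1 rule, which is what utf8_encode() applies, yields the wrong character.

```php
<?php
$byte = "\xB3"; // 'ł' in ISO-8859-2, but '³' (superscript three) in ISO-8859-1

$right = iconv('ISO-8859-2', 'UTF-8', $byte); // U+0142 'ł'
$wrong = iconv('ISO-8859-1', 'UTF-8', $byte); // U+00B3 '³', what utf8_encode() gives

echo bin2hex($right), "\n"; // c582
echo bin2hex($wrong), "\n"; // c2b3
```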

Oscar Broman

12 years ago

Walk through nested arrays/objects and utf8 encode all strings.

<?php
// Usage
class Foo {
    public $somevar = 'whoop whoop';
}

$structure = array(
    'object' => (object) array(
        'entry' => 'hello wörld',
        'another_array' => array(
            'string',
            1234,
            'another string'
        )
    ),
    'string' => 'foo',
    'foo_object' => new Foo
);

utf8_encode_deep($structure);

// $structure is now utf8 encoded
print_r($structure);

// The function
function utf8_encode_deep(&$input) {
    if (is_string($input)) {
        $input = utf8_encode($input);
    } else if (is_array($input)) {
        foreach ($input as &$value) {
            utf8_encode_deep($value);
        }

        unset($value);
    } else if (is_object($input)) {
        $vars = array_keys(get_object_vars($input));

        foreach ($vars as $var) {
            utf8_encode_deep($input->$var);
        }
    }
}
?>

rocketman

18 years ago

If you are looking for a function to replace special characters with their hex UTF-8 values (e.g. for Webservice-Security/WSS4J compliance), you might use this:

<?php
// The result shown below assumes this file is saved as ISO-8859-1, so that
// each special character is a single byte before utf8_encode() is applied.
$txt = "Größe";
$utf8 = '';
$max = strlen($txt);

for ($i = 0; $i < $max; $i++) {
    if ($txt[$i] == "&") {
        $neu = "&#x26;";
    } elseif ((ord($txt[$i]) < 32) or (ord($txt[$i]) > 127)) {
        // UTF-8 bytes as %XX, then each %XX turned into &#xXX;
        $neu = urlencode(utf8_encode($txt[$i]));
        $neu = preg_replace('#\%(..)\%(..)\%(..)#', '&#x\1;&#x\2;&#x\3;', $neu);
        $neu = preg_replace('#\%(..)\%(..)#', '&#x\1;&#x\2;', $neu);
        $neu = preg_replace('#\%(..)#', '&#x\1;', $neu);
    } else {
        $neu = $txt[$i];
    }

    $utf8 .= $neu;
} // for $i

$textnew = $utf8;
?>

In this example $textnew will be "Gr&#xC3;&#xB6;&#xC3;&#x9F;e"
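One caveat, stated as a general fact rather than as part of the note above: in HTML and XML a reference such as &#xC3; names a Unicode code point, not a UTF-8 byte, so a parser reads &#xC3;&#xB6; as the two characters 'Ã¶' rather than 'ö'. If references that decode back to the original text are wanted, a sketch under the same ISO-8859-1 source assumption could look like this (the helper name latin1_to_char_refs() is hypothetical). In ISO-8859-1 each byte value equals its code point, so ord() supplies the reference directly.

```php
<?php
// Hypothetical helper: escape bytes below 0x20, '&' (0x26) and bytes above
// 0x7F as hexadecimal character references, matching the note's rule.
function latin1_to_char_refs(string $latin1): string
{
    return preg_replace_callback(
        '/[^\x20-\x25\x27-\x7F]/',
        fn(array $m): string => sprintf('&#x%X;', ord($m[0])),
        $latin1
    );
}

echo latin1_to_char_refs("Gr\xF6\xDFe"), "\n"; // Gr&#xF6;&#xDF;e
```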

Janci

19 years ago

I was searching for a function similar to JavaScript's unescape(). In most cases it is OK to use the urldecode() function, but not if your strings contain characters beyond Latin-1: JavaScript's escape() converts them into %uXXXX sequences that urldecode() cannot handle.
I searched the net and found a function which actually converts these sequences into HTML entities (&#xxx;) that your browser can show correctly. If you're OK with that, the function can be found here: http://pure-essence.net/stuff/code/utf8RawUrlDecode.phps

But that was not OK for me, because I needed a string in my charset to make some comparisons and other stuff. So I modified the above function and, in conjunction with the code2utf() function mentioned in another note here, managed to achieve my goal:

<?php
/**
 * Converts a string escaped with JavaScript's escape() back into a string in the specified charset (default is UTF-8).
 * Modified function from http://pure-essence.net/stuff/code/utf8RawUrlDecode.phps
 *
 * @param string $source escaped with Javascript's escape() function
 * @param string $iconv_to destination character set, passed as the second parameter to iconv(). Default is UTF-8.
 * @return string
 */
function unescape($source, $iconv_to = 'UTF-8') {
    $decodedStr = '';
    $pos = 0;
    $len = strlen ($source);
    while ($pos < $len) {
        $charAt = substr ($source, $pos, 1);
        if ($charAt == '%') {
            $pos++;
            $charAt = substr ($source, $pos, 1);
            if ($charAt == 'u') {
                // we got a unicode character
                $pos++;
                $unicodeHexVal = substr ($source, $pos, 4);
                $unicode = hexdec ($unicodeHexVal);
                $decodedStr .= code2utf($unicode);
                $pos += 4;
            }
            else {
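                // we have a %XX escape: escape() uses this form for code points
                // below 256 (Latin-1), so encode it as UTF-8 rather than a raw byte
                $hexVal = substr ($source, $pos, 2);
                $decodedStr .= code2utf(hexdec ($hexVal));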
                $pos += 2;
            }
        }
        else {
            $decodedStr .= $charAt;
            $pos++;
        }
    }

    if ($iconv_to != "UTF-8") {
        $decodedStr = iconv("UTF-8", $iconv_to, $decodedStr);
    }
    
    return $decodedStr;
}

/**
 * Converts a Unicode code point into the corresponding UTF-8 character.
 * Function taken from: http://sk2.php.net/manual/en/function.utf8-encode.php#49336
 *
 * @param int $num
 * @return string
 */
function code2utf($num){
    if($num<128)return chr($num);
    if($num<2048)return chr(($num>>6)+192).chr(($num&63)+128);
    if($num<65536)return chr(($num>>12)+224).chr((($num>>6)&63)+128).chr(($num&63)+128);
    if($num<2097152)return chr(($num>>18)+240).chr((($num>>12)&63)+128).chr((($num>>6)&63)+128) .chr(($num&63)+128);
    return '';
}
?>

rogeriogirodo at gmail dot com

15 years ago

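This function may be useful to encode array keys and values (it first checks whether each string is already UTF-8):

<?php
// Originally posted as a "public static" class method; shown here as a plain
// function so that the recursive to_utf8() calls resolve.
function to_utf8($in)
{
    if (is_array($in)) {
        $out = array(); // initialised so an empty array returns array()
        foreach ($in as $key => $value) {
            $out[to_utf8($key)] = to_utf8($value);
        }
        return $out;
    } elseif (is_string($in)) {
        // mb_check_encoding() tests validity directly; in non-strict mode
        // mb_detect_encoding() may return a closest guess instead of false.
        if (!mb_check_encoding($in, 'UTF-8'))
            return utf8_encode($in);
        else
            return $in;
    } else {
        return $in;
    }
}
?>

Hope this may help.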



[NOTE BY danbrown AT php DOT net: Original function written by (cmyk777 AT gmail DOT com) on 28-JAN-09.]

powtac 4t gmx d0t de

13 years ago

I tried a lot of things, but this seems to be the final fail-safe method to convert any string to proper UTF-8.

<?php
function _convert($content) {
    if (!mb_check_encoding($content, 'UTF-8')
        OR !($content === mb_convert_encoding(mb_convert_encoding($content, 'UTF-32', 'UTF-8'), 'UTF-8', 'UTF-32'))) {

        // With no third argument, mb_convert_encoding() treats the input as
        // mb_internal_encoding() (usually UTF-8) rather than detecting it.
        $content = mb_convert_encoding($content, 'UTF-8');

        if (mb_check_encoding($content, 'UTF-8')) {
            // log('Converted to UTF-8');
        } else {
            // log('Could not convert to UTF-8');
        }
    }
    return $content;
}
?>

Anonymous

19 years ago

<?php
/*
 * Reads the file story.txt (ASCII, as typed on the keyboard),
 * converts it to Georgian characters using UTF-8 encoding
 * (if I am correct, just as it would be when typed on a Georgian computer)
 * and outputs it as an HTML file.
 *
 * http://www.comweb.nl/keys_to_georgian.html
 * http://www.comweb.nl/keys_to_georgian.php
 * http://www.comweb.nl/story.txt
 */
?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">

<HTML>
<HEAD>
<TITLE>keys to unicode code</TITLE>

<!-- this meta tag is needed -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" >

<!-- note: the Sylfaen font seems to be installed as standard on Windows XP; it supports Georgian -->
 
<style TYPE="text/css">
<!--
body {font-family:sylfaen; }
-->
</style>
</HEAD>

<BODY>

<?php
$eng=array(97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,
112,113,114,115,116,117,118,119,120,121,122,87,82,84,83,
67,74,90);
$geo=array(4304,4305,4330,4307,4308,4324,4306,4336,4312,4335,4313,
4314,4315,4316,4317,4318,4325,4320,4321,4322,4323,4309,
4332,4334,4327,4310,4333,4326,4311,4328,4329,4319,4331,
91,93,59,39,44,46,96);

$fc=file("story.txt");
foreach($fc as $line)
{
   $spacestart=1;
   for ($i=0; $i<strlen($line); $i+=1)
   {
      $character=ord(substr($line,$i,1));
      $found=0;
      for ($k=0; $k<count($eng); $k+=1)
      {
         if ($eng[$k]==$character)
         {
             print code2utf( $geo[$k] );
             $found=1;
         }
      }
      if ($found==0) 
      {
         if ($character==126 || $character==32 || $character==10 || $character==9)
         {
            if ($character==9)  { print '&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;'; }
            if ($character==10) { print "<BR>\n"; }
            if ($character==32) 
            { 
               if ($spacestart==1) {print '&nbsp;'; } else { print " "; }
            }
            if ($character==126){ print "~";      }
         } else
         { 
            print substr($line,$i,1);
         } 
      }
      if ($character!=32) { $spacestart=0; }
   }
}

/**
 * Converts a Unicode code point into the corresponding UTF-8 character.
 * Function taken from: http://sk2.php.net/manual/en/function.utf8-encode.php#49336
 *
 * @param int $num
 * @return string
*/
function code2utf($num)
{
   if($num<128)return chr($num);
   if($num<2048)return chr(($num>>6)+192).chr(($num&63)+128);
   if($num<65536)return chr(($num>>12)+224).chr((($num>>6)&63)+128).chr(($num&63)+128);
   if($num<2097152)return chr(($num>>18)+240).chr((($num>>12)&63)+128).chr((($num>>6)&63)+128) .chr(($num&63)+128);
   return '';
}
?>

</BODY>
</HTML>

hrpeters (at) gmx (dot) net

20 years ago

<?php
// Validate Unicode UTF-8 Version 4
// This function takes as reference the table 3.6 found at http://www.unicode.org/versions/Unicode4.0.0/ch03.pdf
// It also flags overlong bytes as error

function is_validUTF8($str)
{
    // values of -1 represent disallowed values for the first bytes in current UTF-8
    static $trailing_bytes = array (
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0, 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,
        -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
        -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
        -1,-1,1,1,1,1,1,1,1,1,1,1,1,1,1,1, 1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,
        2,2,2,2,2,2,2,2,2,2,2,2,2,2,2,2, 3,3,3,3,3,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1
    );

    $ups = unpack('C*', $str);
    if (!($aCnt = count($ups))) return true; // Empty string *is* valid UTF-8

    for ($i = 1; $i <= $aCnt;)
    {
        if (!($tbytes = $trailing_bytes[($b1 = $ups[$i++])])) continue;
        if ($tbytes == -1) return false;

        $first = true;
        while ($tbytes > 0 && $i <= $aCnt)
        {
            $cbyte = $ups[$i++];
            if (($cbyte & 0xC0) != 0x80) return false;

            if ($first)
            {
                switch ($b1)
                {
                    case 0xE0:
                        if ($cbyte < 0xA0) return false;
                        break;
                    case 0xED:
                        if ($cbyte > 0x9F) return false;
                        break;
                    case 0xF0:
                        if ($cbyte < 0x90) return false;
                        break;
                    case 0xF4:
                        if ($cbyte > 0x8F) return false;
                        break;
                    default:
                        break;
                }
                $first = false;
            }
            $tbytes--;
        }
        if ($tbytes) return false; // incomplete sequence at EOS
    }
    return true;
}
?>
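On current PHP versions a similar check is available without a lookup table: preg_match() returns false instead of 1 when a pattern with the u modifier is applied to a subject that is not valid UTF-8, and PCRE's UTF-8 check also rejects overlong forms, surrogates and code points above U+10FFFF. A minimal sketch; the wrapper name is_valid_utf8() is hypothetical, and mb_check_encoding($str, 'UTF-8') is an alternative when mbstring is installed.

```php
<?php
// Hypothetical wrapper: an empty pattern with the /u modifier matches any
// valid UTF-8 string, and preg_match() returns false for invalid input.
function is_valid_utf8(string $str): bool
{
    return preg_match('//u', $str) === 1;
}

var_dump(is_valid_utf8("Zo\xC3\xAB"));       // bool(true)  'Zoë'
var_dump(is_valid_utf8("Zo\xEB"));           // bool(false) raw ISO-8859-1 byte
var_dump(is_valid_utf8("\xC0\xAF"));         // bool(false) overlong '/'
var_dump(is_valid_utf8("\xED\xA0\x80"));     // bool(false) surrogate U+D800
var_dump(is_valid_utf8("\xF4\x90\x80\x80")); // bool(false) above U+10FFFF
```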

Mark AT modernbill DOT com

20 years ago

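If you haven't guessed already: the reverse function, utf8_decode(), returns a ? for any UTF-8 character that has no representation in the ISO-8859-1 code page (utf8_encode() itself never does this, since every ISO-8859-1 byte has a UTF-8 equivalent). You might want to wrap a function around this to make sure you aren't saving a bunch of ???? into your database.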
