mb_convert_encoding

(PHP 4 >= 4.0.6, PHP 5, PHP 7, PHP 8)

mb_convert_encoding - Convert a string from one character encoding to another

Description

Converts string from from_encoding, or from the current internal encoding, to to_encoding. If string is an array, all of its string values are converted recursively.
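The array case can be sketched roughly as follows. The sample data and variable names are hypothetical, and array input assumes PHP 7.2.0 or later.

```php
<?php
// Hypothetical sample data: ISO-8859-1 bytes for "café" and "naïve".
$row = [
    'name' => "caf\xE9",
    'tags' => ["na\xEFve", 42],
];

// Every string value in the nested array is converted from ISO-8859-1 to UTF-8.
$utf8 = mb_convert_encoding($row, 'UTF-8', 'ISO-8859-1');

var_dump($utf8['name'] === "caf\xC3\xA9");     // "é" is now the two bytes C3 A9
var_dump($utf8['tags'][0] === "na\xC3\xAFve"); // nested strings are converted too
```

Passing the whole array avoids writing a manual loop, but it gives no per-element control over the source encoding.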

Parameters

string

The string or array to be converted.

to_encoding

The desired encoding of the result.

from_encoding

The current encoding used to interpret string. Multiple encodings can be specified as an array or a comma-separated string, in which case the correct encoding is guessed using the same algorithm as mb_detect_encoding().

If from_encoding is omitted or null, the mbstring.internal_encoding setting is used if it is set; otherwise the default_charset setting is used.
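The following minimal sketch contrasts an explicit from_encoding with an omitted one. It assumes the internal encoding is switched with mb_internal_encoding() purely for demonstration; the input byte is a hypothetical example.

```php
<?php
// Hypothetical input: the single byte for "é" in ISO-8859-1.
$latin1 = "\xE9";

// Explicit source encoding: the result does not depend on configuration.
$explicit = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');

// Omitted source encoding: the current internal encoding is used instead.
$previous = mb_internal_encoding();
mb_internal_encoding('ISO-8859-1');
$implicit = mb_convert_encoding($latin1, 'UTF-8');
mb_internal_encoding($previous); // restore the original setting

var_dump($explicit === $implicit); // both are the UTF-8 bytes C3 A9
```

Naming the source encoding explicitly is usually the safer choice, since the result then does not change when the configuration does.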

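See the list of supported character encodings for the values that can be passed as to_encoding and from_encoding.

Return Values

Returns the converted string or array on success, or false on failure.

Errors/Exceptions

As of PHP 8.0.0, a ValueError is thrown if an invalid encoding is passed as to_encoding or from_encoding. In earlier versions, an E_WARNING was raised instead.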
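A small sketch of handling the exception, assuming PHP 8.0.0 or later. The encoding name "NOT-AN-ENCODING" is deliberately invalid and purely illustrative.

```php
<?php
$caught = false;

try {
    // "NOT-AN-ENCODING" is a made-up name that mbstring does not recognise.
    mb_convert_encoding('abc', 'NOT-AN-ENCODING', 'UTF-8');
} catch (ValueError $e) {
    $caught = true;
    // The message names the offending argument.
    echo $e->getMessage(), PHP_EOL;
}
```

Code that must also run on PHP 7 cannot rely on the exception and would need to check for a false return value instead.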

Changelog

Version	Description
8.2.0	mb_convert_encoding() no longer returns the following non-text encodings: `"Base64"`, `"QPrint"`, `"UUencode"`, `"HTML entities"`, `"7 bit"`, `"8 bit"`
8.0.0	A ValueError is now thrown if an invalid encoding is passed as `to_encoding`.
8.0.0	A ValueError is now thrown if an invalid encoding is passed as `from_encoding`.
8.0.0	`from_encoding` is now nullable.
7.2.0	This function now accepts an array as `string`. In earlier versions, only strings were supported.

Examples

Example #1 mb_convert_encoding() example

<?php
/* Convert from the internal character encoding to SJIS */
$str = mb_convert_encoding($str, "SJIS");

/* Convert from EUC-JP to UTF-7 */
$str = mb_convert_encoding($str, "UTF-7", "EUC-JP");

/* Auto-detect among JIS, eucjp-win, sjis-win (in that order), then convert to UCS-2LE */
$str = mb_convert_encoding($str, "UCS-2LE", "JIS, eucjp-win, sjis-win");

/* If mbstring.language is "Japanese", "auto" is expanded to "ASCII,JIS,UTF-8,EUC-JP,SJIS" */
$str = mb_convert_encoding($str, "EUC-JP", "auto");
?>

See Also

mb_detect_order() - Set/Get character encoding detection order
UConverter::transcode() - Convert a string from one character encoding to another
iconv() - Convert a string from one character encoding to another

User Contributed Notes 30 notes

josip at cubrad dot com ¶

11 years ago

For my last project I needed to convert several CSV files from Windows-1250 to UTF-8. After several days of searching I found a function that partially solved my problem, but it still did not convert all the characters. So I made this:

function w1250_to_utf8($text) {
    // map based on:
    // http://konfiguracja.c0.pl/iso02vscp1250en.html
    // http://konfiguracja.c0.pl/webpl/index_en.html#examp
    // http://www.htmlentities.com/html/entities/
    $map = array(
        chr(0x8A) => chr(0xA9),
        chr(0x8C) => chr(0xA6),
        chr(0x8D) => chr(0xAB),
        chr(0x8E) => chr(0xAE),
        chr(0x8F) => chr(0xAC),
        chr(0x9C) => chr(0xB6),
        chr(0x9D) => chr(0xBB),
        chr(0xA1) => chr(0xB7),
        chr(0xA5) => chr(0xA1),
        chr(0xBC) => chr(0xA5),
        chr(0x9F) => chr(0xBC),
        chr(0xB9) => chr(0xB1),
        chr(0x9A) => chr(0xB9),
        chr(0xBE) => chr(0xB5),
        chr(0x9E) => chr(0xBE),
        chr(0x80) => '&euro;',
        chr(0x82) => '&sbquo;',
        chr(0x84) => '&bdquo;',
        chr(0x85) => '&hellip;',
        chr(0x86) => '&dagger;',
        chr(0x87) => '&Dagger;',
        chr(0x89) => '&permil;',
        chr(0x8B) => '&lsaquo;',
        chr(0x91) => '&lsquo;',
        chr(0x92) => '&rsquo;',
        chr(0x93) => '&ldquo;',
        chr(0x94) => '&rdquo;',
        chr(0x95) => '&bull;',
        chr(0x96) => '&ndash;',
        chr(0x97) => '&mdash;',
        chr(0x99) => '&trade;',
        chr(0x9B) => '&rsquo;',
        chr(0xA6) => '&brvbar;',
        chr(0xA9) => '&copy;',
        chr(0xAB) => '&laquo;',
        chr(0xAE) => '&reg;',
        chr(0xB1) => '&plusmn;',
        chr(0xB5) => '&micro;',
        chr(0xB6) => '&para;',
        chr(0xB7) => '&middot;',
        chr(0xBB) => '&raquo;',
    );
    return html_entity_decode(mb_convert_encoding(strtr($text, $map), 'UTF-8', 'ISO-8859-2'), ENT_QUOTES, 'UTF-8');
}

Julian Egelstaff ¶

2 years ago

If you have what looks like ISO-8859-1, but it includes "smart quotes" courtesy of Microsoft software, or people cutting and pasting content from Microsoft software, then what you're actually dealing with is probably Windows-1252. Try this:

<?php
$cleanText = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
?>

The annoying part is that the auto detection (ie: the mb_detect_encoding function) will often think Windows-1252 is ISO-8859-1. Close, but no cigar. This is critical if you're then trying to do unserialize on the resulting text, because the byte count of the string needs to be perfect.
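To make the difference concrete, here is a minimal sketch with hypothetical input bytes. 0x93 and 0x94 are curly double quotes in Windows-1252, whereas ISO-8859-1 treats them as C1 control characters.

```php
<?php
// Hypothetical input: "Hi" wrapped in Windows-1252 smart quotes.
$text = "\x93Hi\x94";

$fromCp1252 = mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
$fromLatin1 = mb_convert_encoding($text, 'UTF-8', 'ISO-8859-1');

var_dump($fromCp1252 === "\u{201C}Hi\u{201D}"); // true: real curly quotes
var_dump($fromLatin1 === "\u{0093}Hi\u{0094}"); // true: invisible control characters
var_dump(strlen($fromCp1252), strlen($fromLatin1)); // int(8) and int(6)
```

The two results differ in byte length, which is why a wrong guess can break anything that depends on exact string lengths, such as unserialize().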

regrunge at hotmail dot it ¶

14 years ago

I was trying to find the charset of a Norwegian (with a lot of ø, æ, å) txt file written on a Mac, and I found it this way:

<?php
$text = "A strange string to pass, maybe with some ø, æ, å characters.";

foreach (mb_list_encodings() as $chr) {
    echo mb_convert_encoding($text, 'UTF-8', $chr)." : ".$chr."<br>";
}
?>

The line that looks right tells you the encoding the text was written in.

Hope this helps someone.
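A variation on the same idea, sketched under the assumption that you already have a shortlist of suspects: the candidate list and input bytes below are hypothetical, and mb_check_encoding() is used to skip encodings in which the bytes are not valid at all.

```php
<?php
// Hypothetical input: the UTF-8 bytes for "ø".
$text = "\xC3\xB8";

// Assumed shortlist; adjust it to the encodings you actually suspect.
$candidates = ['UTF-8', 'ISO-8859-1', 'Windows-1252'];

$results = [];
foreach ($candidates as $encoding) {
    // Skip candidates for which the byte sequence is not even valid.
    if (!mb_check_encoding($text, $encoding)) {
        continue;
    }
    $results[$encoding] = mb_convert_encoding($text, 'UTF-8', $encoding);
}

// Print each remaining interpretation for visual inspection.
foreach ($results as $encoding => $converted) {
    echo $converted, ' : ', $encoding, PHP_EOL;
}
```

Single-byte encodings accept almost any byte sequence, so the validity check mainly rules out multibyte candidates; the final choice still has to be made by eye.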

volker at machon dot biz ¶

17 years ago

Hey guys. For everybody who's looking for a function that is converting an iso-string to utf8 or an utf8-string to iso, here's your solution:

public function encodeToUtf8($string) {
     return mb_convert_encoding($string, "UTF-8", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}

public function encodeToIso($string) {
     return mb_convert_encoding($string, "ISO-8859-1", mb_detect_encoding($string, "UTF-8, ISO-8859-1, ISO-8859-15", true));
}

For me these functions are working fine. Give it a try

Rainer Perske ¶

2 years ago

Text-encoding HTML-ENTITIES will be deprecated as of PHP 8.2.

To convert all non-ASCII characters into entities (to produce pure 7-bit HTML output), I was using:

<?php
echo mb_convert_encoding( htmlspecialchars( $text, ENT_QUOTES, 'UTF-8' ), 'HTML-ENTITIES', 'UTF-8' );
?>

I can get the identical result with:

<?php
echo mb_encode_numericentity( htmlentities( $text, ENT_QUOTES, 'UTF-8' ), [0x80, 0x10FFFF, 0, ~0], 'UTF-8' );
?>

The output contains well-known named entities for some often used characters and numeric entities for the rest.

francois at bonzon point com ¶

16 years ago

aaron, to discard unsupported characters instead of printing a ?, you might as well simply set the configuration directive:

mbstring.substitute_character = "none"

in your php.ini. Be sure to include the quotes around none. Or at run-time with

<?php
ini_set('mbstring.substitute_character', "none");
?>
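The same setting can also be changed with mb_substitute_character(). The sketch below uses a hypothetical input string and shows the effect on a character that the target encoding cannot represent.

```php
<?php
$previous = mb_substitute_character(); // remember the current setting

// Hypothetical input: the euro sign (U+20AC) does not exist in ISO-8859-1.
$utf8 = "10 \u{20AC}";

mb_substitute_character('none');       // silently drop unconvertible characters
$dropped = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

mb_substitute_character(0x3F);         // substitute "?" (U+003F) instead
$marked = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

mb_substitute_character($previous);    // restore the original setting

var_dump($dropped); // string(3) "10 "
var_dump($marked);  // string(4) "10 ?"
```

Restoring the previous value keeps the change from leaking into unrelated conversions later in the same request.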

aaron at aarongough dot com ¶

16 years ago

My solution below was slightly incorrect, so here is the correct version (I posted at the end of a long day, never a good idea!)

Again, this is a quick and dirty solution to stop mb_convert_encoding from filling your string with question marks whenever it encounters an illegal character for the target encoding. 

<?php
function convert_to ( $source, $target_encoding )
    {
    // detect the character encoding of the incoming file
    $encoding = mb_detect_encoding( $source, "auto" );
       
    // escape all of the question marks so we can remove artifacts from
    // the unicode conversion process
    $target = str_replace( "?", "[question_mark]", $source );
       
    // convert the string to the target encoding
    $target = mb_convert_encoding( $target, $target_encoding, $encoding);
       
    // remove any question marks that have been introduced because of illegal characters
    $target = str_replace( "?", "", $target );
       
    // replace the token string "[question_mark]" with the symbol "?"
    $target = str_replace( "[question_mark]", "?", $target );
   
    return $target;
    }
?>

Hope this helps someone! (Admins should feel free to delete my previous, incorrect, post for clarity)
-A

eion at bigfoot dot com ¶

18 years ago

Many people below talk about using

<?php
    mb_convert_encoding($s,'HTML-ENTITIES','UTF-8');
?>

to convert non-ASCII characters into HTML-readable form. Because my web server was out of my control, I was unable to set the database character set, and whenever PHP made a copy of my $s variable that it had pulled out of the database, it would automatically convert it to nasty latin1 and not leave it in its beautiful UTF-8 glory.

So [insert korean characters here] turned into ?????.

I found myself needing to pass by reference (call-time pass-by-reference is of course deprecated/nonexistent in recent versions of PHP). So instead of

<?php
    mb_convert_encoding(&$s,'HTML-ENTITIES','UTF-8');
?>

which worked perfectly until I upgraded, I had to use

<?php
    call_user_func_array('mb_convert_encoding', array(&$s,'HTML-ENTITIES','UTF-8'));
?>

Hope it helps someone else out.

Stephan van der Feest ¶

19 years ago

To add to the Flash conversion comment below, here's how I convert back from what I've stored in a database after converting from Flash HTML text field output, in order to load it back into a Flash HTML text field:

function htmltoflash($htmlstr)
{
  return str_replace("&lt;br /&gt;","\n",
    str_replace("<","&lt;",
      str_replace(">","&gt;",
        mb_convert_encoding(html_entity_decode($htmlstr),
        "UTF-8","ISO-8859-1"))));
}

urko at wegetit dot eu ¶

12 years ago

If you are trying to generate a CSV (with extended chars) to be opened in Excel for Mac, the only thing that worked for me was:

<?php mb_convert_encoding( $CSV, 'Windows-1252', 'UTF-8'); ?>

I also tried this:

<?php
// Fields separated OK, chars wrong
iconv('MACINTOSH', 'UTF8', $CSV);

// Fields separated wrong, chars OK
chr(255).chr(254).mb_convert_encoding( $CSV, 'UCS-2LE', 'UTF-8');
?>

But the first one didn't show extended chars correctly, and the second one didn't separate fields correctly.

me at gsnedders dot com ¶

15 years ago

It appears that when dealing with an unknown "from encoding" the function will both throw an E_WARNING and proceed to convert the string from ISO-8859-1 to the "to encoding".

vasiliauskas dot agnius at gmail dot com ¶

6 years ago

If you need to convert from HTML-ENTITIES but your UTF-8 string is partially broken (not all characters are valid UTF-8), passing the string to mb_convert_encoding($string, 'UTF-8', 'HTML-ENTITIES') corrupts it even further. In that case, replace the HTML entities one at a time so that the rest of the string keeps its encoding. I wrote this closure for the job:
<?php
$decode_entities = function($string) {
        preg_match_all("/&#?\w+;/", $string, $entities, PREG_SET_ORDER);
        $entities = array_unique(array_column($entities, 0));
        foreach ($entities as $entity) {
            $decoded = mb_convert_encoding($entity, 'UTF-8', 'HTML-ENTITIES');
            $string = str_replace($entity, $decoded, $string);
        }
        return $string;
    };
?>

Daniel Trebbien ¶

15 years ago

Note that `mb_convert_encoding($val, 'HTML-ENTITIES')` does not escape '\'', '"', '<', '>', or '&'.

katzlbtjunk at hotmail dot com ¶

16 years ago

Clean a string for use as a filename by simply replacing all unwanted characters with underscores (converting to ASCII reduces the string to 7 bits). It removes slightly more characters than necessary. Hope it's useful.

$fileName = 'Test:!"$%&/()=ÖÄÜöäü<<';
echo strtr(mb_convert_encoding($fileName,'ASCII'), 
    ' ,;:?*#!§$%&/(){}<>=`´|\\\'"', 
    '____________________________');
