unpack
(PHP 4, PHP 5, PHP 7, PHP 8)
unpack — Unpack data from a binary string
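Description
Unpacks from a binary string into an array according to the given format.
The unpacked data is stored in an associative array. To accomplish this, you have to name the different format codes and separate them by a slash /. If a repeater argument is present, each array key gets a sequence number appended to the given name.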
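Changes were made to bring this function into line with Perl:
- The "a" code retains trailing NULL bytes.
- The "A" code strips all trailing ASCII whitespace (spaces, tabs, newlines, carriage returns, and NULL bytes).
- The "Z" code was added for NULL-padded strings, and removes trailing NULL bytes.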
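The difference between the three string codes is easiest to see side by side. The following is a minimal sketch, assuming a PHP version that includes the changes listed above; the sample input is made up for illustration.

```php
<?php
// Hypothetical sample: "hi", one space, then two NUL bytes of padding.
$data = "hi \0\0";

$a = unpack("a5", $data)[1]; // "a" keeps everything, including trailing NULs
$A = unpack("A5", $data)[1]; // "A" strips trailing whitespace and NULs
$Z = unpack("Z5", $data)[1]; // "Z" stops at the first NUL byte

var_dump(bin2hex($a)); // hex dump makes the NUL bytes visible
var_dump($A);
var_dump($Z);
```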
Return Values
Returns an associative array containing the unpacked elements of the binary string, or false on failure.
Changelog
Version | Description |
---|---|
7.2.0 | float and double types support both big endian and little endian. |
7.1.0 | The optional offset has been added. |
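The optional offset can replace a substr() call when the interesting data does not start at the first byte. A small sketch, assuming PHP 7.1.0 or later; the packet layout is hypothetical.

```php
<?php
// Hypothetical packet layout: 2-byte magic, then a 16-bit big-endian length.
$packet = "\xCA\xFE" . pack("n", 513) . "payload";

// Skip the 2-byte magic by passing an offset instead of calling substr().
$header = unpack("nlength", $packet, 2);

print_r($header);
```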
Examples
Example #1 unpack() example
<?php
$binarydata = "\x04\x00\xa0\x00";
$array = unpack("cchars/nint", $binarydata);
print_r($array);
?>
The above example will output:
Array
(
    [chars] => 4
    [int] => 160
)
Example #2 unpack() example with a repeater
<?php
$binarydata = "\x04\x00\xa0\x00";
$array = unpack("c2chars/nint", $binarydata);
print_r($array);
?>
The above example will output:
Array
(
    [chars1] => 4
    [chars2] => 0
    [int] => 40960
)
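Notes
Note that PHP internally stores integer values as signed. If you unpack a large unsigned long that is the same size as PHP's internal integer, the result will be a negative number even though unsigned unpacking was specified.
If you do not name an element, numeric indices starting from 1 are used. Be aware that if you have more than one unnamed element, some data will be overwritten, because the numbering restarts from 1 for each element.
Example #3 unpack() example with unnamed keys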
<?php
$binarydata = "\x32\x42\x00\xa0";
$array = unpack("c2/n", $binarydata);
var_dump($array);
?>
The above example will output:
array(2) {
  [1]=>
  int(160)
  [2]=>
  int(66)
}
Note that the first value from the c specifier is overwritten by the first value from the n specifier.
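Naming each format code avoids the collision. A minimal sketch using the same input; the key names byte and word are arbitrary choices for illustration.

```php
<?php
$binarydata = "\x32\x42\x00\xa0";

// Naming each code keeps the keys distinct: byte1, byte2 and word.
$array = unpack("c2byte/nword", $binarydata);

var_dump($array);
```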
User Contributed Notes 14 notes
A helper class to convert integer to binary strings and vice versa. Useful for writing and reading integers to / from files or sockets.
<?php
class int_helper
{
public static function int8($i) {
return is_int($i) ? pack("c", $i) : unpack("c", $i)[1];
}
public static function uInt8($i) {
return is_int($i) ? pack("C", $i) : unpack("C", $i)[1];
}
public static function int16($i) {
return is_int($i) ? pack("s", $i) : unpack("s", $i)[1];
}
public static function uInt16($i, $endianness=false) {
$f = is_int($i) ? "pack" : "unpack";
if ($endianness === true) { // big-endian
$i = $f("n", $i);
}
else if ($endianness === false) { // little-endian
$i = $f("v", $i);
}
else if ($endianness === null) { // machine byte order
$i = $f("S", $i);
}
return is_array($i) ? $i[1] : $i;
}
public static function int32($i) {
return is_int($i) ? pack("l", $i) : unpack("l", $i)[1];
}
public static function uInt32($i, $endianness=false) {
$f = is_int($i) ? "pack" : "unpack";
if ($endianness === true) { // big-endian
$i = $f("N", $i);
}
else if ($endianness === false) { // little-endian
$i = $f("V", $i);
}
else if ($endianness === null) { // machine byte order
$i = $f("L", $i);
}
return is_array($i) ? $i[1] : $i;
}
public static function int64($i) {
return is_int($i) ? pack("q", $i) : unpack("q", $i)[1];
}
public static function uInt64($i, $endianness=false) {
$f = is_int($i) ? "pack" : "unpack";
if ($endianness === true) { // big-endian
$i = $f("J", $i);
}
else if ($endianness === false) { // little-endian
$i = $f("P", $i);
}
else if ($endianness === null) { // machine byte order
$i = $f("Q", $i);
}
return is_array($i) ? $i[1] : $i;
}
}
?>
Usage example:
<?php
Header("Content-Type: text/plain");
include("int_helper.php");
echo int_helper::uInt8(0x6b) . PHP_EOL; // k
echo int_helper::uInt8(107) . PHP_EOL; // k
echo int_helper::uInt8("\x6b") . PHP_EOL . PHP_EOL; // 107
echo int_helper::uInt16(4101) . PHP_EOL; // \x05\x10
echo int_helper::uInt16("\x05\x10") . PHP_EOL; // 4101
echo int_helper::uInt16("\x05\x10", true) . PHP_EOL . PHP_EOL; // 1296
echo int_helper::uInt32(2147483647) . PHP_EOL; // \xff\xff\xff\x7f
echo int_helper::uInt32("\xff\xff\xff\x7f") . PHP_EOL . PHP_EOL; // 2147483647
// Note: Test this with 64-bit build of PHP
echo int_helper::uInt64(9223372036854775807) . PHP_EOL; // \xff\xff\xff\xff\xff\xff\xff\x7f
echo int_helper::uInt64("\xff\xff\xff\xff\xff\xff\xff\x7f") . PHP_EOL . PHP_EOL; // 9223372036854775807
?>
This is about the last example of my previous post. For the sake of clarity, I'm including again here the example, which expands the one given in the formal documentation:
<?php
$binarydata = "AA\0A";
$array = unpack("c2chars/nint", $binarydata);
foreach ($array as $key => $value)
echo "\$array[$key] = $value <br>\n";
?>
This outputs:
$array[chars1] = 65
$array[chars2] = 65
$array[int] = 65
Here, we assume that the ascii code for character 'A' is decimal 65.
Remembering that the format string structure is:
<format-code> [<count>] [<array-key>] [/ ...],
in this example, the format string instructs the function to
1. ("c2...") Read two chars from the second argument ("AA ...),
2. (...chars...) Use the array-keys "chars1", and "chars2" for
these two chars read,
3. (.../n...) Read a short int from the second argument (...\0A"),
4. (...int") Use the word "int" as the array key for the just read
short.
I hope this is clearer now,
Sergio.
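The same format string structure extends to mixed records. A short sketch, with a made-up record layout and key names, might look like this:

```php
<?php
// Hypothetical record: 1-byte version, 16-bit big-endian length, 4-byte tag.
$record = "\x01\x00\x10PHP!";

// <format-code> [<count>] [<array-key>] for each element, joined by "/".
$fields = unpack("Cversion/nlength/a4tag", $record);

print_r($fields);
```

For the string code "a", the count is the number of bytes to read rather than a repeater, so the key stays "tag" without a numeric suffix.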
I had a situation where I had to unpack a file filled with little-endian double-precision floats in a way that would work on either little-endian or big-endian machines. At the time, PHP had no format code for choosing the byte order of doubles (the "e" and "E" codes were added later), so I wrote this workaround.
<?php
/*The following code is a workaround for php's unpack function
which does not have the capability of unpacking double precision
floats that were packed in the opposite byte order of the current
machine.
*/
function big_endian_unpack ($format, $data) {
$ar = unpack ($format, $data);
$vals = array_values ($ar);
$f = explode ('/', $format);
$i = 0;
foreach ($f as $f_k => $f_v) {
$repeater = intval (substr ($f_v, 1));
if ($repeater == 0) $repeater = 1;
if (isset($f_v[1]) && $f_v[1] == '*')
{
$repeater = count ($ar) - $i;
}
if ($f_v[0] != 'd') { $i += $repeater; continue; }
$j = $i + $repeater;
for ($a = $i; $a < $j; ++$a)
{
$p = pack ('d',$vals[$i]);
$p = strrev ($p);
list ($vals[$i]) = array_values (unpack ('d1d', $p));
++$i;
}
}
$a = 0;
foreach ($ar as $ar_k => $ar_v) {
$ar[$ar_k] = $vals[$a];
++$a;
}
return $ar;
}
list ($endiantest) = array_values (unpack ('L1L', pack ('V',1)));
if ($endiantest != 1) define ('BIG_ENDIAN_MACHINE',1);
if (defined ('BIG_ENDIAN_MACHINE')) $unpack_workaround = 'big_endian_unpack';
else $unpack_workaround = 'unpack';
?>
This workaround is used like this:
<?php
function foo($my_data) {
global $unpack_workaround;
$bar = $unpack_workaround('N7N/V2V/d8d',$my_data);
//...
}
?>
On a little endian machine, $unpack_workaround will simply point to the function unpack. On a big endian machine, it will call the workaround function.
Note, this solution only works for doubles. In my project I had no need to check for single precision floats.
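On newer PHP versions the byte order of floats and doubles can be selected directly with format codes, which may make a workaround like this unnecessary. A minimal sketch, assuming a PHP version where the "e" (little-endian double) and "E" (big-endian double) codes are available:

```php
<?php
// 1.5 as an IEEE 754 double, written out in big-endian byte order.
$bigEndian = "\x3f\xf8\x00\x00\x00\x00\x00\x00";

// "E" reads a big-endian double, "e" a little-endian one,
// regardless of the byte order of the machine running the script.
$fromBig    = unpack("E", $bigEndian)[1];
$fromLittle = unpack("e", strrev($bigEndian))[1];

var_dump($fromBig, $fromLittle);
```

The "g" and "G" codes do the same for single-precision floats.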
If having a zero-based index is useful/necessary, then instead of:
$int_list = unpack("s*", $some_binary_data);
try:
$int_list = array_merge(unpack("s*", $some_binary_data));
This will return a 0-based array:
$int_list[0] = x
$int_list[1] = y
$int_list[2] = z
...
rather than the default 1-based array returned from unpack when no key is supplied:
$int_list[1] = x
$int_list[2] = y
$int_list[3] = z
...
It's not used often, but array_merge() with a single argument renumbers numeric keys sequentially, starting at index 0.
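array_values() gives the same zero-based result and states the intent more directly. A small sketch with sample data packed only for the demonstration:

```php
<?php
// Three 16-bit values in machine byte order, packed here only to have sample data.
$some_binary_data = pack("s*", 10, 20, 30);

$one_based  = unpack("s*", $some_binary_data);               // keys 1, 2, 3
$zero_based = array_values(unpack("s*", $some_binary_data)); // keys 0, 1, 2

print_r($zero_based);
```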
Functions I found useful when dealing with fixed width file processing, related to unpack/pack functions.
<?php
/**
* funpack
* format: array of key, length pairs
* data: string to unpack
*/
function funpack($format, $data){
$result = array();
$pos = 0; // current read position in $data
foreach ($format as $key => $len) {
$result[$key] = trim(substr($data, $pos, $len));
$pos += $len;
}
return $result;
}
/**
* fpack
* format: array of key, length pairs
* data: array of key, value pairs to pack
* pad: padding direction
*/
function fpack($format, $data, $pad = STR_PAD_RIGHT){
$result = '';
foreach ($format as $key => $len){
// str_pad() takes the pad string before the pad type
$result .= substr(str_pad($data[$key], $len, ' ', $pad), 0, $len);
}
return $result;
}
?>
Suppose we need to get some kind of internal representation of an integer, say 65, as a four-byte long. Then we use something like:
<?php
$i = 65;
$s = pack("l", $i); // long 32 bit, machine byte order
echo strlen($s) . "<br>\n";
echo "***$s***<br>\n";
?>
The output is:
X-Powered-By: PHP/4.1.2
Content-type: text/html
4
***A***
(That is the string "A\0\0\0")
Now we want to go back from string "A\0\0\0" to number 65. In this case we can use:
<?php
$s = "A\0\0\0"; // This string is the bytes representation of number 65
$arr = unpack("l", $s);
foreach ($arr as $key => $value)
echo "\$arr[$key] = $value<br>\n";
?>
And this outputs:
X-Powered-By: PHP/4.1.2
Content-type: text/html
$arr[] = 65
Let's give the array key a name, say "mykey". In this case, we can use:
<?php
$s = "A\0\0\0"; // This string is the bytes representation of number 65
$arr = unpack("lmykey", $s);
foreach ($arr as $key => $value)
echo "\$arr[$key] = $value\n";
?>
And this outputs:
X-Powered-By: PHP/4.1.2
Content-type: text/html
$arr[mykey] = 65
The "unpack" documentation is a little bit confusing. I think a more complete example could be:
<?php
$binarydata = "AA\0A";
$array = unpack("c2chars/nint", $binarydata);
foreach ($array as $key => $value)
echo "\$array[$key] = $value <br>\n";
?>
whose output is:
X-Powered-By: PHP/4.1.2
Content-type: text/html
$array[chars1] = 65 <br>
$array[chars2] = 65 <br>
$array[int] = 65 <br>
Note that the format string is something like
<format-code> [<count>] [<array-key>] [/ ...]
I hope this clarifies something
Sergio
Don't forget to decode user-defined-pseudo-byte-sequences before unpacking...
<?php
$byte_code_string = '00004040';
var_dump ( unpack ( 'f', $byte_code_string ) );
?>
Result:
array(1) {
[1]=>
float(6.4096905560973E-10)
}
whereas
<?php
$byte_code_string = '00004040';
var_dump ( unpack ( 'f', hex2bin ( $byte_code_string ) ) );
?>
Result:
array(1) {
[1]=>
float(3)
}
Another option for converting binary data into PHP data types, is to use the Zend Framework's Zend_Io_Reader class:
http://bit.ly/9zAhgz
There's also a Zend_Io_Writer class that does the reverse.
Warning: This unpack function makes the array with keys starting at 1 instead of starting at 0.
For example:
<?php
function read_field($h) {
$a=unpack("V",fread($h,4));
return fread($h,$a[1]);
}
?>
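To make the 1-based key visible, here is a self-contained sketch of the same idea against an in-memory stream. The function name read_prefixed_field() and the sample data are made up for illustration, and the short-read checks are an addition not present in the note above.

```php
<?php
// Sketch of a length-prefixed reader; read_prefixed_field() is a made-up name.
function read_prefixed_field($h) {
    $raw = fread($h, 4);
    if ($raw === false || strlen($raw) !== 4) {
        return false; // not enough bytes for the 32-bit length prefix
    }
    $a = unpack("V", $raw);   // unnamed element, so the value is in $a[1]
    $len = $a[1];
    return $len > 0 ? fread($h, $len) : "";
}

// Build a sample stream in memory: little-endian length 5, then "hello".
$h = fopen("php://memory", "w+b");
fwrite($h, pack("V", 5) . "hello");
rewind($h);

$field = read_prefixed_field($h);
var_dump($field);
fclose($h);
```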
Be aware of the behavior of the system that PHP runs on.
On x86, unpack MAY not yield the result you expect for UInt32.
This is because PHP stores integers internally as SIGNED!
For x86 systems, unpack('N', "\xff\xff\xff\xff") results in -1
For (most?) x64 systems, unpack('N', "\xff\xff\xff\xff") results in 4294967295.
This can be verified by checking the value of PHP_INT_SIZE.
If this value is 4, you have a PHP that internally stores 32-bit.
A value of 8 internally stores 64-bit.
To work around this 'problem', you can use the following code to avoid problems with unpack.
The code is for big endian order but can easily be adjusted for little endian order (also, similar code works for 64-bit integers):
<?php
function _uint32be($bin)
{
// $bin is the binary 32-bit BE string that represents the integer
if (PHP_INT_SIZE <= 4){
list(,$h,$l) = unpack('n*', $bin);
return ($l + ($h*0x010000));
}
else{
list(,$int) = unpack('N', $bin);
return $int;
}
}
?>
Do note that you *could* also use sprintf('%u', $x) to show the unsigned real value.
Also note that (at least when PHP_INT_SIZE = 4) the result WILL be a float value when the input is larger than 0x7fffffff (just check with gettype).
Hope this helps people.
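The sprintf('%u', $x) approach can be sketched on a 64-bit build, where the same signed wrap-around shows up for 64-bit values. This assumes PHP_INT_SIZE is 8; the "J" code is not usable on 32-bit builds, hence the guard.

```php
<?php
// Eight 0xff bytes: the largest unsigned 64-bit value.
$bin = str_repeat("\xff", 8);

if (PHP_INT_SIZE === 8) {
    $value = unpack("J", $bin)[1];    // stored as a signed integer: -1
    $text  = sprintf("%u", $value);   // same bits, shown as an unsigned decimal string
    var_dump($value, $text);
}
```

The result of sprintf() is a string, so it suits display or storage rather than further arithmetic.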
Reading a text cell from an Excel spreadsheet returned a string with low-order embedded nulls: 0x4100 0x4200 etc. To remove the nulls, I used
<?php
$strWithoutNulls = implode( '', explode( "\0", $strWithNulls ) );
?>
(unpack() didn't seem to help much here; needed chars back to re-constitute the string, not integers.)
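For what it's worth, unpack() can be combined with chr() to get characters back. This is only a sketch, assuming the data is 16-bit little-endian text whose code units are all below 256; the sample string is made up.

```php
<?php
// "AB" as 16-bit little-endian text: each character is followed by a NUL byte.
$strWithNulls = "A\0B\0";

// Read each 16-bit little-endian code unit, then turn it back into a byte.
// chr() only makes sense here for code units below 256.
$units = unpack("v*", $strWithNulls);
$strWithoutNulls = implode("", array_map("chr", $units));

var_dump($strWithoutNulls);
```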
To convert big endian to little endian or to convert little endian to big endian, use the following approach as an example:
<?php
// file_get_contents() returns the raw bytes of the file, and unpack("V*", _ ) reads them as unsigned 32-bit little-endian integers. bin2hex() on the raw bytes alone would just give the hex data in file order, so instead we use:
// file_get_contents(), then unpack("V*", _ ), then dechex() on each value, in that order, to get the byte-swapping effect.
?>
With the logic of the approach in this example, you can discover how to swap the endian byte order as you need.
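One way to make the approach concrete is to read a value with one byte order and write it back with the other. A minimal sketch for a single 32-bit value, with made-up sample bytes:

```php
<?php
// Four bytes holding 0x12345678 in little-endian order.
$little = "\x78\x56\x34\x12";

// Read as little-endian ("V"), write back as big-endian ("N").
$value = unpack("V", $little)[1];
$big   = pack("N", $value);

var_dump(dechex($value));   // hex digits of the decoded integer
var_dump(bin2hex($little)); // bytes in their original order
var_dump(bin2hex($big));    // bytes after swapping
```

Swapping "V" and "N" converts in the other direction, and "v"/"n" do the same for 16-bit values.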