fgetcsv
(PHP 4, PHP 5, PHP 7, PHP 8)
fgetcsv — Gets a line from a file pointer and parses it for CSV fields
Description
fgetcsv(resource $stream, ?int $length = null, string $separator = ",", string $enclosure = "\"", string $escape = "\\"): array|false
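Similar to fgets(), except that fgetcsv() parses the line it reads for fields in CSV format and returns an array containing the fields read.
Note:
This function takes the locale settings into account. If LC_CTYPE is, for example, en_US.UTF-8, files in one-byte encodings may be read incorrectly.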
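Parameters
stream
A valid file pointer to a file successfully opened by fopen(), popen(), or fsockopen().
length
Must be greater than the longest line (in characters) in the CSV file, allowing for trailing line-end characters. Otherwise, the line is split into chunks of length characters, unless the split would occur inside an enclosure.
Omitting this parameter (or setting it to 0, or to null as of PHP 8.0.0) leaves the maximum line length unlimited, which is slightly slower.
separator
The optional separator parameter sets the field delimiter (one single-byte character only).
enclosure
The optional enclosure parameter sets the field enclosure character (one single-byte character only).
escape
The optional escape parameter sets the escape character (at most one single-byte character). An empty string ("") disables the proprietary escape mechanism, which is not RFC 4180 compliant.
Note:
Usually an enclosure character is escaped inside a field by doubling it; however, the escape character can be used as an alternative. So with the default parameter values, "" and \" have the same meaning. Other than allowing the enclosure character to be escaped, the escape character has no special meaning; it is not even meant to escape itself.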
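When escape is set to anything other than an empty string (""), the result can be CSV that does not comply with RFC 4180 or that cannot survive a round trip through the PHP CSV functions. The default value of escape is "\\", so explicitly setting it to an empty string is recommended. The default value will change in a future version of PHP, no earlier than PHP 9.0.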
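As a rough illustration of the escape behavior described above, the following sketch parses a quoted field once with the escape mechanism disabled and once with a backslash escape. The `parse_line()` helper and the sample data are made up for this example, and it assumes PHP 8.0 or later (for the nullable `length` and the union return type).

```php
<?php
// Hypothetical helper for this sketch: parse one CSV line held in memory.
function parse_line(string $line, string $escape): array|false
{
    $stream = fopen('php://memory', 'w+');
    fwrite($stream, $line);
    rewind($stream);
    $fields = fgetcsv($stream, null, ',', '"', $escape);
    fclose($stream);
    return $fields;
}

// Escape mechanism disabled: a doubled enclosure yields one literal quote.
$rfc = parse_line('"a ""quoted"" word",b' . "\n", '');

// Backslash escape: \" does not end the field, and the backslash itself
// stays in the returned value.
$legacy = parse_line('"a \"quoted\" word",b' . "\n", '\\');

var_dump($rfc, $legacy);
```

Passing `escape` explicitly, as done here, also keeps the code independent of any future change to the default value.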
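Return Values
Returns an indexed array containing the fields read on success, or false on failure.
Note:
A blank line in a CSV file is returned as an array comprising a single null field, and is not treated as an error.
Note: If PHP does not properly recognize the line endings when reading files created on a Macintosh computer, enabling the auto_detect_line_endings run-time configuration option may help resolve the problem.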
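Because a blank line comes back as an array holding a single null rather than as false, a read loop may want to skip it explicitly. A minimal sketch, using made-up in-memory data and assuming PHP 8.0 or later:

```php
<?php
// Sample data with an empty second line, held in memory for this sketch.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "a,b\n\nc,d\n");
rewind($stream);

$rows = [];
while (($fields = fgetcsv($stream, null, ',', '"', '')) !== false) {
    // A blank line is returned as an array containing a single null.
    if ($fields === [null]) {
        continue;
    }
    $rows[] = $fields;
}
fclose($stream);

print_r($rows);
```

Comparing against false with `!==` matters here: a loose check would still work for `[null]`, but the strict form makes the end-of-file condition unambiguous.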
Changelog
Version | Description |
---|---|
8.0.0 | length is now nullable. |
7.4.0 | The escape parameter now accepts an empty string, which disables the proprietary escape mechanism (not RFC 4180 compliant). |
Examples
Example #1 Read and print the entire contents of a CSV file
<?php
$row = 1;
if (($handle = fopen("test.csv", "r")) !== FALSE) {
    while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
        $num = count($data);
        echo "<p> $num fields in line $row: <br /></p>\n";
        $row++;
        for ($c=0; $c < $num; $c++) {
            echo $data[$c] . "<br />\n";
        }
    }
    fclose($handle);
}
?>
See Also
- str_getcsv() - Parse a CSV string into an array
- explode() - Split a string by a string
- file() - Reads entire file into an array
- pack() - Pack data into binary string
- fputcsv() - Format line as CSV and write to file pointer
User Contributed Notes 31 notes
If you need to set auto_detect_line_endings to deal with Mac line endings, it may seem obvious but remember it should be set before fopen, not after:
This will work:
<?php
ini_set('auto_detect_line_endings',TRUE);
$handle = fopen('/path/to/file','r');
while ( ($data = fgetcsv($handle) ) !== FALSE ) {
    //process
}
ini_set('auto_detect_line_endings',FALSE);
?>
This won't, you will still get concatenated fields at the new line position:
<?php
$handle = fopen('/path/to/file','r');
ini_set('auto_detect_line_endings',TRUE);
while ( ($data = fgetcsv($handle) ) !== FALSE ) {
    //process
}
ini_set('auto_detect_line_endings',FALSE);
?>
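Since auto_detect_line_endings is deprecated as of PHP 8.1.0, one possible alternative is to normalize the line endings before parsing. The sketch below is only one way to do that: `open_with_unix_newlines()` is a hypothetical helper, it loads the whole file into memory, and it also rewrites a bare \r that sits inside a quoted field.

```php
<?php
// Hypothetical helper: return a stream whose line endings are all "\n".
function open_with_unix_newlines(string $path)
{
    $contents = file_get_contents($path);
    if ($contents === false) {
        return false;
    }
    // Convert Windows (\r\n) first, then classic Mac (\r) line endings.
    $contents = str_replace(["\r\n", "\r"], "\n", $contents);

    $stream = fopen('php://temp', 'w+');
    fwrite($stream, $contents);
    rewind($stream);
    return $stream;
}

// Usage with a throwaway file that uses bare \r line endings.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "a,b\rc,d\r");

$rows = [];
$handle = open_with_unix_newlines($path);
while (($data = fgetcsv($handle, null, ',', '"', '')) !== false) {
    $rows[] = $data;
}
fclose($handle);
unlink($path);

print_r($rows);
```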
When a BOM character is supplied, `fgetcsv` may appear to wrap the first element in "double quotation marks". The simplest way to ignore it is to advance the file pointer to the 4th byte before using `fgetcsv`.
<?php
// BOM as a string for comparison.
$bom = "\xef\xbb\xbf";
// Read file from beginning.
$fp = fopen($path, 'r');
// Advance file pointer and get first 3 bytes to compare to the BOM string.
if (fgets($fp, 4) !== $bom) {
    // BOM not found - rewind pointer to start of file.
    rewind($fp);
}
// Read CSV into an array.
$lines = array();
while(!feof($fp) && ($line = fgetcsv($fp)) !== false) {
    $lines[] = $line;
}
?>
fgetcsv seems to handle newlines within fields fine. So in fact it is not reading a line, but keeps reading until it finds a \n character that is not inside a quoted field.
Example:
<?php
/* test.csv contains:
"col 1","col2","col3"
"this
is
having
multiple
lines","this not","this also not"
"normal record","nothing to see here","no data"
*/
$handle = fopen("test.csv", "r");
while (($data = fgetcsv($handle)) !== FALSE) {
    var_dump($data);
}
?>
Returns:
array(3) {
  [0]=>
  string(5) "col 1"
  [1]=>
  string(4) "col2"
  [2]=>
  string(4) "col3"
}
array(3) {
  [0]=>
  string(29) "this
is
having
multiple
lines"
  [1]=>
  string(8) "this not"
  [2]=>
  string(13) "this also not"
}
array(3) {
  [0]=>
  string(13) "normal record"
  [1]=>
  string(19) "nothing to see here"
  [2]=>
  string(7) "no data"
}
This means that you can expect fgetcsv to handle newlines within fields fine. This was not clear from the documentation.
Forget this while() loop mumbo jumbo! Use this:
$rows = array_map('str_getcsv', file('myfile.csv'));
$header = array_shift($rows);
$csv = array();
foreach ($rows as $row) {
    $csv[] = array_combine($header, $row);
}
Source: https://steindom.com/articles/shortest-php-code-convert-csv-associative-array
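One caveat with the file() approach: it reads the whole file at once and splits on physical lines, so a quoted field that contains a newline is broken apart. A streaming variant built on fgetcsv() avoids both issues. The following is a sketch only; `csv_assoc_rows()` is a hypothetical helper, and it assumes PHP 8.0 or later and a header row in the first line.

```php
<?php
// Hypothetical helper: yield each data row keyed by the header row.
function csv_assoc_rows(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        $header = fgetcsv($handle, null, ',', '"', '');
        if ($header === false) {
            return; // empty file
        }
        while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
            // Skip blank lines and rows whose field count differs from the header.
            if ($row === [null] || count($row) !== count($header)) {
                continue;
            }
            yield array_combine($header, $row);
        }
    } finally {
        fclose($handle);
    }
}

// Usage with a throwaway file containing a multi-line quoted field.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "name,note\nAnn,\"line one\nline two\"\nBob,ok\n");
$csv = iterator_to_array(csv_assoc_rows($path), false);
unlink($path);

print_r($csv);
```

Silently skipping rows with a mismatched field count is a choice made for brevity; real code may prefer to report them.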
Here is an OOP based importer similar to the one posted earlier. However, this one is slightly more flexible in that you can import huge files without running out of memory; you just have to pass a limit to the get() method.
Sample usage for small files:-
-------------------------------------
<?php
$importer = new CsvImporter("small.txt",true);
$data = $importer->get();
print_r($data);
?>
Sample usage for large files:-
-------------------------------------
<?php
$importer = new CsvImporter("large.txt",true);
while($data = $importer->get(2000))
{
    print_r($data);
}
?>
And here's the class:-
-------------------------------------
<?php
class CsvImporter
{
    private $fp;
    private $parse_header;
    private $header;
    private $delimiter;
    private $length;
    //--------------------------------------------------------------------
    function __construct($file_name, $parse_header=false, $delimiter="\t", $length=8000)
    {
        $this->fp = fopen($file_name, "r");
        $this->parse_header = $parse_header;
        $this->delimiter = $delimiter;
        $this->length = $length;
        if ($this->parse_header)
        {
            $this->header = fgetcsv($this->fp, $this->length, $this->delimiter);
        }
    }
    //--------------------------------------------------------------------
    function __destruct()
    {
        if ($this->fp)
        {
            fclose($this->fp);
        }
    }
    //--------------------------------------------------------------------
    function get($max_lines=0)
    {
        //if $max_lines is set to 0, then get all the data
        $data = array();
        if ($max_lines > 0)
            $line_count = 0;
        else
            $line_count = -1; // so loop limit is ignored
        while ($line_count < $max_lines && ($row = fgetcsv($this->fp, $this->length, $this->delimiter)) !== FALSE)
        {
            if ($this->parse_header)
            {
                $row_new = array(); // start each row fresh so no values leak from the previous one
                foreach ($this->header as $i => $heading_i)
                {
                    $row_new[$heading_i] = $row[$i];
                }
                $data[] = $row_new;
            }
            else
            {
                $data[] = $row;
            }
            if ($max_lines > 0)
                $line_count++;
        }
        return $data;
    }
    //--------------------------------------------------------------------
}
?>
This function has no special BOM handling. The first cell of the first row will inherit the BOM bytes, i.e. will be 3 bytes longer than expected. As the BOM is invisible you may not notice.
Excel on Windows, or text editors like Notepad, may add the BOM.
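If the BOM has already ended up in the first cell, it can be removed after parsing. A minimal sketch, assuming a UTF-8 BOM and an unquoted first field (`strip_utf8_bom()` is a hypothetical helper, and the sample data is made up); when the first field is quoted, skipping the BOM before calling fgetcsv() is likely the more robust route.

```php
<?php
// Hypothetical helper: drop a leading UTF-8 BOM from the first field of a row.
function strip_utf8_bom(array $row): array
{
    $bom = "\xEF\xBB\xBF";
    if (isset($row[0]) && strncmp($row[0], $bom, 3) === 0) {
        $row[0] = substr($row[0], 3);
    }
    return $row;
}

// Sample data that starts with a BOM, held in memory for this sketch.
$stream = fopen('php://memory', 'w+');
fwrite($stream, "\xEF\xBB\xBFid,name\n1,Ann\n");
rewind($stream);

// Only the first row of the file can carry the BOM.
$header = strip_utf8_bom(fgetcsv($stream, null, ',', '"', ''));
$first  = fgetcsv($stream, null, ',', '"', '');
fclose($stream);

var_dump($header, $first);
```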
I've had a lot of projects recently dealing with csv files, so I created the following class to read a csv file and return an array of arrays with the column names as keys. The only requirement is that the 1st row contain the column headings.
I only wrote it today, so I'll probably expand on it in the near future.
<?php
class CSVparse
{
    public $mappings = array();

    function parse_file($filename)
    {
        $table = array();                // rows keyed by column name
        $id = fopen($filename, "r");     // open the file
        $length = filesize($filename);   // upper bound on any line length
        $data = fgetcsv($id, $length);   // first row: the main column names
        if (!$this->mappings)
            $this->mappings = $data;
        while ($data = fgetcsv($id, $length))
        {
            if ($data[0])
            {
                $converted_data = array();   // start each row fresh
                foreach ($data as $key => $value)
                    $converted_data[$this->mappings[$key]] = addslashes($value);
                $table[] = $converted_data;  // put each line into its own entry in $table
            }
        }
        fclose($id);                     // close file
        return $table;
    }
}
?>
For anyone else struggling with disappearing non-latin characters in one-byte encodings - setting LANG env var (as the manual states) does not help at all. Look at LC_ALL instead.
In my case it was set to "pl_PL.utf8", but since my input file was in CP1250, most of the Polish characters (but not all of them!) had gone missing and the city of "Łódź" had become just "dź". I've "fixed" it with "pl_PL".
Another version [modified michael from mediaconcepts]
<?php
// Note: the enclosure must be a single character; an empty string is rejected by fgetcsv().
function arrayFromCSV($file, $hasFieldNames = false, $delimiter = ',', $enclosure = '"') {
    $result = array();
    // TODO: There must be a better way of finding out the size of the longest row... until then
    $size = filesize($file) + 1;
    $file = fopen($file, 'r');
    if ($hasFieldNames) $keys = fgetcsv($file, $size, $delimiter, $enclosure);
    while ($row = fgetcsv($file, $size, $delimiter, $enclosure)) {
        $n = count($row);
        $res = array();
        for ($i = 0; $i < $n; $i++) {
            $idx = ($hasFieldNames) ? $keys[$i] : $i;
            $res[$idx] = $row[$i];
        }
        $result[] = $res;
    }
    fclose($file);
    return $result;
}
?>
Here's something I put together this morning. It allows you to read rows from your CSV and get values based on the name of the column. This works great when your header columns are not always in the same order; like when you're processing many feeds from different customers. Also makes for cleaner, easier to manage code.
So if your feed looks like this:
product_id,category_name,price,brand_name, sku_isbn_upc,image_url,landing_url,title,description
123,Test Category,12.50,No Brand,0,http://www.example.com, http://www.example.com/landing.php, Some Title,Some Description
You can do:
<?php
while ($o->getNext())
{
    $dPrice = $o->getPrice();
    $nProductID = $o->getProductID();
    $sBrandName = $o->getBrandName();
}
?>
If you have any questions or comments regarding this class, they can be directed to michael.martinek@gmail.com as I probably won't be checking back here.
<?php
define('C_PPCSV_HEADER_RAW', 0);
define('C_PPCSV_HEADER_NICE', 1);
class PaperPear_CSVParser
{
    private $m_saHeader = array();
    private $m_sFileName = '';
    private $m_fp = false;
    private $m_naHeaderMap = array();
    private $m_saValues = array();

    function __construct($sFileName)
    {
        //quick and dirty opening and processing.. you may wish to clean this up
        if ($this->m_fp = fopen($sFileName, 'r'))
        {
            $this->processHeader();
        }
    }

    function __call($sMethodName, $saArgs)
    {
        //check to see if this is a set() or get() request, and extract the name
        if (preg_match("/[sg]et(.*)/", $sMethodName, $saFound))
        {
            //convert the name portion of the [gs]et to uppercase for header checking
            $sName = strtoupper($saFound[1]);
            //see if the entry exists in our named header-> index mapping
            if (array_key_exists($sName, $this->m_naHeaderMap))
            {
                //it does.. so consult the header map for which index this header controls
                $nIndex = $this->m_naHeaderMap[$sName];
                //square brackets: the {0} string offset syntax was removed in PHP 8
                if ($sMethodName[0] == 'g')
                {
                    //return the value stored in the index associated with this name
                    return $this->m_saValues[$nIndex];
                }
                else
                {
                    //set the value
                    $this->m_saValues[$nIndex] = $saArgs[0];
                    return true;
                }
            }
        }
        //nothing we control so bail out with a false
        return false;
    }

    //get a nicely formatted header name. This will take product_id and make
    //it PRODUCTID in the header map. So now you won't need to worry about whether you need
    //to do a getProductID, or getproductid, or getProductId.. all will work.
    public static function GetNiceHeaderName($sName)
    {
        return strtoupper(preg_replace('/[^A-Za-z0-9]/', '', $sName));
    }

    //process the header entry so we can map our named header fields to a numerical index, which
    //we'll use when we use fgetcsv().
    private function processHeader()
    {
        $sLine = fgets($this->m_fp);
        //you'll want to make this configurable
        //explode() replaces split(), which was removed in PHP 7
        $saFields = explode(",", $sLine);
        $nIndex = 0;
        foreach ($saFields as $sField)
        {
            //get the nice name to use for "get" and "set".
            $sField = trim($sField);
            $sNiceName = PaperPear_CSVParser::GetNiceHeaderName($sField);
            //track correlation of raw -> nice name so we don't have to do on-the-fly nice name checks
            $this->m_saHeader[$nIndex] = array(C_PPCSV_HEADER_RAW => $sField, C_PPCSV_HEADER_NICE => $sNiceName);
            $this->m_naHeaderMap[$sNiceName] = $nIndex;
            $nIndex++;
        }
    }

    //read the next CSV entry
    public function getNext()
    {
        //this is a basic read, you will likely want to change this to accommodate what
        //you are using for CSV parameters (tabs, encapsulation, etc).
        if (($saValues = fgetcsv($this->m_fp)) !== false)
        {
            $this->m_saValues = $saValues;
            return true;
        }
        return false;
    }
}
//quick example of usage
$o = new PaperPear_CSVParser('F:\foo.csv');
while ($o->getNext())
{
    echo "Price=" . $o->getPrice() . "\r\n";
}
?>
Note that fgetcsv, at least in PHP 5.3 or earlier, will NOT work with UTF-16 encoded files. Your options are to convert the entire file to ISO-8859-1 (latin1), or to convert it line by line to ISO-8859-1 and then use str_getcsv (or a backwards-compatible implementation of it). If you need to read non-Latin alphabets, it is probably best to convert to UTF-8.
See str_getcsv for a backwards-compatible version of it for PHP < 5.3, and see utf8_decode for a function written by Rasmus Andersson which provides utf16_decode. The modification I added deals with the fact that the BOM appears at the top of the file but not on subsequent lines, so you need to store the endianness and pass it back in when decoding each subsequent line. This modified version reports the detected endianness through its second, by-reference parameter:
<?php
/**
 * Decode UTF-16 encoded strings.
 *
 * Can handle both BOM'ed data and un-BOM'ed data.
 * Assumes Big-Endian byte order if no BOM is available.
 * From: http://php.net/manual/en/function.utf8-decode.php
 *
 * @param string $str UTF-16 encoded data to decode.
 * @param bool|null $be Byte order (true = big-endian); set from the BOM when one is found.
 * @return string UTF-8 / ISO encoded data.
 * @access public
 * @version 0.1 / 2005-01-19
 * @author Rasmus Andersson {@link http://rasmusandersson.se/}
 * @package Groupies
 */
function utf16_decode($str, &$be=null) {
    if (strlen($str) < 2) {
        return $str;
    }
    // Square brackets: the {0} string offset syntax was removed in PHP 8.
    $c0 = ord($str[0]);
    $c1 = ord($str[1]);
    $start = 0;
    if ($c0 == 0xFE && $c1 == 0xFF) {
        $be = true;
        $start = 2;
    } else if ($c0 == 0xFF && $c1 == 0xFE) {
        $start = 2;
        $be = false;
    }
    if ($be === null) {
        $be = true;
    }
    $len = strlen($str);
    $newstr = '';
    // Stop before a trailing odd byte so $i+1 never reads past the end.
    for ($i = $start; $i + 1 < $len; $i += 2) {
        // Combine two bytes into one 16-bit code unit (high byte shifted by 8 bits).
        if ($be) {
            $val = ord($str[$i]) << 8;
            $val += ord($str[$i+1]);
        } else {
            $val = ord($str[$i+1]) << 8;
            $val += ord($str[$i]);
        }
        // U+2028 (line separator) becomes "\n"; chr() keeps only the low byte,
        // so code points above 0xFF are not preserved.
        $newstr .= ($val == 0x2028) ? "\n" : chr($val);
    }
    return $newstr;
}
?>
Trying the "setlocale" trick did not work for me, e.g.
<?php
setlocale(LC_CTYPE, "en.UTF16");
$line = fgetcsv($file, ...)
?>
But that's perhaps because my platform didn't support it. However, fgetcsv only supports single characters for the delimiter, etc. and complains if you pass in a UTF-16 version of said character, so I gave up on that rather quickly.
Hope this is helpful to someone out there.
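On current PHP versions, one possible alternative to a hand-written decoder is to convert the data to UTF-8 with iconv() and parse the result. This is a sketch under several assumptions: the iconv extension is available, the file fits in memory, and `open_utf16_csv()` is a hypothetical helper. Like the decoder above, it assumes big-endian data when no BOM is present.

```php
<?php
// Hypothetical helper: open a UTF-16 CSV file as a UTF-8 stream for fgetcsv().
function open_utf16_csv(string $path)
{
    $raw = file_get_contents($path);
    if ($raw === false) {
        return false;
    }
    // Pick the byte order from the BOM; assume big-endian when there is none.
    $encoding = 'UTF-16BE';
    $bom = substr($raw, 0, 2);
    if ($bom === "\xFF\xFE") {
        $encoding = 'UTF-16LE';
        $raw = substr($raw, 2);
    } elseif ($bom === "\xFE\xFF") {
        $raw = substr($raw, 2);
    }
    $utf8 = iconv($encoding, 'UTF-8', $raw);
    if ($utf8 === false) {
        return false;
    }
    $stream = fopen('php://temp', 'w+');
    fwrite($stream, $utf8);
    rewind($stream);
    return $stream;
}

// Usage with a throwaway little-endian file that starts with a BOM.
$path = tempnam(sys_get_temp_dir(), 'csv');
file_put_contents($path, "\xFF\xFE" . iconv('UTF-8', 'UTF-16LE', "id,name\n1,Ann\n"));

$rows = [];
$handle = open_utf16_csv($path);
while (($row = fgetcsv($handle, null, ',', '"', '')) !== false) {
    $rows[] = $row;
}
fclose($handle);
unlink($path);

print_r($rows);
```

The locale note from the manual still applies to the converted data, so non-ASCII UTF-8 content may need a matching LC_CTYPE setting.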
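If you don't want to define an enclosure character you can do the following:
<?php
// Pass a NUL byte as a one-character string; the integer 0x00 would be converted to the string "0".
$row = fgetcsv($handle, 0, $delimiter, "\0");
?>
I needed this to detect the enclosure used for csv files.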