str_word_count
(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)
str_word_count - Return information about words used in a string
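Description
Counts the number of words in string. If the optional format is not specified, the return value is an integer giving the number of words found. If format is specified, the return value is an array whose content depends on format. The possible values for format and the resulting outputs are listed below.
For the purposes of this function, a 'word' is defined as a locale-dependent string of alphabetic characters that may also contain, but not start with, the "'" and "-" characters. Note that multibyte locales are not supported.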
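As a rough illustration of the definition above (a minimal sketch; the sample strings are made up for this example and the default "C" locale is assumed), an apostrophe or hyphen inside a word keeps it in one piece, while a digit splits it:

```php
<?php
// Apostrophes and hyphens inside a word do not split it.
$words = str_word_count("Don't over-think it", 1);
print_r($words); // three words: Don't, over-think, it

// Digits are not alphabetic, so "fri3nd" is split into "fri" and "nd".
$pieces = str_word_count("fri3nd", 1);
print_r($pieces);
```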
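Parameters
string
- The string.
format
- Specifies the return value of this function. The currently supported values are:
  - 0 - returns the number of words found.
  - 1 - returns an array containing all the words found inside string.
  - 2 - returns an associative array, where the key is the position of the word inside string and the value is the word itself.
characters
- A list of additional characters which will be considered part of a 'word'.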
Return Values
Returns an array or an integer, depending on the format chosen.
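The three return shapes can be compared side by side. This is only a sketch; the sample sentence and variable names are made up for the illustration:

```php
<?php
$text = "The quick fox";

$count   = str_word_count($text);    // format 0 (default): an integer
$list    = str_word_count($text, 1); // format 1: a list of words
$offsets = str_word_count($text, 2); // format 2: starting offset => word

var_dump($count);  // int(3)
print_r($list);
print_r($offsets); // keys are 0, 4 and 10
```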
Changelog
Version | Description |
---|---|
8.0.0 | characters is nullable now. |
Examples
Example #1 A str_word_count() example
<?php
$str = "Hello fri3nd, you're
        looking          good today!";
print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));
echo str_word_count($str);
?>
The above example will output:
Array
(
    [0] => Hello
    [1] => fri
    [2] => nd
    [3] => you're
    [4] => looking
    [5] => good
    [6] => today
)
Array
(
    [0] => Hello
    [6] => fri
    [10] => nd
    [14] => you're
    [29] => looking
    [46] => good
    [51] => today
)
Array
(
    [0] => Hello
    [1] => fri3nd
    [2] => you're
    [3] => looking
    [4] => good
    [5] => today
)
7
See Also
- explode() - Split a string by a string
- preg_split() - Split string by a regular expression
- count_chars() - Return information about characters used in a string
- substr_count() - Count the number of substring occurrences
User Contributed Notes 30 notes
cito at wikatu dot com
12 years ago
<?php
/***
* This simple utf-8 word count function (it only counts)
* is a bit faster than the one with preg_match_all
* and about 10x slower than the built-in str_word_count
*
* If you need the hyphen or other code points as word-characters
* just put them into the [brackets] like [^\p{L}\p{N}\'\-]
* If the pattern contains utf-8, utf8_encode() the pattern,
* as it is expected to be valid utf-8 (using the u modifier).
**/
// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
?>
splogamurugan at gmail dot com
15 years ago
We can also specify a range of values for charlist.
<?php
$str = "Hello fri3nd, you're
looking good today!
look1234ing";
print_r(str_word_count($str, 1, '0..3'));
?>
will give the result as
Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )
Adeel Khan
17 years ago
<?php
/**
* Returns the number of words in a string.
* As far as I have tested, it is very accurate.
* The string can have HTML in it,
* but you should do something like this first:
*
* $search = array(
* '@<script[^>]*?>.*?</script>@si',
* '@<style[^>]*?>.*?</style>@siU',
* '@<![\s\S]*?--[ \t\n\r]*>@'
* );
* $html = preg_replace($search, '', $html);
*
*/
function word_count($html) {
# strip all html tags
$wc = strip_tags($html);
# remove 'words' that don't consist of alphanumerical characters or punctuation
$pattern = "#[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]+#";
$wc = trim(preg_replace($pattern, " ", $wc));
# remove one-letter 'words' that consist only of punctuation
$wc = trim(preg_replace("#\s*[(\'|\"|\.|\!|\?|;|,|\\|\/|\-|:|\&|@)]\s*#", " ", $wc));
# remove superfluous whitespace
$wc = preg_replace("/\s\s+/", " ", $wc);
# split string into an array of words
$wc = explode(" ", $wc);
# remove empty elements
$wc = array_filter($wc);
# return the number of words
return count($wc);
}
?>
MadCoder
19 years ago
Here's a function that will trim a $string down to a certain number of words and add "..." to the end of it.
(An expansion of muz1's first-100-words code.)
----------------------------------------------
<?php
function trim_text($text, $count) {
    // Collapse runs of whitespace so explode() yields no empty "words".
    $text = trim(preg_replace('/\s+/', ' ', $text));
    $words = explode(' ', $text);
    // Nothing to trim if the text is already short enough.
    if (count($words) <= $count) {
        return $text;
    }
    // Keep the first $count words and mark the cut with "...".
    return implode(' ', array_slice($words, 0, $count)) . '...';
}
?>
Usage
------------------------------------------------
<?php
$string = "one two three four";
echo trim_text($string, 3);
?>
returns:
one two three...
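An alternative sketch of the same idea, assuming it is acceptable to let str_word_count() decide what a word is: format 2 returns each word's offset, so the original string can be cut right after the Nth word without collapsing its whitespace. The name first_words() is made up for this example.

```php
<?php
// Hypothetical helper: keep the first $count words of $text, then add "...".
function first_words(string $text, int $count): string
{
    if ($count <= 0) {
        return '';
    }
    // Format 2 maps each word's starting offset to the word itself.
    $words = str_word_count($text, 2);
    if (count($words) <= $count) {
        return $text; // already short enough, nothing to trim
    }
    $offsets = array_keys($words);
    $lastStart = $offsets[$count - 1];
    // Cut just after the last word we keep.
    $end = $lastStart + strlen($words[$lastStart]);
    return substr($text, 0, $end) . '...';
}

echo first_words("one two three four", 3), "\n"; // one two three...
```

Because the word boundaries come from str_word_count(), tokens made only of digits are not counted as words here unless the function is extended to pass a characters argument.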
uri at speedy dot net
12 years ago
Here is a count words function which supports UTF-8 and Hebrew. I tried other functions but they don't work. Notice that in Hebrew, '"' and '\'' can be used in words, so they are not separators. This function is not perfect, I would prefer a function we are using in JavaScript which considers all characters except [a-zA-Zא-ת0-9_\'\"] as separators, but I don't know how to do it in PHP.
I removed some of the separators which don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underline.
This is a fix to my previous post on this page - I found out that my function returned an incorrect result for an empty string. I corrected it and I'm also attaching another function - my_strlen.
<?php
function count_words($string) {
// Return the number of words in a string.
$string= str_replace("&#039;", "'", $string); // decode the HTML entity for an apostrophe
$t= array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
$string= str_replace($t, " ", $string);
$string= trim(preg_replace("/\s+/", " ", $string));
$num= 0;
if (my_strlen($string)>0) {
$word_array= explode(" ", $string);
$num= count($word_array);
}
return $num;
}
function my_strlen($s) {
// Return mb_strlen with encoding UTF-8.
return mb_strlen($s, "UTF-8");
}
?>
manrash at gmail dot com
16 years ago
For Spanish speakers, a valid character map may be:
<?php
$characterMap = 'áéíóúüñ';
$count = str_word_count($text, 0, $characterMap);
?>
brettNOSPAM at olwm dot NO_SPAM dot com
22 years ago
This example may not be pretty, but it proves accurate:
<?php
//count words
$words_to_count = strip_tags($body);
$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";
$words_to_count = preg_replace ($pattern, " ", $words_to_count);
$words_to_count = trim($words_to_count);
$total_words = count(explode(" ",$words_to_count));
?>
Hope I didn't miss any punctuation. ;-)
brettz9 - see yahoo
14 years ago
Words also cannot end in a hyphen unless allowed by the charlist...
charliefrancis at gmail dot com
15 years ago
Hi, this is the first time I have posted on the PHP manual; I hope some of you will like this little function I wrote.
It returns a string within a certain character limit while still retaining whole words.
It breaks out of the foreach loop once it has found a string short enough to display, and the character list can be edited.
<?php
function word_limiter( $text, $limit = 30, $chars = '0123456789' ) {
if( strlen( $text ) > $limit ) {
$words = str_word_count( $text, 2, $chars );
$words = array_reverse( $words, TRUE );
foreach( $words as $length => $word ) {
if( $length + strlen( $word ) >= $limit ) {
array_shift( $words );
} else {
break;
}
}
$words = array_reverse( $words );
$text = implode( " ", $words ) . '…';
}
return $text;
}
$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter( $str );
$str = "Hello this is a list of words";
echo '2: ' . word_limiter( $str );
?>
1: Hello this is a list of words…
2: Hello this is a list of words
Anonymous
19 years ago
This function seems to view numbers as whitespace. I.e. a word consisting of numbers only won't be counted.
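A small sketch of that behaviour and of one way around it, assuming the '0..9' range syntax described in an earlier note: passing the digits as the third argument makes them count as word characters. The sample string is made up.

```php
<?php
$text = "PHP 8 and PHP 7";

// By default digits are skipped, so "8" and "7" are not words.
$default = str_word_count($text);

// Treat the digits 0 through 9 as word characters.
$withDigits = str_word_count($text, 0, '0..9');

echo $default, "\n";    // 3
echo $withDigits, "\n"; // 5
```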
php dot net at salagir dot com
6 years ago
This function doesn't handle accents, even in a locale that uses accented characters.
<?php
echo str_word_count("Is working"); // =2
setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expects 2, got 3.
?>
Cito's solution treats punctuation as words and thus isn't a good workaround.
<?php
function str_word_count_utf8($str) {
return count(preg_split('~[^\p{L}\p{N}\']+~u',$str));
}
echo str_word_count_utf8("Is wôrking"); //=2
echo str_word_count_utf8("Not wôrking."); //=3
?>
My solution:
<?php
function str_word_count_utf8($str) {
$a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
return count($a);
}
echo str_word_count_utf8("Is wôrking"); // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
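Another possible variant, sketched under the assumption that the input is valid UTF-8: match words directly instead of splitting on non-words, so that apostrophes and hyphens inside a word are kept the way str_word_count() keeps them. The function name count_words_unicode() is hypothetical.

```php
<?php
// Hypothetical helper: count words in a UTF-8 string.
// A word starts with a letter and may continue with letters, "'" or "-".
function count_words_unicode(string $text): int
{
    $n = preg_match_all('/\p{L}[\p{L}\'\-]*/u', $text);
    // preg_match_all() returns false on error (for example invalid UTF-8).
    return $n === false ? 0 : $n;
}

echo count_words_unicode("Not wôrking."), "\n";      // 2
echo count_words_unicode("Is wôrking! :)"), "\n";    // 2
echo count_words_unicode("you're well-known"), "\n"; // 2
```

Like the built-in function, this sketch does not treat digits as word characters; adding \p{N} inside the brackets would change that.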
dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64)
14 years ago
To count words after converting an MS Word document to plain text with antiword, you can use this function:
<?php
function count_words($text) {
$text = str_replace(str_split('|'), '', $text); // remove these chars (you can specify more)
$text = trim(preg_replace('/\s+/', ' ', $text)); // remove extra spaces
$text = preg_replace('/-{2,}/', '', $text); // remove 2 or more dashes in a row
$len = strlen($text);
if (0 === $len) {
return 0;
}
$words = 1;
while ($len--) {
if (' ' === $text[$len]) {
++$words;
}
}
return $words;
}
?>
It strips the pipe "|" characters, which antiword uses to format tables in its plain-text output, removes runs of two or more dashes (also used in tables), then counts the words.
Counting words with explode() and then count() is not a good idea for huge texts, because it uses a lot of memory to store the text a second time as an array. This is why I use while() { .. } to walk the string.
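One more way to avoid building an array of words in user code, offered as a sketch rather than a measured improvement: preg_match_all() returns the number of matches, and its third argument is optional, so the matches need not be collected. Whether this really uses less memory than explode() would have to be measured on the PHP version in use. The function name count_tokens() is made up.

```php
<?php
// Hypothetical helper: count whitespace-separated tokens without
// asking for the matches themselves.
function count_tokens(string $text): int
{
    $n = preg_match_all('/\S+/', $text);
    // preg_match_all() returns false on error.
    return $n === false ? 0 : $n;
}

echo count_tokens("one  two\nthree"), "\n"; // 3
echo count_tokens(""), "\n";                // 0
```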
rcATinterfacesDOTfr
21 years ago
Here is another way to count words:
$word_count = count(preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY));
jazz090
15 years ago
Personally, I don't like using this function because the characters it omits are sometimes necessary; for instance, MS Word counts ">" or "<" alone as a single word, whereas this function doesn't. I prefer the following, which counts EVERYTHING:
<?php
function num_words($string){
preg_match_all("/\S+/", $string, $matches);
return count($matches[0]);
}
?>