Learning PHP!

str_word_count

(PHP 4 >= 4.3.0, PHP 5, PHP 7, PHP 8)

str_word_count - Return information about words used in a string

Description

str_word_count(string $string, int $format = 0, ?string $characters = null): array|int

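Counts the words inside string. If the optional format is not specified, the return value is an integer equal to the number of words found. If format is specified, the return value is an array whose content depends on format. The possible values for format and the corresponding outputs are listed below.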

For the purpose of this function, 'word' is defined as a locale-dependent string made of alphabetic characters, which may also contain, but not start with, "'" and "-" characters. Note that multibyte locales are not supported.
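The word definition above can be checked against a few short strings. This is a minimal sketch that assumes the "C" locale and plain ASCII input. The comments show the array each call returns, which print_r() prints in its usual format.

```php
<?php
// Apostrophes and hyphens inside a word stay part of that word.
print_r(str_word_count("well-known isn't", 1)); // ["well-known", "isn't"]

// An apostrophe at the very start of the string is not part of the word.
print_r(str_word_count("'tis true", 1));        // ["tis", "true"]

// Digits are not alphabetic, so they act as separators.
print_r(str_word_count("r2d2", 1));             // ["r", "d"]
```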

Parameters

string

The string.

format

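Specifies the return value of this function. The currently supported values are:

  • 0 - returns the number of words found.
  • 1 - returns an array containing all the words found inside the string.
  • 2 - returns an associative array, where the key is the position of the word inside the string and the value is the word itself.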
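A short sketch that compares the three format values on one sentence. The sample text is arbitrary. In format 2 the keys are byte offsets into the string.

```php
<?php
$text = "The quick brown fox";

// format 0 (default): the number of words
var_dump(str_word_count($text));   // int(4)

// format 1: a list of the words
print_r(str_word_count($text, 1)); // [0 => "The", 1 => "quick", 2 => "brown", 3 => "fox"]

// format 2: byte offset of each word => word
print_r(str_word_count($text, 2)); // [0 => "The", 4 => "quick", 10 => "brown", 16 => "fox"]
```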

characters

A list of additional characters which will be considered as 'word'.
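As a sketch of how characters changes the result, consider digits, which are not letters by default. The list can name each character individually or use a ".." range. This is the same range syntax that trim() accepts for its characters list, and one of the user notes below also relies on it.

```php
<?php
$text = "abc123 def";

// Without extra characters, the digits split "abc123".
print_r(str_word_count($text, 1));               // ["abc", "def"]

// Digits listed one by one...
print_r(str_word_count($text, 1, '0123456789')); // ["abc123", "def"]

// ...or given as a range.
print_r(str_word_count($text, 1, '0..9'));       // ["abc123", "def"]
```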

Return Values

Returns an array or an integer, depending on the format chosen.

Changelog

Version   Description
8.0.0     characters is nullable now.

Example #1 str_word_count() example

<?php

$str = "Hello fri3nd, you're
        looking          good today!";

print_r(str_word_count($str, 1));
print_r(str_word_count($str, 2));
print_r(str_word_count($str, 1, 'àáãç3'));

echo str_word_count($str);

?>

The above example will output:

Array
(
    [0] => Hello
    [1] => fri
    [2] => nd
    [3] => you're
    [4] => looking
    [5] => good
    [6] => today
)

Array
(
    [0] => Hello
    [6] => fri
    [10] => nd
    [14] => you're
    [29] => looking
    [46] => good
    [51] => today
)

Array
(
    [0] => Hello
    [1] => fri3nd
    [2] => you're
    [3] => looking
    [4] => good
    [5] => today
)

7

See Also

  • explode() - Split a string by a string
  • preg_split() - Split string by a regular expression
  • count_chars() - Return information about characters used in a string
  • substr_count() - Count the number of substring occurrences

User Contributed Notes 30 notes

cito at wikatu dot com
12 years ago
<?php

/***
 * This simple UTF-8 word count function (it only counts)
 * is a bit faster than the one with preg_match_all,
 * but about 10x slower than the built-in str_word_count.
 *
 * If you need the hyphen or other code points as word characters,
 * just put them into the [brackets], like [^\p{L}\p{N}\'\-].
 * If the pattern contains UTF-8, utf8_encode() the pattern,
 * as it is expected to be valid UTF-8 (using the u modifier).
 *
 * Note: a leading or trailing separator leaves an empty element
 * that is counted as well (so "" and "Hello." count 1 and 2).
 **/

// Jonny 5's simple word splitter
function str_word_count_utf8($str) {
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str));
}
?>
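The note above mentions a preg_match_all variant without showing it. Below is a hypothetical sketch along those lines. The function name is made up, and the performance figures quoted above were not measured for this version. Because it matches words rather than splitting on separators, leading or trailing punctuation never produces an empty "word".

```php
<?php
/**
 * Hypothetical helper: counts runs of Unicode letters, numbers and
 * apostrophes in a UTF-8 string.
 */
function str_word_count_utf8_match(string $str): int
{
    // \p{L} = any letter, \p{N} = any number; the apostrophe keeps "you're" whole.
    $n = preg_match_all('~[\p{L}\p{N}\']+~u', $str);

    // preg_match_all() returns false on error, e.g. invalid UTF-8 input.
    return $n === false ? 0 : $n;
}

echo str_word_count_utf8_match("Not wôrking."), PHP_EOL; // 2
```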
splogamurugan at gmail dot com
15 years ago
We can also specify a range of values for the character list (the characters parameter).

<?php
$str = "Hello fri3nd, you're
looking good today!
look1234ing";
print_r(str_word_count($str, 1, '0..3'));
?>

will give the result as

Array ( [0] => Hello [1] => fri3nd [2] => you're [3] => looking [4] => good [5] => today [6] => look123 [7] => ing )
Adeel Khan
17 years ago
<?php

/**
 * Returns the number of words in a string.
 * As far as I have tested, it is very accurate.
 * The string can contain HTML,
 * but you should do something like this first:
 *
 * $search = array(
 *     '@<script[^>]*?>.*?</script>@si',
 *     '@<style[^>]*?>.*?</style>@siU',
 *     '@<![\s\S]*?--[ \t\n\r]*>@'
 * );
 * $html = preg_replace($search, '', $html);
 */
function word_count($html) {
    // Strip all HTML tags.
    $wc = strip_tags($html);

    // Turn every run of characters that are neither word characters nor
    // common punctuation (' " . ! ? ; , \ / : & @ -) into a single space.
    $wc = trim(preg_replace('#[^\w\'".!?;,\\\\/:&@-]+#', ' ', $wc));

    // Remove "words" made up only of punctuation (e.g. a lone "-"),
    // while keeping punctuation inside words such as "you're".
    $wc = trim(preg_replace('#(?<!\S)[\'".!?;,\\\\/:&@-]+(?!\S)#', ' ', $wc));

    // Collapse superfluous whitespace.
    $wc = preg_replace('/\s+/', ' ', $wc);

    // Split the string into words and drop empty elements (a word "0" is kept).
    $words = array_filter(explode(' ', $wc), 'strlen');

    // Return the number of words.
    return count($words);
}

?>
MadCoder
19 years ago
Here's a function that trims a $string down to a certain number of words and adds "..." to the end of it.
(An expansion of muz1's first-100-words code.)

----------------------------------------------
<?php
function trim_text($text, $count) {
    // Collapse runs of whitespace so that explode() yields one entry per word.
    $text = preg_replace('/\s+/', ' ', trim($text));
    $words = explode(' ', $text);

    // Already short enough: return the text unchanged.
    if (count($words) <= $count) {
        return $text;
    }

    // Keep the first $count words and mark the cut with "...".
    return implode(' ', array_slice($words, 0, $count)) . '...';
}
?>

Usage
------------------------------------------------
<?php
$string = "one two three four";
echo trim_text($string, 3);
?>

returns:
one two three...
uri at speedy dot net
12 years ago
Here is a word count function that supports UTF-8 and Hebrew. I tried other functions, but they don't work. Note that in Hebrew, '"' and '\'' can be used inside words, so they are not separators. This function is not perfect. I would prefer something like the function we use in JavaScript, which treats every character except [a-zA-Zא-ת0-9_\'\"] as a separator, but I don't know how to do that in PHP.

I removed some of the separators that don't work well with Hebrew ("\x20", "\xA0", "\x0A", "\x0D", "\x09", "\x0B", "\x2E"). I also removed the underscore.

This is a fix to my previous post on this page: my function returned an incorrect result for an empty string. I corrected it, and I'm also attaching another function, my_strlen.

<?php

function count_words($string) {
    // Return the number of words in a string.
    $string = str_replace("&#039;", "'", $string);
    $t = array(' ', "\t", '=', '+', '-', '*', '/', '\\', ',', '.', ';', ':', '[', ']', '{', '}', '(', ')', '<', '>', '&', '%', '$', '@', '#', '^', '!', '?', '~'); // separators
    $string = str_replace($t, " ", $string);
    $string = trim(preg_replace("/\s+/", " ", $string));
    $num = 0;
    if (my_strlen($string) > 0) {
        $word_array = explode(" ", $string);
        $num = count($word_array);
    }
    return $num;
}

function my_strlen($s) {
    // Return mb_strlen with encoding UTF-8.
    return mb_strlen($s, "UTF-8");
}

?>
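The JavaScript-style rule the note asks for can be expressed in PHP with a UTF-8 (/u) regular expression. The sketch below is hypothetical. It assumes UTF-8 input and gives the Hebrew letters as the code-point range U+05D0 (alef) to U+05EA (tav). Vowel points (niqqud) lie outside that range, so they would act as separators.

```php
<?php
/**
 * Hypothetical helper: everything except ASCII letters, digits, underscore,
 * the Hebrew letters U+05D0..U+05EA, ' and " is treated as a separator.
 */
function count_words_hebrew(string $string): int
{
    $string = str_replace("&#039;", "'", $string);

    // Split on runs of separator characters; PREG_SPLIT_NO_EMPTY drops the
    // empty pieces left by leading/trailing separators or an empty string.
    $words = preg_split('/[^a-zA-Z0-9_\x{05D0}-\x{05EA}\'"]+/u', $string, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on error, e.g. invalid UTF-8 input.
    return $words === false ? 0 : count($words);
}

echo count_words_hebrew('שלום עולם, hello world!'), PHP_EOL; // 4
```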
manrash at gmail dot com
16 years ago
For Spanish speakers, a valid character map may be the following (the uppercase accented letters have to be listed as well):

<?php
$characterMap = 'áéíóúüñÁÉÍÓÚÜÑ';

$count = str_word_count($text, 0, $characterMap);
?>
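Why a character map works at all, and why the uppercase letters matter, can be sketched as follows. The sketch assumes that both the PHP source file and the text are UTF-8 and that the "C" locale is active. str_word_count() looks at single bytes. In UTF-8 an accented letter is two bytes (for example, "ñ" is 0xC3 0xB1), and it stays inside a word only if both bytes appear in the list. A side effect is that the shared lead byte 0xC3 then counts as a word character for every letter that starts with it.

```php
<?php
$lower = 'áéíóúüñ';
$both  = 'áéíóúüñÁÉÍÓÚÜÑ';

// Without a map, accented letters split words: El | ni | o | est | aqu
echo str_word_count('El niño está aquí'), PHP_EOL;            // 5
echo str_word_count('El niño está aquí', 0, $lower), PHP_EOL; // 4

// "Á" is 0xC3 0x81; 0x81 is only in the list when uppercase letters are included.
echo str_word_count('Ángel', 0, $lower), PHP_EOL; // 2
echo str_word_count('Ángel', 0, $both), PHP_EOL;  // 1
```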
brettNOSPAM at olwm dot NO_SPAM dot com
22 years ago
This example may not be pretty, but it has proven accurate:

<?php
//count words
$words_to_count = strip_tags($body);
$pattern = "/[^(\w|\d|\'|\"|\.|\!|\?|;|,|\\|\/|\-\-|:|\&|@)]+/";
$words_to_count = preg_replace ($pattern, " ", $words_to_count);
$words_to_count = trim($words_to_count);
$total_words = count(explode(" ",$words_to_count));
?>

Hope I didn't miss any punctuation. ;-)
brettz9 - see yahoo
14 years ago
Words also cannot end in a hyphen unless allowed by the charlist...
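Judging from the PHP source (ext/standard/string.c), both edge rules seem to apply to the string as a whole. Only the first character of the string is checked for a leading ' or -, and only the last character for a trailing -. A hyphen at the end of a word in the middle of the string therefore appears to be kept. The sketch below illustrates this reading, which is an assumption about the implementation rather than documented behaviour.

```php
<?php
// Only the final "-" of the whole string is dropped...
print_r(str_word_count("well- done-", 1));      // ["well-", "done"]

// ...unless "-" is added to the characters list.
print_r(str_word_count("well- done-", 1, '-')); // ["well-", "done-"]
```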
charliefrancis at gmail dot com
15 years ago
Hi, this is the first time I have posted in the PHP manual. I hope some of you will like this little function I wrote.

It returns a string limited to a certain number of characters while still keeping whole words.
It breaks out of the foreach loop once the remaining text is short enough to display, and the list of extra word characters can be edited.

<?php
function word_limiter($text, $limit = 30, $chars = '0123456789') {
    if (strlen($text) > $limit) {
        // offset => word, walked from the last word backwards
        $words = str_word_count($text, 2, $chars);
        $words = array_reverse($words, true);
        foreach ($words as $length => $word) {
            // $length is the word's byte offset; drop words ending at or past the limit.
            if ($length + strlen($word) >= $limit) {
                array_shift($words);
            } else {
                break;
            }
        }
        $words = array_reverse($words);
        $text = implode(' ', $words) . '&hellip;';
    }
    return $text;
}

$str = "Hello this is a list of words that is too long";
echo '1: ' . word_limiter($str) . "\n";
$str = "Hello this is a list of words";
echo '2: ' . word_limiter($str) . "\n";
?>

1: Hello this is a list of words&hellip;
2: Hello this is a list of words
Anonymous
19 years ago
This function seems to treat digits as separators, i.e. a word consisting only of digits won't be counted.
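A small sketch of the effect, together with one way around it: adding the digits to the characters list (here as the range '0..9') makes number-only words count. The sample sentence is arbitrary.

```php
<?php
$text = "I have 3 cats and 10 dogs";

echo str_word_count($text), PHP_EOL;            // 5: "3" and "10" are skipped
echo str_word_count($text, 0, '0..9'), PHP_EOL; // 7: digits count as word characters
```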
php dot net at salagir dot com
6 years ago
This function doesn't handle accented letters, even when a locale with accents is set.
<?php
echo str_word_count("Is working"); // = 2

setlocale(LC_ALL, 'fr_FR.utf8');
echo str_word_count("Not wôrking"); // expected 2, got 3
?>

Cito's solution counts the empty piece left after trailing punctuation as a word, so it isn't a good workaround:
<?php
function str_word_count_utf8($str) {
    return count(preg_split('~[^\p{L}\p{N}\']+~u', $str));
}
echo str_word_count_utf8("Is wôrking");   // = 2
echo str_word_count_utf8("Not wôrking."); // = 3
?>

My solution:
<?php
function str_word_count_utf8($str) {
    $a = preg_split('/\W+/u', $str, -1, PREG_SPLIT_NO_EMPTY);
    return count($a);
}
echo str_word_count_utf8("Is wôrking");     // = 2
echo str_word_count_utf8("Is wôrking! :)"); // = 2
?>
dmVuY2lAc3RyYWhvdG5pLmNvbQ== (base64)
14 years ago
To count words after converting an MS Word document to plain text with antiword, you can use this function:

<?php
function count_words($text) {
    $text = str_replace(str_split('|'), '', $text);  // remove these chars (you can specify more)
    $text = preg_replace('/-{2,}/', '', $text);      // remove 2 or more dashes in a row
    $text = trim(preg_replace('/\s+/', ' ', $text)); // then collapse spaces, so removed dashes leave no double spaces
    $len = strlen($text);

    if (0 === $len) {
        return 0;
    }

    // Words are separated by single spaces: count the spaces, plus one.
    $words = 1;

    while ($len--) {
        if (' ' === $text[$len]) {
            ++$words;
        }
    }

    return $words;
}
?>

It strips the pipe "|" characters, which antiword uses to format tables in its plain-text output, removes runs of two or more dashes (also used in tables), and then counts the words.

Counting words with explode() and then count() is not a good idea for huge texts, because it uses a lot of memory to store the text a second time as an array. This is why I'm using while () { ... } to walk the string.
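A variant that avoids both the array and the explicit loop is sketched below. Once whitespace has been collapsed to single spaces, the word count is the number of spaces plus one, and substr_count() (listed under See Also) computes that directly. The function name is hypothetical. The sketch assumes the same antiword clean-up as above, with dashes removed before the spaces are collapsed.

```php
<?php
function count_words_substr(string $text): int
{
    $text = str_replace('|', '', $text);             // antiword table borders
    $text = preg_replace('/-{2,}/', '', $text);      // runs of two or more dashes
    $text = trim(preg_replace('/\s+/', ' ', $text)); // collapse whitespace last

    // n words separated by single spaces contain n - 1 spaces.
    return $text === '' ? 0 : substr_count($text, ' ') + 1;
}

echo count_words_substr("| a | b -- c |"), PHP_EOL; // 3
```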
rcATinterfacesDOTfr
21 years ago
Here is another way to count words:
$word_count = count(preg_split('/\W+/', $text, -1, PREG_SPLIT_NO_EMPTY));
jazz090
15 years ago
Personally, I don't like using this function, because the characters it omits are sometimes necessary. For instance, MS Word counts a lone ">" or "<" as a single word, whereas this function doesn't. I prefer the following instead, which counts EVERYTHING separated by whitespace:

<?php
function num_words($string) {
    preg_match_all("/\S+/", $string, $matches);
    return count($matches[0]);
}
?>