In PHP, the standard strlen function counts bytes, not characters, so it gives misleading results for multibyte text. In UTF-8, for example, most Chinese characters take three bytes each, so strlen reports a length about three times the actual number of characters. PHP's mbstring extension provides mb_strlen, which counts characters according to a given encoding.
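A minimal sketch of the difference, assuming the source file is saved as UTF-8 and the mbstring extension is enabled: strlen reports the byte count, while mb_strlen reports the character count for the same string.

```php
<?php
// Assumes this file is saved as UTF-8.
$str = "你好,世界!";

// strlen counts bytes: each of the four Chinese characters is 3 bytes in UTF-8,
// plus 1 byte each for the ASCII comma and exclamation mark.
$bytes = strlen($str);

// mb_strlen counts characters in the named encoding.
$chars = mb_strlen($str, "UTF-8");

echo "strlen: $bytes\n";    // strlen: 14
echo "mb_strlen: $chars\n"; // mb_strlen: 6
```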
Before using mb_strlen, make sure the PHP multibyte string (mbstring) extension is enabled. Open your php.ini configuration file and find the following line:
;extension=mbstring
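If the line begins with a semicolon (;), the extension is commented out. Remove the semicolon, save the file, and restart your web server or PHP service so the change takes effect. On some Linux distributions, mbstring is installed as a separate package and enabled through its own configuration file instead.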
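To confirm the change took effect, you can run `php -m` on the command line and look for mbstring in the list, or check from a script as in this small sketch. Keep in mind that command-line PHP and the web server may load different php.ini files; `php --ini` shows which one the CLI uses.

```php
<?php
// Report whether the mbstring extension is loaded in the current PHP runtime.
// The CLI and the web server may read different php.ini files,
// so run this check in the same environment that serves your code.
if (extension_loaded('mbstring')) {
    echo "mbstring is enabled\n";
} else {
    echo "mbstring is not enabled; check the extension=mbstring line in php.ini\n";
}
```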
<?php
$str = "你好,世界!";
// Passing "UTF-8" makes mb_strlen count characters instead of bytes
$length = mb_strlen($str, "UTF-8");
echo "The length of string $str is: $length";
?>
Here $str contains four Chinese characters plus an ASCII comma and an exclamation mark. Because mb_strlen is told the string is UTF-8, it counts characters rather than bytes and returns 6. The output is:
The length of string 你好,世界! is: 6
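The encoding argument matters because mb_strlen interprets the bytes according to the encoding you name. This sketch, assuming a UTF-8 source file, compares a matching encoding, a mismatched single-byte encoding, and the fallback used when the argument is omitted. The fallback is pinned with mb_internal_encoding here so the result does not depend on php.ini settings.

```php
<?php
// Assumes this file is saved as UTF-8.
$str = "你好,世界!";

// Correct: the declared encoding matches the actual bytes.
$utf8 = mb_strlen($str, "UTF-8");        // 6

// Mismatched: ISO-8859-1 is single-byte, so every byte counts as one character.
$latin1 = mb_strlen($str, "ISO-8859-1"); // 14

// Omitted: mb_strlen falls back to the internal encoding, set explicitly here.
mb_internal_encoding("UTF-8");
$default = mb_strlen($str);              // 6

echo "$utf8 $latin1 $default\n";
```

Passing the encoding explicitly, as the article does, avoids surprises when a server is configured with a different internal encoding.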
Sometimes you need to check whether a string is empty. mb_strlen counts whitespace like any other character, so a string containing only spaces has a non-zero length. To treat such strings as empty, strip the whitespace with trim first and then check the length:
<?php
$str = " ";
// trim() strips leading and trailing whitespace before the length check
$trimmedStr = trim($str);
if (mb_strlen($trimmedStr, "UTF-8") > 0) {
    echo "String is not empty";
} else {
    echo "String is empty";
}
?>
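Because trim removes the spaces, the trimmed string has length 0 and the code prints "String is empty". Note that by default trim only strips ASCII whitespace (space, tab, newline, carriage return, NUL byte and vertical tab), so a full-width space (U+3000), which is common in Chinese text, is not removed.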
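If your input may contain full-width spaces or other Unicode whitespace, one option is to strip it with a Unicode-aware regular expression before calling mb_strlen. The function name isBlank is hypothetical, and the pattern is one reasonable choice rather than the only correct one; it assumes the input is valid UTF-8.

```php
<?php
/**
 * Hypothetical helper: returns true when $str contains only whitespace (or nothing).
 * Unlike trim(), which strips only ASCII whitespace by default,
 * this also strips Unicode space separators such as the full-width space U+3000.
 */
function isBlank(string $str): bool
{
    // \s covers ASCII whitespace; \p{Z} covers Unicode separator characters.
    // The u modifier makes PCRE treat the pattern and subject as UTF-8.
    $trimmed = preg_replace('/^[\s\p{Z}]+|[\s\p{Z}]+$/u', '', $str);
    if ($trimmed === null) {
        // preg_replace returns null on failure, e.g. for invalid UTF-8 input
        throw new InvalidArgumentException("Input is not valid UTF-8");
    }
    return mb_strlen($trimmed, "UTF-8") === 0;
}

var_dump(isBlank(" "));            // bool(true)
var_dump(isBlank("\u{3000}"));     // bool(true): full-width space
var_dump(isBlank(" 你好 "));        // bool(false)
var_dump(trim("\u{3000}") === ""); // bool(false): trim() leaves U+3000 in place
```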
PHP's mb_strlen function gives accurate character counts for multibyte strings, avoiding the byte-counting errors that strlen produces in multilingual text. Combined with trim, it also provides a simple check for empty strings. These techniques are useful whenever you handle Chinese or other multibyte text in PHP.