When processing strings in PHP, we often need to split a string into an array of pieces. PHP provides two commonly used functions for this task: str_split() and mb_str_split(). They look similar, but one counts in bytes and the other in characters, and that difference matters as soon as the text is not plain ASCII. Below, we compare the two functions and look at which one to use in which scenario.
The str_split() function splits a string into an array of substrings of a specified length. Its basic usage is as follows:
$string = "HelloWorld";
$result = str_split($string, 2);
print_r($result);
Output:
Array
(
[0] => He
[1] => ll
[2] => oW
[3] => or
[4] => ld
)
As shown above, str_split() cuts the string into consecutive pieces of the length you specify. If the length is omitted it defaults to 1, and the last piece may be shorter when the string length is not a multiple of the chunk length.
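As a small illustration of the default length and the shorter final piece, here is a sketch that reuses the same "HelloWorld" string; the variable names are only for this example:

```php
<?php
// Chunk length 3: ten bytes do not divide evenly, so the last piece is shorter.
$chunks = str_split("HelloWorld", 3);
print_r($chunks); // Hel, loW, orl, d

// No length given: the default of 1 yields one element per byte.
$single = str_split("abc");
print_r($single); // a, b, c
```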
However, str_split() counts in bytes, not characters, so it is not safe for multibyte text. In UTF-8, a Chinese character or most other non-ASCII characters occupy more than one byte, and str_split() can cut such a character in the middle, leaving fragments that are not valid UTF-8 on their own.
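To make the byte-based behavior concrete, the following sketch splits a two-character Chinese string with str_split(). It assumes the source file is saved as UTF-8 and that mbstring is available for the validity check:

```php
<?php
// "你好" is two characters, but six bytes in UTF-8 (three bytes each).
// Assumption: this file itself is saved as UTF-8.
$text = "你好";

$bytes = str_split($text);   // default length 1 => one byte per element

echo strlen($text), "\n";    // byte length of the string
echo count($bytes), "\n";    // number of elements, one per byte

// Each element is only a fragment of a character, so it is not valid UTF-8.
var_dump(mb_check_encoding($bytes[0], 'UTF-8'));

// Hex view of the individual bytes.
echo implode(' ', array_map('bin2hex', $bytes)), "\n";
```

Printing any single element of `$bytes` directly would show a replacement character or garbage, which is the "unexpected result" described above.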
Compared with str_split(), mb_str_split() is better suited to multibyte text. It belongs to PHP's multibyte string extension (mbstring), is available from PHP 7.4 onward, and splits UTF-8-encoded strings on character boundaries.
$string = "你好,世界";
$result = mb_str_split($string, 2, "UTF-8");
print_r($result);
Output:
Array
(
[0] => 你好
[1] => ,世
[2] => 界
)
In this example, mb_str_split() divides the string by characters rather than by bytes, so multibyte characters (such as Chinese characters) are never cut apart.
Feature | str_split() | mb_str_split() |
---|---|---|
Unit of splitting | Bytes (safe for single-byte text such as ASCII) | Characters (handles multibyte encodings such as UTF-8) |
Use cases | Strings containing only ASCII characters | Strings containing multibyte characters (such as Chinese, Japanese, etc.) |
Availability | Built into PHP | Requires the mbstring extension to be installed and enabled |
If you only deal with ASCII strings, str_split() is appropriate: it does not depend on any extension and is generally faster.
If your string contains multibyte characters (such as UTF-8-encoded Chinese, Japanese, etc.), you should use mb_str_split(). It splits by character and never breaks a multibyte character into individual bytes.
Install and enable the mbstring extension
mb_str_split() is part of the mbstring extension, so before using it you need to make sure that the extension is installed and enabled in your PHP build. You can check with:
php -m | grep mbstring
If it is not listed, you can install it on Debian or Ubuntu with the following command:
sudo apt-get install php-mbstring
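Where installing the extension is not an option, one possible workaround is to fall back to PCRE. The sketch below is an assumption-laden illustration, not a drop-in replacement: `split_utf8()` is a hypothetical helper name, and it assumes the input is UTF-8.

```php
<?php
// Hypothetical helper (not part of PHP): split a UTF-8 string into chunks of
// $length characters, using mbstring when present and PCRE otherwise.
function split_utf8(string $text, int $length = 1): array
{
    if ($length < 1) {
        throw new InvalidArgumentException('Chunk length must be at least 1.');
    }
    if (function_exists('mb_str_split')) {
        return mb_str_split($text, $length, 'UTF-8');
    }
    // The /u modifier makes PCRE treat the subject as UTF-8, so the empty
    // pattern splits between code points rather than between bytes.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8.');
    }
    // Regroup the single characters into chunks of the requested length.
    return array_map(
        function (array $chunk) {
            return implode('', $chunk);
        },
        array_chunk($chars, $length)
    );
}

print_r(split_utf8("你好,世界", 2));
```

Checking `function_exists('mb_str_split')` rather than only `extension_loaded('mbstring')` also covers PHP versions older than 7.4, where mbstring may be loaded but the function does not exist yet.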
Performance differences
While mb_str_split() handles multibyte characters correctly, it may be slightly slower than str_split() because it has to take the character encoding into account and decode multibyte sequences.
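If the difference matters for your workload, it is easy to measure. The following is a rough timing sketch, assuming PHP 7.4 or later with mbstring enabled; the input size and iteration count are arbitrary choices, and the absolute numbers will vary with PHP version and hardware.

```php
<?php
// Rough timing sketch, not a rigorous benchmark.
$ascii = str_repeat('HelloWorld', 10000); // 100,000 ASCII characters
$iterations = 100;

// Time the byte-based split.
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $a = str_split($ascii, 2);
}
$strSplitMs = (hrtime(true) - $start) / 1e6; // nanoseconds to milliseconds

// Time the character-based split on the same input.
$start = hrtime(true);
for ($i = 0; $i < $iterations; $i++) {
    $b = mb_str_split($ascii, 2, 'UTF-8');
}
$mbSplitMs = (hrtime(true) - $start) / 1e6;

printf("str_split:    %.2f ms\n", $strSplitMs);
printf("mb_str_split: %.2f ms\n", $mbSplitMs);
```

For pure ASCII input both functions return identical arrays, so the comparison isolates the cost of encoding-aware splitting.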
Suppose we have a URL string whose domain we want to replace, and we then want to split the result into fixed-size pieces. In this example, the host is swapped using parse_url() and str_replace(), and mb_str_split() is used so that the string is divided by characters:
// Original string
$url = "https://www.example.com/path/to/resource";
// Replace domain name
$parsed_url = parse_url($url);
$domain = "m66.net"; // New domain name
$new_url = str_replace($parsed_url['host'], $domain, $url);
// Split the new URL into 3-character chunks
$result = mb_str_split($new_url, 3, "UTF-8");
print_r($result);
Output: