When processing strings in PHP, str_split is commonly used to split a string into single characters or into chunks of a fixed length. With Chinese text, however, the result is often garbled, because str_split splits by bytes rather than by characters. In UTF-8 a Chinese character typically occupies three bytes, so str_split cuts each character apart and the pieces no longer display correctly.
str_split works on byte lengths, and the default chunk length is 1. If the input contains multibyte characters (such as Chinese characters), each byte is treated as a separate character. A single Chinese character is therefore broken into several one-byte pieces, none of which is a valid character on its own.
Suppose we have a Chinese string "Hello, PHP!" and split it using str_split function:
<?php
$str = "Hello,PHP!";
$result = str_split($str);
print_r($result);
?>
The output may be:
Array
(
[0] => you
[1] => good
[2] => ,
[3] => P
[4] => H
[5] => P
[6] => !
)
From the output results, we can see that the Chinese characters "you" and "good" are respectively split into separate characters, rather than a whole. This will lead to garbled code.
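To confirm that the pieces really are single bytes, one option is to print each piece as hexadecimal. The following is a minimal sketch, assuming the source file is saved as UTF-8 and using the example string "你好,PHP!"; the variable names $pieces and $hex are illustrative.

```php
<?php
// Assumes this file is saved as UTF-8, so each Chinese character is 3 bytes.
$str = "你好,PHP!";

// str_split() with the default length of 1 returns one element per byte.
$pieces = str_split($str);

// bin2hex() shows the raw byte value of each piece.
$hex = array_map('bin2hex', $pieces);

echo count($pieces), "\n";      // 11 (6 bytes for 你好 + 5 ASCII characters)
echo implode(' ', $hex), "\n";  // e4 bd a0 e5 a5 bd 2c 50 48 50 21

// Compare the byte length with the character length.
echo strlen($str), "\n";              // 11
echo mb_strlen($str, 'UTF-8'), "\n";  // 7
```

The first three hex values (e4 bd a0) are the UTF-8 encoding of 你 and the next three (e5 a5 bd) are 好. Printed one at a time, none of them forms a complete character.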
To avoid this, we can use the mb_str_split function, a multibyte-safe counterpart of str_split that keeps each Chinese character whole.
mb_str_split is part of the mbstring extension and is available from PHP 7.4. It splits the string by characters instead of bytes, so multibyte characters are not broken apart.
<?php
$str = "你好,PHP!";
// Splits by character, not by byte
$result = mb_str_split($str);
print_r($result);
?>
The output is:
Array
(
    [0] => 你
    [1] => 好
    [2] => ,
    [3] => P
    [4] => H
    [5] => P
    [6] => !
)
As you can see, the Chinese characters "你" and "好" are each kept whole rather than split into separate bytes.
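On PHP versions older than 7.4, where mb_str_split is not available, a common workaround is preg_split with the u (UTF-8) modifier. The helper below is a sketch rather than a built-in: the name split_chars is hypothetical, and it assumes the input is valid UTF-8.

```php
<?php
// Hypothetical helper: split a UTF-8 string into chunks of $length characters.
function split_chars(string $str, int $length = 1): array
{
    // Prefer the built-in function when it exists (PHP 7.4+ with mbstring).
    if (function_exists('mb_str_split')) {
        return mb_str_split($str, $length, 'UTF-8');
    }

    // Fallback: an empty pattern with the "u" modifier splits between characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Regroup single characters into chunks of the requested length.
    return array_map(
        function (array $chunk) {
            return implode('', $chunk);
        },
        array_chunk($chars, $length)
    );
}

// Chunks of two characters: "你好", ",P", "HP", "!"
$chunks = split_chars("你好,PHP!", 2);
print_r($chunks);
```

Passing a length greater than 1 covers the fixed-length use of str_split while still counting characters instead of bytes.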
If your code works with URLs (for API requests, for example) and a URL contains Chinese characters, the same caution applies: passing such a URL to str_split cuts the characters into bytes. In addition, Chinese characters in a URL should be percent-encoded with urlencode or rawurlencode before the URL is used, so that they are transmitted without being garbled.
For example:
<?php
$url = "https://m66.net/search?query=中文字符";
$encoded_url = urlencode($url);
echo $encoded_url;
?>
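The output is:
https%3A%2F%2Fm66.net%2Fsearch%3Fquery%3D%E4%B8%AD%E6%96%87%E5%AD%97%E7%AC%A6
The Chinese characters are now percent-encoded as their UTF-8 bytes. Note that applying urlencode to the whole URL also encodes the reserved characters ":", "/", "?" and "=", so the result is suitable as a value inside another URL but is no longer a usable URL by itself.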
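Because urlencode on a complete URL also encodes the scheme and path separators, a more typical pattern is to encode only the parameter value. The following sketch assumes the same m66.net search endpoint and a query value of 中文字符; the variable names are illustrative.

```php
<?php
$base  = "https://m66.net/search";
$query = "中文字符";

// Encode only the value, so ":", "/", "?" and "=" keep their URL meaning.
$url = $base . "?query=" . rawurlencode($query);
echo $url, "\n";
// https://m66.net/search?query=%E4%B8%AD%E6%96%87%E5%AD%97%E7%AC%A6

// http_build_query() encodes every value in an array of parameters.
$url2 = $base . "?" . http_build_query(["query" => $query]);
echo $url2, "\n";
```

For this value both approaches produce the same URL. urlencode and rawurlencode differ mainly in how they treat spaces: the former emits "+", the latter "%20".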
str_split garbles Chinese text because it splits strings by bytes, and each Chinese character takes up multiple bytes.
To avoid this, use mb_str_split, which splits by characters and keeps each Chinese character whole.
If you need to handle URLs that contain Chinese characters, percent-encode them with urlencode or rawurlencode.
Hopefully these methods help you avoid garbled output from str_split and process strings containing Chinese characters correctly.