str_split and mb_convert_encoding

M66 2025-05-27

In PHP, string functions that count bytes can cause encoding problems when handling Chinese text. str_split is commonly used to split a string into chunks, but because it works on bytes it can produce unexpected results with multibyte characters such as Chinese. To handle Chinese strings correctly, we can use str_split together with mb_convert_encoding, so that the encoding is known and consistent before splitting. This helps avoid garbled or truncated characters.

1. str_split and Chinese characters

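The str_split function splits a string into an array of fixed-length chunks. The length is counted in bytes, not characters, so the function may not work as expected on multibyte text such as Chinese. In UTF-8, a Chinese character usually occupies 3 bytes, while ASCII punctuation occupies 1. For example:

$string = "你好,世界!"; // UTF-8: 14 bytes, 6 characters
$result = str_split($string, 3); // 3 bytes per chunk
print_r($result);

Output (bytes that no longer form a valid character are written here as \xNN; a terminal usually renders them as "?" or a similar replacement character):

Array
(
    [0] => 你
    [1] => 好
    [2] => ,\xE4\xB8
    [3] => \x96\xE7\x95
    [4] => \x8C!
)

As you can see, str_split does not split on character boundaries: it cuts once every 3 bytes. The first two chunks happen to hold whole characters, but the 1-byte comma shifts every later boundary, so 世 and 界 are each split across two chunks.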
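To make the byte boundaries visible, the following minimal sketch prints each chunk as hex and checks whether it is still valid UTF-8. It assumes the source file is saved as UTF-8; the sample string and the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8, so the literal below holds UTF-8 bytes.
$string = "你好,世界!";

$byteLength = strlen($string);             // counts bytes
$charLength = mb_strlen($string, 'UTF-8'); // counts characters

$chunks = str_split($string, 3);           // cuts every 3 bytes

// For each chunk, record its bytes in hex and whether it is still valid UTF-8.
$report = [];
foreach ($chunks as $chunk) {
    $report[] = [
        'hex'   => bin2hex($chunk),
        'valid' => mb_check_encoding($chunk, 'UTF-8'),
    ];
}

echo "bytes: $byteLength, characters: $charLength\n";
foreach ($report as $i => $row) {
    echo $i, ': ', $row['hex'], ' ', $row['valid'] ? 'valid' : 'broken', "\n";
}
```

The hex view avoids relying on how a particular terminal displays incomplete byte sequences.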

2. Use mb_convert_encoding to solve the encoding problem

To handle Chinese characters correctly, we can use mb_convert_encoding to convert the string to a known encoding before splitting it. This keeps the encoding uniform, which matters most in multi-language environments where input may arrive in different encodings, and it avoids garbled output.

For example, to convert a string from GBK to UTF-8, we can use the following code:

$string = "你好,世界!"; // assumed to hold GBK-encoded bytes, e.g. from a legacy file or database
$encodedString = mb_convert_encoding($string, 'UTF-8', 'GBK'); // target encoding first, then source encoding
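The conversion only helps when the source encoding passed to mb_convert_encoding matches the actual bytes. The sketch below is one way to check this under stated assumptions: the source file is saved as UTF-8, and the GBK input is simulated by converting a UTF-8 literal. The variable names are hypothetical.

```php
<?php
// Assumption: this file is saved as UTF-8.
$utf8 = "你好,世界!";

// Simulate input from a legacy GBK source.
$gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

// GBK uses 2 bytes per Chinese character and 1 byte per ASCII character.
$gbkLength = strlen($gbk);

// The GBK bytes are not valid UTF-8, so they need converting before use.
$needsConversion = !mb_check_encoding($gbk, 'UTF-8');

// Convert back: target encoding first, then source encoding.
$roundTrip = $needsConversion
    ? mb_convert_encoding($gbk, 'UTF-8', 'GBK')
    : $gbk;

var_dump($gbkLength, $needsConversion, $roundTrip === $utf8);
```

Checking with mb_check_encoding first is a simple guard against converting a string that is already UTF-8, which would garble it.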

3. Combining str_split and mb_convert_encoding

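Using the two functions together lets us normalize the encoding first and then split. Keep in mind that str_split still counts bytes: once the string is UTF-8, a 3-byte split falls on character boundaries only where every character is 3 bytes wide, so 1-byte ASCII characters such as "," and "!" shift the boundaries. Here is the complete example:

$string = "你好,世界!"; // assumed to hold GBK-encoded bytes
$encodedString = mb_convert_encoding($string, 'UTF-8', 'GBK'); // Convert encoding to UTF-8
$result = str_split($encodedString, 3); // Split every 3 bytes (one Chinese character in UTF-8)
print_r($result);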
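For strings that mix Chinese and ASCII characters, one way to keep the combination safe is to convert to a fixed-width encoding before splitting. The sketch below is an illustration rather than the article's own method: it converts to UTF-32BE, where every character is exactly 4 bytes, splits with str_split, and converts each chunk to UTF-8. The helper name split_by_chars is hypothetical, the file is assumed to be saved as UTF-8, and PHP 7.4 or later is assumed for the arrow function and mb_str_split.

```php
<?php
/**
 * Hypothetical helper: split $input into chunks of $charsPerChunk characters,
 * returning UTF-8 strings. $fromEncoding names the encoding of $input.
 */
function split_by_chars(string $input, string $fromEncoding, int $charsPerChunk = 1): array
{
    // UTF-32BE is fixed width: every character is exactly 4 bytes.
    $fixedWidth = mb_convert_encoding($input, 'UTF-32BE', $fromEncoding);

    // A byte-based split is now also a character-based split.
    $chunks = str_split($fixedWidth, 4 * $charsPerChunk);

    // Convert each chunk to UTF-8 for further use.
    return array_map(
        fn(string $chunk): string => mb_convert_encoding($chunk, 'UTF-8', 'UTF-32BE'),
        $chunks
    );
}

// Assumption: this file is saved as UTF-8; the GBK input is simulated.
$utf8 = "你好,世界!";
$gbk = mb_convert_encoding($utf8, 'GBK', 'UTF-8');

$result = split_by_chars($gbk, 'GBK', 3);

// Built-in alternative (PHP 7.4+) when the string is already UTF-8.
$builtin = mb_str_split($utf8, 3, 'UTF-8');

print_r($result);
print_r($builtin);
```

When the input is already in a known encoding, mb_str_split does the same job in a single call; the fixed-width route is mainly useful where only str_split and mb_convert_encoding are available.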

4. Process Chinese characters in URLs

In actual development, we often need to handle Chinese characters in URLs. Percent-encoding works on bytes, so the same characters produce different URLs in different encodings. To avoid encoding errors, use mb_convert_encoding to convert the Chinese part of the URL to the encoding the receiving server expects. For example, converting it to UTF-8 ensures that it does not appear garbled in the request.
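As a rough illustration of why the encoding matters here, the sketch below percent-encodes the same keyword as UTF-8 and as GBK. It assumes the source file is saved as UTF-8; the keyword, the example.com URL, and the variable names are illustrative only.

```php
<?php
// Assumption: this file is saved as UTF-8.
$keyword = "你好";

// urlencode works on bytes, so the result depends on the string's encoding.
$utf8Query = urlencode($keyword);
$gbkQuery  = urlencode(mb_convert_encoding($keyword, 'GBK', 'UTF-8'));

// Build the URL with the encoding the receiving server expects (UTF-8 here).
$url = "http://example.com/search?q=" . $utf8Query;

echo $utf8Query, "\n"; // %E4%BD%A0%E5%A5%BD
echo $gbkQuery, "\n";  // %C4%E3%BA%C3
echo $url, "\n";
```

In this sketch the conversion happens before urlencode, because after percent-encoding the string contains only ASCII and a later encoding conversion has nothing left to change.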

Suppose we have a URL with Chinese characters:

$url = "http://example.com/search?q=你好";

To handle it correctly, we can encode the URL using urlencode and then convert it: