
How does str_split() handle non-ASCII characters?

M66 2025-05-28

In PHP, str_split() is a commonly used function that splits a string into an array of chunks of a specified length. However, str_split() measures that length in bytes, not characters. In UTF-8, an ASCII character occupies exactly one byte, while non-ASCII characters (such as Chinese, Japanese, or Korean) occupy several bytes each. Because of this difference, str_split() can cut a multibyte character in the middle and produce invalid fragments.

1. Basic usage of str_split()

The syntax of the str_split() function is as follows:

 array str_split ( string $string [, int $length = 1 ] )
  • $string : The input string to be split.

  • $length : The maximum length of each chunk, measured in bytes. The default is 1.

A simple example:

 $string = "hello";
$result = str_split($string, 2);
print_r($result);

Output:

 Array
(
    [0] => he
    [1] => ll
    [2] => o
)
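As a small sketch of the behavior described above, the snippet below calls str_split() with and without the second argument. The variable names are only illustrative; the point is that the number of chunks follows from the byte length of the input, and that the last chunk may be shorter than the rest.

```php
<?php
$string = "hello";

$single = str_split($string);     // $length defaults to 1
$pairs  = str_split($string, 2);  // the last chunk may be shorter

// The number of chunks is ceil(byte length / chunk length).
$expected = (int) ceil(strlen($string) / 2);

echo count($single), "\n";        // 5
echo count($pairs), "\n";         // 3
echo implode(',', $pairs), "\n";  // he,ll,o
```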

2. Challenges when dealing with non-ASCII characters

When we deal with multibyte characters (such as Chinese characters), str_split() no longer behaves as intended. Suppose we have a UTF-8 string containing four Chinese characters:

 $string = "你好世界";
$result = str_split($string, 2);
// bin2hex() makes the raw bytes of each chunk visible
print_r(array_map('bin2hex', $result));

Output:

 Array
(
    [0] => e4bd
    [1] => a0e5
    [2] => a5bd
    [3] => e4b8
    [4] => 96e7
    [5] => 958c
)

PHP strings are sequences of bytes, and str_split() counts bytes, not characters. In UTF-8 each of these Chinese characters takes three bytes, so a chunk length of 2 cuts through the middle of every character: the 12-byte string yields six two-byte fragments, none of which is valid UTF-8 on its own. Printed directly, these fragments appear as garbled text.
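The gap between bytes and characters can be checked directly. The following is a minimal sketch, assuming the source file is saved as UTF-8 and the mbstring extension is loaded; it compares strlen() with mb_strlen() for a string of four Chinese characters and counts how many of the chunks produced by str_split() are still valid UTF-8.

```php
<?php
// Assumption: this file is saved as UTF-8 and mbstring is available.
$string = "你好世界";

echo strlen($string), "\n";              // 12 (bytes)
echo mb_strlen($string, 'UTF-8'), "\n";  // 4 (characters)

// Split by bytes, then keep only the chunks that are valid UTF-8.
$chunks = str_split($string, 2);
$valid  = array_filter($chunks, function ($chunk) {
    return mb_check_encoding($chunk, 'UTF-8');
});

echo count($chunks), " chunks, ", count($valid), " valid\n";  // 6 chunks, 0 valid
```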

3. Use mb_str_split() to solve the problem

To handle multibyte characters properly, PHP provides mb_str_split(), available since PHP 7.4 as part of the mbstring (multibyte string) extension. It splits a string by characters rather than by bytes. Its syntax is similar to that of str_split():

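 array mb_str_split ( string $string [, int $length = 1 [, string $encoding = null ]] )

Here $length is counted in characters rather than bytes, and $encoding falls back to the internal encoding when omitted. For example, with a string of four Chinese characters:

 $string = "你好世界";
$result = mb_str_split($string, 2);
print_r($result);

Output:

 Array
(
    [0] => 你好
    [1] => 世界
)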

With mb_str_split(), every chunk consists of whole characters, so no character is cut in the middle.
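On PHP versions that lack mb_str_split(), a similar result can be approximated with PCRE. The sketch below is one possible fallback, not part of PHP itself: the function name utf8_str_split_fallback is hypothetical, and the code assumes the input is UTF-8. It relies on the /u modifier of preg_split(), which makes the empty pattern match between code points instead of between bytes.

```php
<?php
/**
 * Hypothetical helper (not a PHP built-in): splits a UTF-8 string into
 * chunks of $length characters without using mb_str_split().
 */
function utf8_str_split_fallback(string $string, int $length = 1): array
{
    if ($length < 1) {
        throw new InvalidArgumentException('$length must be at least 1');
    }

    // With /u the subject is treated as UTF-8, so the empty pattern
    // splits between code points rather than between bytes.
    $chars = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    // Group the single characters into chunks and join each chunk.
    return array_map(function (array $chunk): string {
        return implode('', $chunk);
    }, array_chunk($chars, $length));
}

// Prefer the built-in function when it exists.
$split = function_exists('mb_str_split') ? 'mb_str_split' : 'utf8_str_split_fallback';
print_r($split("你好世界", 2));
```

Like mb_str_split(), this splits by code points, so each chunk here holds up to two whole characters.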

4. URL replacement example

If you need to process a URL in your code, you can use str_replace() to replace its domain name. For example:

 $url = "https://example.com/path/to/resource";
$new_url = str_replace("example.com", "m66.net", $url);
echo $new_url;

Output: