Notes when splitting Emoji strings with str_split

M66 2025-06-02

In PHP, str_split is commonly used to split a string into an array of characters. It works on bytes, however, so strings that contain multibyte characters such as Emoji can produce broken results. This article covers what to watch for when splitting Emoji strings with str_split and offers alternatives that split by character.

1. Emoji are multibyte characters

Emoji are Unicode characters, and in UTF-8 each one is encoded as several bytes. For example, the common Emoji 😀 (U+1F600) occupies 4 bytes. str_split splits by byte, not by character, so calling it directly on a string that contains Emoji cuts each Emoji into separate single-byte fragments, none of which is valid UTF-8 on its own.

Code example:

$string = "Hello 😀 World!";
// The second argument is the chunk length in bytes, not characters
$splitString = str_split($string, 1);
print_r($splitString);

This code prints a byte-level split of the string rather than a character-level one. The Emoji appears as four separate array elements, each holding one byte, which typically display as garbage or replacement characters.
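To make the byte-versus-character difference concrete, the following minimal sketch compares strlen with mb_strlen and prints the individual bytes that str_split produces. It assumes the mbstring extension is loaded and uses the code point escape "\u{1F600}" (PHP 7.0+) so the result does not depend on how the source file is saved. The variable names are illustrative only.

```php
<?php
$emoji = "\u{1F600}"; // the 😀 Emoji written as a code point escape

$byteCount = strlen($emoji);             // strlen counts bytes
$charCount = mb_strlen($emoji, 'UTF-8'); // mb_strlen counts code points

// Each element returned by str_split is one byte; show each as hex
$bytesHex = array_map('bin2hex', str_split($emoji));

echo $byteCount, " bytes, ", $charCount, " character\n"; // 4 bytes, 1 character
echo implode(' ', $bytesHex), "\n";                      // f0 9f 98 80
```

The four hex values are the UTF-8 encoding of U+1F600. Each one alone is an incomplete sequence, which is why the elements of the str_split result cannot be displayed as text.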

2. Use mb_strlen and mb_substr for character-level operations

To handle strings containing Emoji properly, use functions that are multibyte-aware, such as mb_strlen and mb_substr (both require the mbstring extension). Given the correct encoding, they count and extract whole characters (Unicode code points) instead of bytes, so they do not cut an Emoji apart the way str_split does.

Solution:

$string = "Hello 😀 World!";

// Use mb_strlen to get the number of characters (not bytes)
$length = mb_strlen($string, 'UTF-8');
$splitString = [];

// Extract one character at a time with mb_substr
for ($i = 0; $i < $length; $i++) {
    $splitString[] = mb_substr($string, $i, 1, 'UTF-8');
}

print_r($splitString);

In this example, mb_strlen returns the number of characters in the string, and mb_substr then extracts them one at a time. The Emoji is extracted whole as a single array element rather than being split into bytes.
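On PHP 7.4 and later, the mbstring extension also provides mb_str_split, which does the same job in a single call. The article does not mention it, so treat the following as a sketch rather than the author's method. The helper name splitChars is hypothetical; it falls back to the mb_substr loop when mb_str_split is unavailable.

```php
<?php
// Hypothetical helper (not from the article): split a UTF-8 string
// into an array of single characters (code points).
function splitChars(string $s): array
{
    if (function_exists('mb_str_split')) {
        // Available since PHP 7.4
        return mb_str_split($s, 1, 'UTF-8');
    }

    // Fallback for older versions: the mb_strlen / mb_substr loop
    $out = [];
    $len = mb_strlen($s, 'UTF-8');
    for ($i = 0; $i < $len; $i++) {
        $out[] = mb_substr($s, $i, 1, 'UTF-8');
    }
    return $out;
}

$parts = splitChars("Hi \u{1F600}!");
print_r($parts); // 5 elements: "H", "i", " ", the Emoji, "!"
```

Besides being shorter, the single call avoids repeated mb_substr lookups, each of which has to scan the string from the start because UTF-8 characters vary in width.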

3. Use preg_split by character

Another multibyte-safe option is preg_split, which can split a string between Unicode characters using a regular expression.

Code example:

$string = "Hello 😀 World!";
$splitString = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($splitString);

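Here the pattern //u is an empty regular expression with the u (UTF-8) modifier. The empty pattern matches at every position between characters, and the u modifier makes PCRE treat the subject as UTF-8, so the split falls between code points rather than between bytes. PREG_SPLIT_NO_EMPTY drops the empty strings produced at the start and end. Unlike str_split, this keeps each single-code-point Emoji intact.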
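One caveat is that some Emoji consist of more than one code point, for example a thumbs-up followed by a skin-tone modifier. Code-point splitting (whether through //u or the mb_ functions) separates those parts. The sketch below shows this and then tries the PCRE escape \X, which matches one extended grapheme cluster. How \X groups a given Emoji sequence depends on the PCRE version bundled with PHP, so the second result is printed without a claimed output.

```php
<?php
// One visible Emoji made of two code points:
// thumbs-up (U+1F44D) + medium skin-tone modifier (U+1F3FD)
$thumb = "\u{1F44D}\u{1F3FD}";

// '//u' splits between code points, so the modifier is separated
$codePoints = preg_split('//u', $thumb, -1, PREG_SPLIT_NO_EMPTY);
echo count($codePoints), "\n"; // 2

// \X matches one extended grapheme cluster at a time; the exact
// grouping depends on the Unicode rules of the bundled PCRE library
preg_match_all('/\X/u', $thumb, $matches);
$clusters = $matches[0];
print_r($clusters);
```

If the strings you split may contain modifier or joined Emoji sequences, it is worth checking the \X result on your own PHP build before relying on it.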

4. Handle Emoji in URLs

If the string contains a URL (for example, a link whose path contains an Emoji), pay attention to the domain part of the URL. If you need to replace the domain with m66.net, you can use preg_replace or str_replace. Both operate on bytes, but because the domain is plain ASCII, the replacement does not disturb the Emoji elsewhere in the string.

Example:

$string = "Check out this site: https://example.com/😀";
$modifiedString = preg_replace('/https?:\/\/(www\.)?example\.com/', 'https://m66.net', $string);
echo $modifiedString;

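This example replaces the domain example.com (with or without a www. prefix) with m66.net and leaves the rest of the URL, including the Emoji in the path, untouched. Note that the replacement always writes https://, even if the original URL used http://.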
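The text also mentions str_replace as an option. A minimal sketch of that variant follows. Including the leading "://" in the search string is an assumption made here to avoid matching example.com inside a path or query, and the variable names are illustrative.

```php
<?php
$string = "Check out this site: https://example.com/\u{1F600}";

// Plain substring replacement; the scheme (http or https) is kept as-is
$modified = str_replace('://example.com', '://m66.net', $string);

echo $modified, "\n"; // Check out this site: https://m66.net/😀
```

Unlike the preg_replace version, this does not handle a www. prefix, and it would also match a longer host that merely begins with example.com, so the regular expression is the safer choice when the input is not under your control.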

5. Conclusion

str_split works on bytes, so using it on a string that contains Emoji breaks each Emoji into meaningless single-byte pieces. To split such strings by character, use mb_strlen together with mb_substr, or use preg_split with the //u pattern. If the string also contains a URL whose domain must be changed, preg_replace or str_replace can replace the ASCII domain part without affecting the Emoji.