In PHP, str_split is commonly used to split a string into an array of characters. However, it works on bytes, so strings that contain multibyte characters such as Emoji can produce broken results. This article explains what to watch for when splitting strings containing Emoji with str_split and presents alternatives that split correctly.
Emoji are Unicode characters, and in UTF-8 each one usually takes several bytes. For example, the common Emoji "😀" (U+1F600) occupies 4 bytes in UTF-8. str_split splits by bytes rather than by characters, so calling it directly on a string containing Emoji cuts each Emoji into several fragments, none of which is valid UTF-8 on its own.
$string = "Hello 😀 World!";
// str_split works on bytes: the 4-byte Emoji becomes 4 separate array elements
$splitString = str_split($string, 1);
print_r($splitString);
This code outputs a byte-level split of the string rather than a character-level one. The Emoji is broken into four single-byte elements, which typically display as garbage or replacement characters.
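To make the byte-level behaviour concrete, the following minimal sketch compares the byte count and character count of a single Emoji and prints the raw bytes that str_split produces. The variable names are illustrative, and the sketch assumes the mbstring extension is available and PHP 7.0 or later for the "\u{...}" escape.

```php
<?php
// U+1F600 (grinning face), written as an escape so the source file encoding does not matter
$emoji = "\u{1F600}";

// strlen counts bytes, mb_strlen counts characters
$byteCount = strlen($emoji);               // 4
$charCount = mb_strlen($emoji, 'UTF-8');   // 1

// Show each fragment produced by str_split as hexadecimal
$hexBytes = array_map('bin2hex', str_split($emoji));

echo "bytes: $byteCount, characters: $charCount\n";
echo implode(' ', $hexBytes), "\n";        // f0 9f 98 80

// None of the fragments is valid UTF-8 on its own
foreach (str_split($emoji) as $fragment) {
    var_dump(mb_check_encoding($fragment, 'UTF-8')); // bool(false) each time
}
```

Because every fragment is an incomplete UTF-8 sequence, printing or storing the pieces individually usually yields replacement characters.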
To handle strings containing Emoji properly, use functions that support multibyte characters, such as mb_strlen and mb_substr. Both operate on characters rather than bytes when given the correct encoding, so unlike str_split they do not cut a 4-byte Emoji into pieces.
$string = "Hello 😀 World!";
// Use mb_strlen to get the length in characters, not bytes
$length = mb_strlen($string, 'UTF-8');
$splitString = [];
for ($i = 0; $i < $length; $i++) {
    // Extract exactly one character starting at character offset $i
    $splitString[] = mb_substr($string, $i, 1, 'UTF-8');
}
print_r($splitString);
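In this example, mb_strlen returns the number of characters in the string, and mb_substr then extracts them one at a time. In this way, the Emoji is extracted as a whole rather than split. Note that these functions work per Unicode code point, so an Emoji built from several code points (for example, one with a skin-tone modifier) is still separated into its components.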
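The loop above can also be wrapped in a reusable function. The sketch below uses a hypothetical helper name, splitChars, which is not part of PHP. It calls the built-in mb_str_split where it exists (PHP 7.4 and later) and otherwise falls back to the mb_strlen and mb_substr loop shown above.

```php
<?php
/**
 * Split a string into an array of characters (code points).
 * splitChars is an illustrative helper name, not a PHP built-in.
 */
function splitChars(string $text, string $encoding = 'UTF-8'): array
{
    // mb_str_split is available from PHP 7.4 onward
    if (function_exists('mb_str_split')) {
        return mb_str_split($text, 1, $encoding);
    }

    // Fallback for older versions: extract one character at a time
    $chars = [];
    $length = mb_strlen($text, $encoding);
    for ($i = 0; $i < $length; $i++) {
        $chars[] = mb_substr($text, $i, 1, $encoding);
    }
    return $chars;
}

$chars = splitChars("Hi \u{1F600}!");
print_r($chars); // five elements: "H", "i", " ", the Emoji, "!"
```

On PHP 7.4 or later, calling mb_str_split directly is the simpler choice. The fallback only matters if older versions must be supported.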
Another solution that can handle multibyte characters is to use the preg_split function, which can split strings by Unicode characters using regular expressions.
$string = "Hello 😀 World!";
// The empty pattern with the u modifier splits between UTF-8 characters
$splitString = preg_split('//u', $string, -1, PREG_SPLIT_NO_EMPTY);
print_r($splitString);
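In this example, preg_split uses the pattern //u. The empty pattern matches at every position between characters, and the u modifier makes PCRE treat both pattern and subject as UTF-8, so the split happens between characters rather than between bytes. PREG_SPLIT_NO_EMPTY drops the empty strings that would otherwise appear at the start and end of the result. Note that preg_split returns false if the input is not valid UTF-8, so check the return value before using it.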
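One caveat applies to every per-character approach: some Emoji consist of several code points, such as a thumbs-up followed by a skin-tone modifier. The sketch below contrasts the //u split with the PCRE \X escape, which matches an extended grapheme cluster. How \X groups Emoji sequences depends on the Unicode data in the PCRE library PHP was built with, so treat the cluster count as something to verify in your own environment rather than a guarantee.

```php
<?php
// Thumbs-up (U+1F44D) followed by a skin-tone modifier (U+1F3FD):
// two code points that are normally rendered as a single Emoji.
$text = "\u{1F44D}\u{1F3FD}";

// Splitting per code point separates the base Emoji from its modifier
$codePoints = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);

// \X matches one extended grapheme cluster per match
preg_match_all('/\X/u', $text, $matches);
$clusters = $matches[0];

echo count($codePoints), " code points\n";        // 2 code points
echo count($clusters), " grapheme cluster(s)\n";  // depends on the PCRE version
```

If user-visible characters matter more than code points, for example when truncating text for display, grapheme-based splitting is usually the safer choice.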
If the string contains a URL (for example, a link whose path contains an Emoji), take care with the domain part of the URL. To replace the domain name with m66.net, you can use preg_replace or str_replace.
$string = "Check out this site: https://example.com/😀";
// Match the scheme, an optional "www." and the domain; the path is left alone
$modifiedString = preg_replace('#https?://(www\.)?example\.com#', 'https://m66.net', $string);
echo $modifiedString;
This example replaces the domain name example.com with m66.net and leaves the rest of the URL, including the path, untouched. Note that the replacement always writes https and drops any www. prefix.
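The str_replace alternative mentioned above can be sketched as follows. Although str_replace is byte-based, an ASCII search string such as example.com cannot match inside a multibyte UTF-8 sequence, because every byte of such a sequence has its high bit set. The Emoji in the path is therefore safe. The variable names are illustrative.

```php
<?php
$string = "Check out this site: https://example.com/\u{1F600}";

// Plain substring replacement: no regular expression needed for a fixed domain
$modified = str_replace('example.com', 'm66.net', $string);

echo $modified, "\n"; // Check out this site: https://m66.net/ followed by the Emoji
```

Unlike the preg_replace version, this replaces every occurrence of the substring, including inside longer host names such as notexample.com, and it keeps the original scheme and any www. prefix. Use the regular expression when the match must be anchored to the URL structure.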
When you split a string containing Emoji with str_split, each Emoji is broken into its individual bytes. To split such strings correctly, use mb_strlen together with mb_substr, or use preg_split with the u modifier to split by character. If the string also contains a URL whose domain must be changed, replace the domain with preg_replace or str_replace; neither affects the Emoji elsewhere in the string.