When processing Chinese or other multi-byte text, byte-oriented string and regex functions can fail to recognize multi-byte characters correctly, which is a particular problem for tasks such as sensitive word filtering. PHP's mb_eregi_replace is a multi-byte safe regex replacement function that ignores case and processes multi-byte characters correctly, which makes it well suited to replacing sensitive words that contain multi-byte characters.
Below is an example demonstrating how to use mb_eregi_replace to replace sensitive words in a text.
<?php
// Set the internal and regex encodings to UTF-8 so the multi-byte functions interpret the text correctly
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// Original text containing Chinese sensitive words
$text = "This is a test text containing sensitive words: 敏感词 and 不良内容.";

// List of sensitive words (each entry is treated as a regex pattern)
$sensitiveWords = [
    "敏感词",
    "不良内容",
];

// Replace each sensitive word with ***
foreach ($sensitiveWords as $word) {
    // mb_eregi_replace performs a case-insensitive, multi-byte safe replacement
    $text = mb_eregi_replace($word, "***", $text);
}

echo $text;
?>
Output result:
This is a test text containing sensitive words: *** and ***.
Multi-byte Safety
mb_eregi_replace is the case-insensitive counterpart of mb_ereg_replace. Both match on characters in the configured encoding rather than on raw bytes, so a pattern cannot match part of a multi-byte character such as a Chinese or Japanese one.
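To make the case-insensitive behaviour concrete, here is a minimal sketch. The sample text and the word "badword" are made up for illustration; the point is that one pattern covers every capitalization, while the Chinese word is still matched as whole characters.

```php
<?php
mb_regex_encoding("UTF-8");

// Illustrative input mixing several capitalizations with a Chinese word
$text = "BadWord, badword and BADWORD next to 敏感词.";

// A single lowercase pattern matches all three capitalizations
$masked = mb_eregi_replace("badword", "***", $text);

// The multi-byte word is replaced as whole characters
$masked = mb_eregi_replace("敏感词", "***", $masked);

echo $masked, PHP_EOL;
```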
Character Encoding Setting
Call mb_regex_encoding("UTF-8") and mb_internal_encoding("UTF-8") before matching, or make sure the script's default encoding is already UTF-8, so that the multi-byte functions interpret both the pattern and the text correctly.
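The setup described above can be sketched as follows. This is only one way to do it: checking the input with mb_check_encoding is an extra step assumed here, not something the replacement function requires, and the variable names are illustrative.

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// With no argument, mb_regex_encoding() returns the encoding currently in effect
$regexEncoding = mb_regex_encoding();

// Optionally validate input before filtering;
// "\xC3\x28" is an invalid UTF-8 byte sequence used here as a negative case
$validInput  = mb_check_encoding("敏感词", "UTF-8");
$brokenInput = mb_check_encoding("\xC3\x28", "UTF-8");

echo $regexEncoding, PHP_EOL;
var_dump($validInput, $brokenInput);
```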
Sensitive Word Matching
Each sensitive word is interpreted as a regular expression, so rules can be defined flexibly, for example to allow fuzzy matching or to cover several variants of a word with one pattern.
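As a rough illustration of pattern-based rules, the sketch below uses two hypothetical patterns: one tolerates whitespace inserted between the characters of a word, the other uses alternation to cover two variants. The variant 不良信息 and the sample text are assumptions made for the example.

```php
<?php
mb_regex_encoding("UTF-8");

// Hypothetical patterns (single-quoted so the backslashes reach the regex engine):
// \s* tolerates whitespace between characters, (...|...) matches either variant
$patterns = [
    '敏\s*感\s*词',
    '(不良内容|不良信息)',
];

$text = "Examples: 敏 感 词, 敏感词, 不良内容 and 不良信息.";

foreach ($patterns as $pattern) {
    $text = mb_eregi_replace($pattern, "***", $text);
}

echo $text, PHP_EOL;
```

Because entries are treated as patterns, a literal word containing characters such as `.` or `*` would need escaping before being used this way.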
If there are many sensitive words, you can load the list from a database or file, then loop through to replace them. It can also be combined with user input filtering to perform real-time sensitive word replacement and ensure content safety.
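One possible shape for loading the list from a file is sketched below. The helper name maskSensitiveWords, the one-word-per-line file format, and the temporary file are all assumptions for the sake of a self-contained example; a real application would read its own existing file or database table.

```php
<?php
mb_regex_encoding("UTF-8");

// Hypothetical helper: applies every pattern in $words to $text
function maskSensitiveWords(string $text, array $words, string $mask = "***"): string
{
    foreach ($words as $word) {
        $replaced = mb_eregi_replace($word, $mask, $text);
        // Keep the previous text if the replacement did not return a string
        if (is_string($replaced)) {
            $text = $replaced;
        }
    }
    return $text;
}

// Write a small demo word list so the example runs on its own
$listFile = tempnam(sys_get_temp_dir(), "words");
file_put_contents($listFile, "敏感词\n\n不良内容\n");

// One entry per line; drop trailing newlines and skip blank lines
$words = file($listFile, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
unlink($listFile);

$result = maskSensitiveWords("User comment: 敏感词 and 不良内容.", $words);
echo $result, PHP_EOL;
```

Skipping blank lines matters here, since an empty entry would otherwise be passed to mb_eregi_replace as an empty pattern.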