
Use mb_eregi_replace to replace sensitive words in strings

M66 2025-06-03

When building web applications, processing user input is an important task, and filtering sensitive words is a common part of it. PHP's mbstring extension provides multibyte-aware string functions. Among them, mb_eregi_replace performs case-insensitive regular-expression replacement and works with multibyte encodings such as UTF-8, which makes it suitable for multilingual text. This article explains how to use mb_eregi_replace to replace sensitive words in text.

1. Introduction to the mb_eregi_replace function

mb_eregi_replace is provided by PHP's multibyte string extension, mbstring. Its signature (as of PHP 8) is:

 mb_eregi_replace(string $pattern, string $replacement, string $string, ?string $options = null): string|false|null
  • $pattern : The regular expression to search for; matching is always case-insensitive.

  • $replacement : The replacement text.

  • $string : The input string.

  • $options : Optional regex option characters (for example "x" for extended pattern syntax). The character encoding is not set here; it is taken from mb_regex_encoding().

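Because it matches whole characters in the configured regex encoding rather than individual bytes, the function handles multibyte encodings such as UTF-8 correctly, which makes it a good fit for filtering sensitive words in Chinese and other non-Latin text.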
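A minimal sketch of both properties described above, assuming the mbstring extension is loaded with its regex support: the lowercase pattern also matches an uppercase word, and a Chinese word ("赌博", gambling) is replaced as whole UTF-8 characters. The sample strings are made up for illustration.

```php
<?php
// Assumption: mbstring is available with regex (mbregex) support.
// mb_eregi_replace() reads patterns in the current regex encoding, so set it explicitly.
mb_regex_encoding('UTF-8');

// Case-insensitive: the lowercase pattern also matches "VIOLENCE".
$english = mb_eregi_replace('violence', '***', 'No VIOLENCE allowed');

// Multibyte: the two-character Chinese word is matched as characters, not bytes.
$chinese = mb_eregi_replace('赌博', '***', '禁止赌博');

echo $english, PHP_EOL; // No *** allowed
echo $chinese, PHP_EOL; // 禁止***
```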

2. Basic sensitive-word replacement

Here is a simple example that replaces sensitive words in a string with mb_eregi_replace:

 <?php
// List of sensitive words
$sensitive_words = ['gamble', 'pornography', 'Violence'];

// Text to be processed
$input_text = "This website contains gamble and pornography content, visit https://m66.net/play for more information.";

// Replacement string
$replacement = '***';

// Replace all sensitive words in a loop
foreach ($sensitive_words as $word) {
    $input_text = mb_eregi_replace($word, $replacement, $input_text);
}

echo $input_text;
?>

Output:

 This website contains *** and *** content, visit https://m66.net/play for more information.

As shown above, mb_eregi_replace replaces every occurrence of each listed word regardless of case, and leaves the URL and the rest of the text unchanged.
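The loop above rescans the text once per word. A possible variant, sketched here with a hypothetical helper named replace_sensitive_words, joins the list into a single alternation pattern. It assumes the words contain no regex metacharacters. Because the regex engine tries alternatives left to right, the words are sorted longest first so that a longer word such as "pornography" is not pre-empted by a shorter prefix such as "porn".

```php
<?php
// Hypothetical helper, not part of mbstring: one pass with an alternation pattern.
// Assumption: the words contain no regex metacharacters.
function replace_sensitive_words(array $words, string $replacement, string $text): string
{
    // Longest words first, so a short prefix cannot win over a longer word.
    usort($words, fn (string $a, string $b): int => mb_strlen($b) <=> mb_strlen($a));

    $result = mb_eregi_replace(implode('|', $words), $replacement, $text);

    // false (error) or null (invalid input encoding): fail loudly instead of letting words through.
    if (!is_string($result)) {
        throw new RuntimeException('Sensitive-word pattern could not be applied');
    }
    return $result;
}

$sensitive_words = ['gamble', 'pornography', 'Violence'];
$input_text = 'This website contains gamble and pornography content, visit https://m66.net/play for more information.';

echo replace_sensitive_words($sensitive_words, '***', $input_text), PHP_EOL;
```

Throwing on failure is a design choice for a filter: returning the unmodified text would silently let sensitive words through.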

3. Using regular expressions to improve matching

In practice, sensitive words may appear in disguised forms, for example with spaces or special symbols inserted between characters. More flexible regular expressions can catch some of these variants, for example:

 <?php
$sensitive_words = ['gam\s*ble', 'porno\s*graphy', 'vio\s*lence'];
$input_text = "This page has gam ble content, and also contains vio  lence and porno graphy. Please do not visit http://m66.net/bad.html.";

foreach ($sensitive_words as $pattern) {
    // The patterns are single-quoted, so \s reaches the regex engine unchanged
    $input_text = mb_eregi_replace($pattern, '***', $input_text);
}

echo $input_text;
?>

Output:

 This page has *** content, and also contains *** and ***. Please do not visit http://m66.net/bad.html.

\s* matches zero or more whitespace characters, so the split variants are replaced as well.
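Writing such patterns by hand for every word is tedious. A minimal sketch, using a hypothetical helper spaced_pattern, generates them by inserting \s* between every character of a plain word. It assumes the words consist of letters only, so no character needs escaping, and it needs PHP 7.4+ for mb_str_split().

```php
<?php
// Hypothetical helper: "gamble" becomes "g\s*a\s*m\s*b\s*l\s*e".
// Assumption: words contain letters only (no regex metacharacters).
function spaced_pattern(string $word): string
{
    // mb_str_split() splits by character, so multibyte words are not cut mid-character.
    return implode('\s*', mb_str_split($word));
}

mb_regex_encoding('UTF-8');

$input_text = 'Some g a m b l e content and some VIO LENCE.';
foreach (['gamble', 'violence'] as $word) {
    $input_text = mb_eregi_replace(spaced_pattern($word), '***', $input_text);
}

echo $input_text, PHP_EOL; // Some *** content and some ***.
```

The same helper works for Chinese words, where spaces inserted between characters are a common way to evade filters.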

4. Things to note

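  • mb_eregi_replace has not been removed; it is still part of PHP 8, although PHP 8.0 removed its old e (evaluate) option. It is only available when mbstring is built with its regex support (Oniguruma), so on builds without it, preg_replace with the i and u modifiers is a common alternative.

  • Make sure the mbstring extension, including its regex support, is enabled before use.

  • Escape regex metacharacters (such as . * + ? ( ) [ ]) in the words you match; otherwise the pattern may match unintended text or fail to compile.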
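As a sketch of the PCRE route, the hypothetical filter_with_pcre below uses preg_replace with the i (case-insensitive) and u (UTF-8) modifiers, and escapes each word with preg_quote() so that a word such as "b.e.t" matches only literally. It assumes PHP 8.0+ (for preg_last_error_msg()) and does not depend on mbstring's regex support.

```php
<?php
// Hypothetical PCRE-based filter; i = case-insensitive, u = UTF-8 pattern and subject.
function filter_with_pcre(array $words, string $replacement, string $text): string
{
    // Longest first (byte length is enough to put a word before its own prefixes).
    usort($words, fn (string $a, string $b): int => strlen($b) <=> strlen($a));

    // Escape metacharacters and the "/" delimiter in every word.
    $quoted = array_map(fn (string $w): string => preg_quote($w, '/'), $words);

    $result = preg_replace('/' . implode('|', $quoted) . '/iu', $replacement, $text);
    if ($result === null) {
        throw new RuntimeException('preg_replace failed: ' . preg_last_error_msg());
    }
    return $result;
}

echo filter_with_pcre(['赌博', 'Violence', 'b.e.t'], '***', '禁止赌博, no VIOLENCE, b.e.t or bxext?'), PHP_EOL;
// 禁止***, no ***, *** or bxext?
```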

5. Conclusion

With mb_eregi_replace, sensitive words in multilingual text can be replaced case-insensitively with very little code. Keep its availability in mind, however: it depends on mbstring's regex support, so new projects may prefer preg_replace with the Unicode-aware u modifier. Whichever approach you choose, keeping content safe and healthy remains an important responsibility for every developer.