How to Accurately Replace Keywords in Multilingual Texts Using the mb_eregi_replace Function

M66 2025-06-15

When developing multilingual applications, string replacement is complicated by character encoding, case sensitivity, and differences in how regex engines treat non-ASCII text. PHP offers the multibyte string function mb_eregi_replace to handle these tasks more reliably. This article explains how to use mb_eregi_replace to replace keywords accurately in multilingual texts so that the results are as expected across different languages.

1. Introduction to mb_eregi_replace

mb_eregi_replace is a function from PHP's multibyte string extension (mbstring) that performs case-insensitive regular expression replacements. It is the multibyte counterpart of the legacy eregi_replace function (removed in PHP 7.0) and is designed for multibyte encodings such as UTF-8, which gives it better support for Chinese, Japanese, Korean, and other non-ASCII text.

The function is defined as follows:

mb_eregi_replace(string $pattern, string $replacement, string $string, ?string $options = null): string|false|null

$pattern: Regular expression (matched case-insensitively)
$replacement: Replacement text
$string: Target text to search
$options (optional): Option letters that control matching, such as "i" (ignore case), "x" (extended pattern form), "m" (make . match newlines), and "s" (treat ^ as \A and $ as \Z). The letter "r" selects Ruby regex syntax; it does not affect how the replacement string is used.
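
As a rough illustration of how the option letters change matching, the sketch below passes "i" and "im" explicitly. The sample string and variable names are invented for this example, and it assumes the mbstring extension is loaded.

```php
<?php
mb_regex_encoding("UTF-8");

// Hypothetical sample text spanning several lines.
$text = "BEGIN\nmiddle\nEND tail";

// "i" only: case-insensitive, but "." stops at a newline, so nothing matches.
$withoutM = mb_eregi_replace("begin.*end", "X", $text, "i");

// "im": "m" additionally lets "." cross newlines, so the whole block is replaced.
$withM = mb_eregi_replace("begin.*end", "X", $text, "im");

var_dump($withoutM === $text); // whether the text was left unchanged
var_dump($withM);
```

Passing the options explicitly makes the behavior independent of whatever defaults were set elsewhere with mb_regex_set_options().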

2. Why Choose mb_eregi_replace?

When handling texts containing different languages, ordinary replacement functions can miss keywords that differ in case or contain diacritics: str_replace is case-sensitive and compares bytes, and preg_replace may not fold case for non-ASCII letters unless the /u modifier is used. For example, to replace the German “straße” with “road”, a direct call such as str_replace('Strasse', 'Road', $text) misses variants such as “Straße” because both the case and the special character ß differ.

mb_eregi_replace can handle these complex character cases because it supports Unicode and performs case-insensitive matching by default.
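
A minimal sketch of that difference, using an invented German sentence. It assumes the source file is saved as UTF-8 and sets the regex encoding explicitly:

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// Hypothetical sample text with two spellings of the same word.
$text = "Die Straße ist lang. Folge der straße bis zum Ende.";

// Case-sensitive byte comparison: only the exact spelling "Straße" is replaced.
$plain = str_replace("Straße", "Road", $text);

// Case-insensitive, encoding-aware matching: both spellings are replaced.
$multibyte = mb_eregi_replace("straße", "Road", $text);

echo $plain, "\n";
echo $multibyte, "\n";
```

The first line printed still contains the lowercase “straße”; the second has both occurrences replaced.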

3. Practical Example

Suppose we have a multilingual text and want to replace the keyword “café” with “coffee shop” regardless of its case or variants (such as "Café", "CAFé", "cafe"). We can do this as follows:

<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8"); // the mb_ereg family reads its encoding from this setting

$text = "Let's meet at the Café. A nice little cafe downtown. Also try the CAFé by the river.";

// Matching is case-insensitive; the class [eé] also accepts the unaccented
// spelling, which a literal "café" pattern would not match.
$pattern = "caf[eé]";
$replacement = "coffee shop";

$result = mb_eregi_replace($pattern, $replacement, $text);

echo $result;
?>

Output:

Let's meet at the coffee shop. A nice little coffee shop downtown. Also try the coffee shop by the river.

As you can see, all forms of “café” are correctly replaced without concerns about character sets or case.

4. Real-world Use with URLs: Bulk Replacing Keywords in Links

For example, automatically turning the keyword “下载” (“download”) into a link to https://m66.net/download in website content can be done like this:

<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// "You can click here to download the software, or go to other pages to download related materials."
$content = "你可以点击这里下载软件，也可以到其他页面去下载相关资料。";

$pattern = "下载";
// Escape the inner double quotes so the href attribute stays inside the PHP string.
$replacement = "<a href=\"https://m66.net/download\">下载</a>";

$result = mb_eregi_replace($pattern, $replacement, $content);

echo $result;
?>

Output:

你可以点击这里<a href="https://m66.net/download">下载</a>软件，也可以到其他页面去<a href="https://m66.net/download">下载</a>相关资料。

This example demonstrates how to convert multilingual keywords into links, enhancing content interactivity without introducing extra complex logic.
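
When the keyword can appear in different cases, a fixed replacement string overwrites the original spelling. One possible approach, sketched here with an invented English sentence, is mb_ereg_replace_callback with the "i" option, so the link text keeps whatever casing was matched. The callback and variable names are illustrative only.

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// Hypothetical sample content with two casings of the keyword.
$content = "Download the installer, or open the DOWNLOAD page for older builds.";

$linked = mb_ereg_replace_callback(
    "download",
    // $m[0] holds the matched text exactly as it appeared in $content.
    function (array $m): string {
        return '<a href="https://m66.net/download">' . htmlspecialchars($m[0]) . '</a>';
    },
    $content,
    "i" // case-insensitive matching; this function has no "eregi" variant
);

echo $linked;
```

Because the replacement is built per match, “Download” and “DOWNLOAD” each keep their own spelling inside the link.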

5. Tips and Considerations

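Set Encoding: Use mb_internal_encoding("UTF-8") and also call mb_regex_encoding("UTF-8"), since the mb_ereg family takes its encoding from that separate setting, to ensure proper handling of multibyte characters under UTF-8.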
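Matching Accuracy: To avoid incorrect replacements, such as replacing “下载” within “下载器,” constrain the pattern by its context. Word boundaries (\b) help in space-delimited languages, but Chinese text has no spaces between words, so \b下载\b fails to match inside a run of Chinese characters. A negative lookahead such as 下载(?!器) is more dependable there.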
Performance: For bulk replacements in large texts, evaluate performance overhead and consider using preg_replace combined with the /iu modifier if needed.
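
One way to keep “下载” from matching inside “下载器” is a negative lookahead. The sketch below shows it with mb_eregi_replace and with the preg_replace /iu alternative mentioned under Performance. The sample sentence, the bracket markers, and the variable names are invented for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// "Install the downloader first, then download the software."
$content = "请先安装下载器，然后下载软件。";

// Negative lookahead: match 下载 only when it is not followed by 器.
$viaMbstring = mb_eregi_replace("下载(?!器)", "[下载]", $content);

// PCRE version: preg_quote() escapes the keyword, /u enables UTF-8, /i ignores case.
$keyword = "下载";
$viaPcre = preg_replace('/' . preg_quote($keyword, '/') . '(?!器)/iu', "[下载]", $content);

echo $viaMbstring, "\n";
echo $viaPcre, "\n";
```

Both calls leave “下载器” alone and mark only the standalone “下载”. Which of the two is faster depends on the input, so it is worth measuring on representative data.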

6. Conclusion

Simple string matching often falls short in multilingual environments. Through mb_eregi_replace, PHP provides a safer and more reliable way to handle replacements, especially when dealing with Unicode and case sensitivity. We hope this article and the examples help you apply it more effectively in your projects.