When processing user input, especially in languages that use multi-byte characters (such as Chinese, Japanese, and Korean), byte-oriented regular expression functions can split characters apart or produce incorrect matches. To address this, PHP's mbstring extension provides the multibyte-aware function mb_eregi_replace, which replaces text matching a regular expression pattern without breaking the character encoding.
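To see the problem concretely, here is a small comparison. It is a sketch that assumes the source file is saved as UTF-8 and that mbstring was built with regex support. Without the u modifier, preg_replace() treats the subject as raw bytes, while mb_eregi_replace() with a UTF-8 regex encoding works on whole characters:

```php
<?php
// Byte-oriented PCRE (no /u modifier): "." matches a single byte,
// so each 3-byte UTF-8 Chinese character is matched three times.
$bytewise = preg_replace('/./', 'X', '中文');

// mbregex with UTF-8 as the regex encoding: "." matches one whole character.
mb_regex_encoding('UTF-8');
$charwise = mb_eregi_replace('.', 'X', '中文');

echo $bytewise, PHP_EOL; // XXXXXX
echo $charwise, PHP_EOL; // XX
```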
mb_eregi_replace is part of the mbstring extension. It searches a string for matches of a regular expression, ignoring case, and replaces them with the specified content. Its signature (as of PHP 8) is:
mb_eregi_replace(string $pattern, string $replacement, string $string, ?string $options = null): string|false|null
$pattern : The regular expression pattern, written without delimiters (unlike the preg_* functions).
$replacement : A string used to replace the match.
$string : The original string to be processed.
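$options : An optional string of matching options (for example 'x' for extended syntax). It is not a character encoding; the encoding used for matching is set separately with mb_regex_encoding().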
In many applications, user input may contain special characters such as @, #, $, %, ^, and &. Left unfiltered, some of these characters can cause security or data-consistency problems. The following function uses mb_eregi_replace to strip them.
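function sanitize_input(string $input): string {
    // Match every character that is NOT an ASCII letter, digit, whitespace, or CJK ideograph (U+4E00 to U+9FA5)
    $pattern = '[^a-zA-Z0-9\x{4e00}-\x{9fa5}\s]';
    $replacement = '';
    // The 4th parameter of mb_eregi_replace() is an options string, not an encoding,
    // so the regex encoding is set separately
    mb_regex_encoding('UTF-8');
    $result = mb_eregi_replace($pattern, $replacement, $input);
    // mb_eregi_replace() returns false on error and null for input invalid in the current encoding
    return is_string($result) ? $result : '';
}
// Test sample
$user_input = 'Welcome to visit m66.net!This is a@#test$%enter^&content。';
$clean_input = sanitize_input($user_input);
echo $clean_input;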
Welcome to visit m66netThis is atestentercontent
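In this example, mb_eregi_replace replaces every character that is not an ASCII letter, digit, whitespace character, or CJK ideograph with an empty string. The \x{4e00}-\x{9fa5} range is written as Unicode code points, so it only works as intended when the regex encoding is UTF-8, which should be set explicitly with mb_regex_encoding('UTF-8'). Note that this range covers Han ideographs only (the CJK Unified Ideographs block now extends to U+9FFF); Japanese kana, Korean Hangul, and full-width punctuation such as 。 are removed.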
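Because the pattern above keeps only Han ideographs, Japanese and Korean text is stripped along with the punctuation. One option is to widen the character class. The function name sanitize_input_cjk() is hypothetical, and the extra ranges (Hiragana and Katakana U+3040 to U+30FF, Hangul syllables U+AC00 to U+D7AF) are an assumption about what your application should accept:

```php
<?php
mb_regex_encoding('UTF-8');

// Hypothetical variant of sanitize_input(): besides ASCII letters, digits,
// whitespace and CJK ideographs, it also keeps Hiragana/Katakana
// (U+3040 to U+30FF) and Hangul syllables (U+AC00 to U+D7AF).
function sanitize_input_cjk(string $input): string
{
    $pattern = '[^a-zA-Z0-9\s\x{4e00}-\x{9fa5}\x{3040}-\x{30ff}\x{ac00}-\x{d7af}]';
    $result = mb_eregi_replace($pattern, '', $input);
    return is_string($result) ? $result : '';
}

echo sanitize_input_cjk('こんにちは、世界!안녕'), PHP_EOL; // こんにちは世界안녕
```

With the original pattern, the same input would be reduced to 世界.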
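Performance : mb_eregi_replace runs on the Oniguruma regex engine through mbstring and is not optimized for bulk processing the way PCRE (which can use a JIT compiler) is. For large volumes of text, benchmark it against preg_replace() with the u modifier, which also handles UTF-8 correctly.
Explicit character encoding : Always set the regex encoding explicitly (for example mb_regex_encoding('UTF-8')) instead of relying on ini defaults, to avoid garbled output or incorrect matches. Remember that the fourth argument of mb_eregi_replace is an options string, not an encoding.
Escaping : mbregex patterns are written without delimiters, but metacharacters such as ., *, +, ?, [, ], (, ), {, }, ^, $, | and \ still need a backslash when they are meant literally. Escape them carefully when building complex patterns to avoid syntax errors.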
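The encoding advice can be made concrete by validating the input before matching. This sketch uses a hypothetical helper, sanitize_input_strict(), that returns null for malformed UTF-8 so the caller can reject the request instead of storing a half-processed string:

```php
<?php
// Hypothetical helper: validate the encoding first, then run the mbregex replace.
function sanitize_input_strict(string $input): ?string
{
    // Reject byte sequences that are not valid UTF-8 instead of guessing.
    if (!mb_check_encoding($input, 'UTF-8')) {
        return null;
    }

    // State the regex encoding explicitly rather than relying on ini defaults.
    mb_regex_encoding('UTF-8');

    $result = mb_eregi_replace('[^a-zA-Z0-9\x{4e00}-\x{9fa5}\s]', '', $input);

    // mb_eregi_replace() returns false on error and null on invalid input.
    return is_string($result) ? $result : null;
}

var_dump(sanitize_input_strict("abc\xFF"));      // NULL
echo sanitize_input_strict('a-b 中文'), PHP_EOL; // ab 中文
```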
If you are building a form handler or need to filter user input such as usernames or comments, mb_eregi_replace lets you do so without garbling multibyte characters. For example, in a user registration form, you can clean the user's nickname on the server like this:
$nickname = sanitize_input($_POST['nickname'] ?? '');
This strips characters such as <, >, quotes, and & that are commonly used in XSS and injection payloads, and it keeps user input consistent. It is not a complete defense on its own, though: still escape values on output (for example with htmlspecialchars()) and use prepared statements for database queries.
Security and compatibility are top priorities when processing user input. mb_eregi_replace offers a multibyte-safe way to strip unwanted characters from input. The function is still available in PHP 8 and is not deprecated; what PHP 8.0 removed is its e (evaluate) option, which had been deprecated since PHP 7.1. It does, however, require mbstring to be built with Oniguruma regex support, which not every PHP installation has. For projects under continuous maintenance, preg_replace() with the u modifier is a widely available alternative.
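If you do migrate, the same filter can be written with PCRE. This is a sketch that relies on PCRE's \x{...} code-point syntax together with the u modifier; sanitize_input_pcre() is a hypothetical name:

```php
<?php
// PCRE version of the same filter. The /u modifier makes PCRE treat both the
// pattern and the subject as UTF-8, so \x{...} refers to Unicode code points.
function sanitize_input_pcre(string $input): string
{
    $result = preg_replace('/[^a-zA-Z0-9\x{4e00}-\x{9fa5}\s]/u', '', $input);
    // preg_replace() returns null on failure, e.g. when $input is not valid UTF-8.
    return $result ?? '';
}

echo sanitize_input_pcre('Welcome to visit m66.net!This is a@#test$%enter^&content。'), PHP_EOL;
// Welcome to visit m66netThis is atestentercontent
```

Case-insensitive matching is unnecessary here because the class already lists both a-z and A-Z. The two engines may not define \s identically for non-ASCII whitespace, so test the migration against your own data.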