When working with strings in PHP, it is often necessary to replace specific content within HTML code. This is particularly relevant for security purposes, such as cleaning or modifying the content inside tags to prevent XSS attacks. This article will introduce how to use the mb_eregi_replace function to perform targeted replacements inside HTML tags.
mb_eregi_replace is part of PHP's multibyte string (mbstring) extension and performs case-insensitive replacement using regular expressions. Unlike the legacy eregi_replace, which was deprecated in PHP 5.3 and removed in PHP 7.0, it is encoding-aware, so it handles UTF-8 and other multibyte text such as Chinese correctly.
The function prototype is as follows:
string mb_eregi_replace ( string $pattern , string $replacement , string $string [, string $option = "msr" ] )
$pattern: The regex pattern (case-insensitive)
$replacement: The replacement string
$string: The input string
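$option: Optional matching options, shown here with a default of "msr": m makes . match newlines, s makes ^ and $ anchor only at the start and end of the string, and r selects Ruby regex syntax. The character encoding is not part of this parameter; set it with mb_regex_encoding().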
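As a minimal sketch of the parameters above, the snippet below runs a plain case-insensitive replacement on mixed ASCII and multibyte text. The sample string is hypothetical, and the code assumes the mbstring extension is built with multibyte regex support.

```php
<?php
// Tell mbregex which encoding the pattern and the subject use.
mb_regex_encoding('UTF-8');

// Hypothetical sample text mixing ASCII and multibyte characters.
$text = 'PHP 教程: Php is fun, php 很好';

// The pattern has no delimiters; case-insensitivity is built into mb_eregi_replace.
$result = mb_eregi_replace('php', 'PHP', $text);

echo $result, PHP_EOL; // PHP 教程: PHP is fun, PHP 很好
```

Note that, unlike the preg_* functions, the pattern is written without surrounding delimiters such as / or #.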
The goal is to match every <script> element, including its content, and replace it with a custom string so that the script is neither executed nor displayed. Example code:
<?php
// Original HTML string containing <script> tags
$html = '<div>Example content<script>alert("dangerous script");</script>more content</div>';

// Use mb_eregi_replace to replace content inside <script> tags
// Regex explanation:
// <script[^>]*> matches the opening <script> tag, allowing attributes
// .*? non-greedy match of all content between <script> and </script>
// </script> matches the closing tag
$pattern = '<script[^>]*>.*?</script>';

// Replace with a safe notice or empty content
$replacement = '<script>/* script content replaced */</script>';

// Execute replacement
$safe_html = mb_eregi_replace($pattern, $replacement, $html);

echo $safe_html;
?>
Output:
<div>Example content<script>/* script content replaced */</script>more content</div>
mb_eregi_replace is case-insensitive by default, so it matches both <script> and <SCRIPT> tags.
The regex .*? is non-greedy, so each match stops at the first closing </script> tag instead of running on to the last one in the document.
To match multi-line script content, make sure the m option is set (in mbregex, m is the option that lets . match newlines); it is included in the default msr options.
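The three notes above can be checked with a short sketch. The input string is hypothetical. The options are passed explicitly as "im" so the behavior does not depend on the configured defaults.

```php
<?php
mb_regex_encoding('UTF-8');

// Hypothetical input: an upper-case, multi-line script followed by a second script.
$html = "<p>a</p><SCRIPT>\nx();\n</SCRIPT><p>b</p><script>y();</script><p>c</p>";

// 'i' = case-insensitive, 'm' = let '.' match newlines.
$clean = mb_eregi_replace('<script[^>]*>.*?</script>', '', $html, 'im');

echo $clean, PHP_EOL; // <p>a</p><p>b</p><p>c</p>
```

With a greedy .* the single match would run from the first opening tag to the last closing tag, and `<p>b</p>` would be removed as well.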
mb_eregi_replace accepts only a string as its replacement. To rewrite the domain names of URLs inside <script> tags to m66.net, use mb_ereg_replace_callback, which takes a callback, and pass the i option to keep the match case-insensitive:
<?php
$html = '<script src="http://example.com/js/app.js"></script>';

// Match <script> tags first
$pattern = '<script[^>]*>.*?</script>';

// "i" makes the match case-insensitive, "m" lets . match newlines
$safe_html = mb_ereg_replace_callback($pattern, function ($matches) {
    $script_tag = $matches[0];
    // Replace URL domains with m66.net using a simple regex
    $script_tag = preg_replace('#(https?://)([^/]+)#i', '${1}m66.net', $script_tag);
    return $script_tag;
}, $html, 'im');

echo $safe_html;
?>
Output:
<script src="http://m66.net/js/app.js"></script>
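As a variation that avoids the nested preg_replace call, the sketch below uses mb_ereg_replace_callback with capture groups, so the callback rebuilds the match directly. The input string and the pattern are illustrative assumptions and do not cover every way a src attribute can be written.

```php
<?php
mb_regex_encoding('UTF-8');

// Hypothetical input: one external script plus an unrelated image.
$html = '<SCRIPT SRC="HTTPS://cdn.example.com/lib.js"></SCRIPT>'
      . '<img src="http://example.com/a.png">';

// Group 1: the tag up to the opening quote of src; group 2: the URL scheme.
// The host that follows is matched but not captured, so the callback drops it.
$pattern = '(<script[^>]*src=["\'])(https?://)[^/"\']+';

$rewritten = mb_ereg_replace_callback(
    $pattern,
    function (array $m): string {
        return $m[1] . $m[2] . 'm66.net';
    },
    $html,
    'i' // case-insensitive, so <SCRIPT SRC=...> matches too
);

echo $rewritten, PHP_EOL;
// <SCRIPT SRC="HTTPS://m66.net/lib.js"></SCRIPT><img src="http://example.com/a.png">
```

Because the pattern is anchored to `<script`, the URL in the img tag is left untouched.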