PHP offers several ways to process XML. Among them, xml_parse is a low-level function suited to developers who need fine-grained control over how an XML document is processed. This article describes how to use xml_parse to parse an XML document that contains a CDATA section.
In XML, <![CDATA[ ... ]]> marks a CDATA section: it tells the parser to treat everything inside it as literal character data rather than as markup. This is useful for content that contains special characters such as < and &, for example HTML or script code.
Example:
<note>
<to>Tom</to>
<message><![CDATA[Hello <b>Tom</b>, welcome to <a href="https://m66.net">our site</a>!]]></message>
</note>
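To make the effect of a CDATA section concrete, here is a minimal sketch that parses two small documents, one using CDATA and one using entity escapes, and compares the character data the parser reports for each. The helper name collectText and the sample strings are assumptions made for this illustration; they are not part of PHP's XML extension.

```php
<?php
// Hypothetical helper: returns all character data reported for a document.
function collectText(string $xml): string
{
    $text = '';
    $parser = xml_parser_create();
    // Append each piece of character data as the parser reports it.
    xml_set_character_data_handler($parser, function ($parser, $chunk) use (&$text) {
        $text .= $chunk;
    });
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}

// Same content, written once with CDATA and once with entity escapes.
$withCdata   = collectText('<m><![CDATA[1 < 2 && <b>bold</b>]]></m>');
$withEscapes = collectText('<m>1 &lt; 2 &amp;&amp; &lt;b&gt;bold&lt;/b&gt;</m>');

var_dump($withCdata === $withEscapes); // bool(true)
```

Both forms reach the handler as the same text; CDATA simply spares the author from escaping every < and &.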
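xml_parse is PHP's event-driven (SAX-style) parsing function: it reads the document sequentially and invokes callbacks as it encounters start tags, end tags, and character data. To parse a CDATA section with it, create a parser with xml_parser_create() and register custom handler functions.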
$parser = xml_parser_create();
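You need to register three handler functions: one for start tags, one for end tags, and one for character data. The parser may deliver the text of a single element in several pieces, so the character data handler appends to the stored value instead of overwriting it.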
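// Shared state: the element currently open and the text collected per element.
$data = ['current' => null, 'values' => []];

function startElement($parser, $name, $attrs) {
    global $data;
    $data['current'] = $name;
}

function endElement($parser, $name) {
    global $data;
    $data['current'] = null;
}

function characterData($parser, $value) {
    global $data;
    $name = $data['current'];
    // Ignore text between sibling elements (no element is current).
    if ($name === null) {
        return;
    }
    if (!isset($data['values'][$name])) {
        // Skip whitespace-only text that precedes an element's content.
        if (trim($value) === '') {
            return;
        }
        $data['values'][$name] = '';
    }
    // Text can arrive in several pieces, so append instead of assigning.
    $data['values'][$name] .= $value;
}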
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");
$xml = <<<XML
<note>
<to>Tom</to>
<message><![CDATA[Hello <b>Tom</b>, welcome to <a href="https://m66.net">our site</a>!]]></message>
</note>
XML;
if (!xml_parse($parser, $xml, true)) {
    die(sprintf(
        "XML Error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    ));
}
xml_parser_free($parser); // Has no effect since PHP 8.0; only needed on older versions.
print_r($data['values']);
The output will be:
Array
(
    [TO] => Tom
    [MESSAGE] => Hello <b>Tom</b>, welcome to <a href="https://m66.net">our site</a>!
)
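Note: Element names are reported in uppercase because case folding is enabled by default. It can be turned off with the XML_OPTION_CASE_FOLDING parser option.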
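As a sketch of how the uppercase conversion can be switched off, the following snippet disables case folding before parsing and records the element names it sees. The $names variable and the sample document are assumptions for this illustration.

```php
<?php
$names = [];
$parser = xml_parser_create();

// Disable case folding so element names are reported exactly as written.
// The option has to be set before parsing starts.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$names) {
        $names[] = $name; // record each opening tag
    },
    function ($parser, $name) {
        // nothing to do on closing tags in this sketch
    }
);

xml_parse($parser, '<note><to>Tom</to></note>', true);
print_r($names); // element names keep their original case: note, to
```

With case folding disabled, any array keys derived from element names will match the casing used in the document, which may be preferable when the XML vocabulary is case-sensitive.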
xml_parse is an event-driven parsing method well suited to large or complex XML data. CDATA sections need no special treatment: their contents are delivered to the character data handler like any other text. Although it takes more code than DOM or SimpleXML, it does not build the whole document tree in memory, which keeps memory use low and gives you full control over how each event is handled.
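For large inputs, the document does not have to be passed to xml_parse in one piece. The sketch below feeds the sample document in small chunks and keeps its state in closures instead of global variables. The helper name parseInChunks, the 16-byte chunk size, and the use of a stack of open elements are assumptions chosen for this illustration, not the only way to structure the handlers.

```php
<?php
// Hypothetical helper: feeds chunks to the parser and returns the text per element name.
function parseInChunks(iterable $chunks): array
{
    $values = [];
    $stack = []; // names of the elements that are currently open

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack) {
            $stack[] = $name;
        },
        function ($parser, $name) use (&$stack) {
            array_pop($stack);
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $text) use (&$values, &$stack) {
            $current = end($stack);
            if ($current === false) {
                return; // no element is open
            }
            // Text may arrive in several pieces, so append.
            $values[$current] = ($values[$current] ?? '') . $text;
        }
    );

    foreach ($chunks as $chunk) {
        // Third argument false: more data will follow.
        if (!xml_parse($parser, $chunk, false)) {
            throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
        }
    }
    // An empty final call tells the parser the document is complete.
    if (!xml_parse($parser, '', true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }

    // Trim values and drop elements that only held whitespace.
    return array_filter(array_map('trim', $values), fn ($v) => $v !== '');
}

$xml = <<<XML
<note>
<to>Tom</to>
<message><![CDATA[Hello <b>Tom</b>, welcome to <a href="https://m66.net">our site</a>!]]></message>
</note>
XML;

// Split the document into 16-byte chunks to simulate reading from a stream.
$result = parseInChunks(str_split($xml, 16));
print_r($result);
```

In a real application the chunks would more likely come from fread() on a file or network stream, so that only a small part of the document is held in memory at any time.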
If your project needs efficient parsing or custom handling of individual XML events, xml_parse is a good choice.