Processing XML in PHP is a common requirement, especially when integrating with third-party interfaces or reading configuration files. xml_parse() is a low-level function that PHP provides for parsing XML data. With deeply nested or irregularly formatted XML, however, careless handling can cause parse errors, so that data is read incorrectly or the script aborts.
This article explains how to avoid common parsing errors when using xml_parse() on nested XML and provides practical code examples.
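PHP's low-level XML parser is event-driven: you register callbacks, and the parser invokes them as it encounters elements and text. xml_parse() is used together with the following functions:
xml_parser_create()
xml_set_element_handler()
xml_set_character_data_handler()
xml_parse()
xml_parser_free()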
Here is a basic XML parsing example:
<?php
$xml = <<<XML
<books>
<book>
<title>PHP programming</title>
<author>Zhang San</author>
</book>
<book>
<title>XML Actual combat</title>
<author>Li Si</author>
</book>
</books>
XML;
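// The handlers below are declared at file scope, so PHP makes them available before this point.
$parser = xml_parser_create();
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

// Called for each opening tag. Names arrive upper-cased because case folding is on by default.
function startElement($parser, $name, $attrs) {
    echo "Start element: $name\n";
}

// Called for each closing tag.
function endElement($parser, $name) {
    echo "End element: $name\n";
}

// Called for text between tags, possibly several times per element.
function characterData($parser, $data) {
    // Skip whitespace-only chunks; compare with '' so that a value such as "0" is kept.
    if (trim($data) !== '') {
        echo "Character data: $data\n";
    }
}

if (!xml_parse($parser, $xml, true)) {
    die("XML error: " . xml_error_string(xml_get_error_code($parser)) .
        " at line " . xml_get_current_line_number($parser));
}
xml_parser_free($parser); // Has no effect since PHP 8.0; needed on older versions.
?>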
The characterData callback can fire several times for a single element: the parser may deliver one text node in multiple chunks, and the line breaks and spaces between nested elements are reported as character data too. Collect the text in a buffer variable and process it in endElement(), once the element is complete.
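$depth = 0;          // current nesting level
$currentTag = '';    // name of the most recently opened element
$contentBuffer = []; // accumulated text, indexed by nesting level

function startElement($parser, $name, $attrs) {
    global $depth, $currentTag;
    $depth++;
    $currentTag = $name;
}

function endElement($parser, $name) {
    global $depth, $contentBuffer;
    if (isset($contentBuffer[$depth])) {
        $value = trim($contentBuffer[$depth]);
        // Container elements only collect the whitespace between their children; skip those.
        if ($value !== '') {
            echo "Value of element $name: $value\n";
        }
        unset($contentBuffer[$depth]);
    }
    $depth--;
}

function characterData($parser, $data) {
    global $depth, $contentBuffer;
    // Append rather than overwrite: one text node may arrive in several chunks.
    if (!isset($contentBuffer[$depth])) {
        $contentBuffer[$depth] = '';
    }
    $contentBuffer[$depth] .= $data;
}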
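Some XML contains characters that are not allowed, such as control characters or unescaped & symbols. In that case, preprocess the string before parsing:
$xml = preg_replace('/[^\x{09}\x{0A}\x{0D}\x{20}-\x{D7FF}\x{E000}-\x{FFFD}\x{10000}-\x{10FFFF}]/u', '', $xml); // Remove characters that XML 1.0 does not allow, keeping valid non-ASCII text
$xml = preg_replace('/&(?!(?:[A-Za-z_][\w.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/', '&amp;', $xml); // Escape & only when it does not already start an entity
Take care not to escape twice, so that legitimate XML entities such as &amp; or &#169; are left intact. Note that with the u modifier preg_replace() returns null when the input is not valid UTF-8, and that a literal & inside a CDATA section is legal and would be altered by this replacement.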
Make sure the XML data is UTF-8 encoded, and specify the encoding when creating the parser:
$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_TARGET_ENCODING, 'UTF-8');
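One way to make sure the input really is UTF-8 before it reaches the parser is sketched below. It assumes the mbstring extension is available; the helper name toUtf8() and the ISO-8859-1 fallback are assumptions made for illustration, so substitute the encoding your data source actually uses.

```php
<?php
// Hypothetical helper: returns the input unchanged when it is valid UTF-8,
// otherwise converts it from an assumed fallback encoding.
function toUtf8(string $xml, string $fallbackEncoding = 'ISO-8859-1'): string
{
    if (mb_check_encoding($xml, 'UTF-8')) {
        return $xml;
    }
    // Assumption: anything that is not valid UTF-8 is in $fallbackEncoding.
    return mb_convert_encoding($xml, 'UTF-8', $fallbackEncoding);
}

$latin1 = "<name>caf\xE9</name>"; // "café" encoded as ISO-8859-1
$utf8 = toUtf8($latin1);

$parser = xml_parser_create('UTF-8');
$parsedOk = xml_parse($parser, $utf8, true) === 1;
```

If the document carries its own encoding declaration, that declaration should agree with the converted bytes; this sketch does not rewrite it.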
When the nested XML structure is complex, use a stack to record the hierarchy of open tags; this makes later structured processing easier.
$elementStack = []; // names of the elements that are currently open

function startElement($parser, $name, $attrs) {
    global $elementStack;
    array_push($elementStack, $name);
}

function endElement($parser, $name) {
    global $elementStack;
    array_pop($elementStack);
}
With the stack, you can read the current path or nesting level at any moment during parsing, and even build a multi-dimensional array that represents the XML structure.
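The stack idea can be extended to build a nested array, roughly as in the sketch below. The function name xmlToTree() and the node layout (name, path, attrs, text, children) are assumptions chosen for this illustration, and closures are used instead of globals so the example is self-contained.

```php
<?php
// Hypothetical helper: converts an XML string into a nested array.
function xmlToTree(string $xml): array
{
    $stack = [];  // nodes for the elements that are currently open
    $root = null; // set when the outermost element closes

    $parser = xml_parser_create('UTF-8');
    // Keep element and attribute names as written instead of upper-casing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$stack) {
            // The current path is the names on the stack plus the new element.
            $path = implode('/', array_merge(array_column($stack, 'name'), [$name]));
            $stack[] = ['name' => $name, 'path' => $path, 'attrs' => $attrs,
                        'text' => '', 'children' => []];
        },
        function ($parser, $name) use (&$stack, &$root) {
            $node = array_pop($stack);
            $node['text'] = trim($node['text']);
            if ($stack === []) {
                $root = $node; // the outermost element has closed
            } else {
                $stack[count($stack) - 1]['children'][] = $node; // attach to parent
            }
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$stack) {
            if ($stack !== []) {
                $stack[count($stack) - 1]['text'] .= $data; // may arrive in chunks
            }
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(sprintf(
            'XML error "%s" at line %d',
            xml_error_string(xml_get_error_code($parser)),
            xml_get_current_line_number($parser)
        ));
    }
    return $root;
}

$tree = xmlToTree('<books><book id="1"><title>PHP programming</title></book></books>');
echo $tree['children'][0]['children'][0]['text'], "\n"; // prints "PHP programming"
```

Each node is attached to its parent only when its end tag arrives, so the tree is complete as soon as the root element closes.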
For XML from external sources, always add error handling so that a parse failure does not crash the page.
If xml_parse() fails, one fallback is to call libxml_use_internal_errors() and then load the data with simplexml_load_string(), which reports problems as a list of errors instead of emitting warnings.
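A minimal sketch of that fallback follows, assuming the SimpleXML and libxml extensions are enabled. The helper name loadXmlOrNull() and the by-reference $errors parameter are assumptions made for this example.

```php
<?php
// Hypothetical helper: returns a SimpleXMLElement, or null when the input is not
// well-formed. Error messages are collected into $errors.
function loadXmlOrNull(string $xml, array &$errors = []): ?SimpleXMLElement
{
    $errors = [];
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        foreach (libxml_get_errors() as $error) {
            $errors[] = sprintf('line %d: %s', $error->line, trim($error->message));
        }
    }

    // Clear the error buffer and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc === false ? null : $doc;
}

$good = loadXmlOrNull('<books><book><title>PHP programming</title></book></books>');
$bad = loadXmlOrNull('<books><book></books>', $problems);
```

Restoring the previous libxml setting keeps the helper from changing error behavior for other code in the same request.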
If you read XML from an interface such as https://api.m66.net/data.xml, first fetch the content with file_get_contents() and check the result, then pass it to the parser. That way a network failure is detected before any parsing logic runs.
$xmlContent = file_get_contents("https://api.m66.net/data.xml");
if ($xmlContent === false) {
    die("Unable to load XML data");
}
When using xml_parse() to process nested XML, parsing errors most often come from fragmented character data, illegal characters, inconsistent encoding, or unclear parsing logic. Careful coding, preprocessing, and a structured design make XML parsing considerably more robust.
Understanding the low-level parsing logic also helps you process XML efficiently when higher-level libraries such as SimpleXML or DOMDocument are not available.
To simplify the parsing process further, consider encapsulating the parser and the data collection logic in a class, which makes the code easier to reuse and maintain.
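Such a class might look like the sketch below. NestedXmlReader and its method names are hypothetical, and the result format (element path mapped to a list of text values) is just one possible design.

```php
<?php
// Hypothetical class: wraps the parser, the element stack and the text buffers.
final class NestedXmlReader
{
    private array $stack = [];   // names of the currently open elements
    private array $buffers = []; // one text buffer per open element
    private array $values = [];  // element path => list of text values

    public function parse(string $xml): array
    {
        $this->stack = $this->buffers = $this->values = [];

        $parser = xml_parser_create('UTF-8');
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        xml_set_element_handler($parser, [$this, 'onStart'], [$this, 'onEnd']);
        xml_set_character_data_handler($parser, [$this, 'onText']);

        if (!xml_parse($parser, $xml, true)) {
            throw new RuntimeException(sprintf(
                'XML error "%s" at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
        return $this->values;
    }

    // The handlers are public so that the XML extension can call them.
    public function onStart($parser, $name, $attrs): void
    {
        $this->stack[] = $name;
        $this->buffers[] = '';
    }

    public function onEnd($parser, $name): void
    {
        $path = implode('/', $this->stack);
        $text = trim(array_pop($this->buffers));
        array_pop($this->stack);
        if ($text !== '') {
            $this->values[$path][] = $text; // record only non-empty text
        }
    }

    public function onText($parser, $data): void
    {
        if ($this->buffers !== []) {
            $this->buffers[count($this->buffers) - 1] .= $data;
        }
    }
}

$reader = new NestedXmlReader();
$values = $reader->parse(
    '<books><book><title>PHP programming</title><author>Zhang San</author></book></books>'
);
```

Keeping the state in private properties removes the need for global variables, and throwing an exception lets the caller decide how to handle malformed input.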