xml_parse cannot handle nested tags correctly: Common errors and solutions

M66 2025-04-24

xml_parse() is a low-level function in PHP's XML parser extension, which is based on Expat. Developers often find that it appears to mishandle nested tags. This article explains why that happens and offers practical solutions.

1. How xml_parse works

xml_parse() processes XML documents with an event-driven model. Whenever the parser reads a start tag, an end tag, or character data, it calls the corresponding callback function.

The sample initialization code is as follows:

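$parser = xml_parser_create();

// "startElement", "endElement" and "characterData" are user-defined callbacks.
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

$data = '<book><title>PHP Guide</title><author>John</author></book>';
xml_parse($parser, $data, true);
// Has no effect since PHP 8.0; only needed on older versions.
xml_parser_free($parser);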

In this example, the parser calls startElement for each opening tag, characterData for the text inside an element, and endElement for each closing tag, in document order.
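To make the callback order concrete, here is a minimal sketch that records every event in an array. The closures and the $events variable are illustrative names, not part of the original example.

```php
<?php
// Collect every parser event in order so the callback sequence is visible.
$events = [];

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    // Called for each opening tag.
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start $name";
    },
    // Called for each closing tag.
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";
    }
);
xml_set_character_data_handler(
    $parser,
    // Called for text content; long text may arrive in several pieces.
    function ($parser, $data) use (&$events) {
        $events[] = "text $data";
    }
);

$data = '<book><title>PHP Guide</title><author>John</author></book>';
xml_parse($parser, $data, true);

echo implode("\n", $events), "\n";
```

The list begins with "start BOOK" and ends with "end BOOK": the outer element opens first and closes last, and the events for TITLE and AUTHOR fall in between.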

2. Why can't xml_parse handle nested tags correctly?

The main reasons are as follows:

1. The logic of the callback function processing is incomplete

Many callback implementations do not keep track of where the parser currently is in the document. xml_parse() does not build an XML tree for you, so any nested structure has to be assembled manually inside the callbacks.

For example, the following code fails to handle nested nodes correctly:

function startElement($parser, $name, $attrs) {
    global $currentTag;
    $currentTag = $name;
}

function characterData($parser, $data) {
    global $currentTag;
    echo "$currentTag: $data\n";
}

With nested tags, $currentTag is overwritten every time a child element opens and is never restored when that child closes. Any text that follows the child inside its parent is therefore attributed to the wrong tag.
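The effect is easiest to see with mixed content. The following sketch reuses the single-variable approach on a small input chosen for illustration; the $seen array is an assumption added so that the result can be inspected.

```php
<?php
// Deliberately flawed: only the most recently opened tag is remembered.
$currentTag = null;
$seen = [];

$parser = xml_parser_create();
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$currentTag) {
        $currentTag = $name;
    },
    function ($parser, $name) {
        // Nothing is restored when an element closes.
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$currentTag, &$seen) {
        $seen[] = "$currentTag: $data";
    }
);

xml_parse($parser, '<p>Hello <b>bold</b> world</p>', true);

print_r($seen);
```

The text after the closing b tag is still reported under B, although it belongs to the p element.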

2. The stack structure is not used to save nested state

In order to parse nested XML, it is recommended to use a stack to maintain the current tag path:

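$tagStack = [];

// Push each opening tag onto the stack.
function startElement($parser, $name, $attrs) {
    global $tagStack;
    array_push($tagStack, $name);
}

// Pop the tag when it closes, so the stack always mirrors the open elements.
function endElement($parser, $name) {
    global $tagStack;
    array_pop($tagStack);
}

function characterData($parser, $data) {
    global $tagStack;
    // Skip whitespace-only text (indentation, line breaks) between tags.
    if (trim($data) === '') {
        return;
    }
    $path = implode(' > ', $tagStack);
    echo "[$path] $data\n";
}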

This code reports the full path of each piece of text. For example, given this input:

<article>
    <header><title>News Title</title></header>
    <body>Content section</body>
</article>

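Ignoring whitespace-only text between tags, the output will be as follows. The element names are uppercased because the parser's case folding option is enabled by default:

[ARTICLE > HEADER > TITLE] News Title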
[ARTICLE > BODY] Content section
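The same stack can also be used to build a nested structure rather than just printing paths. The sketch below is one possible approach, not a definitive implementation: parseXmlToTree() and the node layout (name, attrs, text, children) are hypothetical choices made for this example, and case folding is switched off so that element names keep their original case.

```php
<?php
// Hypothetical helper: turn an XML string into nested arrays using a stack.
function parseXmlToTree(string $xml): ?array
{
    // The bottom entry is a placeholder that collects the root element.
    $stack = [['name' => '#document', 'attrs' => [], 'text' => '', 'children' => []]];

    $parser = xml_parser_create();
    // Keep element names as written instead of uppercasing them.
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

    xml_set_element_handler(
        $parser,
        // Opening tag: push a new, empty node.
        function ($p, $name, $attrs) use (&$stack) {
            $stack[] = ['name' => $name, 'attrs' => $attrs, 'text' => '', 'children' => []];
        },
        // Closing tag: pop the finished node and attach it to its parent.
        function ($p, $name) use (&$stack) {
            $node = array_pop($stack);
            $node['text'] = trim($node['text']);
            $stack[count($stack) - 1]['children'][] = $node;
        }
    );
    xml_set_character_data_handler(
        $parser,
        // Text may arrive in several pieces, so append rather than assign.
        function ($p, $data) use (&$stack) {
            $stack[count($stack) - 1]['text'] .= $data;
        }
    );

    if (xml_parse($parser, $xml, true) !== 1) {
        return null; // Not well-formed.
    }
    return $stack[0]['children'][0] ?? null;
}

$tree = parseXmlToTree(
    '<article><header><title>News Title</title></header><body>Content section</body></article>'
);
echo $tree['children'][0]['children'][0]['text'], "\n";
```

Storing nodes by value and attaching each one to its parent when it closes avoids PHP references into nested arrays, at the cost of copying each node once.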

3. The data is erroneously truncated or incomplete

If the data passed to xml_parse() is incomplete, or the is_final parameter (the third argument) is never set to true to mark the end of the data, parsing will also fail or remain unfinished:

xml_parse($parser, $data, true); // The third argument must be true on the last (or only) chunk of data
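When a document arrives in pieces, only the last call should pass true. The following sketch assumes a hypothetical helper named parseInChunks() that splits a non-empty string into fixed-size chunks and reports parser errors with PHP's built-in xml_get_error_code(), xml_error_string() and xml_get_current_line_number().

```php
<?php
// Hypothetical helper: feed XML to the parser chunk by chunk and
// return the names of all elements that were opened.
function parseInChunks(string $xml, int $chunkSize = 16): array
{
    $names = [];

    $parser = xml_parser_create();
    xml_set_element_handler(
        $parser,
        function ($p, $name, $attrs) use (&$names) {
            $names[] = $name;
        },
        function ($p, $name) {
            // End tags are not needed for this example.
        }
    );

    $chunks = str_split($xml, $chunkSize);
    $last = count($chunks) - 1;

    foreach ($chunks as $i => $chunk) {
        // is_final is true only for the last chunk.
        if (xml_parse($parser, $chunk, $i === $last) !== 1) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }

    return $names;
}

print_r(parseInChunks('<book><title>PHP Guide</title><author>John</author></book>', 7));
```

Checking the return value of every xml_parse() call matters here: a truncated document is typically reported only on the final call, once the parser knows that no more data will follow.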

3. How to correctly parse nested XML?

Unless you specifically need event-driven parsing, the simplest approach is to use a higher-level XML parser, such as:

1. SimpleXML

$xml = simplexml_load_string('<book><title>PHP Guide</title></book>');
echo $xml->title; // Output: PHP Guide

2. DOMDocument

$doc = new DOMDocument();
$doc->loadXML('<site><url>https://m66.net</url></site>');
$nodes = $doc->getElementsByTagName('url');
echo $nodes->item(0)->nodeValue; // Output: https://m66.net

These parsers build the nested node structure for you, which makes the code clearer and easier to maintain.
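As an illustration of how little bookkeeping is needed once the tree already exists, the sketch below walks a SimpleXML document recursively and prints the same kind of path listing as the stack example. listPaths() is a hypothetical helper written for this article, not part of SimpleXML.

```php
<?php
// Hypothetical helper: return one "[path] text" line per leaf element.
function listPaths(SimpleXMLElement $node, string $prefix = ''): array
{
    $path = $prefix === '' ? $node->getName() : $prefix . ' > ' . $node->getName();

    $children = $node->children();
    if (count($children) === 0) {
        // Leaf element: report its text content.
        return ['[' . $path . '] ' . trim((string) $node)];
    }

    // Element with children: recurse into each child.
    $lines = [];
    foreach ($children as $child) {
        $lines = array_merge($lines, listPaths($child, $path));
    }
    return $lines;
}

$xml = simplexml_load_string(
    '<article><header><title>News Title</title></header><body>Content section</body></article>'
);
$lines = listPaths($xml);
echo implode("\n", $lines), "\n";
```

Unlike the xml_parse() output, the element names keep their original case here, because SimpleXML does not apply case folding.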

4. Summary and Suggestions

  • xml_parse() uses an event-driven model and does not build a tree automatically, so nested tags must be tracked manually;

  • Use a stack to track the current tag path;

  • If a project does not specifically require event-driven parsing, use SimpleXML or DOMDocument to handle nested XML;

  • Make sure the incoming data is complete and that is_final is true on the last call, to avoid truncation errors.

XML parsing is actually not complicated. The key is to choose the right tool and understand its underlying principles. I hope this article can help you better deal with nested XML issues in actual development.