
How to recursively parse nested XML documents with xml_parse()

M66 2025-05-12

When working with XML documents in PHP, xml_parse() is a low-level but powerful function. It uses an event-driven (SAX-style) parsing model: instead of returning a document tree, it calls the callback functions you register whenever it encounters a start tag, an end tag, or character data.
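To see the event-driven model in action, here is a minimal sketch that records each callback in the order xml_parse() fires it. The one-element document and the $events array are illustrative assumptions, not part of the original example.

```php
<?php
// Minimal sketch: log the order in which xml_parse() fires its callbacks.
// The tiny document and the $events array are illustrative assumptions.
$events = [];

$parser = xml_parser_create();

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$events) {
        $events[] = "start $name";   // called for every opening tag
    },
    function ($parser, $name) use (&$events) {
        $events[] = "end $name";     // called for every closing tag
    }
);
xml_set_character_data_handler($parser, function ($parser, $data) use (&$events) {
    $events[] = "text '$data'";      // called for text between tags
});

xml_parse($parser, '<book><title>PHP Basics</title></book>', true);

echo implode("\n", $events), "\n";
```

The trace shows BOOK and TITLE in uppercase because PHP's XML parser folds tag names to uppercase by default.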

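However, for XML documents with deeply nested structures, handling these events one by one quickly becomes hard to follow. Building a recursive, tree-shaped structure that mirrors the nesting of the elements makes the data much easier to process. This article shows how to combine xml_parse() with a stack of open elements to turn nested XML data into such a tree.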

1. Sample XML data

 <catalog>
    <book id="1">
        <title>PHP Basics</title>
        <author>John Doe</author>
    </book>
    <book id="2">
        <title>Advanced PHP</title>
        <author>Jane Smith</author>
    </book>
</catalog>

2. Create a parser and set up the handler functions

First, create a parser and register three callback functions:

  • startElement(): called when the parser encounters a start tag;

  • endElement(): called when the parser encounters an end tag;

  • characterData(): called when the parser encounters text content between tags.
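Registering these three handlers could look like the following minimal sketch. The global $depth counter and $outline array are hypothetical additions for illustration: each start tag increases the depth and each end tag decreases it, which produces an indented outline of the nesting.

```php
<?php
// Minimal sketch of step 2: create a parser and register the three handlers.
// $depth and $outline are hypothetical globals used only for this illustration.
$depth = 0;
$outline = [];

function startElement($parser, $name, $attrs) {
    global $depth, $outline;
    // Indent by the current nesting depth, then go one level deeper
    $outline[] = str_repeat('  ', $depth) . $name;
    $depth++;
}

function endElement($parser, $name) {
    global $depth;
    $depth--; // back to the parent's level
}

function characterData($parser, $data) {
    global $depth, $outline;
    // Skip whitespace-only chunks such as indentation between tags
    if (trim($data) !== '') {
        $outline[] = str_repeat('  ', $depth) . '"' . $data . '"';
    }
}

$parser = xml_parser_create();
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

xml_parse($parser, '<catalog><book id="1"><title>PHP Basics</title></book></catalog>', true);

echo implode("\n", $outline), "\n";
```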

3. The key to a recursive structure: building a tree-shaped array

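The parser only reports a flat sequence of events, so the tree has to be assembled from them. The approach below keeps a stack of open elements: each start tag pushes the current element onto the stack and makes the new element current; text is appended to the current element; each end tag pops the parent off the stack and attaches the finished element to it as a child. When the root element closes, $currentData holds the whole tree. Here is a complete code example: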

 <?php

$xml = <<<XML
<catalog>
    <book id="1">
        <title>PHP Basics</title>
        <author>John Doe</author>
    </book>
    <book id="2">
        <title>Advanced PHP</title>
        <author>Jane Smith</author>
    </book>
</catalog>
XML;

$parser = xml_parser_create();
$dataStack = [];      // stack of open ancestor elements
$currentData = null;  // element currently being built

// Start tag: create a node for the new element and push the current (parent) element onto the stack
function startElement($parser, $name, $attrs) {
    global $dataStack, $currentData;

    $element = [
        'tag' => $name,
        'attributes' => $attrs,
        'children' => [],
        'value' => ''
    ];

    if ($currentData !== null) {
        array_push($dataStack, $currentData);
    }

    $currentData = $element;
}

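// End tag: trim the collected text, then attach the finished element to its parent
function endElement($parser, $name) {
    global $dataStack, $currentData;

    // Trim once here: character data can arrive in several chunks
    // (e.g. around entities), so trimming each chunk could drop inner spaces
    $currentData['value'] = trim($currentData['value']);

    if (!empty($dataStack)) {
        $parent = array_pop($dataStack);
        $parent['children'][] = $currentData;
        $currentData = $parent;
    }
    // When the stack is empty, $currentData is the finished root element
}

// Text content: append the raw chunk to the element that is currently open
function characterData($parser, $data) {
    global $currentData;

    if ($currentData !== null) {
        $currentData['value'] .= $data;
    }
}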

xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

if (!xml_parse($parser, $xml, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}

xml_parser_free($parser);

// Print the final structure
print_r($currentData);
?>
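Once the tree is built, processing it recursively is natural, because every node has the same shape. The following sketch uses a hypothetical helper, printTree(), that calls itself for each child. The $tree literal is a hand-written copy of one branch of the parsed structure, included only so the example runs on its own.

```php
<?php
// Hypothetical helper: render the parsed tree recursively as an indented outline.
function printTree(array $node, int $level = 0): string {
    $line = str_repeat('  ', $level) . $node['tag'];
    foreach ($node['attributes'] as $key => $val) {
        $line .= " $key=\"$val\"";
    }
    if ($node['value'] !== '') {
        $line .= ': ' . $node['value'];
    }
    $out = $line . "\n";
    // Recurse into each child one level deeper
    foreach ($node['children'] as $child) {
        $out .= printTree($child, $level + 1);
    }
    return $out;
}

// Hand-written stand-in for part of $currentData (assumption, for a self-contained run)
$tree = [
    'tag' => 'CATALOG', 'attributes' => [], 'value' => '',
    'children' => [[
        'tag' => 'BOOK', 'attributes' => ['ID' => '1'], 'value' => '',
        'children' => [
            ['tag' => 'TITLE', 'attributes' => [], 'children' => [], 'value' => 'PHP Basics'],
            ['tag' => 'AUTHOR', 'attributes' => [], 'children' => [], 'value' => 'John Doe'],
        ],
    ]],
];

echo printTree($tree);
```

Calling printTree($currentData) after the parser example would print the whole catalog the same way.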

4. Output structure

After parsing, you will get a nested array similar to the following (abbreviated: some empty attributes, children, and value entries are omitted):

 Array
(
    [tag] => CATALOG
    [attributes] => Array()
    [children] => Array
        (
            [0] => Array
                (
                    [tag] => BOOK
                    [attributes] => Array ( [ID] => 1 )
                    [children] => Array
                        (
                            [0] => Array ( [tag] => TITLE [value] => PHP Basics )
                            [1] => Array ( [tag] => AUTHOR [value] => John Doe )
                        )
                )
            [1] => Array
                (
                    [tag] => BOOK
                    [attributes] => Array ( [ID] => 2 )
                    [children] => Array
                        (
                            [0] => Array ( [tag] => TITLE [value] => Advanced PHP )
                            [1] => Array ( [tag] => AUTHOR [value] => Jane Smith )
                        )
                )
        )
)
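The tag and attribute names appear in uppercase because xml_parser_create() enables case folding by default. If you prefer the names exactly as written in the document, you can turn the XML_OPTION_CASE_FOLDING option off before parsing. This sketch only collects start tags and attribute names (the $names array is an illustrative assumption) to show the effect:

```php
<?php
// Sketch: disable case folding so element and attribute names keep their original case.
$names = [];

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$names) {
        // Record the tag name, then each attribute name prefixed with "@"
        $names[] = $name;
        foreach (array_keys($attrs) as $attr) {
            $names[] = "@$attr";
        }
    },
    function ($parser, $name) {} // end tags are not needed here
);

xml_parse($parser, '<catalog><book id="1"><title>PHP Basics</title></book></catalog>', true);

echo implode(', ', $names), "\n"; // catalog, book, @id, title
```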

5. Convert the parsed structure to JSON or other formats

Because the result is a plain PHP array, you can convert it to JSON with json_encode() for front-end code or API responses:

 echo json_encode($currentData, JSON_PRETTY_PRINT);
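The raw tree is verbose once serialized, since every node carries tag, attributes, children, and value keys. If the front end only needs the data, one option is to collapse the tree recursively before encoding. The sketch below uses a hypothetical simplify() helper and assumes one particular layout: leaf elements become their text, attributes go under "@attributes", text next to children goes under "#text", and each child tag maps to a list of its occurrences. Other layouts are equally valid.

```php
<?php
// Hypothetical helper: collapse the parser's tree into a compact array (layout is an assumption).
function simplify(array $node) {
    // A leaf without attributes is represented by its text alone
    if (empty($node['children']) && empty($node['attributes'])) {
        return $node['value'];
    }
    $result = [];
    if (!empty($node['attributes'])) {
        $result['@attributes'] = $node['attributes'];
    }
    if ($node['value'] !== '') {
        $result['#text'] = $node['value'];
    }
    foreach ($node['children'] as $child) {
        // Recurse, grouping children by tag name
        $result[$child['tag']][] = simplify($child);
    }
    return $result;
}

// Hand-written stand-in for part of $currentData (assumption, for a self-contained run)
$tree = [
    'tag' => 'CATALOG', 'attributes' => [], 'value' => '',
    'children' => [
        ['tag' => 'BOOK', 'attributes' => ['ID' => '1'], 'value' => '', 'children' => [
            ['tag' => 'TITLE', 'attributes' => [], 'children' => [], 'value' => 'PHP Basics'],
        ]],
    ],
];

echo json_encode([$tree['tag'] => simplify($tree)]), "\n";
// {"CATALOG":{"BOOK":[{"@attributes":{"ID":"1"},"TITLE":["PHP Basics"]}]}}
```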

6. Tips

  • If the XML document is hosted remotely, for example at https://m66.net/data/books.xml, you can fetch it with file_get_contents():