PHP's built-in XML Parser extension (an event-based API modeled on the Expat library) is a useful tool for complex XML data. With xml_parser_create() and its handler functions, you process a document as a stream of events (start tag, text, end tag) rather than loading it into a tree. For documents with deep nesting and many elements, custom element handlers keep memory use low, because the full tree is never built. They also keep the logic for each element in one place.
This article shows how to use xml_set_element_handler() to register custom element handlers. It then walks through sample code that parses a nested XML document.
When parsing an XML stream with xml_parse(), you register two callbacks on the parser via xml_set_element_handler():
Start handler: called for each opening tag
End handler: called for each closing tag
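Their signatures are typically as follows:
function startElement($parser, $name, $attrs)
function endElement($parser, $name)
Here $name is the element name and $attrs is an associative array that maps the element's attribute names to their values. By default the parser folds element and attribute names to uppercase (XML_OPTION_CASE_FOLDING). Disable that option if you want to compare against the names as written in the document.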
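As a quick check of the calling convention, the sketch below registers closures instead of named functions (any PHP callable is accepted) on a tiny inline document and records each event in order. It assumes PHP 8; the $events array exists only for illustration.

```php
<?php
// Sketch: record the order in which the start/end handlers fire.
// The $events array and the one-line document are illustrative only.
$events = [];

$parser = xml_parser_create("UTF-8");
// Without this option the names would arrive as "BOOK", "TITLE", "ID".
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // Start handler: parser, element name, attributes (name => value)
    function ($parser, string $name, array $attrs) use (&$events) {
        $events[] = ["start", $name, $attrs];
    },
    // End handler: parser, element name
    function ($parser, string $name) use (&$events) {
        $events[] = ["end", $name];
    }
);

xml_parse($parser, '<book id="001"><title>PHP</title></book>', true);
print_r($events);
```

Note that the start handler for an element without attributes still receives an array, just an empty one.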
Suppose we get XML data in the following format from an API:
<catalog>
  <book id="001">
    <title>PHP Development practice</title>
    <author>Zhang San</author>
    <price currency="CNY">89.00</price>
  </book>
  <book id="002">
    <title>In-depth understanding XML</title>
    <author>Li Si</author>
    <price currency="CNY">75.50</price>
  </book>
</catalog>
We will write a parser that extracts the title, author, and price information for each book and outputs it.
<?php
$xmlData = file_get_contents('https://m66.net/api/books.xml');
if ($xmlData === false) {
    die("Failed to fetch the XML document\n");
}
// Used to store parsed results
$books = [];
$currentBook = [];
$currentTag = "";
// Create the XML parser
$parser = xml_parser_create("UTF-8");
// Set the handlers for start and end tags
xml_set_element_handler($parser, "startElement", "endElement");
// Set the character data handler
xml_set_character_data_handler($parser, "characterData");
// Set parser options
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false); // Preserve tag-name case (the default folds names to uppercase)
// Define processing functions
function startElement($parser, $name, $attrs) {
    global $currentBook, $currentTag;
    $currentTag = $name;
    if ($name == "book") {
        $currentBook = [
            "id" => $attrs['id'] ?? null,
            "title" => "",
            "author" => "",
            "price" => "",
            "currency" => ""
        ];
    }
    if ($name == "price" && isset($attrs['currency'])) {
        $currentBook['currency'] = $attrs['currency'];
    }
}
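function endElement($parser, $name) {
    global $books, $currentBook, $currentTag;
    if (in_array($name, ["title", "author", "price"], true)) {
        // Trim once the element's full text has been collected
        $currentBook[$name] = trim($currentBook[$name]);
    }
    if ($name == "book") {
        $books[] = $currentBook;
        $currentBook = [];
    }
    $currentTag = "";
}
function characterData($parser, $data) {
    global $currentBook, $currentTag;
    // Text can arrive in several chunks: append it raw and trim in endElement(),
    // because trimming each chunk would drop spaces at chunk boundaries.
    // Whitespace between tags arrives while $currentTag is "" or "book" and is skipped.
    if (in_array($currentTag, ["title", "author", "price"], true)) {
        $currentBook[$currentTag] .= $data;
    }
}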
// Execute parsing
if (!xml_parse($parser, $xmlData, true)) {
    die(sprintf("XML error: %s at line %d\n",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}
xml_parser_free($parser);
// Output the parsed results
foreach ($books as $book) {
    echo "Title: {$book['title']}\n";
    echo "Author: {$book['author']}\n";
    echo "Price: {$book['price']} {$book['currency']}\n";
    echo "------------------------\n";
}
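The globals above are fine for a short script. The same handlers can also live in an object, because xml_set_element_handler() and xml_set_character_data_handler() accept callable arrays such as [$this, "start"]. The sketch below is one possible arrangement, not the article's method. The class name BookCatalogParser and its method names are hypothetical, and the sample XML is inlined rather than fetched so the example runs offline.

```php
<?php
// Sketch: the same extraction with state held in an object instead of globals.
// BookCatalogParser and its method names are hypothetical.
final class BookCatalogParser
{
    private array $books = [];
    private array $current = [];
    private string $tag = "";

    public function parse(string $xml): array
    {
        $parser = xml_parser_create("UTF-8");
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
        // Callable arrays bind the handlers to this instance
        xml_set_element_handler($parser, [$this, "start"], [$this, "end"]);
        xml_set_character_data_handler($parser, [$this, "text"]);
        if (!xml_parse($parser, $xml, true)) {
            throw new RuntimeException(sprintf(
                "XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
        return $this->books;
    }

    public function start($parser, string $name, array $attrs): void
    {
        $this->tag = $name;
        if ($name === "book") {
            $this->current = ["id" => $attrs["id"] ?? null, "title" => "",
                "author" => "", "price" => "", "currency" => ""];
        } elseif ($name === "price") {
            $this->current["currency"] = $attrs["currency"] ?? "";
        }
    }

    public function end($parser, string $name): void
    {
        if (in_array($name, ["title", "author", "price"], true)) {
            // Trim only after the whole text node has been collected
            $this->current[$name] = trim($this->current[$name]);
        } elseif ($name === "book") {
            $this->books[] = $this->current;
        }
        $this->tag = "";
    }

    public function text($parser, string $data): void
    {
        // Collect raw text only inside the fields we care about
        if (in_array($this->tag, ["title", "author", "price"], true)) {
            $this->current[$this->tag] .= $data;
        }
    }
}

$xml = <<<XML
<catalog>
  <book id="001">
    <title>PHP Development practice</title>
    <author>Zhang San</author>
    <price currency="CNY">89.00</price>
  </book>
  <book id="002">
    <title>In-depth understanding XML</title>
    <author>Li Si</author>
    <price currency="CNY">75.50</price>
  </book>
</catalog>
XML;

$books = (new BookCatalogParser())->parse($xml);
print_r($books);
```

Throwing an exception instead of calling die() lets the caller decide how to report a malformed document.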
Track context with state variables
State variables such as $currentTag and $currentBook tell each handler which element it is currently inside. In deeply nested documents, this context is what lets you route character data to the right field.
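A single $currentTag cannot tell apart elements that share a name at different depths. One option is to keep a stack of open element names and match on the full path; this technique is not part of the original example. In the sketch below, the sample document, the $stack and $texts variables, and the slash-separated path format are all hypothetical.

```php
<?php
// Sketch: a stack of open element names gives every text node its full path,
// so <name> under <author> and <name> under <translator> stay distinct.
// The sample XML and the "a/b/c" path format are hypothetical.
$xml = <<<XML
<book>
  <author><name>Zhang San</name></author>
  <translator><name>Li Si</name></translator>
</book>
XML;

$stack = [];  // names of the currently open elements, outermost first
$texts = [];  // raw character data keyed by path, e.g. "book/author/name"

$parser = xml_parser_create("UTF-8");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$stack) {
        $stack[] = $name;   // entering an element: push its name
    },
    function ($parser, string $name) use (&$stack) {
        array_pop($stack);  // leaving an element: pop it
    }
);
xml_set_character_data_handler($parser, function ($parser, string $data) use (&$stack, &$texts) {
    $path = implode("/", $stack);
    $texts[$path] = ($texts[$path] ?? "") . $data;  // text may arrive in chunks
});
xml_parse($parser, $xml, true);

// Trim once at the end and drop paths that only held indentation/newlines
$texts = array_filter(array_map("trim", $texts), fn (string $t) => $t !== "");
print_r($texts);
```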
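Filter whitespace
The character-data handler also receives the newlines and indentation between tags. Ignore text that arrives outside the elements you are collecting, and trim() collected text before deciding whether it is empty. Test with a strict comparison such as $data === "" rather than empty(), which also treats the string "0" as empty.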
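Accumulate text across chunks
The content of a single element may be delivered over several character-data calls, especially for long text or text containing entity references such as &amp;. Append with .= rather than assigning, or the value will be truncated.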
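The sketch below illustrates both points on a hypothetical two-element document. The note text contains an entity reference, which typically makes the handler fire more than once for that element (the exact split depends on the underlying XML library, so only the total is reliable). The quantity is the string "0", which an empty() check would wrongly discard.

```php
<?php
// Sketch: collect text per element with .=, then trim once the element is done.
// The <note>/<qty> document is a hypothetical sample.
$xml = "<root><note>\n  Tom &amp; Jerry\n</note><qty>0</qty></root>";

$currentTag = "";
$text = ["note" => "", "qty" => ""];  // only these elements are collected
$noteChunks = 0;                      // number of calls that delivered <note> text

$parser = xml_parser_create("UTF-8");
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$currentTag) {
        $currentTag = $name;
    },
    function ($parser, string $name) use (&$currentTag) {
        $currentTag = "";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$currentTag, &$text, &$noteChunks) {
        if (!array_key_exists($currentTag, $text)) {
            return;                   // text outside <note>/<qty> is ignored
        }
        $text[$currentTag] .= $data;  // append: one element may span several calls
        if ($currentTag === "note") {
            $noteChunks++;
        }
    }
);
xml_parse($parser, $xml, true);

$note = trim($text["note"]);  // "Tom & Jerry"
$qty = trim($text["qty"]);    // "0"
var_dump($noteChunks, $note, $qty, empty($qty));
```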
Handle namespaces
If the XML uses namespaces, create the parser with xml_parser_create_ns() so that element names are reported together with their namespace URI. Register xml_set_start_namespace_decl_handler() as well if you also need the prefix-to-URI declarations.
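A minimal sketch of a namespace-aware parser follows. It assumes "|" as the separator passed to xml_parser_create_ns(), so a prefixed element is reported as "URI|local-name". The Dublin Core namespace and the sample document are only illustrative.

```php
<?php
// Sketch: with xml_parser_create_ns(), element names arrive as "URI|local-name"
// ("|" is the separator chosen here). The dc: sample document is illustrative.
$xml = '<catalog xmlns:dc="http://purl.org/dc/elements/1.1/">'
     . '<book><dc:title>PHP</dc:title></book></catalog>';

$elements = [];    // element names as reported to the start handler
$namespaces = [];  // prefix => URI, from namespace declarations

$parser = xml_parser_create_ns("UTF-8", "|");
// Case folding would also upper-case the URI part, so turn it off
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$elements) {
        $elements[] = $name;
    },
    function ($parser, string $name) {
        // nothing to do on close in this sketch
    }
);
xml_set_start_namespace_decl_handler($parser, function ($parser, $prefix, $uri) use (&$namespaces) {
    $namespaces[$prefix] = $uri;  // called when xmlns:dc="..." is declared
});
xml_parse($parser, $xml, true);

print_r($elements);
print_r($namespaces);
```

Elements without a namespace (catalog, book) keep their plain names, so matching code can compare the full "URI|local-name" string for namespaced elements.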