Web development often involves processing XML, such as responses from third-party APIs or configuration files. In PHP, xml_parse() and xml_set_element_handler() provide an event-driven way to process XML that is particularly well suited to large files and streamed input.
This article uses a practical example to show how to use these two functions to parse XML content and extract the data we need.
PHP offers several ways to process XML, including SimpleXML, DOMDocument, and XMLReader. xml_parse() belongs to the event-based family, usually called SAX-style (Simple API for XML) parsing: the parser reports each start tag, end tag, and run of text to a callback as it encounters it. This suits large XML documents because the input can be fed to the parser piece by piece instead of being loaded into memory as a whole.
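xml_parser_create(): Creates an XML parser.
xml_set_element_handler(): Sets the callback functions that handle start and end tags.
xml_set_character_data_handler(): Sets the callback function that receives the text between tags.
xml_parse(): Parses XML data, either all at once or chunk by chunk.
xml_parser_free(): Releases the parser's resources.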
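To make the event-driven model concrete, here is a minimal sketch that records every event the parser fires for a tiny document. The `<note>` document and the `$events` array are assumptions made up for illustration; only the xml_* functions come from PHP itself.

```php
<?php
// Minimal sketch: log each event fired while parsing a tiny, made-up document.
$events = [];

$parser = xml_parser_create('UTF-8');
// Keep element names as written; by default they are folded to upper case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, string $name) use (&$events) {
        $events[] = "end:$name";
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$events) {
        // Skip whitespace-only text between tags.
        if (trim($data) !== '') {
            $events[] = "text:$data";
        }
    }
);

// The third argument marks this string as the final (here: only) chunk.
$ok = xml_parse($parser, '<note><to>Lei</to></note>', true);

// On PHP 8+ the parser is an object that is released when it goes out of
// scope, so the explicit call is mainly needed on older versions.
if (PHP_VERSION_ID < 80000) {
    xml_parser_free($parser);
}

print_r($events);
```

The events arrive in document order: the start of `note`, the start of `to`, the text `Lei`, then the two end tags. Nothing is kept by the parser afterwards, which is why the callbacks have to store whatever the program needs.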
Suppose we receive the following XML data from a content platform:
<?xml version="1.0"?>
<articles>
<article>
<title>PHP XMLAnalysis Guide</title>
<author>Li Lei</author>
<url>https://m66.net/articles/php-xml-guide</url>
</article>
<article>
<title>In-depth understanding xml_set_element_handler</title>
<author>Han Meimei</author>
<url>https://m66.net/articles/xml-handler-deepdive</url>
</article>
</articles>
Goal: extract the title, author, and link of every article and save them in an array.
<?php
$xmlData = <<<XML
<?xml version="1.0"?>
<articles>
<article>
<title>PHP XMLAnalysis Guide</title>
<author>Li Lei</author>
<url>https://m66.net/articles/php-xml-guide</url>
</article>
<article>
<title>In-depth understanding xml_set_element_handler</title>
<author>Han Meimei</author>
<url>https://m66.net/articles/xml-handler-deepdive</url>
</article>
</articles>
XML;
$parser = xml_parser_create("UTF-8");
$articles = [];
$currentTag = "";
$currentArticle = [];
// Start-tag handler: remember which element we are inside
function startElement($parser, $name, $attrs) {
    global $currentTag;
    // Names arrive upper-cased because case folding is on by default
    $currentTag = strtolower($name);
}
// End-tag handler: store the finished article when </article> is reached
function endElement($parser, $name) {
    global $currentTag, $currentArticle, $articles;
    if (strtolower($name) == 'article') {
        $articles[] = array_map('trim', $currentArticle);
        $currentArticle = [];
    }
    $currentTag = "";
}
// Character data handler: the parser may deliver one text node in several
// pieces, so append to the field instead of overwriting it
function characterData($parser, $data) {
    global $currentTag, $currentArticle;
    if (in_array($currentTag, ['title', 'author', 'url'])) {
        $currentArticle[$currentTag] = ($currentArticle[$currentTag] ?? '') . $data;
    }
}
// Register the handlers
xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");
// Parse the whole string in one call (true marks it as the final chunk)
if (!xml_parse($parser, $xmlData, true)) {
    die(sprintf("XML error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)));
}
xml_parser_free($parser);
// Print the parsed result
echo "<pre>";
print_r($articles);
echo "</pre>";
?>
Running the script prints:
Array
(
    [0] => Array
        (
            [title] => PHP XMLAnalysis Guide
            [author] => Li Lei
            [url] => https://m66.net/articles/php-xml-guide
        )

    [1] => Array
        (
            [title] => In-depth understanding xml_set_element_handler
            [author] => Han Meimei
            [url] => https://m66.net/articles/xml-handler-deepdive
        )

)
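The script above keeps its parsing state in global variables, which becomes awkward once the code lives inside a larger application. One possible alternative, sketched here under assumptions, is to hold the state in an object and register its methods as handlers. The class name `ArticleCollector`, its method names, and the shortened sample document are hypothetical and not part of the original example.

```php
<?php
// Sketch: the same extraction with state held in an object instead of globals.
// ArticleCollector is a hypothetical name chosen for this illustration.
final class ArticleCollector
{
    private const FIELDS = ['title', 'author', 'url'];

    public array $articles = [];   // finished articles
    private array $current = [];   // article being assembled
    private string $tag = '';      // element we are currently inside

    public function parse(string $xml): void
    {
        $parser = xml_parser_create('UTF-8');
        // Keep element names in their original case.
        xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
        xml_set_element_handler($parser, [$this, 'start'], [$this, 'end']);
        xml_set_character_data_handler($parser, [$this, 'text']);

        if (!xml_parse($parser, $xml, true)) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }

    public function start($parser, string $name, array $attrs): void
    {
        $this->tag = $name;
    }

    public function end($parser, string $name): void
    {
        if ($name === 'article') {
            $this->articles[] = array_map('trim', $this->current);
            $this->current = [];
        }
        $this->tag = '';
    }

    public function text($parser, string $data): void
    {
        // Text may arrive in several pieces, so append rather than overwrite.
        if (in_array($this->tag, self::FIELDS, true)) {
            $this->current[$this->tag] = ($this->current[$this->tag] ?? '') . $data;
        }
    }
}

$collector = new ArticleCollector();
$collector->parse(
    '<articles><article><title>Hello</title>'
    . '<url>https://example.com/a</url></article></articles>'
);
print_r($collector->articles);
```

Throwing an exception instead of calling die() also lets the caller decide how to react to malformed input.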
As the example shows, xml_parse() and xml_set_element_handler() let us handle each tag and its text content as it is encountered. This approach takes more code than SimpleXML, but it pays off when the XML file is very large or arrives over a network stream.
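For comparison, this is roughly what the SimpleXML version of such an extraction could look like. It is a sketch over a shortened, made-up document; the point is that simplexml_load_string() builds the whole document tree in memory before any data can be read from it.

```php
<?php
// Sketch for comparison: SimpleXML parses the entire document up front.
// The sample document and URL are made up for this illustration.
$xml = '<articles><article><title>Hello</title>'
     . '<url>https://example.com/a</url></article></articles>';

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('Could not parse XML');
}

$articles = [];
foreach ($doc->article as $article) {
    // Cast each element to string to get its text content.
    $articles[] = [
        'title' => (string) $article->title,
        'url'   => (string) $article->url,
    ];
}
print_r($articles);
```

The code is shorter and needs no state tracking, which is why SimpleXML is usually the more convenient choice for small documents.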
It is a good fit for the following scenarios:
Memory-sensitive processing of large XML files;
Real-time parsing of XML arriving over a network stream;
Cases that require fine-grained, custom control over how each tag is handled.
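The large-file and streaming scenarios depend on feeding the parser in chunks, which the third argument of xml_parse() supports. The following is a minimal sketch under assumptions: `parseStream()` is a hypothetical helper, the 4096-byte default chunk size is arbitrary, and an in-memory stream stands in for a real file or socket.

```php
<?php
// Sketch: feed the parser chunk by chunk so only one chunk is held at a time.
// parseStream() is a hypothetical helper written for this illustration.
function parseStream($stream, callable $onStart, int $chunkSize = 4096): void
{
    $parser = xml_parser_create('UTF-8');
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use ($onStart) {
            $onStart($name, $attrs);
        },
        function ($parser, string $name) {
            // End tags are ignored in this sketch.
        }
    );

    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Could not read from stream');
        }
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($stream))) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }
}

// Usage: an in-memory stream stands in for a large file or network response.
$stream = fopen('php://memory', 'w+');
fwrite($stream, '<articles><article id="1"/><article id="2"/></articles>');
rewind($stream);

$names = [];
parseStream($stream, function (string $name, array $attrs) use (&$names) {
    $names[] = $name;
}, 16); // tiny chunks to force several xml_parse() calls
fclose($stream);

print_r($names);
```

Tags that happen to be split across two chunks are handled by the parser itself, so the calling code does not need to align chunk boundaries with the markup.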
If you are building a system that has to handle large or complex XML, this SAX-style approach is worth trying: it keeps memory use low and gives you control over exactly which parts of the document are processed.