PHP provides several tools for processing XML. One of them is the xml_parse() function, which belongs to PHP's XML parser extension (based on Expat). This article shows how to use it to parse an XML file and extract attribute values and text content.
Suppose we have the following XML file named sample.xml:
<?xml version="1.0" encoding="UTF-8"?>
<articles>
    <article id="101" author="Alice">
        <title>PHP XMLAnalysis tutorial</title>
        <url>https://m66.net/articles/php-xml</url>
    </article>
    <article id="102" author="Bob">
        <title>In-depth understandingDOMDocument</title>
        <url>https://m66.net/articles/domdocument</url>
    </article>
</articles>
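Our goal is to extract each article's id and author attributes, along with its title and URL.
xml_parse() is a low-level, event-driven parsing function. To use it, register callbacks with xml_set_element_handler() and xml_set_character_data_handler(); the parser then calls them as it encounters start tags, end tags, and text. Case folding is enabled by default, so element and attribute names reach the handlers in uppercase (ARTICLE, ID, AUTHOR).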
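To make the callback order concrete, here is a minimal sketch that records each event the parser emits for a tiny inline document. The inline document and the $events array are illustrative assumptions and are not part of the sample file used in this article.

```php
<?php
// Collects a readable trace of the events emitted by the parser.
$events = [];

$parser = xml_parser_create('UTF-8');

// Start-tag and end-tag callbacks. Closures work as well as function names.
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$events) {
        $events[] = "start:$name";
    },
    function ($parser, string $name) use (&$events) {
        $events[] = "end:$name";
    }
);

// Text callback. Whitespace-only text is ignored in this sketch.
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$events) {
        if (trim($data) !== '') {
            $events[] = "text:$data";
        }
    }
);

// The third argument marks this as the final (here: only) piece of input.
xml_parse($parser, '<a><b>hi</b></a>', true);

echo implode(PHP_EOL, $events), PHP_EOL;
```

The trace lists start tags from the outside in and end tags from the inside out, with element names in uppercase because case folding is on by default.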
<?php
$xml = file_get_contents('sample.xml');
if ($xml === false) {
    die("Unable to read sample.xml");
}

$parser = xml_parser_create("UTF-8");

// Name of the element currently being parsed (lowercased)
$currentTag = '';
// Parsed articles, plus the article currently being built
$articles = [];
$currentArticle = [];

xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

function startElement($parser, $name, $attrs) {
    global $currentTag, $currentArticle;
    $currentTag = strtolower($name);
    if ($currentTag === 'article') {
        // Initialize a new article. Case folding is on by default,
        // so attribute names arrive in uppercase.
        $currentArticle = [
            'id' => $attrs['ID'] ?? '',
            'author' => $attrs['AUTHOR'] ?? '',
            'title' => '',
            'url' => ''
        ];
    }
}

function endElement($parser, $name) {
    global $currentTag, $currentArticle, $articles;
    if (strtolower($name) === 'article') {
        $articles[] = $currentArticle;
    }
    $currentTag = '';
}

function characterData($parser, $data) {
    global $currentTag, $currentArticle;
    // Text can arrive in several pieces, so append it untrimmed.
    // Whitespace between elements is skipped because $currentTag
    // is neither 'title' nor 'url' at that point.
    if ($currentTag === 'title' || $currentTag === 'url') {
        $currentArticle[$currentTag] .= $data;
    }
}

// Start parsing; true marks $xml as the final chunk of input
if (!xml_parse($parser, $xml, true)) {
    die("XML Error: " . xml_error_string(xml_get_error_code($parser)));
}
// Has no effect as of PHP 8.0; releases the parser on older versions
xml_parser_free($parser);

// Output the parsed results
foreach ($articles as $article) {
    echo "article ID: " . $article['id'] . PHP_EOL;
    echo "author: " . $article['author'] . PHP_EOL;
    echo "title: " . $article['title'] . PHP_EOL;
    echo "Link: " . $article['url'] . PHP_EOL;
    echo str_repeat('-', 40) . PHP_EOL;
}
?>
After running the above code, the output will be:
article ID: 101
author: Alice
title: PHP XMLAnalysis tutorial
Link: https://m66.net/articles/php-xml
----------------------------------------
article ID: 102
author: Bob
title: In-depth understandingDOMDocument
Link: https://m66.net/articles/domdocument
----------------------------------------
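The script reads the attributes through the keys 'ID' and 'AUTHOR' because the parser folds names to uppercase by default. If you would rather keep the case used in the document, case folding can be switched off with xml_parser_set_option(). The following is a minimal sketch on a one-element inline document; the $seen array is a name introduced here for illustration.

```php
<?php
$parser = xml_parser_create('UTF-8');

// Disable case folding so names keep the case used in the document.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

$seen = [];
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$seen) {
        $seen[] = ['name' => $name, 'attrs' => $attrs];
    },
    function ($parser, string $name) {
        // Nothing to do on end tags in this sketch.
    }
);

xml_parse($parser, '<article id="101" author="Alice"/>', true);

print_r($seen);
```

With folding disabled, the handlers would compare against 'article' and read $attrs['id'] and $attrs['author'] directly, and the strtolower() calls become unnecessary.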
xml_parse() processes XML as a stream of events, which makes it well suited to large files and incremental parsing. It is less intuitive than DOM or SimpleXML, but because it does not build a document tree in memory, it parses efficiently and fits memory-sensitive applications.
In practice, if the XML structure is complex or you need flexible access to the document, consider DOMDocument or SimpleXML. When you need fine-grained control over the parsing process or have strict performance requirements, xml_parse() is a tool worth considering.
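For large inputs, the document does not have to be loaded in one piece: xml_parse() can be called repeatedly with successive chunks, with the third argument set to true only for the last one. The sketch below assumes a hypothetical helper named countArticles() and uses an in-memory stream so that it runs on its own; a real script would pass a handle such as fopen('sample.xml', 'rb').

```php
<?php
/**
 * Hypothetical helper: counts <article> elements while reading the
 * input in fixed-size chunks instead of loading it all at once.
 */
function countArticles($handle, int $chunkSize = 8192): int
{
    $count = 0;
    $parser = xml_parser_create('UTF-8');

    xml_set_element_handler(
        $parser,
        function ($parser, string $name, array $attrs) use (&$count) {
            // Case folding is on by default, so names are uppercase.
            if ($name === 'ARTICLE') {
                $count++;
            }
        },
        function ($parser, string $name) {
            // End tags are not needed for counting.
        }
    );

    while (!feof($handle)) {
        $chunk = fread($handle, $chunkSize);
        if ($chunk === false) {
            throw new RuntimeException('Read failed');
        }
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, $chunk, feof($handle))) {
            throw new RuntimeException(sprintf(
                'XML error: %s at line %d',
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)
            ));
        }
    }

    return $count;
}

// Usage with an in-memory stream and a small chunk size to force several reads.
$handle = fopen('php://memory', 'w+b');
fwrite($handle, '<articles><article id="1"/><article id="2"/></articles>');
rewind($handle);
echo countArticles($handle, 16), PHP_EOL;
fclose($handle);
```

Tags that straddle a chunk boundary are handled by the parser itself, so the chunk size only affects how much data is held in memory at a time.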
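For comparison, the same fields can be extracted with SimpleXML in a few lines. This is a minimal sketch, assuming the whole document fits comfortably in memory; the inline XML is a shortened copy of sample.xml so that the snippet runs on its own.

```php
<?php
$xml = <<<'XML'
<articles>
    <article id="101" author="Alice">
        <title>PHP XMLAnalysis tutorial</title>
        <url>https://m66.net/articles/php-xml</url>
    </article>
</articles>
XML;

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('Failed to parse XML');
}

$articles = [];
foreach ($doc->article as $article) {
    $articles[] = [
        // Attributes use array syntax, child elements use property syntax.
        'id'     => (string) $article['id'],
        'author' => (string) $article['author'],
        'title'  => (string) $article->title,
        'url'    => (string) $article->url,
    ];
}

print_r($articles);
```

The code is shorter because SimpleXML builds the full tree first, which is the memory cost that the event-driven approach avoids.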