Extracting Attributes and Text Content from XML with xml_parse

M66 2025-04-24

PHP provides several tools for processing XML data. One of them is the xml_parse() function, which belongs to PHP's event-based XML parser (based on Expat). This article shows how to use it to parse an XML file and extract attribute values and text content.

1. Prepare XML data

Suppose we have the following XML file named sample.xml:

<?xml version="1.0" encoding="UTF-8"?>
<articles>
    <article id="101" author="Alice">
        <title>PHP XMLAnalysis tutorial</title>
        <url>https://m66.net/articles/php-xml</url>
    </article>
    <article id="102" author="Bob">
        <title>In-depth understandingDOMDocument</title>
        <url>https://m66.net/articles/domdocument</url>
    </article>
</articles>

Our goal is to extract each article's id and author attributes, along with its title and link (the url element).

2. Use xml_parse to parse XML

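xml_parse() is a low-level, event-driven parsing function. It does not return a data structure. Instead, you register callbacks with xml_set_element_handler() and xml_set_character_data_handler(), and the parser calls them as it encounters opening tags, closing tags, and text. Case folding is enabled by default, so element and attribute names reach the handlers in uppercase. That is why the code below reads $attrs['ID'] and lowercases $name before comparing it.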

Sample code:

<?php

$xml = file_get_contents('sample.xml');
if ($xml === false) {
    die("Unable to read sample.xml");
}

$parser = xml_parser_create("UTF-8");

// Store the current element name
$currentTag = '';
// Store article data
$articles = [];
$currentArticle = [];

xml_set_element_handler($parser, "startElement", "endElement");
xml_set_character_data_handler($parser, "characterData");

function startElement($parser, $name, $attrs) {
    global $currentTag, $currentArticle;

    $currentTag = strtolower($name);

    if ($currentTag === 'article') {
        // Initialize a new article
        $currentArticle = [
            'id' => $attrs['ID'] ?? '',
            'author' => $attrs['AUTHOR'] ?? '',
            'title' => '',
            'url' => ''
        ];
    }
}

function endElement($parser, $name) {
    global $currentTag, $currentArticle, $articles;

    if (strtolower($name) === 'article') {
        $articles[] = $currentArticle;
    }

    $currentTag = '';
}

function characterData($parser, $data) {
    global $currentTag, $currentArticle;

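    // Skip whitespace-only chunks (the indentation between tags).
    // Compare against '' explicitly so a value such as "0" is not discarded.
    $data = trim($data);
    if ($data === '') return;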

    if ($currentTag === 'title') {
        $currentArticle['title'] .= $data;
    } elseif ($currentTag === 'url') {
        $currentArticle['url'] .= $data;
    }
}

// Start parsing
if (!xml_parse($parser, $xml, true)) {
    die(sprintf(
        "XML Error: %s at line %d",
        xml_error_string(xml_get_error_code($parser)),
        xml_get_current_line_number($parser)
    ));
}

// Release the parser (this call has no effect as of PHP 8.0)
xml_parser_free($parser);

// Output the parsed results
foreach ($articles as $article) {
    echo "article ID: " . $article['id'] . PHP_EOL;
    echo "author: " . $article['author'] . PHP_EOL;
    echo "title: " . $article['title'] . PHP_EOL;
    echo "Link: " . $article['url'] . PHP_EOL;
    echo str_repeat('-', 40) . PHP_EOL;
}

?>
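
The handlers above use uppercase keys such as $attrs['ID'] because the parser folds names to uppercase by default. As a minimal sketch of the alternative, the snippet below turns case folding off with xml_parser_set_option() so that names keep the case used in the document. The inline XML string and the $seen array are illustrative and not part of the original example.

```php
<?php
// Sketch: disable case folding so element and attribute names keep their original case.
$xml = '<article id="101" author="Alice"><title>Example</title></article>';

$parser = xml_parser_create('UTF-8');
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

// $seen is a hypothetical collector used only to inspect what the handler receives.
$seen = [];
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$seen) {
        // With folding off, $name is "article" and the keys are "id" and "author".
        $seen[] = ['name' => $name, 'attrs' => $attrs];
    },
    function ($parser, $name) {
        // Nothing to do on closing tags in this sketch.
    }
);

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}

print_r($seen[0]);
```

With folding disabled, the handlers can compare against 'article' and read $attrs['id'] directly, without calling strtolower().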

3. Example output

After running the above code, the output will be:

article ID: 101
author: Alice
title: PHP XMLAnalysis tutorial
Link: https://m66.net/articles/php-xml
----------------------------------------
article ID: 102
author: Bob
title: In-depth understandingDOMDocument
Link: https://m66.net/articles/domdocument
----------------------------------------

4. Summary

xml_parse() processes XML as a stream of events rather than building a full document tree, which makes it well suited to large files and to data that arrives incrementally. It is less intuitive to use than DOM or SimpleXML, but it parses efficiently and keeps memory use low, so it fits memory-sensitive applications.
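
The example in section 2 reads the whole file with file_get_contents(), so it does not yet benefit from streaming. One way to handle a large file might look like the sketch below, which feeds the parser in fixed-size chunks and passes true as the third argument of xml_parse() only for the last chunk. The function name parseInChunks(), its parameters, and the 8192-byte default are assumptions made for illustration.

```php
<?php
// Sketch: feed the parser chunk by chunk instead of loading the whole file.
// parseInChunks() is a hypothetical helper, not a built-in function.
function parseInChunks(string $path, callable $onStart, int $chunkSize = 8192): void
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $parser = xml_parser_create('UTF-8');
    xml_set_element_handler($parser, $onStart, function ($parser, $name) {
        // Closing tags are ignored in this sketch.
    });

    try {
        while (!feof($handle)) {
            $chunk = fread($handle, $chunkSize);
            if ($chunk === false) {
                throw new RuntimeException("Read error on $path");
            }
            // Tell the parser whether this is the final piece of the document.
            $isFinal = feof($handle);
            if (!xml_parse($parser, $chunk, $isFinal)) {
                throw new RuntimeException(sprintf(
                    'XML error at line %d: %s',
                    xml_get_current_line_number($parser),
                    xml_error_string(xml_get_error_code($parser))
                ));
            }
        }
    } finally {
        fclose($handle);
    }
}
```

A caller would pass its own start-element handler, for example parseInChunks('sample.xml', 'startElement'). Only one chunk is held in memory at a time, and the parser keeps its state between calls, so a tag split across two chunks is still handled.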

In practice, if the XML structure is complex or you need to query and modify it flexibly, consider using DOMDocument or SimpleXML. When you need fine-grained control over the parsing process or have strict performance requirements, xml_parse() is a tool worth considering.
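
For comparison, here is a rough sketch of the same extraction with SimpleXML. It loads the entire document into memory, so it trades the streaming behavior for shorter code. The inline XML string mirrors the structure of sample.xml with a single article and a placeholder title; it is illustrative only.

```php
<?php
// Sketch: extract the same fields with SimpleXML instead of event handlers.
$xml = <<<XML
<articles>
    <article id="101" author="Alice">
        <title>Example title</title>
        <url>https://m66.net/articles/php-xml</url>
    </article>
</articles>
XML;

$doc = simplexml_load_string($xml);
if ($doc === false) {
    throw new RuntimeException('Failed to parse XML');
}

$articles = [];
foreach ($doc->article as $article) {
    $articles[] = [
        // Attributes use array syntax, child elements use property syntax.
        'id'     => (string) $article['id'],
        'author' => (string) $article['author'],
        'title'  => (string) $article->title,
        'url'    => (string) $article->url,
    ];
}

print_r($articles);
```

The (string) casts convert SimpleXMLElement objects into plain strings, which gives the same array shape that the xml_parse() handlers build.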