One common way to handle XML in PHP is the xml_parse_into_struct() function. It parses an XML document into a flat array of entries, one per tag or text node in document order, where each entry records the tag name, node type, nesting level, attributes, and value. This article shows how to use the function to distinguish and extract XML tags and their attributes, with examples.
int xml_parse_into_struct(
XMLParser $parser,
string $data,
array &$values,
array &$index = null
);
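$parser: The XML parser created by xml_parser_create() (an XMLParser object in PHP 8.0 and later, a resource in earlier versions).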
$data: The XML data to be parsed.
$values: The output structured array.
$index: An optional array used to index tags.
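As a rough illustration of the $index parameter, the sketch below uses it to jump straight to every title entry without scanning the whole of $values. The sample XML is written on one line (an assumption made here so that no whitespace-only entries appear), and tag names are uppercase because of the parser's default case folding.

```php
<?php
// Compact sample XML: no whitespace between tags, so the entry list stays predictable.
$xml = '<books><book id="1"><title>Book One</title></book>'
     . '<book id="2"><title>Book Two</title></book></books>';

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $values, $index);
unset($parser); // release the parser once parsing is done

// $index maps each tag name to the positions of that tag's entries in $values.
$titles = [];
foreach ($index['TITLE'] ?? [] as $position) {
    $titles[] = $values[$position]['value'];
}

print_r($titles);
```

Because $index only stores positions, the actual data is still read from $values.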
The key feature of xml_parse_into_struct() is that it sequentially writes all tags, text nodes, CDATA, and other information into the $values array. Each item in the array is an associative array, with the main fields including:
tag: The tag name.
type: The node type: open, close, complete, or cdata.
level: The nesting depth of the tag, starting at 1 for the root element.
value: The text value contained within the tag (if any).
attributes: The tag's attributes, stored as an associative array (if there are attributes).
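To see these fields side by side, the following sketch prints one summary line per entry. It also checks the function's return value and reports the parser's error message on failure. The one-line sample XML and the $summary variable are assumptions made for this illustration.

```php
<?php
$xml = '<books><book id="1" category="fiction"><title>Book One</title></book></books>';

$parser = xml_parser_create();
if (!xml_parse_into_struct($parser, $xml, $values)) {
    // A falsy return value signals a parse failure; ask the parser what went wrong.
    $message = xml_error_string(xml_get_error_code($parser));
    throw new RuntimeException("XML parse error: $message");
}
unset($parser);

$summary = [];
foreach ($values as $entry) {
    $summary[] = sprintf(
        '%s %s level=%d%s%s',
        $entry['tag'],
        $entry['type'],
        $entry['level'],
        // value and attributes are only present on entries that have them
        isset($entry['value']) ? ' value=' . $entry['value'] : '',
        isset($entry['attributes']) ? ' attrs=' . implode(',', array_keys($entry['attributes'])) : ''
    );
}

echo implode("\n", $summary), "\n";
```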
Suppose we have the following XML:
<books>
<book id="1" category="fiction">
<title>Book One</title>
</book>
<book id="2" category="non-fiction">
<title>Book Two</title>
</book>
</books>
We can parse it using the following PHP code:
<?php
$xml = <<<XML
<books>
<book id="1" category="fiction">
<title>Book One</title>
</book>
<book id="2" category="non-fiction">
<title>Book Two</title>
</book>
</books>
XML;
$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $values, $index);
xml_parser_free($parser);
echo "<pre>";
print_r($values);
echo "</pre>";
After parsing, the content of $values will roughly look like this (the level key and whitespace-only entries are omitted for brevity):
Array
(
[0] => Array
(
[tag] => BOOKS
[type] => open
)
[1] => Array
(
[tag] => BOOK
[type] => open
[attributes] => Array
(
[ID] => 1
[CATEGORY] => fiction
)
)
[2] => Array
(
[tag] => TITLE
[type] => complete
[value] => Book One
)
[3] => Array
(
[tag] => BOOK
[type] => close
)
[4] => Array
(
[tag] => BOOK
[type] => open
[attributes] => Array
(
[ID] => 2
[CATEGORY] => non-fiction
)
)
[5] => Array
(
[tag] => TITLE
[type] => complete
[value] => Book Two
)
[6] => Array
(
[tag] => BOOK
[type] => close
)
[7] => Array
(
[tag] => BOOKS
[type] => close
)
)
Each book tag has both an open and close type entry.
Attribute information is stored in the attributes key as an associative array.
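Text values (such as book titles) are found in the value key. An element that contains only text and no child elements, such as title, is recorded as a single entry of type complete.
Tag names appear in uppercase because case folding is enabled by default; it can be turned off with xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0).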
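If the original casing matters, case folding can be switched off before parsing. A minimal sketch, assuming a one-line version of the sample XML:

```php
<?php
$xml = '<books><book id="1" category="fiction"><title>Book One</title></book></books>';

$parser = xml_parser_create();
// Disable case folding so tag and attribute names keep their original case.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);
xml_parse_into_struct($parser, $xml, $values, $index);
unset($parser);

$bookEntry = $values[1];           // the opening <book> entry
echo $bookEntry['tag'], "\n";      // tag name as written in the document
print_r($bookEntry['attributes']); // attribute names as written in the document
```

With folding disabled, the keys of $index follow the same casing as the tags themselves.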
You can extract all tags with attributes by iterating through the $values array. For example:
foreach ($values as $entry) {
if (isset($entry['attributes'])) {
echo "Tag: {$entry['tag']}\n";
foreach ($entry['attributes'] as $attrName => $attrValue) {
echo " $attrName => $attrValue\n";
}
}
}
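The same loop can be narrowed to a single tag. The sketch below collects the attributes of every book element into a list. The $books variable and the one-line sample XML are assumptions made for this example.

```php
<?php
$xml = '<books><book id="1" category="fiction"><title>Book One</title></book>'
     . '<book id="2" category="non-fiction"><title>Book Two</title></book></books>';

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $values);
unset($parser);

$books = [];
foreach ($values as $entry) {
    // Attributes live on "open" entries (or "complete" ones for childless elements),
    // never on "close" entries.
    if ($entry['tag'] === 'BOOK' && in_array($entry['type'], ['open', 'complete'], true)) {
        $books[] = $entry['attributes'] ?? [];
    }
}

print_r($books);
```

Attribute values are always strings, so cast them if numeric IDs are needed.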
By default, both tag names and attribute names are converted to uppercase.
The function does not return a nested tree. Entries are stored in a flat list in order of appearance, and the only hierarchy information kept is each entry's level key and the pairing of open and close entries.
xml_parse_into_struct() is best suited to simple XML. For more complex structures, DOM or SimpleXML is recommended.
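When a nested structure is needed from the flat list, the order of open and close entries is enough to rebuild it with a stack. The buildTree() helper below is a hypothetical sketch and not part of PHP. It ignores cdata entries (text between child elements) and assumes well-formed input.

```php
<?php
/**
 * Hypothetical helper: rebuild nested nodes from the flat $values list.
 * Each node has the keys tag, attributes, value and children.
 */
function buildTree(array $values): array
{
    // Start with a synthetic root that collects the top-level element(s).
    $stack = [['tag' => null, 'attributes' => [], 'value' => null, 'children' => []]];

    foreach ($values as $entry) {
        $node = [
            'tag'        => $entry['tag'],
            'attributes' => $entry['attributes'] ?? [],
            'value'      => $entry['value'] ?? null,
            'children'   => [],
        ];

        switch ($entry['type']) {
            case 'open':
                // Keep the node on the stack until its matching close entry arrives.
                $stack[] = $node;
                break;
            case 'complete':
                // No children: attach directly to the current parent.
                $stack[count($stack) - 1]['children'][] = $node;
                break;
            case 'close':
                // The element is finished: move it from the stack into its parent.
                $finished = array_pop($stack);
                $stack[count($stack) - 1]['children'][] = $finished;
                break;
            // 'cdata' entries are skipped in this sketch.
        }
    }

    return $stack[0]['children'];
}

$xml = '<books><book id="1" category="fiction"><title>Book One</title></book>'
     . '<book id="2" category="non-fiction"><title>Book Two</title></book></books>';

$parser = xml_parser_create();
xml_parse_into_struct($parser, $xml, $values);
unset($parser);

$tree = buildTree($values);
print_r($tree);
```

For anything beyond small documents, this bookkeeping is what DOM and SimpleXML already do internally, which is why they are the usual choice for complex XML.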