
A Practical Guide to Parsing HTML and XML with PHP

M66 2025-07-01

Practical Methods for Parsing HTML and XML with PHP

In web development, HTML and XML are the standard formats for presenting content and exchanging data. Besides generating web pages, PHP can parse and modify these documents through its DOM extension. This guide shows how to use PHP's DOMDocument class to load, read, change, and extend HTML and XML documents.

Parsing HTML with PHP

An HTML document is a tree of elements, each with a tag name, optional attributes, and text or child elements. PHP's built-in DOMDocument class loads an HTML string into such a tree. Here's a simple example:

<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

echo $dom->saveHTML();
?>

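This snippet loads an HTML string into a DOMDocument and serializes it back with saveHTML(). Because the input has no DOCTYPE, the underlying libxml parser typically adds a default one to the output. Once the document is loaded, the tree can be queried and modified, as the following sections show.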
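Markup taken from real pages is rarely as clean as the string above. The following is a minimal sketch, not a definitive recipe, of loading imperfect HTML: it collects parser warnings with libxml_use_internal_errors() instead of letting them surface as PHP warnings, and passes the LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD options so that no html/body wrapper or default DOCTYPE is added. The input string is a made-up example, and the exact warnings reported depend on the installed libxml version.

```php
<?php
// Hypothetical input: the first <p> is never closed, as often happens in real pages.
$html = '<div><p>First<p>Second</div>';

// Collect libxml warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
// NOIMPLIED: do not wrap the fragment in <html><body>.
// NODEFDTD: do not add a default DOCTYPE.
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Read whatever the parser complained about, then restore the previous setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    echo trim($error->message), " (line {$error->line})\n";
}

echo $dom->saveHTML();
```

LIBXML_HTML_NOIMPLIED works most predictably when the fragment has a single root element, as the div does here.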

Parsing XML Files

XML is a markup language used for structured data storage and transfer. DOMDocument also supports XML parsing:

<?php
$xml = '<root><element1>Value 1</element1><element2>Value 2</element2></root>';

$dom = new DOMDocument();
$dom->loadXML($xml);

echo $dom->saveXML();
?>

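This code loads the XML string into a DOMDocument and serializes it back with saveXML(), which prepends the XML declaration. The output is not indented by default; indentation requires setting formatOutput to true before saving. Round trips like this are common when consuming or producing API payloads.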
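To get indented output, the document needs two properties set before loading and saving. The sketch below reuses the same XML string, and it assumes you want a failed parse to raise an exception, which is one possible choice rather than the only one.

```php
<?php
$xml = '<root><element1>Value 1</element1><element2>Value 2</element2></root>';

$dom = new DOMDocument();
// Drop whitespace-only text nodes so the serializer is free to re-indent.
$dom->preserveWhiteSpace = false;
// Ask saveXML() to add line breaks and indentation.
$dom->formatOutput = true;

// loadXML() returns false when the string is not well-formed XML.
if ($dom->loadXML($xml) === false) {
    throw new RuntimeException('Could not parse XML');
}

$pretty = $dom->saveXML();
echo $pretty;
```

Both properties must be set before loadXML() is called; preserveWhiteSpace has no effect on a document that is already loaded.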

Extracting Content from HTML/XML

Using DOMDocument, you can extract specific elements such as titles and paragraph text:

<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$title = $dom->getElementsByTagName('h1')->item(0)->nodeValue;
$content = $dom->getElementsByTagName('p')->item(0)->nodeValue;

echo "Title: " . $title . "<br>";
echo "Content: " . $content . "<br>";
?>

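This code looks up elements by tag name and reads their text through nodeValue, which is the basis of content scraping and analysis. Note that item(0) returns null when no element matches, so check the result before reading nodeValue on markup you do not control.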
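When a page contains several matching elements, or when you need to select by attribute, iterating the node list and using DOMXPath tends to be more robust than item(0). This is a rough sketch under assumptions: the class name "lead" and the extra paragraph are invented for illustration.

```php
<?php
// Hypothetical markup: a second paragraph and a "lead" class were added for the example.
$html = '<html><body><h1>Title</h1><p class="lead">Intro</p><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// Iterate every <p> instead of assuming that item(0) exists.
$paragraphs = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $paragraphs[] = $p->textContent;
}

// XPath narrows the match by attribute; query() returns a DOMNodeList.
$xpath = new DOMXPath($dom);
$leadNodes = $xpath->query('//p[@class="lead"]');

// Guard against an empty result before touching the node.
$lead = $leadNodes->length > 0 ? $leadNodes->item(0)->textContent : null;

print_r($paragraphs);
var_dump($lead);
```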

Modifying Elements in HTML

You can also update the content of HTML elements. Here's how to change a heading:

<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$title = $dom->getElementsByTagName('h1')->item(0);
$title->nodeValue = 'New Title';

echo $dom->saveHTML();
?>

This replaces the original heading content with "New Title" and outputs the modified HTML.
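Attributes can be changed the same way as text. The sketch below sets two attributes on the heading and replaces its text through textContent, which stores the string as plain text so that characters such as "&" are escaped on output. The attribute values and the new heading text are illustrative assumptions.

```php
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$heading = $dom->getElementsByTagName('h1')->item(0);

if ($heading !== null) {
    // setAttribute() creates the attribute or overwrites an existing one.
    $heading->setAttribute('class', 'page-title');
    $heading->setAttribute('id', 'main-heading');

    // textContent replaces all children with a single text node.
    $heading->textContent = 'Tips & Tricks';
}

// Passing a node to saveHTML() serializes only that node.
$result = $dom->saveHTML($heading);
echo $result;
```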

Adding New Elements

To insert new elements dynamically, you can create and append them to the DOM structure. For example, adding a subtitle:

<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$newElement = $dom->createElement('h2', 'Subtitle');
$dom->getElementsByTagName('body')->item(0)->appendChild($newElement);

echo $dom->saveHTML();
?>

This adds an <h2> tag with the content "Subtitle" to the end of the <body> section of the HTML document.
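appendChild() always places the new element last. If the subtitle should sit directly under the heading, insertBefore() is one way to do it. The sketch below also builds the text with createTextNode() so that special characters are escaped on output; the subtitle text is an invented example.

```php
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$heading = $dom->getElementsByTagName('h1')->item(0);

// Build the element and its text separately so "&" is escaped on output.
$subtitle = $dom->createElement('h2');
$subtitle->appendChild($dom->createTextNode('Q&A Subtitle'));

// Inserting before the heading's next sibling places the <h2> right after the <h1>.
// When nextSibling is null, insertBefore() appends at the end instead.
$heading->parentNode->insertBefore($subtitle, $heading->nextSibling);

// Serialize only the <body> element to inspect the result.
$bodyHtml = $dom->saveHTML($dom->getElementsByTagName('body')->item(0));
echo $bodyHtml;
```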

Conclusion

PHP's DOMDocument class provides powerful tools to parse, read, edit, and extend HTML or XML content. Whether you're building a web scraper, templating engine, or working with structured data, DOMDocument is a valuable asset for any PHP developer.