In web development, HTML and XML are the markup formats most commonly used for content presentation and data exchange. Beyond generating web pages, PHP can also parse and manipulate these structures. This guide introduces how to use PHP's DOMDocument class to work with HTML and XML documents.
HTML documents are built from elements, attributes, and text. PHP's built-in DOMDocument class can load an HTML string and parse it into a tree of nodes. Here’s a simple example:
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';
// Parse the HTML string into a DOM tree
$dom = new DOMDocument();
$dom->loadHTML($html);
// Serialize the tree back to an HTML string
echo $dom->saveHTML();
?>
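This code snippet loads an HTML string into a DOM object and then serializes it back to HTML. Because the input has no DOCTYPE, loadHTML() adds a default one, so the output is not byte-for-byte identical to the input. Once the document is loaded, you can read or modify its structure as needed.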
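Real-world HTML is often less tidy than the string above, and the parser may emit PHP warnings for markup it does not like. The following is a minimal sketch of one way to collect those messages instead, using the libxml error functions that ship with PHP. The sample markup (an unclosed <p> inside a <section>) is a hypothetical input chosen for illustration, and how many issues are reported depends on the installed libxml version.

```php
<?php
// Hypothetical messy input: the <p> element is never closed.
$html = '<html><body><section><h1>Title</h1><p>Content</section></body></html>';

// Buffer libxml errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$dom = new DOMDocument();
$loaded = $dom->loadHTML($html);

// Read whatever the parser reported, then clear the buffer
// and restore the previous error-handling mode.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

$heading = $dom->getElementsByTagName('h1')->item(0)->textContent;

echo $loaded ? "Parsed, " . count($errors) . " issue(s) reported\n" : "Parsing failed\n";
echo "Heading: $heading\n";
```

Each entry in $errors is a LibXMLError object with message, line, and column properties, which can be logged if you need to know what the parser repaired.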
XML is a markup language used for structured data storage and transfer. DOMDocument also supports XML parsing:
<?php
$xml = '<root><element1>Value 1</element1><element2>Value 2</element2></root>';
// Parse the XML string into a DOM tree
$dom = new DOMDocument();
$dom->loadXML($xml);
// Serialize the tree back to an XML string
echo $dom->saveXML();
?>
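This code loads the XML string into a DOM object and serializes it back with saveXML(), which prepends an XML declaration. The output is not indented unless the formatOutput property is enabled. This round trip is useful for APIs or data processing tasks.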
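If indented output is wanted, a minimal sketch along these lines should work. It relies on two public DOMDocument properties, preserveWhiteSpace and formatOutput, and the variable name $pretty is introduced here only for illustration.

```php
<?php
$xml = '<root><element1>Value 1</element1><element2>Value 2</element2></root>';

$dom = new DOMDocument();
// Must be set before loading: drop whitespace-only text nodes so the
// serializer is free to insert its own line breaks and indentation.
$dom->preserveWhiteSpace = false;
// Ask the serializer to indent nested elements when saving.
$dom->formatOutput = true;
$dom->loadXML($xml);

$pretty = $dom->saveXML();
echo $pretty;
```

With this input, each child element should appear on its own line, indented beneath <root>. For a string that contains no whitespace between tags, as here, formatOutput alone is enough; preserveWhiteSpace matters when the source is already partly indented.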
Using DOMDocument, you can extract specific elements such as titles and paragraph text:
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
// Take the first <h1> and the first <p>, then read their text
$title = $dom->getElementsByTagName('h1')->item(0)->nodeValue;
$content = $dom->getElementsByTagName('p')->item(0)->nodeValue;
echo "Title: " . $title . "<br>";
echo "Content: " . $content . "<br>";
?>
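This code accesses DOM elements by tag name and retrieves their text content, which makes it a good fit for content scraping or analysis. Note that item(0) returns null when no element matches, so production code should check the result before reading nodeValue.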
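When a document contains several matching elements, or may contain none, the lookup can be written more defensively. The sketch below assumes a hypothetical input with two paragraphs and no <h3>. It iterates over the DOMNodeList returned by getElementsByTagName() and checks for null before dereferencing.

```php
<?php
// Hypothetical input with two paragraphs and no <h3>.
$html = '<html><body><h1>Title</h1><p>First</p><p>Second</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

// A DOMNodeList can be iterated directly with foreach.
$paragraphs = [];
foreach ($dom->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

// item() returns null for a missing index, so guard before using it.
$h3 = $dom->getElementsByTagName('h3')->item(0);
$subheading = $h3 !== null ? $h3->textContent : '(none)';

echo implode(' | ', $paragraphs), "\n"; // First | Second
echo "Subheading: $subheading\n";       // Subheading: (none)
```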
You can also update the content of HTML elements. Here's how to change a heading:
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
// Fetch the first <h1> element and overwrite its text
$title = $dom->getElementsByTagName('h1')->item(0);
$title->nodeValue = 'New Title';
echo $dom->saveHTML();
?>
This replaces the original heading content with "New Title" and outputs the modified HTML.
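If the new text may contain characters such as &, assigning to the textContent property is a cautious alternative, since it treats the string as literal text and leaves escaping to the serializer. A minimal sketch, with a made-up replacement string:

```php
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$title = $dom->getElementsByTagName('h1')->item(0);

// Guard against a missing element before writing to it.
if ($title !== null) {
    // The string is stored as plain text, not parsed as markup.
    $title->textContent = 'Tips & Tricks';
}

$output = $dom->saveHTML();
echo $output;
```

In the serialized output the heading should appear as <h1>Tips &amp;amp; Tricks</h1>, with the ampersand escaped automatically.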
To insert new elements dynamically, you can create and append them to the DOM structure. For example, adding a subtitle:
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';
$dom = new DOMDocument();
$dom->loadHTML($html);
// Create an <h2> element and append it as the last child of <body>
$newElement = $dom->createElement('h2', 'Subtitle');
$dom->getElementsByTagName('body')->item(0)->appendChild($newElement);
echo $dom->saveHTML();
?>
This adds an <h2> tag with the content “Subtitle” to the end of the <body> section of the HTML document.
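Because appendChild() always places the new node last, a subtitle added this way ends up below the paragraph. To put it directly under the main heading instead, one option is insertBefore(), sketched here with the same sample markup. The $order array exists only to display the result.

```php
<?php
$html = '<html><body><h1>Title</h1><p>Content</p></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$h1 = $dom->getElementsByTagName('h1')->item(0);
$h2 = $dom->createElement('h2', 'Subtitle');

// Insert before the heading's next sibling, i.e. directly after the <h1>.
// If the <h1> were the last child, nextSibling would be null and
// insertBefore() would fall back to appending.
$h1->parentNode->insertBefore($h2, $h1->nextSibling);

// Collect the element names now inside <body> to show their order.
$order = [];
foreach ($h1->parentNode->childNodes as $child) {
    if ($child instanceof DOMElement) {
        $order[] = $child->nodeName;
    }
}
echo implode(', ', $order), "\n"; // h1, h2, p
```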
PHP's DOMDocument class provides powerful tools to parse, read, edit, and extend HTML or XML content. Whether you're building a web scraper, templating engine, or working with structured data, DOMDocument is a valuable asset for any PHP developer.