
Practical Guide to Parsing HTML Pages with PHP Simple HTML DOM Parser

M66 2025-06-24

Simple Way to Parse HTML Pages with PHP

In web development, it is often necessary to extract structured data from HTML pages for display, storage, or analysis. With the help of open-source tools, we can significantly simplify this process. PHP Simple HTML DOM Parser is one such powerful and easy-to-use library, and this article will walk you through how to use it step by step.

What is PHP Simple HTML DOM Parser?

PHP Simple HTML DOM Parser is a lightweight HTML parsing library that lets developers locate elements in a document with CSS-style selectors. Its syntax resembles jQuery's, so the learning curve is gentle, and the library suits a wide range of web data extraction tasks.

Step 1: Download and Include the Library

First, you need to download the latest version of the library from its official source. Once downloaded, place it into your PHP project directory and include it like this:

require('simple_html_dom.php');

Step 2: Load HTML Page Content

Once the library is included, you can use the file_get_html() function to load the web page content. This function supports both remote URLs and local HTML file paths:

$html = file_get_html('http://www.example.com');
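Loading can fail (an unreachable host, a missing file, an empty response), so it is worth checking the result before calling methods on it. The sketch below assumes that file_get_html() returns false on failure, which is the behavior of the library versions I am aware of. The loadPage() helper and the pages/sample.html path are hypothetical names used only for illustration.

```php
<?php
require_once 'simple_html_dom.php';

// Hypothetical helper (not part of the library): returns the parsed DOM,
// or null when the source could not be read.
function loadPage(string $source)
{
    // "@" silences the warning raised by the underlying file read on failure.
    $html = @file_get_html($source);

    // Assumption: the library signals failure by returning false.
    if ($html === false) {
        return null;
    }
    return $html;
}

$html = loadPage('pages/sample.html'); // hypothetical local path
if ($html === null) {
    echo "Could not load the page\n";
} else {
    echo count($html->find('a')), " links found\n";
    $html->clear();
}
```

Returning null keeps the failure check at the call site short and avoids a fatal error from calling find() on false.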

Step 3: Extract HTML Elements from the Web Page

After loading the HTML, you can use CSS selectors to find and manipulate DOM nodes. Here are a few common operations:

Find Specific Tags

For example, to get all <a> (link) elements on the page:

$elements = $html->find('a');
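To show how selectors narrow a search, here is a small sketch that parses an inline HTML string with the library's str_get_html() function instead of fetching a page. The markup, the ids, and the class names are made up for the example. Passing an index as the second argument to find() returns a single element rather than an array, or null when nothing matches.

```php
<?php
require_once 'simple_html_dom.php';

// Made-up markup so the example does not depend on the network.
$html = str_get_html(
    '<div id="nav"><a class="ext" href="/a">A</a><a href="/b">B</a></div>' .
    '<div id="body"><a href="/c">C</a></div>'
);

$allLinks = $html->find('a');        // every <a> in the document
$navLinks = $html->find('#nav a');   // <a> elements inside the element with id="nav"
$extLinks = $html->find('a.ext');    // <a> elements with class="ext"

$first   = $html->find('a', 0);      // first match only, as a single element
$missing = $html->find('table', 0);  // no match, so this is null

echo count($allLinks), ' links, ', count($navLinks), " in the nav\n";

$html->clear();
```

Checking for null after an indexed find() is a cheap way to avoid errors on pages whose structure differs from what the script expects.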

Get Element Attributes

To read an element's attribute value, such as getting the href value of the first link:

$url = $elements[0]->getAttribute('href');
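Indexing with [0] assumes at least one element was found and that it carries the attribute. A slightly more defensive sketch follows, again using made-up inline markup. It relies on hasAttribute() to skip elements without an href, and it reads the value through the property form ($link->href), which is an alternative to getAttribute('href').

```php
<?php
require_once 'simple_html_dom.php';

// Made-up markup: the second link has no href attribute.
$html = str_get_html('<a href="/docs" title="Docs">Docs</a><a>No target</a>');

$hrefs = [];
foreach ($html->find('a') as $link) {
    // Skip links that have no href attribute at all.
    if ($link->hasAttribute('href')) {
        $hrefs[] = $link->href; // property form of getAttribute('href')
    }
}

foreach ($hrefs as $href) {
    echo $href, "\n";
}

$html->clear();
```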

Get Element Text Content

You can read the plain text inside a tag through the plaintext property. The innertext property, by contrast, returns the inner HTML, including any nested tags. For example:

foreach ($elements as $element) {
    $text = $element->plaintext;
    echo $text;
}
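The library's text properties are easy to mix up, so here is a short comparison on a made-up fragment. It is only a sketch: exact whitespace handling in plaintext may differ between library versions, which is why the value is trimmed before printing.

```php
<?php
require_once 'simple_html_dom.php';

// Made-up fragment with one nested tag.
$html = str_get_html('<div>Hello <b>world</b></div>');
$div  = $html->find('div', 0);

$inner = $div->innertext;        // content of the div, nested markup included
$outer = $div->outertext;        // the div itself plus its content
$plain = trim($div->plaintext);  // text only, tags stripped

echo $inner, "\n";
echo $outer, "\n";
echo $plain, "\n";

$html->clear();
```

For scraping tasks, plaintext is usually the right choice when the value will be stored or displayed, and innertext when the nested markup itself needs further processing.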

Step 4: Release DOM Resources

After completing the operations, it is recommended to clean up the resources to free memory:

$html->clear();
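Cleanup matters most when a script parses many documents in one run. The sketch below processes a list of made-up HTML strings in a loop, copies out the values it needs, and releases each DOM before moving on. The $pages array stands in for pages that a real script would fetch.

```php
<?php
require_once 'simple_html_dom.php';

// Made-up documents standing in for fetched pages.
$pages = [
    '<div><h1>One</h1></div>',
    '<div><h1>Two</h1></div>',
];

$titles = [];
foreach ($pages as $markup) {
    $html = str_get_html($markup);

    // Copy the string out before releasing the DOM.
    $titles[] = trim($html->find('h1', 0)->plaintext);

    // Release the parsed tree, then drop the variable.
    $html->clear();
    unset($html);
}

echo implode(', ', $titles), "\n";
```

The extracted strings are ordinary PHP values, so they remain usable after the DOM they came from has been cleared.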

Complete Example Code

Here is a full example of HTML parsing code:

require('simple_html_dom.php');

$html = file_get_html('http://www.example.com');

// Find all links on the page
$elements = $html->find('a');

// Get the URL attribute of the first link
$url = $elements[0]->getAttribute('href');
echo $url;

// Get the text content of all links
foreach ($elements as $element) {
    $text = $element->plaintext;
    echo $text;
}

// Free the memory held by the DOM
$html->clear();

Conclusion

With PHP Simple HTML DOM Parser, you can turn HTML pages into structured data without resorting to complex regular expressions. Its small, intuitive API works well for quickly building web scrapers or data extraction scripts, and the steps and examples in this article should be enough to get you started with the library.