Parse HTML5 Documents in PHP 8.4

PHP 8.4 introduces a new DOM API that parses HTML5 documents in a standards-compliant way, resolves several long-standing compliance issues in the DOM functionality, and adds methods that simplify working with documents.

With the old DOMDocument class, working with HTML documents involves a combination of DOM and XPath operations. For example, to extract elements with a specific attribute, you first load the HTML into a DOMDocument instance using the loadHTML method, then create a DOMXPath object to run queries against the DOM tree.

<?php

$html = '<main>
    <div><label>Name</label><span class="name">John</span></div>
    <div><label>Name</label><span class="name">Patrick</span></div>
</main>';

$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_NOERROR);

$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//span[@class="name"]');
echo $nodes[0]->nodeValue; // John

With the new Dom\HTMLDocument class introduced in PHP 8.4, the same task takes fewer steps. Its HTML5-compliant parser handles modern markup correctly. The static createFromString method loads HTML content directly from a string, and the built-in querySelectorAll method selects elements using CSS selectors. This removes the need for a separate DOMXPath object and verbose XPath expressions, which makes the code easier to read and maintain.

<?php

use Dom\HTMLDocument;

$html = '<main>
    <div><label>Name</label><span class="name">John</span></div>
    <div><label>Name</label><span class="name">Patrick</span></div>
</main>';

$dom = HTMLDocument::createFromString($html, LIBXML_NOERROR);

$nodes = $dom->querySelectorAll('.name');
echo $nodes[0]->textContent; // John
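
The same document can also be walked in full rather than by index. The following is a minimal sketch, assuming PHP 8.4 with the DOM extension enabled: it iterates over the list returned by querySelectorAll and uses querySelector, which returns only the first match or null. The selector 'main span.name' and the variable names $names and $first are illustrative choices, not part of the original example.

```php
<?php

use Dom\HTMLDocument;

$html = '<main>
    <div><label>Name</label><span class="name">John</span></div>
    <div><label>Name</label><span class="name">Patrick</span></div>
</main>';

$dom = HTMLDocument::createFromString($html, LIBXML_NOERROR);

// Collect the text of every matching element; the returned node list is iterable.
$names = [];
foreach ($dom->querySelectorAll('main span.name') as $span) {
    $names[] = $span->textContent;
}
echo implode(', ', $names), PHP_EOL; // John, Patrick

// querySelector returns the first match, or null when nothing matches,
// so the nullsafe operator guards against a missing element.
$first = $dom->querySelector('.name');
echo $first?->textContent, PHP_EOL; // John
```

Using querySelector avoids indexing into a list when only one element is needed, and the null check keeps the code safe if the markup changes.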
