DOM Parser Example

The DOM extension in PHP provides extensive functionality for working with XML and HTML documents. We can construct a DOM object dynamically, or load a DOM document from an HTML file or from a string containing an HTML tag tree. We can also save the DOM document to an XML file, or extract the DOM tree from an XML document.

The DOMDocument class is one of the most important classes defined in the DOM extension.

$obj = new DOMDocument($version = "1.0", $encoding = "")

It represents an entire HTML or XML document and serves as the root of the document tree. The DOMDocument class defines a number of methods, called on a DOMDocument object, some of which are introduced here −

| Sr.No | Method | Description |
|-------|--------|-------------|
| 1 | createElement | Create new element node |
| 2 | createAttribute | Create new attribute |
| 3 | createTextNode | Create new text node |
| 4 | getElementById | Searches for an element with a certain id |
| 5 | getElementsByTagName | Searches for all elements with given local tag name |
| 6 | load | Load XML from a file |
| 7 | loadHTML | Load HTML from a string |
| 8 | loadHTMLFile | Load HTML from a file |
| 9 | loadXML | Load XML from a string |
| 10 | save | Dumps the internal XML tree back into a file |
| 11 | saveHTML | Dumps the internal document into a string using HTML formatting |
| 12 | saveHTMLFile | Dumps the internal document into a file using HTML formatting |
| 13 | saveXML | Dumps the internal XML tree back into a string |
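
As a minimal sketch of how the create* methods and saveXML() fit together, the following code builds a small XML document in memory and dumps it to a string. The element names, the attribute and the text values are made up for illustration and are not part of the DOM extension.

```php
<?php
// Build a small XML document in memory and serialize it to a string.
// "course", "id" and "title" are illustrative names, not part of the API.
$doc = new DOMDocument("1.0", "UTF-8");
$doc->formatOutput = true; // indent the serialized output

// createElement() only creates the node; appendChild() attaches it to the tree.
$course = $doc->createElement("course");
$doc->appendChild($course);

// createAttribute() returns a DOMAttr; set its value, then attach it to the element.
$id = $doc->createAttribute("id");
$id->value = "c1";
$course->appendChild($id);

// Text added through createTextNode() is escaped when the tree is serialized.
$title = $doc->createElement("title");
$title->appendChild($doc->createTextNode("Web & XML"));
$course->appendChild($title);

// saveXML() returns the document as a string; save() would write it to a file.
$xml = $doc->saveXML();
echo $xml;
```

Note that nodes created by the document are not part of the tree until they are appended, which is why each create* call is followed by appendChild().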

Example

Let us use the following HTML file, saved as hello.html, for this example −

<html>
<head><title>Tutorialspoint</title></head>
<body>
   <h2>Course details</h2>
   <table border = "0">
      <tbody>
         <tr><td>Android</td><td>Gopal</td><td>Sairam</td></tr>
         <tr><td>Hadoop</td><td>Gopal</td><td>Satish</td></tr>
         <tr><td>HTML</td><td>Gopal</td><td>Raju</td></tr>
         <tr><td>Web technologies</td><td>Gopal</td><td>Javed</td></tr>
         <tr><td>Graphic</td><td>Gopal</td><td>Satish</td></tr>
         <tr><td>Writer</td><td>Kiran</td><td>Amith</td></tr>
         <tr><td>Writer</td><td>Kiran</td><td>Vineeth</td></tr>
      </tbody>
   </table>
</body>
</html>

We shall now extract the Document Object Model from the above HTML file by calling the loadHTMLFile() method in the following PHP code −

<?php

   /*** a new DOM object ***/
   $dom = new DOMDocument();

   /*** discard white space; set before loading, since it has no effect on an already parsed document ***/
   $dom->preserveWhiteSpace = false;

   /*** load the HTML file into the object ***/
   $dom->loadHTMLFile("hello.html");

   /*** find the tables by tag name ***/
   $tables = $dom->getElementsByTagName('table');

   /*** get all rows from the first table ***/
   $rows = $tables->item(0)->getElementsByTagName('tr');

   /*** loop over the table rows ***/
   foreach ($rows as $row) {

      /*** get each column by tag name ***/
      $cols = $row->getElementsByTagName('td');

      /*** echo the values ***/
      echo 'Designation: '.$cols->item(0)->nodeValue.'<br />';
      echo 'Manager: '.$cols->item(1)->nodeValue.'<br />';
      echo 'Team: '.$cols->item(2)->nodeValue;
      echo '<hr />';
   }

?>

It will produce the following output −

Designation: Android
Manager: Gopal
Team: Sairam
________________________________________
Designation: Hadoop
Manager: Gopal
Team: Satish
________________________________________
Designation: HTML
Manager: Gopal
Team: Raju
________________________________________
Designation: Web technologies
Manager: Gopal
Team: Javed
________________________________________
Designation: Graphic
Manager: Gopal
Team: Satish
________________________________________
Designation: Writer
Manager: Kiran
Team: Amith
________________________________________
Designation: Writer
Manager: Kiran
Team: Vineeth
________________________________________
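
The same traversal also works on HTML held in a string. The sketch below loads a shortened copy of the table with loadHTML() and collects the cell text into a nested array instead of echoing it. The helper name parseRows() and the inlined sample rows are assumptions made for this illustration; they are not part of the DOM extension.

```php
<?php
// A shortened copy of the table above, inlined so the sketch is self-contained.
$html = '<html><body><table><tbody>'
      . '<tr><td>Android</td><td>Gopal</td><td>Sairam</td></tr>'
      . '<tr><td>Hadoop</td><td>Gopal</td><td>Satish</td></tr>'
      . '</tbody></table></body></html>';

// loadHTML() parses a string; loadHTMLFile() does the same for a file.
$dom = new DOMDocument();
$dom->loadHTML($html);

// parseRows() is a hypothetical helper: one inner array of cell text per <tr>.
function parseRows(DOMDocument $dom): array
{
    $result = [];
    foreach ($dom->getElementsByTagName('tr') as $row) {
        $cells = [];
        foreach ($row->getElementsByTagName('td') as $cell) {
            $cells[] = trim($cell->textContent);
        }
        $result[] = $cells;
    }
    return $result;
}

$rows = parseRows($dom);
print_r($rows);
```

Returning an array keeps the parsing separate from the presentation, so the same data can be rendered as HTML, written to a file, or checked in a test.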
