PHP Simple HTML DOM Parser: Manual in simplehtmldom API 5.2
Index
- Quick Start
- How to create HTML DOM object?
- How to find HTML elements?
- How to access the HTML element's attributes?
- How to traverse the DOM tree?
- How to dump contents of DOM object?
- How to customize the parsing behavior?
- API Reference
- FAQ
Quick Start
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
// Find all images
foreach($html->find('img') as $element)
    echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
    echo $element->href . '<br>';
// Create DOM from string
$html = str_get_html('<div id="hello">Hello</div><div id="world">World</div>');
$html->find('div', 1)->class = 'bar';
$html->find('div[id=hello]', 0)->innertext = 'foo';
echo $html; // Output: <div id="hello">foo</div><div id="world" class="bar">World</div>
// Create DOM from URL
$html = file_get_html('http://slashdot.org/');
// Find all article blocks
$articles = array();
foreach($html->find('div.article') as $article) {
    $item = array();
    $item['title'] = $article->find('div.title', 0)->plaintext;
    $item['intro'] = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}
print_r($articles);
How to create HTML DOM object?
// Create a DOM object from a string
$html = str_get_html('<html><body>Hello!</body></html>');
// Create a DOM object from a URL
$html = file_get_html('http://www.google.com/');
// Create a DOM object from an HTML file
$html = file_get_html('test.htm');
// Create a DOM object
$html = new simple_html_dom();
// Load HTML from a string
$html->load('<html><body>Hello!</body></html>');
// Load HTML from a URL
$html->load_file('http://www.google.com/');
// Load HTML from an HTML file
$html->load_file('test.htm');
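The following is a minimal sketch of the quick way with a basic failure check. It assumes that simple_html_dom.php sits next to the script, and that the release in use returns false from str_get_html() and file_get_html() when no DOM can be built; the id "greeting" is made up for the example.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<html><body><div id="greeting">Hello!</div></body></html>');

// Assumption: the helper returns false when it cannot build a DOM,
// so check the result before calling methods on it.
if ($html === false) {
    exit("Could not parse the HTML\n");
}

// Read the text of the (hypothetical) #greeting element.
$greeting = $html->find('#greeting', 0)->plaintext;
echo $greeting, "\n";

// Release the parser's internal references when the DOM is no longer needed.
$html->clear();
```

Calling clear() is optional in short scripts; it mainly matters when many documents are parsed in a loop.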
How to find HTML elements?
// Find all anchors, returns an array of element objects
$ret = $html->find('a');
// Find the (N)th anchor, returns the element object or null if not found (zero based)
$ret = $html->find('a', 0);
// Find the last anchor, returns the element object or null if not found
$ret = $html->find('a', -1);
// Find all <div> elements that have an id attribute
$ret = $html->find('div[id]');
// Find all <div> elements whose id attribute is foo
$ret = $html->find('div[id=foo]');
// Find all elements with id=foo
$ret = $html->find('#foo');
// Find all elements with class=foo
$ret = $html->find('.foo');
// Find all elements that have an id attribute
$ret = $html->find('*[id]');
// Find all anchors and images
$ret = $html->find('a, img');
// Find all anchors and images with the "title" attribute
$ret = $html->find('a[title], img[title]');
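The return types described in the comments above can be checked with a small inline document. This is only a sketch: the markup and the library path are assumptions made for the example.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<div id="nav"><a href="/a">A</a><a href="/b">B</a></div>');

$all   = $html->find('a');      // no index: array of element objects
$first = $html->find('a', 0);   // index 0: first match or null
$last  = $html->find('a', -1);  // negative index counts from the end
$none  = $html->find('img', 0); // nothing matches, so null

echo count($all), "\n";                     // 2
echo $first->href, ' ', $last->href, "\n";  // /a /b
var_dump($none === null);                   // bool(true)
```

Because an indexed find() can return null, test the result before reading properties from it when the markup is not under your control.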
The following operators are supported in attribute selectors:
Filter | Description |
---|---|
[attribute] | Matches elements that have the specified attribute. |
[!attribute] | Matches elements that don't have the specified attribute. |
[attribute=value] | Matches elements that have the specified attribute with a certain value. |
[attribute!=value] | Matches elements that don't have the specified attribute with a certain value. |
[attribute^=value] | Matches elements that have the specified attribute and it starts with a certain value. |
[attribute$=value] | Matches elements that have the specified attribute and it ends with a certain value. |
[attribute*=value] | Matches elements that have the specified attribute and it contains a certain value. |
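To see how the filters in the table behave, a sketch like the one below can be run against a small inline document. The links and the library path are invented for the example, and the counts are printed rather than stated here, since matching details may differ between releases.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical markup: two links with href, one anchor without.
$html = str_get_html(
    '<a href="https://example.com/report.pdf" title="Report">PDF</a>' .
    '<a href="/local/page.html">Page</a>' .
    '<a name="anchor">No href</a>'
);

$selectors = array(
    'a[href]',          // has the attribute
    'a[!href]',         // does not have the attribute
    'a[href^=https]',   // value starts with "https"
    'a[href$=.pdf]',    // value ends with ".pdf"
    'a[href*=local]',   // value contains "local"
);

// Print how many elements each selector matches.
foreach ($selectors as $selector) {
    printf("%-16s matches %d element(s)\n", $selector, count($html->find($selector)));
}
```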
// Find all <li> in <ul>
$es = $html->find('ul li');
// Find nested <div> tags
$es = $html->find('div div div');
// Find all <td> in <table> with class=hello
$es = $html->find('table.hello td');
// Find all <td> tags with attribute align=center in <table> tags
$es = $html->find('table td[align=center]');
// Find all text blocks
$es = $html->find('text');
// Find all comment (<!--...-->) blocks
$es = $html->find('comment');
// Find all <li> in <ul>
foreach($html->find('ul') as $ul)
{
    foreach($ul->find('li') as $li)
    {
        // do something...
    }
}
// Find first <li> in first <ul>
$e = $html->find('ul', 0)->find('li', 0);
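Chaining find() calls fails with a fatal error when an earlier call returns null. The sketch below guards each step; the markup and the library path are assumptions made for the example.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<ul><li>one</li><li>two</li></ul><div>no ordered list here</div>');

// First <li> in the first <ul>, checked step by step.
$ul = $html->find('ul', 0);
$li = ($ul !== null) ? $ul->find('li', 0) : null;
echo ($li !== null) ? $li->innertext : 'not found', "\n"; // one

// There is no <ol>, so the guarded lookup yields null instead of an error.
$ol   = $html->find('ol', 0);
$item = ($ol !== null) ? $ol->find('li', 0) : null;
var_dump($item === null); // bool(true)
```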
How to access the HTML element's attributes?
// Get an attribute (for a non-value attribute such as checked or selected, this returns true or false)
$value = $e->href;
// Set an attribute (for a non-value attribute such as checked or selected, set its value to true or false)
$e->href = 'my link';
// Remove an attribute by setting its value to null
$e->href = null;
// Determine whether an attribute exists
if(isset($e->href))
    echo 'href exists!';
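A short sketch of reading, writing, and removing attributes on a parsed element follows. The markup and the library path are assumptions; the final dump is printed rather than quoted here, since exact whitespace may vary between releases.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<a href="old.html" title="Old">link</a>');
$a = $html->find('a', 0);

echo $a->href, "\n";     // read the current value: old.html

$a->href = 'new.html';   // overwrite an existing attribute
$a->rel  = 'nofollow';   // add an attribute that was not present
$a->title = null;        // remove the attribute from the output

// isset() reports whether the element has the attribute.
var_dump(isset($a->rel));

// Dump the modified markup.
$result = (string) $html;
echo $result, "\n";
```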
// Example
$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);
echo $e->tag; // Returns: "div"
echo $e->outertext; // Returns: "<div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: "foo <b>bar</b>"
echo $e->plaintext; // Returns: "foo bar"
Attribute Name | Usage |
---|---|
$e->tag | Read or write the tag name of element. |
$e->outertext | Read or write the outer HTML text of element. |
$e->innertext | Read or write the inner HTML text of element. |
$e->plaintext | Read or write the plain text of element. |
// Extract contents from HTML
echo $html->plaintext;
// Wrap an element
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';
// Remove an element by setting its outertext to an empty string
$e->outertext = '';
// Append an element
$e->outertext = $e->outertext . '<div>foo</div>';
// Insert an element
$e->outertext = '<div>foo</div>' . $e->outertext;
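The wrap tip can be tried on a one-element document, as in this sketch. The markup and the library path are assumptions. The second half relies on a further assumption: that assigning to outertext changes only the dumped output and does not re-parse the tree, so the new wrapper becomes searchable only after the saved string is loaded again.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<span id="note">Hello</span>');
$e = $html->find('#note', 0);

// Wrap the element; note the closing </div>.
$e->outertext = '<div class="wrap">' . $e->outertext . '</div>';

$wrapped = $html->save();
echo $wrapped, "\n";

// Assumption: the tree is not re-parsed on assignment, so load the saved
// string again before searching for the newly added wrapper.
$reloaded = str_get_html($wrapped);
echo count($reloaded->find('div.wrap')), "\n";
```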
How to traverse the DOM tree?
// If you are not familiar with the HTML DOM, see http://php.net/manual/en/book.dom.php to learn more.
// Example
echo $html->find("#div1", 0)->children(1)->children(1)->children(2)->id;
// or
echo $html->getElementById("div1")->childNodes(1)->childNodes(1)->childNodes(2)->getAttribute('id');
You can also call these methods using camel-case naming conventions.
Method | Description |
---|---|
mixed $e->children ( [int $index] ) | Returns the Nth child object if index is set, otherwise returns an array of children. |
element $e->parent () | Returns the parent of the element. |
element $e->first_child () | Returns the first child of the element, or null if not found. |
element $e->last_child () | Returns the last child of the element, or null if not found. |
element $e->next_sibling () | Returns the next sibling of the element, or null if not found. |
element $e->prev_sibling () | Returns the previous sibling of the element, or null if not found. |
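The traversal methods in the table can be exercised on a short list, as sketched below. The markup, the ids, and the library path are assumptions made for the example; firstChild() and nextSibling() are used as the camel-case counterparts of first_child() and next_sibling().

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<ul id="list"><li id="a">A</li><li id="b">B</li><li id="c">C</li></ul>');
$list = $html->find('#list', 0);

echo count($list->children()), "\n";          // 3 (no index: array of children)
echo $list->children(1)->id, "\n";            // b (zero-based index)
echo $list->first_child()->id, "\n";          // a
echo $list->last_child()->id, "\n";           // c

$middle = $list->children(1);
echo $middle->prev_sibling()->id, "\n";       // a
echo $middle->next_sibling()->id, "\n";       // c
echo $middle->parent()->id, "\n";             // list

// Camel-case equivalents of first_child() and next_sibling().
echo $list->firstChild()->nextSibling()->id, "\n"; // b

// Past the last child there is no sibling, so the result is null.
var_dump($list->last_child()->next_sibling() === null); // bool(true)
```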
How to dump contents of DOM object?
// Dumps the internal DOM tree back into a string
$str = $html->save();
// Dumps the internal DOM tree back into a file
$html->save('result.htm');
// Dumps the internal DOM tree back into a string
$str = $html;
// Print it!
echo $html;
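Both ways of dumping can be compared in one script, as in this sketch. The markup, the library path, and the temporary output path are assumptions made for the example.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

$html = str_get_html('<div id="hello">Hello</div>');
$html->find('#hello', 0)->innertext = 'Hi';

// Object-oriented way: save() returns the markup as a string.
$saved = $html->save();

// Quick way: casting the DOM object to a string gives the same markup.
$cast = (string) $html;
var_dump($saved === $cast);

// Passing a path to save() also writes the markup to that file.
$path = sys_get_temp_dir() . '/result.htm'; // hypothetical output location
$html->save($path);
echo file_get_contents($path), "\n"; // <div id="hello">Hi</div>
```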
How to customize the parsing behavior?
// Write a function with parameter "$element"
function my_callback($element) {
    // Hide all <b> tags
    if ($element->tag=='b')
        $element->outertext = '';
}
// Register the callback function by its function name
$html->set_callback('my_callback');
// The callback function will be invoked while dumping
echo $html;
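A callback can also adjust attributes rather than hide elements. The sketch below marks every link while the document is dumped; the function name mark_links, the markup, and the library path are all made up for the example, and it assumes the callback runs on each element before its markup is generated.

```php
<?php
// Assumption: the library file is in the same directory as this script.
require_once __DIR__ . '/simple_html_dom.php';

// Hypothetical callback: add rel="nofollow" to every <a> element.
function mark_links($element) {
    if ($element->tag == 'a') {
        $element->rel = 'nofollow';
    }
}

$html = str_get_html('<div>See the <a href="/docs">docs</a> and <b>notes</b>.</div>');

// Register the callback by its function name.
$html->set_callback('mark_links');

// The callback is invoked for each element while the DOM is dumped.
$output = $html->save();
echo $output, "\n";
```

Keeping the callback free of side effects other than changes to the element it receives makes repeated dumps give the same result.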
Author: S.C. Chen (me578022@gmail.com)
Original idea is from Jose Solorzano's HTML Parser for PHP 4.
Contributions by: Yousuke Kumakura, Vadim Voituk, Antcs
File
simplehtmldom/manual/manual.htm