Basic Usage
The typical use of html5-php is to parse HTML5 into a DOM, or to serialize a DOM back to HTML5.
To create a new HTML5 parser, write:

```php
// Composer autoloader
require 'vendor/autoload.php';

use Masterminds\HTML5;

// Optional default options, e.g. array('encode_entities' => true).
$options = array();
$html5 = new HTML5($options);
```
There are three easy ways to parse HTML5: from a string, from a file, or as a fragment.
```php
// An example HTML document:
$html = <<<'HERE'
<html>
<head>
<title>TEST</title>
</head>
<body id='foo'>
<h1>Hello World</h1>
<p>This is a test of the HTML5 parser.</p>
</body>
</html>
HERE;

// Parse the document. $dom is a DOMDocument.
$dom = $html5->loadHTML($html);
```
`DOMDocument` is the same class that PHP's built-in libxml-based tools return when parsing HTML4, XML, and XHTML.
A file or an open stream resource can be parsed directly, without first reading the markup into a string.
```php
// Parse the document. $dom is a DOMDocument.
$dom = $html5->loadHTMLFile('path/to/file.html');
```
```php
// An example HTML fragment:
$fragment = '<p>This is a test of the HTML5 parser.</p>';

// Parse the fragment. $dom is a DOMDocumentFragment.
$dom = $html5->loadHTMLFragment($fragment);
```
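`DOMDocumentFragment` is similar to `DOMDocument` in that it is a container for nodes. A fragment can be appended to a `DOMDocument`; when that happens, all of the fragment's children are moved into the document and the fragment is left empty. A fragment that belongs to a different document must first be imported with `DOMDocument::importNode()`, otherwise PHP throws a `DOMException` (Wrong Document Error).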
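A minimal sketch of moving fragment content into another document, using only PHP's built-in DOM classes so it runs without html5-php. It assumes (based on our reading of html5-php, not on this page) that the fragment returned by `loadHTMLFragment()` belongs to the parser's own internal document; the `$source` document below stands in for that, and the `<p>` contents are hypothetical. Because a node can only be appended to the document that owns it, the children are copied in with `importNode()`, and the resulting fragment is then appended, which moves its children into place.

```php
<?php
// Stand-in for the fragment html5-php returns: it is owned by a separate document.
$source = new DOMDocument();
$fragment = $source->createDocumentFragment();
$fragment->appendChild($source->createElement('p', 'First'));
$fragment->appendChild($source->createElement('p', 'Second'));

// The document we want to insert the content into.
$target = new DOMDocument();
$body = $target->appendChild($target->createElement('body'));

// Appending $fragment to $target directly would throw a DOMException
// (Wrong Document Error), so copy each child into $target first.
// importNode() copies; the original fragment is left untouched.
$imported = $target->createDocumentFragment();
foreach ($fragment->childNodes as $child) {
    $imported->appendChild($target->importNode($child, true));
}

// Appending a fragment moves all of its children and leaves it empty.
$body->appendChild($imported);

echo $target->saveXML($body), "\n"; // <body><p>First</p><p>Second</p></body>
```

If you no longer need the original fragment, the copy is harmless: `$fragment` still holds its two children and can simply be discarded.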
The serializer can write a `DOMDocument`, `DOMDocumentFragment`, or `DOMNodeList` to a string or to a file.
```php
// $dom is either a DOMDocument, DOMDocumentFragment, or DOMNodeList.

// Serialize to a string.
$string = $html5->saveHTML($dom);

// Write the markup directly to a file instead of returning it.
$html5->save($dom, 'path/to/file.html');
```
HTML5 defines a long list of named character references (entities), far more than typical documents need: it includes entities for ordinary characters such as periods and commas, among thousands of others. An option controls whether the serializer encodes every character on that list or only the basic characters, as `htmlspecialchars` does. By default, only the basic characters are encoded.
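For comparison, a minimal sketch of the "basic" escaping mentioned above, using only PHP's built-in `htmlspecialchars()`. It does not call html5-php, whose own basic escaping may differ in detail; the sample string is arbitrary.

```php
<?php
$text = 'Tom & "Jerry" <3, Inc.';

// htmlspecialchars() escapes only &, <, > and, with ENT_QUOTES, both quote characters.
$escaped = htmlspecialchars($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// The comma and period pass through unchanged, even though HTML5 defines
// named entities (&comma; and &period;) for them.
echo $escaped, "\n"; // Tom &amp; &quot;Jerry&quot; &lt;3, Inc.
```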
To encode all entities by default, set the option when creating the parser:

```php
$html5 = new HTML5(array('encode_entities' => true));
```
To encode all entities for a single call:

```php
// $dom is either a DOMDocument, DOMDocumentFragment, or DOMNodeList.
$string = $html5->saveHTML($dom, array('encode_entities' => true));
```