# Buildsheet autogenerated by ravenadm tool -- Do not edit.

NAMEBASE=		python-html5lib
VERSION=		1.1
KEYWORDS=		python
VARIANTS=		v12 v13
SDESC[v12]=		HTML parser based on WHATWG specification (3.12)
SDESC[v13]=		HTML parser based on WHATWG specification (3.13)
HOMEPAGE=		https://github.com/html5lib/html5lib-python
CONTACT=		Python_Automaton[python@ironwolf.systems]

DOWNLOAD_GROUPS=	main
SITES[main]=		PYPIWHL/6c/dd/a834df6482147d48e225a49515aabc28974ad5a4ca3215c18a882565b028
DISTFILE[1]=		html5lib-1.1-py2.py3-none-any.whl:main
DIST_SUBDIR=		python-src
DF_INDEX=		1
SPKGS[v12]=		single
SPKGS[v13]=		single

OPTIONS_AVAILABLE=	PY312 PY313
OPTIONS_STANDARD=	none
VOPTS[v12]=		PY312=ON PY313=OFF
VOPTS[v13]=		PY312=OFF PY313=ON

DISTNAME=		html5lib-1.1.dist-info

GENERATED=		yes

[PY312].RUN_DEPENDS_ON=			python-six:single:v12
					python-webencodings:single:v12
[PY312].USES_ON=			python:v12,wheel

[PY313].RUN_DEPENDS_ON=			python-six:single:v13
					python-webencodings:single:v13
[PY313].USES_ON=			python:v13,wheel

[FILE:2542:descriptions/desc.single]
html5lib
========

html5lib is a pure-python library for parsing HTML. It is designed to
conform to the WHATWG HTML specification, as implemented by all major
web browsers.

Usage
-----

Simple usage follows this pattern:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      document = html5lib.parse(f)

or:

.. code-block:: python

  import html5lib
  document = html5lib.parse("<p>Hello World!")

By default, the document will be an ``xml.etree`` element instance.
Whenever possible, html5lib chooses the accelerated ElementTree
implementation (i.e. ``xml.etree.cElementTree`` on Python 2.x).

Two other tree types are supported: ``xml.dom.minidom`` and
``lxml.etree``. To use an alternative format, specify the name of
a treebuilder:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      lxml_etree_document = html5lib.parse(f, treebuilder="lxml")

When using html5lib with ``urllib2`` (Python 2), the charset from HTTP
should be passed into html5lib as follows:

.. code-block:: python

  from contextlib import closing
  from urllib2 import urlopen
  import html5lib

  with closing(urlopen("http://example.com/")) as f:
      document = html5lib.parse(f, transport_encoding=f.info().getparam("charset"))

When using html5lib with ``urllib.request`` (Python 3), the charset from
HTTP should be passed into html5lib as follows:

.. code-block:: python

  from urllib.request import urlopen
  import html5lib

  with urlopen("http://example.com/") as f:
      document = html5lib.parse(f, transport_encoding=f.info().get_content_charset())

To have more control over the parser, create a parser object explicitly.
For instance, to make the parser raise exceptions on parse errors, use:

.. code-block:: python

  import html5lib
  with open("mydocument.html", "rb") as f:
      parser = html5lib.HTMLParser(strict=True)
      document = parser.parse(f)

When you're instantiating parser objects explicitly, pass a treebuilder
class as the ``tree`` keyword argument to use an alternative document
format:

.. code-block:: python

  import html5lib
  parser = html5lib.HTMLParser(tree=html5lib.getTreeBuilder("dom"))
  minidom_document = parser.parse("<p>Hello World!")

More documentation is available at https://html5lib.readthedocs.io/.

Installation
------------

html5lib works on CPython 2.7+, CPython 3.5+ and PyPy. To install:

.. code-block:: bash

    $ pip install html5lib

The goal is to support a (non-strict) superset of the versions that pip
supports.

Optional Dependencies

[FILE:123:distinfo]
0d78f8fde1c230e99fe37986a60526d7049ed4bf8a9fadbad5f00e22e58e041d 112173 python-src/html5lib-1.1-py2.py3-none-any.whl