The data files here are derived from:

* `html.html`: from [html](http://github.com/whatwg/html), revision 77db356a293f2b152b648c836b6989d17afe42bb. This is the first 5000 lines of `source`. (This is representative of the input to [Anolis](https://bitbucket.org/ms2ger/anolis/); only the first 5000 lines are used so that it parses in a reasonable time.)
* `wpt`: see `wpt/README.md`.
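
This README does not say how `html.html` was cut from `source`. As a minimal sketch, assuming a checkout of the html repository at the revision given above, a truncation like the following would yield a file of the same shape. The helper name `write_first_lines` and its parameters are hypothetical, not part of this repository.

```python
from itertools import islice
from pathlib import Path


def write_first_lines(src: Path, dst: Path, limit: int = 5000) -> int:
    """Copy at most `limit` lines from src to dst; return the number of lines written.

    Hypothetical helper for illustration only. Files are opened in binary
    mode so line endings and encoding are copied through unchanged.
    """
    written = 0
    with src.open("rb") as fin, dst.open("wb") as fout:
        # islice stops after `limit` lines without reading the rest of the file.
        for line in islice(fin, limit):
            fout.write(line)
            written += 1
    return written


# Example (assumes `source` from the html checkout is in the current directory):
# write_first_lines(Path("source"), Path("html.html"))
```

If `source` has fewer than `limit` lines, the whole file is copied and the returned count reflects that.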