` tag.

```{r load-more-libraries, include = FALSE}
library(xml2)
```

### Example

```{r pressure, echo=FALSE, fig.cap="See [DOM model](https://en.wikipedia.org/wiki/Document_Object_Model) on Wikipedia", out.width = '65%'}
knitr::include_graphics("wikipedia-dom-model.png")
```

### Structure

XML structured data looks like this:

```
#xml prologue .....
```

### XPath

XPath is a syntax for defining and navigating parts of an XML document. XPath expressions can be used across many languages. XPath has over 200 functions according to W3Schools. See the W3Schools page for [XPath syntax](https://www.w3schools.com/xml/xpath_syntax.asp).

## XML Data

All of the functions below are from the `xml2` package; see its help [page](https://xml2.r-lib.org).

```{r read-in-xml, echo = TRUE}
# get W3Schools data
url <- "https://www.w3schools.com/xml/simple.xml"

# read breakfast menu
bkfst <- read_xml(x = url)
str(bkfst)
```

```{r xml-name, echo = TRUE}
# name of the root node
xml_name(bkfst)
```

### See XML Data Structure

```{r xml-structure, echo = TRUE}
xml_structure(bkfst)
```

```{r xml-text, echo = TRUE}
# see text
xml_text(bkfst)
```

### Extract XML

The key is understanding how to set the `xpath` argument.

```{r get-bkfst-name, echo = TRUE}
# select every <name> node in the document
xml_find_all(bkfst, xpath = "//name")
```

```{r get-bkfst-price, echo = TRUE}
# extract the text of each <name> node
xml_text(xml_find_all(bkfst, xpath = "//name"))
```

### Build table

```{r build-bkfst-table, echo = TRUE}
library(tibble)

name <- xml_text(xml_find_all(bkfst, xpath = "//name"))
price <- xml_text(xml_find_all(bkfst, xpath = "//price"))

df <- tibble(name = name, price = price)
df
```

## Lists and XML

Lists and XML are similar and easily converted within the `xml2` package.
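
The conversion runs in both directions. As a minimal sketch, assuming a small hypothetical inline document (the `menu_xml` string and the chunk label below are illustrative, not part of the W3Schools data), `as_list()` turns a parsed document into a nested list and `as_xml_document()` turns that list back into XML:

```{r sketch-xml-to-list, echo = TRUE}
library(xml2)

# Hypothetical inline document, used only for illustration
menu_xml <- read_xml(
  "<menu><food><name>Waffles</name><price>$5.95</price></food></menu>"
)

# XML -> nested R list (the root element becomes the top-level name)
menu_list <- as_list(menu_xml)
menu_list$menu$food$name[[1]]

# nested R list -> XML again
menu_back <- as_xml_document(menu_list)
xml_text(xml_find_first(menu_back, xpath = ".//name"))
```

Text content comes back as a length-one list element, which is why the sketch indexes the name with `[[1]]`.
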
```{r create-nested-list, echo = TRUE}
# build a nested list and convert it to an XML document
document <- list(root = list(parent = list(child = list(1))))
xmlDoc <- xml2::as_xml_document(document)
xml2::xml_structure(xmlDoc)
```

```{r extract-one}
# find the <child> node anywhere below the root
xml_find_all(xmlDoc, xpath = ".//child")
```

## Conclusion

If you have a basic understanding of HTML web pages and XPath syntax, you should be able to parse XML files efficiently with `xml2`.

## Acknowledgements

(Get bibliographic stuff from "archetype hill".)

## References

## Disclaimer

The views, analysis, and conclusions presented within this paper are the author’s alone and not those of any other person, organization, or government entity. While I have made every reasonable effort to ensure that the information in this article is correct, it will nonetheless contain errors, inaccuracies, and inconsistencies. It is a working paper subject to revision without notice as additional information becomes available. Any liability is disclaimed as to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause. The author(s) received no financial support for the research, authorship, and/or publication of this article.

## Reproducibility

```{r reproducibility, echo = FALSE}
## Reproducibility info
options(width = 120)
session_info()
```