# R Markdown ## Source The source for this file is https://raw.githubusercontent.com/cjgeyer/Orientation2018/master/01-rmarkdown.Rmd. To fully understand it you have to compare what you see here (output) to the the source. So open the source in another tab in your browser. ## What is It, and Why do I Want It? ### What is It? R markdown is the latest in a long line of R packages that provide * [literate programming](https://en.wikipedia.org/wiki/Literate_programming) and * [reproducible research](https://en.wikipedia.org/wiki/Literate_programming) using R. It allows you to mix R code that is executed in the production of the document with a document. Of course plain code with comments goes a little way to explaining code, but literate programming is much better. R markdown can be converted to output formats other than HTML. Among these are PDF, Microsoft Word, and e-book formats. Other output formats are [explained in the Rmarkdown documentation](http://rmarkdown.rstudio.com/lesson-9.html). ### Newbie Data Analysis The way most newbies use R or any other statistical package is to dive right in * typing commands into R, * typing commands into a file and cut-and-pasting them into R, or * using RStudio. None of these actually document what was done because commands get edited a lot. If you are in the habit of saving your workspace when you leave R or RStudio, can you explain *exactly* how every R object in there was created? *Starting from raw data?* Probably not. If you work this way, you are never an "expert" even if you have 50 years experience. You are also **not** doing reproducible research. ### Expert Data Analysis The way experts use plain R is to type commands into a file, say `foo.R` and use ```{r engine='bash', eval=FALSE} R CMD BATCH --vanilla foo.R ``` to run R (from the operating system command line) to do the analysis. There are several ways experts use literate programming with R. Type commands with explanations into an R Markdown file, and render it in a clean R environment (empty global environment). Either start R with a clean global environment via ```{r engine='bash', eval=FALSE} R --vanilla ``` and do ```{r eval=FALSE} library(rmarkdown) render("foo.Rmd") ``` or start RStudio with a clean global environment (on the "Tools" menu, select "Global Options" and uncheck "Restore .RData into workspace at startup", then close and restart) load the R Markdown file and click "Knit". Or use an older competitor of R Markdown, such as R function `Sweave` or R package `knitr`. The important thing is using a clean R environment so all calculations are fully reproducible. Same results every time the analysis is rerun by you or by anybody, anywhere, on any computer that has R. That's (the computing part of) reproducible research! ### No Snarf and Barf [Snarf and barf](http://www.catb.org/jargon/html/S/snarf-ampersand-barf.html) is a colorful hacker term for cut and paste. When doing reproducible research you must **never snarf and barf**. It will inevitably get out of date so snarf-and-barfed output does not match the code in the document that purportedly produces it. In short, snarf and barf inevitably leads to lies (inadvertent, but still lies). So don't. With R Markdown (or `Sweave` or `knitr`) you never need to. ## Getting Started You don't need RStudio to use R Markdown (despite it being created by people who are now all RStudio employees). If you have an R Markdown file `baz.Rmd` then ```{r engine='bash', eval=FALSE} Rscript -e 'rmarkdown::render("baz.Rmd", "all")' ``` run from the operating system command line renders it into whatever output formats it says it does and ```{r engine='bash', eval=FALSE} Rscript -e 'rmarkdown::render("baz.Rmd", "html_document")' ``` does a specific format. But for today, we'll use RStudio. * Start RStudio. * On the File menu + select "New File" + then select "R Markdown" - and in the dialog that pops up fill in a title and author - and click the "OK" button You now should have a toy R Markdown document in the upper left panel of the RStudio app. Now * click the button labeled "knit" having a yarn and needles icon * and in the dialog that pops up give it a file name and location for where to save the file **Caution!** The extension of the file must be `Rmd` or `rmd` because RStudio refuses to process it otherwise. * and then the rendered document should show up in the lower right panel or in a pop up We're in business! We have done R Markdown! As we go along we can try things. ## Syntax not Involving R [Section 2.3 of the R Markdown book](https://bookdown.org/yihui/rmarkdown/cheat-sheets.html) recommends the RStudio cheat sheets. [Section 2.5 of the R Markdown book](https://bookdown.org/yihui/rmarkdown/markdown-syntax.html) has its take on the markdown part of R Markdown. Things to try in your toy document * italics * bold face * monospace font (for code) (uses backticks) * hyperlinks * section headings * lists * math (if you already know LaTeX or just want to copy the examples in the R Markdown book --- we aren't going to try to teach LaTeX here) ## Syntax Involving R ### Code Chunks In your toy document RStudio already gives you two code chunks (R that is executed when the document is rendered and the output stuffed in the document). One does a summary and the other a plot. Here's a trivial code chunk ```{r} 2 + 2 ``` Note that the result is not in the document source. Every time the document is rendered, R executes the code producing new results. If you change the code, the results also change (unlike what happens if you snarf-and-barf code and results). ### Plots The toy document RStudio provides shows a plot using R base graphics, here is another using R package `ggplot2`. ```{r fig.align='center', label='histogram', fig.cap='Histogram with probability density function.'} # set.seed(42) # uncomment to always get the same plot # for ggplot all data must be in data frame mydata <- data.frame(x = rnorm(1000)) library(ggplot2) ggplot(mydata, aes(x)) + geom_histogram(aes(y = ..density..), binwidth = 0.5, fill = "cornsilk", color = "black") + stat_function(fun = dnorm, color = "maroon") ``` The analog with base graphics doesn't work as well because it is two commands that don't talk to each other. ```{r fig.align='center', label='histogram-too', fig.cap='Histogram with probability density function (base graphics).'} # for base graphics we don't need data in data frames x <- mydata$x hist(x, probability = TRUE) curve(dnorm(x), add = TRUE) ``` ### Tables The no-snarf-and-barf rule means we have to use R to construct all our tables. To see how to do that, first we need a table. Make up some regression data. ```{r} n <- 50 x <- seq(1, n) a.true <- 3 b.true <- 1.5 y.true <- a.true + b.true * x s.true <- 17.3 y <- y.true + s.true * rnorm(n) ``` And fit some models to it. ```{r} out1 <- lm(y ~ x) out2 <- lm(y ~ x + I(x^2)) out3 <- lm(y ~ x + I(x^2) + I(x^3)) anova(out1, out2, out3) ``` That is a table of sorts. In order to make that a table nicely formatted for the document we are making, first we have to figure out what the output of R function `anova` is and capture it so we can use it. ```{r} foo <- anova(out1, out2, out3) class(foo) ``` It is a data frame, which is the easiest thing to turn into a table, and the simplest way to do that seems to be the `kable` option on our R chunk ```{r kable, echo = FALSE} options(knitr.kable.NA = '') knitr::kable(foo, caption = "ANOVA Table", digits = c(0, 1, 0, 2, 3, 3)) ``` ### In-Line R You can also execute R not in a code chunk but just in-line in your writing. Here is an example of that. First let's get some computation we want to quote. ```{r} summary(out3) ``` The coefficients for x^2^ and x^3^ are not statistically significant, as shown by the ANOVA table. What they actually were was $`r out3$coef[3]`$ and $`r out3$coef[4]`$. Magic! Note that these are coefficients for a regression on random data that is different every time the document is run (although wouldn't be if we uncommented the `set.seed` command). By not using snarf-and-barf we always have the right numbers. Also note that R Markdown (in its wisdom or unwisdom) forces you to know a little bit of LaTeX. When the number needs "scientific notation" R Markdown emits LaTeX here so the number must be enclosed in dollar signs (the LaTeX math marker for in-line math). So if these weren't enclosed in dollar signs nonsense would be printed. ### R Chunk Options We have already illustrated a few R chunk options. Another useful one is `echo = FALSE`. In this paragraph there is such a code chunk, but you don't see it. ```{r echo=FALSE} hide <- 2 * pi ``` That is, you don't see it unless you look at the source. But we can use it in another code chunk. ```{r} hide ``` Any knitr chunk option found at https://yihui.name/knitr/options/ can be used as an R Markdown chunk option (knitr underlies rmarkdown). Another useful option is `cache = TRUE`. That will be illustrated in the next chapter.