A dive into R Markdown

MACS 30500 University of Chicago

Reproducibility in scientific research

R Markdown basics

---
title: "Gun deaths"
date: 2017-02-01
output: html_document
---

```{r setup, include = FALSE}
library(tidyverse)
library(rcfss)

youth <- gun_deaths %>%
  filter(age <= 65)
```

We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

```{r youth-dist, echo = FALSE}
youth %>% 
  ggplot(aes(age)) + 
  geom_freqpoly(binwidth = 1)
```

Major components

  1. A YAML header surrounded by ---s
  2. Chunks of R code surrounded by ```
  3. Text mixed with simple text formatting using the Markdown syntax

Knitting process
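
Knitting happens in two stages: knitr runs every chunk and writes the code and its results into a plain Markdown file, and pandoc then converts that Markdown into the final format. A minimal sketch of the first stage, assuming the knitr package is installed (the temporary file and its contents are purely illustrative):

```r
library(knitr)

# Write a tiny .Rmd file to a temporary location (illustrative content only)
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "Some text.",
  "",
  "```{r add}",
  "1 + 1",
  "```"
), rmd)

# Stage 1: knitr executes the chunks and writes plain Markdown
md <- knit(rmd, output = tempfile(fileext = ".md"), quiet = TRUE)
md_lines <- readLines(md)
cat(md_lines, sep = "\n")

# Stage 2 (not run here): pandoc converts the Markdown to HTML, PDF, etc.
# rmarkdown::render(rmd) performs both stages in one call.
```

The intermediate Markdown contains the original text, the chunk's source, and its printed result.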

Exercise

Code chunks

  • Naming code chunks
  • Code chunk options
  • eval = FALSE
  • include = FALSE
  • echo = FALSE
  • message = FALSE or warning = FALSE
  • results = 'hide'
  • error = TRUE
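
One way to see what these options do is to knit the same chunk under different headers and compare the Markdown that comes out. The sketch below assumes the knitr package is installed; `knit_chunk()`, `shows_code()`, and `shows_result()` are hypothetical helpers written for this illustration.

```r
library(knitr)

# Knit a one-chunk document with the given chunk header and return the
# resulting Markdown text.
knit_chunk <- function(header) {
  knit(text = c(header, "x <- 1 + 1", "x", "```"), quiet = TRUE)
}

default_out <- knit_chunk("```{r}")
no_echo     <- knit_chunk("```{r, echo = FALSE}")
no_eval     <- knit_chunk("```{r, eval = FALSE}")
hidden      <- knit_chunk("```{r, include = FALSE}")

# Does the knitted Markdown contain the source code? The printed result?
shows_code   <- function(out) any(grepl("x <- 1 + 1", out, fixed = TRUE))
shows_result <- function(out) any(grepl("[1] 2", out, fixed = TRUE))

outputs <- list(default = default_out, echo_false = no_echo,
                eval_false = no_eval, include_false = hidden)
sapply(outputs, function(out) c(code = shows_code(out), result = shows_result(out)))
```

With default options both the code and its result appear; echo = FALSE keeps only the result, eval = FALSE keeps only the code, and include = FALSE drops both.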

Caching

  • Saves the results of computationally intensive chunks
  • Reuses them on later knits until the chunk's code changes
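
The idea behind caching can be sketched in a few lines of base R. This is a toy illustration, not knitr's implementation: `run_cached()` is a hypothetical function that stores a result on disk and reuses it only while the code text stays the same.

```r
# Toy cache directory (temporary, for illustration only)
cache_dir <- tempfile("cache-")
dir.create(cache_dir)

# Evaluate code_text, or reuse the stored value if the same code was run before
run_cached <- function(label, code_text) {
  path <- file.path(cache_dir, paste0(label, ".rds"))
  if (file.exists(path)) {
    cached <- readRDS(path)
    if (identical(cached$code, code_text)) {
      return(list(value = cached$value, from_cache = TRUE))
    }
  }
  value <- eval(parse(text = code_text))
  saveRDS(list(code = code_text, value = value), path)
  list(value = value, from_cache = FALSE)
}

first  <- run_cached("slow-sum", "sum(sqrt(1:1e6))")  # computed
second <- run_cached("slow-sum", "sum(sqrt(1:1e6))")  # same code: reused
third  <- run_cached("slow-sum", "sum(sqrt(1:1e5))")  # code changed: recomputed

c(first = first$from_cache, second = second$from_cache, third = third$from_cache)
```

Note what the toy cache does not check: anything the code depends on, such as objects created earlier or files on disk. That gap is why dependencies between chunks need to be declared.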

Dependencies

```{r raw_data}
rawdata <- readr::read_csv("a_very_large_file.csv")
```

```{r processed_data, cache = TRUE}
processed_data <- rawdata %>% 
  filter(!is.na(import_var)) %>% 
  mutate(new_variable = complicated_transformation(x, y, z))
```

Dependencies

```{r raw_data}
rawdata <- readr::read_csv("a_very_large_file.csv")
```

```{r processed_data, cache = TRUE, dependson = "raw_data"}
processed_data <- rawdata %>% 
  filter(!is.na(import_var)) %>% 
  mutate(new_variable = complicated_transformation(x, y, z))
```
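
dependson ties processed_data to the raw_data chunk, but knitr tracks chunk code, not the contents of a_very_large_file.csv, so edits to the file itself can go unnoticed. One commonly suggested workaround is to pass a value that changes with the file through a chunk option such as cache.extra, for example cache.extra = tools::md5sum("a_very_large_file.csv"). The sketch below uses a throwaway temporary file with made-up contents to show that such a fingerprint changes when the file does:

```r
# Throwaway file standing in for the real data (contents are made up)
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,2"), path)
before <- unname(tools::md5sum(path))

# Change the file's contents, as a data update would
writeLines(c("x,y", "1,2", "3,4"), path)
after <- unname(tools::md5sum(path))

# The fingerprints differ, so a chunk option built from one would change too
identical(before, after)
```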

Global options

knitr::opts_chunk$set(
  echo = FALSE
)
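
This call usually sits in a setup chunk at the top of the document, and any single chunk can still override it in its own header (for example with echo = TRUE). A small sketch, assuming knitr is installed, of reading and changing a global default from the console:

```r
library(knitr)

# Remember the current global setting so it can be restored afterwards
old_echo <- opts_chunk$get("echo")

# Hide source code in every chunk that does not say otherwise
opts_chunk$set(echo = FALSE)
new_echo <- opts_chunk$get("echo")
new_echo

# Restore the previous setting
opts_chunk$set(echo = old_echo)
```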

Inline code

We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

We have data about 100798 individuals killed by guns. Only 15687 are older than 65. The distribution of the remainder is shown below:
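
Inline results are inserted exactly as R prints them, which is why the counts above have no thousands separators. A small formatting helper can tidy this up; `comma()` below is a hypothetical name, not something the document defines.

```r
# Hypothetical helper: format a number with thousands separators
comma <- function(x) format(x, big.mark = ",", scientific = FALSE)

comma(100798)
comma(15687)
```

In the text it would be used as `r comma(nrow(gun_deaths))`.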

Exercise: practice chunk options

YAML header

---
title: "Gun deaths"
author: "Benjamin Soltoff"
date: 2017-02-01
output: html_document
---
  • YAML Ain't Markup Language (originally "Yet Another Markup Language")
  • Standardized format for storing hierarchical data in a human-readable syntax
  • Defines how rmarkdown renders your .Rmd file

HTML document

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output: html_document
---

Table of contents

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    toc: true
    toc_depth: 2
---

Appearance and style

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    theme: readable
    highlight: pygments
---
  • theme specifies the Bootstrap theme to use for the page
  • highlight specifies the syntax highlighting style for code chunks

Code folding

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    code_folding: hide
---

Keeping Markdown

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    keep_md: true
---
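
The html_document options used in these headers (toc, toc_depth, theme, highlight, code_folding, keep_md) correspond to arguments of rmarkdown::html_document(), so the same configuration can be built in R. A sketch, assuming the rmarkdown package is installed; the render() call is left commented out because it needs a real file and pandoc:

```r
library(rmarkdown)

# Build the output format in code instead of in the YAML header
fmt <- html_document(
  toc = TRUE, toc_depth = 2,
  theme = "readable", highlight = "pygments",
  code_folding = "hide", keep_md = TRUE
)
class(fmt)

# Not run: pass the format object to render()
# render("my-document.Rmd", output_format = fmt)
```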

Exercise: test HTML options

PDF document

---
title: "Gun deaths"
date: 2017-02-01
output: pdf_document
---
  • Renders as PDF using \(\LaTeX\)

Table of contents

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  pdf_document:
    toc: true
    toc_depth: 2
---

Syntax highlighting

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  pdf_document:
    highlight: pygments
---

\(\LaTeX\) options

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output: pdf_document
fontsize: 11pt
geometry: margin=1in
---

Keep intermediate TeX

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  pdf_document:
    keep_tex: true
---

Exercise: test PDF options

Presentations

Exercise: build a presentation

Multiple formats

output:
  html_document:
    toc: true
    toc_float: true
  pdf_document: default

Rendering multiple outputs programmatically

rmarkdown::render("my-document.Rmd",
                  output_format = "all")
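
A self-contained sketch of the same call, assuming rmarkdown is installed. It declares html_document and md_document rather than pdf_document so that it does not need a LaTeX installation, and it skips rendering when pandoc is not available; the document contents are illustrative.

```r
library(rmarkdown)

# Temporary document declaring two output formats
rmd <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Demo"',
  "output:",
  "  html_document: default",
  "  md_document: default",
  "---",
  "",
  "Some text."
), rmd)

# render() needs pandoc; skip quietly if it is not available
outputs <- character()
if (pandoc_available()) {
  outputs <- render(rmd, output_format = "all", quiet = TRUE)
}
basename(outputs)
```

With "all", render() produces one file per format listed in the header and returns their paths.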

Exercise: render in multiple formats

R scripts

# gun-deaths.R
# 2017-02-01
# Examine the distribution of age of victims in gun_deaths


# load packages
library(tidyverse)
library(rcfss)

# keep victims aged 65 or younger
youth <- gun_deaths %>%
  filter(age <= 65)

# number of individuals older than 65 killed
nrow(gun_deaths) - nrow(youth)

# graph the distribution of youth
youth %>% 
  ggplot(aes(age)) + 
  geom_freqpoly(binwidth = 1)

When to use a script

  • For troubleshooting
  • Initial stages of project
  • Building a reproducible pipeline
  • It depends

  • Running scripts interactively
  • Running scripts programmatically
    • source()
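
A minimal sketch of running a script with source(), using a throwaway script whose contents are made up for the illustration:

```r
# Write a small throwaway script (its contents are illustrative only)
script <- tempfile(fileext = ".R")
writeLines(c(
  "ages <- c(12, 30, 70, 81)",
  "n_older <- sum(ages > 65)"
), script)

# Run the script in its own environment so it does not clutter the workspace
env <- new.env()
source(script, local = env)
env$n_older
```

Calling source(script) without the local argument evaluates the script in the global environment, which is what happens when a script is run interactively.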

Running scripts via the shell

Rscript gun-deaths.R
Rscript -e "rmarkdown::render('gun-deaths.Rmd')"

Exercise: execute R scripts