A dive into R Markdown

MACS 30500 University of Chicago

Reproducibility in scientific research

R Markdown basics

---
title: "Gun deaths"
date: 2017-02-01
output: html_document
---

```{r setup, include = FALSE}
library(tidyverse)
library(rcfss)

youth <- gun_deaths %>%
  filter(age <= 65)
```

We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

```{r youth-dist, echo = FALSE}
youth %>% 
  ggplot(aes(age)) + 
  geom_freqpoly(binwidth = 1)
```

Major components

A YAML header surrounded by ---s
Chunks of R code surrounded by ```
Text mixed with simple text formatting using the Markdown syntax

Knitting process

Exercise

Code chunks

Naming code chunks
Code chunk options
eval = FALSE
include = FALSE
echo = FALSE
message = FALSE or warning = FALSE
results = 'hide'
error = TRUE
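The effect of a few of these options can be checked without leaving the console. The following is a minimal sketch, assuming the knitr package is installed; the chunk labels (`shown`, `hidden-code`, `not-run`) and the toy arithmetic are hypothetical. It knits a three-chunk document held in a character vector and prints the resulting Markdown.

```r
library(knitr)

# Build the chunk fence programmatically so this example can itself sit
# inside a fenced block.
fence <- strrep("`", 3)

# A tiny R Markdown document (hypothetical content, for illustration only)
doc <- c(
  paste0(fence, "{r shown}"),                       # defaults: code and output
  "1 + 1",
  fence,
  "",
  paste0(fence, "{r hidden-code, echo = FALSE}"),   # output only
  "2 + 2",
  fence,
  "",
  paste0(fence, "{r not-run, eval = FALSE}"),       # code only, never executed
  "3 + 3",
  fence
)

# knit() accepts the document as text and returns the knitted Markdown
md <- knit(text = doc, quiet = TRUE)
cat(md)
```

In the printed Markdown, the first chunk should appear with both its code and its result, the second with its result only, and the third with its code only.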

Caching

Saving results of computationally intensive chunks
Reuse each time chunk is knitted
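The idea behind caching can be sketched in base R. This is an illustration of the concept only, not how knitr implements `cache = TRUE`; `slow_square()` and `cached_square()` are hypothetical helpers, and the half-second sleep stands in for an expensive computation.

```r
# Hypothetical stand-in for a computationally intensive step
slow_square <- function(x) {
  Sys.sleep(0.5)
  x^2
}

# Where the saved result lives (a temporary file for this sketch)
cache_file <- tempfile(fileext = ".rds")

cached_square <- function(x, path = cache_file) {
  if (file.exists(path)) {
    return(readRDS(path))    # reuse the saved result
  }
  result <- slow_square(x)   # compute once...
  saveRDS(result, path)      # ...and save for next time
  result
}

first  <- cached_square(1:5)  # slow: computes and saves
second <- cached_square(1:5)  # fast: read back from disk
identical(first, second)
```

Note that this sketch never checks whether its input changed, so `cached_square(1:10)` would return the stale result for `1:5`. Stale results of this kind are the problem that the `dependson` option in the Dependencies slides addresses.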

Dependencies

```{r raw_data}
rawdata <- readr::read_csv("a_very_large_file.csv")
```

```{r processed_data, cache = TRUE}
processed_data <- rawdata %>% 
  filter(!is.na(import_var)) %>% 
  mutate(new_variable = complicated_transformation(x, y, z))
```

Dependencies

```{r raw_data}
rawdata <- readr::read_csv("a_very_large_file.csv")
```

```{r processed_data, cache = TRUE, dependson = "raw_data"}
processed_data <- rawdata %>% 
  filter(!is.na(import_var)) %>% 
  mutate(new_variable = complicated_transformation(x, y, z))
```

Global options

knitr::opts_chunk$set(
  echo = FALSE
)
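A quick way to confirm that a global option took effect is to read it back. The sketch below assumes the knitr package is installed; the variable names `old` and `current` are only for illustration. In a real document the `set()` call would normally sit in the setup chunk.

```r
library(knitr)

old <- opts_chunk$get("echo")      # remember the current default

opts_chunk$set(echo = FALSE)       # change the default for all later chunks
current <- opts_chunk$get("echo")  # read the option back
current

opts_chunk$set(echo = old)         # restore, so this sketch leaves no trace
```

Individual chunks can still override the global default by setting `echo = TRUE` in their own header.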

Inline code

We have data about `r nrow(gun_deaths)` individuals killed by guns. Only `r nrow(gun_deaths) - nrow(youth)` are older than 65. The distribution of the remainder is shown below:

We have data about 100798 individuals killed by guns. Only 15687 are older than 65. The distribution of the remainder is shown below:
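Large counts such as these are easier to read with thousands separators. A minimal sketch using base R's `format()` follows; the helper name `comma()` is an assumption for illustration, and the two counts are copied from the rendered sentence above rather than computed from `gun_deaths`.

```r
# Hypothetical helper: format a number with a thousands separator
comma <- function(x) format(x, big.mark = ",")

n_total <- 100798  # value shown in the rendered text above
n_older <- 15687   # value shown in the rendered text above

sentence <- paste0(
  "We have data about ", comma(n_total),
  " individuals killed by guns. Only ", comma(n_older),
  " are older than 65."
)
cat(sentence, "\n")
```

Inside a document the helper would be defined in the setup chunk and called from inline code, for example `r comma(nrow(gun_deaths))`.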

Exercise: practice chunk options

YAML header

---
title: "Gun deaths"
author: "Benjamin Soltoff"
date: 2017-02-01
output: html_document
---

YAML: "YAML Ain't Markup Language" (originally "Yet Another Markup Language")
Standardized format for storing hierarchical data in a human-readable syntax
Defines how rmarkdown renders your .Rmd file
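To see the header as the hierarchical data it is, the front matter can be read back into R as a list. This is a minimal sketch, assuming the rmarkdown package is installed; it writes the example header to a temporary file (the body text is a placeholder) and parses it with `rmarkdown::yaml_front_matter()`.

```r
library(rmarkdown)

# Write a small .Rmd file with the header from this slide
rmd_path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Gun deaths"',
  'author: "Benjamin Soltoff"',
  "date: 2017-02-01",
  "output: html_document",
  "---",
  "",
  "Placeholder body text."
), rmd_path)

# Parse the YAML header into a named list
header <- yaml_front_matter(rmd_path)
str(header)

header$title   # the document title
header$output  # the requested output format
```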

HTML document

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output: html_document
---

Table of contents

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    toc: true
    toc_depth: 2
---

Appearance and style

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    theme: readable
    highlight: pygments
---

theme specifies the Bootstrap theme to use for the page
highlight specifies the syntax highlighting style for code chunks

Code folding

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    code_folding: hide
---

Keeping Markdown

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  html_document:
    keep_md: true
---

Exercise: test HTML options

PDF document

---
title: "Gun deaths"
date: 2017-02-01
output: pdf_document
---

Renders as PDF using \(\LaTeX\)

Table of contents

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  pdf_document:
    toc: true
    toc_depth: 2
---

Syntax highlighting

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  pdf_document:
    highlight: pygments
---

\(\LaTeX\) options

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output: pdf_document
fontsize: 11pt
geometry: margin=1in
---

Keep intermediate TeX

---
title: "Untitled"
author: "Benjamin Soltoff"
date: "February 1, 2017"
output:
  pdf_document:
    keep_tex: true
---

Exercise: test PDF options

Presentations

Exercise: build a presentation

Multiple formats

output:
  html_document:
    toc: true
    toc_float: true
  pdf_document: default

Rendering multiple outputs programmatically

rmarkdown::render("my-document.Rmd",
                  output_format = "all")
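Besides `"all"`, `output_format` also accepts a vector of format names, which is handy when only some formats are wanted. The wrapper below is a sketch: `render_formats()` is a hypothetical helper name, and the call is left commented out because it needs a real `.Rmd` file, pandoc, and (for PDF) a LaTeX installation.

```r
# Hypothetical helper: render one document to several named formats
render_formats <- function(input,
                           formats = c("html_document", "pdf_document")) {
  # render() produces one output file per requested format
  rmarkdown::render(input, output_format = formats)
}

# Usage (uncomment once my-document.Rmd exists):
# render_formats("my-document.Rmd")
# render_formats("my-document.Rmd", formats = "html_document")
```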

Exercise: render in multiple formats

R scripts

# gun-deaths.R
# 2017-02-01
# Examine the distribution of age of victims in gun_deaths


# load packages
library(tidyverse)
library(rcfss)

# keep victims aged 65 or younger
youth <- gun_deaths %>%
  filter(age <= 65)

# number of individuals older than 65 killed
nrow(gun_deaths) - nrow(youth)

# graph the distribution of youth
youth %>% 
  ggplot(aes(age)) + 
  geom_freqpoly(binwidth = 1)

When to use a script

For troubleshooting
Initial stages of project
Building a reproducible pipeline
It depends
Running scripts interactively
Running scripts programmatically
- source()
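As a small illustration of `source()`, the sketch below writes a two-line script to a temporary file and runs it. The script contents and the `ages` vector are made up for the example; sourcing into a separate environment is optional and is done here only to keep the script's objects out of the global workspace.

```r
# Write a tiny script to a temporary file (hypothetical contents)
script_path <- tempfile(fileext = ".R")
writeLines(c(
  "ages <- c(12, 40, 70, 81)",
  "n_older <- sum(ages > 65)"
), script_path)

# Run the script; its objects are created in script_env
script_env <- new.env()
source(script_path, local = script_env)

script_env$n_older  # number of toy ages above 65
```

Calling `source("gun-deaths.R")` with no `local` argument would instead create the script's objects in the global environment.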

Running scripts via the shell

Rscript gun-deaths.R

Rscript -e "rmarkdown::render('gun-deaths.Rmd')"
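A script run with `Rscript` can also read arguments from the command line through `commandArgs()`. The following is a sketch of a hypothetical `render-report.R` that renders whichever file it is given, defaulting to `gun-deaths.Rmd`; the file name and fallback behavior are assumptions for illustration.

```r
# render-report.R (hypothetical file name)
# Usage from the shell: Rscript render-report.R gun-deaths.Rmd

# Keep only the arguments that follow the script name
args <- commandArgs(trailingOnly = TRUE)

# Use the first argument if supplied, otherwise a default input file
input <- if (length(args) >= 1) args[[1]] else "gun-deaths.Rmd"

if (file.exists(input)) {
  rmarkdown::render(input)              # knit the requested document
} else {
  message("File not found: ", input)    # nothing to render
}
```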

Exercise: execute R scripts