---
title: "Causal Inference and Regression"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(knitr)
set.seed(04132021)
```

Causal inference can be characterized as a predictive problem, where the question is

\vfill

Simulate and visualize data with two potential outcomes

\vfill

```{r, echo = FALSE}
n <- 40
y0 <- rnorm(n)
y1 <- y0 + rnorm(n, mean = 2)
```

\vfill

```{r, echo = FALSE}
dat <- tibble(id = factor(rep(1:n, 2)),
              response = c(y0, y1),
              `potential outcome` = rep(c('y0', 'y1'), each = n))

dat %>%
  ggplot(aes(y = response, x = `potential outcome`)) +
  geom_violin(draw_quantiles = c(.025, .5, .975)) +
  geom_jitter(aes(color = id), shape = rep(c(0:19), 4)) +
  theme_bw() +
  theme(legend.position = 'none') +
  ggtitle('Potential Outcomes')
```

\newpage

Then randomly assign treatments to each unit and visualize differences

\vfill

```{r, echo = FALSE}
treatment <- sample(rep(0:1, each = n/2))

sample_dat <- tibble(id = factor(1:n),
                     response = y0,
                     `potential outcome` = rep('y0', n)) %>%
  filter(treatment == 0) %>%
  bind_rows(
    tibble(id = factor(1:n),
           response = y1,
           `potential outcome` = rep('y1', n)) %>%
      filter(treatment == 1)
  )
```

\vfill

```{r, echo = FALSE}
sample_dat %>%
  ggplot(aes(y = response, x = `potential outcome`)) +
  geom_violin(draw_quantiles = c(.025, .5, .975)) +
  geom_jitter(aes(color = id)) +
  theme_bw() +
  theme(legend.position = 'none') +
  ggtitle('Factual Outcomes')
```

\newpage

#### Pre-treatment covariates

Consider a setting with:

\vfill

\vfill

\vfill

\vfill

\vfill
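
As a supplement to the randomized-assignment simulation earlier in these notes, the following is a minimal sketch of how the treatment effect might be estimated from the factual outcomes alone. It uses base R only and re-simulates the data so that it runs on its own. The names `y_obs`, `sate`, `diff_means`, `fit`, and `reg_est` are illustrative and do not come from the notes.

```{r}
# Illustrative sketch (not part of the original notes): re-simulate the
# potential outcomes and the randomized assignment, then estimate the effect.
set.seed(04132021)
n <- 40
y0 <- rnorm(n)                               # potential outcome under control
y1 <- y0 + rnorm(n, mean = 2)                # potential outcome under treatment
treatment <- sample(rep(0:1, each = n / 2))  # randomize half of the units to each arm

# Only one potential outcome is observed for each unit
y_obs <- ifelse(treatment == 1, y1, y0)

# Sample average treatment effect; computable here only because both
# potential outcomes are simulated
sate <- mean(y1 - y0)

# Estimate 1: difference in observed group means
diff_means <- mean(y_obs[treatment == 1]) - mean(y_obs[treatment == 0])

# Estimate 2: regression of the observed response on the treatment indicator
fit <- lm(y_obs ~ treatment)
reg_est <- unname(coef(fit)["treatment"])

c(sate = sate, diff_means = diff_means, regression = reg_est)
```

With a binary treatment indicator and no other predictors, the least squares coefficient on `treatment` equals the difference in group means, so the two estimates agree. Both differ from `sate` only because each unit reveals just one of its two potential outcomes.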