---
title: "Causal Inference and Designed Experiments"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(knitr)
```

In 506 & 506, we have used predictive language when discussing linear models. In particular, we have used phrases such as "the expected difference between a unit with factor y and another unit with factor x," which focus on differences between units rather than differences within a unit.

\vfill

Now we introduce causal language focusing on differences within a unit. 


\vfill

Causal inference is generally focused on a comparison of potential outcomes. 


\vfill

The textbook has a running example of taking fish oil supplements. We will instead consider a clinical trial setting with COVID-19 vaccination, looking at antibody measurements for individuals with and without a vaccination.

\vfill

Let $y_i^0$ be the antibody measurement (IgG) for individual $i$ having received a control and $y_i^1$ be the antibody measurement (IgG) for individual $i$ having received a vaccine.


\vfill

In the experiment, individual $i$ receives either the control or the vaccine. Hence only one potential outcome is observed, which is called the _factual outcome_. The potential outcome that is not observed is the _counterfactual outcome_.

\vfill


\vfill

\newpage

Unfortunately, we cannot observe both $y_i^1$ and $y_i^0$ for the same individual.


\vfill

If treatments are randomly assigned, we can estimate an "average causal effect" across the units, but we cannot say anything about the effect for an individual unit $i$.

\vfill

Another approach is to find a substitute for the unobserved counterfactual. For instance, could a pre-score (a measurement taken before treatment) be used in place of $y_i^0$?


\vfill

An experiment could also be conceived that randomizes the ordering of the assignment such that each unit receives both treatments over the course of the study.

\vfill

The same idea applies to multiple treatment levels, continuous treatments, and so on.

\vfill

Again, we cannot estimate $\tau_i = y_i^1 - y_i^0$, but we can estimate the sample average treatment effect (SATE), $\tau_{SATE} = \frac{1}{n} \sum_{i=1}^n (y_i^1 - y_i^0)$.
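
As a rough illustration of the SATE, the sketch below simulates both potential outcomes for every unit, which is never possible with real data. The sample size, the outcome distributions, and the names `y0`, `y1`, `z`, and `y_obs` are assumptions made for this example only.

```{r}
# Hypothetical simulated data: in practice only one potential outcome is seen.
set.seed(1)
n  <- 1000
y0 <- rnorm(n, mean = 10, sd = 2)        # potential outcome under control
y1 <- y0 + rnorm(n, mean = 5, sd = 1)    # potential outcome under vaccine
sate <- mean(y1 - y0)                    # needs both potential outcomes

# Completely randomized assignment: exactly half of the units are treated.
z <- sample(rep(c(0, 1), each = n / 2))
y_obs <- ifelse(z == 1, y1, y0)          # factual outcome for each unit

# The difference in observed group means is an estimate of the SATE.
est <- mean(y_obs[z == 1]) - mean(y_obs[z == 0])
c(SATE = sate, estimate = est)
```

The estimate uses only the factual outcomes, yet under random assignment it should land close to the SATE computed from the full set of potential outcomes.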


\vfill


\vfill

With statistics, the interest is rarely just the sample itself, but rather a broader population. Hence the target is often the population average treatment effect (PATE).


\newpage

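By definition, $\tau_{SATE} = \frac{1}{n} \sum_{i=1}^n y_i^1 - \frac{1}{n} \sum_{i=1}^n y_i^0$. If the treatment and control groups are not similar, the difference in observed group means can be a biased estimate of this quantity.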
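
To see why dissimilar groups are a problem, the following sketch assigns treatment in a deliberately non-random way. The selection rule (treating units with above-median control outcomes) and the constant effect of 5 are assumptions chosen only to make the bias visible.

```{r}
# Hypothetical potential outcomes with a constant treatment effect of 5.
set.seed(5)
n  <- 1000
y0 <- rnorm(n, mean = 10, sd = 2)
y1 <- y0 + 5
sate <- mean(y1 - y0)

# Assumed non-random assignment: units with above-median y0 are treated,
# so the treated group differs systematically from the control group.
z_selected <- as.integer(y0 > median(y0))
est_selected <- mean(y1[z_selected == 1]) - mean(y0[z_selected == 0])

c(SATE = sate, estimate = est_selected)
```

Here the difference in observed means mixes the treatment effect with the pre-existing difference between the groups, so it overstates the SATE.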

\vfill

### Randomized experiments

Randomization can ensure treatment and control groups are balanced, on average (in expectation).

\vfill

In a completely randomized experiment, the probability of being assigned any given treatment is the same for all units.

\vfill

Note that a completely randomized experiment does not guarantee a balanced sample for any particular realization.

```{r}
set.seed(10)
# Randomly choose 4 of the 8 units (4 blue, 4 gold) to receive the treatment.
tibble(factor = sample(rep(c('blue','gold'), each = 4), 4)) %>% 
  bind_cols(tibble(treatment = rep('treat', 4))) %>%
  kable()
```
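
The chance of an unbalanced allocation can be worked out exactly for a case this small. The sketch below enumerates every way of choosing 4 treated units from 8, assuming units 1 to 4 are 'blue' and units 5 to 8 are 'gold'. The names `colour`, `allocations`, and `n_blue` are introduced only for this illustration.

```{r}
# Units 1-4 are 'blue' and units 5-8 are 'gold'; 4 of the 8 are treated.
colour <- rep(c('blue', 'gold'), each = 4)
allocations <- combn(8, 4)               # each column is one possible treated set

# Number of blue units among the treated, for each possible allocation.
n_blue <- apply(allocations, 2, function(idx) sum(colour[idx] == 'blue'))

mean(n_blue)                             # average number of blue units treated
table(n_blue) / ncol(allocations)        # share of allocations for each count
```

Averaged over all allocations, half of the treated units are blue, which is the sense in which randomization balances groups in expectation. Individual allocations can still be badly unbalanced, including the case where every treated unit is blue.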

\vfill

#### Design Notation

An _unbiased estimate_ is correct, on average. In other words, the mean of the sampling distribution is equal to the estimand. 

\vfill

The sampling distribution of an _efficient_ estimate has small variance.

\vfill

Similarly, unbiasedness and efficiency can be assessed using the randomization distribution of the estimate (based on repeated allocation of treatments).
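
A randomization distribution can be approximated by simulation. The sketch below holds a hypothetical set of potential outcomes fixed and reallocates treatments many times. The constant effect of 5, the number of units, and the helper `diff_in_means()` are assumptions for illustration.

```{r}
# Fixed, hypothetical potential outcomes for 20 units.
set.seed(2)
n  <- 20
y0 <- rnorm(n, mean = 10, sd = 2)
y1 <- y0 + 5                             # assumed constant effect of 5
sate <- mean(y1 - y0)

# Difference in observed means for a given allocation z (1 = treated).
diff_in_means <- function(z) mean(y1[z == 1]) - mean(y0[z == 0])

# Repeat the completely randomized allocation many times.
estimates <- replicate(5000, diff_in_means(sample(rep(c(0, 1), each = n / 2))))

c(SATE = sate, mean_estimate = mean(estimates), sd_estimate = sd(estimates))
```

The mean of the simulated estimates sitting near the SATE reflects unbiasedness, while their standard deviation describes how efficient the estimate is under this design.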

\vfill

\newpage

If there are other observable factors that would be expected to result in different outcomes, these can be used in the experimental design. Recall the completely randomized design that resulted in all of the "blue" units being assigned the treatment.

```{r}
set.seed(10)
# Same allocation as before: 4 of the 8 units are chosen for treatment.
tibble(factor = sample(rep(c('blue','gold'), each = 4), 4)) %>% 
  bind_cols(tibble(treatment = rep('treat', 4))) %>%
  kable()
```

\vfill

A randomized block design allocates treatments and controls within each group of similar units.

```{r}
# Within each block, randomly order two treatments and two controls.
tibble(block = rep(c('blue','gold'), each = 4), 
       level = c(sample(rep(c('treat','control'), each = 2)),
                 sample(rep(c('treat','control'), each = 2)))) %>% 
  kable()
```

\vfill


\vfill

Blocks are defined by pre-treatment variables. Blocks can be defined by anything that might be expected to be predictive of the outcome.

\vfill

Other experimental design structures include

\vfill


\vfill


\vfill

Ideally any information about differences in units would be accounted for in the design phase, but it can also be included in the analysis.

\newpage

#### Ignorability

With ignorability, the assignment of a treatment is independent of the potential outcomes. With a completely randomized design, this can be written as $z \perp y^0, y^1$, where $z$ denotes the treatment assignment.


\vfill


\vfill

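Randomized block designs rely on conditional ignorability: within each block, treatment assignment is independent of the potential outcomes, $z \perp y^0, y^1 \mid w$, where $z$ denotes the treatment assignment and $w$ the blocking variable.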


\vfill

#### Efficiency

Efficiency is a measure of the variability in the estimator: a more efficient estimator has smaller variance.

\vfill

Using blocking variables can result in a more efficient estimator if the units within the blocks are similar, but the blocks are different.
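
The gain from blocking can be illustrated by comparing randomization distributions under the two designs. In the sketch below, the potential outcomes are hypothetical: the two blocks differ by 10 units on average, while units within a block differ only by small noise. The helpers `estimate()`, `complete_z()`, and `blocked_z()` are names invented for this example.

```{r}
set.seed(3)
block <- rep(c('blue', 'gold'), each = 4)
# Assumed outcomes: large differences between blocks, small ones within.
y0 <- ifelse(block == 'blue', 10, 20) + rnorm(8, sd = 1)
y1 <- y0 + 5

# Difference in observed means for a given allocation z (1 = treated).
estimate <- function(z) mean(y1[z == 1]) - mean(y0[z == 0])

# Completely randomized: any 4 of the 8 units are treated.
complete_z <- function() sample(rep(c(0, 1), each = 4))
# Randomized block: 2 of the 4 units are treated within each block.
blocked_z  <- function() c(sample(rep(c(0, 1), each = 2)),
                           sample(rep(c(0, 1), each = 2)))

sd_complete <- sd(replicate(5000, estimate(complete_z())))
sd_blocked  <- sd(replicate(5000, estimate(blocked_z())))
c(complete = sd_complete, blocked = sd_blocked)
```

Because every blocked allocation treats the same number of units in each block, the large between-block difference never contaminates the comparison, and the estimate varies much less across allocations.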

\vfill

Regression methods can also be used to account for pre-treatment variables and result in a more efficient estimator.
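
As a minimal sketch of regression adjustment, the example below simulates a completely randomized experiment with one pre-treatment covariate and compares the standard error of the treatment coefficient with and without that covariate. The data-generating model, the true effect of 5, and the covariate `x` are all assumptions for illustration.

```{r}
set.seed(4)
n <- 200
x <- rnorm(n)                                  # hypothetical pre-treatment covariate
z <- sample(rep(c(0, 1), each = n / 2))        # completely randomized assignment
y <- 10 + 5 * z + 3 * x + rnorm(n)             # assumed true treatment effect of 5

fit_unadjusted <- lm(y ~ z)                    # difference in means
fit_adjusted   <- lm(y ~ z + x)                # adjusts for the covariate

# Standard error of the treatment coefficient under each model.
se_unadjusted <- summary(fit_unadjusted)$coefficients['z', 'Std. Error']
se_adjusted   <- summary(fit_adjusted)$coefficients['z', 'Std. Error']
c(unadjusted = se_unadjusted, adjusted = se_adjusted)
```

When the covariate is strongly predictive of the outcome, as assumed here, including it removes much of the residual variation and the treatment effect is estimated more precisely.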