---
title: "Causal Inference and Regression"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
library(knitr)
set.seed(04152021)
```

While adjusting for pre-treatment variables is advised, post treatment variables should not be treated in the same fashion.

\vfill

Recall the idea of ignorability, where $y^0, y^i \perp z$, then under randomization there will be no differences, on average, 

$$y = \tau z + \epsilon$$ 


\vfill

Conditional ignorability also holds on a pretreatment variable, $x$ such that $y^0, y^i \perp z | x$ then under randomization there will be no differences, on average, in the distribution of potential outcomes is the same across levels of the treatment after controlling for x.


\vfill

However for a post-treatment variable $q$, we cannot, in general, state that $y^0, y^i \perp z | x, q$. The issue here is that q can be influenced by the treatment. The result is that 

$$y = \tau^* z + x \beta^* + \delta q + \epsilon$$ 

\vfill

\newpage

The variable $q$ is often referred to as an intermediate variables. The issue is when the intermediate variables is influenced by the treatment.

\vfill

Hence, estimating the treatment effects and potential outcomes in y, conditional on $q$, would require accounting for both potential outcomes in $q$.

\vfill

ROS states "randomized experiments are a black box approach to causal inference. We see what goes in (treatments) and see what comes out (outcomes), and we can make inferences about the relationships between these inputs and outputs."

\vfill

Note that post treatment "mediating variables" induce challenges in interpreting the "causal paths" and require more thought than using regression for intermediate outcomes.

\vfill

\newpage

### Observational studies and causal inference

We have looked at causal inference through the lens of randomized, designed experiments. Designed experiments, and ignorability in treatment assignment (based on potential outcomes), enabled estimates of average treatment effects.

\vfill

Unfortunately, random treatment assignment is not always possible.

\vfill

ROS describes an observational study to be the opposite of a designed (randomized) experiment.

\vfill

With an observational study, under this definition, there may or may not be a direct manipulation of the treatment.

\vfill

Generally, it is not reasonable to consider the treatment assignment as random across the groups. 

\vfill


\vfill


\vfill

\newpage

Selection bias, where units receive a treatment or control based on some non-randomized mechanism, is a major issue for causal inference in observational studies. Often treatment assignment is confounded with other information, which presents challenges in estimating treatment effects.

\vfill

If outcomes were compared, conditional on the confounding variable, then we _could_ make causal claims.

\vfill


\vfill


\vfill


\vfill

\newpage

Failing to account for lurking variables results in biased estimates of the treatment effect.

\vfill

Consider the following "true" model:

$$y_i = \beta_0 + \beta_1 z_i + \beta_2 x_i + \epsilon_i$$

if $x_i$ is not included in the model, but should be,


\vfill

\vfill

Then the original model can be rewritten as

$$y_i = \beta_0 + \beta_2 \gamma_0 + (\beta_1 + \beta_2 \gamma_1) z_i + \epsilon_i + \beta_2 \nu_i$$
where $\beta_1^* = \beta_1 + \beta_2 \gamma_1$.

\vfill

In omitting the lurking variable, we hope to estimate $\beta_1$, but instead estimate $\beta_1^*$, which is biased unless:

\vfill

\vfill