--- title: "Estimating Estimands with Estimators" author: "Jake Bowers" date: "`r format(Sys.time(), '%d %B %Y')`" output: beamer_presentation: keep_tex: yes slide_level: 2 toc: yes revealjs::revealjs_presentation: center: no fig_caption: yes highlight: pygments pandoc_args: --toc reveal_options: chalkboard: theme: whiteboard toggleNotesButton: no previewLinks: yes slideNumber: yes reveal_plugins: - notes - search - chalkboard self_contained: no smart: no theme: default transition: fade bibliography: ../learningdays-book.bib header-includes: | \setbeamertemplate{footline}{\begin{beamercolorbox}{section in head/foot} \includegraphics[height=.5cm]{../Images/egap-logo.png} \hfill \insertframenumber/\inserttotalframenumber \end{beamercolorbox}} \usepackage{makecell} \usepackage{tikz} \usepackage{tikz-cd} \usetikzlibrary{arrows,automata,positioning,trees,babel} \usepackage{textpos} \usepackage{booktabs,multirow} link-citations: yes colorlinks: yes biblio-style: apalike --- ```{r setup, include=FALSE} source("rmd_setup.R") # Load all the libraries we need library(here) library(tidyverse) library(kableExtra) library(DeclareDesign) library(estimatr) library(styler) ``` # Key points ## Key points about estimation I - A causal effect, $\tau_i$, is a comparison of unobserved potential outcomes for each unit $i$: examples $\tau_{i} = Y_{i}(T_{i}=1) - Y_{i}(T_{i}=0)$ or $\tau_{i} = \frac{Y_{i}(T_{i}=1)}{ Y_{i}(T_{i}=0)}$. - To learn about $\tau_{i}$, we can treat $\tau_{i}$ as an **estimand** or target quantity to be estimated (discussed here) or as a target quantity to be hypothesized about (session on hypothesis testing). - Many focus on the **average treatment effect (ATE)**, $\bar{\tau}=\sum_{i=1}^n\tau_{i}$, in part, because it is easy to **estimate** and its variability from experiment to experiment is easy to calculate. ## Key points about estimation II - The key to estimation for causal inference is to choose an estimand that helps you learn about your theoretical or policy question. So, one could use the ATE but other common estimands include the ITT, LATE/CACE, ATT, or ATE for some subgroup (or even a difference of causal effects between groups) or even quantile effect. - An **estimator** is a recipe for calculating a guess about the value of an estimand. For example, the difference of observed means for $m$ treated units is one estimator of $\bar{\tau}$: $\hat{\bar{\tau}} = \frac{\sum_{i=1}^n (T_i Y_i)}{m} - \frac{\sum_{i=1}^n ( ( 1 - T_i)Y_i)}{(n-m)}$. ## Key points about estimation III - The **standard error** of an estimator in a randomized experiment summarizes how the estimates would vary if the experiment were repeated. Different randomizations will produce different values of the same estimator targeting the same estimand. A **standard error** summarizes this variability in an estimator. - We use the **standard error** and assumptions about the distribution of a test statistic to produce **confidence intervals** and **p-values**: so that we can begin with an estimator, treat it as a test statistic, and end at a hypothesis test. - A $100(1-\alpha)$% **confidence interval** is a collection of hypotheses that cannot be rejected at the $\alpha$ level. We tend to report confidence intervals containing hypotheses about values of our estimand and use our estimator as a test statistic: at least this is what most regression tables report. 
## Key points about estimation IV - Estimators should: - avoid systematic error in their guessing of the estimand (be unbiased); - vary little in their guesses from experiment to experiment (be precise or efficient); and - perhaps ideally converge to the estimand as they use more and more information (be consistent).
## Key points about estimation V - **Analyze as you randomize** in the context of estimation means that (1) our standard errors should measure variability arising from randomization (not necessarily variability arising from sampling unless we know how the sample was drawn from the population) and (2) our estimators should target estimands defined in terms of potential outcomes. - We do not **control for** background covariates when we analyze data from randomized experiments. But covariates can make our estimation more **precise**. This is called **covariance adjustment** (or covariate adjustment). **Covariance adjustment** in randomized experiments differs from covariance adjustment in observational studies.
# Review
## Review: Causal effects Causal inference refers to a comparison of unobserved, fixed potential outcomes. For example: - the potential, or possible, outcome for unit $i$ when assigned to treatment, $T_i=1$, is $Y_{i}(T_{i}=1)$. - the potential, or possible, outcome for unit $i$ when assigned to control, $T_i=0$, is $Y_{i}(T_{i}=0)$. Treatment assignment, $T_i$, has a causal effect on unit $i$, which we call $\tau_i$, if $Y_{i}(T_{i}=1) - Y_{i}(T_{i}=0) \ne 0$ or $Y_{i}(T_{i}=1) \ne Y_{i}(T_{i}=0)$.
# Estimands and estimators and averages
## How can we learn about causal effects from observed data? 1. Recall: we can **test hypotheses** about the pair of potential outcomes $\{ Y_{i}(T_{i}=1), Y_{i}(T_{i}=0) \}$. 2. We can **define estimands** in terms of $\{ Y_{i}(T_{i}=1), Y_{i}(T_{i}=0) \}$ or $\tau_i$, **develop estimators** for those estimands, and then calculate **estimates** and **standard errors** for those estimators.
## A common estimand and estimator: The average treatment effect and the difference of means Say we are interested in the ATE, or $\bar{\tau}=\frac{1}{n}\sum_{i=1}^n \tau_{i}$. What is a good estimator? Two candidates: 1. The difference of means: $\hat{\bar{\tau}} = \frac{\sum_{i=1}^n (T_i Y_i)}{m} - \frac{\sum_{i=1}^n ( ( 1 - T_i) Y_i)}{n-m}$. 2. A difference of means after top-coding the highest $Y_i$ observation (a kind of "winsorized" mean to prevent extreme values from exerting too much influence over our estimator --- to increase *precision*). How would we know which estimator is best for our particular research design? Let's simulate!
## Simulation Step 1: create some data with a known ATE Notice that we need to *know* the potential outcomes and the treatment assignment in order to learn whether our proposed estimator does a good job.
```{r echo=FALSE} ## First, create some data, ## y0 is potential outcome to control N <- 10 y0 <- c(0, 0, 0, 1, 1, 3, 4, 5, 190, 200) ## Each unit has its own treatment effect tau <- c(10, 30, 200, 90, 10, 20, 30, 40, 90, 20) ## y1 is potential outcome to treatment y1 <- y0 + tau ## Z is treatment assignment (note we're using Z instead of T) set.seed(12345) block <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b") Z <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1) ## Y is observed outcomes Y <- Z * y1 + (1 - Z) * y0 ## The data dat <- data.frame(Z = Z, y0 = y0, y1 = y1, tau = tau, b = block, Y = Y) ## dat <- dat[,c("Z","y0","y1")] ``` \begin{center} ```{r} kableExtra::kable(dat[, c("Z", "y0", "y1")]) ``` \end{center} ```{r ate, echo=FALSE, results="markup", message=TRUE} ATE <- with(dat, mean(y1 - y0)) message("The true ATE is ", ATE) ``` In reality, we would observe only one of the potential outcomes. Note that each unit has its own treatment effect. ## First make fake data The table in the previous slide was generated in R with: ```{r echo=TRUE} # We have ten units N <- 10 # y0 is potential outcome to control y0 <- c(0, 0, 0, 1, 1, 3, 4, 5, 190, 200) # Each unit has its own treatment effect tau <- c(10, 30, 200, 90, 10, 20, 30, 40, 90, 20) # y1 is potential outcome to treatment y1 <- y0 + tau # Two blocks, a and b block <- c("a", "a", "a", "a", "a", "a", "b", "b", "b", "b") # Z is treatment assignment (Z instead of T in the code) Z <- c(0, 0, 0, 0, 1, 1, 0, 0, 1, 1) # Y is observed outcomes Y <- Z * y1 + (1 - Z) * y0 # The data dat <- data.frame(Z = Z, y0 = y0, y1 = y1, tau = tau, b = block, Y = Y) set.seed(12345) ``` ## Using DeclareDesign DeclareDesign represents research designs in a few steps shown below: ```{r dd1, echo=TRUE} # take just the potential outcomes under treatment and control from our # fake data small_dat <- dat[, c("y0", "y1")] # DeclareDesign first asks you to declare your population pop <- declare_population(small_dat) N <- nrow(small_dat) # 2 units assigned to treatment; default is simple random assignment with # probability 0.5 trt_assign <- declare_assignment(Z = conduct_ra(N = N, m = 2), legacy = FALSE) # observed Y is y1 if Z=1 and y0 if Z=0 pot_out <- declare_potential_outcomes(Y ~ Z * y1 + (1 - Z) * y0) # specify outcome and assignment variables reveal <- declare_reveal(Y, Z) # the basic research design object includes these four objects base_design <- pop + trt_assign + pot_out + reveal ``` ## Using DeclareDesign: make fake data DeclareDesign renames `y0` and `y1` by default to `Y_Z_0` and `Y_Z_1`: ```{r echo=TRUE} ## A simulation is one random assignment of treatment sim_dat1 <- draw_data(base_design) ## Simulated data (just the first 6 lines) head(sim_dat1) ``` ## Using DeclareDesign: define estimand and estimators No output here. Just define functions and estimators and one estimand. 
```{r dd2, echo=TRUE} ## The estimand estimandATE <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) ## The first estimator is difference-in-means diff_means <- declare_estimator(Y ~ Z, inquiry = estimandATE, model = lm_robust, se_type = "classical", label = "Diff-Means/OLS" ) ``` ## Using DeclareDesign: define estimand and estimators ```{r dd2a, echo=TRUE} ## The second estimator is top-coded difference-in-means diff_means_topcoded_fn <- function(data) { data$rankY <- rank(data$Y) ## Code the maximum value of Y as the second to maximum value of Y data$newY <- with( data, ifelse(rankY == max(rankY), Y[rankY == (max(rankY) - 1)], Y) ) obj <- lm_robust(newY ~ Z, data = data, se_type = "classical") res <- tidy(obj) %>% filter(term == "Z") return(res) } diff_means_topcoded <- declare_estimator( handler = label_estimator(diff_means_topcoded_fn), inquiry = estimandATE, label = "Top-coded Diff Means" ) ``` ## Using DeclareDesign: define estimand and estimators Here we show how the DD estimators work using our simulated data. ```{r dd3, echo=TRUE} ## Demonstrate that the estimand works: estimandATE(sim_dat1) ## Demonstrate that the estimators estimate ## Estimator 1 (difference in means) diff_means(sim_dat1)[-c(1, 2, 10, 11)] ## Estimator 2 (top-coded difference in means) diff_means_topcoded(sim_dat1)[-c(1, 2, 10, 11)] ``` ## Then simulate with one randomization Recall the true ATE: ```{r trueATE, echo=TRUE} trueATE <- with(sim_dat1, mean(y1 - y0)) with(sim_dat1, mean(Y_Z_1 - Y_Z_0)) ``` In one experiment (one simulation of the data) here are the simple estimates: ```{r echo=TRUE} ## Two ways to calculate the difference of means estimator est_diff_means_1 <- with(sim_dat1, mean(Y[Z == 1]) - mean(Y[Z == 0])) est_diff_means_2 <- coef(lm_robust(Y ~ Z, data = sim_dat1, se = "classical" ))[["Z"]] c(est_diff_means_1, est_diff_means_2) ``` ## Then simulate with one randomization In one experiment (one simulation of the data) here are the estimates after top-coding: ```{r echo=TRUE} ## Two ways to calculate the topcoded difference of means estimator sim_dat1$rankY <- rank(sim_dat1$Y) sim_dat1$Y_tc <- with(sim_dat1, ifelse(rankY == max(rankY), Y[rankY == (max(rankY) - 1)], Y )) est_topcoded_1 <- with(sim_dat1, mean(Y_tc[Z == 1]) - mean(Y_tc[Z == 0])) est_topcoded_2 <- coef(lm_robust(Y_tc ~ Z, data = sim_dat1, se = "classical" ))[["Z"]] c(est_topcoded_1, est_topcoded_2) ``` ## Then simulate a different randomization and estimate the ATE with the same estimators Now calculate your estimate with the same estimators using a **different** randomization. Notice that the answers differ. The estimators are estimating the *same estimand* but now they have a different randomization to work with. ```{r echo=TRUE} # do another random assignment of the treatment in DeclareDesign # this produces a new simulated dataset with a different random assignment sim_dat2 <- draw_data(base_design) # the first estimator (difference in means) coef(lm_robust(Y ~ Z, data = sim_dat2, se = "classical"))[["Z"]] # the second estimator (top-coded difference in means) sim_dat2$rankY <- rank(sim_dat2$Y) sim_dat2$Y_tc <- with(sim_dat2, ifelse(rankY == max(rankY), Y[rankY == (max(rankY) - 1)], Y )) coef(lm_robust(Y_tc ~ Z, data = sim_dat2, se = "classical"))[["Z"]] ``` ## How do our estimators behave in general for this design? Our estimates vary across randomizations. Do our two estimators vary in the same ways? 
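Conceptually, all we need to do is re-assign treatment many times and re-apply each estimator to each re-assignment. Here is a minimal by-hand sketch of that idea, added for intuition only (it assumes the `dat` object built earlier and mimics the complete random assignment of 2 treated units declared above); DeclareDesign then does the same thing more systematically below.

```{r byhand_sims, echo=TRUE}
## A by-hand sketch (not part of the original DeclareDesign workflow):
## re-randomize, reveal outcomes, and apply both estimators each time.
one_sim <- function() {
  Z_new <- sample(rep(c(1, 0), c(2, nrow(dat) - 2))) # complete random assignment, m = 2
  Y_new <- with(dat, Z_new * y1 + (1 - Z_new) * y0) # reveal the observed outcome
  est1 <- mean(Y_new[Z_new == 1]) - mean(Y_new[Z_new == 0])
  ## top-code the largest observed outcome at the second-largest value
  Y_tc <- ifelse(rank(Y_new) == max(rank(Y_new)), sort(Y_new, decreasing = TRUE)[2], Y_new)
  est2 <- mean(Y_tc[Z_new == 1]) - mean(Y_tc[Z_new == 0])
  c(diff_means = est1, diff_means_topcoded = est2)
}
set.seed(54321)
byhand_sims <- replicate(100, one_sim())
## Average estimate from each estimator over 100 re-randomizations, next to the true ATE
round(c(rowMeans(byhand_sims), trueATE = mean(dat$tau)), 1)
```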
```{r diagnose, echo=TRUE, cache=TRUE} ## Combine into one DeclareDesign design object ## This has the base design, estimand, then our two estimators design_plus_ests <- base_design + estimandATE + diff_means + diff_means_topcoded ## Run 100 simulations (reassignments of treatment) and ## apply the two estimators (diff_means and diff_means_topcoded) diagnosis1 <- diagnose_design(design_plus_ests, bootstrap_sims = 0, sims = 100 ) sims1 <- get_simulations(diagnosis1) head(sims1[, -c(1, 2, 3, 6, 9, 13)]) ```
## How do our estimators behave in general for this design? Our estimates vary across randomizations. Do our two estimators vary in the same ways? How should we interpret this plot?
```{r sim_plot, out.width=".8\\textwidth"} sim_plot <- ggplot(sims1, aes(y = estimate, x = estimator, color = estimator)) + geom_boxplot() + geom_hline(yintercept = trueATE) + geom_point(aes(group = estimator)) + geom_text(aes(x = .5, y = trueATE), label = "True ATE", inherit.aes = FALSE) + theme(text = element_text(size = 20)) print(sim_plot) ```
## Which estimator is closer to the truth? One way to choose among estimators is to choose the one that is **close to the truth** whenever we use it --- regardless of the specific randomization. An "unbiased" estimator is one for which the **average of the estimates across repeated randomizations of the design** is the same as the truth (or $E_R(\hat{\bar{\tau}})=\bar{\tau}$). An unbiased estimator has "no systematic error" but doesn't guarantee closeness to the truth. Another measure of closeness is the **root mean squared error** (RMSE), the square root of the average squared distance between the individual estimates and the truth. Which estimator is better? (One is closer to the truth on average (RMSE) and is more precise. The other has no systematic error --- is unbiased.)
```{r} kableExtra::kable(reshape_diagnosis(diagnosis1, select = c("Estimator", "Bias", "RMSE", "SD Estimate", "Power"))[, -c(1, 2, 4, 5, 6)]) ```
## Unbiased and biased estimators Summary: - We have a *choice* of both estimands and estimators. - A good estimator performs well regardless of the particular randomization of a given design. And *performs well* can mean "unbiased" and/or "low MSE" (or "consistent" --- which means increasingly close to the truth as the sample size increases). - We can learn about how a given estimator performs in a given study using simulation.
# Block randomization
```{r setup_block_dat, echo=FALSE} library(randomizr) # We have 100 units N_blocked <- 100 # Three blocks: a, b, and c block <- c(rep("a", 10), rep("b", 30), rep("c", N_blocked - (10 + 30))) set.seed(12345) # Z is treatment assignment (Z instead of T in the code) Z_blocked <- block_ra(blocks = block, block_m = c(5, 5, 5)) dat_blocked <- data.frame(Z = Z_blocked, b = block) # y0 is potential outcome to control dat_blocked <- dat_blocked %>% group_by(b) %>% mutate( nb = n(), y0 = rgeom(nb, prob = .1 * (nb / N)), tau = sd(y0) * runif(nb, min = 0, max = max(y0)), y1 = y0 + tau, Y = Z * y1 + (1 - Z) * y0 ) ```
## Block-randomized experiments are a collection of mini-experiments What is the **ATE** estimand in a block-randomized experiment?
If we think of the unit-level ATE as $(1/N) \sum_{i=1}^N (y_{i,1} - y_{i,0})$ then, writing $ATE_j$ for the ATE within block $j$, we can re-express it equivalently as follows: \[ ATE = \frac{1}{N}\sum^J_{j=1} \sum^{N_j}_{i=1} (y_{i,1} - y_{i,0}) = \sum^J_{j=1} \frac{N_j}{N} \sum^{N_j}_{i=1} \frac{y_{i,1} - y_{i,0}}{N_j} = \sum^J_{j=1} \frac{N_j}{N} ATE_j \] And it would be natural to *estimate* this quantity by plugging in what we can calculate: $\widehat{ATE} = \displaystyle\sum^J_{j=1} \frac{N_j}{N} \widehat{ATE}_j$
## Block-randomized experiments are a collection of mini-experiments And we could estimate the standard error of this estimator by combining the within-block standard errors, weighted by block share (if our blocks are large enough): $SE(\widehat{ATE}) = \sqrt{\sum^J_{j=1} (\frac{N_{j}}{N})^2SE^2(\widehat{ATE}_j)}$
## Estimating the ATE in block-randomized experiments One approach to estimation simply replaces each $ATE_j$ with $\widehat{ATE}_j$ above. First, look at the design:
```{r br1, echo=TRUE} with(dat_blocked, table(b, Z)) ```
We have 10 units in block `a`, 30 units in block `b`, and 60 units in block `c`, with 5 units in each block assigned to treatment.
## Estimating the ATE in block-randomized experiments Here we see that the true $ATE$ can be defined and calculated either using individual observations directly or as a block-size weighted average of the within-block true $ATE_j$.
```{r br2, echo=TRUE} datb <- dat_blocked %>% group_by(b) %>% summarize( nb = n(), pb = mean(Z), estateb = mean(Y[Z == 1]) - mean(Y[Z == 0]), ateb = mean(y1 - y0), .groups = "drop" ) datb ## The overall true ATE: with(dat_blocked, mean(y1 - y0)) ## Another way to calculate the overall true ATE (a block-size weighted average) with(datb, sum(ateb * (nb / sum(nb)))) ```
## Estimating the ATE in block-randomized experiments One approach is to estimate the overall ATE using block-size weights, simply replacing each $ATE_j$ with $\widehat{ATE}_j$ above:
```{r br3, echo=TRUE} ## Showing that difference_in_means uses the block-size weights. e1 <- difference_in_means(Y ~ Z, blocks = b, data = dat_blocked) e2 <- with(datb, sum(estateb * (nb / sum(nb)))) c(coef(e1)[["Z"]], e2) ```
## Estimating the ATE in block-randomized experiments Notice that this is **not** the same as either of the following:
```{r br4, echo=TRUE} ## Ignoring blocks e3 <- lm(Y ~ Z, data = dat_blocked) coef(e3)[["Z"]] ## With block fixed effects e4 <- lm(Y ~ Z + b, data = dat_blocked) coef(e4)[["Z"]] ```
How do they differ? (The first ignores the blocks. The second uses a different set of weights that are created by use of "fixed effects" or "indicator" or "dummy" variables.)
## Which estimator should we use? We now have three estimators each with a different estimate (imagining that they all target the same estimand):
```{r echo=TRUE} c(coef(e1)[["Z"]], coef(e3)[["Z"]], coef(e4)[["Z"]]) ```
Which estimator should we use for this design? We can set up a DeclareDesign simulation to figure this out.
```{r blockdd0, cache=TRUE, echo=TRUE} ## declare a new base design that includes the block indicator b base_design_blocks <- # declare the population declare_population(dat_blocked[, c("b", "y0", "y1")]) + # tell DD that b indicates block and to assign 5 treated units in each block declare_assignment( Z = conduct_ra(N = N, block_m = c(5, 5, 5), blocks = b), Z_cond_prob = obtain_condition_probabilities(assignment = Z, blocks = b, block_m = c(5, 5, 5)) ) + # relationship of potential outcomes to observed outcome declare_potential_outcomes(Y ~ Z * y1 + (1 - Z) * y0) + # observed outcome and treatment assignment declare_reveal(Y, Z) ```
## Which estimator should we use?
```{r blockdd1, echo=TRUE, cache=TRUE} # the estimand is the average treatment effect estimandATEb <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) # three different estimators est1 <- declare_estimator(Y ~ Z, inquiry = estimandATEb, model = lm_robust, label = "Ignores Blocks" ) est2 <- declare_estimator(Y ~ Z, inquiry = estimandATEb, model = difference_in_means, blocks = b, label = "DiM: Block-Size Weights" ) est3 <- declare_estimator(Y ~ Z, inquiry = estimandATEb, model = lm_robust, weights = (Z / Z_cond_prob) + ((1 - Z) / (Z_cond_prob)), label = "LM: Block Size Weights" ) ``` ## Which estimator should we use? ```{r blockdd1a, echo=TRUE, cache=TRUE} # two more estimators est4 <- declare_estimator(Y ~ Z, inquiry = estimandATEb, model = lm_robust, fixed_effects = ~b, label = "Precision Weights" ) est5 <- declare_estimator(Y ~ Z + b, inquiry = estimandATEb, model = lm_robust, label = "Precision Weights (LSDV)" ) ## new design object has the base design, the estimand, and five estimators design_blocks <- base_design_blocks + estimandATEb + est1 + est2 + est3 + est4 + est5 trueATE_blocked <- draw_estimand(design_blocks)$estimand ``` Then we will run 10,000 simulations (reassign treatment 10,000 times) and summarize the estimates produced by each of these five estimators. ## Which estimator should we use? ```{r futurelibs, echo=FALSE} ## This to enable parallel processing when doing the diagnoses. ## Notice that we have turned on "cache" for some code chunks so that they are not ## re-run each time we change the slides. library(future) library(future.apply) ``` ```{r diagnosis2, echo=FALSE, cache=TRUE} ## The next few lines use all of the cores on your computer to speed up the computation plan(multicore) set.seed(12345) diagnosis2 <- diagnose_design(design_blocks, bootstrap_sims = 0, sims = 10000) sims2 <- get_simulations(diagnosis2) plan(sequential) ``` How should we interpret this plot? ```{r sim_plot2, message=FALSE, warning=FALSE, out.width=".9\\textwidth"} sim_plot2 <- ggplot(sims2, aes(y = estimate, x = estimator, color = estimator)) + geom_boxplot() + geom_hline(yintercept = trueATE_blocked) + geom_point(aes(group = estimator)) + theme( text = element_text(size = 20), axis.text.x = element_text(angle = 45, hjust = 1), legend.position = "none", legend.title = element_blank() ) + xlab("") print(sim_plot2) ``` ## Which estimator is closer to the truth? Which estimator works better on this design and these data? ```{r blocktab} blocktab <- reshape_diagnosis(diagnosis2, select = c("Estimator", "Bias", "RMSE", "SD Estimate", "Power", "Coverage"))[, -c(1, 2, 4, 5, 6)] kableExtra::kable(blocktab, col.names = c("Estimator", "Bias", "RMSE", "SD Est", "Power", "Coverage")) ``` Notice that the coverage is not always at 95% in all cases. We used 10,000 simulations so simulation error is around $\pm 2 \sqrt{p(1-p)/10000}$ or, say, for coverage calculated as .93, a different simulation could have easily produced `r .93 -2*sqrt( .93 * (1-.93)/10000 )` or `r .93+2*sqrt( .93 * (1-.93)/10000 )` (or would rarely have produced coverage numbers outside that range just by chance). # Cluster randomization ## In cluster-randomized experiments, units are randomized as a group (cluster) to treatment {.allowframebreaks} - **Example 1:** an intervention is randomized across neighborhoods, so **all** households in a neighborhood will be assigned to the same treatment condition, but different neighborhoods will be assigned different treatment conditions. 
- **Example 2:** an intervention is randomized across people and each person is measured four times after treatment, so our data contain four rows per person. - **Not an Example 1:** Neighborhoods are chosen for the study. Within each neighborhood about half of the people are assigned to treatment and half to control. (What kind of study is this? It is not a cluster-randomized study --- treatment is assigned to individuals within neighborhoods, so the neighborhoods act as blocks.) - **Not an Example 2:** an intervention is randomized to some neighborhoods and not to others, and the outcomes include measurements of neighborhood-level trust in government and total land area in the neighborhood devoted to gardens. (Sometimes a cluster-randomized experiment can be turned into a simple randomized experiment. Or it may contain more than one possible approach to analysis and interpretation.) How might the distribution of test statistics and estimators differ from an experiment where individual units (not clusters) are randomized?
## Estimating the ATE in cluster-randomized experiments Bias problems in cluster-randomized experiments: - When clusters are the same size, the usual difference-in-means estimator is unbiased. - But be careful when clusters have different numbers of units or when you have very few clusters, because then treatment assignment can, by chance, be correlated with cluster size. - When cluster size is related to potential outcomes, the usual difference-in-means estimator is biased.
## Estimating the SE for the ATE in cluster-randomized experiments {.allowframebreaks} - **Misleading statistical inferences:** The default SE will generally understate the variability of the estimator in such designs (that is, overstate precision) and thus produce tests with false positive rates that are too high (or, equivalently, confidence intervals with coverage rates that are too low). - The "cluster robust standard errors" implemented in common software work well **when the number of clusters is large** (like more than 50 in some simulation studies). - The default cluster-appropriate standard errors in `lm_robust` (the `CR2` SEs) work better than the common approach in Stata (as of this writing). - The wild bootstrap helps control error rates but gives up more statistical power than may be necessary in a cluster-randomized study where direct randomization inference is possible. - When in doubt, one can produce $p$-values by direct simulation (direct randomization inference) to see if they agree with one of the cluster robust approaches. Overall, it is worth simulating to study the performance of your estimators, tests, and confidence intervals if you have any worries or doubts.
## An example of estimation
```{r makedatclus, echo=FALSE, results="hide"} ## see https://declaredesign.org/blog/bias-cluster-randomized-trials.html ## for more like this.
N_clusters <- 10 # number of clusters n_indivs <- c(100, 10) # possible size of clusters thepop <- declare_population( clus_id = add_level( # set the number of clusters N = N_clusters, # 1/5 of clusters have 100 individuals, 4/5 of clusters have 10 individuals cl_size = rep(n_indivs, c(N / 5, N - N / 5)), cl_sizeF = factor(cl_size), # Each cluster has a different mean level (u) and different background variability (sd of u) effect = ifelse(cl_size == 100, .1, 1) ), indiv = add_level( N = cl_size, u = rnorm(N, mean = log(cl_size), sd = effect) ) ) theys <- declare_potential_outcomes(Y_Z_0 = u, Y_Z_1 = Y_Z_0 + effect) thetarget_indiv <- declare_inquiry(ATE_indiv = mean(Y_Z_1 - Y_Z_0)) ## Complete random assignment to clusters theassign <- declare_assignment(Z = conduct_ra(N = N, clusters = clus_id)) thereveal <- declare_reveal(Y, Z) ## 7 different estimators est1 <- declare_estimator(Y ~ Z, inquiry = "ATE_indiv", clusters = clus_id, model = lm_robust, label = "Y~Z, CR2 SE" ) est2 <- declare_estimator(Y ~ Z, inquiry = "ATE_indiv", model = lm_robust, label = "Y~Z, HC2 SE" ) est3 <- declare_estimator(Y ~ Z, inquiry = "ATE_indiv", model = lm_robust, se_type = "classical", label = "Y~Z, IID SE" ) est4 <- declare_estimator(Y ~ Z + cl_sizeF, inquiry = "ATE_indiv", clusters = clus_id, model = lm_robust, label = "Y~Z+clus_size_fixed_effects, CR2 SE" ) est5 <- declare_estimator(Y ~ Z, inquiry = "ATE_indiv", fixed_effects = ~cl_sizeF, clusters = clus_id, model = lm_robust, label = "Y~Z, fixed_effects=~clus_size_fixed_effects, CR2 SE " ) est6 <- declare_estimator(Y ~ Z, inquiry = "ATE_indiv", covariates = ~cl_size, clusters = clus_id, model = lm_lin, label = "Y~Z*I(clus_size-mean(clus_size)), CR2 SE" ) est7 <- declare_estimator(Y ~ Z, inquiry = "ATE_indiv", weight = cl_size, clusters = clus_id, model = lm_robust, label = "Y~Z, weight=clus_size, CR2 SE" ) ### Some experimental stuff here: ## remotes::install_github("markmfredrickson/RItools",ref="proj1-balT") ## est7tmp <- balanceTest(Z~Y+cluster(clus_id),data=dat1,report="all") ## ## est7fn <- function(data){ ## bt <- balanceTest(Z~Y+cluster(clus_id),data=data,report="all") ## resdat <- data.frame(estimate=bt$results[,"adj.mean diff",]) ## return(resdat) ## } ## est7fn <- function(data){ ## thelme <- lmer(Y~Z+(1|clus_id),data=data) ## cilme <- confint(thelme) ## lmecoef <- summary(thelme)$coefficients["Z",] ## resdat <- data.frame(estimate=lmecoef["Estimate"], ## std.error=lmecoef["Std. Error"], ## statistic=lmecoef["t value"], ## p.value=NA, ## conf.low=min(cilme["Z",]), ## conf.high=max(cilme["Z",])) ## return(resdat) ## } ## ## est7 <- declare_estimator(handler=tidy_estimator(est7fn),label="mlm: rand intercept") des <- thepop + theys + theassign + thereveal set.seed(12345) dat1 <- draw_data(des) head(dat1) table(dat1$clus_id) with(dat1, table(clus_id, Z)) dat1 %>% group_by(clus_id) %>% summarize(mean(Y_Z_1 - Y_Z_0)) ## g1 <- ggplot(data=dat1,aes(x=Y,group=clus_id,fill=clus_id,color=clus_id))+ ## geom_density() ## g1 est1(dat1) est2(dat1) est3(dat1) est4(dat1) est5(dat1) est6(dat1) est7(dat1) ``` Imagine we had data from 10 clusters with either 100 people (for 2 clusters) or 10 people per cluster (for 8 clusters). The total size of the data is `r nrow(dat1)`. ```{r} tmp <- dplyr::filter(dat1, clus_id %in% c("03", "01")) %>% group_by(clus_id) %>% sample_n(3) %>% arrange(clus_id, indiv) %>% select(clus_id, indiv, Y_Z_0, Y_Z_1, Z, Y) tmp ``` ## An example of estimation Which estimator should we use? Which test should we use? 
On what basis should we choose among these approaches?
```{r clusest, echo=TRUE} lmc1 <- lm_robust(Y ~ Z, data = dat1) lmc2 <- lm_robust(Y ~ Z, clusters = clus_id, data = dat1) lmc3 <- lm_robust(Y ~ Z + cl_sizeF, clusters = clus_id, data = dat1) tidy(lmc1)[2, ] tidy(lmc2)[2, ] tidy(lmc3)[2, ] ```
## Use simulation to assess estimators and tests If you look at the code for the slides you will see that we simulate the design 1,000 times, each time calculating an estimate and confidence interval for different estimators of the ATE. What should we learn from this table? (Coverage? `sd_estimate` versus `mean_se`?)
```{r simdesign, warning=FALSE, results="hide"} des_plus_est <- des + thetarget_indiv + est1 + est2 + est3 + est4 + est5 + est6 + est7 des_plus_est ```
```{r diag_clust, cache=TRUE, warning=FALSE, message=FALSE} set.seed(12346) plan(multicore) diag_clus <- diagnose_design(des_plus_est, bootstrap_sims = 0, sims = 1000) sim_clus <- get_simulations(diag_clus) # simulate_design(des_plus_est,sims=1000) trueclusATE <- thetarget_indiv(dat1)[["estimand"]] plan(sequential) ```
```{r cluster_sim_res} ## Notice that the lin estimator is great but sometimes cannot produce an answer res_clus <- sim_clus %>% na.omit() %>% group_by(estimator) %>% summarize( bias = mean(estimate - estimand), rmse = sqrt(mean((estimate - estimand)^2)), power = mean(p.value < .05), coverage = mean(estimand <= conf.high & estimand >= conf.low), # mean_estimate = mean(estimate), sd_estimate = sd(estimate), mean_se = mean(std.error) ) res_clus[2, "estimator"] <- "Y~Z, cl_size fe, CR2" res_clus[6, "estimator"] <- "Y~Z*I(cl_size-mean(cl_size)), CR2" res_clus[7, "estimator"] <- "Y~Z+cl_sizeF, CR2" res_clus$estimator <- gsub(" SE", "", res_clus$estimator) ```
```{r showresclus1} kableExtra::kable(res_clus[, c(1, 5:7)], digits = 2, booktabs = TRUE, linesep = "", caption = "Estimator and Test Performance in 1,000 simulations of the cluster randomized design for different estimators and confidence intervals") ```
## Use simulation to assess estimators and tests What should we learn from this table? (Bias? Closeness to truth?)
```{r showresclus2} kableExtra::kable(res_clus[, c(1:3)], digits = 3, booktabs = TRUE, linesep = "", caption = "Estimator and Test Performance in 1,000 simulations of the cluster randomized design for different estimators and confidence intervals") ```
## Use simulation to assess estimators and tests How should we interpret this plot?
```{r sim_plot_clus, warning=FALSE, out.width=".95\\textwidth"} sim_plot3 <- ggplot(sim_clus, aes(y = estimate, x = estimator, color = estimator)) + geom_boxplot() + coord_flip() + geom_hline(yintercept = trueclusATE) + geom_point(aes(group = estimator)) + theme( text = element_text(size = 20), legend.position = "none", legend.title = element_blank() ) + ylab("") print(sim_plot3) ```
## Summary of estimation and testing in cluster-randomized trials - Cluster randomized trials pose special problems for standard approaches to estimation and testing. - If randomization is at the cluster level, then uncertainty arises from the cluster-level randomization. - If we have enough clusters, then one of the "cluster robust" standard errors can help us produce confidence intervals with correct coverage. **Cluster robust standard errors require many clusters**. - If cluster size (or some other cluster characteristic) is related to effect size or to the potential outcomes, then we can have bias (and we need to adjust somehow).
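## Direct randomization inference for the cluster design (illustration)

As noted above, one can also produce $p$-values by direct simulation rather than relying on a cluster robust standard error. The sketch below is added for illustration (it is not part of the original slides' code): it assumes the `dat1` object created above and uses `cluster_ra()` from the **randomizr** package (loaded earlier) to re-assign whole clusters under the sharp null hypothesis of no effect.

```{r ri_cluster_sketch, echo=TRUE}
## Direct randomization inference (a sketch): re-assign treatment at the cluster
## level many times and compare the observed difference in means to the resulting
## distribution of differences under the sharp null of no effect for any unit.
obs_diff <- with(dat1, mean(Y[Z == 1]) - mean(Y[Z == 0]))
set.seed(20200101)
null_diffs <- replicate(5000, {
  Z_new <- cluster_ra(clusters = dat1$clus_id) # re-randomize whole clusters
  with(dat1, mean(Y[Z_new == 1]) - mean(Y[Z_new == 0]))
})
## Two-sided randomization-inference p-value
mean(abs(null_diffs) >= abs(obs_diff))
```

One could compare this $p$-value with the ones produced by the CR2 and IID standard errors above.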
# Binary outcomes ## Binary outcomes: Set up our data for simulation in DeclareDesign ```{r setupbin, echo=TRUE} # population size N <- 20 # declare the population thepop_bin <- declare_population( N = N, x1 = draw_binary(prob = .5, N = N), x2 = rnorm(N) ) # declare the potential outcomes thepo_bin <- declare_potential_outcomes(Y ~ rbinom( n = N, size = 1, prob = 0.5 + 0.05 * Z + x1 * .05 )) # two possible targets: difference in means or difference in log-odds thetarget_ate <- declare_inquiry(ate = mean(Y_Z_1 - Y_Z_0)) thetarget_logodds <- declare_inquiry( logodds = log(mean(Y_Z_1) / (1 - mean(Y_Z_1))) - log(mean(Y_Z_0) / (1 - mean(Y_Z_0))) ) ``` ## Binary outcomes: Set up our data for simulation in DeclareDesign ```{r setupbin2, echo=TRUE} # declare how treatment is assigned # m units are assigned to levels of treatment Z theassign_bin <- declare_assignment(Z = conduct_ra(N = N, m = floor(N / 3))) # declare what outcome values are revealed for possible values of Z thereveal_bin <- declare_reveal(Y, Z) # pull this all together: population, potential outcomes, assignment, # outcome values connected to Z des_bin <- thepop_bin + thepo_bin + theassign_bin + thereveal_bin # then make one draw (randomize treatment once) set.seed(12345) dat2 <- draw_data(des_bin) ``` ## Binary outcomes: Estimands I The simulated data: `Y_Z_1`, `Y_Z_0` are potential outcomes, `Y` is observed, `x1`, `x2` are covariates, `Z` is treatment assignment. Here $N$=`r nrow(dat2)`. ```{r dat2echo, echo=TRUE} ## Look at the first 6 observations only: head(dat2[, -7]) ``` ## Binary outcomes: Estimands II How would we interpret the following true quantities or estimands? (`Y_Z_1`, `Y_Z_0` are potential outcomes, `Y` is observed, `x1`, `x2` are covariates, `Z` is treatment assignment. Here $N$=`r nrow(dat2)`. ```{r bin1, echo=TRUE} ate_bin <- with(dat2, mean(Y_Z_1 - Y_Z_0)) bary1 <- mean(dat2$Y_Z_1) bary0 <- mean(dat2$Y_Z_0) diff_log_odds_bin <- with( dat2, log(bary1 / (1 - bary1)) - log(bary0 / (1 - bary0)) ) c( bary1 = bary1, bary0 = bary0, true_ate = ate_bin, true_diff_log_odds = diff_log_odds_bin ) ``` ## Binary outcomes: Estimands III Do you want to estimate the difference in log-odds? \begin{equation} \delta = \log \frac{\bar{y}_{1}}{1-\bar{y}_{1}} - \log \frac{ \bar{y}_0}{1- \bar{y}_0} \end{equation} Or the difference in proportions? \begin{equation} \bar{\tau} = \bar{y}_{1} - \bar{y}_0 \end{equation} Recall that $\bar{y}_1$ is the *proportion* of $y_{1}=1$ in the data. @freedman2008randomization shows us that the logit coefficient estimator is a biased estimator of the difference in log-odds estimand. He also shows an unbiased estimator of that estimand. We know that the difference of proportions in the sample should be an unbiased estimator of the difference of proportions. ## An example of estimation I How should we interpret the following estimates? (What does the difference of means estimator require in terms of assumptions? What does the logistic regression estimator require in terms of assumptions?) ```{r estexample, echo=TRUE} lmbin1 <- lm_robust(Y ~ Z, data = dat2) glmbin1 <- glm(Y ~ Z, data = dat2, family = binomial(link = "logit")) tidy(lmbin1)[2, ] tidy(glmbin1)[2, ] ``` ## An example of estimation II What about with covariates? Why use covariates? 
```{r estexample2, echo=TRUE} lmbin2 <- lm_robust(Y ~ Z + x1, data = dat2) glmbin2 <- glm(Y ~ Z + x1, data = dat2, family = binomial(link = "logit")) tidy(lmbin2)[2, ] tidy(glmbin2)[2, ] ```
## An example of estimation III Let's compare our estimates:
```{r estexample3, echo=TRUE} c( dim = coef(lmbin1)[["Z"]], dim_x1 = coef(lmbin2)[["Z"]], glm = coef(glmbin1)[["Z"]], glm_x1 = coef(glmbin2)[["Z"]] ) ```
## An example of estimation: The Freedman plugin estimators I No covariate:
```{r pluginest, echo=TRUE } freedman_plugin_estfn1 <- function(data) { N <- nrow(data) glmbin <- glm(Y ~ Z, data = data, family = binomial(link = "logit")) preddat <- data.frame(Z = rep(c(0, 1), each = N)) preddat$yhat <- predict(glmbin, newdata = preddat, type = "response") bary1 <- mean(preddat$yhat[preddat$Z == 1]) bary0 <- mean(preddat$yhat[preddat$Z == 0]) diff_log_odds <- log(bary1 / (1 - bary1)) - log(bary0 / (1 - bary0)) return(data.frame(estimate = diff_log_odds)) } ```
## An example of estimation: The Freedman plugin estimators II With covariate:
```{r pluginest2, echo=TRUE } freedman_plugin_estfn2 <- function(data) { N <- nrow(data) glmbin <- glm(Y ~ Z + x1, data = data, family = binomial(link = "logit")) preddat <- data.frame(Z = rep(c(0, 1), each = N)) preddat$x1 <- rep(data$x1, 2) preddat$yhat <- predict(glmbin, newdata = preddat, type = "response") bary1 <- mean(preddat$yhat[preddat$Z == 1]) bary0 <- mean(preddat$yhat[preddat$Z == 0]) diff_log_odds <- log(bary1 / (1 - bary1)) - log(bary0 / (1 - bary0)) return(data.frame(estimate = diff_log_odds)) } ```
Let's compare our estimates from the six different estimators:
```{r echo=FALSE} c( dim = coef(lmbin1)[["Z"]], dim_x1 = coef(lmbin2)[["Z"]], glm = coef(glmbin1)[["Z"]], glm_x1 = coef(glmbin2)[["Z"]], freedman = freedman_plugin_estfn1(dat2)[["estimate"]], freedman_x1 = freedman_plugin_estfn2(dat2)[["estimate"]] ) ```
```{r tmleapproach, eval=FALSE} ## Here is another approach to using the plugin estimator but allowing for standard errors, etc. ## Doesn't require a handwritten function like the ones used above.
library(tmle) Y <- as.matrix(dat2$Y, ncol = 1) A <- as.matrix(dat2$Z, ncol = 1) W <- as.matrix(dat2[, c("x1", "x2")], ncol = 1) colnames(W) <- paste("W", 1:2, sep = "") tmle1 <- tmle( Y = Y, A = A, W = W, family = "binomial", Qform = Y ~ A, gform = A ~ 1, cvQinit = FALSE, Q.SL.library = "SL.glm", g.SL.library = "SL.glm", g.Delta.SL.library = "SL.glm" ) tmle1$estimates$ATE tmle1$estimates$OR tmle2 <- tmle( Y = Y, A = A, W = W, family = "binomial", Qform = Y ~ A + W1, gform = A ~ 1, cvQinit = FALSE, # V=0, Q.SL.library = "SL.glm", g.SL.library = "SL.glm", g.Delta.SL.library = "SL.glm" ) tmle2$estimates$ATE tmle2$estimates$OR ``` ## An example of using DeclareDesign to assess our estimators I ```{r ddbinsetup, echo=TRUE} # declare 4 estimators for DD # first estimator: linear regression with ATE as target estb1 <- declare_estimator(Y ~ Z, model = lm_robust, label = "lm1:Z", inquiry = thetarget_ate ) # second estimator: linear regression with covariate, with ATE as target estb2 <- declare_estimator(Y ~ Z + x1, model = lm_robust, label = "lm1:Z,x1", inquiry = thetarget_ate ) # third estimator: logistic regression, with log odds as target estb3 <- declare_estimator(Y ~ Z, model = glm, family = binomial(link = "logit"), label = "glm1:Z", inquiry = thetarget_logodds ) # fourth estimtor: logistic regression with covariate, with log odds as target estb4 <- declare_estimator(Y ~ Z + x1, model = glm, family = binomial(link = "logit"), label = "glm1:Z,x1", inquiry = thetarget_logodds ) ``` ## An example of using DeclareDesign to assess our estimators II ```{r ddbinsetup2, echo=TRUE} # Pull together: des_bin is population, potential outcomes, assignment, # outcome values connected to Z. We add the two targets and four estimators. des_bin_plus_est <- des_bin + thetarget_ate + thetarget_logodds + estb1 + estb2 + estb3 + estb4 ``` ```{r diagnosis_bin, echo=FALSE, cache=TRUE, warning=FALSE, message=FALSE} ## The next few lines use all of the cores on your computer to speed up the computation library(future) library(future.apply) # plan(multiprocess) set.seed(12345) diagnosis_bin <- diagnose_design(des_bin_plus_est, bootstrap_sims = 0, sims = 1000) sims_bin <- get_simulations(diagnosis_bin) trueATE_bin <- thetarget_ate(dat2)[["estimand"]] truelo_bin <- thetarget_logodds(dat2)[["estimand"]] # plan(sequential) ``` ## Using simulation to assess our estimators How should we interpret this plot? (Differences in scales make it difficult.) ```{r sim_plot_bin, out.width=".95\\textwidth"} estimand_dat <- sims_bin %>% group_by(inquiry) %>% summarize(meanestimand = mean(estimand)) sim_plot_bin <- ggplot(sims_bin, aes(y = estimate, x = estimator, color = estimator)) + geom_boxplot() + geom_point() + facet_wrap(~inquiry, scales = "free") + geom_hline(data = estimand_dat, aes(yintercept = meanestimand)) + theme( text = element_text(size = 20), legend.position = "none", legend.title = element_blank() ) print(sim_plot_bin) ``` ## Which estimator is closer to the truth? Which estimator works better on this design and these data? 
```{r bin_sim_res} ## Summarize the simulations for each estimator and inquiry res_bin <- sims_bin %>% group_by(estimator, inquiry) %>% summarize( bias = mean(estimate - estimand), rmse = sqrt(mean((estimate - estimand)^2)), power = mean(p.value < .05), coverage = mean(estimand <= conf.high & estimand >= conf.low, na.rm = TRUE), # mean_estimate = mean(estimate), sd_est = sd(estimate), mean_se = mean(std.error) ) names(res_bin)[1:2] <- c("est", "estimand") ```
```{r showresbin1} kableExtra::kable(res_bin, digits = 3, booktabs = TRUE, linesep = "", caption = "Estimator and Test Performance in 1,000 simulations of the different estimators and confidence intervals for a binary outcome and completely randomized design.") ```
# Other topics in estimation
## Covariance adjustment: Estimands In general, simply "controlling for" covariates in a regression model produces a biased estimator of the ATE **or** ITT estimand. See for example @lin_agnostic_2013 and @freedman2008rae. @lin_agnostic_2013 shows how to reduce this bias and, importantly, that this bias tends to be small as the sample size increases.
# Conclusion
## Final thoughts on basics of estimation - Counterfactual causal estimands are unobserved functions of potential outcomes. - Estimators are recipes or computational formulas that use observed data to learn about an estimand. - Good estimators produce estimates that are close to the true estimand. - (Connecting estimation with testing) Standard errors of estimators allow us to calculate confidence intervals and $p$-values. Certain estimators have larger or smaller (or more or less correct) standard errors. - You can assess the utility of a chosen estimator for a chosen estimand by simulation.
# Causal effects that differ by groups or covariates
## Effects that differ by groups I If our theory suggests that effects should differ by group, how can we assess evidence for or against such claims? - We can **design** for an assessment of this theory by creating a block-randomized study --- with blocks defined by the theoretically relevant groups. - We can **plan** for such an assessment by (1) **pre-registering specific subgroup analyses** (whether or not we block on that group in the design phase) and (2) making sure to measure group membership during baseline (pre-treatment) data collection.
## Effects that differ by groups II - If we have not planned ahead, subgroup-specific analyses can be useful as explorations but should not be understood as confirmatory: they can too easily create problems of testing too many hypotheses and thus inflating false positive rates. - We **should not use groups formed by treatment**. (This is either "mediation analysis" or "conditioning on post-treatment variables" and deserves its own module.)
# Causal effects when we do not control the dose
## Defining causal effects I Imagine a door-to-door communication experiment where some houses are randomly assigned to receive a visit. Note that we now use $Z$ and $d$ instead of $T$. - $Z_i$ is random assignment to a visit ($Z_i=1$) or not ($Z_i=0$). - $d_{i,Z_i=1}=1$ means that person $i$ would open the door to have a conversation when assigned a visit. - $d_{i,Z_i=1}=0$ means that person $i$ would not open the door to have a conversation when assigned a visit. - Opening the door is an outcome of the treatment.
\begin{center} \begin{tikzcd}[ampersand replacement=\&] Z \arrow[from=1-1,to=1-2, "\ne 0"] \arrow[from=1-1, to=1-4, bend left, "\text{0 (exclusion)}"] \& d \arrow[from=1-2,to=1-4] \& \& y \\ (x_1 \ldots x_p) \arrow[from=2-1,to=1-1, "\text{0 (as if randomized)}"] \arrow[from=2-1,to=1-2] \arrow[from=2-1,to=1-4] \end{tikzcd} \end{center}
## Defining causal effects II - $y_{i,Z_i = 1, d_{i,Z_i=1}=1}$ is the potential outcome for people who were assigned a visit and who opened the door. ("Compliers" or "Always-takers") - $y_{i,1, d_{i,Z_i=1}=0}$ is the potential outcome for people who were assigned a visit and who did not open the door. ("Never-takers" or "Defiers") - $y_{i,0, d_{i,0}=1}$ is the potential outcome for people who were not assigned a visit and who opened the door. ("Defiers" or "Always-takers") - $y_{i,0, d_{i,0}=0}$ is the potential outcome for people who were not assigned a visit and who would not have opened the door. ("Compliers" or "Never-takers")
## Defining causal effects III We could also write $y_{i,Z_i = 0, d_{i,Z_i=1}=1}$ for people who were not assigned a visit but who would have opened the door had they been assigned a visit, etc. In this case we can simplify our potential outcomes: - $y_{i,0, d_{i,1}=1} = y_{i,0, d_{i,1}=0} = y_{i,0, d_{i,0}=0}$ because, when you are not visited, your outcome is the same no matter what you would have done had you been visited (no one can open the door to a canvasser who never arrives).
## Defining causal effects IV We can simplify the ways in which people get a dose of the treatment like so (where $d$ is lower case, reflecting the idea that whether you open the door when visited or not is a fixed attribute like a potential outcome): - $Y$ : outcome ($y_{i,Z}$ or $y_{i,Z_i=1}$ for potential outcome to treatment for person $i$, fixed) - $X$ : covariate/baseline variable - $Z$ : treatment assignment ($Z_i=1$ if assigned to a visit, $Z_i=0$ if not assigned to a visit) - $D$ : treatment received ($D_i=1$ if person $i$ answered the door, $D_i=0$ if person $i$ did not answer the door) (using $D$ here because $D_i = d_{i,1} Z_{i} + d_{i,0} (1-Z_i)$)
## Defining causal effects V We have two causal effects of $Z$: $Z \rightarrow Y$ ($\delta$, ITT, ITT$_Y$), and $Z \rightarrow D$ (Gerber and Green call this ITT$_D$). And different types of people can react differently to the attempt to move the dose with the instrument. \centering \begin{tabular}{llcc} & & \multicolumn{2}{c}{$Z=1$} \\ & & $D=0$ & $D=1$ \\ \midrule \multirow{2}{*}{$Z=0$} & $D=0$ & Never taker & Complier \\ & $D=1$ & Defier & Always taker \\ \bottomrule \end{tabular}
## Defining causal effects VI The $ITT=ITT_Y=\delta= \bar{y}_{Z=1} - \bar{y}_{Z=0}$. \medskip But, in this design, $\bar{y}_{Z=1}=\bar{y}_{1}$ is split into pieces: the outcomes of those who answered the door when visited (Compliers and Always-takers) and of those who did not (Never-takers and Defiers). Write $p_C$ for the proportion of compliers in the study. \begin{equation} \bar{y}_{1}=(\bar{y}_{1}|C)p_C + (\bar{y}_{1}|A)p_A + (\bar{y}_1|N)p_N + (\bar{y}_1|D)p_D. \end{equation} And $\bar{y}_{0}$ is also split into pieces: \begin{equation} \bar{y}_{0}=(\bar{y}_{0}|C)p_C + (\bar{y}_{0}|A)p_A + (\bar{y}_{0}|N)p_N + (\bar{y}_0|D)p_D. \end{equation}
## Defining causal effects VII So, the ITT itself is a combination of the effects of $Z$ on $Y$ within these different groups (imagine substituting in and then re-arranging so that we have a set of ITTs, one for each type of subject). But, we can still estimate it because we have unbiased estimators of $\bar{y}_1$ and $\bar{y}_0$ within each type.
## Learning about the ITT I First, let's learn about the effect of the policy itself.
To write down the ITT, we do not need to consider all of the types above. We have no defiers ($p_D=0$) and we know the ITT for both Always-takers and Never-takers is 0. \begin{equation} \bar{y}_{1}=(\bar{y}_{1}|C)p_C + (\bar{y}_{1}|A)p_A + (\bar{y}_1|N)p_N \end{equation} \begin{equation} \bar{y}_{0}=(\bar{y}_{0}|C)p_C + (\bar{y}_{0}|A)p_A + (\bar{y}_{0}|N)p_N \end{equation}
## Learning about the ITT II First, let's learn about the effect of the policy itself. To write down the ITT, we do not need to consider all of the types above. We have no defiers ($p_D=0$) and we know the ITT for both Always-takers and Never-takers is 0. \begin{align} ITT = & \bar{y}_{1} - \bar{y}_{0} \\ = & ( (\bar{y}_{1}|C)p_C + (\bar{y}_{1}|A)p_A + (\bar{y}_1|N)p_N ) - \\ & ( (\bar{y}_{0}|C)p_C + (\bar{y}_{0}|A)p_A + (\bar{y}_{0}|N)p_N ) \\ \intertext{collecting each type together --- to have an ITT for each type} = & ( (\bar{y}_{1}|C)p_C - (\bar{y}_{0}|C)p_C ) + ( (\bar{y}_{1}|A)p_A - (\bar{y}_{0}|A)p_A ) + \\ & ( (\bar{y}_1|N)p_N - (\bar{y}_{0}|N)p_N ) \\ = & \left( (\bar{y}_{1}|C) - (\bar{y}_{0}|C) \right)p_C + \\ & \left( (\bar{y}_{1}|A)- (\bar{y}_{0}|A) \right)p_A + \left( (\bar{y}_1|N) - (\bar{y}_{0}|N) \right)p_N \end{align}
## Learning about the ITT III \begin{align} ITT = & \bar{y}_{1} - \bar{y}_{0} \\ = & ( (\bar{y}_{1}|C)p_C + (\bar{y}_{1}|A)p_A + (\bar{y}_1|N)p_N ) - \\ & ( (\bar{y}_{0}|C)p_C + (\bar{y}_{0}|A)p_A + (\bar{y}_{0}|N)p_N ) \\ = & ( (\bar{y}_{1}|C)p_C - (\bar{y}_{0}|C)p_C ) + ( (\bar{y}_{1}|A)p_A - (\bar{y}_{0}|A)p_A ) + \\ & ( (\bar{y}_1|N)p_N - (\bar{y}_{0}|N)p_N ) \\ = & ( (\bar{y}_{1}|C) - (\bar{y}_{0}|C))p_C + ( (\bar{y}_{1}|A)- (\bar{y}_{0}|A))p_A + \\ & ( (\bar{y}_1|N) - (\bar{y}_{0}|N) )p_N \end{align}
## Learning about the ITT IV And, if the effect of the dose can only occur for those who open the door, and you can only open the door when assigned a visit, then: \begin{equation} ( (\bar{y}_{1}|A)- (\bar{y}_{0}|A))p_A = 0 \text{ and } ( (\bar{y}_1|N) - (\bar{y}_{0}|N) )p_N = 0 \end{equation} And \begin{equation} ITT = ( (\bar{y}_{1}|C) - (\bar{y}_{0}|C))p_C = ( CACE ) p_C. \end{equation}
## The complier average causal effect I We would also like to learn about the causal effect of answering the door and having the conversation --- the theoretically interesting effect. But a naive comparison of those who did and did not open the door is confounded by $x$: a simple $\bar{Y}|D=1 - \bar{Y}|D=0$ comparison tells us about differences in the outcome due to $x$ in addition to the difference caused by $D$. (The numbers in the diagram below come from some simulated data.) \begin{center} \begin{tikzcd}[ampersand replacement=\&] Z \arrow[from=1-1,to=1-2] \arrow[from=1-1, to=1-4, bend left, "\text{0 (exclusion)}"] \& D \arrow[from=1-2,to=1-4] \& \& y \\ (x_1 \ldots x_p) \arrow[from=2-1,to=1-1, "\text{-.006 (as if randomized)}"] \arrow[from=2-1,to=1-2, ".06"] \arrow[from=2-1,to=1-4, ".48"] \end{tikzcd} \end{center}
## The complier average causal effect II
```{r cors, eval=FALSE, echo=TRUE, results="hide"} with(dat, cor(Y, x)) ## can be any number with(dat, cor(d, x)) ## can be any number with(dat, cor(Z, x)) ## should be near 0 ```
But we just saw that, in this design and with these assumptions (including a SUTVA assumption), $ITT = ( (\bar{y}_{1}|C) - (\bar{y}_{0}|C))p_C = (CACE) p_C$, so we can define $CACE=ITT/p_C$.
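## The complier average causal effect III: the danger of per-protocol comparisons

Before turning to the estimation code, here is a small simulated sketch that we add for illustration (the data and the object names `pp_dat`, `pp_itt_y`, and `pp_itt_d` are made up and are not part of the original design code). It shows why the naive "per-protocol" or "as-treated" comparison of $D=1$ versus $D=0$ is misleading when a covariate drives both door-opening and the outcome, while the ratio of the two ITTs recovers the complier average causal effect.

```{r perprotocol_sketch, echo=TRUE}
## A sketch: one-sided noncompliance where x drives both door-opening and the outcome.
set.seed(2025)
pp_dat <- data.frame(x = rnorm(10000)) # background covariate
pp_dat$Z <- sample(rep(c(0, 1), 5000)) # randomized assignment to a visit
pp_dat$d1 <- as.numeric(pp_dat$x + rnorm(10000) > 0) # would open the door if visited
pp_dat$D <- pp_dat$Z * pp_dat$d1 # no one opens the door without a visit
pp_dat$y0 <- pp_dat$x + rnorm(10000) # outcome without the conversation
pp_dat$y1 <- pp_dat$y0 + 2 # the conversation raises the outcome by 2
pp_dat$Y <- with(pp_dat, ifelse(D == 1, y1, y0)) # observed outcome
## Naive per-protocol / as-treated comparison: biased upward because x differs across D
with(pp_dat, mean(Y[D == 1]) - mean(Y[D == 0]))
## ITT, ITT_D, and their ratio (the CACE/LATE): close to the true effect of 2
pp_itt_y <- with(pp_dat, mean(Y[Z == 1]) - mean(Y[Z == 0]))
pp_itt_d <- with(pp_dat, mean(D[Z == 1]) - mean(D[Z == 0]))
c(ITT = pp_itt_y, ITT_D = pp_itt_d, CACE = pp_itt_y / pp_itt_d)
```

The per-protocol difference mixes the effect of the conversation with the fact that door-openers have systematically different $x$; dividing the ITT by ITT$_D$ removes that contamination under the assumptions above.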
## How to calculate the ITT and CACE/LATE I ```{r simivdesign, echo=FALSE} prob_comply <- .8 tau <- .5 the_pop <- declare_population( N = 100, X = sample(1:4, N, replace = TRUE), u = rnorm(N), type = sample(c("Always-Taker", "Never-Taker", "Complier", "Defier"), N, replace = TRUE, prob = c(.1, 1 - unique(prob_comply), unique(prob_comply), 0) ) ) ## The unobserved potential outcomes, Y(Z=1) and Y(Z=0) relate to the observed outcome, Y, via treatment assignment and a const ant additive effect of tau. ## D refers to getting a dose of feedback d_po <- declare_potential_outcomes( D ~ case_when( Z == 0 & type %in% c("Never-Taker", "Complier") ~ 0, Z == 1 & type %in% c("Never-Taker", "Defier") ~ 0, Z == 0 & type %in% c("Always-Taker", "Defier") ~ 1, Z == 1 & type %in% c("Always-Taker", "Complier") ~ 1 ) ) y_po <- declare_potential_outcomes( Y ~ tau * sd(u) * D + u, assignment_variables = c("D", "Z") ) ## Treatment assignment for any given city is a simple fixed proportion. It should be complete or urn-drawing assignment, not simple or coin-flipping assignment. ## theassign <- declare_assignment(m=m) the_assign <- declare_assignment(Z = complete_ra(N)) ## declare_reveal is basically the same as declare_potential_outcomes. I think they have this here to deal with situations of missing data or non-compliance. # thereveal <- declare_reveal(Y, Z) d_reveal <- declare_reveal(D, assignment_variable = "Z") y_reveal <- declare_reveal(Y, assignment_variables = c("D", "Z")) base_design <- the_pop + the_assign + d_po + y_po + d_reveal + y_reveal dat0 <- draw_data(base_design) estimand_cace <- declare_inquiry( CACE = mean((Y_D_1_Z_1 + Y_D_1_Z_0) / 2 - (Y_D_0_Z_1 + Y_D_0_Z_0) / 2), subset = type == "Complier" ) estimand_ate <- declare_inquiry(ATE = mean((Y_D_1_Z_1 + Y_D_1_Z_0) / 2 - (Y_D_0_Z_1 + Y_D_0_Z_0) / 2)) ``` Some example data (where we know all potential outcomes): ```{r showdat0} tempdat <- dat0[1:2, -1] ## Change names to make the table below easier to read names(tempdat)[5] <- "pZ" names(tempdat) <- gsub("_", "", names(tempdat)) kableExtra::kable(tempdat, digits = 2) ``` ## How to calculate the ITT and CACE/LATE II The ITT and CACE (the parts) ```{r echo=TRUE} itt_y <- difference_in_means(Y ~ Z, data = dat0) itt_y itt_d <- difference_in_means(D ~ Z, data = dat0) itt_d ``` ## How to calculate the ITT and CACE/LATE III All together:^[works when $Z \rightarrow D$ is not weak see @imbens2005robust for a cautionary tale] ```{r echo=TRUE} cace_est <- iv_robust(Y ~ D | Z, data = dat0) cace_est ## Notice same as below: coef(itt_y)[["Z"]] / coef(itt_d)[["Z"]] ``` ## Summary of Encouragement/Complier/Dose oriented designs: - Analyze as you randomized, even when you don't control the dose - The danger of per-protocol analysis. ## References