--- title: "Statistical Power" author: "Fill In Your Name" date: '`r format(Sys.time(), "%d %B, %Y")`' header-includes: | \setbeamertemplate{footline}{\begin{beamercolorbox}{section in head/foot} \includegraphics[height=.5cm]{../Images/egap-logo.png} \hfill \insertframenumber/\inserttotalframenumber \end{beamercolorbox}} output: beamer_presentation: keep_tex: yes toc: yes revealjs::revealjs_presentation: center: no highlight: pygments reveal_options: chalkboard: theme: whiteboard toggleNotesButton: no previewLinks: yes slideNumber: yes reveal_plugins: - notes - search - chalkboard self_contained: no smart: no theme: default transition: fade --- ```{r setup, include=FALSE, echo=FALSE} source("rmd_setup.R") library(DesignLibrary) ``` # What is power? ## What is power? - We want to separate signal from noise. - Power = probability of rejecting null hypothesis, given true effect $\ne$ 0. - In other words, it is the ability to detect an effect given that it exists. - Formally: (1 - Type II) error rate. - Thus, power $\in$ (0, 1). - Standard thresholds: 0.8 or 0.9. ## Starting point for power analysis - Power analysis is something we do *before* we run a study. - Helps you figure out the sample you need to detect a given effect size. - Or helps you figure out a minimal detectable difference given a set sample size. - May help you decide whether to run a study. - It is hard to learn from an under-powered null finding. - Was there an effect, but we were unable to detect it? or was there no effect? We can't say. ## Power - Say there truly is a treatment effect and you run your experiment many times. How often will you get a statistically significant result? - Some guesswork to answer this question. - How big is your treatment effect? - How many units are treated, measured? - How much noise is there in the measurement of your outcome? ## Approaches to power calculation - Analytical calculations of power - Simulation ## Power calculation tools - Interactive - [EGAP Power Calculator](https://egap.shinyapps.io/power-app/) - [rpsychologist](https://rpsychologist.com/d3/NHST/) - R Packages - [pwr](https://cran.r-project.org/web/packages/pwr/index.html) - [DeclareDesign](https://cran.r-project.org/web/packages/DeclareDesign/index.html), see also # Analytical calculations of power ## Analytical calculations of power - Formula: \begin{align*} \text{Power} &= \Phi\left(\frac{|\tau| \sqrt{N}}{2\sigma}- \Phi^{-1}(1- \frac{\alpha}{2})\right) \end{align*} - Components: - $\phi$: standard normal CDF is monotonically increasing - $\tau$: the effect size - $N$: the sample size - $\sigma$: the standard deviation of the outcome - $\alpha$: the significance level (typically 0.05) ## Example: Analytical calculations of power ```{r pwrsimp, echo=TRUE, include=TRUE} # Power for a study with 80 obserations and effect # size of 0.25 library(pwr) pwr.t.test(n = 40, d = 0.25, sig.level = 0.05, power = NULL, type = c("two.sample", "one.sample", "paired")) ``` ## Limitations to analytical power calculations - Only derived for some test statistics (differences of means) - Makes specific assumptions about the data-generating process - Incompatible with more complex designs # Simulation-based power calculation ## Simulation-based power calculation - Create dataset and simulate research design. - Assumptions are necessary for simulation studies, but you make your own. - For the DeclareDesign approach, see ## Steps - Define the sample and the potential outcomes function. - Define the treatment assignment procedure. - Create data. 
## Example: Analytical calculations of power

```{r pwrsimp, echo=TRUE, include=TRUE}
# Power for a two-sample study with 80 observations
# (40 per arm) and an effect size of 0.25
library(pwr)
pwr.t.test(
  n = 40, d = 0.25, sig.level = 0.05,
  power = NULL, type = "two.sample"
)
```

## Limitations to analytical power calculations

- Only derived for some test statistics (differences of means)
- Makes specific assumptions about the data-generating process
- Incompatible with more complex designs

# Simulation-based power calculation

## Simulation-based power calculation

- Create a dataset and simulate the research design.
- Assumptions are still necessary for simulation studies, but you make your own.
- For the DeclareDesign approach, see <https://declaredesign.org>.

## Steps

- Define the sample and the potential outcomes function.
- Define the treatment assignment procedure.
- Create data.
- Assign treatment, then estimate the effect.
- Do this many times.

## Examples

- Complete randomization
- With covariates
- With cluster randomization

## Example: Simulation-based power for complete randomization

```{r powersim, echo=TRUE, include=TRUE}
# install.packages("randomizr")
library(randomizr)
library(estimatr)

## Y0 is fixed in most field experiments.
## So we only generate it once:
make_Y0 <- function(N) {
  rnorm(n = N)
}

repeat_experiment_and_test <- function(N, Y0, tau) {
  Y1 <- Y0 + tau
  Z <- complete_ra(N = N)
  Yobs <- Z * Y1 + (1 - Z) * Y0
  estimator <- lm_robust(Yobs ~ Z)
  pval <- estimator$p.value[2]
  return(pval)
}
```

## Example: Simulation-based power for complete randomization {.allowframebreaks}

```{r powersim2, echo=TRUE, include=TRUE}
power_sim <- function(N, tau, sims) {
  Y0 <- make_Y0(N)
  pvals <- replicate(
    n = sims,
    repeat_experiment_and_test(N = N, Y0 = Y0, tau = tau)
  )
  pow <- sum(pvals < .05) / sims
  return(pow)
}
set.seed(12345)
power_sim(N = 80, tau = .25, sims = 100)
power_sim(N = 80, tau = .25, sims = 100)
```
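Because `power_sim()` returns a single power estimate, we can sweep it over candidate sample sizes to find the smallest $N$ that reaches a target like 0.8. A sketch (the candidate sizes are illustrative, and `sims = 100` is kept small for speed; use more simulations in practice):

```{r powersim_by_n, echo=TRUE}
## Sketch: sweep power_sim() over candidate sample sizes.
candidate_Ns <- c(80, 200, 500, 1000)
set.seed(12345)
sapply(candidate_Ns, function(N) power_sim(N = N, tau = .25, sims = 100))
```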
## Example: Using DeclareDesign {.allowframebreaks}

```{r ddversion, echo=TRUE, message=FALSE, warning=FALSE}
library(DeclareDesign)
library(tidyverse)

P0 <- declare_population(N, u0 = rnorm(N))
# declare Y(Z=1) and Y(Z=0)
O0 <- declare_potential_outcomes(Y_Z_0 = 5 + u0, Y_Z_1 = Y_Z_0 + tau)
# design is to assign m units to treatment
A0 <- declare_assignment(Z = conduct_ra(N = N, m = round(N / 2)))
# estimand is the average difference between Y(Z=1) and Y(Z=0)
estimand_ate <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
R0 <- declare_reveal(Y, Z)
design0_base <- P0 + A0 + O0 + R0

## For example:
design0_N100_tau25 <- redesign(design0_base, N = 100, tau = .25)
dat0_N100_tau25 <- draw_data(design0_N100_tau25)
head(dat0_N100_tau25)
with(dat0_N100_tau25, mean(Y_Z_1 - Y_Z_0)) # true ATE
with(dat0_N100_tau25, mean(Y[Z == 1]) - mean(Y[Z == 0])) # estimate
lm_robust(Y ~ Z, data = dat0_N100_tau25)$coef # estimate
```

```{r ddversion_sim, echo=TRUE, warnings=FALSE}
E0 <- declare_estimator(Y ~ Z,
  model = lm_robust, label = "t test 1",
  inquiry = "ATE"
)
t_test <- function(data) {
  test <- with(data, t.test(x = Y[Z == 1], y = Y[Z == 0]))
  data.frame(statistic = test$statistic, p.value = test$p.value)
}
T0 <- declare_test(handler = label_test(t_test), label = "t test 2")
design0_plus_tests <- design0_base + E0 + T0
design0_N100_tau25_plus <- redesign(design0_plus_tests, N = 100, tau = .25)

## Only repeat the random assignment, not the creation of Y0. Ignore the warning.
names(design0_N100_tau25_plus)
design0_N100_tau25_sims <- simulate_design(design0_N100_tau25_plus,
  sims = c(1, 100, 1, 1, 1, 1)
) # only repeat the random assignment

# design0_N100_tau25_sims has 200 rows (2 tests * 100 random assignments)
# just look at the first 6 rows
head(design0_N100_tau25_sims)

# for each estimator, power = proportion of simulations with p.value < 0.05
design0_N100_tau25_sims %>%
  group_by(estimator) %>%
  summarize(pow = mean(p.value < .05), .groups = "drop")
```

# Power with covariate adjustment

## Covariate adjustment and power

- Covariate adjustment can improve power because it mops up variation in the outcome variable.
- If prognostic, covariate adjustment can reduce variance dramatically. Lower variance means higher power.
- If non-prognostic, power gains are minimal.
- All covariates must be pre-treatment. Do not drop observations on account of missingness.
- See the module on [threats to internal validity](threats-to-internal-validity-of-randomized-experiments.html) and [10 things to know about covariate adjustment](https://egap.org/resource/10-things-to-know-about-covariate-adjustment/).
- Beware Freedman's bias: covariate adjustment can bias estimates in small samples, and the bias worsens as the number of observations falls and the number of covariates $K$ grows.

## Blocking {.allowframebreaks}

- Blocking: randomly assign treatment within blocks.
- "Ex-ante" covariate adjustment.
- Higher precision/efficiency implies more power.
- Reduces "conditional bias": the association between treatment assignment and potential outcomes.
- The benefits of blocking over covariate adjustment are clearest in small experiments. (A minimal sketch of blocked assignment follows.)
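A minimal sketch of blocked assignment with `randomizr` (the blocking variable `x_block` and its block sizes are illustrative, not part of this module's running example):

```{r block_ra_sketch, echo=TRUE}
## Sketch: block random assignment with randomizr::block_ra().
## x_block is an illustrative pre-treatment blocking variable.
x_block <- rep(c("low", "high"), each = 40)
Z_blocked <- block_ra(blocks = x_block)
table(Z_blocked, x_block) # assignment is balanced within each block
```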
## Example: Simulation-based power with a covariate {.allowframebreaks}

```{r powsimcov, echo=TRUE}
## Y0 is fixed in most field experiments.
## So we only generate it once:
make_Y0_cov <- function(N) {
  u0 <- rnorm(n = N)
  x <- rpois(n = N, lambda = 2)
  Y0 <- .5 * sd(u0) * x + u0
  return(data.frame(Y0 = Y0, x = x))
}

## X is moderately predictive of Y0.
test_dat <- make_Y0_cov(100)
test_lm <- lm_robust(Y0 ~ x, data = test_dat)
summary(test_lm)

## now set up the simulation
repeat_experiment_and_test_cov <- function(N, tau, Y0, x) {
  Y1 <- Y0 + tau
  Z <- complete_ra(N = N)
  Yobs <- Z * Y1 + (1 - Z) * Y0
  estimator <- lm_robust(Yobs ~ Z + x, data = data.frame(Y0, Z, x))
  pval <- estimator$p.value[2]
  return(pval)
}

## create the data once, randomly assign treatment sims times
## report what proportion return p-value < 0.05
power_sim_cov <- function(N, tau, sims) {
  dat <- make_Y0_cov(N)
  pvals <- replicate(
    n = sims,
    repeat_experiment_and_test_cov(N = N, tau = tau, Y0 = dat$Y0, x = dat$x)
  )
  pow <- sum(pvals < .05) / sims
  return(pow)
}
```

```{r, echo=TRUE}
set.seed(12345)
power_sim_cov(N = 80, tau = .25, sims = 100)
power_sim_cov(N = 80, tau = .25, sims = 100)
```

# Power for cluster randomization

## Power and clustered designs {.allowframebreaks}

- Recall the [randomization module](randomization.html).
- Given a fixed total $N$, a clustered design is weakly less powered than a non-clustered design.
  - The difference is often substantial.
- We have to estimate the variance correctly:
  - Clustered standard errors (the usual approach)
  - Randomization inference
- To increase power:
  - Better to increase the number of clusters than the number of units per cluster.
- How much clustering reduces power depends critically on the intra-cluster correlation (the share of total variance that is between clusters).
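A quick way to see the cost of clustering is the design-effect approximation. A sketch (the formula $1 + (m - 1)\rho$ is the standard Kish design effect for clusters of size $m$ and intra-cluster correlation $\rho$; this back-of-the-envelope calculation is our addition, not computed in the module):

```{r design_effect_sketch, echo=TRUE}
## Sketch: Kish design effect 1 + (m - 1) * ICC for clusters of size m.
## Effective sample size = N / design effect.
design_effect <- function(m, ICC) 1 + (m - 1) * ICC
N <- 800 # e.g., 100 clusters of 8
N / design_effect(m = 8, ICC = .1) # clustering shrinks the effective N
```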
## A note on clustering in observational research

- Often overlooked, leading to (possibly) wildly understated uncertainty.
- Frequentist inference is based on the ratio $\hat{\beta}/\hat{se}$.
- If we underestimate $\hat{se}$, we are much more likely to reject $H_0$. (The Type I error rate is too high.)
- Many observational designs are much less powered than we think they are.

## Example: Simulation-based power for cluster randomization {.allowframebreaks}

```{r powsimclus, echo=TRUE}
## Y0 is fixed in most field experiments.
## So we only generate it once:
make_Y0_clus <- function(n_indivs, n_clus) {
  # n_indivs is number of people per cluster
  # n_clus is number of clusters
  clus_id <- gl(n_clus, n_indivs)
  N <- n_clus * n_indivs
  u0 <- fabricatr::draw_normal_icc(N = N, clusters = clus_id, ICC = .1)
  Y0 <- u0
  return(data.frame(Y0 = Y0, clus_id = clus_id))
}

test_dat <- make_Y0_clus(n_indivs = 10, n_clus = 100)
# confirm that this produces data with 10 in each of 100 clusters
table(test_dat$clus_id)
# confirm the ICC
ICC::ICCbare(y = Y0, x = clus_id, data = test_dat)

repeat_experiment_and_test_clus <- function(N, tau, Y0, clus_id) {
  Y1 <- Y0 + tau
  # here we randomize Z at the cluster level
  Z <- cluster_ra(clusters = clus_id)
  Yobs <- Z * Y1 + (1 - Z) * Y0
  estimator <- lm_robust(Yobs ~ Z,
    clusters = clus_id,
    data = data.frame(Y0, Z, clus_id), se_type = "CR2"
  )
  pval <- estimator$p.value[2]
  return(pval)
}

power_sim_clus <- function(n_indivs, n_clus, tau, sims) {
  dat <- make_Y0_clus(n_indivs, n_clus)
  N <- n_indivs * n_clus
  # randomize treatment sims times
  pvals <- replicate(
    n = sims,
    repeat_experiment_and_test_clus(
      N = N, tau = tau,
      Y0 = dat$Y0, clus_id = dat$clus_id
    )
  )
  pow <- sum(pvals < .05) / sims
  return(pow)
}
```

```{r, echo=TRUE}
set.seed(12345)
power_sim_clus(n_indivs = 8, n_clus = 100, tau = .25, sims = 100)
power_sim_clus(n_indivs = 8, n_clus = 100, tau = .25, sims = 100)
```

## Example: Simulation-based power for cluster randomization (DeclareDesign) {.allowframebreaks}

```{r ddversion_clus, echo=TRUE}
P1 <- declare_population(
  N = n_clus * n_indivs,
  clusters = gl(n_clus, n_indivs),
  u0 = draw_normal_icc(N = N, clusters = clusters, ICC = .2)
)
O1 <- declare_potential_outcomes(Y_Z_0 = 5 + u0, Y_Z_1 = Y_Z_0 + tau)
A1 <- declare_assignment(Z = conduct_ra(N = N, clusters = clusters))
estimand_ate <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
R1 <- declare_reveal(Y, Z)
design1_base <- P1 + A1 + O1 + R1 + estimand_ate

## For example:
design1_test <- redesign(design1_base, n_clus = 10, n_indivs = 100, tau = .25)
test_d1 <- draw_data(design1_test)
# confirm all individuals in a cluster have the same treatment assignment
with(test_d1, table(Z, clusters))

# three estimators, differing in se_type:
E1a <- declare_estimator(Y ~ Z,
  model = lm_robust, clusters = clusters,
  se_type = "CR2", label = "CR2 cluster t test",
  inquiry = "ATE"
)
E1b <- declare_estimator(Y ~ Z,
  model = lm_robust, clusters = clusters,
  se_type = "CR0", label = "CR0 cluster t test",
  inquiry = "ATE"
)
E1c <- declare_estimator(Y ~ Z,
  model = lm_robust, clusters = clusters,
  se_type = "stata", label = "stata RCSE t test",
  inquiry = "ATE"
)

design1_plus <- design1_base + E1a + E1b + E1c
design1_plus_tosim <- redesign(design1_plus, n_clus = 10, n_indivs = 100, tau = .25)
```

```{r ddversion_clus_sims, echo=TRUE, cache=TRUE, warnings=FALSE}
## Only repeat the random assignment, not the creation of Y0. Ignore the warning.
## We would want more simulations in practice.
set.seed(12355)
design1_sims <- simulate_design(design1_plus_tosim,
  sims = c(1, 1000, rep(1, length(design1_plus_tosim) - 2))
)
design1_sims %>%
  group_by(estimator) %>%
  summarize(
    pow = mean(p.value < .05),
    coverage = mean(estimand <= conf.high & estimand >= conf.low),
    .groups = "drop"
  )
```

```{r ddversion_clus_2, echo=TRUE, cache=TRUE, results='hide', warning=FALSE}
library(DesignLibrary)
## This may be simpler than the above:
d1 <- block_cluster_two_arm_designer(
  N_blocks = 1,
  N_clusters_in_block = 10,
  N_i_in_cluster = 100,
  sd_block = 0,
  sd_cluster = .3,
  ate = .25
)
d1_plus <- d1 + E1b + E1c
d1_sims <- simulate_design(d1_plus, sims = c(1, 1, 1000, 1, 1, 1, 1, 1))
```

```{r ddversion_clus_2_print, echo=TRUE}
d1_sims %>%
  group_by(estimator) %>%
  summarize(
    pow = mean(p.value < .05),
    coverage = mean(estimand <= conf.high & estimand >= conf.low),
    .groups = "drop"
  )
```

# Comparative statics

## Comparative Statics

- Power is:
  - Increasing in $N$
  - Increasing in $|\tau|$
  - Decreasing in $\sigma$

## Power by sample size {.allowframebreaks}

```{r powplot_by_n, echo=TRUE}
some_ns <- seq(10, 800, by = 10)
pow_by_n <- sapply(some_ns, function(then) {
  pwr.t.test(n = then, d = 0.25, sig.level = 0.05)$power
})
plot(some_ns, pow_by_n,
  xlab = "Sample Size",
  ylab = "Power"
)
abline(h = .8)
## See https://cran.r-project.org/web/packages/pwr/vignettes/pwr-vignette.html
## for fancier plots
## ptest <- pwr.t.test(n = NULL, d = 0.25, sig.level = 0.05, power = .8)
## plot(ptest)
```

## Power by treatment effect size {.allowframebreaks}

```{r powplot_by_tau, echo=TRUE}
some_taus <- seq(0, 1, by = .05)
pow_by_tau <- sapply(some_taus, function(thetau) {
  pwr.t.test(n = 200, d = thetau, sig.level = 0.05)$power
})
plot(some_taus, pow_by_tau,
  xlab = "Average Treatment Effect (Standardized)",
  ylab = "Power"
)
abline(h = .8)
```

## EGAP Power Calculator

- Try the calculator at: <https://egap.shinyapps.io/power-app/>
- For cluster randomization designs, try adjusting:
  - Number of clusters
  - Number of units per cluster
  - Intra-cluster correlation
  - Treatment effect

## Comments

- Know your outcome variable.
- What effects can you realistically expect from your treatment?
- What is the plausible range of variation of the outcome variable?
  - A design with limited possible movement in the outcome variable may not be well-powered.

## Conclusion: How to improve your power {.allowframebreaks}

1. Increase the $N$.
   - If clustered, increase the number of clusters if at all possible.
2. Strengthen the treatment.
3. Improve precision:
   - Covariate adjustment
   - Blocking
4. Improve measurement of the outcome variable.

A rough comparison of the first three levers follows.
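Using the analytical calculator, we can compare these levers quickly. A sketch (the last line uses the standard approximation that adjusting for a covariate with correlation $R$ with the outcome rescales the standardized effect to $d/\sqrt{1 - R^2}$; that approximation is our addition, not part of the module above):

```{r improve_power_sketch, echo=TRUE}
## Sketch: comparing routes to higher power with pwr.t.test().
pwr.t.test(n = 40, d = 0.25, sig.level = 0.05)$power # baseline
pwr.t.test(n = 80, d = 0.25, sig.level = 0.05)$power # double n per arm
pwr.t.test(n = 40, d = 0.50, sig.level = 0.05)$power # stronger treatment
# adjust for a covariate with R = .5: residual sd falls, standardized effect rises
pwr.t.test(n = 40, d = 0.25 / sqrt(1 - .5^2), sig.level = 0.05)$power
```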