---
title: "Statistical Power"
author: "Fill In Your Name"
date: '`r format(Sys.time(), "%d %B, %Y")`'
header-includes: |
  \setbeamertemplate{footline}{\begin{beamercolorbox}{section in head/foot}
  \includegraphics[height=.5cm]{../Images/egap-logo.png} \hfill
  \insertframenumber/\inserttotalframenumber \end{beamercolorbox}}
output:
  beamer_presentation:
    keep_tex: yes
    toc: yes
  revealjs::revealjs_presentation:
    center: no
    highlight: pygments
    reveal_options:
      chalkboard:
        theme: whiteboard
        toggleNotesButton: no
      previewLinks: yes
      slideNumber: yes
    reveal_plugins:
      - notes
      - search
      - chalkboard
    self_contained: no
    smart: no
    theme: default
    transition: fade
---
```{r setup, include=FALSE, echo=FALSE}
source("rmd_setup.R")
library(DesignLibrary)
```
# What is power?
## What is power?
- We want to separate signal from noise.
- Power = probability of rejecting null hypothesis, given true effect $\ne$ 0.
- In other words, it is the ability to detect an effect given that it exists.
- Formally: power $= 1 -$ the Type II error rate (the probability of failing to reject a false null).
- Thus, power $\in$ (0, 1).
- Standard thresholds: 0.8 or 0.9.
## Starting point for power analysis
- Power analysis is something we do *before* we run a study.
- Helps you figure out the sample size you need to detect an effect of a given size.
- Or helps you figure out the minimum detectable effect given a fixed sample size (a sketch with the `pwr` package follows this list).
- May help you decide whether to run a study.
- It is hard to learn from an under-powered null finding.
- Was there an effect that we failed to detect? Or was there no effect at all? We can't say.
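For example, the `pwr` package (used again below) can solve for either quantity. A minimal sketch:
```{r pwrmde, echo=TRUE}
library(pwr)
## Sample size per arm needed to detect a standardized
## effect of 0.25 with 80% power:
pwr.t.test(d = 0.25, power = 0.80, sig.level = 0.05)
## Minimum detectable standardized effect with 100 units per arm:
pwr.t.test(n = 100, power = 0.80, sig.level = 0.05)
```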
## Power
- Say there truly is a treatment effect and you run your experiment many times. How often will you get a statistically significant result?
- Answering this question requires some guesswork:
- How big is your treatment effect?
- How many units are treated and measured?
- How much noise is there in the measurement of your outcome?
## Approaches to power calculation
- Analytical calculations of power
- Simulation
## Power calculation tools
- Interactive
- [EGAP Power Calculator](https://egap.shinyapps.io/power-app/)
- [rpsychologist](https://rpsychologist.com/d3/NHST/)
- R Packages
- [pwr](https://cran.r-project.org/web/packages/pwr/index.html)
- [DeclareDesign](https://cran.r-project.org/web/packages/DeclareDesign/index.html); see also <https://declaredesign.org/>
# Analytical calculations of power
## Analytical calculations of power
- Formula:
\begin{align*}
\text{Power} &= \Phi\left(\frac{|\tau| \sqrt{N}}{2\sigma}- \Phi^{-1}(1- \frac{\alpha}{2})\right)
\end{align*}
- Components:
- $\Phi$: the standard normal CDF (monotonically increasing)
- $\tau$: the effect size
- $N$: the sample size
- $\sigma$: the standard deviation of the outcome
- $\alpha$: the significance level (typically 0.05)
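As a sanity check, the formula translates directly into R. A sketch, assuming a standardized outcome ($\sigma = 1$); it approximately matches the `pwr.t.test()` result on the next slide, which uses the noncentral $t$ rather than the normal approximation:
```{r analyticpow, echo=TRUE}
## Direct translation of the power formula (normal approximation)
analytic_power <- function(tau, N, sigma = 1, alpha = 0.05){
  pnorm(abs(tau) * sqrt(N) / (2 * sigma) - qnorm(1 - alpha / 2))
}
analytic_power(tau = 0.25, N = 80)
```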
## Example: Analytical calculations of power
```{r pwrsimp, echo=TRUE, include=TRUE}
# Power for a two-sample study with 80 observations
# (40 per arm) and a standardized effect size of 0.25
library(pwr)
pwr.t.test(n = 40, d = 0.25, sig.level = 0.05,
           power = NULL, type = "two.sample")
```
## Limitations to analytical power calculations
- Only derived for some test statistics (e.g., differences of means)
- Makes specific assumptions about the data-generating process
- Difficult to extend to more complex designs
# Simulation-based power calculation
## Simulation-based power calculation
- Create a simulated dataset and run the research design on it.
- Simulation requires assumptions too, but you make your own and state them explicitly.
- For the DeclareDesign approach, see <https://declaredesign.org/>
## Steps
- Define the sample and the potential outcomes function.
- Define the treatment assignment procedure.
- Create data.
- Assign treatment, then estimate the effect.
- Do this many times.
## Examples
- Complete randomization
- With covariates
- With cluster randomization
## Example: Simulation-based power for complete randomization
```{r powersim, echo=TRUE, include=TRUE}
# install.packages("randomizr")
library(randomizr)
library(estimatr)
## Y0 is fixed in most field experiments.
## So we only generate it once:
make_Y0 <- function(N){ rnorm(n = N) }
repeat_experiment_and_test <- function(N, Y0, tau){
  Y1 <- Y0 + tau # constant, additive treatment effect
  Z <- complete_ra(N = N) # by default assigns half the units to treatment
  Yobs <- Z * Y1 + (1 - Z) * Y0
  estimator <- lm_robust(Yobs ~ Z)
  pval <- estimator$p.value[2]
  return(pval)
}
```
## Example: Simulation-based power for complete randomization
```{r powersim2, echo=TRUE, include=TRUE}
power_sim <- function(N, tau, sims){
  Y0 <- make_Y0(N) # create the data once
  ## re-randomize and test sims times
  pvals <- replicate(n = sims,
    repeat_experiment_and_test(N = N, Y0 = Y0, tau = tau))
  pow <- sum(pvals < .05) / sims
  return(pow)
}
set.seed(12345)
power_sim(N=80,tau=.25,sims=100)
power_sim(N=80,tau=.25,sims=100)
```
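The two runs above differ only because of simulation noise. The same function can trace how power changes with design parameters; a quick sketch (more `sims` would give smoother estimates):
```{r powsim_byN, echo=TRUE}
## Power at several sample sizes, holding tau fixed
some_Ns <- c(40, 80, 160, 320)
sapply(some_Ns, function(N){ power_sim(N = N, tau = .25, sims = 100) })
```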
## Example: Using DeclareDesign {.allowframebreaks}
```{r ddversion, echo=TRUE, message=FALSE, warning=FALSE}
library(DeclareDesign)
library(tidyverse)
P0 <- declare_population(N = N, u0 = rnorm(N))
# declare Y(Z=1) and Y(Z=0)
O0 <- declare_potential_outcomes(Y_Z_0 = 5 + u0, Y_Z_1 = Y_Z_0 + tau)
# design is to assign m units to treatment
A0 <- declare_assignment(Z=conduct_ra(N=N, m=round(N/2)))
# estimand is the average difference between Y(Z=1) and Y(Z=0)
estimand_ate <- declare_inquiry(ATE=mean(Y_Z_1 - Y_Z_0))
R0 <- declare_reveal(Y,Z)
design0_base <- P0 + A0 + O0 + R0
## For example:
design0_N100_tau25 <- redesign(design0_base,N=100,tau=.25)
dat0_N100_tau25 <- draw_data(design0_N100_tau25)
head(dat0_N100_tau25)
with(dat0_N100_tau25,mean(Y_Z_1 - Y_Z_0)) # true ATE
with(dat0_N100_tau25,mean(Y[Z==1]) - mean(Y[Z==0])) # estimate
lm_robust(Y~Z,data=dat0_N100_tau25)$coef # estimate
```
```{r ddversion_sim, echo=TRUE, warnings=FALSE}
E0 <- declare_estimator(Y~Z,model=lm_robust,label="t test 1",
inquiry="ATE")
t_test <- function(data) {
  test <- with(data, t.test(x = Y[Z == 1], y = Y[Z == 0]))
  data.frame(statistic = test$statistic, p.value = test$p.value)
}
T0 <- declare_test(handler=label_test(t_test),label="t test 2")
design0_plus_tests <- design0_base + E0 + T0
design0_N100_tau25_plus <- redesign(design0_plus_tests,N=100,tau=.25)
## Only repeat the random assignment, not the creation of Y0. Ignore warning
names(design0_N100_tau25_plus)
design0_N100_tau25_sims <- simulate_design(design0_N100_tau25_plus,
sims=c(1,100,1,1,1,1)) # only repeat the random assignment
# design0_N100_tau25_sims has 200 rows (2 tests * 100 random assignments)
# just look at the first 6 rows
head(design0_N100_tau25_sims)
# for each estimator, power = proportion of simulations with p.value < 0.05
design0_N100_tau25_sims %>% group_by(estimator) %>%
summarize(pow=mean(p.value < .05),.groups="drop")
```
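DeclareDesign can also report power (along with other diagnosands) in one step via `diagnose_design()`. A sketch, reusing the design above; note that inquiry-based diagnosands such as bias will be missing because `design0_base` omits the inquiry step:
```{r dddiagnose, echo=TRUE, warning=FALSE}
## diagnose_design() simulates the design and summarizes diagnosands,
## including power; sims kept small here for speed
diagnosis0 <- diagnose_design(design0_N100_tau25_plus, sims = 100)
diagnosis0
```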
# Power with covariate adjustment
## Covariate adjustment and power
- Covariate adjustment can improve power because it mops up variation in the outcome variable.
- If prognostic, covariate adjustment can reduce variance dramatically. Lower variance means higher power.
- If non-prognostic, power gains are minimal.
- All covariates must be measured pre-treatment. Do not drop observations because of missing covariate values.
- See the module on [threats to internal validity](threats-to-internal-validity-of-randomized-experiments.html) and the [10 things to know about covariate adjustment](https://egap.org/resource/10-things-to-know-about-covariate-adjustment/).
- Beware Freedman's bias: covariate adjustment can bias estimates when the number of observations $n$ is small and the number of covariates $K$ is large.
## Blocking
- Blocking: randomly assign treatment within blocks
- “Ex-ante” covariate adjustment
- Higher precision/efficiency implies more power
- Reduce “conditional bias”: association between treatment assignment and potential outcomes
- Benefits of blocking over covariate adjustment clearest in small experiments
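A minimal sketch of blocked assignment using `randomizr`'s `block_ra()`, with hypothetical blocks defined by a binary covariate:
```{r blockra_sketch, echo=TRUE}
library(randomizr)
## block_ra() randomizes treatment separately within each block
blocks <- rep(c("low", "high"), each = 40)
Z_blocked <- block_ra(blocks = blocks)
table(blocks, Z_blocked)
```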
## Example: Simulation-based power with a covariate {.allowframebreaks}
```{r powsimcov, echo=TRUE}
## Y0 is fixed in most field experiments. So we only generate it once:
make_Y0_cov <- function(N){
  u0 <- rnorm(n = N)
  x <- rpois(n = N, lambda = 2)
  Y0 <- .5 * sd(u0) * x + u0
  return(data.frame(Y0 = Y0, x = x))
}
## x is moderately predictive of Y0.
test_dat <- make_Y0_cov(100)
test_lm <- lm_robust(Y0 ~ x, data = test_dat)
summary(test_lm)
## now set up the simulation
repeat_experiment_and_test_cov <- function(N, tau, Y0, x){
  Y1 <- Y0 + tau
  Z <- complete_ra(N = N)
  Yobs <- Z * Y1 + (1 - Z) * Y0
  estimator <- lm_robust(Yobs ~ Z + x, data = data.frame(Yobs, Z, x))
  pval <- estimator$p.value[2]
  return(pval)
}
## create the data once, randomly assign treatment sims times
## report what proportion return p-value < 0.05
power_sim_cov <- function(N, tau, sims){
  dat <- make_Y0_cov(N)
  pvals <- replicate(n = sims, repeat_experiment_and_test_cov(N = N,
    tau = tau, Y0 = dat$Y0, x = dat$x))
  pow <- sum(pvals < .05) / sims
  return(pow)
}
```
```{r, echo=TRUE}
set.seed(12345)
power_sim_cov(N=80,tau=.25,sims=100)
power_sim_cov(N=80,tau=.25,sims=100)
```
# Power for cluster randomization
## Power and clustered designs
- Recall the [randomization module](randomization.html).
- Given a fixed $N$, a clustered design is weakly less powered than a non-clustered design.
- The difference is often substantial.
- We have to estimate variance correctly:
- Cluster-robust standard errors (the usual approach)
- Randomization inference
- To increase power:
- Better to increase number of clusters than number of units per cluster.
- How much clustering reduces power depends critically on the intra-cluster correlation (the share of total variance that lies between clusters).
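A standard way to quantify this loss (the textbook design effect, not specific to this module): with $m$ units per cluster and intra-cluster correlation $\rho$, cluster-level assignment inflates the variance of the estimated effect by
\begin{align*}
\text{DEFF} &= 1 + (m - 1)\rho,
\end{align*}
so the effective sample size is roughly $N / \text{DEFF}$. Even a modest $\rho$ erodes power quickly when clusters are large.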
## A note on clustering in observational research
- Often overlooked, leading to (possibly) wildly understated uncertainty.
- Frequentist inference is based on the ratio $\hat{\beta}/\hat{se}$.
- If we underestimate $\hat{se}$, we are much more likely to reject $H_0$. (The Type I error rate is too high.)
- Many observational designs are much less powered than we think they are.
## Example: Simulation-based power for cluster randomization {.allowframebreaks}
```{r powsimclus, echo=TRUE}
## Y0 is fixed in most field experiments. So we only generate it once
make_Y0_clus <- function(n_indivs,n_clus){
# n_indivs is number of people per cluster
# n_clus is number of clusters
clus_id <- gl(n_clus,n_indivs)
N <- n_clus * n_indivs
u0 <- fabricatr::draw_normal_icc(N=N,clusters=clus_id,ICC=.1)
Y0 <- u0
return(data.frame(Y0=Y0,clus_id=clus_id))
}
test_dat <- make_Y0_clus(n_indivs=10,n_clus=100)
# confirm that this produces data with 10 in each of 100 clusters
table(test_dat$clus_id)
# confirm ICC
ICC::ICCbare(y=Y0,x=clus_id,data=test_dat)
repeat_experiment_and_test_clus <- function(N, tau, Y0, clus_id){
Y1 <- Y0 + tau
# here we randomize Z at the cluster level
Z <- cluster_ra(clusters=clus_id)
Yobs <- Z * Y1 + (1 - Z) * Y0
estimator <- lm_robust(Yobs ~ Z, clusters = clus_id,
data=data.frame(Y0,Z,clus_id), se_type = "CR2")
pval <- estimator$p.value[2]
return(pval)
}
power_sim_clus <- function(n_indivs,n_clus,tau,sims){
dat <- make_Y0_clus(n_indivs,n_clus)
N <- n_indivs * n_clus
# randomize treatment sims times
pvals <- replicate(n=sims,
repeat_experiment_and_test_clus(N=N,tau=tau,
Y0=dat$Y0,clus_id=dat$clus_id))
pow <- sum(pvals < .05) / sims
return(pow)
}
```
```{r, echo=TRUE}
set.seed(12345)
power_sim_clus(n_indivs=8,n_clus=100,tau=.25,sims=100)
power_sim_clus(n_indivs=8,n_clus=100,tau=.25,sims=100)
```
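To see the advice about clusters in action, hold total $N$ fixed at 800 and compare a few large clusters with many small ones (a sketch using the functions defined above; the ICC is the 0.1 baked into `make_Y0_clus()`):
```{r powsimclus_compare, echo=TRUE}
set.seed(12345)
## few, large clusters
power_sim_clus(n_indivs = 80, n_clus = 10, tau = .25, sims = 100)
## many, small clusters (same total N)
power_sim_clus(n_indivs = 10, n_clus = 80, tau = .25, sims = 100)
```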
## Example: Simulation-based power for cluster randomization (DeclareDesign) {.allowframebreaks}
```{r ddversion_clus, echo=TRUE}
P1 <- declare_population(N = n_clus * n_indivs,
clusters=gl(n_clus,n_indivs),
u0=draw_normal_icc(N=N,clusters=clusters,ICC=.2))
O1 <- declare_potential_outcomes(Y_Z_0 = 5 + u0, Y_Z_1 = Y_Z_0 + tau)
A1 <- declare_assignment(Z=conduct_ra(N=N, clusters=clusters))
estimand_ate <- declare_inquiry(ATE=mean(Y_Z_1 - Y_Z_0))
R1 <- declare_reveal(Y,Z)
design1_base <- P1 + A1 + O1 + R1 + estimand_ate
## For example:
design1_test <- redesign(design1_base,n_clus=10,n_indivs=100,tau=.25)
test_d1 <- draw_data(design1_test)
# confirm all individuals in a cluster have the same treatment assignment
with(test_d1,table(Z,clusters))
# three estimators, differ in se_type:
E1a <- declare_estimator(Y~Z,model=lm_robust,clusters=clusters,
se_type="CR2", label="CR2 cluster t test",
inquiry="ATE")
E1b <- declare_estimator(Y~Z,model=lm_robust,clusters=clusters,
se_type="CR0", label="CR0 cluster t test",
inquiry="ATE")
E1c <- declare_estimator(Y~Z,model=lm_robust,clusters=clusters,
se_type="stata", label="stata RCSE t test",
inquiry="ATE")
design1_plus <- design1_base + E1a + E1b + E1c
design1_plus_tosim <- redesign(design1_plus,n_clus=10,n_indivs=100,tau=.25)
```
```{r ddversion_clus_sims, echo=TRUE, cache=TRUE, warnings=FALSE}
## Only repeat the random assignment, not the creation of Y0. Ignore warning
## We would want more simulations in practice.
set.seed(12355)
design1_sims <- simulate_design(design1_plus_tosim,
sims=c(1,1000,rep(1,length(design1_plus_tosim)-2)))
design1_sims %>% group_by(estimator) %>%
summarize(pow=mean(p.value < .05),
coverage = mean(estimand <= conf.high & estimand >= conf.low),
.groups="drop")
```
```{r ddversion_clus_2, echo=TRUE, cache=TRUE, results='hide', warning=FALSE}
library(DesignLibrary)
## This may be simpler than the above:
d1 <- block_cluster_two_arm_designer(N_blocks = 1,
N_clusters_in_block = 10,
N_i_in_cluster = 100,
sd_block = 0,
sd_cluster = .3,
ate = .25)
d1_plus <- d1 + E1b + E1c
d1_sims <- simulate_design(d1_plus,sims=c(1,1,1000,1,1,1,1,1))
```
```{r ddversion_clus_2_print, echo=TRUE}
d1_sims %>% group_by(estimator) %>% summarize(pow=mean(p.value < .05),
coverage = mean(estimand <= conf.high & estimand >= conf.low),
.groups="drop")
```
# Comparative statics
## Comparative Statics
- Power is:
- Increasing in $N$
- Increasing in $|\tau|$
- Decreasing in $\sigma$
## Power by sample size {.allowframebreaks}
```{r powplot_by_n, echo=TRUE}
some_ns <- seq(10,800,by=10)
pow_by_n <- sapply(some_ns, function(the_n){
  pwr.t.test(n = the_n, d = 0.25, sig.level = 0.05)$power
})
plot(some_ns,pow_by_n,
xlab="Sample Size",
ylab="Power")
abline(h=.8)
## See https://cran.r-project.org/web/packages/pwr/vignettes/pwr-vignette.html
## for fancier plots
## ptest <- pwr.t.test(n = NULL, d = 0.25, sig.level = 0.05, power = .8)
## plot(ptest)
```
## Power by treatment effect size {.allowframebreaks}
```{r powplot_by_tau, echo=TRUE}
some_taus <- seq(0,1,by=.05)
pow_by_tau <- sapply(some_taus, function(thetau){
pwr.t.test(n = 200, d = thetau, sig.level = 0.05)$power
})
plot(some_taus,pow_by_tau,
xlab="Average Treatment Effect (Standardized)",
ylab="Power")
abline(h=.8)
```
## EGAP Power Calculator
- Try the calculator at: https://egap.shinyapps.io/power-app/
- For cluster randomization designs, try adjusting:
- Number of clusters
- Number of units per cluster
- Intra-cluster correlation
- Treatment effect
## Comments
- Know your outcome variable.
- What effects can you realistically expect from your treatment?
- What is the plausible range of variation of the outcome variable?
- A design with limited possible movement in the outcome variable may not be well-powered.
## Conclusion: How to improve your power
1. Increase the $N$
- If clustered, increase the number of clusters if at all possible
2. Strengthen the treatment
3. Improve precision
- Covariate adjustment
- Blocking
4. Better measurement of the outcome variable