We want to separate signal from noise.
Power = the probability of rejecting the null hypothesis, given a true effect \(\ne\) 0.
In other words, the ability to detect an effect given that it exists.
Formally: power \(= 1 - \beta\), where \(\beta\) is the Type II error rate.
Thus, power \(\in\) (0, 1).
Standard thresholds: 0.8 or 0.9.
Power analysis is something we do before we run a study.
Helps you figure out the sample size you need to detect a given effect size.
Or the minimal detectable effect given a fixed sample size.
May help you decide whether to run a study.
It is hard to learn from an under-powered null finding.
Say there truly is a treatment effect and you run your experiment many times. How often will you get a statistically significant result?
Answering this question requires some guesswork:
How big is your treatment effect?
How many units are treated, measured?
How much noise is there in the measurement of your outcome?
Analytical calculations of power
Simulation
Interactive tools (e.g., the EGAP power calculator linked at the end)
R Packages
DeclareDesign, see also https://declaredesign.org/
Formula: \[\begin{align*} \text{Power} &= \Phi\left(\frac{|\tau| \sqrt{N}}{2\sigma}- \Phi^{-1}(1- \frac{\alpha}{2})\right) \end{align*}\]
Components: \(\tau\) is the true effect size, \(N\) the total sample size (split evenly across arms), \(\sigma\) the standard deviation of the outcome, \(\alpha\) the significance level (typically 0.05), and \(\Phi\) the standard normal CDF.
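The formula is easy to code directly as a sanity check; a minimal sketch (the function name and the default sigma = 1 are ours, not from the original):

## sketch of the analytic formula above: N is total sample size,
## tau the true effect, sigma the outcome standard deviation
analytic_power <- function(tau, N, sigma = 1, alpha = 0.05) {
  pnorm(abs(tau) * sqrt(N) / (2 * sigma) - qnorm(1 - alpha / 2))
}
analytic_power(tau = 0.25, N = 80) # about 0.20, matching the pwr example below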
# Power for a two-sample study with 80 observations
# (40 per group) and an effect size of 0.25
library(pwr)
pwr.t.test(
  n = 40, d = 0.25, sig.level = 0.05,
  power = NULL, type = "two.sample"
)
Two-sample t test power calculation
n = 40
d = 0.25
sig.level = 0.05
power = 0.1972
alternative = two.sided
NOTE: n is number in *each* group
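The same function can instead solve for the sample size needed to hit a power target; a sketch:

## leave n out and supply power to solve for the per-group n;
## for d = 0.25 and power 0.8 this gives roughly 252 per group
pwr.t.test(d = 0.25, sig.level = 0.05, power = 0.80, type = "two.sample")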
Only derived for some test statistics (differences of means)
Makes specific assumptions about the data-generating process
Hard to extend to more complex designs (e.g., blocking or cluster randomization)
Create dataset and simulate research design.
Simulation also requires assumptions, but you make them explicitly and choose them yourself.
For the DeclareDesign approach, see https://declaredesign.org/
Define the sample and the potential outcomes function.
Define the treatment assignment procedure.
Create data.
Assign treatment, then estimate the effect.
Do this many times.
Complete randomization
With covariates
With cluster randomization
# install.packages("randomizr")
library(randomizr)
library(estimatr)
## Y0 is fixed in most field experiments.
## So we only generate it once:
make_Y0 <- function(N) {
rnorm(n = N)
}
repeat_experiment_and_test <- function(N, Y0, tau) {
Y1 <- Y0 + tau
Z <- complete_ra(N = N)
Yobs <- Z * Y1 + (1 - Z) * Y0
estimator <- lm_robust(Yobs ~ Z)
  pval <- estimator$p.value[2] # p-value for the coefficient on Z
return(pval)
}
power_sim <- function(N, tau, sims) {
Y0 <- make_Y0(N)
pvals <- replicate(
n = sims,
repeat_experiment_and_test(N = N, Y0 = Y0, tau = tau)
)
pow <- sum(pvals < .05) / sims
return(pow)
}
set.seed(12345)
power_sim(N = 80, tau = .25, sims = 100)
[1] 0.15
[1] 0.21
(A second run, with fresh simulation draws, gives this different estimate. With only 100 simulations, the Monte Carlo error in an estimated power is substantial; use more simulations for a final calculation.)
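A common use of such a function is to trace power across candidate sample sizes; a sketch reusing power_sim (the grid of N values is illustrative):

## estimated power at several sample sizes; expect noisy values at
## sims = 100, so increase sims for a final calculation
some_Ns <- c(80, 200, 400, 800)
sapply(some_Ns, function(N) power_sim(N = N, tau = .25, sims = 100))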
library(DeclareDesign)
library(tidyverse)
P0 <- declare_population(N = N, u0 = rnorm(N))
# declare Y(Z=1) and Y(Z=0)
O0 <- declare_potential_outcomes(Y_Z_0 = 5 + u0, Y_Z_1 = Y_Z_0 + tau)
# design is to assign m units to treatment
A0 <- declare_assignment(Z = conduct_ra(N = N, m = round(N / 2)))
# estimand is the average difference between Y(Z=1) and Y(Z=0)
estimand_ate <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
R0 <- declare_reveal(Y, Z)
design0_base <- P0 + A0 + O0 + R0
## For example:
design0_N100_tau25 <- redesign(design0_base, N = 100, tau = .25)
dat0_N100_tau25 <- draw_data(design0_N100_tau25)
head(dat0_N100_tau25)
ID u0 Z Y_Z_0 Y_Z_1 Y
1 001 -0.2060 0 4.794 5.044 4.794
2 002 -0.5875 0 4.413 4.663 4.413
3 003 -0.2908 1 4.709 4.959 4.959
4 004 -2.5649 0 2.435 2.685 2.435
5 005 -1.8967 0 3.103 3.353 3.103
6 006 -1.6401 1 3.360 3.610 3.610
## the true ATE in this draw equals tau by construction:
with(dat0_N100_tau25, mean(Y_Z_1 - Y_Z_0))
[1] 0.25
## the difference-in-means estimate from this one draw:
with(dat0_N100_tau25, mean(Y[Z == 1]) - mean(Y[Z == 0]))
[1] 0.5569
## equivalently, the OLS coefficient on Z:
coef(lm_robust(Y ~ Z, data = dat0_N100_tau25))
(Intercept)           Z
     4.8458      0.5569
E0 <- declare_estimator(Y ~ Z,
model = lm_robust, label = "t test 1",
inquiry = "ATE"
)
t_test <- function(data) {
test <- with(data, t.test(x = Y[Z == 1], y = Y[Z == 0]))
data.frame(statistic = test$statistic, p.value = test$p.value)
}
T0 <- declare_test(handler = label_test(t_test), label = "t test 2")
design0_plus_tests <- design0_base + E0 + T0
design0_N100_tau25_plus <- redesign(design0_plus_tests, N = 100, tau = .25)
## Only repeat the random assignment, not the creation of Y0. Ignore warning
names(design0_N100_tau25_plus)
[1] "P0" "A0" "O0" "R0" "t test 1" "t test 2"
design0_N100_tau25_sims <- simulate_design(design0_N100_tau25_plus,
sims = c(1, 100, 1, 1, 1, 1)
) # only repeat the random assignment
Warning: We recommend you choose a higher number of simulations than 1 for the top level of simulation.
# design0_N100_tau25_sims has 200 rows (2 tests * 100 random assignments)
# just look at the first 6 rows
head(design0_N100_tau25_sims)
design N tau sim_ID estimator term estimate std.error statistic p.value conf.low conf.high df outcome inquiry
1 design0_N100_tau25_plus 100 0.25 1 t test 1 Z 0.1108 0.2150 0.5153 0.60752 -0.3158 0.5374 98 Y ATE
2 design0_N100_tau25_plus 100 0.25 1 t test 2 <NA> NA NA 0.5153 0.60754 NA NA NA <NA> <NA>
3 design0_N100_tau25_plus 100 0.25 2 t test 1 Z 0.2458 0.2154 1.1411 0.25661 -0.1817 0.6733 98 Y ATE
4 design0_N100_tau25_plus 100 0.25 2 t test 2 <NA> NA NA 1.1411 0.25662 NA NA NA <NA> <NA>
5 design0_N100_tau25_plus 100 0.25 3 t test 1 Z 0.5463 0.2133 2.5608 0.01197 0.1229 0.9697 98 Y ATE
6 design0_N100_tau25_plus 100 0.25 3 t test 2 <NA> NA NA 2.5608 0.01203 NA NA NA <NA> <NA>
step_1_draw step_2_draw
1 1 1
2 1 1
3 1 2
4 1 2
5 1 3
6 1 3
# for each estimator, power = proportion of simulations with p.value < 0.05
design0_N100_tau25_sims %>%
group_by(estimator) %>%
summarize(pow = mean(p.value < .05), .groups = "drop")
# A tibble: 2 × 2
estimator pow
<chr> <dbl>
1 t test 1 0.2
2 t test 2 0.2
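DeclareDesign can also compute power without a hand-rolled summary: diagnose_design simulates a design and reports diagnosands for each estimator. A sketch (the sims value is illustrative; since design0 declares no inquiry step, inquiry-based diagnosands like bias will be missing, but power is still reported):

diagnosis0 <- diagnose_design(design0_N100_tau25_plus, sims = 500)
diagnosis0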
Covariate adjustment can improve power because it mops up variation in the outcome variable.
If prognostic, covariate adjustment can reduce variance dramatically. Lower variance means higher power.
If non-prognostic, power gains are minimal.
All covariates must be measured pre-treatment. Do not drop observations because of missing covariate values (impute them instead).
Beware Freedman’s bias: covariate adjustment can bias estimates when the number of observations \(n\) is small and the number of covariates \(K\) is large.
Blocking: randomly assign treatment within blocks (see the sketch after this list)
“Ex-ante” covariate adjustment
Higher precision/efficiency implies more power
Reduce “conditional bias”: association between treatment assignment and potential outcomes
Benefits of blocking over covariate adjustment clearest in small experiments
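For intuition about how blocking enters the code, randomizr's block_ra assigns treatment separately within each block; a minimal sketch with an illustrative binary blocking variable:

## sketch: blocked assignment with randomizr::block_ra;
## x_bin is an illustrative binary blocking variable
x_bin <- rbinom(n = 100, size = 1, prob = .5)
Z_blocked <- block_ra(blocks = x_bin)
table(Z_blocked, x_bin) # roughly half treated within each block

In the simulation functions below, you would swap complete_ra for block_ra (and adjust the estimator, e.g., with block fixed effects).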
## Y0 is fixed in most field experiments. So we only generate it once
make_Y0_cov <- function(N) {
u0 <- rnorm(n = N)
x <- rpois(n = N, lambda = 2)
Y0 <- .5 * sd(u0) * x + u0
return(data.frame(Y0 = Y0, x = x))
}
## X is moderately predictive of Y0.
test_dat <- make_Y0_cov(100)
test_lm <- lm_robust(Y0 ~ x, data = test_dat)
summary(test_lm)
Call:
lm_robust(formula = Y0 ~ x, data = test_dat)
Standard error type: HC2
Coefficients:
Estimate Std. Error t value Pr(>|t|) CI Lower CI Upper DF
(Intercept) 0.11 0.1880 0.585 0.559753653 -0.263 0.483 98
x 0.44 0.0814 5.413 0.000000441 0.279 0.602 98
Multiple R-squared: 0.231 , Adjusted R-squared: 0.223
F-statistic: 29.3 on 1 and 98 DF, p-value: 0.000000441
## now set up the simulation
repeat_experiment_and_test_cov <- function(N, tau, Y0, x) {
Y1 <- Y0 + tau
Z <- complete_ra(N = N)
Yobs <- Z * Y1 + (1 - Z) * Y0
  estimator <- lm_robust(Yobs ~ Z + x, data = data.frame(Yobs, Z, x))
  pval <- estimator$p.value[2] # p-value for the coefficient on Z
return(pval)
}
## create the data once, randomly assign treatment sims times
## report what proportion return p-value < 0.05
power_sim_cov <- function(N, tau, sims) {
dat <- make_Y0_cov(N)
pvals <- replicate(n = sims, repeat_experiment_and_test_cov(
N = N,
tau = tau, Y0 = dat$Y0, x = dat$x
))
pow <- sum(pvals < .05) / sims
return(pow)
}
Two runs of power_sim_cov with the same parameters as before (N = 80, tau = .25, sims = 100) gave power estimates of 0.13 and 0.19; again, expect run-to-run Monte Carlo variation at this number of simulations.
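To see where the power gain would come from, compare the standard error on Z with and without adjustment in one draw; a sketch reusing make_Y0_cov:

## one draw: adjustment shrinks the SE when x is prognostic
dat1 <- make_Y0_cov(N = 80)
dat1$Z <- complete_ra(N = 80)
dat1$Yobs <- with(dat1, Z * (Y0 + .25) + (1 - Z) * Y0)
lm_robust(Yobs ~ Z, data = dat1)$std.error[2]     # unadjusted
lm_robust(Yobs ~ Z + x, data = dat1)$std.error[2] # covariate-adjusted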
Recall the randomization module.
Given a fixed \(N\), a clustered design is weakly less powered than a non-clustered design.
We have to estimate variance correctly: use standard errors that account for clustering (e.g., the CR2 cluster-robust estimator used below).
To increase power: adding clusters usually helps more than adding individuals within clusters; how much clustering hurts depends on the intra-cluster correlation (ICC) of the outcome.
Often overlooked, leading to (possibly) wildly understated uncertainty.
Frequentist inference based on ratio \(\hat{\beta}/\hat{se}\)
If we underestimate \(\hat{se}\), we are much more likely to reject \(H_0\). (Type-I error rate is too high.)
Many observational designs are much less powered than we think they are.
## Y0 is fixed in most field experiments. So we only generate it once
make_Y0_clus <- function(n_indivs, n_clus) {
# n_indivs is number of people per cluster
# n_clus is number of clusters
clus_id <- gl(n_clus, n_indivs)
N <- n_clus * n_indivs
u0 <- fabricatr::draw_normal_icc(N = N, clusters = clus_id, ICC = .1)
Y0 <- u0
return(data.frame(Y0 = Y0, clus_id = clus_id))
}
test_dat <- make_Y0_clus(n_indivs = 10, n_clus = 100)
# confirm that this produces data with 10 in each of 100 clusters
table(test_dat$clus_id)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99
10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10 10
100
10
## check the intra-cluster correlation of the simulated outcome;
## it should be near the ICC = .1 we specified:
ICC::ICCbare(x = clus_id, y = Y0, data = test_dat)
[1] 0.09655
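To see why the variance estimator matters, compare standard errors that ignore versus account for clustering on one cluster-randomized assignment of the test data; a sketch:

## one cluster-randomized assignment of the test data
Z_test <- cluster_ra(clusters = test_dat$clus_id)
Y_test <- test_dat$Y0 + .25 * Z_test
## HC2 SEs that ignore clustering are typically too small here
lm_robust(Y_test ~ Z_test)$std.error[2]
## CR2 cluster-robust SEs account for the clustered assignment
lm_robust(Y_test ~ Z_test, clusters = test_dat$clus_id)$std.error[2]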
repeat_experiment_and_test_clus <- function(N, tau, Y0, clus_id) {
Y1 <- Y0 + tau
# here we randomize Z at the cluster level
Z <- cluster_ra(clusters = clus_id)
Yobs <- Z * Y1 + (1 - Z) * Y0
estimator <- lm_robust(Yobs ~ Z,
clusters = clus_id,
data = data.frame(Y0, Z, clus_id), se_type = "CR2"
)
  pval <- estimator$p.value[2] # p-value for the coefficient on Z
return(pval)
}
power_sim_clus <- function(n_indivs, n_clus, tau, sims) {
dat <- make_Y0_clus(n_indivs, n_clus)
N <- n_indivs * n_clus
# randomize treatment sims times
pvals <- replicate(
n = sims,
repeat_experiment_and_test_clus(
N = N, tau = tau,
Y0 = dat$Y0, clus_id = dat$clus_id
)
)
pow <- sum(pvals < .05) / sims
return(pow)
}
Two runs of power_sim_clus gave power estimates of 0.66 and 0.68; as before, estimates vary across runs at a small number of simulations.
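The advice to add clusters rather than individuals per cluster can be checked directly; a sketch holding total N = 1000 fixed (the parameter values are illustrative, not from the original):

## same total N = 1000, allocated differently between clusters and
## individuals; power is typically much higher with more clusters
power_sim_clus(n_indivs = 100, n_clus = 10, tau = .25, sims = 100)
power_sim_clus(n_indivs = 10, n_clus = 100, tau = .25, sims = 100)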
P1 <- declare_population(
N = n_clus * n_indivs,
clusters = gl(n_clus, n_indivs),
u0 = draw_normal_icc(N = N, clusters = clusters, ICC = .2)
)
O1 <- declare_potential_outcomes(Y_Z_0 = 5 + u0, Y_Z_1 = Y_Z_0 + tau)
A1 <- declare_assignment(Z = conduct_ra(N = N, clusters = clusters))
estimand_ate <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))
R1 <- declare_reveal(Y, Z)
design1_base <- P1 + A1 + O1 + R1 + estimand_ate
## For example:
design1_test <- redesign(design1_base, n_clus = 10, n_indivs = 100, tau = .25)
test_d1 <- draw_data(design1_test)
# confirm all individuals in a cluster have the same treatment assignment
with(test_d1, table(Z, clusters))
clusters
Z 1 2 3 4 5 6 7 8 9 10
0 100 0 100 100 100 0 0 100 0 0
1 0 100 0 0 0 100 100 0 100 100
# three estimators, differ in se_type:
E1a <- declare_estimator(Y ~ Z,
model = lm_robust, clusters = clusters,
se_type = "CR2", label = "CR2 cluster t test",
inquiry = "ATE"
)
E1b <- declare_estimator(Y ~ Z,
model = lm_robust, clusters = clusters,
se_type = "CR0", label = "CR0 cluster t test",
inquiry = "ATE"
)
E1c <- declare_estimator(Y ~ Z,
model = lm_robust, clusters = clusters,
se_type = "stata", label = "stata RCSE t test",
inquiry = "ATE"
)
design1_plus <- design1_base + E1a + E1b + E1c
design1_plus_tosim <- redesign(design1_plus, n_clus = 10, n_indivs = 100, tau = .25)
## Only repeat the random assignment, not the creation of Y0. Ignore warning
## We would want more simulations in practice.
set.seed(12355)
design1_sims <- simulate_design(design1_plus_tosim,
sims = c(1, 1000, rep(1, length(design1_plus_tosim) - 2))
)
Warning: We recommend you choose a higher number of simulations than 1 for the top level of simulation.
design1_sims %>%
group_by(estimator) %>%
summarize(
pow = mean(p.value < .05),
coverage = mean(estimand <= conf.high & estimand >= conf.low),
.groups = "drop"
)
# A tibble: 3 × 3
estimator pow coverage
<chr> <dbl> <dbl>
1 CR0 cluster t test 0.155 0.911
2 CR2 cluster t test 0.105 0.936
3 stata RCSE t test 0.131 0.918
The CR2 standard errors give coverage closest to the nominal 95%; the CR0 and Stata-style errors understate uncertainty, so their apparently higher power partly reflects an inflated false-positive rate.
library(DesignLibrary)
## This may be simpler than the above:
d1 <- block_cluster_two_arm_designer(
N_blocks = 1,
N_clusters_in_block = 10,
N_i_in_cluster = 100,
sd_block = 0,
sd_cluster = .3,
ate = .25
)
d1_plus <- d1 + E1b + E1c
d1_sims <- simulate_design(d1_plus, sims = c(1, 1, 1000, 1, 1, 1, 1, 1))
d1_sims %>%
group_by(estimator) %>%
summarize(
pow = mean(p.value < .05),
coverage = mean(estimand <= conf.high & estimand >= conf.low),
.groups = "drop"
)
# A tibble: 3 × 3
estimator pow coverage
<chr> <dbl> <dbl>
1 CR0 cluster t test 0.209 0.914
2 estimator 0.143 0.941
3 stata RCSE t test 0.194 0.925
(The row labeled “estimator” is the designer’s built-in estimator.)
Try the calculator at: https://egap.shinyapps.io/power-app/
For cluster randomization designs, try adjusting the number of clusters and the number of individuals per cluster.
Increase the \(N\)
Strengthen the treatment
Improve precision:
Covariate adjustment
Blocking
Better measurement of the outcome variable
Comments
Know your outcome variable.
What effects can you realistically expect from your treatment?
What is the plausible range of variation of the outcome variable?