---
title: "Analysis of Covariance"
---

## Analysis of covariance

* ANOVA: explanatory variables categorical (divide data into groups)
* Traditionally, analysis of covariance has categorical $x$'s plus one numerical $x$ ("covariate") to be adjusted for.
* `lm` handles this too.
* Simple example: two treatments (drugs) (`a` and `b`), with before and after scores.
* Does knowing before score and/or treatment help to predict after score?
* Is after score different by treatment/before score?

## Data

Treatment, before, after:

\scriptsize

```
a 5 20
a 10 23
a 12 30
a 9 25
a 23 34
a 21 40
a 14 27
a 18 38
a 6 24
a 13 31
b 7 19
b 12 26
b 27 33
b 24 35
b 18 30
b 22 31
b 26 34
b 21 28
b 14 23
b 9 22
```

\normalsize

## Packages

```{r bAncova-1}
library(tidyverse)
library(broom)
library(marginaleffects)
```

The last of these is for predictions.

## Read in data

```{r bAncova-2, message=FALSE}
url <- "http://ritsokiguess.site/datafiles/ancova.txt"
# values separated by single spaces
prepost <- read_delim(url, " ")
prepost
```

## Making a plot

```{r ancova-plot, fig.height=4.5}
ggplot(prepost, aes(x = before, y = after, colour = drug)) +
  geom_point()
```

## Comments

* As before score goes up, after score goes up.
* Red points (drug A) generally above blue points (drug B), for comparable before score.
* Suggests before score effect *and* drug effect.

## The means

```{r bAncova-3}
prepost %>%
  group_by(drug) %>%
  summarize(
    before_mean = mean(before),
    after_mean = mean(after)
  )
```

* Mean "after" score slightly higher for treatment A.
* Mean "before" score much higher for treatment B.
* Greater *improvement* on treatment A.

## Testing for interaction

```{r bAncova-4}
# before * drug: both main effects plus their interaction
prepost.1 <- lm(after ~ before * drug, data = prepost)
anova(prepost.1)
```

* Interaction not significant. Will remove later.

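
For reference, the formula `before * drug` expands to both main effects plus their interaction. Written out as a sketch, with $d_i$ an indicator introduced here for illustration (not in the original) that is 1 for drug B and 0 for drug A, and assuming R's default baseline coding, the model being fitted is roughly:

$$
\text{after}_i = \beta_0 + \beta_1\,\text{before}_i + \beta_2\,d_i + \beta_3\,(\text{before}_i \times d_i) + \varepsilon_i
$$

Under this coding the drug A line has intercept $\beta_0$ and slope $\beta_1$, and the drug B line has intercept $\beta_0 + \beta_2$ and slope $\beta_1 + \beta_3$. The interaction test asks whether $\beta_3 = 0$, that is, whether the two lines have the same slope.
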
## Predictions

Set up values to predict for:

```{r}
summary(prepost)
```

```{r}
# quartiles and median of before, for each drug
new <- datagrid(
  before = c(9.75, 14, 21.25),
  drug = c("a", "b"),
  model = prepost.1
)
```

## and then

```{r}
cbind(predictions(prepost.1, newdata = new)) %>%
  select(drug, before, estimate)
```

\normalsize

## Predictions (with interaction included), plotted

```{r, fig.height=4}
plot_predictions(model = prepost.1, condition = c("before", "drug"))
```

Lines almost parallel, but not quite.

## Taking out interaction

\small

```{r bAncova-8}
# refit without the interaction term
prepost.2 <- update(prepost.1, . ~ . - before:drug)
anova(prepost.2)
```

\normalsize

* Take out non-significant interaction.
* `before` and `drug` strongly significant.
* Do predictions again and plot them.

## Predictions

```{r}
cbind(predictions(prepost.2, newdata = new)) %>%
  select(drug, before, estimate)
```

## Plot of predicted values

```{r, fig.height=4}
plot_predictions(prepost.2, condition = c("before", "drug"))
```

This time the lines are *exactly* parallel. No-interaction model forces them to have the same slope.

## Different look at model output

* `anova(prepost.2)` tests for significant effect of before score and of drug, but doesn't help with interpretation.
* `summary(prepost.2)` views as regression with slopes:

\scriptsize

```{r bAncova-11}
summary(prepost.2)
```

\normalsize

## Understanding those slopes

\footnotesize

```{r bAncova-12}
tidy(prepost.2)
```

\normalsize

* `before` ordinary numerical variable; `drug` categorical.
* `lm` uses first category of `drug` (`a`) as baseline, so only `drugb` appears in the output.
* Intercept is prediction of after score for before score 0 and *drug A*.
* `before` slope is predicted change in after score when before score increases by 1 (usual slope).
* Slope for `drugb` is *change* in predicted after score for being on drug B rather than drug A. Same for *any* before score (no interaction).

## Summary

* ANCOVA model: fits different regression line for each group, predicting response from covariate.
* ANCOVA model with interaction between factor and covariate allows different slopes for each line.
* Sometimes those lines can cross over!
* If interaction not significant, take out. Lines then parallel.
* With parallel lines, groups have consistent effect regardless of value of covariate.
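
The last point can be checked numerically. What follows is a minimal sketch in base R, assuming the data are typed in by hand from the Data slide rather than downloaded. The names `fit`, `grid`, and `gap` are illustrative and not in the original, and `expand.grid()` with `predict()` stands in here for `datagrid()` and `predictions()`.

```r
# Data re-entered from the "Data" slide: treatment, before, after
prepost <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

# Parallel-lines model (same structure as prepost.2)
fit <- lm(after ~ before + drug, data = prepost)

# All combinations of three before scores and the two drugs;
# before varies fastest, so rows are ordered a, a, a, b, b, b
grid <- expand.grid(before = c(9.75, 14, 21.25), drug = c("a", "b"))
grid$pred <- predict(fit, newdata = grid)

# Difference (drug B minus drug A) at each before score
gap <- grid$pred[grid$drug == "b"] - grid$pred[grid$drug == "a"]
print(gap)

# The same number, read directly off the fitted coefficients
print(coef(fit)[["drugb"]])
```

Every element of `gap` should match the `drugb` coefficient, which is what "consistent effect regardless of value of covariate" means. Refitting with `after ~ before * drug` would instead give a gap that changes with the before score.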