---
title: "Analysis of Covariance"
editor: 
  markdown: 
    wrap: 72
---

## Analysis of covariance

-   ANOVA: explanatory variables categorical (divide data into groups).
-   Traditionally, analysis of covariance has categorical $x$'s plus
    one numerical $x$ ("covariate") to be adjusted for.
-   `lm` handles this too.
-   Simple example: two treatments (drugs) (`a` and `b`), with before
    and after scores.
    -   Does knowing before score and/or treatment help to predict
        after score?
    -   Is after score different by treatment/before score?

## Data

Treatment, before, after:

```         
a 5 20
a 10 23
a 12 30
a 9 25
a 23 34
a 21 40
a 14 27
a 18 38
a 6 24
a 13 31
b 7 19
b 12 26
b 27 33
b 24 35
b 18 30
b 22 31
b 26 34
b 21 28
b 14 23
b 9 22
```

\normalsize

## Packages

```{r bAncova-1}
library(tidyverse)
library(broom)
library(marginaleffects)
```

The last of these is for predictions.

## Read in data

```{r bAncova-2, message=FALSE}
url <- "http://ritsokiguess.site/datafiles/ancova.txt"
# values are separated by single spaces
prepost <- read_delim(url, " ")
prepost
```

## Making a plot

```{r ancova-plot, fig.height=4.5}
# one regression line (with confidence band) per drug
ggplot(prepost, aes(x = before, y = after, colour = drug)) +
  geom_point() + geom_smooth(method = "lm")
```

## Comments

-   As before score goes up, after score goes up.
-   Red points (drug A) generally above blue points (drug B), for
    comparable before score.
-   Suggests before score effect *and* drug effect.

## The means

```{r bAncova-3 }
prepost %>%
  group_by(drug) %>%
  summarize(
    before_mean = mean(before),
    after_mean = mean(after)
  )
```

-   Mean "after" score slightly higher for treatment A (29.2 vs. 28.1).
-   Mean "before" score much higher for treatment B (18.0 vs. 13.1).
-   Greater *improvement* on treatment A.

## Testing for interaction

```{r bAncova-4 }
# before * drug: both main effects plus their interaction
prepost.1 <- lm(after ~ before * drug, data = prepost)
anova(prepost.1)
summary(prepost.1)
```

-   Interaction not significant. Will remove later.
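
## The interaction model as an equation

A sketch of the model that `prepost.1` fits, assuming R's default
treatment contrasts with drug `a` as the baseline. The symbols
$\beta_0, \ldots, \beta_3$ and the indicator $d_i$ are notation
introduced here for illustration; they do not appear in the output.

$$
\text{after}_i = \beta_0 + \beta_1\,\text{before}_i + \beta_2\,d_i
  + \beta_3\,\text{before}_i\,d_i + \varepsilon_i,
\qquad
d_i = \begin{cases} 1 & \text{drug b} \\ 0 & \text{drug a} \end{cases}
$$

-   Drug A line: intercept $\beta_0$, slope $\beta_1$.
-   Drug B line: intercept $\beta_0 + \beta_2$, slope
    $\beta_1 + \beta_3$.
-   Testing the interaction is testing $\beta_3 = 0$, that is, whether
    the two lines have the same slope.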

## Predictions

Set up values to predict for (the quartiles of `before`, for each
drug):

```{r}
summary(prepost)
```

```{r}
new <- datagrid(before = c(9.75, 14, 21.25),
                drug = c("a", "b"),
                model = prepost.1)
new
```

## and then

```{r}
cbind(predictions(prepost.1, newdata = new)) %>%
  select(drug, before, estimate, conf.low, conf.high)
```

\normalsize

## Predictions (with interaction included), plotted

```{r, fig.height=4}
plot_predictions(model = prepost.1, condition = c("before", "drug"))
```

Lines almost parallel, but not quite.

## Taking out interaction

\small

```{r bAncova-8 }
# drop the interaction term, keeping both main effects
prepost.2 <- update(prepost.1, . ~ . - before:drug)
summary(prepost.2)
anova(prepost.2)
```

\normalsize

-   Take out non-significant interaction.
-   `before` and `drug` strongly significant.
-   Do predictions again and plot them.

## Predictions (without interaction)

```{r}
cbind(predictions(prepost.2, newdata = new)) %>%
  select(drug, before, estimate)
```

## Plot of predicted values

```{r, fig.height=4}
plot_predictions(prepost.2, condition = c("before", "drug"))
```

This time the lines are *exactly* parallel. The no-interaction model
forces them to have the same slope.

## Different look at model output

-   `anova(prepost.2)` tests for significant effect of before score and
    of drug, but doesn't help with interpretation.
-   `summary(prepost.2)` views as regression with slopes:

\scriptsize

```{r bAncova-11 }
summary(prepost.2)
```

\normalsize

## Understanding those slopes

\footnotesize

```{r bAncova-12}
tidy(prepost.2)
```

\normalsize

-   `before` ordinary numerical variable; `drug` categorical.
-   `lm` uses the first category of `drug` (`a`) as baseline.
-   Intercept is prediction of after score for before score 0 and
    *drug A*.
-   `before` slope is predicted change in after score when before
    score increases by 1 (usual slope).
-   Slope for `drugb` is *change* in predicted after score for being
    on drug B rather than drug A. Same for *any* before score (no
    interaction).

## Summary

-   ANCOVA model: fits different regression line for each group,
    predicting response from covariate.
-   ANCOVA model with interaction between factor and covariate allows
    different slopes for each line.
    -   Sometimes those lines can cross over!
-   If interaction not significant, take out. Lines then parallel.
-   With parallel lines, groups have consistent effect regardless of
    value of covariate.
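
## Checking the constant drug effect

The last point can be checked directly. This is a minimal base-R sketch,
not part of the original analysis: it types the data in from the Data
slide, refits the no-interaction model, and compares the two drugs at
three before scores. The names `prepost_demo`, `fit_demo`, `grid_demo`
and `diff_demo` are made up for this illustration, and the before scores
5, 15 and 25 are arbitrary choices.

```{r}
# Data typed in from the "Data" slide (treatment, before, after)
prepost_demo <- data.frame(
  drug = rep(c("a", "b"), each = 10),
  before = c(5, 10, 12, 9, 23, 21, 14, 18, 6, 13,
             7, 12, 27, 24, 18, 22, 26, 21, 14, 9),
  after = c(20, 23, 30, 25, 34, 40, 27, 38, 24, 31,
            19, 26, 33, 35, 30, 31, 34, 28, 23, 22)
)

# No-interaction (parallel lines) model, same formula as prepost.2
fit_demo <- lm(after ~ before + drug, data = prepost_demo)

# Both drugs at three arbitrary before scores;
# expand.grid varies `before` fastest, so rows 1-3 are drug a, 4-6 drug b
grid_demo <- expand.grid(before = c(5, 15, 25), drug = c("a", "b"),
                         stringsAsFactors = FALSE)
grid_demo$pred <- predict(fit_demo, newdata = grid_demo)

# Predicted after score on drug b minus drug a, at each before score
diff_demo <- with(grid_demo, pred[drug == "b"] - pred[drug == "a"])
diff_demo

# The same number as the drugb coefficient
coef(fit_demo)["drugb"]
```

The three differences should all be equal to each other and to the
`drugb` coefficient. Refitting with `after ~ before * drug` and
repeating the calculation would give a difference that changes with the
before score.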