--- title: "Repeated measures analysis" editor: markdown: wrap: 72 --- ## Repeated measures by profile analysis - More than one response *measurement* for each subject. Might be - measurements of the same thing at different times - measurements of different but related things - Generalization of matched pairs ("matched triples", etc.). - Variation: each subject does several different treatments at different times (called *crossover design*). - Expect measurements on same subject to be correlated, so assumptions of independence will fail. - Called *repeated measures*. Different approaches, but *profile analysis* uses `Manova` (set up right way). - Another approach uses *mixed models* (random effects). ## Packages ```{r bProfile-1, results="hide", message=FALSE} library(car) library(tidyverse) library(lme4) # for mixed models later ``` ## Example: histamine in dogs - 8 dogs take part in experiment. - Dogs randomized to one of 2 different drugs. - Response: log of blood concentration of histamine 0, 1, 3 and 5 minutes after taking drug. (Repeated measures.) - Data in `dogs.txt`, column-aligned. ## Read in data ```{r bProfile-2 } my_url <- "http://ritsokiguess.site/datafiles/dogs.txt" dogs <- read_table(my_url) ``` ## Setting things up ```{r bProfile-3} dogs response <- with(dogs, cbind(lh0, lh1, lh3, lh5)) response ``` ## The repeated measures MANOVA Get list of response variable names; we call them `times`. Save in data frame. ```{r bProfile-4, echo=FALSE} options(width = 70) ``` \small ```{r bProfile-5, error=TRUE} times <- colnames(response) times times.df <- data.frame(times=factor(times)) times.df ``` \normalsize ## Fitting the model ```{r} dogs.1 <- lm(response ~ drug, data = dogs) dogs.2 <- Manova(dogs.1, idata = times.df, idesign = ~times ) ``` ## The output (some; there is a lot) \tiny ```{r} summary(dogs.2) ``` \normalsize ## What there is here - three sets of tests, for - times - drug - their interaction - two *types* of test for each of these: - multivariate - univariate - multivariate is the same as MANOVA - univariate is more powerful *if* it applies ## Sphericity - The thing that decides whether the univariate tests apply is called "sphericity". - This holds if the outcomes have equal variance (to each other) and have the same (positive) correlation across subjects. - Tested using Mauchly's test (part of output) - If sphericity rejected, there are adjustments to the univariate P-values due to Huynh-Feldt and Greenhouse-Geisser. Huynh-Feldt better if responses not actually normal (safer). ## Univariate tests \scriptsize ```{r} summary(dogs.2)$sphericity.tests summary(dogs.2)$pval.adjustments summary(dogs.2)$univariate.tests ``` \normalsize ## Comments - The sphericity test for the interaction is almost significant - The H-F adjusted P-value for the interaction is a bit bigger than the univariate one, but still strongly significant. - Therefore any lack of sphericity does not affect our conclusion: there is an interaction between drug and time - ie that the effect of time on log-histamine is different for the two drugs. ## Comments - Here, univariate test with Huynh-Feldt correction to P-value for interaction was 0.00073. - Significant interaction *is* the conclusion here. - If the interaction had not been significant: - cannot remove interaction with time - so look at univariate (better, especially if adjusted for sphericity) tests of main effects in *this* model ## Next - Interaction significant. Pattern of response over time different for the two drugs. - Want to investigate interaction. ## The wrong shape - But data frame has several observations per line ("wide format"): \scriptsize ```{r bProfile-6 } dogs %>% slice(1:6) ``` \normalsize - Plotting works with data in "long format": one response per line. - The responses are log-histamine at different times, labelled `lh`-something. Call them all `lh` and put them in one column, with the time they belong to labelled. ## Running `pivot_longer`, try 1 \footnotesize ```{r bProfile-7, size="footnotesize"} dogs %>% pivot_longer(starts_with("lh"), names_to = "time", values_to = "lh") ``` \normalsize ## Getting the times Not quite right: for the times, we want just the numbers, not the letters `lh` every time. Want new variable containing just number in `time`: `parse_number`. \footnotesize ```{r bProfile-8} dogs %>% pivot_longer(starts_with("lh"), names_to = "timex", values_to = "lh") %>% mutate(time = parse_number(timex)) ``` \normalsize ## What I did differently - I realized that `pivot_longer` was going to produce something like `lh1`, which I needed to do something further with, so this time I gave it a temporary name `timex`. - This enabled me to use the name `time` for the actual numeric time. - This works now, so next save into a new data frame `dogs.long`. ## Saving the pipelined results ```{r bProfile-9 } dogs %>% pivot_longer(starts_with("lh"), names_to = "timex", values_to = "lh") %>% mutate(time = parse_number(timex)) -> dogs.long dogs.long ``` ## Comments This says: - Take data frame dogs, and then: - Combine the columns `lh0` through `lh5` into one column called `lh`, with the column that each `lh` value originally came from labelled by `timex`, and then: - Pull out numeric values in `timex`, saving in `time` and then: - save the result in a data frame `dogs.long`. ## Interaction plot \small ```{r bProfile-10, fig.height=3.5} ggplot(dogs.long, aes(x = time, y = lh, colour = drug, group = drug)) + stat_summary(fun = mean, geom = "point") + stat_summary(fun = mean, geom = "line") ``` \normalsize ## Comments - Plot mean `lh` value at each time, joining points on same drug by lines. - drugs same at time 0 - after that, Trimethaphan higher than Morphine. - Effect of drug not consistent over time: significant interaction. ## Take out time zero - Lines on interaction plot would then be parallel, and so interaction should no longer be significant. - Go back to original "wide" `dogs` data frame. ```{r bProfile-11, size="footnotesize", error=TRUE} response <- with(dogs, cbind(lh1, lh3, lh5)) # excl time 0 dogs.1 <- lm(response ~ drug, data = dogs) times <- colnames(response) times.df <- data.frame(times=factor(times)) dogs.2 <- Manova(dogs.1, idata = times.df, idesign = ~times ) ``` ## Results (univariate) \scriptsize ```{r} summary(dogs.2)$sphericity.tests summary(dogs.2)$pval.adjustments summary(dogs.2)$univariate.tests ``` \normalsize ## Comments - sphericity: no problem (P-value 0.25) - univariate test for interaction no longer significant (P-value 0.082) - look at main effects: - strong significance of time, even after taking out time 0 - actually *not* significant drug effect, despite interaction plot ## Is the non-significant drug effect reasonable? - Plot *actual data*: `lh` against `days`, labelling observations by drug: "spaghetti plot". - Uses long data frame (confusing, yes I know): - Plot (time,lh) points coloured by drug - and connecting measurements for each *dog* by lines. - This time, we want `group = dog` (want the measurements for each *dog* joined by lines), but `colour = drug`: ```{r platanias} ggplot(dogs.long, aes(x = time, y = lh, colour = drug, group = dog)) + geom_point() + geom_line() -> g ``` ## The spaghetti plot ```{r hoverla,fig.height=4.5} g ``` ## Comments - For each dog over time, there is a strong increase and gradual decrease in log-histamine. The gradual decrease explains the significant time effect after we took out time 0. - The pattern is more or less the same for each dog, regardless of drug. This explains the non-significant interaction. - Most of the trimethaphan dogs (blue) have higher log-histamine throughout (time 1 and after), and some of the morphine dogs have lower. - *But* two of the morphine dogs have log-histamine profiles like the trimethaphan dogs. This ambiguity is probably why the `drug` effect is not quite significant. ## Mixed models - Another way to fit repeated measures - Subjects (on whom repeated measures taken) are *random sample of all possible subjects* (random effects) - Times and treatments are *the only ones we care about* (fixed effects) - Use package `lme4` function `lmer` (like `lm` in some ways) - Uses long-format "tidy" data ## Fitting the model (uses `lme4`) ```{r bProfile-13, message=FALSE} # dogs.long including time zero dogs.3 <- lmer(lh~drug*time+(1|dog), data=dogs.long) ``` - note specification of random effect: each dog has "random intercept" that moves log-histamine up or down for that dog over all times ## What can we drop? - using `drop1`: ```{r bProfile-14} drop1(dogs.3,test="Chisq") ``` - Interaction again not significant, but P-value smaller than before ## Re-fit without interaction ```{r bProfile-15} dogs.4 <- update(dogs.3,.~.-drug:time) drop1(dogs.4,test="Chisq") ``` - This time neither drug nor (surprisingly) time is significant. - MANOVA and `lmer` methods won't agree, but both valid ways to approach problem. ## The exercise data - 30 people took part in an exercise study. - Each subject was randomly assigned to one of two diets ("low fat" or "non-low fat") and to one of three exercise programs ("at rest", "walking", "running"). - There are $2\times3 = 6$ experimental treatments, and thus each one is replicated $30/6=5$ times. - Nothing unusual so far. - However, each subject had their pulse rate measured at three different times (1, 15 and 30 minutes after starting their exercise), so have repeated measures. ## Reading the data Separated by *tabs*: ```{r bProfile-16 } url <- "http://ritsokiguess.site/datafiles/exercise2.txt" exercise.long <- read_tsv(url) exercise.long ``` - This is "long format", which is usually what we want. - But for repeated measures analysis, we want *wide* format! - `pivot_wider`. ## Making wide format - `pivot_wider` needs: a column that is going to be split, and the column to make the values out of: \footnotesize ```{r bProfile-18} exercise.long %>% pivot_wider(names_from=time, values_from=pulse) -> exercise.wide exercise.wide %>% sample_n(5) ``` \normalsize - Normally `pivot_longer` \texttt{min01, min15, min30} into one column called `pulse` labelled by the number of minutes. But `Manova` needs it the other way. ## Setting up the repeated-measures analysis - Make a response variable consisting of `min01, min15, min30`: \small ```{r bProfile-19 } response <- with(exercise.wide, cbind(min01, min15, min30)) ``` \normalsize - Predict that from `diet` and `exertype` and interaction using `lm`: \small ```{r bProfile-20 } exercise.1 <- lm(response ~ diet * exertype, data = exercise.wide ) ``` \normalsize - Run this through `Manova`: \small ```{r bProfile-21, error=TRUE} times <- colnames(response) times.df <- data.frame(times=factor(times)) exercise.2 <- Manova(exercise.1, idata = times.df, idesign = ~times) ``` \normalsize ## Sphericity tests ```{r} summary(exercise.2)$sphericity.tests ``` No problem with sphericity; go to univariate tests. ## Univariate tests ```{r, include=FALSE} w <- getOption("width") options(width = 120) ``` \scriptsize ```{r} summary(exercise.2)$univariate.tests ``` \normalsize ```{r, include=FALSE} options(width = w) ``` - The three-way interaction is significant - the effect of diet on pulse rate over time is different for the different exercise types ## Making some graphs - Three-way interactions are difficult to understand. To make an attempt, look at some graphs. - Plot time trace of pulse rates for each individual, joined by lines, and make *separate* plots for each `diet-exertype` combo. - `ggplot` again. Using *long* data frame: ```{r bProfile-24 } g <- ggplot(exercise.long, aes( x = time, y = pulse, group = id )) + geom_point() + geom_line() + facet_grid(diet ~ exertype) ``` - `facet_grid(diet~exertype)`: do a separate plot for each combination of diet and exercise type, with diets going down the page and exercise types going across. (Graphs are usually landscape, so have the factor `exertype` with more levels going across.) ## The graph(s) ```{r bProfile-25, fig.height=5} g ``` ## Comments on graphs - For subjects who were at rest, no change in pulse rate over time, for both diet groups. - For walking subjects, not much change in pulse rates over time. Maybe a small increase on average between 1 and 15 minutes. - For both running groups, an overall increase in pulse rate over time, but the increase is stronger for the `lowfat` group. - No consistent effect of diet over all exercise groups. - No consistent effect of exercise type over both diet groups. - No consistent effect of time over all diet-exercise type combos. ## "Simple effects" of diet for the subjects who ran - Looks as if there is only any substantial time effect for the runners. For them, does diet have an effect? - Pull out only the runners from the wide data: ```{r bProfile-26 } exercise.wide %>% filter(exertype == "running") -> runners.wide ``` - Create response variable and do MANOVA. Some of this looks like before, but I have different data now: \footnotesize ```{r bProfile-27} response <- with(runners.wide, cbind(min01, min15, min30)) runners.1 <- lm(response ~ diet, data = runners.wide) times <- colnames(response) times.df <- data.frame(times=factor(times)) runners.2 <- Manova(runners.1, idata = times.df, idesign = ~times ) ``` \normalsize ## Sphericity tests ```{r} summary(runners.2)$sphericity.tests ``` - No problem, look at univariate tests. ## Univariate tests \footnotesize ```{r} summary(runners.2)$univariate.tests ``` \normalsize - Interaction still significant - dependence of pulse rate on time still different for the two diets ## How is the effect of diet different over time? - Table of means. Only I need long data for this: ```{r bProfile-29 } runners.wide %>% pivot_longer(starts_with("min"), names_to = "time", values_to = "pulse") %>% group_by(time, diet) %>% summarize( mean = mean(pulse), sd = sd(pulse) ) -> summ ``` - Result of `summarize` is data frame, so can save it (and do more with it if needed). ## Understanding diet-time interaction - The summary: \footnotesize ```{r bProfile-30} summ ``` \normalsize - Pulse rates at any given time higher for `lowfat` (diet effect), - Pulse rates increase over time of exercise (time effect), - but the *amount by which pulse rate higher* for a diet depends on time: `diet` by `time` interaction. ## Interaction plot - We went to trouble of finding means by group, so making interaction plot is now mainly easy: ```{r bProfile-31, fig.height=2.7} ggplot(summ, aes(x = time, y = mean, colour = diet, group = diet)) + geom_point() + geom_line() ``` ## Comment on interaction plot - The lines are not parallel, so there is interaction between diet and time for the runners. - The effect of time on pulse rate is different for the two diets, even though all the subjects here were running.