---
title: "Repeated measures analysis"
---
## Repeated measures by profile analysis
* More than one response *measurement* for each subject. Might be
* measurements of the same thing at different times
* measurements of different but related things
* Generalization of matched pairs ("matched triples", etc.).
* Variation: each subject does several different treatments at different times (called *crossover design*).
* Expect measurements on same subject to be correlated, so
assumptions of independence will fail.
* Called *repeated measures*. Different approaches, but *profile analysis* uses `Manova` (set up right way).
* Another approach uses *mixed models* (random effects).
## Packages
```{r bProfile-1, results="hide", message=FALSE}
library(car)
library(tidyverse)
library(lme4) # for mixed models later
```
## Example: histamine in dogs
* 8 dogs take part in experiment.
* Dogs randomized to one of 2 different drugs.
* Response: log of blood concentration of histamine 0, 1, 3 and 5 minutes after taking drug. (Repeated measures.)
* Data in `dogs.txt`, column-aligned.
## Read in data
```{r bProfile-2 }
my_url <- "http://ritsokiguess.site/datafiles/dogs.txt"
dogs <- read_table(my_url)
```
## Setting things up
```{r bProfile-3}
dogs
response <- with(dogs, cbind(lh0, lh1, lh3, lh5))
dogs.1 <- lm(response ~ drug, data = dogs)
response
```
## The repeated measures MANOVA
Get list of response variable names; we call them `times`. Save
in data frame.
```{r bProfile-4, echo=FALSE}
options(width = 70)
```
\small
```{r bProfile-5, error=TRUE}
times <- colnames(response)
times
times.df <- data.frame(times=factor(times))
times.df
```
\normalsize
## Fitting the model
```{r}
dogs.2 <- Manova(dogs.1,
idata = times.df,
idesign = ~times
)
```
## The output (some; there is a lot)
\tiny
```{r}
summary(dogs.2)
```
\normalsize
## What there is here
- three sets of tests, for
- times
- drug
- their interaction
- two *types* of test for each of these:
- multivariate
- univariate
- multivariate is the same as MANOVA
- univariate is more powerful *if* it applies
## Sphericity
- The thing that decides whether the univariate tests apply is called "sphericity".
- This holds if the outcomes have equal variance (to each other) and have the same (positive) correlation across subjects.
- Tested using Mauchly's test (part of output)
- If sphericity rejected, there are adjustments to the univariate P-values due to Huynh-Feldt and Greenhouse-Geisser. Huynh-Feldt better if responses not actually normal (safer).
## Univariate tests
\scriptsize
```{r}
summary(dogs.2)$sphericity.tests
summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```
\normalsize
## Comments
- The sphericity test for the interaction is almost significant
- The H-F adjusted P-value for the interaction is a bit bigger than the univariate one, but still strongly significant.
- Therefore any lack of sphericity does not affect our conclusion: there is an interaction between drug and time
- ie that the effect of time on log-histamine is different for the two drugs.
## Comments
- Here, univariate test with Huynh-Feldt correction to P-value for interaction was 0.00073.
- Significant interaction *is* the conclusion here.
- If the interaction had not been significant:
- cannot remove interaction with time
- so look at univariate (better, especially if adjusted for sphericity) tests of main effects in *this* model
## Next
- Interaction significant. Pattern of response over time different
for the two drugs.
* Want to investigate interaction.
## The wrong shape
* But data frame has several observations per line ("wide format"):
\scriptsize
```{r bProfile-6 }
dogs %>% slice(1:6)
```
\normalsize
* Plotting works with data in "long format":
one response per line.
* The responses are log-histamine at different times, labelled
`lh`-something. Call them all `lh` and put them in
one column, with the time they belong to labelled.
## Running `pivot_longer`, try 1
\footnotesize
```{r bProfile-7, size="footnotesize"}
dogs %>% pivot_longer(starts_with("lh"),
names_to = "time", values_to = "lh")
```
\normalsize
## Getting the times
Not quite right: for the times, we want just the numbers, not the
letters `lh` every time. Want new variable
containing just number in `time`:
`parse_number`.
\footnotesize
```{r bProfile-8}
dogs %>%
pivot_longer(starts_with("lh"),
names_to = "timex", values_to = "lh") %>%
mutate(time = parse_number(timex))
```
\normalsize
## What I did differently
* I realized that `pivot_longer` was going to produce something
like `lh1`, which I needed to do something further with, so
this time I gave it a temporary name `timex`.
* This enabled me to use the name `time` for the actual
numeric time.
* This works now, so next save into a new data frame `dogs.long`.
## Saving the pipelined results
```{r bProfile-9 }
dogs %>%
pivot_longer(starts_with("lh"),
names_to = "timex", values_to = "lh") %>%
mutate(time = parse_number(timex)) -> dogs.long
dogs.long
```
## Comments
This says:
* Take data frame dogs, and then:
* Combine the columns `lh0` through `lh5` into one
column called `lh`, with the column that each `lh`
value originally came from labelled by `timex`, and then:
* Pull out numeric values in `timex`, saving in `time` and then:
* save the result in a data frame `dogs.long`.
## Interaction plot
\small
```{r bProfile-10, fig.height=3.5}
ggplot(dogs.long, aes(x = time, y = lh,
colour = drug, group = drug)) +
stat_summary(fun = mean, geom = "point") +
stat_summary(fun = mean, geom = "line")
```
\normalsize
## Comments
* Plot mean `lh` value at each time, joining points on same
drug by lines.
* drugs same at time 0
* after that, Trimethaphan higher than Morphine.
* Effect of drug not consistent over time: significant interaction.
## Take out time zero
* Lines on interaction plot would then be parallel, and so interaction should
no longer be significant.
* Go back to original "wide" `dogs` data frame.
```{r bProfile-11, size="footnotesize", error=TRUE}
response <- with(dogs, cbind(lh1, lh3, lh5)) # excl time 0
dogs.1 <- lm(response ~ drug, data = dogs)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
dogs.2 <- Manova(dogs.1,
idata = times.df,
idesign = ~times
)
```
## Results (univariate)
\scriptsize
```{r}
summary(dogs.2)$sphericity.tests
summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```
\normalsize
## Comments
- sphericity: no problem (P-value 0.25)
- univariate test for interaction no longer significant (P-value 0.082)
- look at main effects:
- strong significance of time, even after taking out time 0
- actually *not* significant drug effect, despite interaction plot
## Is the non-significant drug effect reasonable?
* Plot *actual data*: `lh` against `days`,
labelling observations by drug: "spaghetti plot".
* Uses long data frame (confusing, yes I know):
* Plot (time,lh) points coloured by drug
* and connecting measurements for each *dog* by lines.
* This time, we want `group = dog` (want the measurements for each
*dog* joined by lines), but `colour = drug`:
```{r platanias}
ggplot(dogs.long, aes(x = time, y = lh,
colour = drug, group = dog)) +
geom_point() + geom_line() -> g
```
## The spaghetti plot
```{r hoverla,fig.height=4.5}
g
```
## Comments
* For each dog over time, there is a strong increase and gradual
decrease in log-histamine. The gradual decrease explains
the significant time effect after we took out time 0.
* The pattern is more or less the same for each dog, regardless
of drug. This explains the non-significant interaction.
* Most of the trimethaphan dogs (blue) have higher log-histamine
throughout (time 1 and after), and some of the morphine dogs have
lower.
* *But* two of the morphine dogs have log-histamine
profiles like the trimethaphan dogs. This ambiguity is probably
why the `drug` effect is not quite significant.
## Mixed models
- Another way to fit repeated measures
- Subjects (on whom repeated measures taken) are *random sample of all possible subjects* (random effects)
- Times and treatments are *the only ones we care about* (fixed effects)
- Use package `lme4` function `lmer` (like `lm` in some ways)
- Uses long-format "tidy" data
## Fitting the model (uses `lme4`)
```{r bProfile-13, message=FALSE}
# dogs.long including time zero
dogs.3 <- lmer(lh~drug*time+(1|dog), data=dogs.long)
```
- note specification of random effect: each dog has "random intercept" that moves log-histamine up or down for that dog over all times
## What can we drop?
- using `drop1`:
```{r bProfile-14}
drop1(dogs.3,test="Chisq")
```
- Interaction again not significant, but P-value smaller than before
## Re-fit without interaction
```{r bProfile-15}
dogs.4 <- update(dogs.3,.~.-drug:time)
drop1(dogs.4,test="Chisq")
```
- This time neither drug nor (surprisingly) time is significant.
- MANOVA and `lmer` methods won't agree, but both valid ways to approach problem.
## The exercise data
* 30 people took part in an exercise study.
* Each subject was
randomly assigned to one of two diets ("low fat" or "non-low
fat") and to one of three exercise programs ("at rest",
"walking", "running").
* There are $2\times3 = 6$ experimental treatments, and thus
each one is replicated $30/6=5$ times.
* Nothing unusual so far.
* However, each subject had their pulse rate measured at three
different times (1, 15 and 30 minutes after starting their
exercise), so have repeated measures.
## Reading the data
Separated by *tabs*:
```{r bProfile-16 }
url <- "http://ritsokiguess.site/datafiles/exercise2.txt"
exercise.long <- read_tsv(url)
exercise.long
```
* This is "long format", which is usually what we want.
* But for repeated measures analysis, we want *wide* format!
* `pivot_wider`.
## Making wide format
* `pivot_wider` needs: a column that is
going to be split, and the column to make the values out of:
\footnotesize
```{r bProfile-18}
exercise.long %>% pivot_wider(names_from=time,
values_from=pulse) -> exercise.wide
exercise.wide %>% sample_n(5)
```
\normalsize
* Normally `pivot_longer` \texttt{min01, min15,
min30} into one column called `pulse` labelled by the
number of minutes. But `Manova` needs it the other way.
## Setting up the repeated-measures analysis
* Make a response variable consisting of `min01, min15, min30`:
\small
```{r bProfile-19 }
response <- with(exercise.wide, cbind(min01, min15, min30))
```
\normalsize
* Predict that from `diet` and `exertype` and
interaction using `lm`:
\small
```{r bProfile-20 }
exercise.1 <- lm(response ~ diet * exertype,
data = exercise.wide
)
```
\normalsize
* Run this through `Manova`:
\small
```{r bProfile-21, error=TRUE}
times <- colnames(response)
times.df <- data.frame(times=factor(times))
exercise.2 <- Manova(exercise.1,
idata = times.df,
idesign = ~times)
```
\normalsize
## Sphericity tests
```{r}
summary(exercise.2)$sphericity.tests
```
No problem with sphericity; go to univariate tests.
## Univariate tests
```{r, include=FALSE}
w <- getOption("width")
options(width = 120)
```
\scriptsize
```{r}
summary(exercise.2)$univariate.tests
```
\normalsize
```{r, include=FALSE}
options(width = w)
```
- The three-way interaction is significant
- the effect of diet on pulse rate over time is different for the different exercise types
## Making some graphs
* Three-way interactions are difficult to understand. To make
an attempt, look at some graphs.
* Plot time trace of pulse rates for each individual, joined by
lines, and make *separate* plots for each
`diet-exertype` combo.
* `ggplot` again. Using *long* data frame:
```{r bProfile-24 }
g <- ggplot(exercise.long, aes(
x = time, y = pulse,
group = id
)) + geom_point() + geom_line() +
facet_grid(diet ~ exertype)
```
* `facet_grid(diet~exertype)`: do a separate plot for each
combination of diet and exercise type, with diets going down the
page and exercise types going across. (Graphs are usually landscape,
so have the factor `exertype` with more levels going across.)
## The graph(s)
```{r bProfile-25, fig.height=5}
g
```
## Comments on graphs
* For subjects who were at rest, no change in pulse rate over
time, for both diet groups.
* For walking subjects, not much change in pulse rates over
time. Maybe a small increase on average between 1 and 15 minutes.
* For both running groups, an overall increase in pulse rate
over time, but the increase is stronger for the `lowfat`
group.
* No consistent effect of diet over all exercise groups.
* No consistent effect of exercise type over both diet groups.
* No consistent effect of time over all diet-exercise type combos.
## "Simple effects" of diet for the subjects who ran
* Looks as if there is only any substantial time effect for the
runners. For them, does diet have an effect?
* Pull out only the runners from the wide data:
```{r bProfile-26 }
exercise.wide %>%
filter(exertype == "running") -> runners.wide
```
* Create response variable and do MANOVA. Some of this looks like
before, but I have different data now:
\footnotesize
```{r bProfile-27}
response <- with(runners.wide, cbind(min01, min15, min30))
runners.1 <- lm(response ~ diet, data = runners.wide)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
runners.2 <- Manova(runners.1,
idata = times.df,
idesign = ~times
)
```
\normalsize
## Sphericity tests
```{r}
summary(runners.2)$sphericity.tests
```
- No problem, look at univariate tests.
## Univariate tests
\footnotesize
```{r}
summary(runners.2)$univariate.tests
```
\normalsize
- Interaction still significant
- dependence of pulse rate on time still different for the two diets
## How is the effect of diet different over time?
* Table of means. Only I need long data for this:
```{r bProfile-29 }
runners.wide %>%
pivot_longer(starts_with("min"),
names_to = "time", values_to = "pulse") %>%
group_by(time, diet) %>%
summarize(
mean = mean(pulse),
sd = sd(pulse)
) -> summ
```
* Result of `summarize` is data frame, so can save it (and
do more with it if needed).
## Understanding diet-time interaction
* The summary:
\footnotesize
```{r bProfile-30}
summ
```
\normalsize
* Pulse rates at any given time higher for `lowfat` (diet
effect),
* Pulse rates increase over time of exercise (time effect),
* but the *amount by which pulse rate higher* for a diet depends on
time: `diet` by `time` interaction.
## Interaction plot
* We went to trouble of finding means by group, so making
interaction plot is now mainly easy:
```{r bProfile-31, fig.height=2.7}
ggplot(summ, aes(x = time, y = mean, colour = diet,
group = diet)) + geom_point() + geom_line()
```
## Comment on interaction plot
* The lines are not parallel, so there is interaction between diet
and time for the runners.
- The effect of time on pulse rate is different for the two diets, even though all the subjects here were running.