---
title: "Repeated measures analysis"
editor: 
  markdown: 
    wrap: 72
---

## Repeated measures by profile analysis

-   More than one response *measurement* for each subject. Might be

    -   measurements of the same thing at different times

    -   measurements of different but related things

-   Generalization of matched pairs ("matched triples", etc.).

-   Variation: each subject does several different treatments at
    different times (called *crossover design*).

-   Expect measurements on same subject to be correlated, so assumptions
    of independence will fail.

-   Called *repeated measures*. Different approaches, but *profile
    analysis* uses `Manova` (set up right way).

-   Another approach uses *mixed models* (random effects).

## Packages

```{r bProfile-1, results="hide", message=FALSE}
library(car)
library(tidyverse)
library(lme4) # for mixed models later
```

## Example: histamine in dogs

-   8 dogs take part in experiment.

-   Dogs randomized to one of 2 different drugs.

-   Response: log of blood concentration of histamine 0, 1, 3 and 5
    minutes after taking drug. (Repeated measures.)

-   Data in `dogs.txt`, column-aligned.

## Read in data

```{r bProfile-2 }
my_url <- "http://ritsokiguess.site/datafiles/dogs.txt"
dogs <- read_table(my_url)
```

## Setting things up

```{r bProfile-3}
dogs
response <- with(dogs, cbind(lh0, lh1, lh3, lh5))
response
```

## The repeated measures MANOVA

Get list of response variable names; we call them `times`. Save in data
frame.

```{r bProfile-4, echo=FALSE}
options(width = 70)
```

\small

```{r bProfile-5, error=TRUE}
times <- colnames(response)
times
times.df <- data.frame(times=factor(times))
times.df
```

\normalsize

## Fitting the model

```{r}
dogs.1 <- lm(response ~ drug, data = dogs)
dogs.2 <- Manova(dogs.1,
  idata = times.df,
  idesign = ~times
)
```

## The output (some; there is a lot)

\tiny

```{r}
summary(dogs.2)
```

\normalsize

## What there is here

-   three sets of tests, for

    -   times
    -   drug
    -   their interaction

-   two *types* of test for each of these:

    -   multivariate
    -   univariate

-   multivariate is the same as MANOVA

-   univariate is more powerful *if* it applies

## Sphericity

-   The thing that decides whether the univariate tests apply is called
    "sphericity".
-   This holds if the outcomes have equal variance (to each other) and
    have the same (positive) correlation across subjects.
-   Tested using Mauchly's test (part of output)
-   If sphericity rejected, there are adjustments to the univariate
    P-values due to Huynh-Feldt and Greenhouse-Geisser. Huynh-Feldt
    better if responses not actually normal (safer).

## Univariate tests

\scriptsize

```{r}
summary(dogs.2)$sphericity.tests
summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```

\normalsize

## Comments

-   The sphericity test for the interaction is almost significant
-   The H-F adjusted P-value for the interaction is a bit bigger than
    the univariate one, but still strongly significant.
-   Therefore any lack of sphericity does not affect our conclusion:
    there is an interaction between drug and time
-   ie that the effect of time on log-histamine is different for the two
    drugs.

## Comments

-   Here, univariate test with Huynh-Feldt correction to P-value for
    interaction was 0.00073.
-   Significant interaction *is* the conclusion here.
-   If the interaction had not been significant:
    -   cannot remove interaction with time
    -   so look at univariate (better, especially if adjusted for
        sphericity) tests of main effects in *this* model

## Next

-   Interaction significant. Pattern of response over time different for
    the two drugs.

-   Want to investigate interaction.

## The wrong shape

-   But data frame has several observations per line ("wide format"):

\scriptsize

```{r bProfile-6 }
dogs %>% slice(1:6)
```

\normalsize

-   Plotting works with data in "long format": one response per line.

-   The responses are log-histamine at different times, labelled
    `lh`-something. Call them all `lh` and put them in one column, with
    the time they belong to labelled.

## Running `pivot_longer`, try 1

\footnotesize

```{r bProfile-7, size="footnotesize"}
dogs %>% pivot_longer(starts_with("lh"), 
                      names_to = "time", values_to = "lh") 
```

\normalsize

## Getting the times

Not quite right: for the times, we want just the numbers, not the
letters `lh` every time. Want new variable containing just number in
`time`: `parse_number`.

\footnotesize

```{r bProfile-8}
dogs %>%
  pivot_longer(starts_with("lh"), 
               names_to = "timex", values_to = "lh") %>% 
  mutate(time = parse_number(timex)) 
```

\normalsize

## What I did differently

-   I realized that `pivot_longer` was going to produce something like
    `lh1`, which I needed to do something further with, so this time I
    gave it a temporary name `timex`.

-   This enabled me to use the name `time` for the actual numeric time.

-   This works now, so next save into a new data frame `dogs.long`.

## Saving the pipelined results

```{r bProfile-9 }
dogs %>%
  pivot_longer(starts_with("lh"), 
               names_to = "timex", values_to = "lh") %>% 
  mutate(time = parse_number(timex)) -> dogs.long
dogs.long
```

## Comments

This says:

-   Take data frame dogs, and then:

-   Combine the columns `lh0` through `lh5` into one column called `lh`,
    with the column that each `lh` value originally came from labelled
    by `timex`, and then:

-   Pull out numeric values in `timex`, saving in `time` and then:

-   save the result in a data frame `dogs.long`.

## Interaction plot

\small

```{r bProfile-10, fig.height=3.5}
ggplot(dogs.long, aes(x = time, y = lh, 
                      colour = drug, group = drug)) +
  stat_summary(fun = mean, geom = "point") +
  stat_summary(fun = mean, geom = "line")
```

\normalsize

## Comments

-   Plot mean `lh` value at each time, joining points on same drug by
    lines.

-   drugs same at time 0

-   after that, Trimethaphan higher than Morphine.

-   Effect of drug not consistent over time: significant interaction.

## Take out time zero

-   Lines on interaction plot would then be parallel, and so interaction
    should no longer be significant.

-   Go back to original "wide" `dogs` data frame.

```{r bProfile-11, size="footnotesize", error=TRUE}
response <- with(dogs, cbind(lh1, lh3, lh5)) # excl time 0
dogs.1 <- lm(response ~ drug, data = dogs)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
dogs.2 <- Manova(dogs.1,
  idata = times.df,
  idesign = ~times
)
```

## Results (univariate)

\scriptsize

```{r}
summary(dogs.2)$sphericity.tests
summary(dogs.2)$pval.adjustments
summary(dogs.2)$univariate.tests
```

\normalsize

## Comments

-   sphericity: no problem (P-value 0.25)
-   univariate test for interaction no longer significant (P-value
    0.082)
-   look at main effects:
    -   strong significance of time, even after taking out time 0
    -   actually *not* significant drug effect, despite interaction plot

## Is the non-significant drug effect reasonable?

-   Plot *actual data*: `lh` against `days`, labelling observations by
    drug: "spaghetti plot".

-   Uses long data frame (confusing, yes I know):

-   Plot (time,lh) points coloured by drug

-   and connecting measurements for each *dog* by lines.

-   This time, we want `group = dog` (want the measurements for each
    *dog* joined by lines), but `colour = drug`:

```{r platanias}
ggplot(dogs.long, aes(x = time, y = lh,
  colour = drug, group = dog)) +
  geom_point() + geom_line() -> g
```

## The spaghetti plot

```{r hoverla,fig.height=4.5}
g
```

## Comments

-   For each dog over time, there is a strong increase and gradual
    decrease in log-histamine. The gradual decrease explains the
    significant time effect after we took out time 0.

-   The pattern is more or less the same for each dog, regardless of
    drug. This explains the non-significant interaction.

-   Most of the trimethaphan dogs (blue) have higher log-histamine
    throughout (time 1 and after), and some of the morphine dogs have
    lower.

-   *But* two of the morphine dogs have log-histamine profiles like the
    trimethaphan dogs. This ambiguity is probably why the `drug` effect
    is not quite significant.

## Mixed models

-   Another way to fit repeated measures
-   Subjects (on whom repeated measures taken) are *random sample of all
    possible subjects* (random effects)
-   Times and treatments are *the only ones we care about* (fixed
    effects)
-   Use package `lme4` function `lmer` (like `lm` in some ways)
-   Uses long-format "tidy" data

## Fitting the model (uses `lme4`)

```{r bProfile-13, message=FALSE}
# dogs.long including time zero
dogs.3 <- lmer(lh~drug*time+(1|dog), data=dogs.long)
```

-   note specification of random effect: each dog has "random intercept"
    that moves log-histamine up or down for that dog over all times

## What can we drop?

-   using `drop1`:

```{r bProfile-14}
drop1(dogs.3,test="Chisq")
```

-   Interaction again not significant, but P-value smaller than before

## Re-fit without interaction

```{r bProfile-15}
dogs.4 <- update(dogs.3,.~.-drug:time)
drop1(dogs.4,test="Chisq")
```

-   This time neither drug nor (surprisingly) time is significant.
-   MANOVA and `lmer` methods won't agree, but both valid ways to
    approach problem.

## The exercise data

-   30 people took part in an exercise study.

-   Each subject was randomly assigned to one of two diets ("low fat" or
    "non-low fat") and to one of three exercise programs ("at rest",
    "walking", "running").

-   There are $2\times3 = 6$ experimental treatments, and thus each one
    is replicated $30/6=5$ times.

-   Nothing unusual so far.

-   However, each subject had their pulse rate measured at three
    different times (1, 15 and 30 minutes after starting their
    exercise), so have repeated measures.

## Reading the data

Separated by *tabs*:

```{r bProfile-16 }
url <- "http://ritsokiguess.site/datafiles/exercise2.txt"
exercise.long <- read_tsv(url)
exercise.long
```

-   This is "long format", which is usually what we want.

-   But for repeated measures analysis, we want *wide* format!

-   `pivot_wider`.

## Making wide format

-   `pivot_wider` needs: a column that is going to be split, and the
    column to make the values out of:

\footnotesize

```{r bProfile-18}
exercise.long %>% pivot_wider(names_from=time, 
                              values_from=pulse) -> exercise.wide
exercise.wide %>% sample_n(5)
```

\normalsize

-   Normally `pivot_longer` \texttt{min01, min15,
    min30} into one column called `pulse` labelled by the number of
    minutes. But `Manova` needs it the other way.

## Setting up the repeated-measures analysis

-   Make a response variable consisting of `min01, min15, min30`:

\small

```{r bProfile-19 }
response <- with(exercise.wide, cbind(min01, min15, min30))
```

\normalsize

-   Predict that from `diet` and `exertype` and interaction using `lm`:

\small

```{r bProfile-20 }
exercise.1 <- lm(response ~ diet * exertype,
  data = exercise.wide
)
```

\normalsize

-   Run this through `Manova`:

\small

```{r bProfile-21, error=TRUE}
times <- colnames(response)
times.df <- data.frame(times=factor(times))
exercise.2 <- Manova(exercise.1, 
                     idata = times.df, 
                     idesign = ~times)
```

\normalsize

## Sphericity tests

```{r}
summary(exercise.2)$sphericity.tests
```

No problem with sphericity; go to univariate tests.

## Univariate tests

```{r, include=FALSE}
w <- getOption("width")
options(width = 120)
```

\scriptsize

```{r}
summary(exercise.2)$univariate.tests
```

\normalsize

```{r, include=FALSE}
options(width = w)
```

-   The three-way interaction is significant
    -   the effect of diet on pulse rate over time is different for the
        different exercise types

## Making some graphs

-   Three-way interactions are difficult to understand. To make an
    attempt, look at some graphs.

-   Plot time trace of pulse rates for each individual, joined by lines,
    and make *separate* plots for each `diet-exertype` combo.

-   `ggplot` again. Using *long* data frame:

```{r bProfile-24 }
g <- ggplot(exercise.long, aes(
  x = time, y = pulse,
  group = id
)) + geom_point() + geom_line() +
  facet_grid(diet ~ exertype)
```

-   `facet_grid(diet~exertype)`: do a separate plot for each combination
    of diet and exercise type, with diets going down the page and
    exercise types going across. (Graphs are usually landscape, so have
    the factor `exertype` with more levels going across.)

## The graph(s)

```{r bProfile-25, fig.height=5}
g
```

## Comments on graphs

-   For subjects who were at rest, no change in pulse rate over time,
    for both diet groups.

-   For walking subjects, not much change in pulse rates over time.
    Maybe a small increase on average between 1 and 15 minutes.

-   For both running groups, an overall increase in pulse rate over
    time, but the increase is stronger for the `lowfat` group.

-   No consistent effect of diet over all exercise groups.

-   No consistent effect of exercise type over both diet groups.

-   No consistent effect of time over all diet-exercise type combos.

## "Simple effects" of diet for the subjects who ran

-   Looks as if there is only any substantial time effect for the
    runners. For them, does diet have an effect?

-   Pull out only the runners from the wide data:

```{r bProfile-26 }
exercise.wide %>%
  filter(exertype == "running") -> runners.wide
```

-   Create response variable and do MANOVA. Some of this looks like
    before, but I have different data now:

\footnotesize

```{r bProfile-27}
response <- with(runners.wide, cbind(min01, min15, min30))
runners.1 <- lm(response ~ diet, data = runners.wide)
times <- colnames(response)
times.df <- data.frame(times=factor(times))
runners.2 <- Manova(runners.1,
  idata = times.df,
  idesign = ~times
)
```

\normalsize

## Sphericity tests

```{r}
summary(runners.2)$sphericity.tests
```

-   No problem, look at univariate tests.

## Univariate tests

\footnotesize

```{r}
summary(runners.2)$univariate.tests
```

\normalsize

-   Interaction still significant
    -   dependence of pulse rate on time still different for the two
        diets

## How is the effect of diet different over time?

-   Table of means. Only I need long data for this:

```{r bProfile-29 }
runners.wide %>%
  pivot_longer(starts_with("min"), 
               names_to = "time", values_to = "pulse") %>%
  group_by(time, diet) %>%
  summarize(
    mean = mean(pulse),
    sd = sd(pulse)
  ) -> summ
```

-   Result of `summarize` is data frame, so can save it (and do more
    with it if needed).

## Understanding diet-time interaction

-   The summary:

\footnotesize

```{r bProfile-30}
summ
```

\normalsize

-   Pulse rates at any given time higher for `lowfat` (diet effect),

-   Pulse rates increase over time of exercise (time effect),

-   but the *amount by which pulse rate higher* for a diet depends on
    time: `diet` by `time` interaction.

## Interaction plot

-   We went to trouble of finding means by group, so making interaction
    plot is now mainly easy:

```{r bProfile-31, fig.height=2.7}
ggplot(summ, aes(x = time, y = mean, colour = diet,
                 group = diet)) + geom_point() + geom_line()
```

## Comment on interaction plot

-   The lines are not parallel, so there is interaction between diet and
    time for the runners.
-   The effect of time on pulse rate is different for the two diets,
    even though all the subjects here were running.