---
title: "Presenting Regression Analyses in R"
author: "Kevin Donovan"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output: slidy_presentation
---
```{r setup, include=FALSE, echo=FALSE}
knitr::opts_chunk$set(echo=FALSE, message = FALSE, warning = FALSE, fig.width = 8,
fig.height = 4)
library(tidyverse)
library(shiny)
library(rmarkdown)
library(broom)
library(gtsummary)
library(flextable)
library(ggpubr)
library(nlme)
library(lme4)
library(broom.mixed)
library(GGally)
library(corrplot)
library(ggcorrplot)
```
```{r}
brain_data <- read_csv("../Data/IBIS_brain_data_ex.csv")
```
# Introduction
- We have discussed how to do regression analyses
- Presenting analyses to communicate results just as important
- Maximizes the impact of your work
- We will use packages we have discussed before: `ggplot`, `gtsummary`, etc.
- As well as new ones: `GGally`, `corrplot`, etc.
# Correlation analyses
- Correlation analyses often presented using boring tables
```{r}
brain_data_v24 <-
brain_data %>%
select(names(brain_data)[grepl("V24", names(brain_data))]) %>%
select(EACSF_V24:RightAmygdala_V24 )
cor(x=brain_data_v24, method="pearson", use="pairwise.complete.obs")
```
# Correlation analyses
- Instead, let's use visualizations!
```{r fig.width = 7, fig.height = 6}
brain_data_v24 <-
brain_data %>%
select(names(brain_data)[grepl("V24", names(brain_data))]) %>%
select(EACSF_V24:RightAmygdala_V24 )
p.mat <- cor_pmat(brain_data_v24)
ggcorrplot(cor(x=brain_data_v24, method="pearson", use="pairwise.complete.obs"),
hc.order = TRUE, type = "lower", lab = TRUE, outline.col = "white",
p.mat = p.mat)
corrplot(cor(x=brain_data_v24, method="pearson", use="pairwise.complete.obs"))
```
# Correlation analyses
- Can also add in visualizations which look at distributions too
```{r fig.width = 10, fig.height = 8}
brain_data_v24 <-
brain_data %>%
select(c(names(brain_data)[grepl("V24", names(brain_data))], "RiskGroup")) %>%
select(EACSF_V24:RightAmygdala_V24, RiskGroup)
ggpairs(brain_data_v24,
columns = names(brain_data_v24)[!grepl("RiskGroup", names(brain_data_v24))],
ggplot2::aes(colour=RiskGroup, alpha=0.25)) +
theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
```
# Summary statistics
- Can create easily formatted summary stats tables **in code**
```{r fig.width = 10, fig.height = 8}
tbl_summary(data=brain_data_v24, by=RiskGroup,
missing_text = "Missing",
statistic = list(all_continuous() ~ "{mean} ({sd})")) %>%
add_p(list(all_continuous() ~ "aov",
all_categorical() ~ "chisq")) %>%
add_n() %>%
as_flex_table() %>%
bold(bold = TRUE, part = "header") %>%
autofit()
```
# Regression summaries
- Let's model 24 month MSEL expressive language scores as Amygdala volume and ASD diagnosis at 24 months
- Common way to look at results: `summary`
```{r}
lm_fit <- lm(V24_MSEL_EL_ae~RiskGroup+RightAmygdala_V24+
RiskGroup*RightAmygdala_V24,
data=brain_data)
summary(lm_fit)
```
- Not very eye catching
# Regression summaries: tables
- Let's start with improved tables containing this same information
- Remade table: `tbl_regression`
- Can also be used with generalized linear models
```{r}
tbl_regression(lm_fit,
pvalue_fun = ~style_pvalue(.x, digits = 3),
estimate_fun = ~style_number(.x, digits = 3)) %>%
as_flex_table() %>%
autofit()
```
# Regression summaries: tables
- Can create from scratch using `broom` and `flex_table` packages
- Can also be used with mixed models using `broom.mixed`
```{r}
summary(lm_fit)
tidy(lm_fit)
```
# Regression summaries: visuals
- Tables tell detailed info, but visuals may make more impact
- Ex. group-specific lines of best fit
```{r}
lm_fit_data <- model.frame(lm_fit)
lm_fit_data$pred_y <- predict(lm_fit, newdata = lm_fit_data)
ggplot(data=lm_fit_data,
mapping=aes(x=RightAmygdala_V24, y=V24_MSEL_EL_ae,
color=RiskGroup))+
geom_point()+
geom_line(mapping=aes(y=pred_y), size=1.5)+
theme_bw()
```
# Regression summaries: visuals
- Least square means (LS means) are a common way of presenting results in interpretable manner
- Can be computed for cross-sectional regression and mixed models
- How are they computed?
$$
\begin{align}
&MSEL = \beta_0+\beta_1Amyg+\beta_2HRneg+\beta_3LRneg+\beta_4HRneg*Amyg+\beta_5LRneg*Amyg=\epsilon \\
&LSM(HRneg)=\beta_0+\beta_2+\beta_1x+\beta_4x \text{ where x is fixed value, e.g. mean Amygfala} \\
&LSM(LRneg)=\beta_0+\beta_3+\beta_1x+\beta_5x\\
&LSM(HRASD)=\beta_0+\beta_1x
\end{align}
$$
- **Comparing group means** controlling for other covariates in model
- **Note**: not the best metric when interaction terms are included
# Regression summaries: visuals
- LS means can be computed using `lsmeans` function
- Can then plot using `ggplot`
# Regression summaries: visuals
- Forest plots are one way of visualizing ``beta'' estimates and confidence intervals
- Very useful when you don't have interaction terms and have many main effects