--- title: "Overview of Generalized Linear Models" output: pdf_document --- \renewcommand{\vec}[1]{\mathbf{#1}} ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE, fig.height = 4, fig.width = 6, fig.align = 'center') library(tidyverse) library(rstanarm) library(arm) library(gridExtra) set.seed(11012020) ``` Logistic regression is a special case of *a generalized linear model* \vfill ### Logistic Regression The logistic function maps an input from the unit range (0,1) to the real line: \vfill More importantly, the inverse-logit function maps a continous variable to the unit range (0,1) \vfill The `qlogis` (for logit) and `plogis` (inverse-logit) functions in R can be used for this calculation. For instance `plogis(1) =` `r plogis(1)`. \vfill Formally, the inverse-logistic function is used as part of the GLM: \vfill \vfill \newpage Recall the `beer` dataset, but now instead of trying to model consumption, lets consider whether a day is a weekday or weekend. ```{r, message = F} beer <- read_csv('http://math.montana.edu/ahoegh/Data/Brazil_cerveja.csv') %>% mutate(consumed = consumed - mean(consumed)) ``` \vfill ```{r} bayes_logistic <- stan_glm(weekend ~ consumed, data = beer, family = binomial(link = "logit"), refresh = 0) ``` \vfill Now how to interpret the model coefficients? ```{r} bayes_logistic ``` \vfill Interpreting the coefficients can be challenging due to the non-linear relationship between the outcome and the predictors. \newpage ### Predictive interpretation One way to interpret the coefficients is in a predictive standpoint. For instance, consider an day with average consumption, then the probability of a weekend would be `invlogit(-1.2 + 0.3 * 0) =` `r round(plogis(-1.2),2)`, where as the probability of a day with 10 more liters of consumption (relative to an average day) would have a weekend probability of `invlogit(-1.2 + 0.3 * 10) =` `r round(plogis(-1.2 + 0.3 * 10),2)` \vfill Of course, we should always think about uncertainty, so we can extract simulations from the model. \vfill `posterior_linpred` was useful with regression, but need `posterior_epred` here ```{r} new_data <- data.frame(consumed = c(0,10)) posterior_sims <- posterior_epred(bayes_logistic, newdata = new_data) summary(posterior_sims) ``` \vfill It can also be useful to consider predictions of an individual data point. ```{r} new_obs <- posterior_predict(bayes_logistic, newdata = new_data) head(new_obs) colMeans(new_obs) ``` \newpage ### odds ratios and log odds logistic regression can be re-written as \begin{align} y & \sim Bernoulli\\ \log \left( \frac{Pr[y = 1|X]}{Pr[y = 0|X]} \right)& = \beta_0 + \beta_1 x \\ \log \left( \frac{Pr[y = 1|X]}{1-Pr[y = 1|X]} \right)& = \beta_0 + \beta_1 x \\ \end{align} \vfill Furthermore, logistic regression can also re-written as \begin{align} y & \sim Bernoulli\\ \log \left( \frac{Pr[y = 1|X]}{Pr[y = 0|X]} \right)& = \beta_0 + \beta_1 x \\ \frac{Pr[y = 1|X]}{1-Pr[y = 1|X]}& = \exp \left(\beta_0 + \beta_1 x \right)\\ \end{align} \vfill \begin{align} \exp(\beta_1) &= \frac{\exp(\beta_0 + \beta_1 (x + 1))}{\exp(\beta_0 + \beta_1 (x))}\\ &= \frac{Pr[y = 1|X= x + 1]/Pr[y = 0|X= x + 1]}{Pr[y = 1|X= x]/Pr[y = 0|X= x]} \end{align} \vfill Interpretation of log odds and odds ratios can be difficult; however, interpreting the impact on probabilities requires setting other parameter values and the change is non-linear (different change in probability for a one unit change in a predictor). 
\newpage

### Model Comparison

We can use cross validation in the same manner as with standard linear models.

```{r}
loo(bayes_logistic)

# logistic regression with maximum temperature as the predictor
temp_model <- stan_glm(weekend ~ max_tmp,
                       data = beer,
                       family = binomial(link = "logit"),
                       refresh = 0)
loo(temp_model)
```
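
The two `elpd` estimates can also be compared head-to-head. As a sketch, `loo_compare` (re-exported by `rstanarm` from the `loo` package) orders the models by expected log predictive density:

```{r}
# the preferred model appears in the first row; elpd_diff is relative to it
loo_compare(loo(bayes_logistic), loo(temp_model))
```
\vfill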