---
title: "HW 4"
author: "Your name here"
output: pdf_document
---
```{r knitr_options , include=FALSE}
#Here's some preamble, which makes ensures your figures aren't too big
knitr::opts_chunk$set(fig.width=6, fig.height=3.6, warning=FALSE,
message=FALSE)
```
```{r, message = FALSE, warning = FALSE}
library(mosaic)
library(RCurl)
url <- getURL("https://raw.githubusercontent.com/statsbylopez/StatsSports/master/Kickers.csv")
nfl.kick <- read.csv(text = url)
```
For this homework, we will be using data provided in the `nfl.kick` data set, as was done during class. Our goals will be to confirm our knowledge of logistic regression, our interpretations of slopes, as well as kicker-specific analysis.
## Exploratory data analysis
### Question 1
Use R to find the **kicker** with the best percentage of successful field goals. Why might one argue that this specific kicker may not be the most accurate, even though he has the highest percentage?
### Question 2
Use R to find the **team** with the best percentage of successful field goals. Why might one argue that this team may not have had the best kickers even though they've posted the highest overall percentage?
### Question 3
Identify the teams that have kicked the highest percentage of their field goals on grass (recall: the `Grass` variable is a TRUE/FALSE indicator for whether or not each kick was kicked on a grass surface.).
## Logistic regression
### Question 4
There are several variables in this data set. Using the AIC criteria, identify the logistic regression model that is the best fit for our `Success` outcome.
### Question 5
Using your model from (4), interpret the coefficient for `Distance` as an odds ratio.
### Question 6
Odds ratios are multiplicative. That is, if the odds of a successful outcome are $e^{\beta_1}$ given a one-unit increase in $x_1$, the odds of a successful outcome are $e^{c*\beta_1}$ given a $c$-unit increase in $x_1$. Given your model in (4) what are the odds of making a field goal that is 10 yards longer?
### Question 7
Using the model below, estimate the probability of a successful 40-yard field goal, kicked on a non-grass surface in 2015.
```{r, eval = FALSE}
fit.3 <- glm(Success ~ Distance + Grass + Year, data = nfl.kick, family ="binomial")
summary(fit.3)
```
## Expected points
### Question 8
Use your answer to Question (7) to estimate the expected points of a 40-yard field goal, kicked on a non-grass surface in 2015.
### Question 9
Kicker A hits the field goal in Question (8) while Kicker B misses it. How many expected points has Kicker A added to his team given this single kick? How about Kicker B?
### Question 10
It is straightforward to estimate the value of kickers using expected points.
First, we generate predicted probabilities for each field goal using `fit.3`. Next, we use that to estimate the expected points for each field goal (`predict.points`). Finally, we use the result of the field goal (`Success` = 0 or a 1) and the value of the kick (3 points) to get an expected points added (`EPA`) for each kicker on each kick.
```{r, eval = FALSE}
nfl.kick <- mutate(nfl.kick, predict.Success = predict(fit.3, nfl.kick, type = "response"),
predict.points = 3*predict.Success,
EPA = Success*3 - predict.points)
```
The first row corresponds to a David Akers kick in 2005. What was the predicted success rate for Akers on this kick? What relative worth (in terms of `EPA`) did Akers provide on this kick?
### Question 11
One metric we may be interested in is the relative worth, in terms of total `EPA`, among all kickers in our data set. The `dplyr` function makes it simple.
```{r, eval = FALSE}
options(dplyr.print_max = 1e9)
kick.summary <- nfl.kick %>%
group_by(Kicker) %>%
summarize(percent.success = mean(Success), total.kicks = length(EPA), total.EPA = sum(EPA)) %>%
arrange(total.EPA)
kick.summary
```
The above function calculates kicker-specific percentages, each kicker's total number of kicks, and each kickers total EPA.
Since 2005, who has been worth the most (and least) total `EPA` to their teams?
### Question 12
Interpret the R-squared calculated below. What does it suggest about the fraction of unexplained variability when it comes to kicker EPA?
```{r, eval = FALSE}
xyplot(total.EPA ~ percent.success, data = filter(kick.summary, total.kicks >=50 ))
cor(total.EPA ~ percent.success, data = filter(kick.summary, total.kicks >=50 ))^2
```
### Question 13
Given your readings, are there any other variables that you would want to account for when measuring field goal success that aren't in the current data set? How may it effect the ranking of kickers in Question Question (12)?