---
title: "Lab 05: - Tempe Accident Analysis"
author: "Your Name"
date: "`r format(Sys.time(), '%B %d, %Y')`"
output:
html_document:
df_print: paged
theme: flatly
highlight: haddock
toc: yes
toc_float: yes
self_contained: true
---
# Setup
The following analysis uses data comprised of traffic accidents in the city of
Tempe, Arizona. They constitute 28,470 observations and 32 variables. You can
find more details in the dataset documentation at the
[**Tempe Open Data Portal**](https://data.tempe.gov/dataset/high-severity-traffic-crashes-1-08).
```{r setup, include = FALSE}
# SET UP GLOBAL OPTIONS
knitr::opts_chunk$set(echo = TRUE,
message = FALSE,
warning = FALSE,
fig.width = 10)
```
```{r include = FALSE}
# LOAD REQUIRED PACKAGES
library(ggmap) # Packages per instructions
library(dplyr)
library(pander)
library(viridis)
library(ggplot2)
library(ggthemes)
library(scales) # Recommended package
# Function 'percent()' from "scales" allows
# easy formatting of proportions to percentages
# Funtion 'number()' from "scales" allows
# easy formatting of numbers with argument
# 'big.mark = TRUE'.
# Note that 'percent()' and 'number()' format
# values as character strings! So they won't
# work like numbers any more (e.g. in 'sum()')!
```
```{r include = FALSE}
# READ IN DATA
url <- paste0("https://github.com/DS4PS/Data-Science-Class/blob",
"/master/DATA/TempeTrafficAccidents.rds?raw=true")
dat <- readRDS(gzcon(url(url))) # Method per instructions
```
Explore the first six observations in the following interactive table:
```{r echo = TRUE}
head(dat)
```
# Preprocessing
All data preprocessing are available in **Appendix Code A: Data Preprocessing**.
```{r include = FALSE}
date.vec <- strptime(dat$DateTime,
format = "%m/%d/%y %H:%M") # Create date fields
dat$hour <- format(date.vec, format = "%H")
dat$month <- format(date.vec, format = "%b")
dat$day <- format(date.vec, format = "%a")
dat$day365 <- format(date.vec, format = "%j")
dat$week <- format(date.vec, format = "%V")
dat$day <- factor(dat$day,
levels = c("Mon",
"Tue",
"Wed",
"Thu",
"Fri",
"Sat",
"Sun")) # Order weekdays
dat$hour12 <- format(date.vec,
format="%l %p") # Create 12-hour format
time.levels <- c("12 AM", " 1 AM", " 2 AM", " 3 AM", " 4 AM", " 5 AM",
" 6 AM", " 7 AM", " 8 AM", " 9 AM", "10 AM", "11 AM",
"12 PM", " 1 PM", " 2 PM", " 3 PM", " 4 PM", " 5 PM",
" 6 PM", " 7 PM", " 8 PM", " 9 PM", "10 PM", "11 PM" )
dat$hour12 <- factor(dat$hour12,
levels = time.levels) # Order time intervals
age.labels <- paste0("Age ",
c(16,18,25,35,45,55,65,75), "-",
c(18,25,35,45,55,65,75,100) )
dat$age <- cut(dat$Age_Drv1,
breaks = c(16,18,25,
35,45,55,
65,75,100),
labels = age.labels) # Discretize age ranges
```
# Part I: Data Summaries
The following provides brief exploratory summaries on key data points.
## 1. Accidents on Mondays
```{r}
x <- sum(dat$day == "Mon",
na.rm = TRUE)
comma(x) # Example solution
```
**`r comma(x)`** accidents occured on Mondays.
## 2. Monday Accidents (%)
```{r}
# Code
```
**[X]%** of all accidents occur on Mondays.
## 3. Harmful Monday Accidents (%)
**Note:** *"Harm" is defiend as any accident that causes at least one injury or*
*fatality, i.e. a casualty.*
```{r}
# Code
```
**[X]%** of all Monday accidents have at least one casualty.
## 4. Top Monday Accident Type
```{r}
dat %>%
filter(day == "Mon") %>%
count(Collisionmanner) %>%
arrange(desc(n))
```
**[X]** collisions are the most common accident types.
## 5. Week vs. Weekend Accidents
```{r}
# Code
```
Significantly more accidents occur **[on weekdays or weekends?]**.
## 6. Harm by Weekday
```{r}
dat %>%
group_by(day) %>%
mutate(harm_acc = Totalinjuries > 0 | Totalfatalities > 0) %>%
summarize(n = n(),
injuries = sum(Totalinjuries),
fatalities = sum(Totalfatalities),
harm.rate = mean(harm_acc))
```
# Part II: Age Groups
The following provides summaries of accidents by age groups.
## 1. Accidents by Hour & Age
```{r}
# Code
```
Drivers of **[X]** have the greatest number of accidents from 7:00 to 7:59 AM.
# Part III: Rates of Harm
The following reports the accidents, casualties, proportion, and average
casualties per harmful accident.
## 1. Accidents by Hour
```{r}
dat %>%
group_by(hour) %>%
summarize(accidents = number(n(),
big.mark = ",")) # Example solution
dat %>%
group_by(hour) %>%
summarize(n = n()) %>%
plot(type = "b",
bty = "n",
pch = 19,
cex = 2,
xlab = "Hour",
ylab = "Total Accidents",
main = "Total Accidents by Time of Day")
```
## 2. Total Casualties by Hour
```{r}
# Code
```
## 3. Accidents with Casualties (%)
```{r}
# Code
```
## 4. Mean Casualties/Harmful Accident
```{r}
# Code
```
# Appendix
## Code A: Data Preprocessing
```{r eval = FALSE, echo = TRUE, include = TRUE}
date.vec <- strptime(dat$DateTime,
format = "%m/%d/%y %H:%M") # Create date fields
dat$hour <- format(date.vec, format = "%H")
dat$month <- format(date.vec, format = "%b")
dat$day <- format(date.vec, format = "%a")
dat$day365 <- format(date.vec, format = "%j")
dat$week <- format(date.vec, format = "%V")
dat$day <- factor(dat$day,
levels = c("Mon",
"Tue",
"Wed",
"Thu",
"Fri",
"Sat",
"Sun")) # Order weekdays
dat$hour12 <- format(date.vec,
format="%l %p") # Create 12-hour format
time.levels <- c("12 AM", " 1 AM", " 2 AM", " 3 AM", " 4 AM", " 5 AM",
" 6 AM", " 7 AM", " 8 AM", " 9 AM", "10 AM", "11 AM",
"12 PM", " 1 PM", " 2 PM", " 3 PM", " 4 PM", " 5 PM",
" 6 PM", " 7 PM", " 8 PM", " 9 PM", "10 PM", "11 PM" )
dat$hour12 <- factor(dat$hour12,
levels = time.levels) # Order time intervals
age.labels <- paste0("Age ",
c(16,18,25,35,45,55,65,75), "-",
c(18,25,35,45,55,65,75,100) )
dat$age <- cut(dat$Age_Drv1,
breaks = c(16,18,25,
35,45,55,
65,75,100),
labels = age.labels) # Discretize age ranges
```