---
title: "Homework 8: Strings and Mapping"
author: "KEY"
date: "Due: 5/31"
output: 
  html_document:
    preserve_yaml: true
    toc: true
    toc_float: true
published: false
---


```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(dplyr)
library(tidyr)
library(readr)
library(stringr)
library(ggplot2)
library(ggmap)
library(ggrepel)
```

# 0. Use the following code to download the King County restaurants data:

```{r datadownload}
load(url("https://pearce790.github.io/CSSS508/Lectures/Lecture8/restaurants.Rdata"))
```

# 1. Strings

### Question 1.1: Use a function to determine how long the following character string is: `paste0(letters,1:5,collapse=":")`

```{r}
nchar(paste0(letters,1:5,collapse=":"))
```
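
As a sanity check on that number, here is a minimal sketch that rebuilds the length by hand. The helper names `pieces` and `expected_length` are made up for illustration.

```{r}
# paste0() recycles 1:5 across the 26 letters, giving 26 two-character pieces
pieces <- paste0(letters, 1:5)

# collapse = ":" adds one colon between each adjacent pair of pieces
expected_length <- sum(nchar(pieces)) + (length(pieces) - 1)

# Compare the hand count with the direct answer
expected_length == nchar(paste0(letters, 1:5, collapse = ":"))
```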

### Question 1.2: Describe, in 1-2 complete sentences, the difference between the arguments "sep" and "collapse" in the `paste()` function.

> ANSWER: `sep` is the string placed between the arguments to `paste()` when they are combined element by element. `collapse` is the string placed between the resulting elements when they are then joined into a single string.
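
A small sketch of the distinction, using made-up toy vectors (`first` and `second` are illustrative names, not part of the restaurants data):

```{r}
# Toy vectors (illustrative only)
first <- c("a", "b", "c")
second <- 1:3

# sep goes between the arguments within each pasted element
pasted_sep <- paste(first, second, sep = "-")
pasted_sep

# collapse then joins those elements into a single string
pasted_collapse <- paste(first, second, sep = "-", collapse = "+")
pasted_collapse
```

The first call returns a vector of three strings, while the second returns one string.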

### Question 1.3: Filter your data to only include rows in which the `Name` includes the word "coffee" (in any case!)

```{r}
coffee <- restaurants
coffee$Name <- str_to_lower(coffee$Name)
coffee <- coffee %>% filter(str_detect(Name,"coffee"))
```
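
An alternative sketch, shown on hypothetical names rather than the real data: `regex()` with `ignore_case = TRUE` matches in any case without changing the stored names. The key above lowercases `Name` instead, and later answers rely on that, so this is only an illustration of the other approach.

```{r}
library(stringr)

# Hypothetical business names (not taken from the restaurants data)
toy_names <- c("COFFEE HOUSE", "Tea Garden", "Corner Coffee", "coffeeshop")

# regex(..., ignore_case = TRUE) matches "coffee" regardless of case
has_coffee <- str_detect(toy_names, regex("coffee", ignore_case = TRUE))
toy_names[has_coffee]
```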

### Question 1.4: Create a new variable in your data which includes the length of the business name, after removing beginning/trailing whitespace.

```{r}
coffee$NameLength <- str_length(str_trim(coffee$Name))
```

### Question 1.5: Create a new variable in your data for the inspection year, *using a `stringr` function!*

```{r}
coffee$Year <- str_sub(coffee$Inspection_Date,-4,-1)
```
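
A minimal sketch of how the negative positions work, assuming the dates are stored as character strings ending in a four-digit year. The `MM/DD/YYYY` values below are hypothetical examples, not rows from the data.

```{r}
library(stringr)

# Hypothetical date strings in MM/DD/YYYY form (an assumption about the format)
toy_dates <- c("01/24/2017", "11/03/2015")

# Negative positions count from the end, so -4 to -1 is the last four characters
toy_years <- str_sub(toy_dates, -4, -1)
toy_years
```

If the column were stored in a different layout, such as year first, the start and end positions would need to change accordingly.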


### Question 1.6: Create side-by-side boxplots for the length of business name vs. year.

```{r warning=FALSE,message=FALSE}
ggplot(coffee,aes(Year,NameLength))+geom_boxplot()
```


### Question 1.7: Calculate the maximum `Inspection_Score` by business (`Name`) and `Year`.

```{r}
coffee_summary <- coffee %>% group_by(Name,Year) %>% 
  summarize(MaxScore=max(Inspection_Score))
coffee_summary %>% head(6)
```

### Question 1.8: Create a line plot of maximum score ("MaxScore") over time ("Year"), by business ("Name"). That is, you should have a single line for each business. (Don't try to label them, as there are far too many!)

```{r warning=FALSE}
ggplot(coffee_summary,aes(Year,MaxScore,group=Name))+
  geom_line(alpha=.2)+theme_bw()
```


# 2. Mapping

### Question 2.1: Using your data from part 1, create a ggplot displaying each coffee shop in King County by their latitude/longitude. For this question, no need to display any actual map data!


```{r warning=FALSE,message=FALSE}
ggplot(coffee,aes(x=Longitude,y=Latitude))+
  geom_point()+
  theme_bw()
```

### Question 2.2: Modify the `City` variable so that it is in title case. Then, modify your plot from 2.1 such that each city has a different color.

```{r warning=FALSE,message=FALSE}
coffee$City <- str_to_title(coffee$City)
ggplot(coffee,aes(x=Longitude,y=Latitude,color=City))+
  geom_point()+
  theme_bw()
```

### Question 2.3: Recreate the plot from 2.2 using the `qmplot` function in the `ggmap` package

```{r warning=FALSE,message=FALSE}
qmplot(data=coffee,x=Longitude,y=Latitude,
       color=City)
```

### Question 2.4: Create a density plot of coffee shops in Bellevue

Filter to coffee shops in Bellevue first!!

```{r warning=FALSE,message=FALSE}
bellevue_coffee <- coffee %>% filter(City == "Bellevue")
qmplot(data = bellevue_coffee,
       geom = "blank",
       x = Longitude, 
       y = Latitude,
       darken = 0.5)+
  stat_density_2d(
    aes(fill = after_stat(level)),
    geom = "polygon", 
    alpha = .2, color = NA
  )+
  scale_fill_gradient2(
    "Coffee Shops", 
    low = "white", 
    mid = "yellow", 
    high = "red")
```

### Question 2.5: Create a new dataset that includes the name, latitude, and longitude of each Starbucks coffee store in Bellevue. Remove any duplicates by year.

Hint: Use the `select`, `filter`, and `distinct` functions (in that order). Within `filter`, you'll use `str_detect`.


```{r warning=FALSE, message=FALSE}
unique_bellevue_coffee <- bellevue_coffee %>% 
  select(Name,Latitude,Longitude) %>% 
  filter(str_detect(string = Name,pattern = "starbucks")) %>%
  distinct()
```
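
To see why this order removes the per-year duplicates, here is a sketch on a hypothetical three-row table. `toy_shops` and its values are invented for illustration.

```{r}
library(dplyr)
library(stringr)

# Hypothetical rows: the same store appears once per inspection year
toy_shops <- tibble(
  Name = c("starbucks #1", "starbucks #1", "corner coffee"),
  Latitude = c(47.61, 47.61, 47.62),
  Longitude = c(-122.20, -122.20, -122.19),
  Year = c("2015", "2016", "2016")
)

toy_unique <- toy_shops %>%
  select(Name, Latitude, Longitude) %>%      # dropping Year makes repeat rows identical
  filter(str_detect(Name, "starbucks")) %>%  # keep only the Starbucks rows
  distinct()                                 # collapse identical rows into one
toy_unique
```

Because `Year` is dropped before `distinct()`, the two inspections of the same store become identical rows and only one is kept.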

### Question 2.6: Plot all Bellevue coffee shops, then add labels for the Starbucks stores using `geom_label_repel`


```{r warning=FALSE, message=FALSE}
qmplot(data=bellevue_coffee,
       x=Longitude,y=Latitude,
       alpha=I(.1))+
  geom_label_repel(
    data = unique_bellevue_coffee,
    aes(label = Name), 
    size=2)
```