--- title: "Homework 8: Strings and Mapping" author: "KEY" date: "Due: 5/31" output: html_document: preserve_yaml: true toc: true toc_float: true published: false --- ```{r setup, include=FALSE} knitr::opts_chunk$set(echo = TRUE) library(dplyr) library(tidyr) library(readr) library(stringr) library(ggplot2) library(ggmap) library(ggrepel) ``` # 0. Use the following code to download the King County restaurants data: ```{r datadownload} load(url("https://pearce790.github.io/CSSS508/Lectures/Lecture8/restaurants.Rdata")) ``` # 1. Strings ### Question 1.1: Use a function to determine how long the following character string is: `paste0(letters,1:5,collapse=":")` ```{r} nchar(paste0(letters,1:5,collapse=":")) ``` ### Question 1.2: Describe, in 1-2 complete sentences, the difference between the arguments "sep" and "collapse" in the `paste()` function. > ANSWER: "sep" is what separates strings provided by multiple arguments to the `paste` function. "collapse" is what separates the already-pasted strings after additionally collapsing them into a single string. ### Question 1.3: Filter your data to only include rows in which the `Name` includes the word "coffee" (in any case!) ```{r} coffee <- restaurants coffee$Name <- str_to_lower(coffee$Name) coffee <- coffee %>% filter(str_detect(Name,"coffee")) ``` ### Question 1.4: Create a new variable in your data which includes the length of the business name, after removing beginning/trailing whitespace. ```{r} coffee$NameLength <- str_length(str_trim(coffee$Name)) ``` ### Question 1.5: Create a new variable in your data for the inspection year, *using a `stringr` function!* ```{r} coffee$Year <- str_sub(coffee$Inspection_Date,-4,-1) ``` ### Question 1.6: Create side-by-side boxplots for the length of business name vs. year. ```{r warning=FALSE,message=FALSE} ggplot(coffee,aes(Year,NameLength))+geom_boxplot() ``` ### Question 1.7: Calculate the maximum `Inspection_Score` by business (`Name`) and `Year`. ```{r} coffee_summary <- coffee %>% group_by(Name,Year) %>% summarize(MaxScore=max(Inspection_Score)) coffee_summary %>% head(6) ``` ### Question 1.8: Create a line plot of maximum score ("MaxScore") over time ("Year"), by business ("Name"). That is, you should have a single line for each business. (Don't try to label them, as there are far too many!) ```{r warning=FALSE} ggplot(coffee_summary,aes(Year,MaxScore,group=Name))+ geom_line(alpha=.2)+theme_bw() ``` # 2. Mapping ### Question 2.1: Using your data from part 1, create a ggplot displaying each coffee shop in King County by their latitude/longitude. For this question, no need to display any actual map data! ```{r warning=FALSE,message=FALSE} ggplot(coffee,aes(x=Longitude,y=Latitude))+ geom_point()+ theme_bw() ``` ### Question 2.2: Modify the `City` variable so that it is in title case. Then, modify your plot from 2.1 such that each city has a different color. ```{r warning=FALSE,message=FALSE} coffee$City <- str_to_title(coffee$City) ggplot(coffee,aes(x=Longitude,y=Latitude,color=City))+ geom_point()+ theme_bw() ``` ### Question 2.3: Recreate the plot from 2.2 using the `qmplot` function in the `gmap` package ```{r warning=FALSE,message=FALSE} qmplot(data=coffee,x=Longitude,y=Latitude, color=City) ``` ### Question 2.4: Create a density plot of coffee shops in Bellevue Filter to coffee shops in Bellevue first!! ```{r warning=FALSE,message=FALSE} bellevue_coffee <- coffee %>% filter(City == "Bellevue") qmplot(data = bellevue_coffee, geom = "blank", x = Longitude, y = Latitude, darken = 0.5)+ stat_density_2d( aes(fill = stat(level)), #<< geom = "polygon", alpha = .2, color = NA )+ scale_fill_gradient2( "Coffee Shops", low = "white", mid = "yellow", high = "red") ``` ### Question 2.5: Create a new dataset called that includes the name, latitude, and longitude of each Starbucks coffee store in Bellevue. Remove any duplicates by year. Hint: Use the `select`, `filter`, and `distinct` functions (in that order). Within `filter`, you'll use `str_detect`. ```{r warning=FALSE, message=FALSE} unique_bellevue_coffee <- bellevue_coffee %>% select(Name,Latitude,Longitude) %>% filter(str_detect(string = Name,pattern = "starbucks")) %>% distinct() ``` ### Question 2.6: Plot all Bellevue coffee shops, then add labels for the Starbucks stores using `geom_label_repel` ```{r warning=FALSE, message=FALSE} qmplot(data=bellevue_coffee, x=Longitude,y=Latitude, alpha=I(.1))+ geom_label_repel( data = unique_bellevue_coffee, aes(label = Name), size=2) ```