--- title: "CSSS508, Lecture 9" subtitle: "Mapping" author: "Michael Pearce
(based on slides from Chuck Lanfear)" date: "May 24, 2023" output: xaringan::moon_reader: lib_dir: libs css: xaringan-themer.css nature: highlightStyle: tomorrow-night-bright highlightLines: true countIncrementalSlides: false titleSlideClass: ["center","top"] --- class:inverse ```{r setup, include=FALSE, purl=FALSE} options(htmltools.dir.version = FALSE, width = 70) knitr::opts_chunk$set(comment = "##") ``` # Topics Last time, we learned about, 1. Basics of Strings 1. Strings in Base R 1. Strings in `stringr` (Tidyverse) -- Today, we will cover, 1. Basic mapping: + Simple `ggplot`s with coordinates + Density plots in `ggmap` + Labeling points with `ggrepel` 2. Advanced mapping: + GIS with `sf` + `tidycensus` --- # Mapping in R: A quick plug .pull-left[ .image-75[ ![](img/bivand.jpg) ] ] .pull-right[ Mapping can be **complex!** If you are interested in digging deeper, here are some resources: * If you are interested in mapping, GIS, and geospatial analysis in R, *acquire this book*. * [RSpatial.org](http://rspatial.org/) is a great resource for working with spatial data. * For more information on *spatial statistics*, consider taking Jon Wakefield's **CSSS 554: Statistical Methods for Spatial Data** in the winter quarter. ] --- class: inverse # 1. Basic mapping with `ggmap` --- ## One Day of SPD Incidents Today, we'll study data from the Seattle Police Department regarding incidents on just one day: March 25, 2016. -- The data can be downloaded from my predecessor's Github using the code below (code in the R companion file). ```{r, message=FALSE, warning=FALSE} library(ggplot2) library(readr) library(dplyr) library(tidyr) library(stringr) ``` ```{r read_spd_data, cache=FALSE, message=FALSE, warning=FALSE} spd_raw <- read_csv("https://clanfear.github.io/CSSS508/Seattle_Police_Department_911_Incident_Response.csv") ``` --- ## Taking a `glimpse()` .small[ ```{r} glimpse(spd_raw) ``` ] What does each **row represent?** What are the **variables and values?** --- ## Simple Plotting: Regular `ggplots` .pull-left[ Coordinates, such as longitude and latitude, can be provided in `aes()` as `x` and `y` values. This is ideal when you don't need to place points over some map for reference. .small[ ```{r, eval=FALSE} ggplot(spd_raw, aes(Longitude, Latitude)) + geom_point() + coord_fixed() + # evenly spaces x and y ggtitle("Seattle Police Incidents", subtitle="March 25, 2016") + theme_classic() ``` ] Sometimes, however, we want to plot these points over existing maps. ] .pull-right[ ```{r, message = FALSE, echo=FALSE, fig.height = 5.5, fig.width=3, dev="svg"} ggplot(spd_raw, aes(Longitude, Latitude)) + geom_point() + coord_fixed() + ggtitle("Seattle Police Incidents", subtitle="March 25, 2016") + theme_classic() ``` ] --- ## Better plots with `ggmap` `ggmap` is a package that works with `ggplot2` to plot spatial data directly on map images. -- What this package does for you: 1. Queries servers for a map at the location and scale you want 2. Plots the image as a `ggplot` object 3. Lets you add more `ggplot` layers like points, 2D density plots, text annotations 4. Additional functions for interacting with Google Maps (beyond this course) --- ## Installation We can install `ggmap` like other packages: ```{r, eval=FALSE} install.packages("ggmap") ``` **Note: If prompted to "install from sources the package which needs compilation", I find that typing "no" works best!** Because the map APIs it uses change frequently, sometimes you may need to get a newer development version of `ggmap` from the author's GitHub. This can be done using the `remotes` package. ```{r, eval=FALSE} if(!requireNamespace("remotes")){install.packages("remotes")} remotes::install_github("dkahle/ggmap", ref = "tidyup") ``` Note, this may require compilation on your computer. ```{r, message=FALSE, warning=FALSE, purl=FALSE} library(ggmap) ``` --- ## Quick Maps with `qmplot()` .pull-left[ `qmplot` will automatically set the map region based on your data: ```{r, eval=FALSE, purl=FALSE} qmplot(data = spd_raw, x = Longitude, y = Latitude) ``` All I provided was numeric latitude and longitude, and it placed the data points correctly on a raster map of Seattle. ] .pull-right[ ```{r qmplot, cache=TRUE, message = FALSE, echo=FALSE, fig.height = 5, fig.width=2.5, dev="svg"} qmplot(data = spd_raw, x = Longitude, y = Latitude) ``` ] --- ## `get_map()` `qmplot()` internally uses the function `get_map()`, which retrieves a base map layer. Some options: * `location=` search query or numeric vector of longitude and latitude * `zoom=` a zoom level (3 = continent, 10 = city, 21 = building) * `maptype=`: `"watercolor"`, `"toner"`, `"toner-background"`, `"toner-lite"` * `color=`: `"color"` or `"bw"` -- There are fancier options available, but many require a Google Maps API. See the **help page** for `gmplot()` or `get_map` for more information! --- ## Nicer Example .pull-left[ Let's add *color* to our plot from before ```{r, eval=FALSE, purl=FALSE} qmplot(data = spd_raw, x = Longitude, y = Latitude, color=I("navy") ) ``` `I()` is used here to specify *set* (constant) rather than *mapped* values. ] .pull-right[ ```{r cache=TRUE, message = FALSE, echo=FALSE, fig.height = 5, fig.width=2.5, dev="svg"} qmplot(data = spd_raw, x = Longitude, y = Latitude, color=I("navy") ) ``` ] --- ## Nicer Example .pull-left[ Add *transparency* ```{r, eval=FALSE, purl=FALSE} qmplot(data = spd_raw, x = Longitude, y = Latitude, color=I("navy"), alpha=I(0.5) ) ``` ] .pull-right[ ```{r cache=TRUE, message = FALSE, echo=FALSE, fig.height = 5, fig.width=2.5, dev="svg"} qmplot(data = spd_raw, x = Longitude, y = Latitude, color=I("navy"), alpha=I(0.5) ) ``` ] --- class:inverse # Density plots --- ## Density Layers .pull-left[ We can create a **spatial density plot** using the layer function `stat_density_2d()`: * What to use when creating a "density" plot (where the observations occur) * Aesthetics of the density plot * Additional layer functions for customization ```{r eval=FALSE,echo=FALSE} qmplot(data = spd_raw, geom = "blank", #<< x = Longitude, y = Latitude, darken = 0.5)+ stat_density_2d( aes(fill = stat(level)), #<< geom = "polygon", alpha = .2, color = NA )+ scale_fill_gradient2( "Incident\nConcentration", low = "white", mid = "yellow", high = "red")+ theme(legend.position = "bottom") ``` ] .pull-right[ ```{r echo=FALSE, message = FALSE, cache=TRUE, fig.height = 5, fig.width=2.5, dev="svg"} qmplot(data = spd_raw, geom = "blank", #<< x = Longitude, y = Latitude, darken = 0.5)+ stat_density_2d( aes(fill = stat(level)), #<< geom = "polygon", alpha = .2, color = NA )+ scale_fill_gradient2( "Incident\nConcentration", low = "white", mid = "yellow", high = "red")+ theme(legend.position = "bottom") ``` ] --- ## Basic Map .pull-left[ Start with basic plot, remove points: ```{r eval=FALSE} qmplot(data = spd_raw, geom = "blank", #<< x = Longitude, y = Latitude) ``` ] .pull-right[ ```{r echo=FALSE, message = FALSE, cache=TRUE, fig.height = 5, fig.width=2.5, dev="svg"} qmplot(data = spd_raw, geom = "blank", #<< x = Longitude, y = Latitude) ``` ] --- ## Add Density Layer .pull-left[ Add `stat_density_2d` layer: ```{r eval=FALSE} qmplot(data = spd_raw, geom = "blank", x = Longitude, y = Latitude)+ stat_density_2d( #<< aes(fill = stat(level)), #<< geom = "polygon") #<< ``` `stat(level)` indicates we want `fill=` to be based on `level` values calculated by the layer (i.e., number of observations). ] .pull-right[ ```{r echo=FALSE, message = FALSE, cache=TRUE, fig.height = 6, fig.width=4, dev="svg"} qmplot(data = spd_raw, geom = "blank", x = Longitude, y = Latitude)+ stat_density_2d( #<< aes(fill = stat(level)), #<< geom = "polygon") #<< ``` ] --- ## Color Scale .pull-left[ Specify color gradient ```{r eval=FALSE} qmplot(data = spd_raw, geom = "blank", x = Longitude, y = Latitude)+ stat_density_2d( aes(fill = stat(level)), geom = "polygon", )+ scale_fill_gradient2( #<< low = "white", #<< mid = "yellow", #<< high = "red") #<< ``` ] .pull-right[ ```{r echo=FALSE, message = FALSE, cache=TRUE, fig.height = 6, fig.width=4, dev="svg"} qmplot(data = spd_raw, geom = "blank", x = Longitude, y = Latitude)+ stat_density_2d( aes(fill = stat(level)), geom = "polygon", )+ scale_fill_gradient2( #<< low = "white", #<< mid = "yellow", #<< high = "red") #<< ``` ] --- ## Customize transparency and legend .pull-left[ .small[ ```{r eval=FALSE} qmplot(data = spd_raw, geom = "blank", x = Longitude, y = Latitude, darken = 0.5)+ #<< stat_density_2d( aes(fill = stat(level)), geom = "polygon", alpha = .2, #<< )+ scale_fill_gradient2( "Incident\nConcentration", #<< low = "white", mid = "yellow", high = "red")+ theme(legend.position = "bottom") #<< ``` ] Done! ] .pull-right[ ```{r echo=FALSE, message = FALSE, cache=TRUE, fig.height = 5, fig.width=2.5, dev="svg"} qmplot(data = spd_raw, geom = "blank", x = Longitude, y = Latitude, darken = 0.5)+ #<< stat_density_2d( aes(fill = stat(level)), geom = "polygon", alpha = .2, #<< )+ scale_fill_gradient2( "Incident\nConcentration", #<< low = "white", mid = "yellow", high = "red")+ theme(legend.position = "bottom") #<< ``` ] --- class:inverse # Labeling points with `ggrepel` --- ## Focus in on Downtown Seattle First, let's filter our data to downtown based on values "eyeballed" from our earlier map: ```{r flag_assaults} downtown <- spd_raw %>% filter(Latitude > 47.58, Latitude < 47.64, Longitude > -122.36, Longitude < -122.31) ``` We'll plot just these values from now on! -- Let's also make a dataframe that includes just assaults and robberies downtown: ```{r} assaults <- downtown %>% filter(`Event Clearance Group` %in% c("ASSAULTS", "ROBBERY")) %>% mutate(assault_label = `Event Clearance Description`) ``` We'll **label** these observations! --- .pull-left[ ## Labels Now let's plot the events and label them with `geom_label()`: .smallish[ ```{r labels_1, eval=FALSE, purl=FALSE} qmplot(data = downtown, x = Longitude, y = Latitude, maptype = "toner-lite", color = I("firebrick"), alpha = I(0.5)) + geom_label(data = assaults, #<< aes(label = assault_label), #<< size=2) #<< ``` ] Note that one dataframe is used for points (`downtown`) and another for labels (`assaults`)!! ] .pull-right[ ```{r labels_2, echo=FALSE, message = FALSE, cache=TRUE, fig.height = 6, fig.width=3, dev="svg"} qmplot(data = downtown, x = Longitude, y = Latitude, maptype = "toner-lite", color = I("firebrick"), alpha = I(0.5)) + geom_label(data = assaults, aes(label = assault_label), size=2) ``` ] --- ## Fixing Overlapping Labels .pull-left[ The `ggrepel` package lets use fix or reduce overlapping labels using the function `geom_label_repel()`. .small[ ```{r ggrepel_1, eval=FALSE, purl=FALSE} library(ggrepel) qmplot(data = downtown, x = Longitude, y = Latitude, maptype = "toner-lite", color = I("firebrick"), alpha = I(0.5)) + geom_label_repel( #<< data = assaults, aes(label = assault_label), size=2) ``` ] ] .pull-right[ ```{r ggrepel_2, echo=FALSE, message = FALSE, cache=TRUE, warning=FALSE, fig.height = 5, fig.width=3, dev="svg"} library(ggrepel) qmplot(data = downtown, x = Longitude, y = Latitude, maptype = "toner-lite", color = I("firebrick"), alpha = I(0.5)) + geom_label_repel( data = assaults, aes(label = assault_label), size=2) ``` ] --- class:inverse # 2. Advanced mapping --- ## `sf` Until recently, the main way to work with geospatial data in R was through the `sp` package. `sp` works well but does not store data the same way as most GIS packages and can be bulky and complicated. -- The more recent `sf` package implements the GIS standard of [**Simple Features**](https://en.wikipedia.org/wiki/Simple_Features) in R. `sf` is also integrated into the `tidyverse`: e.g. `geom_sf()` in `ggplot2`. -- The package is somewhat new but is expected to *replace* `sp` eventually. The principle authors and contributors to `sf` are the same authors as `sp` but with new developers from the `tidyverse` as well. Because `sf` is the new standard, we will focus on it today. ```{r, eval=FALSE, purl=FALSE} library(sf) ``` ```{r, include=FALSE} library(sf) ``` --- ## Simple Features A [**Simple Feature**](https://en.wikipedia.org/wiki/Simple_Features) is a single observation with some defined geospatial location(s). Features are stored in special data frames (class `sf`) with two properties: -- * **Geometry**: Properties describing a location (usually on Earth). * Usually 2 dimensions, but support for up to 4. * Stored in a single reserved *list-column* (`geom`, of class `sfc`).1 * Contain a defined coordinate reference system. .footnote[ [1] A list-column is the same length as all other columns in the data, but each element contains *sub-elements* (class `sfg`) with all the geometrical components. *List-columns* require special functions to manipulate, *including removing them*. ] -- * **Attributes**: Characteristics of the location (such as population). * These are non-spatial measures that describe a feature. * Standard data frame columns. --- ## Coordinate Reference Systems **Coordinate reference systems** (**CRS**) specify what location on Earth geometry coordinates are *relative to* (e.g. what location is (0,0) when plotting). -- The most commonly used is [**WGS84**](https://en.wikipedia.org/wiki/World_Geodetic_System), the standard for Google Earth, the Department of Defense, and GPS satellites. -- There are two common ways to define a CRS in `sf`: * [**EPSG codes**](http://spatialreference.org/ref/epsg/) (`epsg` in R) * Numeric codes which *refer to a predefined CRS* * Example: WGS84 is `4326` -- * [**PROJ.4 strings**](https://proj4.org/usage/quickstart.html) (`proj4string` in R) * Text strings of parameters that *define a CRS* * Example: NAD83(NSRS2007) / Washington North ``` +proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47 +lon_0=-120.8333333333333 +x_0=500000 +y_0=0 +ellps=GRS80 +towgs84=0,0,0,0,0,0,0 +units=m +no_defs ``` --- ## Shapefiles Geospatial data is typically stored in **shapefiles** which store geometric data as **vectors** with associated attributes (variables) -- Shapefiles actually consist of multiple individual files. There are usually at least three (but up to 10+): * `.shp`: The feature geometries * `.shx`: Shape positional index * `.dbf`: Attributes describing features1 Often there will also be a `.prj` file defining the coordinate system. .footnote[[2] This is just a dBase IV file which is an ancient and common database storage file format.] --- class: inverse # Using `sf` --- ## Selected `sf` Functions `sf` is a huge, feature-rich package. Here is a sample of useful functions: * `st_read()`, `st_write()`: Read and write shapefiles. * `geom_sf()`: `ggplot()` layer for `sf` objects. * `st_as_sf()`: Convert a data frame into an `sf` object. * `st_join()`: Join data by spatial relationship. * `st_transform()`: Convert between CRS. * `st_drop_geometry()`: Remove geometry from a `sf` data frame. * `st_relate()`: Compute relationships between geometries (like neighbor matrices). * `st_interpolate_aw()`: Areal-weighted interpolation of polygons.1 .footnote[[1] I recommend the dedicated `areal` package for this though!] --- ## Loading Data We will work with the voting data from Homework 5. You can obtain a shape file of King County voting precincts from the [county GIS data portal](https://gis-kingcounty.opendata.arcgis.com/datasets/voting-districts-of-king-county--votdst-area). -- We can load the file using `st_read()`. ```{r, cache=TRUE, message=FALSE, warning=FALSE} precinct_shape <- st_read("./data/district/votdst.shp", stringsAsFactors = F) %>% select(Precinct=NAME, geometry) ``` [If following along, click here to download a zip of the shapefile.](https://github.com/clanfear/CSSS508/raw/master/Lectures/Week9/data/district.zip) --- ## Voting Data: Processing .small[ ```{r, cache=TRUE, message=FALSE, warning=FALSE} precincts_votes_sf <- read_csv("./data/king_county_elections_2016.txt") %>% filter(Race=="US President & Vice President", str_detect(Precinct, "SEA ")) %>% select(Precinct, CounterType, SumOfCount) %>% group_by(Precinct) %>% filter(CounterType %in% c("Donald J. Trump & Michael R. Pence", "Hillary Clinton & Tim Kaine", "Registered Voters", "Times Counted")) %>% mutate(CounterType = recode(CounterType, `Donald J. Trump & Michael R. Pence` = "Trump", `Hillary Clinton & Tim Kaine` = "Clinton", `Registered Voters`="RegisteredVoters", `Times Counted` = "TotalVotes")) %>% spread(CounterType, SumOfCount) %>% mutate(P_Dem = Clinton / TotalVotes, P_Rep = Trump / TotalVotes, Turnout = TotalVotes / RegisteredVoters) %>% select(Precinct, P_Dem, P_Rep, Turnout) %>% filter(!is.na(P_Dem)) %>% left_join(precinct_shape) %>% st_as_sf() # Makes sure resulting object is an sf dataframe ``` ] --- ## Taking a `glimpse()` .smallish[ ```{r} glimpse(precincts_votes_sf) ``` ] Notice the `geometry` column and its unusual class: `MULTIPOLYGON` A single observation (row) has a geometry which may consist of multiple polygons. --- .pull-left[ ## Voting Map We can plot `sf` geometry using `geom_sf()`. ```{r, eval=FALSE, purl=FALSE} ggplot(precincts_votes_sf, aes(fill=P_Dem)) + #<< geom_sf(size=NA) + theme_void() + theme(legend.position = "right") ``` * `fill=P_Dem` maps color inside precincts to `P_Dem`. * `size=NA` removes precinct outlines. * `theme_void()` removes axes and background. ] .pull-right[ .image-100[ ```{r, echo=FALSE, message=FALSE, warning=FALSE, cache=FALSE, dev="png", fig.retina=5, fig.width = 5, fig.height=8} ggplot(precincts_votes_sf, aes(fill=P_Dem)) + #<< geom_sf(size=NA) + theme_void() + theme(legend.position = "right") ``` ] ] --- class: inverse # `tidycensus` --- ## `tidycensus` `tidycensus` can be used to search the American Community Survey (ACS) and Dicennial Census for variables, then download them and automatically format them as tidy dataframes. **These dataframes include geographical boundaries such as tracts!** -- This package utilizes the Census API, so you will need to obtain a [Census API key](https://api.census.gov/data/key_signup.html). **Application Program Interface (API)**: A type of computer interface that exists as the "native" method of communication between computers, often via http (usable via `httr` package). * R packages that interface with websites and databases typically use APIs. * APIs make accessing data easy while allowing websites to control access. See [the developer's GitHub page](https://walkerke.github.io/tidycensus/articles/basic-usage.html) for detailed instructions. ```{r, eval=FALSE, echo=FALSE} # If following along, you can install with this # Note you'll ALSO need to go get a Census API key above install.packages("tidycensus") ``` --- ## Key `tidycensus` Functions * `census_api_key()` - Install a census api key. * Note you will need to run this prior to using any `tidycensus` functions. * `load_variables()` - Load searchable variable lists. * `year =`: Sets census year or endyear of 5-year ACS * `dataset =`: Sets dataset (see `?load_variables`) * `get_decennial()` - Load Census variables and geographical boundaries. * `variables = `: Provide vector of variable IDs * `geography =`: Sets unit of analysis (e.g. `state`, `tract`, `block`) * `year =`: Census year (`1990`, `2000`, or `2010`) * `geometry = TRUE`: Returns `sf` geometry * `get_acs()` - Load ACS variables and boundaries. --- ## Searching for Variables ```{r, echo=FALSE, warning = FALSE, message=FALSE} library(tidycensus) ``` ```{r, cache=TRUE, message=FALSE, warning=FALSE} library(tidycensus) # census_api_key("PUT YOUR KEY HERE", install=TRUE) acs_2015_vars <- load_variables(2015, "acs5") acs_2015_vars[10:18,] %>% print() ``` --- ## Getting Data ```{r, include=FALSE, cache=TRUE} king_county <- get_acs(geography="tract", state="WA", county="King", geometry = TRUE, variables=c("B02001_001E", "B02009_001E"), output="wide") ``` ```{r, eval=FALSE, purl=FALSE} king_county <- get_acs(geography="tract", state="WA", county="King", geometry = TRUE, variables=c("B02001_001E", "B02009_001E"), output="wide") ``` What do these look like? .smallish[ ```{r} glimpse(king_county) ``` ] With `output="wide"`, **estimates** end in `E` and *error margins* in `M`. --- ## Processing Data We can drop the margins of error, rename the estimates then, `mutate()` into a proportion `Any Black` measure. ```{r} king_county <- king_county %>% select(-ends_with("M")) %>% rename(`Total Population`=B02001_001E, `Any Black`=B02009_001E) %>% mutate(`Any Black` = `Any Black` / `Total Population`) glimpse(king_county) ``` --- ## Mapping Code ```{r, eval=FALSE, purl=FALSE} king_county %>% ggplot(aes(fill = `Any Black`)) + geom_sf(size = NA) + coord_sf(crs = "+proj=longlat +datum=WGS84", datum=NA) + scale_fill_continuous(name = "Any Black\n", low = "#d4d5f9", high = "#00025b") + theme_minimal() + ggtitle("Proportion Any Black") ``` New functions: * `geom_sf()` draws Simple Features coordinate data. * `size = NA` removes outlines * `coord_sf()` is used here with these arguments: * `crs`: Modifies the coordinate reference system (CRS); WGS84 is possibly the most commonly used CRS. * `datum=NA`: Removes graticule lines, which are geographical lines such as meridians and parallels. --- ```{r king_plot, eval=TRUE, echo=FALSE, dev="svg", fig.height = 9} king_county %>% ggplot(aes(fill = `Any Black`)) + geom_sf(size = NA) + coord_sf(crs = "+proj=longlat +datum=WGS84", datum = NA) + scale_fill_continuous(name = "Any Black\n", low = "#d4d5f9", high = "#00025b") + theme_minimal() + ggtitle("Proportion Any Black") ``` --- ## Removing Water With a simple function and boundaries of water bodies in King County, we can replace water with empty space. ```{r, include=FALSE, cache=FALSE, message=FALSE, warning=FALSE} st_erase <- function(x, y) { st_difference(x, st_make_valid(st_union(st_combine(y)))) } kc_water <- tigris::area_water("WA", "King", class = "sf") sf::sf_use_s2(FALSE) kc_nowater <- king_county %>% st_erase(kc_water) ``` ```{r, eval=FALSE, purl=FALSE} st_erase <- function(x, y) { st_difference(x, st_make_valid(st_union(st_combine(y)))) } kc_water <- tigris::area_water("WA", "King", class = "sf") kc_nowater <- king_county %>% st_erase(kc_water) ``` * `st_combine()` merges all geometries into one * `st_union()` resolves internal boundaries * `st_difference()` subtracts `y` geometry from `x` * `st_make_valid()` fixes geometry errors from subtraction * `area_water()` obtains `sf` geometry of water bodies Then we can reproduce the same plot using `kc_nowater`... --- ```{r, eval=TRUE, echo=FALSE, dev="svg", dpi = 300, fig.height = 9} ggplot(kc_nowater, aes(fill=`Any Black`)) + geom_sf(size = NA) + coord_sf(crs = "+proj=longlat +datum=WGS84", datum=NA) + scale_fill_continuous(name="Any Black\n", low="#d4d5f9", high="#00025b") + theme_minimal() + ggtitle("Proportion Any Black") ``` --- class: inverse # Lab/Homework For the remainder of class, we'll work on Homework 8, which is due next week! + The first part is based on Lecture 8 (strings) + This includes lab questions from last lecture! + The second part is based on Lecture 9 (mapping) + This includes an analysis of the restaurant data from last week! *There is no lab next week due to Memorial Day!*