This document is intended to help you quickly learn how to perform basic manipulation and visualization of spatial data that can be represented using simple features. We will primarily focus on using the sf package (Pebesma 2022) for our analysis.

An accompanying YouTube playlist that walks through this document is available by following the link here. I have also included direct, relevant video links throughout the file (e.g., immediately after the the relevant heading). An html version of this document can be downloaded to your current working direction (run getwd() in the Console to see where this is) by running the following command:

download.file("https://raw.githubusercontent.com/jfrench/DataWrangleViz/master/05-simple-features-wrangling-visualizing.nb.html", "05-simple-features-wrangling-visualizing.nb.html")

The raw R Markdown code used to generate the html file can be obtained by toggling the “Code” box in the upper-right corner of the html file and selecting “Download Rmd”.

Wrangling and visualizing simple features spatial data in R (Video: YouTube, Panopto)

Spatial data can take many forms.

For a data scientist, spatial data is usually data for which each observation is geographic region or location.

Simple Features (officially Simple Feature Access) is a set of standards that specify a common storage and access model of geographic feature made of mostly two-dimensional geometries (point, line, polygon, multi-point, multi-line, etc.) used by geographic information systems. It is formalized by both the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO). Wikipedia

Simple features are comprised of:

  1. A geometry object (e.g., a point, a polygon, a line, etc.)
  2. Attribute data associated with the geometry object (e.g., the temperature at a certain time on a certain day, the number of new cases of a disease in a county in the last month).

Simple features

As stated in the vignette to the sf package:

Features have a geometry describing where on Earth the feature is located, and they have attributes, which describe other properties. The geometry of a tree can be the delineation of its crown, of its stem, or the point indicating its centre. Other properties may include its height, color, diameter at breast height at a particular date, and so on.

Just to clarify:

  • The geometry of an observation describes where the object is located.
    • A geometry can be a point, a polygon (which is really just an ordered collection of points), or something more complicated.
  • The attributes of a geometry object are what data scientists would usually consider the “data”.

All geometries are made of points, which can be combined to create more and more complex objects.

A point can be 2-, 3-, or 4-dimensional.

The most common kinds of points are two-dimensional and are described using a 2-dimensional set of coordinates (X and Y), e.g., longitude and latitude or easting and northing.

  • 2-dimensional points are referred to as XY points.

A 3-dimensional point could include the X and Y coordinates as well as the altitude of the 2-dimensional point in 3-dimensional space.

  • The Z coordinate is another dimension denoting altitude of the point.
  • Combining a Z coordinate with an XY coordinate results in a 3-dimensional XYZ point.

Alternatively, a 3-dimensional point could include some other measure associated with the point.

  • The M coordinate is another dimension denoting some measure associated with the point.
  • It’s pretty rare, but examples include time or measurement error.
  • Combining an M coordinate with an XY coordinate results in a 3-dimensional XYM point.

Combining X, Y, Z, and M coordinates results in a 4-dimensional point.

Packages and tools for working with simple features (Video: YouTube, Panopto)

In what follows, we will use several packages.

  • The sf package (Pebesma 2022) is an R package for working with simple features (sf objects) both in terms of the geometry objects and the associated attributes.
    • The sf package can import, manipulate, and plot sf objects.
    • The sf package is intended to supersede the sp package (Pebesma and Bivand 2022), which is an older R package for working with spatial data.
    • Since the sp package is being superseded by the sf package, I recommend working with the sf package for spatial data analysis moving forward.

Because of how sf objects are represented in R, simple features (once constructed or imported) can be manipulated and plotted by many other well-known packages.

  • The dplyr package (Wickham et al. 2022) can be used to manipulate sf objects, and we will utilize it as certain times.
  • The ggplot2 package (Wickham 2016) can also be used to plot sf objects, which we will demonstrate.

Choosing a color palette to represent the attributes of your simple features is very important.

  • The grDevices package included with base R provides many color palettes.
    • The traditional palettes are rainbow, heat.colors, terrain.colors, topo.colors, and cm.colors.
    • The hcl.colors palette function was added in R 4.0.0 and provides numerous excellent color palettes.
    • Running the Examples in the documentation found in ?grDevices::hcl.colors will show you examples of the available palettes through grDevices.
    • We provide color swatches for many palettes at the end of this tutorial (without the code).
  • The colorspace package (Ihaka et al. 2022) provides many of the same HCL palettes available through the hcl.colors palette, but it also provides convenient functions for accessing this in ggplot2.

We load all of these packages below, with the exception of colorspace, as it has a coords function that masks (i.e., replaces) a needed function in the sf package. So we will access the necessary colorspace function using the :: approach.

# library(colorspace)
library(sf)
library(dplyr)
library(ggplot2)

Simple feature geometry objects (Video: YouTube, Panopto)

The most common simple feature geometry objects used by data scientists are:

geometry function description
POINT sf::st_point A geometry containing a single point
POLYGON sf::st_polygon A geometry with a sequence of points that form a closed ring that doesn’t intersect itself. Multiple rings form outer rings and holes within the polygon.

We go through the process of creating and plotting these geometries below.

# create an XY point
(p1 = st_point(c(1,2)))
POINT (1 2)
# create an XYZ point
(p2 = st_point(c(1,2,3)))
POINT Z (1 2 3)
# the points look the same when plotted in two dimensions
plot(p1)

plot(p2)

# create a ring (connected set of points)
outer <- matrix(c(0, 0, 10, 0, 10, 10, 0, 10, 0, 0), ncol = 2, byrow = TRUE)
# create additional rings for holes
hole1 <- matrix(c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1), ncol = 2, byrow = TRUE)
hole2 <- matrix(c(5, 5, 5, 6, 6, 6, 6, 5, 5, 5), ncol = 2, byrow = TRUE)
# combine rings into a list
pts <- list(outer, hole1, hole2)
# turn list of rings into a polygon
(pl1 <- st_polygon(pts))
POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1), (5 5, 5 6, 6 6, 6 5, 5 5))
# plot polygon
plot(pl1)

The other common geometry objects are:

geometry function description
LINESTRING sf::st_linestring A sequence of points that is connected with straight, non-self intersecting lines
MULTIPOINT sf::st_multipoint A set of points
MULTIPOLYGON sf::st_multipolygon A set of polygons
MULTILINESTRING sf::st_multipoint A set of line strings
GEOMETRYCOLLECTION sf::st_geometrycollection A set of the other geometries (except itself)

We provide examples of creating and plotting these geometries below.

# create a matrix of multiple points
(pts <- matrix(rnorm(10), ncol = 2))
           [,1]       [,2]
[1,] -0.2942841  0.4042901
[2,] -0.5631947 -2.8190104
[3,]  0.3776016 -1.3146976
[4,] -0.1430829  1.4013844
[5,]  0.2235321 -1.9659875
# convert matrix of points to linestring
(ls1 <- st_linestring(pts))
LINESTRING (-0.2942841 0.4042901, -0.5631947 -2.81901, 0.3776016 -1.314698, -0.1430829 1.401384, 0.2235321 -1.965988)
plot(ls1)

# convert matrix fo points to multipoints
(mp1 <- st_multipoint(pts))
MULTIPOINT ((-0.2942841 0.4042901), (-0.5631947 -2.81901), (0.3776016 -1.314698), (-0.1430829 1.401384), (0.2235321 -1.965988))
plot(mp1)

# create multipolygons
pol1 <- list(outer, hole1, hole2)
pol2 <- list(outer + 24)
mp <- list(pol1, pol2)
(mpl1 <- st_multipolygon(mp))
MULTIPOLYGON (((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1), (5 5, 5 6, 6 6, 6 5, 5 5)), ((24 24, 34 24, 34 34, 24 34, 24 24)))
plot(mpl1, axes = TRUE)

# create a multilinestring
(pts2 <- matrix(rnorm(6), ncol = 2))
         [,1]       [,2]
[1,] 1.189007 -1.0965091
[2,] 1.721922 -0.7875281
[3,] 0.918795  0.6693628
(ml1 <- st_multilinestring(list(pts, pts2)))
MULTILINESTRING ((-0.2942841 0.4042901, -0.5631947 -2.81901, 0.3776016 -1.314698, -0.1430829 1.401384, 0.2235321 -1.965988), (1.189007 -1.096509, 1.721922 -0.7875281, 0.918795 0.6693628))
plot(ml1)

# create a geometry collection
(gc <- st_geometrycollection(list(p1, pl1, ls1)))
GEOMETRYCOLLECTION (POINT (1 2), POLYGON ((0 0, 10 0, 10 10, 0 10, 0 0), (1 1, 1 2, 2 2, 2 1, 1 1), (5 5, 5 6, 6 6, 6 5, 5 5)), LINESTRING (-0.2942841 0.4042901, -0.5631947 -2.81901, 0.3776016 -1.314698, -0.1430829 1.401384, 0.2235321 -1.965988))
plot(gc, col = "grey")

How do I need what type of geometry I need?

  • Often, this is determined automatically when you read in shapefiles (which we’ll discuss later).
  • Attributes observed at a single location require a POINT.
  • Most regions can be represented by a POLYGON.
  • Complicated objects made of regions, e.g., an island chain like Hawaii, require MULTIPOLYGONS.
  • The other types are for more complicated objects.
  • There are 10 other rarer geometry types that we do not discuss (CIRCULARSTRING, CURVE, SURFACE, TRIANGLE, COMPOUNDCURVE, CURVEPOLYGON, MULTICURVE, MULTISURFACE, POLYHEDRALSURFACE, TIN). You can learn about them through the additional resources provided at the end of this document.

Coordinate reference systems

A coordinate reference system (CRS) must be provided in order to place a point on the earth’s surface.

When your import a geometry object from file, the CRS will often be provided.

  • The WGS84 CRS is often the default for longitude/latitude coordinates.

Sometimes, in order to combine geometry objects, you must specify the CRS of a geometry object.

There are many CRSs. A CRS is often used because it is known to have a certain desirable property. A discussion of CRSs is beyond the scope of this tutorial. And when you do need to know about CRSs, it will probably be so specific that a general discussion won’t help. However, here are a few references related to CRSs that may provide a bit more detail:

Constructing simple feature (sf) objects (Video: YouTube, Panopto)

An sf object is a data.frame that has a simple feature geometry list column (i.e., a column of geometry objects). So you can work with sf objects similar to how you would work with a data.frame object, though it may have different default behaviors because it has been extended to an sf object.

  • The geometry-list column contains the geometry object for each observation.
  • The other columns of the sf object contain the attributes of the geometry object.
  • Each row of the sf object is a simple feature. Alternatively, each observation is a simple feature.

In practice, sf objects are often initially created by reading in a shapefile. However, particularly for POINT observations, you may need to create an sf object manually.

In what follows, we create sf objects for POINT geometry objects and POLYGON geometry objects.

  • We can use the same previously discussed functions (e.g., sf::st_point, sf::st_polygon, etc.) to create the geometry objects.
  • The sf::st_sfc function is used to create a geometry list column.
  • The sf::st_sf function is used to combined a data.frame of attributes with the geometry list column.
# create POINT objects
pt1 <- st_point(c(0, 0))
pt2 <- st_point(c(0, 1))
pt3 <- st_point(c(1, 1))
# create geometry list column
glc1 <- st_sfc(list(pt1, pt2, pt3))
# is glc1 a list?
is.list(glc1)
[1] TRUE
# what class is glc1
class(glc1)
[1] "sfc_POINT" "sfc"      
# create attribute data.frame with temperature and rainfall attributes
df1 <- data.frame(temperature = c(10, 11, 10.4), rainfall = c(1, 1.3, 0.9))
# create sf object
sf1 <- st_sf(df1, geometry = glc1)
# class of sf1
class(sf1)
[1] "sf"         "data.frame"
# plot sf1
plot(sf1["temperature"], pch = 20, axes = TRUE)

# create polygon objects
outer1 <- matrix(c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0), ncol = 2)
pl1 <- st_polygon(list(outer1))
# outer2 is outer1 shifted 1 unit to the right
outer2 <- matrix(c(1, 1, 2, 2, 1, 0, 1, 1, 0, 0), ncol = 2)
pl2 <- st_polygon(list(outer2))
# create geometry list columns
glc2 <- st_sfc(list(pl1, pl2))
# what class is glc2
class(glc2)
[1] "sfc_POLYGON" "sfc"        
# create second sf object (only include an attribute column and geometry)
sf2 <- st_sf(cases = c(10, 12), geometry = glc2)
# class of sf2
class(sf2)
[1] "sf"         "data.frame"
# plot sf2
plot(sf2)

# what happens if you combine geometry types?
glc3 <- st_sfc(list(pt1, pt2, pt3, pl1, pl2))
# what class is glc3
class(glc3)
[1] "sfc_GEOMETRY" "sfc"         
# create sf3
sf3 <- st_sf(attribute1 = rnorm(5), geometry = glc3)
# class of sf3
class(sf3)
[1] "sf"         "data.frame"
# plot sf3
plot(sf3, pch = 20, pal = topo.colors)

# plot sf3 with even breaks
plot(sf3, pch = 20, pal = topo.colors, breaks = seq(min(sf3$attribute1), max(sf3$attribute1), length = 12))

Importing shapefiles as sf objects (Video: YouTube, Panopto)

A data scientist is most likely to work with sf objects obtained from importing a shapefile into R.

ArcGIS defines a shapefile as:

A shapefile is a simple, nontopological format for storing the geometric location and attribute information of geographic features. Geographic features in a shapefile can be represented by points, lines, or polygons (areas). What is a shapefile?

Generally, a shapefile can be imported into R as an sf object.

  • Each row is an observation related to a geometry object.
  • There should be a geometry column that contains the geometry object for each observation (this is the geometry list column).
  • The other columns will represent the attributes associated with each geometry object.

Shapefiles are widely available for describing many different spatial objects like counties, census tracts, zip code tabulation areas (ZCTAs), states, countries, etc. There are even shapefiles that can be used to describe other spatial objects like roads, rivers, lakes, etc.

  • A simple web search with appropriate terms will usually bring up a website with a relevant shapefile for your data.
  • e.g., search “usa shapefile” brings up a Census bureau page with many different shapefiles for different areas and characteristics of the United States.
  • A “shapefile” is often a zipped folder that contains many files inside it.
  • The .shp file is usually the file you want to import.

We can import that shapefile into R as an sf object using the st_read function.

  • Typically, we want to provide the shp file to the dsn argument (data source name) of st_read.

In this case, I have downloaded the cb_2018_us_state_20m.zip containing a shapefile of the U.S. states. I have unzipped this file into the cb_2018_us_state_20m folder in the data folder in my current working directory.

  • The current working directory is the location to which R currently reads or saves files.
  • The can learn what your current working directory is by running getwd() in the Console.
  • You can change your current working directory by running setwd("path") in the Console, where path is the directory path you want to set as your working directory.

In our case, we want to read the file cb_2018_us_state_20m.shp in the cb_2018_us_state_20m folder in the data folder of our current working directory. We can run the following command to import the desired shapefile.

  • ./” indicates the current directory.
  • The dsn argument “./data/cb_2018_us_state_20m/cb_2018_us_state_20m.shp tells R to look for the cb_state_20m.shp file in the file path ./data/cb_state_2018_us_state_20m.

For simplicity, I have uploaded cb_2018_us_state_20m.zip to my GitHub repository. Prior to importing the the .shp file, running the download and unzip commands should download and unzip the necessary folder into your working directory so that you can run the same code to import the shapefile.

# download.file("https://github.com/jfrench/DataWrangleViz/raw/master/data/ca-county-boundaries.zip", destfile = "./ca-county-boundaries.zip", method = "auto")
# unzip("cb_2018_us_state_20m.zip", exdir = "./data/cb_2018_us_state_20m")
us_sf <- st_read(dsn = "./data/cb_2018_us_state_20m/cb_2018_us_state_20m.shp")
Reading layer `cb_2018_us_state_20m' from data source 
  `C:\Users\frencjos\Documents\OneDrive - The University of Colorado Denver\teaching\math3376\github_dwv\data\cb_2018_us_state_20m\cb_2018_us_state_20m.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 52 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83

The st_read provides helpful information automatically when run. In this case, we learn:

  • We imported an sf object with 52 features (observations).
  • The imported sf object has 9 attributes (columns).
  • The features appear to be the simple feature geometry MULTIPOLGYON.
  • The dimension is XY, so we are working with 2-dimensional data.
  • The bounding box tells us the largest and smallest y-coordinates.
  • The CRS is NAD83.

We use the class function to see the classes of us_sf.

class(us_sf)
[1] "sf"         "data.frame"

us_sf is a data.frame that has been extended to the sf class.

Let’s use the str function to learn more about the structure of us_sf.

str(us_sf)
Classes ‘sf’ and 'data.frame':  52 obs. of  10 variables:
 $ STATEFP : chr  "24" "19" "10" "39" ...
 $ STATENS : chr  "01714934" "01779785" "01779781" "01085497" ...
 $ AFFGEOID: chr  "0400000US24" "0400000US19" "0400000US10" "0400000US39" ...
 $ GEOID   : chr  "24" "19" "10" "39" ...
 $ STUSPS  : chr  "MD" "IA" "DE" "OH" ...
 $ NAME    : chr  "Maryland" "Iowa" "Delaware" "Ohio" ...
 $ LSAD    : chr  "00" "00" "00" "00" ...
 $ ALAND   : num  2.52e+10 1.45e+11 5.05e+09 1.06e+11 1.16e+11 ...
 $ AWATER  : num  6.98e+09 1.08e+09 1.40e+09 1.03e+10 3.39e+09 ...
 $ geometry:sfc_MULTIPOLYGON of length 52; first list element: List of 2
  ..$ :List of 1
  .. ..$ : num [1:6, 1:2] -76 -76 -76 -76 -76 ...
  ..$ :List of 1
  .. ..$ : num [1:261, 1:2] -79.5 -79.5 -79.5 -79.4 -79 ...
  ..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
 - attr(*, "sf_column")= chr "geometry"
 - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA
  ..- attr(*, "names")= chr [1:9] "STATEFP" "STATENS" "AFFGEOID" "GEOID" ...

us_sf has 52 rows and 10 columns. The (most) useful columns are:

  • STUSPS: the state abbreviation
  • NAME: the state name
  • ALAND: the land area of each state
  • AWATER: the water area of each state

The geometry column is the simple feature geometry list column and contains the geometry object for each observation.

Wrangling sf objects (Video: YouTube, Panopto)

An sf object is a type of data frame, similar to how a tibble is a type of data frame. Both classes extend the data.frame class.

We can select columns of the us_sf sf object in the following ways:

us_sf$STUSPS
 [1] "MD" "IA" "DE" "OH" "PA" "NE" "WA" "PR" "AL" "AR" "NM" "TX" "CA" "KY"
[15] "GA" "WI" "OR" "MO" "VA" "TN" "LA" "NY" "MI" "ID" "FL" "AK" "IL" "MT"
[29] "MN" "IN" "MA" "KS" "NV" "VT" "CT" "NJ" "DC" "NC" "UT" "ND" "SC" "MS"
[43] "CO" "SD" "OK" "WY" "WV" "ME" "HI" "NH" "AZ" "RI"
us_sf[,5]
Simple feature collection with 52 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83
First 10 features:
   STUSPS                       geometry
1      MD MULTIPOLYGON (((-76.04621 3...
2      IA MULTIPOLYGON (((-96.62187 4...
3      DE MULTIPOLYGON (((-75.77379 3...
4      OH MULTIPOLYGON (((-82.86334 4...
5      PA MULTIPOLYGON (((-80.51989 4...
6      NE MULTIPOLYGON (((-104.0531 4...
7      WA MULTIPOLYGON (((-123.2371 4...
8      PR MULTIPOLYGON (((-65.34207 1...
9      AL MULTIPOLYGON (((-88.46866 3...
10     AR MULTIPOLYGON (((-94.61792 3...
us_sf[,"STUSPS"]
Simple feature collection with 52 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83
First 10 features:
   STUSPS                       geometry
1      MD MULTIPOLYGON (((-76.04621 3...
2      IA MULTIPOLYGON (((-96.62187 4...
3      DE MULTIPOLYGON (((-75.77379 3...
4      OH MULTIPOLYGON (((-82.86334 4...
5      PA MULTIPOLYGON (((-80.51989 4...
6      NE MULTIPOLYGON (((-104.0531 4...
7      WA MULTIPOLYGON (((-123.2371 4...
8      PR MULTIPOLYGON (((-65.34207 1...
9      AL MULTIPOLYGON (((-88.46866 3...
10     AR MULTIPOLYGON (((-94.61792 3...
us_sf["STUSPS"]
Simple feature collection with 52 features and 1 field
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83
First 10 features:
   STUSPS                       geometry
1      MD MULTIPOLYGON (((-76.04621 3...
2      IA MULTIPOLYGON (((-96.62187 4...
3      DE MULTIPOLYGON (((-75.77379 3...
4      OH MULTIPOLYGON (((-82.86334 4...
5      PA MULTIPOLYGON (((-80.51989 4...
6      NE MULTIPOLYGON (((-104.0531 4...
7      WA MULTIPOLYGON (((-123.2371 4...
8      PR MULTIPOLYGON (((-65.34207 1...
9      AL MULTIPOLYGON (((-88.46866 3...
10     AR MULTIPOLYGON (((-94.61792 3...

Note that the $ operator extracts the column from us_sf (returning a character vector), while the other choices subset us_sf and remain sf objects (i.e., the geometry list column is retained).

We can select rows in a similar fashion.

us_sf[2:3,]
Simple feature collection with 2 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -96.6247 ymin: 38.45101 xmax: -75.04894 ymax: 43.50088
Geodetic CRS:  NAD83
  STATEFP  STATENS    AFFGEOID GEOID STUSPS     NAME LSAD        ALAND
2      19 01779785 0400000US19    19     IA     Iowa   00 144661267977
3      10 01779781 0400000US10    10     DE Delaware   00   5045925646
      AWATER                       geometry
2 1084180812 MULTIPOLYGON (((-96.62187 4...
3 1399985648 MULTIPOLYGON (((-75.77379 3...
us_sf[us_sf$STUSPS == "CO",]
Simple feature collection with 1 feature and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -109.0601 ymin: 36.99243 xmax: -102.0419 ymax: 41.00307
Geodetic CRS:  NAD83
   STATEFP  STATENS    AFFGEOID GEOID STUSPS     NAME LSAD        ALAND
43      08 01779779 0400000US08    08     CO Colorado   00 268422891711
       AWATER                       geometry
43 1181621593 MULTIPOLYGON (((-109.06 38....
us_sf[startsWith(us_sf$STUSPS, "C"),]
Simple feature collection with 3 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.4096 ymin: 32.53416 xmax: -71.78936 ymax: 42.04964
Geodetic CRS:  NAD83
   STATEFP  STATENS    AFFGEOID GEOID STUSPS        NAME LSAD        ALAND
13      06 01779778 0400000US06    06     CA  California   00 403503931312
35      09 01779780 0400000US09    09     CT Connecticut   00  12542497068
43      08 01779779 0400000US08    08     CO    Colorado   00 268422891711
        AWATER                       geometry
13 20463871877 MULTIPOLYGON (((-118.594 33...
35  1815617571 MULTIPOLYGON (((-73.69594 4...
43  1181621593 MULTIPOLYGON (((-109.06 38....
us_sf %>% filter(startsWith(us_sf$STUSPS, "C"))
Simple feature collection with 3 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.4096 ymin: 32.53416 xmax: -71.78936 ymax: 42.04964
Geodetic CRS:  NAD83
  STATEFP  STATENS    AFFGEOID GEOID STUSPS        NAME LSAD        ALAND
1      06 01779778 0400000US06    06     CA  California   00 403503931312
2      09 01779780 0400000US09    09     CT Connecticut   00  12542497068
3      08 01779779 0400000US08    08     CO    Colorado   00 268422891711
       AWATER                       geometry
1 20463871877 MULTIPOLYGON (((-118.594 33...
2  1815617571 MULTIPOLYGON (((-73.69594 4...
3  1181621593 MULTIPOLYGON (((-109.06 38....

Note that startsWith is a base R function that finds the elements of a character vector that start with a certain set of characters while start_with is a dplyr function that is used to select columns of a data frame based on a pattern.

A really neat feature of sf objects is that can use a spatial object to select rows. Let’s extract the “Colorado” row from us_sf.

co <- us_sf[us_sf$NAME == "Colorado",]
class(co)
[1] "sf"         "data.frame"

If we pass the co object as the row argument inside the square brackets, [], then the rows of us_sf with geometry objects that intersect the co geometry object will be returned

(co_intersects <- us_sf[co,])
Simple feature collection with 8 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -114.8142 ymin: 31.33224 xmax: -94.43151 ymax: 45.0059
Geodetic CRS:  NAD83
   STATEFP  STATENS    AFFGEOID GEOID STUSPS       NAME LSAD        ALAND
6       31 01779792 0400000US31    31     NE   Nebraska   00 198956658395
11      35 00897535 0400000US35    35     NM New Mexico   00 314196306401
32      20 00481813 0400000US20    20     KS     Kansas   00 211755344060
39      49 01455989 0400000US49    49     UT       Utah   00 212886221680
43      08 01779779 0400000US08    08     CO   Colorado   00 268422891711
45      40 01102857 0400000US40    40     OK   Oklahoma   00 177662925723
46      56 01779807 0400000US56    56     WY    Wyoming   00 251458544898
51      04 01779777 0400000US04    04     AZ    Arizona   00 294198551143
       AWATER                       geometry
6  1371829134 MULTIPOLYGON (((-104.0531 4...
11  728776523 MULTIPOLYGON (((-109.0492 3...
32 1344141205 MULTIPOLYGON (((-102.0517 4...
39 6998824394 MULTIPOLYGON (((-114.0525 3...
43 1181621593 MULTIPOLYGON (((-109.06 38....
45 3374587997 MULTIPOLYGON (((-103.0025 3...
46 1867670745 MULTIPOLYGON (((-111.0569 4...
51 1027337603 MULTIPOLYGON (((-114.7997 3...
plot(co_intersects["NAME"])

plot(st_geometry(co_intersects))

caco <- us_sf[us_sf$NAME == "Colorado" | 
                us_sf$NAME == "California",]
(caco_intersects <- us_sf[caco,])
Simple feature collection with 11 features and 9 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -124.5524 ymin: 31.33224 xmax: -94.43151 ymax: 46.26913
Geodetic CRS:  NAD83
First 10 features:
   STATEFP  STATENS    AFFGEOID GEOID STUSPS       NAME LSAD        ALAND
6       31 01779792 0400000US31    31     NE   Nebraska   00 198956658395
11      35 00897535 0400000US35    35     NM New Mexico   00 314196306401
13      06 01779778 0400000US06    06     CA California   00 403503931312
17      41 01155107 0400000US41    41     OR     Oregon   00 248606993270
32      20 00481813 0400000US20    20     KS     Kansas   00 211755344060
33      32 01779793 0400000US32    32     NV     Nevada   00 284329506470
39      49 01455989 0400000US49    49     UT       Utah   00 212886221680
43      08 01779779 0400000US08    08     CO   Colorado   00 268422891711
45      40 01102857 0400000US40    40     OK   Oklahoma   00 177662925723
46      56 01779807 0400000US56    56     WY    Wyoming   00 251458544898
        AWATER                       geometry
6   1371829134 MULTIPOLYGON (((-104.0531 4...
11   728776523 MULTIPOLYGON (((-109.0492 3...
13 20463871877 MULTIPOLYGON (((-118.594 33...
17  6192386935 MULTIPOLYGON (((-124.5524 4...
32  1344141205 MULTIPOLYGON (((-102.0517 4...
33  2047206072 MULTIPOLYGON (((-120.0048 3...
39  6998824394 MULTIPOLYGON (((-114.0525 3...
43  1181621593 MULTIPOLYGON (((-109.06 38....
45  3374587997 MULTIPOLYGON (((-103.0025 3...
46  1867670745 MULTIPOLYGON (((-111.0569 4...
plot(st_geometry(caco_intersects))

The base merge and dplyr *_join functions can be used to merge a data frame with an sf object.

Let’s access a COVID-19 related data frame available in the bayesutils package, which can be installed from GitHub.

# install.packages("remotes")
# remotes::install_github("jfrench/bayesutils")
data("covid_20210307", package = "bayesutils")

The state_abb column of covid_20210307 has the state abbreviations and matches the STUSPS column of us_sf.

We use the base merge function to unite these two objects into a new object, covid_us.

covid_us <- merge(us_sf, covid_20210307,
                  by.x = "STUSPS", by.y = "state_abb")
head(covid_us)
Simple feature collection with 6 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -179.1743 ymin: 30.22831 xmax: 179.7739 ymax: 71.35256
Geodetic CRS:  NAD83
  STUSPS STATEFP  STATENS    AFFGEOID GEOID       NAME LSAD        ALAND
1     AK      02 01785533 0400000US02    02     Alaska   00 1.478840e+12
2     AL      01 01779775 0400000US01    01    Alabama   00 1.311740e+11
3     AR      05 00068085 0400000US05    05   Arkansas   00 1.347689e+11
4     AZ      04 01779777 0400000US04    04    Arizona   00 2.941986e+11
5     CA      06 01779778 0400000US06    06 California   00 4.035039e+11
6     CO      08 01779779 0400000US08    08   Colorado   00 2.684229e+11
        AWATER state_name deaths   cases population income   hs   bs
1 245481577452     Alaska    305   56886     738516  35455 91.0 27.9
2   4593327154    Alabama  10148  499819    4864680  25734 82.1 21.9
3   2962859592   Arkansas   5319  324818    2990671  25359 82.9 19.5
4   1027337603    Arizona  16328  826454    6946685  29348 85.6 25.9
5  20463871877 California  54124 3501394   39148760  31086 80.7 30.1
6   1181621593   Colorado   5989  436602    5531141  35053 89.7 36.4
  vote_diff_2020                       geometry
1          -10.5 MULTIPOLYGON (((179.4813 51...
2          -25.8 MULTIPOLYGON (((-88.46866 3...
3          -28.4 MULTIPOLYGON (((-94.61792 3...
4            0.3 MULTIPOLYGON (((-114.7997 3...
5           29.8 MULTIPOLYGON (((-118.594 33...
6           13.9 MULTIPOLYGON (((-109.06 38....

Alternatively, we could have used a dplyr *_join function. We’ll use full_join. Note the special syntax in the by argument to address the fact that we want to merge the rows based on different columns in the data frames.

covid_us2 <- full_join(us_sf, covid_20210307, by = c("STUSPS" = "state_abb"))
head(covid_us2)
Simple feature collection with 6 features and 17 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -104.0532 ymin: 37.91685 xmax: -74.69491 ymax: 43.50088
Geodetic CRS:  NAD83
  STATEFP  STATENS    AFFGEOID GEOID STUSPS         NAME LSAD        ALAND
1      24 01714934 0400000US24    24     MD     Maryland   00  25151100280
2      19 01779785 0400000US19    19     IA         Iowa   00 144661267977
3      10 01779781 0400000US10    10     DE     Delaware   00   5045925646
4      39 01085497 0400000US39    39     OH         Ohio   00 105828882568
5      42 01779798 0400000US42    42     PA Pennsylvania   00 115884442321
6      31 01779792 0400000US31    31     NE     Nebraska   00 198956658395
       AWATER   state_name deaths  cases population income   hs   bs
1  6979966958     Maryland   7955 387319    6003435  39682 88.1 36.1
2  1084180812         Iowa   5558 282384    3132499  31509 90.6 24.9
3  1399985648     Delaware   1473  88354     949495  32928 87.7 27.8
4 10268850702         Ohio  17656 978471   11641879  29602 88.1 24.6
5  3394589990 Pennsylvania  24349 948643   12791181  30681 88.4 27.1
6  1371829134     Nebraska   2113 203026    1904760  31609 90.4 28.6
  vote_diff_2020                       geometry
1           34.1 MULTIPOLYGON (((-76.04621 3...
2           -8.4 MULTIPOLYGON (((-96.62187 4...
3           19.3 MULTIPOLYGON (((-75.77379 3...
4           -8.2 MULTIPOLYGON (((-82.86334 4...
5            1.2 MULTIPOLYGON (((-80.51989 4...
6          -19.6 MULTIPOLYGON (((-104.0531 4...

If a new row is added to the sf object without a corresponding geometry, then an empty geometry object is added for that row.

Plotting simple features

The power of the sf package is the ability to easily create plots of spatial data.

Plotting sfc objects (Video: YouTube, Panopto)

To simply plot the geometry list column of an sf object, you can use:

  • st_geometry to extract the list column of simple feature geometries from the sf object (this will be an object of class sfc).
  • plot to plot the sfc object.
plot(st_geometry(us_sf))

Alternatively, you can directly extract the sfc component of the sf object using $, then plot the sfc object.

plot(us_sf$geometry)

You can easily change aspects of the plotted sfc object (or an sf) object using the standard arguments:

  • axes can be set to TRUE to show the axes
  • xlab and ylab will change the axis labels
  • xlim and ylim can be used to constrain the plotting regions.
    • Note that “W” longitude values are indicated using negative numbers, while “E” longitude values are positive numbers.
    • Note that “N” latitude values in the northern hemisphere are positive numbers. “S” latitude values in the southern hemisphere are negative numbers.

Each geometry has specific arguments that the user can change (consider looking at the documentation for ?sf::plot.sf for details). In this case, we can change the fill color of the MULTIPOLYGON objects using the color argument and the border using the border argument.

plot(us_sf$geometry, axes = TRUE,
     xlab = "longitude", ylab = "latitude",
     col = "grey", border = "blue",
     xlim = c(-125, -75),
     ylim = c(22, 50))

You can change the colors of the individual geometry objects with a little creativity. Let’s color all the “C states” (California, Colorado, Connecticut) a little differently than the other states.

# determine indices of C states
c_states <- startsWith(us_sf$STUSPS, "C")
# create a character vector of replicated "grey"
# values matching the number of rows in us_sf
mycol <- rep("grey", nrow(us_sf))
#change "grey" to "orange" for the c_states indices
mycol[c_states] <- "orange"
# create a character vector of replicated "yellow"
# values matching the number of rows in us_sf
myborder <- rep("yellow", nrow(us_sf))
#change "yellow" to "blue" for the c_states indices
myborder[c_states] <- "blue"
# create the customized plot
plot(st_geometry(us_sf), col = mycol, border = myborder, xlim = c(-125, -75),
     ylim = c(22, 50))

Customize color for subset of sf object (Video: YouTube, Panopto)

Let’s look at a different approach to coloring a subset of the sf object differently from the rest of the sf object.

# draw all geometries
plot(st_geometry(us_sf), col = "grey", border = "yellow",
     xlim = c(-125, -75), ylim = c(22, 50))
# plot subset of all geometries with different colors
# add = TRUE overlays new drawing on current graphic
plot(st_geometry(us_sf)[c_states],
     col = "orange", border = "blue",
     add = TRUE)

Plotting attributes of an sf object (Video: YouTube, Panopto)

By default, use you can use the plot function to plot all the attributes of an sf object. In general, this isn’t very useful.

plot(covid_us)
Warning: plotting the first 10 out of 17 attributes; use max.plot = 17 to plot all

More likely, you will want to plot a single attribute (variable), which can be done by subsetting that variable in the sf object and plotting the subsetted object.

Let’s plot the land area of the states, excluding Alaska and Hawaii. First, we exclude the Alaska and Hawaii rows (and save the filtered object).

covid_us <- covid_us %>% filter(!is.element(STUSPS, c("AK", "HI")))
plot(covid_us["ALAND"])

Land area is directly related to the area of the state. It would be interesting to visualize the states that have the greatest percentage of water. Let’s create a new variable in covid_us that computes the percentage of the state that is water. We’ll then plot this variable (excluding Alaska and Hawaii)

covid_us <- covid_us %>% mutate(prop_water = AWATER/(AWATER + ALAND))
plot(covid_us["prop_water"])

Not surprisingly, coastal states and states on the Great Lakes have the highest percentage of water.

We can use the following commands to identify the states with the 6 largest proportions of water.

covid_us %>% slice_max(prop_water, n = 6) %>% select(NAME, prop_water)
Simple feature collection with 6 features and 2 fields
Geometry type: MULTIPOLYGON
Dimension:     XY
Bounding box:  xmin: -90.41814 ymin: 24.49813 xmax: -69.92826 ymax: 48.21065
Geodetic CRS:  NAD83
           NAME prop_water                       geometry
1      Michigan  0.4147358 MULTIPOLYGON (((-84.61622 4...
2  Rhode Island  0.3307977 MULTIPOLYGON (((-71.63147 4...
3 Massachusetts  0.2608345 MULTIPOLYGON (((-70.27553 4...
4      Maryland  0.2172342 MULTIPOLYGON (((-76.04621 3...
5      Delaware  0.2171897 MULTIPOLYGON (((-75.77379 3...
6       Florida  0.1841410 MULTIPOLYGON (((-81.81169 2...

We can change the number of breaks in our color bar via the nbreaks.

We can change the colors in our color bar by specifying the pal argument.

  • The pal argument takes a function that, when given n, the number of desired colors, returns a vector of colors.
  • The hcl.colors, rainbow, heat.colors, terrain.colors, topo.colors, and cm.colors are all color palettes in base R that can be used to change the colors palettes.
  • The hcl.colors palette is particularly good, as it includes color palettes viridis and cividis (corrected viridis) that are particularly well-suited to displaying colors that are color-blind and can be understood when printed in black and white.
  • The colorspace package also has a wide variety of palettes you might consider.

However, the hcl.colors function has a palette argument to specify the desired palette. To use an hcl.color palette when plotting an sf object, we need to build a custom palette that only requires the number of colors desired. We create those function for the viridis and cividis palettes below. We then see that it produces the desired results.

viridis_pal <- function(n) hcl.colors(n, palette = "viridis")
cividis_pal <- function(n) hcl.colors(n, palette = "cividis")
# dirty approach to see colors in the palette
image(matrix(0:4), col = viridis_pal(5))

image(matrix(0:4), col = cividis_pal(5))

Let’s do some analysis of the actual COVID-19 data. Let’s create a new column for death_rate_100k, which is the number of confirmed and probable COVID-19 deaths per 100,000 persons. Let’s display the death rate using the viridis palette and then the cividis palette.

covid_us <- covid_us %>% mutate(death_rate_100k = deaths/population*100000)
plot(covid_us["death_rate_100k"],
     nbreaks = 5,
     pal = viridis_pal)

plot(covid_us["death_rate_100k"],
     nbreaks = 5,
     pal = cividis_pal)

Plotting sf objects with ggplot2 (Video: YouTube, Panopto)

The ggplot2 package includes a geometry for sf objects, geom_sf, the is typically adequate for using ggplot2 to produce graphics for sf objects.

  • The sf object is passed as the data argument to the ggplot function.
  • geom_sf is the geometry of the data
  • The XY coordinates of the sf object are automatically mapped to x and y aesthetics.
  • The color, linetype, fill, etc., of the geometry objects can be changed by mapping an attribute of the sf object to the appropriate aesthetic.

If we only want to plot the geometry list column (i.e., the geometry objects) of each observation, then we don’t have to specify any aesthetics.

ggplot(covid_us) + geom_sf()

We will create a choropleth map of the covid_us data. * A choropleth map is a map of regions colored to indicate the level of some variable associated with the regions.

ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k))

Changing the color palette for our fill aesthetic to a viridis palette.

ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  colorspace::scale_fill_continuous_sequential(palette = "viridis")

Change the color palette to viridis using ggplot2’s built-in function.

ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  scale_fill_viridis_c(option = "viridis")

Reverse the order of the colors.

ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  scale_fill_viridis_c(option = "viridis", direction = -1)

Use the cividis palette using the scale_fill_viridis function from the viridis package (Garnier 2021).

ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  viridis::scale_fill_viridis(option = "E")

Additional resources

sf help and tutorials from the package authors [https://r-spatial.github.io/sf/]

Geocomputation with R by Robin Lovelace [https://bookdown.org/robinlovelace/geocompr/]. This book covers tons of aspect of spatial data (not just from an R perspective, but general theory and concepts). This will help you to learn a lot more about representing and working with spatial data in general, not just working with it in R.

Spatial Data Science with applications in R by Edzer Pebesma and Roger Bivand [https://keen-swartz-3146c4.netlify.app/]. Also covers much much about representing spatial data than you probably ever thought possible! The authors are the main creators of the sf package.

Color palettes available with base R

References

Csárdi, Gábor, Jim Hester, Hadley Wickham, Winston Chang, Martin Morgan, and Dan Tenenbaum. 2021. Remotes: R Package Installation from Remote Repositories, Including GitHub. https://CRAN.R-project.org/package=remotes.
Garnier, Simon. 2021. Viridis: Colorblind-Friendly Color Maps for r. https://CRAN.R-project.org/package=viridis.
Ihaka, Ross, Paul Murrell, Kurt Hornik, Jason C. Fisher, Reto Stauffer, Claus O. Wilke, Claire D. McWhite, and Achim Zeileis. 2022. Colorspace: A Toolbox for Manipulating and Assessing Colors and Palettes. https://CRAN.R-project.org/package=colorspace.
Pebesma, Edzer. 2022. Sf: Simple Features for r. https://CRAN.R-project.org/package=sf.
Pebesma, Edzer, and Roger Bivand. 2022. Sp: Classes and Methods for Spatial Data. https://CRAN.R-project.org/package=sp.
Wickham, Hadley. 2016. Ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Romain François, Lionel Henry, and Kirill Müller. 2022. Dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.
---
title: "Wrangling and visualizing simple features spatial data in R"
author: "Joshua French"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output:
  html_notebook: default
bibliography:
  - dwv.bib
  - packages_sf.bib
---

```{r, include=FALSE}
knitr::opts_chunk$set(
  tidy = TRUE
)
knitr::write_bib(c("dplyr", "tibble", "tidyr", "palmerpenguins", "magrittr", "data.table", "reshape", "reshape2", "dbplyr", "dtplyr", "sf", "sp", "colorspace", "remotes", "viridis"), file = "packages_sf.bib")
set.seed(10403)
```

This document is intended to help you quickly learn how to perform basic manipulation and visualization of spatial data that can be represented using simple features. We will primarily focus on using the **sf** package [@R-sf] for our analysis.

An accompanying YouTube playlist that walks through this document is available
by following the link [here](https://youtube.com/playlist?list=PLkrJrLs7xfbWjD2rp3pIV85lby-tR3Cnu). I have also included direct, relevant video links throughout the file (e.g., immediately after the the relevant heading). An html version of this document can be downloaded to your current working direction (run `getwd()` in the Console to see where this is) by running the following command:
```{r, eval=FALSE}
download.file("https://raw.githubusercontent.com/jfrench/DataWrangleViz/master/05-simple-features-wrangling-visualizing.nb.html", "05-simple-features-wrangling-visualizing.nb.html")
```
The raw R Markdown code used to generate the html file can be obtained by toggling the "Code" box in the upper-right corner of the html file and selecting "Download Rmd".


# Wrangling and visualizing simple features spatial data in R (Video: [YouTube](https://youtu.be/9BdFCm8N6vQ), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=6752b3cf-c2d3-4144-85b5-af3201549178))

Spatial data can take many forms.

For a data scientist, spatial data is usually data for which each observation is geographic region or location.

> *Simple Features* (officially *Simple Feature Access*) is a set of standards that specify a common storage and access model of geographic feature made of mostly two-dimensional geometries (point, line, polygon, multi-point, multi-line, etc.) used by geographic information systems. It is formalized by both the Open Geospatial Consortium (OGC) and the International Organization for Standardization (ISO). [Wikipedia](https://en.wikipedia.org/wiki/Simple_Features)

Simple features are comprised of:

1. A geometry object (e.g., a point, a polygon, a line, etc.) 
2. Attribute data associated with the geometry object (e.g., the temperature at a certain time on a certain day, the number of new cases of a disease in a county in the last month).

## Simple features

As stated in the vignette to the **sf** package:

> Features have a geometry describing where on Earth the feature is located, and they have attributes, which describe other properties. The geometry of a tree can be the delineation of its crown, of its stem, or the point indicating its centre. Other properties may include its height, color, diameter at breast height at a particular date, and so on.

Just to clarify:

* The *geometry* of an observation describes where the object is located.
  * A geometry can be a point, a polygon (which is really just an ordered collection of points), or something more complicated.
* The *attributes* of a geometry object are what data scientists would usually consider the "data".

All geometries are made of points, which can be combined to create more and more complex objects.

A point can be 2-, 3-, or 4-dimensional.

The most common kinds of points are two-dimensional and are described using a 2-dimensional set of coordinates (X and Y), e.g., longitude and latitude or easting and northing.
  
  * 2-dimensional points are referred to as XY points.
  
A 3-dimensional point could include the X and Y coordinates as well as the altitude of the 2-dimensional point in 3-dimensional space.
  
  * The Z coordinate is another dimension denoting altitude of the point.
  * Combining a Z coordinate with an XY coordinate results in a 3-dimensional XYZ point.
  
Alternatively, a 3-dimensional point could include some other measure associated with the point.
  
  * The M coordinate is another dimension denoting some measure associated with the point.
  * It's pretty rare, but examples include time or measurement error.
  * Combining an M coordinate with an XY coordinate results in a 3-dimensional XYM point.
  
Combining X, Y, Z, and M coordinates results in a 4-dimensional point.

## Packages and tools for working with simple features (Video: [YouTube](https://youtu.be/uT8mr2V3YPA), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=f7a79e04-1a13-4bc1-a35b-af3201547997))

In what follows, we will use several packages.

* The **sf** package [@R-sf] is an R package for working with simple features (`sf` objects) both in terms of the geometry objects and the associated attributes.
  * The **sf** package can import, manipulate, and plot `sf` objects.
  * The **sf** package is intended to supersede the **sp** package [@R-sp], which is an older R package for working with spatial data.
  * Since the **sp** package is being superseded by the **sf** package, I recommend working with the **sf** package for spatial data analysis moving forward.
  
Because of how `sf` objects are represented in R, simple features (once constructed or imported) can be manipulated and plotted by many other well-known packages.

* The **dplyr** package [@R-dplyr] can be used to manipulate `sf` objects, and we will utilize it as certain times.
* The **ggplot2** package [@ggplot22016] can also be used to plot `sf` objects, which we will demonstrate.

Choosing a color palette to represent the attributes of your simple features is very important.

* The **grDevices** package included with **base** R provides  many color palettes.
  * The traditional palettes are `rainbow`, `heat.colors`, `terrain.colors`, `topo.colors`, and `cm.colors`.
  * The `hcl.colors` palette function was added in R 4.0.0 and provides numerous excellent color palettes.
  * Running the Examples in the documentation found in `?grDevices::hcl.colors` will show you examples of the available palettes through **grDevices**.
  * We provide color swatches for many palettes at the end of this tutorial (without the code).

* The **colorspace** package [@R-colorspace] provides many of the same HCL palettes available through the `hcl.colors` palette, but it also provides convenient functions for accessing this in **ggplot2**. 

We load all of these packages below, with the exception of **colorspace**, as it has a `coords` function that *masks* (i.e., replaces) a needed function in the **sf** package. So we will access the necessary **colorspace** function using the `::` approach.

```{r}
# library(colorspace)
library(sf)
library(dplyr)
library(ggplot2)
```

## Simple feature geometry objects (Video: [YouTube](https://youtu.be/EI5f5tWKYaY), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=98c11f3f-6616-4148-a92b-af320154647b))
The most common simple feature geometry objects used by data scientists are:

geometry | function | description
--- | --- | ---
`POINT` |	`sf::st_point` | A geometry containing a single point
`POLYGON` |	`sf::st_polygon` | A geometry with a sequence of points that form a closed ring that doesn't intersect itself. Multiple rings form outer rings and holes within the polygon.

We go through the process of creating and plotting these geometries below.

```{r}
# create an XY point
(p1 = st_point(c(1,2)))
# create an XYZ point
(p2 = st_point(c(1,2,3)))
# the points look the same when plotted in two dimensions
plot(p1)
plot(p2)
# create a ring (connected set of points)
outer <- matrix(c(0, 0, 10, 0, 10, 10, 0, 10, 0, 0), ncol = 2, byrow = TRUE)
# create additional rings for holes
hole1 <- matrix(c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1), ncol = 2, byrow = TRUE)
hole2 <- matrix(c(5, 5, 5, 6, 6, 6, 6, 5, 5, 5), ncol = 2, byrow = TRUE)
# combine rings into a list
pts <- list(outer, hole1, hole2)
# turn list of rings into a polygon
(pl1 <- st_polygon(pts))
# plot polygon
plot(pl1)
```

The other common geometry objects are:

geometry | function | description
--- | --- | ---
`LINESTRING` |	`sf::st_linestring` | A sequence of points that is connected with straight, non-self intersecting lines
`MULTIPOINT` |	`sf::st_multipoint` | A set of points
`MULTIPOLYGON` |	`sf::st_multipolygon` | A set of polygons
`MULTILINESTRING` |	`sf::st_multipoint` | A set of line strings
`GEOMETRYCOLLECTION` |	`sf::st_geometrycollection` | A set of the other geometries (except itself)

We provide examples of creating and plotting these geometries below.
```{r}
# create a matrix of multiple points
(pts <- matrix(rnorm(10), ncol = 2))
# convert matrix of points to linestring
(ls1 <- st_linestring(pts))
plot(ls1)
# convert matrix fo points to multipoints
(mp1 <- st_multipoint(pts))
plot(mp1)
# create multipolygons
pol1 <- list(outer, hole1, hole2)
pol2 <- list(outer + 24)
mp <- list(pol1, pol2)
(mpl1 <- st_multipolygon(mp))
plot(mpl1, axes = TRUE)
# create a multilinestring
(pts2 <- matrix(rnorm(6), ncol = 2))
(ml1 <- st_multilinestring(list(pts, pts2)))
plot(ml1)
# create a geometry collection
(gc <- st_geometrycollection(list(p1, pl1, ls1)))
plot(gc, col = "grey")
```

How do I need what type of geometry I need?

* Often, this is determined automatically when you read in shapefiles (which we'll discuss later).
* Attributes observed at a single location require a `POINT`.
* Most regions can be represented by a `POLYGON`.
* Complicated objects made of regions, e.g., an island chain like Hawaii, require `MULTIPOLYGONS`.
* The other types are for more complicated objects.
* There are 10 other rarer geometry types that we do not discuss (`CIRCULARSTRING`, `CURVE`, `SURFACE`, `TRIANGLE`, `COMPOUNDCURVE`, `CURVEPOLYGON`, `MULTICURVE`, `MULTISURFACE`, `POLYHEDRALSURFACE`, `TIN`). You can learn about them through the additional resources provided at the end of this document.

## Coordinate reference systems

A coordinate reference system (CRS) must be provided in order to place a point on the earth's surface.

When your import a geometry object from file, the CRS will often be provided.

* The WGS84 CRS is often the default for longitude/latitude coordinates.

Sometimes, in order to combine geometry objects, you must specify the CRS of a geometry object.

There are many CRSs. A CRS is often used because it is known to have a certain desirable property. A discussion of CRSs is beyond the scope of this tutorial. And when you do need to know about CRSs, it will probably be so specific that a general discussion won't help. However, here are a few references related to CRSs that may provide a bit more detail:

* [QGIS CRSs](https://docs.qgis.org/3.16/en/docs/gentle_gis_introduction/coordinate_reference_systems.html)
* [Introduction to CRSs](https://bookdown.org/tep/gisbooklet/introduction-to-coordinate-reference-systems.html)

## Constructing simple feature (`sf`) objects (Video: [YouTube](https://youtu.be/agHB8EeNUNk), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=4e876f84-8053-46b1-ae5e-af3201544502))

An `sf` object is a `data.frame` that has a simple feature geometry list column (i.e., a column of geometry objects). So you can work with `sf` objects similar to how you would work with a `data.frame` object, though it may have different default behaviors because it has been extended to an `sf` object.

* The geometry-list column contains the geometry object for each observation.
* The other columns of the `sf` object contain the attributes of the geometry object.
* Each row of the `sf` object is a simple feature. Alternatively, each observation is a simple feature. 

In practice, `sf` objects are often initially created by reading in a shapefile. However, particularly for `POINT` observations, you may need to create an `sf` object manually.

In what follows, we create `sf` objects for `POINT` geometry objects and `POLYGON` geometry objects.

* We can use the same previously discussed functions (e.g., `sf::st_point`, `sf::st_polygon`, etc.) to create the geometry objects. 
* The `sf::st_sfc` function is used to create a geometry list column.
* The `sf::st_sf` function is used to combined a `data.frame` of attributes with the geometry list column.

```{r}
# create POINT objects
pt1 <- st_point(c(0, 0))
pt2 <- st_point(c(0, 1))
pt3 <- st_point(c(1, 1))
# create geometry list column
glc1 <- st_sfc(list(pt1, pt2, pt3))
# is glc1 a list?
is.list(glc1)
# what class is glc1
class(glc1)
# create attribute data.frame with temperature and rainfall attributes
df1 <- data.frame(temperature = c(10, 11, 10.4), rainfall = c(1, 1.3, 0.9))
# create sf object
sf1 <- st_sf(df1, geometry = glc1)
# class of sf1
class(sf1)
# plot sf1
plot(sf1["temperature"], pch = 20, axes = TRUE)
# create polygon objects
outer1 <- matrix(c(0, 0, 1, 1, 0, 0, 1, 1, 0, 0), ncol = 2)
pl1 <- st_polygon(list(outer1))
# outer2 is outer1 shifted 1 unit to the right
outer2 <- matrix(c(1, 1, 2, 2, 1, 0, 1, 1, 0, 0), ncol = 2)
pl2 <- st_polygon(list(outer2))
# create geometry list columns
glc2 <- st_sfc(list(pl1, pl2))
# what class is glc2
class(glc2)
# create second sf object (only include an attribute column and geometry)
sf2 <- st_sf(cases = c(10, 12), geometry = glc2)
# class of sf2
class(sf2)
# plot sf2
plot(sf2)
# what happens if you combine geometry types?
glc3 <- st_sfc(list(pt1, pt2, pt3, pl1, pl2))
# what class is glc3
class(glc3)
# create sf3
sf3 <- st_sf(attribute1 = rnorm(5), geometry = glc3)
# class of sf3
class(sf3)
# plot sf3
plot(sf3, pch = 20, pal = topo.colors)
# plot sf3 with even breaks
plot(sf3, pch = 20, pal = topo.colors, breaks = seq(min(sf3$attribute1), max(sf3$attribute1), length = 12))
```

## Importing shapefiles as `sf` objects (Video: [YouTube](https://youtu.be/Ec6D12-La7w), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=03acc9c5-aba3-4562-9b22-af3201543bb9))

A data scientist is most likely to work with `sf` objects obtained from importing a shapefile into R. 

ArcGIS defines a shapefile as:

> A shapefile is a simple, nontopological format for storing the geometric location and attribute information of geographic features. Geographic features in a shapefile can be represented by points, lines, or polygons (areas). [What is a shapefile?](https://desktop.arcgis.com/en/arcmap/10.3/manage-data/shapefiles/what-is-a-shapefile.htm)

Generally, a shapefile can be imported into R as an `sf` object.
  
* Each row is an observation related to a geometry object.
* There should be a `geometry` column that contains the geometry object for each observation (this is the geometry list column).
* The other columns will represent the attributes associated with each geometry object.
  
Shapefiles are widely available for describing many different spatial objects like counties, census tracts, zip code tabulation areas (ZCTAs), states, countries, etc. There are even shapefiles that can be used to describe other spatial objects like roads, rivers, lakes, etc.

* A simple web search with appropriate terms will usually bring up a website with a relevant shapefile for your data.
* e.g., search "usa shapefile" brings up a [Census bureau page](https://www.census.gov/geographies/mapping-files/time-series/geo/carto-boundary-file.html) with many different shapefiles for different areas and characteristics of the United States. 
* A "shapefile" is often a zipped folder that contains many files inside it.
* The `.shp` file is usually the file you want to import.
  
We can import that shapefile into R as an `sf` object using the `st_read` function.

* Typically, we want to provide the `shp` file to the `dsn` argument (data source name) of `st_read`.

In this case, I have downloaded the `cb_2018_us_state_20m.zip` containing a shapefile of the U.S. states. I have unzipped this file into the `cb_2018_us_state_20m` folder in the `data` folder in my current working directory.

* The current working directory is the location to which R currently reads or saves files.
* The can learn what your current working directory is by running `getwd()` in the Console.
* You can change your current working directory by running `setwd("path")` in the Console, where `path` is the directory path you want to set as your working directory.
  
In our case, we want to read the file `cb_2018_us_state_20m.shp` in the `cb_2018_us_state_20m` folder in the `data` folder of our current working directory. We can run the following command to import the desired shapefile.

* "`./`" indicates the current directory.
* The `dsn` argument "`./data/cb_2018_us_state_20m/cb_2018_us_state_20m.shp` tells R to look for the `cb_state_20m.shp` file in the file path `./data/cb_state_2018_us_state_20m`.

For simplicity, I have uploaded `cb_2018_us_state_20m.zip` to my GitHub repository. Prior to importing the the `.shp` file, running the `download` and `unzip` commands should download and unzip the necessary folder into your working directory so that you can run the same code to import the shapefile.

```{r}
# download.file("https://github.com/jfrench/DataWrangleViz/raw/master/data/ca-county-boundaries.zip", destfile = "./ca-county-boundaries.zip", method = "auto")
# unzip("cb_2018_us_state_20m.zip", exdir = "./data/cb_2018_us_state_20m")
us_sf <- st_read(dsn = "./data/cb_2018_us_state_20m/cb_2018_us_state_20m.shp")
```

The `st_read` provides helpful information automatically when run. In this case, we learn:

* We imported an `sf` object with 52 features (observations).
* The imported `sf` object has 9 attributes (columns).
* The features appear to be the simple feature geometry `MULTIPOLGYON`.
* The dimension is `XY`, so we are working with 2-dimensional data.
* The bounding box tells us the largest and smallest y-coordinates.
* The CRS is NAD83.

We use the `class` function to see the classes of `us_sf`.
```{r}
class(us_sf)
```

`us_sf` is a `data.frame` that has been extended to the `sf` class.

Let's use the `str` function to learn more about the structure of `us_sf`.
```{r}
str(us_sf)
```

`us_sf` has 52 rows and 10 columns. The (most) useful columns are:

  * `STUSPS`: the state abbreviation
  * `NAME`: the state name
  * `ALAND`: the land area of each state
  * `AWATER`: the water area of each state
  
The `geometry` column is the simple feature geometry list column and contains the geometry object for each observation.

# Wrangling `sf` objects (Video: [YouTube](https://youtu.be/bRwEzIAVcYw), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=e7b96e08-7f6b-47d8-8e0b-af320154267e))

An `sf` object is a type of data frame, similar to how a tibble is a type of data frame. Both classes extend the `data.frame` class.

* This means that you can work with `sf` objects similarly to how you would work with a `data.frame`, though the default behaviors may be different.
* We can use the **dplyr** package or similar tools to manipulate an `sf` object.
  
We can select columns of the `us_sf` `sf` object in the following ways:

```{r}
us_sf$STUSPS
us_sf[,5]
us_sf[,"STUSPS"]
us_sf["STUSPS"]
```
Note that the `$` operator extracts the column from `us_sf` (returning a `character` vector), while the other choices *subset* `us_sf` and remain `sf` objects (i.e., the geometry list column is retained).

We can select rows in a similar fashion.

```{r}
us_sf[2:3,]
us_sf[us_sf$STUSPS == "CO",]
us_sf[startsWith(us_sf$STUSPS, "C"),]
us_sf %>% filter(startsWith(us_sf$STUSPS, "C"))
```

Note that `startsWith` is a **base** R function that finds the elements of a `character` vector that start with a certain set of characters while `start_with` is a **dplyr** function that is used to select columns of a data frame based on a pattern.

A really neat feature of `sf` objects is that can use a spatial object to select rows. Let's extract the  "Colorado" row from `us_sf`.

```{r}
co <- us_sf[us_sf$NAME == "Colorado",]
class(co)
```

If we pass the `co` object as the row argument inside the square brackets, `[]`, then the rows of `us_sf` with geometry objects that intersect the `co` geometry object will be returned

```{r}
(co_intersects <- us_sf[co,])
plot(co_intersects["NAME"])
plot(st_geometry(co_intersects))
```
```{r}
caco <- us_sf[us_sf$NAME == "Colorado" | 
                us_sf$NAME == "California",]
(caco_intersects <- us_sf[caco,])
plot(st_geometry(caco_intersects))
```

The **base** `merge` and **dplyr** `*_join` functions can be used to merge a data frame with an `sf` object.

  * The `sf` object must be the first argument of these functions.
  
Let's access a COVID-19 related data frame available in the **bayesutils** package, which can be installed from GitHub. 

* We install the package from GitHub using the `remote::install_github` function.
  * Make sure to install the **remotes** package [@R-remotes] if you don't have it.
* We then load the `covid_20210307` data set from the **bayesutils** package.

```{r}
# install.packages("remotes")
# remotes::install_github("jfrench/bayesutils")
data("covid_20210307", package = "bayesutils")
```

The `state_abb` column of `covid_20210307` has the state abbreviations and matches the `STUSPS` column of `us_sf`.

We use the **base** `merge` function to unite these two objects into a new object, `covid_us`.

```{r}
covid_us <- merge(us_sf, covid_20210307,
                  by.x = "STUSPS", by.y = "state_abb")
head(covid_us)
```

Alternatively, we could have used a **dplyr** `*_join` function. We'll use `full_join`. Note the special syntax in the `by` argument to address the fact that we want to merge the rows based on different columns in the data frames.

```{r}
covid_us2 <- full_join(us_sf, covid_20210307, by = c("STUSPS" = "state_abb"))
head(covid_us2)
```

If a new row is added to the `sf` object without a corresponding geometry, then an empty geometry object is added for that row.

## Plotting simple features

The power of the `sf` package is the ability to easily create plots of spatial data.

### Plotting `sfc` objects (Video: [YouTube](https://youtu.be/OnZniYBjfRU), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=0e60d06d-d3b3-4079-9b82-af32015426b1))

To simply plot the geometry list column of an `sf` object, you can use:

* `st_geometry` to extract the list column of simple feature geometries from the `sf` object (this will be an object of class `sfc`).
* `plot` to plot the `sfc` object.

```{r}
plot(st_geometry(us_sf))
```

Alternatively, you can directly extract the `sfc` component of the `sf` object using `$`, then plot the `sfc` object.

```{r}
plot(us_sf$geometry)
```

You can easily change aspects of the plotted `sfc` object (or an `sf`) object using the standard arguments:

* `axes` can be set to `TRUE` to show the axes
* `xlab` and `ylab` will change the axis labels
* `xlim` and `ylim` can be used to constrain the plotting regions.
  * Note that "W" longitude values are indicated using negative numbers, while "E" longitude values are positive numbers.
  * Note that "N" latitude values in the northern hemisphere are positive numbers. "S" latitude values in the southern hemisphere are negative numbers.

Each geometry has specific arguments that the user can change (consider looking at the documentation for `?sf::plot.sf` for details). In this case, we can change the fill color of the `MULTIPOLYGON` objects using the `color` argument and the border using the border argument.

```{r}
plot(us_sf$geometry, axes = TRUE,
     xlab = "longitude", ylab = "latitude",
     col = "grey", border = "blue",
     xlim = c(-125, -75),
     ylim = c(22, 50))
```

You can change the colors of the individual geometry objects with a little creativity. Let's color all the "C states" (California, Colorado, Connecticut) a little differently than the other states.

```{r}
# determine indices of C states
c_states <- startsWith(us_sf$STUSPS, "C")
# create a character vector of replicated "grey"
# values matching the number of rows in us_sf
mycol <- rep("grey", nrow(us_sf))
#change "grey" to "orange" for the c_states indices
mycol[c_states] <- "orange"
# create a character vector of replicated "yellow"
# values matching the number of rows in us_sf
myborder <- rep("yellow", nrow(us_sf))
#change "yellow" to "blue" for the c_states indices
myborder[c_states] <- "blue"
# create the customized plot
plot(st_geometry(us_sf), col = mycol, border = myborder, xlim = c(-125, -75),
     ylim = c(22, 50))
```

### Customize color for subset of `sf` object (Video: [YouTube](https://youtu.be/f_m-5sux8oM), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=af026e11-6be9-4819-8700-af320165f3c9))

Let's look at a different approach to coloring a subset of the `sf` object differently from the rest of the `sf` object. 

```{r}
# draw all geometries
plot(st_geometry(us_sf), col = "grey", border = "yellow",
     xlim = c(-125, -75), ylim = c(22, 50))
# plot subset of all geometries with different colors
# add = TRUE overlays new drawing on current graphic
plot(st_geometry(us_sf)[c_states],
     col = "orange", border = "blue",
     add = TRUE)
```


### Plotting attributes of an `sf` object (Video: [YouTube](https://youtu.be/TV8ATPGodOg), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=275df07a-d4ac-473b-81ba-af3201542724))

By default, use you can use the `plot` function to plot all the attributes of an `sf` object. In general, this isn't very useful.

```{r}
plot(covid_us)
```

More likely, you will want to plot a single attribute (variable), which can be done by subsetting that variable in the `sf` object and plotting the subsetted object.

Let's plot the land area of the states, excluding Alaska and Hawaii. First, we exclude the Alaska and Hawaii rows (and save the filtered object).

```{r}
covid_us <- covid_us %>% filter(!is.element(STUSPS, c("AK", "HI")))
plot(covid_us["ALAND"])
```

Land area is directly related to the area of the state. It would be interesting to visualize the states that have the greatest percentage of water. Let's create a new variable in `covid_us` that computes the percentage of the state that is water. We'll then plot this variable (excluding Alaska and Hawaii)

```{r}
covid_us <- covid_us %>% mutate(prop_water = AWATER/(AWATER + ALAND))
plot(covid_us["prop_water"])
```

Not surprisingly, coastal states and states on the Great Lakes have the highest percentage of water.

We can use the following commands to identify the states with the 6 largest proportions of water.
```{r}
covid_us %>% slice_max(prop_water, n = 6) %>% select(NAME, prop_water)
```

We can change the number of breaks in our color bar via the `nbreaks`.

We can change the colors in our color bar by specifying the `pal` argument.

* The `pal` argument takes a function that, when given `n`, the number of desired colors, returns a vector of colors.
* The `hcl.colors`, `rainbow`, `heat.colors`, `terrain.colors`, `topo.colors`, and `cm.colors` are all color palettes in base R that can be used to change the colors palettes.
* The `hcl.colors` palette is particularly good, as it includes color palettes `viridis` and `cividis` (corrected viridis) that are particularly well-suited to displaying colors that are color-blind and can be understood when printed in black and white.
* The **colorspace** package also has a wide variety of palettes you might consider.

However, the `hcl.colors` function has a `palette` argument to specify the desired palette. To use an `hcl.color` palette when plotting an `sf` object, we need to build a custom palette that only requires the number of colors desired. We create those function for the viridis and cividis palettes below. We then see that it produces the desired results.

```{r}
viridis_pal <- function(n) hcl.colors(n, palette = "viridis")
cividis_pal <- function(n) hcl.colors(n, palette = "cividis")
# dirty approach to see colors in the palette
image(matrix(0:4), col = viridis_pal(5))
image(matrix(0:4), col = cividis_pal(5))
```


Let's do some analysis of the actual COVID-19 data. Let's create a new column for `death_rate_100k`, which is the number of confirmed and probable COVID-19 deaths per 100,000 persons. Let's display the death rate using the viridis palette and then the cividis palette.

```{r}
covid_us <- covid_us %>% mutate(death_rate_100k = deaths/population*100000)
plot(covid_us["death_rate_100k"],
     nbreaks = 5,
     pal = viridis_pal)
plot(covid_us["death_rate_100k"],
     nbreaks = 5,
     pal = cividis_pal)
```

### Plotting `sf` objects with **ggplot2** (Video: [YouTube](https://youtu.be/M7khlIRlr4s), [Panopto](https://ucdenver.hosted.panopto.com/Panopto/Pages/Viewer.aspx?id=94f2e1a9-19aa-4477-aeb5-af32015426f4))

The **ggplot2** package includes a geometry for `sf` objects, `geom_sf`, the is typically adequate for using **ggplot2** to produce graphics for `sf` objects.

* The `sf` object is passed as the `data` argument to the `ggplot` function.
* `geom_sf` is the geometry of the `data`
* The `XY` coordinates of the `sf` object are automatically mapped to `x` and `y` aesthetics.
* The `color`, `linetype`, `fill`, etc., of the geometry objects can be changed by mapping an attribute of the `sf` object to the appropriate aesthetic.

If we only want to plot the geometry list column (i.e., the geometry objects) of each observation, then we don't have to specify any aesthetics.

```{r}
ggplot(covid_us) + geom_sf()
```

We will create a choropleth map of the `covid_us` data.
  * A choropleth map is a map of regions colored to indicate the level of some variable associated with the regions.

```{r}
ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k))
```

Changing the color palette for our `fill` aesthetic to a viridis palette.

```{r}
ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  colorspace::scale_fill_continuous_sequential(palette = "viridis")
```

Change the color palette to viridis using **ggplot2**'s built-in function.

```{r}
ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  scale_fill_viridis_c(option = "viridis")
```

Reverse the order of the colors.

```{r}
ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  scale_fill_viridis_c(option = "viridis", direction = -1)
```

Use the cividis palette using the `scale_fill_viridis` function from the **viridis** package [@R-viridis].

```{r}
ggplot(covid_us) + geom_sf(aes(fill = death_rate_100k)) +
  viridis::scale_fill_viridis(option = "E")
```

## Additional resources

**sf** help and tutorials from the package authors [https://r-spatial.github.io/sf/]

*Geocomputation with R* by Robin Lovelace [https://bookdown.org/robinlovelace/geocompr/]. This book covers tons of aspect of spatial data (not just from an R perspective, but general theory and concepts). This will help you to learn a lot more about representing and working with spatial data in general, not just working with it in R.

*Spatial Data Science with applications in R* by Edzer Pebesma and Roger Bivand [https://keen-swartz-3146c4.netlify.app/]. Also covers much much about representing spatial data than you probably ever thought possible! The authors are the main creators of the **sf** package.

## Color palettes available with **base R**

```{r, echo = FALSE}
require("graphics")

par(mfrow = c(1, 1))

## color swatches for RGB/HSV palettes
demo.pal <-
  function(n, border = if (n < 32) "light gray" else NA,
           main = paste("color palettes;  n=", n),
           ch.col = c("rainbow(n, start=.7, end=.1)", 
                      "heat.colors(n)",
                      "terrain.colors(n)", "topo.colors(n)",
                      "cm.colors(n)")) {
    nt <- length(ch.col)
    i <- 1:n; j <- n / nt; d <- j/6; dy <- 2*d
    plot(i, i+d, type = "n", yaxt = "n", ylab = "", main = main)
    for (k in 1:nt) {
        rect(i-.5, (k-1)*j+ dy, i+.4, k*j,
             col = eval(str2lang(ch.col[k])), border = border)
        text(2*j,  k * j + dy/4, ch.col[k])
    }
}
demo.pal(16)

## color swatches for HCL palettes
hcl.swatch <- function(type = NULL, n = 5, nrow = 4,
  border = if (n < 15) "black" else NA) {
    palette <- hcl.pals(type)
    cols <- sapply(palette, hcl.colors, n = n)
    ncol <- ncol(cols)
    nswatch <- min(ncol, nrow)

    par(mar = rep(0.1, 4),
        mfrow = c(1, min(5, ceiling(ncol/nrow))),
        pin = c(1, 0.5 * nswatch),
        cex = 0.7)

    while (length(palette)) {
        subset <- 1:min(nrow, ncol(cols))
        plot.new()
        plot.window(c(0, n), c(0, nrow + 1))
        text(0, rev(subset) + 0.1, palette[subset], adj = c(0, 0))
        y <- rep(subset, each = n)
        rect(rep(0:(n-1), n), rev(y), rep(1:n, n), rev(y) - 0.5,
             col = cols[, subset], border = border)
        palette <- palette[-subset]
        cols <- cols[, -subset, drop = FALSE]
    }

    par(mfrow = c(1, 1), mar = c(5.1, 4.1, 4.1, 2.1), cex = 1)
}

hcl.swatch("qualitative")
hcl.swatch("sequential")
hcl.swatch("diverging")
hcl.swatch("divergingx")
```

## References
