---
layout: post
title: "Simple features for R, part 2"
date: "Jul 18, 2016"
comments: true
author: Edzer Pebesma
categories: r
---
TOC
[DOWNLOADHERE]
# What happened so far?
* in an earlier [blog post](http://r-spatial.org/r/2016/02/15/simple-features-for-r.html) I introduced the idea of having simple features mapped directly into simple R objects
* an R Consortium [ISC proposal](https://github.com/edzer/sfr/blob/master/PROPOSAL.md) to implement this got [granted](https://www.r-consortium.org/news/announcement/2016/03/r-consortium-funds-technical-initiatives-community-events-and-training)
* during [UseR! 2016](https://user2016.sched.org/event/7BRR/spatial-data-in-r-simple-features-and-future-perspectives) I presented this proposal ([slides](http://pebesma.staff.ifgi.de/pebesma_sfr.pdf)), which we followed up with an open discussion on future directions
* first steps to implement this in the [sf](https://github.com/edzer/sfr/) package have finished, and are described below
This blog post describes current progress.
# Install & test
You can install package `sf` directly from github:
```{r eval=FALSE}
library(devtools) # maybe install first?
install_github("edzer/sfr", ref = "16e205f54976bee75c72ac1b54f117868b6fafbc")
```
if you want to try out `read.sf`, which reads through GDAL 2.0,
you also need my fork of the R package [rgdal2](https://github.com/thk686/rgdal2), installed by
```{r eval=FALSE}
install_github("edzer/rgdal2")
```
this, obviously, requires that GDAL 2.0 or later is installed, along with development files.
After installing, a vignette contains some basic operations, and is shown by
```{r eval=FALSE}
library(sf)
vignette("basic")
```
# How does it work?
Basic design ideas and constraints have been written in [this
document](https://github.com/edzer/sfr/blob/master/DESIGN.md).
Simple features are one of the following [17
types](https://en.wikipedia.org/wiki/Well-known_text): Point,
LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon,
GeometryCollection, CircularString, CompoundCurve, CurvePolygon,
MultiCurve, MultiSurface, Curve, Surface, PolyhedralSurface, TIN,
and Triangle. Each type can have 2D points (XY), 3D points (XYZ),
2D points with measure (XYM) and 3D points with measure (XYZM). This
leads to 17 x 4 = 68 combinations.
The first seven of these are most common, and *have been
implemented*, allowing for XY, XYZ, XYM and XYZM geometries.
## Simple feature instances: `sfi`
A single simple feature is created by calling the constructor
function, along with a modifier in case a three-dimensional geometry
has measure "M" as its third dimension:
```{r}
library(sf)
POINT(c(2,3))
POINT(c(2,3,4))
POINT(c(2,3,4), "M")
POINT(c(2,3,4,5))
```
what is printed is a [well kown
text](https://en.wikipedia.org/wiki/Well-known_text) representation
of the object; the data itself is however stored as a regular
R vector or matrix:
```{r}
str(POINT(c(2,3,4), "M"))
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
```
By using the two simple rules that
1. sets of points are kept in a `matrix`
1. other sets are kept in a `list`
we end up with the following structures, with increasing complexity:
### Sets of points (matrix):
```{r}
str(LINESTRING(rbind(c(2,2), c(3,3), c(3,2))))
str(MULTIPOINT(rbind(c(2,2), c(3,3), c(3,2))))
```
### Sets of sets of points:
```{r}
str(MULTILINESTRING(list(rbind(c(2,2), c(3,3), c(3,2)), rbind(c(2,1),c(0,0)))))
outer = matrix(c(0,0,10,0,10,10,0,10,0,0),ncol=2, byrow=TRUE)
hole1 = matrix(c(1,1,1,2,2,2,2,1,1,1),ncol=2, byrow=TRUE)
hole2 = matrix(c(5,5,5,6,6,6,6,5,5,5),ncol=2, byrow=TRUE)
str(POLYGON(list(outer, hole1, hole2)))
```
### Sets of sets of sets of points:
```{r}
pol1 = list(outer, hole1, hole2)
pol2 = list(outer + 12, hole1 + 12)
pol3 = list(outer + 24)
mp = MULTIPOLYGON(list(pol1,pol2,pol3))
str(mp)
```
### Sets of sets of sets of sets of points:
```{r}
str(GEOMETRYCOLLECTION(list(MULTIPOLYGON(list(pol1,pol2,pol3)), POINT(c(2,3)))))
```
where this is of course a worst case: `GEOMETRYCOLLECTION` objects
with simpler elements have less nesting.
### Methods for `sfi`
The following methods have been implemented for `sfi` objects:
```{r}
methods(class = "sfi")
```
### Alternatives to this implementation
1. Package [rgdal2](https://github.com/thk686/rgdal2) reads point sets not in a matrix, but into a list with numeric vectors named `x` and `y`. This is closer to the GDAL (OGR) data model, and would allow for easier disambiguation of the third dimension (`m` or `z`) in case of three-dimensional points. It is more difficult to select a single point, and requires validation of vector lenghts being identical. I'm inclined to keep using `matrix` for point sets.
1. Currently, `POINT Z` is of class `c("POINT Z", "sfi")`. An alternative would be to have it derive from `POINT`, i.e. give it class `c("POINT Z", "POINT", "sfi")`. This would make it easier to write methods for XYZ, XYM and XYZM geometries. This may be worth trying out.
## Simple feature list columns: `sfc`
Collections of simple features can be added together into a list. If
all elements of this list
* are of identical type (have identical class), or are a mix of `X` and `MULTIX` (with `X` being one of `POINT`, `LINESTRING` or `POLYGON`)
* have an identical coordinate reference system
then they can be combined in a `sfc` object. This object
* converts, if needed, `X` into `MULTIX` (this is also what PostGIS does),
* registers the coordinate reference system in attributes `epsg` and `proj4string`,
* has the bounding box in attribute `bbox`, and updates it after subsetting
```{r}
ls1 = LINESTRING(rbind(c(2,2), c(3,3), c(3,2)))
ls2 = LINESTRING(rbind(c(5,5), c(4,1), c(1,2)))
sfc = sfc(list(ls1, ls2), epsg = 4326)
attributes(sfc)
attributes(sfc[1])
```
The following methods have been implemented for `sfc` simple feature list columns:
```{r}
methods(class = "sfc")
```
## data.frames with simple features: `sf`
Typical spatial data contain attribute values and attribute
geometries. When combined in a table, they can be converted into
`sf` objects, e.g. by
```{r}
roads = data.frame(widths = c(5, 4.5))
roads$geom = sfc
roads.sf = sf(roads)
roads.sf
summary(roads.sf)
attributes(roads.sf)
```
here, attribute `relation_to_geometry` allows documenting how
attributes relate to the geometry: are they constant (field),
aggregated over the geometry (lattice), or do they identify
individual entities (buildings, parcels etc.)?
The following methods have been implemented for `sfc` simple feature list columns:
```{r}
methods(class = "sf")
```
## Coercion to and from `sp`
Points, MultiPoints, Lines, MultiLines, Polygons and MultiPolygons
can be converted between `sf` and [sp](https://cran.r-project.org/web/packages/sp/index.html),
both ways. A round trip is demonstrated by:
```{r}
df = data.frame(a=1)
df$geom = sfc(list(mp))
sf = sf(df)
library(methods)
a = as(sf, "Spatial")
class(a)
b = as.sf(a)
all.equal(sf, b) # round-trip sf-sp-sf
a2 = as(a, "SpatialPolygonsDataFrame")
all.equal(a, a2) # round-trip sp-sf-sp
```
## Reading through GDAL
Function `read.sf` works, if `rgdal2` is installed (see above), and reads
simple features through GDAL:
```{r}
(s = read.sf(system.file("shapes/", package="maptools"), "sids"))[1:5,]
summary(s)
```
This also shows the abbreviation of long geometries when printed or summarized,
provided by the `format` methods.
The following works for me, with PostGIS installed and data loaded:
```{r}
(s = read.sf("PG:dbname=postgis", "meuse2"))[1:5,]
summary(s)
```
# Still to do/to be decided
The following issues need to be decided upon:
* reproject sf objects through `rgdal2`? support well-known-text for CRS? or use PROJ.4 directly?
* when subsetting attributes from an `sf` objects, make geometry sticky (like sp does), or drop geometry and return `data.frame` (data.frame behaviour)?
The following things still need to be done:
* write simple features through GDAL (using rgdal2)
* using gdal geometry functions in `rgdal2`
* extend rgdal2 to also read `XYZ`, `XYM`, and `XYZM` geometries - my feeling is that this will be easier than modifying `rgdal`
* reprojection of sf objects
* link to GEOS, using GEOS functions: GDAL with GEOS enabled (and `rgdal2`) has some of this, but not for instance `rgeos::gRelate`
* develop better and more complete test cases; also check the OGC [test suite](http://cite.opengeospatial.org/teamengine/)
* improve documentation, add tutorial (vignettes, paper)
* add plot functions (base, grid)
* explore direct WKB - sf conversion, without GDAL
* explore how meaningfulness of operations can be verified when for attributes their `relation_to_geometry` has been specified
Please let me know if you have any comments, suggestions or questions!