---
layout: post
title: "Handling and Analyzing Spatial, Spatiotemporal and Movement Data"
date: "27 June, 2016"
comments: true
author: Edzer Pebesma
output:
  html_document:
    toc: true
    theme: united
---

# Preamble

The source R-Markdown (.Rmd) file for this html document is found [here](https://raw.githubusercontent.com/edzer/UseR2016/master/tutorial.Rmd); here is the [zip file](http://pebesma.staff.ifgi.de/tutorial.zip) with included figures. A list of packages is found at the end of this page. The packages this script attaches can be installed by running

```{r eval=FALSE}
pkg = c("sp", "xts", "zoo", "spacetime", "trajectories", "maptools", "maps",
	"mapview", "leaflet", "rglwidget", "rgl", "RColorBrewer", "ggmap",
	"ggplot2", "dplyr", "rmarkdown", "units")
install.packages(pkg)
```

To re-create (reproduce) this html using `knit html` in RStudio, have all packages installed, and then:

* unzip the zip file, and record the directory where the files are unzipped
* in RStudio, set the working directory to this directory by `setwd("/path/to/this_dir")`
* in RStudio, open the `tutorial.Rmd` file and click on the button `knit html`

# Introduction

Two quotes from [Cobb and Moore, 1997](http://www.jstor.org/stable/2975286),

> _Data are not just numbers, they are numbers with a context_

and

> _in data analysis, context provides meaning_

illustrate that data analysis without context is essentially meaningless. What is context? We can think of

* _who_ collected or generated the data,
* _how_ the data were collected (e.g., by which sampling strategy?),
* for which purpose the data were collected, or generated (_why_),
* _when_ and _where_ the data were collected,
* _what_ exactly was collected, what do the values refer to, what are the measurement units, and what does the value `1` refer to?

We can think of the _who, how_ and _why_ questions as referring to the pragmatics of data collection, where _when, where_ and _what_ refer to data semantics. _Reference systems_ relate numbers (or words, or symbols) to the real world; much of this tutorial is about reference systems for space, time and attributes. At a higher level, reference systems can describe _phenomena_ in terms of how attributes and identity relate to space and time; in particular, whether they are continuous or discrete matters for what we can meaningfully do with data obtained from these phenomena ([Scheider et al., 2016](http://ifgi.uni-muenster.de/~epebe_01/generativealgebra.pdf)).

## Time in R

Base R has some infrastructure to annotate measurement units, in particular for time and date information:

```{r}
now = Sys.time() + c(0, 3600)
today = Sys.Date() + 0:1
```

which represent numeric values denoting the number of seconds or days elapsed since Jan 1, 1970, 00:00 UTC:

```{r}
as.numeric(now)
as.numeric(today)
```

The `class` attribute of these objects

```{r}
class(now)
class(today)
```

makes them behave meaningfully like time:

```{r error=TRUE}
now * now
now + now
(dt = now[2] - now[1])
```

## Time differences

This last quantity, `dt`, has a class

```{r}
class(dt)
```

but also a `units` attribute, which can be retrieved by the `units` method:

```{r}
units(dt)
```

but can also be set to other values, e.g.

```{r}
units(dt) = "days"
dt
units(dt) = "secs"
dt
```
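As a side note, `difftime` objects can also be created directly, with an explicit unit; a minimal sketch (an addition to the tutorial text, using base R's `as.difftime`):

```{r eval=FALSE}
# construct a time difference of 90 minutes directly:
dt2 = as.difftime(90, units = "mins")
dt2
units(dt2) = "hours" # convert it to hours
dt2
```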
## Measurement units

Beyond time and time differences, more general support for [SI units](https://en.wikipedia.org/wiki/International_System_of_Units) and derived units is provided by CRAN package [units](http://cran.r-project.org/package=units), which builds upon CRAN package [udunits2](http://cran.r-project.org/package=udunits2) and the external [udunits](https://www.unidata.ucar.edu/software/udunits/) library from [UNIDATA](https://www.unidata.ucar.edu/). It checks compatibility of units, and does unit conversion on the fly

```{r}
units_file = system.file("share/udunits2.xml", package = "udunits2")
if (units_file == "")
	stop("install package udunits2 first")
Sys.setenv(UDUNITS2_XML_PATH = units_file) # cope with bug in udunits2 0.8.1 on Win & Mac
library(units)
m = with(ud_units, m)
s = with(ud_units, s)
km = with(ud_units, km)
h = with(ud_units, h)
x = 1:3 * m/s
xkmh = x
units(xkmh) = km/h # explicit conversion
xkmh
(y = 1:3 * km/h)   # something different
x + y              # implicit conversions
y + x
c(x, y)
```

it creates derived units where appropriate

```{r}
x * y
x / y
log(x)
```

and also gives meaningful warnings when units are not compatible:

```{r error=TRUE}
x + x * y
```

as udunits parses units, they can become arbitrarily complex:

```{r}
u1 = make_unit(paste0(rep("m", 100), collapse = "*"))
u2 = m^100
1:3 * u2 + 1:3 * u1
```
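Unit arithmetic of this kind also supports simple dimensional analysis; a minimal sketch (an addition to the tutorial text, assuming the kilogram `kg` and the newton `N` are available in `ud_units`):

```{r eval=FALSE}
kg = with(ud_units, kg)
N = with(ud_units, N)
(f = 2 * kg * m/s^2) # a force: mass times acceleration, in kg m/s^2
units(f) = N         # dimensionally consistent, hence convertible to newton
f
```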
Summarizing, measurement units

* convert units explicitly, or implicitly/on the fly
* catch operations that are algebraically correct but _physically meaningless_
* help carry out [dimensional analysis](https://en.wikipedia.org/wiki/Dimensional_analysis), and provide, in that sense, part of the context of data.

## Why don't we treat space and time as special cases of SI units?

Measures of physical quantities such as length, mass and duration have a natural, absolute zero value. When measuring absolute time (when) and location (where), we need a reference. For time, we have Coordinated Universal Time ([UTC](https://en.wikipedia.org/wiki/Coordinated_Universal_Time)); different time zones and DST rules are defined with respect to UTC. For space, different [geodetic datums](https://en.wikipedia.org/wiki/Geodetic_datum) exist; the best known is the [WGS84](https://en.wikipedia.org/wiki/World_Geodetic_System) ellipsoid ("world geodetic system 1984").

WGS84 is a _global_ reference system, and has larger errors (deviations from the geoid, the "mean sea level" surface) than ellipsoids that are fitted locally. For that reason, local ellipsoids may fit the Earth better locally, and are preferred for particular surveying projects. Coordinate _transformations_ move from one geodetic datum (ellipsoid) to another; these transformations may be non-unique, non-invertible, and approximate. Projecting longitude/latitude degrees to a flat representation (such as UTM, or Web Mercator) is called coordinate _conversion_; this process is usually accurate and invertible.

## When are space and time _not_ relevant?

For a doctor, the identity and age of the patient are more relevant than the location of the patient and the date of the examination. An MRI scan of a patient's brain is understood _with reference to_ the skull and other reference points, not with reference to the geographical location of the MRI scanner. Many data sets come without information about space and time; this usually indicates that these aspects were not considered relevant. For the `co2` dataset,

```{r}
plot(co2)
```

the location of the measurements ([Mauna Loa](https://en.wikipedia.org/wiki/Mauna_Loa)) can be derived from the URL mentioned in `?co2`, or even better, from the information that URL points to. For the `mtcars` dataset, `?mtcars` reveals when the car models described were current (1973-74).

# Time series data

We have seen so far mostly time information, but not information that is associated with time, or _time series data_.

## Classes for time series data

The `co2` data set is an example of this, as a time series:

```{r}
class(co2)
summary(co2)
```

The `ts` class, part of base R, accommodates regular time series, and does not use `POSIXt` for time:

```{r}
attributes(co2)
```

Many of the methods for `ts` do not deal well with semi-regular time series, which are regular but contain missing values. Approaches to deal with irregular time series are found in packages [its](http://cran.r-project.org/package=its), [zoo](http://cran.r-project.org/package=zoo) and [xts](http://cran.r-project.org/package=xts). Package xts builds on `POSIXt` as the time index, and extends `zoo`. The combination has nice features for

* selection, supporting [ISO 8601](https://en.wikipedia.org/wiki/ISO_8601) strings for time periods
* aggregation, using functions that cut time into particular intervals

What follows is an example where the string `"2000::2007"` selects all data for the years 2000 through 2007, and the `aggregate` method (from `zoo`) uses the function `as.year` (defined below using `as.yearmon`) to map each time stamp to its year, and computes yearly averages:

```{r}
data(air, package = "spacetime")
library(zoo)
library(xts)
pm10 = xts(t(air), dates)
as.year <- function(x) as.numeric(floor(as.yearmon(x)))
plot(aggregate(pm10["2000::2007", 1], as.year, mean, na.rm = TRUE),
	ylab = "PM10 (ppm)")
title("Yearly averaged PM10 values, 2000-2007")
```
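The ISO 8601 strings allow finer selections as well; a small sketch (an addition to the tutorial, using the `pm10` object just created):

```{r eval=FALSE}
pm10["2005", 1]              # all of 2005, first station
pm10["2005-03::2005-06", 1]  # March through June 2005
```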
## Extensive or intensive properties?

Physical properties can be [extensive or intensive](https://en.wikipedia.org/wiki/Intensive_and_extensive_properties), depending on whether the property changes when the size (extent) of the system changes:

* extensive properties are for instance mass and size: if we cut a rock into pieces, the mass and size of a piece change
* intensive properties are e.g. temperature or density; these do not necessarily change with size

In the aggregation example above, we computed the yearly _mean_ values, but when examining `?aggregate.zoo`, we'd find out that the default aggregation function is `sum`, so

```{r}
a = aggregate(pm10["2000::2007", 1], as.year, na.rm = TRUE)
mean(a)
```

computes yearly _total_ pm10 from monthly averages, which is not physically meaningful for an intensive property like pm10.

## Events, continuous series, aggregations?

Time series data, as represented by the packages mentioned above, do not distinguish events from continuous processes: "values" of something are simply associated with a time stamp. Yet, for meaningful analysis, their difference could not be larger:

* for events, we can compute counts, or densities over time, but we cannot interpolate; for instance, a temporally interpolated value of earthquake magnitude at a moment with no earthquake does not make sense
* for continuous processes, counts or densities only reveal properties of the sampling procedure, not of the observed phenomenon (which could have been observed at any moment)

Many time series data (`co2` and `pm10` above being two examples) concern temporally aggregated values. Yet, their value is not explicitly registered with the aggregation period, but rather with the starting time of that interval. If, however, instantaneous measurements of the same variables were available (co2 or pm10 measured at the start of the month), they would be stored identically. If time periods were registered explicitly, we would in addition have to specify how the associated (co2, pm10) value relates to the interval; possibilities are:

* the value is constant throughout the interval
* the value was aggregated over the interval (in which case the aggregation _function_ needs to be specified)

# Spatial data

## Classes for spatial data

The large majority of spatial data consists, roughly, of points, lines, polygons or grids. As described in [Bivand, Pebesma and Gomez-Rubio (2013)](http://asdar-book.org/), package [sp](http://cran.r-project.org/package=sp) provides basic infrastructure for this, and supports the following classes:

![](spatial.png)

The geometry-only classes, `SpatialXxx`, derive from `Spatial` and share a bounding box and coordinate reference system (`proj4string`); all the `SpatialXxxDataFrame` classes extend this with a `data.frame` carrying attributes associated with the geometries. Important reasons why many think that using `sp` is a good idea include:

* reinventing the wheel too often creates duplicate work, and calls for many-to-many conversions
* through [rgdal](http://cran.r-project.org/package=rgdal), `sp` supports reading and writing to and from all 142 [raster](http://gdal.org/formats_list.html) and 84 [vector](http://gdal.org/ogr_formats.html) formats supported by [GDAL](http://www.gdal.org/)
* through [rgeos](http://cran.r-project.org/package=rgeos), an interface to [GEOS](https://trac.osgeo.org/geos/), `sp` objects can be used for all [DE-9IM](https://en.wikipedia.org/wiki/DE-9IM) intersections, including topological predicates like _touches_, _overlaps_, _intersects_, _contains_, etc., and can also create buffers, unite polygons, etc.

## An example: why polygons are complex

"Single" geometry entries for polygons and lines can consist of multiple polygons or lines, to accommodate real-world data. In the following example we see a state consisting of multiple polygons, one of them containing a hole:

```{r}
data(air, package = "spacetime")
library(sp)
nds = DE_NUTS1["Niedersachsen",]
library(ggmap)
bgMap = get_map(as.vector(bbox(nds)), source = "google", zoom = 7)
par(mar = rep(0, 4))
merc = CRS("+init=epsg:3857")
plot(spTransform(nds, merc), bgMap = bgMap, col = grey(.5, alpha = .5))
```

We can examine the data interactively, thanks to the excellent [mapview](http://cran.r-project.org/package=mapview) package:

```{r}
library(mapview)
mapview(nds)
```

When properly plotting data, we need to know which hole polygon belongs to which enclosing polygon; this makes it difficult (if not impossible) to encode polygons in simple tables such as propagated by the [tidy data](https://www.jstatsoft.org/article/view/v059i10) framework. The `fortify` method in `ggplot2` destroys this structure, and the [ggplot2 wiki](https://github.com/hadley/ggplot2/wiki/plotting-polygon-shapefiles) explains why `ggplot2` cannot properly plot polygons with holes.

In the plot above, we can see that the border of `nds` does not correspond very well with that of [openstreetmap](http://www.openstreetmap.org/). This may be due to an outdated version of `DE_NUTS1` in the `spacetime` package.
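The hole structure can be inspected directly in the `sp` slots; a minimal sketch (an addition to the tutorial text):

```{r eval=FALSE}
p = nds@polygons[[1]]                  # the Polygons object of the single feature
length(p@Polygons)                     # number of polygon rings
sapply(p@Polygons, function(x) x@hole) # which rings are holes?
```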
### Exercise

1. visit [gadm](http://gadm.org/)
1. download the German administrative boundaries as a shapefile from gadm.org
1. unzip the files starting with `DEU_adm1`, and register in which directory they are
1. read them in with `library(rgdal); x = readOGR(".", "DEU_adm1")`, where you replace `.` with the right directory
1. plot Lower Saxony with mapview (`mapview(x[9,])`)
1. check whether the Dutch/German boundaries are better now.

## Selecting features

As we've seen above, we can use the `[` subset operator on `Spatial` objects to select geometries/features (rows), or particular attributes, similar to how this is done for `data.frame` objects. Further functionality is obtained when we use a `Spatial` object as selector, as in

```{r}
par(mar = c(4,4,4,1))
plot(DE_NUTS1[nds,], col = "#FFDDDD", axes = TRUE) # all states touching Lower Saxony;
plot(nds, col = grey(.5), add = TRUE)              # .. which is grey.
```

This carries out an intersection, and returns intersecting geometries, for _any Spatial object type_.

## Aggregation

We can compute yearly mean pm10 values by station, by

```{r}
yearmeans = aggregate(pm10, as.year, mean, na.rm = TRUE)
# merge mean matrix (rows: year; cols: station) with station data:
s = SpatialPointsDataFrame(stations, data.frame(t(yearmeans)))
spplot(s, names.attr = index(yearmeans),
	sp.layout = list(DE_NUTS1, col = grey(.8)),
	as.table = TRUE, colorkey = TRUE)
```

Suppose that we want state mean pm10 concentrations by year; we can do this by _spatial aggregation_:

```{r}
a = aggregate(s, DE_NUTS1, mean, na.rm = TRUE)
spplot(a, names.attr = index(yearmeans), as.table = TRUE,
	main = "state mean pm10 of rural background stations")
```

Here, the `by` argument (2nd arg) of `aggregate` is a `Spatial` object; spatial intersection yields the aggregation predicate.
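The same intersection is available directly through the `over` method; a minimal sketch (an addition to the tutorial text) that looks up, for each station, the index of the state polygon it falls in:

```{r eval=FALSE}
# integer index of the matching polygon per station; NA: outside all states
over(s, geometry(DE_NUTS1))
```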
## Spatial statistics, further packages

Methods for statistically analyzing spatial data are well described in the spatial statistics literature. Other R packages that deal with handling or analyzing spatial data are categorized and described in the CRAN [Task View on Spatial Data](http://cran.uni-muenster.de/web/views/Spatial.html).

# Spatiotemporal data, movement data

We've seen above how temporal data can be stored in `Spatial` objects -- by storing variables associated with different times as different attributes. This is not optimal, because

* the data objects do not know how variables are associated with time
* it is not always the case that spatial entities have identical times associated with them

There is also a [CRAN Task View on SpatioTemporal Data](http://cran.uni-muenster.de/web/views/SpatioTemporal.html) that categorizes and describes R packages for handling and analyzing spatiotemporal data, including movement data.

## spacetime: Class structure

Building on `xts` and `sp`, R package [spacetime](http://cran.r-project.org/package=spacetime) provides classes for spatiotemporal data that solve both issues. The class diagram is given here:

![](st.png)

`STF` and `STS` objects cater for spatial _panel data_ (or longitudinal data), where each spatial entity has an identical set of time-referenced observations associated with it; examples are

* employment data per year, per state
* fixed monitoring stations, measuring air quality every hour
* time sequences of satellite imagery

Note that these classes _only_ require a space-time lattice layout:

* spatial properties can be anything (points, lines, polygons, grid cells),
* time sequences may be irregular

Package `spacetime` comes with routines to read data from space-wide tables (where columns reflect different spatial features), time-wide tables (where columns reflect different times), and long tables (where each space/time combination has a row entry); examples of all three are given in [Pebesma, 2012](https://www.jstatsoft.org/article/view/v051i07). What follows is an example reading from a long table.

## Reading from tables

A long table with panel data is found in package [plm](http://cran.r-project.org/package=plm):

```{r}
data("Produc", package = "plm")
Produc[1:5, 1:9]
```

Since the data contains no spatial information (states) or temporal information as `POSIXct`, we need to construct these -- states:

```{r}
library(maps)
states.m = map('state', plot = FALSE, fill = TRUE)
IDs <- sapply(strsplit(states.m$names, ":"), function(x) x[1])
library(maptools)
states = map2SpatialPolygons(states.m, IDs = IDs)
```

years:

```{r}
yrs = 1970:1986
time = as.POSIXct(paste(yrs, "-01-01", sep = ""), tz = "GMT")
```

When combining all this information, we do not need to reorder states because `states` and `Produc` order states alphabetically. We do need to de-select District of Columbia, which is not present in `Produc`:

```{r}
# deselect District of Columbia, polygon 8, which is not present in Produc:
library(spacetime)
Produc.st = STFDF(states[-8], time, Produc[order(Produc[2], Produc[1]),])
library(RColorBrewer)
stplot(Produc.st[,,"unemp"], yrs, col.regions = brewer.pal(9, "YlOrRd"), cuts = 9)
```
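`STFDF` objects can be subset much like arrays, with `[space, time, variable]`; a small sketch (an addition to the tutorial text; the ISO 8601 time strings should work as in `xts`):

```{r eval=FALSE}
dim(Produc.st)                         # space, time, variables
Produc.st[1:5, "1970::1975", "unemp"]  # five states, six years, one variable
```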
## Movement data

As can be seen from the [CRAN Task View on SpatioTemporal Data](http://cran.uni-muenster.de/web/views/SpatioTemporal.html), there are quite a few R packages dealing with movement data, many of which originate in the ecology domain. The [trajectories](http://cran.r-project.org/package=trajectories) package tries to take a more general approach, but uses many of the ideas e.g. from [adehabitatLT](http://cran.r-project.org/package=adehabitatLT). In particular, it acknowledges that, when we observe movement, we often end up having

* a period of movement registration for a particular item, which has a begin and end time that may not correspond to the life time of the item (`Track`, burst, trip),
* potentially multiple tracks collected on a single item, with time gaps in between during which the item was not tracked (a set of tracks: `Tracks`)
* multiple items for which `Tracks` are collected (`TracksCollection`)

Think of registering a person's movement with a mobile phone or GPS receiver:

* a sequence of location registrations, e.g. once per minute, is collected
* multiple sequences are obtained when a sequence gets interrupted, e.g. when the phone's battery is empty, or the registration is stopped for some reason (no GPS reception? privacy?)
* the analysis may concern multiple persons

For this reason, `trajectories` offers `Track`, `Tracks` and `TracksCollection` objects to organize such data:

![](trajectories.png)

Examples of open trajectory data of various kinds are

* the [Argo](http://www.argo.ucsd.edu/) buoys,
* the [Envirocar](http://www.envirocar.org/) car tracks (locations + OBD-II in-car sensor data); REST API, uses GeoJSON
* [Geolife](https://www.microsoft.com/en-us/research/project/geolife-building-social-networks-using-human-location-history/): 182 persons in China tracked for some months, some of whom kept activity (transportation mode) diaries (1.7 Gb when unzipped)
* Google location history (open to the owner only): gives fixes with errors, but also activity classification probabilities
* [Citibike NYC](https://www.citibikenyc.com/) (only origin and destination fixes)
* [Atlantic hurricane](http://weather.unisys.com/hurricane/atlantic/) tracks since 1851

Except for the Argo buoys, `trajectories` contains demo scripts for all these examples.

### Exercise

1. check out which demo scripts package `trajectories` has, using command `demo` (and look into its help page)
1. run some of these demos
1. try to understand what is going on, and what the demo tries to do
1. for a particular data set, which question could you formulate that could be answered by analyzing these data?

## Plotting and trajectory methods

Methods that `Track`, `Tracks` and `TracksCollection` share with the spatial family include

* `[[`, `$`, `[` retrieve or replace attributes, or select `Track` or `Tracks`
* `stplot` create a space-time plot
* `aggregate` spatially aggregate track properties (coercing fixes to points)
* `bbox` retrieve the spatial bounding box
* `coordinates` retrieve coordinates of fixes
* `coordnames` retrieve coordinate names of fixes
* `over` intersect spatially (coercing tracks to `SpatialLines`)
* `plot` simple plot methods
* `proj4string` retrieve the coordinate reference system
* `spTransform` (re)project coordinates to a new coordinate reference system, or unproject
* `summary` print a simple summary

Methods that are unique to the `Track` family are

* `compare` compares two `Track` objects: for the common time period, a `difftrack` object is created with all distances between them (approximating linearly when times of fixes don't match)
* `dists` compares two `Tracks` (two sets of `Track` elements) using mean distance, or Frechet distance
* `downsample` remove fixes from a `Track`, starting with the most densely sampled ones
* `frechetDist` compute the [Frechet distance](https://en.wikipedia.org/wiki/Fr%C3%A9chet_distance) between two `Track` objects
* `stcube` plot a space-time cube for `Track` objects, possibly with an openstreetmap base
* `generalize` resample a track to lower frequency or minimal distance
* `stbox` print the space-time bounding box

```{r}
r = read.csv("http://pebesma.staff.ifgi.de/pm10/3_2a.txt")
time = as.POSIXct(strptime(paste(r$Date, r$Time), "%Y/%m/%d %H:%M:%S"))
require(sp)
pts = SpatialPoints(r[c("Long", "Lat")], CRS("+init=epsg:4326")) # WGS84 long/lat
library(spacetime)
library(trajectories)
tr = Track(STIDF(pts, time, r["pm10"]))
stbox(tr)
```

Some basic plots:

```{r}
plot(as(tr, "STIDF"))
```

gives a space-time plot, where space indicates the index of each spatial item; the plot indicates that there were a few short breaks during the data collection. Plotting the `Track`, as

```{r}
plot(tr)
```

shows the spatial path followed, without anything else.
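A compact textual overview of the same track is given by the `summary` method listed above; a trivial sketch (an addition to the tutorial text):

```{r eval=FALSE}
summary(tr) # prints the space-time extent and attribute summaries
```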
With `stplot`

```{r}
stplot(tr)
```

we get the plot of the `STIDF` object; it cuts time into 6 equal pieces, and plots those.

```{r}
library(rgl)
library(rglwidget)
stcube(tr)
rglwidget(elementId = "plot3d")
```

shows the 3-D, interactive space-time track, which looks even nicer when you try

```{r eval=FALSE}
stcube(tr, showMap = TRUE)
```

The `mapview` of the complete trajectory shows a wealth of points:

```{r}
require(mapview)
mapview(as(tr, "SpatialPointsDataFrame"), zcol = "pm10", legend = TRUE)
```

When we `generalize` this, we get a `SpatialLines` object, with (by default) averaged `pm10` values per line segment:

```{r}
tr0 = generalize(tr, distance = 100)
class(tr0@sp)
mapview(as(tr0, "Spatial"), zcol = "pm10", lwd = 5, legend = TRUE)
```

## Multiple tracks

The following little program reads in all 16 PM10 tracks, and plots them with mapview (this last command only works if the development version of `sp` is installed from github):

```{r}
names = c("3_1a.txt", "3_2a.txt", "3_2b.txt", "3_3a.txt", "3_3b.txt",
	"3_3c.txt", "3_4a.txt", "3_5a.txt", "4_1a.txt", "4_2a.txt",
	"4_3a.txt", "5_1a.txt", "5_3a.txt", "5_4a.txt", "6_1a.txt", "6_2a.txt")
read.tr = function(f) {
	r = read.csv(paste0("http://pebesma.staff.ifgi.de/pm10/", f))
	time = as.POSIXct(strptime(paste(r$Date, r$Time), "%Y/%m/%d %H:%M:%S"))
	pts = SpatialPoints(r[c("Long", "Lat")], CRS("+init=epsg:4326")) # WGS84 long/lat
	Track(STIDF(pts, time, r["pm10"]))
}
trs = Tracks(lapply(names, function(f) read.tr(f)))
dim(trs)
trs0 = generalize(trs, distance = 100)
dim(trs0)
mapview(as(trs0, "Spatial"), zcol = "pm10", lwd = 5, legend = TRUE)
```

## Distances between trajectories

Distances between pairs of tracks can be computed; this is often done as a first step before, or as an ingredient of, clustering tracks. The naive

```{r eval=FALSE}
dists(trs0, trs0)
```

computes spatial distances for synchronous times; since the tracks in this data set were all collected on different days by a single person, all distances will be `NA`. Frechet distances, obtained by

```{r eval=FALSE}
d = dists(trs, trs, frechetDist)
```

ignore time but preserve ordering (direction); they take a long time to compute.

## Aggregation; densities

Aggregating trajectories _and their attributes_ is not trivial:

* if we coerce them to points, differences in sampling density/frequency bias the results
* if we resample them to a regular frequency, attributes need to be interpolated
* we can think of different weighting schemes, depending on distance covered or time spent
* as shown above, `generalize` aggregates points to lines, and associates aggregated point attributes to line segments

A few examples from a web site that aggregates the google location history of "somebody" are shown here:

![](locationhistory.png)
![](location2.png)
![](location3.png)

## Smoothing: storms data

In the following example, the `storms` tracks are plotted, and a smoothed version (using smoothing splines on the coordinates, as functions of time) is added in red:

```{r}
data(storms)
plot(storms, type = 'p')
smth = function(x, y, xout, ...) {
	# fit a smoothing spline to coordinate y as a function of time x,
	# and predict it at the output times xout:
	predict(smooth.spline(as.numeric(x), y), as.numeric(xout))
}
storms.smooth = approxTracksCollection(storms, FUN = smth, n = 200)
plot(storms.smooth, add = TRUE, col = 'red')
```
## Simulating random tracks

Random tracks (in free space) can be generated using a bivariate autoregressive process; an example is given here, along with a smooth:

```{r}
opar = par(mfrow = c(1, 2))
x = rTrack(ar = .4)
plot(x); title("rough")
x.smooth = approxTrack(x, FUN = smth, n = 800)
plot(x.smooth, add = TRUE, col = 'red')
x = rTrack(ar = .8) # smoother
plot(x); title("smooth")
x.smooth = approxTrack(x, FUN = smth, n = 800)
plot(x.smooth, add = TRUE, col = 'red')
par(opar)
```

# Real-world problems, work in progress

## At the engineering level

* fitting GPS fixes to a road network (map matching); initial work is found [here](https://github.com/edzer/fuzzyMM)
* properly representing [road networks in R](https://github.com/edzer/spnetwork)
* simulating movement through a transportation network, e.g. from origin-destination matrices (using [sumo](http://sumo.dlr.de/)?)
* more support for non-point (i.e., lines, polygons, grids) movement data (e.g. oil spills, fires)
* dealing with out-of-memory data sets
* developing proper spatial statistical inference methods for trajectory data (... answering which question?)
* representing spatial data in an easier (`list` column in a `data.frame`) and interoperable way: [simple features for R](http://github.com/edzer/sfr/) (come and see my Wednesday talk in the Spatial session)

## A practical theory for meaningful spatial statistics

When are spatial aggregation and prediction [meaningful](http://www.sciencedirect.com/science/article/pii/S1364815213001977)?

![](intro.png)

What do polygon values refer to? What do raster cell values refer to?

![](choro.png)
![](lu.jpg)

* We think too much in data morphology: points, lines, polygons, grids, trajectories, ...
* We need to think in underlying phenomena, as _functions composed_ of reference systems
* Basic reference systems: $S$ space, $T$ time, $Q$ quality, $D$ entity
* Derived reference systems: $R$ for regions/point sets, $I$ for time intervals/instant sets
* *Fields*: $S \times T \Rightarrow Q$
* *Lattices*: $R \times I \Rightarrow Q$
* *Events*: $D \Rightarrow S \times T \times Q$
* (*Trajectories*: $T \Rightarrow S$)
* *Objects*: $D \Rightarrow T \Rightarrow S \times Q$
* Read more in the paper on [Modelling spatiotemporal information generation](http://pebesma.staff.ifgi.de/generativealgebra.pdf)

# sessionInfo()

```{r}
sessionInfo()
```