---
layout: post
title:  "Fitting variogram models in gstat"
date:   2016-02-14 11:00:00 +0100
comments: true
author: Edzer Pebesma
categories: r
---
DOWNLOADHERE

Fitting variogram functions with R package
[gstat](https://cran.r-project.org/package=gstat) has become more
flexible, and hopefully more user friendly. Up to now, after loading data
```{r}
library(sp)
demo(meuse, ask = FALSE, echo = FALSE) # load meuse data set
```
users were required to use a sequence like
```{r}
library(gstat)
v = variogram(log(zinc)~1, meuse)
v.fit = fit.variogram(v, vgm(1, "Sph", 900, 1))
v.fit
```
where `fit.variogram` fits variogram parameters of a spherical model
(`Sph`) to the sample variogram `v`. The values 1, 900 and 1 were
needed as initial values in the weighted non-linear fit (where only
the range parameter is non-linear).

This has changed in gstat version 1.2: now, `vgm` can take only a
variogram model, as in
```{r}
fit.variogram(v, vgm("Sph"))
```
or even a set of models, in which case the best fitting is returned, as in
```{r}
fit.variogram(v, vgm(c("Exp", "Sph")))
fit.variogram(v, vgm(c("Exp", "Mat", "Sph")))
```
where we still see that the sperical model is chosen. If we choose a different
sample variogram, where Matern is chosen, as in:
```{r}
v0 = variogram(zinc~1, meuse)
fit.variogram(v0, vgm(c("Exp", "Mat", "Sph")))
```
we see that the kappa value is 0.5, which is a default value that was not fit. We can
fit kappa by specifying `fit.kappa = TRUE`, as in
```{r}
options(warn = -1) # don't print warnings
fit.variogram(v0, vgm(c("Exp", "Mat", "Sph")), fit.kappa = TRUE)
```
where the best fitting kappa from the range 0.3, 0.4, 0.5,...,5 is
chosen. I suppressed warnings here, as around 20 warnings were printed
in cases with crazy initial values. This is usual for Matern 
models: larger kappa values have effective ranges (the distance
value at which the model reaches, say, 95% of its sill) much larger than
the range parameter, as illustrated by
```{r fig=TRUE, fig.path = "images/", label="vgm1"}
plot(variogramLine(vgm(1, "Mat", 1, kappa = 4), 10), type = 'l')
```

where at distance 1, 0.05 of the sill is reached
(and the model, up till there, is nearly linear or
parabolic, leading to a singularity during fit). A different
parameterisation of the Matern model, given in [Michael Stein's
book](https://www.springer.com/gp/book/9780387986296), is the
following

```{r fig=TRUE, fig.path = "images/", label="vgm2"}
plot(variogramLine(vgm(1, "Ste", 1, kappa = 4), 10), type = 'l')
```

This one has the same smoothness, but reaches the sill much closer
to the range value. As a consequence it fits easier, that is,
without warnings:

```{r}
options(warn = 0) # set back to normal
fit.variogram(v0, vgm(c("Exp", "Ste", "Sph")), fit.kappa = TRUE)
```
For those you need a more precise estimate of the optimal kappa value,
you can iterate over steps of e.g. 0.01 by
```{r}
fit.variogram(v0, vgm(c("Exp", "Ste", "Sph")), fit.kappa = seq(.3,5,.01))
```
## How it works
Default initial parameter values are chosen from the sample variogram, where:

* the range parameter is taken as 1/3 of the maximum sample variogram distance,
* the nugget parameter is taken as the mean of the first three sample variogram values, and
* the partial sill is taken as the mean of the last five sample variogram values.

```{r}
vgm("Sph")
```
contains `NA` values for the numeric parameters, and 
under the hood (undocumented)
```{r}
gstat:::vgm_fill_na(vgm("Sph"), v)
```
fills the `NA` values with the initial values.

Providing more than one model to `vgm` returns a list,
```{r}
vgm(c("Sph", "Exp"))
```
which `fit.variogram` iterates over, returning the best fitting model. 

## Comparison to automap

Function `automap::autofitVariogram` does a similar job, but includes
the computation of the sample variogram from data (which can be
controlled by passing parameters to `...`). It takes slightly
different defaults for fitting, definitely different defaults
when computing the sample variogram, and has options for combining
distance bins.