--- title: "The kantorovich package" date: "2016-01-11" output: html_document --- {r setup0, include=FALSE} knitr::opts_chunk$set(collapse=TRUE)  I have just released a first version of the kantorovich package on [Github](https://github.com/stla/kantorovich). It is based on the code of my post [Using R to compute the Kantorovich distance](http://stla.github.io/stlapblog/posts/KantorovichWithR.html). This package has two main features: * It computes the extreme joinings of two probability measures$\mu$and$\nu$on a finite set; * It computes the Kantorovich distance between these two measures, for a given distance on their finite state space. With the help of the rccd and gmp packages, the kantorovich package can return the *exact* values of the extreme joinings and of the Kantorovich distance. ## Quick example As an example, take$\mu$and$\nu$the uniform probability measures on a finite set having three elements. {r} mu <- nu <- c(1/3, 1/3, 1/3)  The ejoinings function returns the extreme joinings of$\mu$and$\nu$. In this case these are the$6!$permutation matrices: {r} library(kantorovich) ejoinings(mu, nu)  Since mu and nu were unnamed, the vector names c(1,2,3) has been automatically assigned to them. The Kantorovich distance between$\mu$and$\nu$is relative to a given distance on the state space of$\mu$and$\nu$, represented by their vector names. By default, the kantorovich package takes the discrete$0\mathrm{-}1$distance. Obviously the Kantorovich distance is$0$on this example, because$\mu=\nu$. {r} kantorovich(mu, nu)  Note the message returned by both the ejoinings and the kantorovich functions. In order to get exact results, use rational numbers with the gmp package: {r, message=FALSE, warning=FALSE} library(gmp) mu <- nu <- as.bigq(c(1,1,1), c(3,3,3)) # shorter: as.bigq(c(1,1,1), 3) ejoinings(mu, nu)  ## User-specified distance Let us try an example with a user-specified distance. Let's say that the state space of$\mu$and$\nu$is$\{a, b, c\}$, and then we use c("a","b","c") as the vector names. {r} mu <- as.bigq(c(1,2,4), 7) nu <- as.bigq(c(3,1,5), 9) names(mu) <- names(nu) <- c("a", "b", "c")  ### Define distance as a matrix The distance can be specified as a matrix. Assume the distance$\rho$is given by$\rho(a,b)=1$,$\rho(a,c)=2$and$\rho(b,c)=4$. The bigq matrices offered by the gmp package do not handle dimension names. But, in our example, the distance$\rho$takes only integer values, therefore one can use a numerical matrix: {r} M <- matrix( c( c(0, 1, 2), c(1, 0, 4), c(2, 4, 0) ), byrow = TRUE, nrow = 3, dimnames = list(c("a","b","c"), c("a","b","c"))) kantorovich(mu, nu, dist=M)  If the distance takes rational values, one can proceed as before with a character matrix: {r} M <- matrix( c( c("0", "3/13", "2/13"), c("1/13", "0", "4/13"), c("2/13", "4/13", "0") ), byrow = TRUE, nrow = 3, dimnames = list(c("a","b","c"), c("a","b","c"))) kantorovich(mu, nu, dist=M)  ### Define distance as a function One can enter the distance as a function. In such an example, this does not sound convenient: {r} rho <- function(x,y){ if(x==y) { return(0) } else { if(x=="a" && y=="b") return(1) if(x=="a" && y=="c") return(2) if(x=="b" && y=="c") return(4) return(rho(y,x)) } } kantorovich(mu, nu, dist=rho)  Using a function could be more convenient in the case when the names are numbers: {r} names(mu) <- names(nu) <- 1:3  But one has to be aware that there are in character mode: {r} names(mu)  Thus, one can define a distance function as follows, for example with$\rho(x,y)=\frac{x-y}{x+y}\$: {r} rho <- function(x,y){ x <- as.numeric(x); y <- as.numeric(y) return(as.bigq(x-y, x+y)) } kantorovich(mu, nu, dist=rho)  ## A non-square example The kantorovich package also handles the case when mu and nu have different lengths, such as this example: {r} mu <- as.bigq(c(1,2,4), 7) nu <- as.bigq(c(3,1), 4) names(mu) <- c("a", "b", "c") names(nu) <- c("b", "c") ejoinings(mu, nu) kantorovich(mu, nu)