---
title: "Convergent Cross Mapping (CCM)"
subtitle: "School on Physics Applications in Biology, January 2018"
author: "Brenno Cabella, Roberto Kraenkel, Renato Coutinho, Paulo Inácio Prado, Rafael Lopes"
date: "`r format(Sys.time(), '%d de %B de %Y')`"
output:
         rmdformats::readthedown:
         self_contained: true
         thumbnails: true
         lightbox: true
         gallery: false
         highlight: tango
         toc_depth: 4
---

```{r setup, echo=FALSE, warning=FALSE, message=F}
library(knitr)
library(plotly)
library(rEDM)
library(dplyr)


opts_chunk$set(fig.align = 'center',
               fig.show = 'hold',
               fig.height = 5,
               warning = FALSE, message = FALSE, error = FALSE, echo=TRUE)
options(formatR.arrow = TRUE,width = 90)
```

In this tutorial we present the general idea of Convergent Cross Mapping (CCM,
Ye *et al.* 2017), then show some practical examples using synthetic data, and
lastly we propose a real data analysis as exercise.  For these activities, you
will need the most recent version of [R](https://cran.r-project.org/) and the
rEDM package installed in your working computer.

After running this tutorial, you can review the concepts of cross-mapping  and
learn details about the rEDM library in the introductory documentation of the
package (*vignette*) which you can find
[here](https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.html)
or in your local R installation with the command


```{r call vignette, eval=FALSE}
vignette("rEDM-tutorial", package="rEDM")
```

# From Chaos to Chaos-ality

## General idea [^1]

One of the corollaries of the Generalized Takens's Theorem is that it should
be possible to cross predict or cross map between variables that are observed
from the same dynamical system. Consider two variables, X and Y, that interact
in a dynamical system. Then the univariate reconstructions based on X alone
should uniquely identify the system state and thus the corresponding value of
Y, and *vice versa*.

## Synthetic Data

First, let's simulate a deterministic discrete-time dynamics with chaotic
behavior.  To create a 150-step time series following this dynamics, run the
commands below in the R console:

```{r generate data}
## Two vectors to store data
X <- c()
Y <- c()
## Initial values
X[1] <- 0.1
Y[1] <- 0.1
## Iterate the dynamics 150 time steps
for(i in 2:150){
  X[i] <- 3.77*X[i-1]*(1-X[i-1])
  Y[i] <- 3.82*Y[i-1]*(1-Y[i-1]-0.05*X[i-1])
}
XY<-as.data.frame(cbind(X,Y))
```

Note that the variable $X$ is causing changes in $Y$.  However, if we plot the
$X$ and $Y$ time series of this chaotic system, they do not seem to be
related:

```{r plot 1st time series}
par(cex=1.1,lwd=2)
plot(20:50,X[20:50],type="b", pch=18, col="blue",ylim=c(min(X,Y),max(X,Y)),
     main='Two Species',xlab = 'time',ylab='Population')
lines(20:50,Y[20:50],pch=19, col="red", type="b",lty=2,lwd=2)
legend(x = "bottomright", legend = c("X", "Y"),lty=c(1,2),pch=c(18,19),
       col = c("blue", "red"), inset = 0.02,lwd=2)
```

$X$ and $Y$ do not seem to be related, although we know that $X$ and $Y$ are
coupled (it's synthetic data!).  Moreover, we can calculate the correlation
coefficient between $X$ and $Y$:

```{r plot correlation}
fit<-lm(Y ~ X)
plot(X,Y,main='Correlation (X,Y)')
abline(0,1,lty=2)
#abline(fit$coefficients[1],fit$coefficients[2])
legend(x = "bottomleft", legend = paste('r =',round(cor(X,Y)*100)/100),
       inset = 0.02,col = 'black')
```

The results show that $X$ and $Y$ are not correlated, even though the data
comes from a model with known causality (*i.e.* $X$ is causing changes in
$Y$).  This is a clear example that "lack of correlation does not imply lack
of causation".

## Cross Mapping[^2]

How can we then extract the causality of $X$ on $Y$ from their dynamics? As we
have seen in previous sections, a generic property of the reconstructed shadow
manifold is that the states of $X(t)$ on the shadow manifold ($M_X$) maps
one-to-one onto states in the original attractor manifold $M$, and local
neighborhoods of $M_X$ map onto local neighborhoods of $M$.

It follows that for two variables $X$ and $Y$ that are dynamically coupled,
local neighborhoods on their lagged reconstructions ($M_X$ and $M_Y$,
respectively) will map to each other since $X$ and $Y$ are essentially
alternative observations of the common original attractor manifold $M$. 

**Convergent cross mapping** (CCM) determines how well local neighborhoods on
$M_X$ correspond to local neighborhoods on $M_Y$ and vice versa. To do so, we
construct a manifold $M_X$ from lags of the variable $X$ and use it to predict
the states of $Y$ at the same time points.  Similarly, a manifold $M_Y$ is
constructed from lags of the variable $Y$ and used to predict the states of
$X$ at the same times. 

To do so, we first need to obtain the optimal embedding dimension for
both variables using the simplex function (see the tutorial about
[simplex projection](simplex.html) for details).


```{r optimal embeddings X}
options(warn = -1)
simplex_X<-simplex(X,silent=T)
plot(simplex_X$rho,type='o', ylab = "Forecast Skill (rho)",
     xlab="Embedding Dimension (E)")
E_star_X<-which.max(simplex_X$rho)
print(paste('E*(X) =',E_star_X))
```

The optimal embedding dimension ($E^*$), that is, the one with larger
prediction skill $\rho$, is two.

```{r optimal embeddings Y}
simplex_Y<-simplex(Y,silent=T)
plot(simplex_Y$rho,type='o', ylab = "Forecast Skill (rho)",
     xlab="Embedding Dimension (E)")
E_star_Y<-which.max(simplex_Y$rho)
print(paste('E*(Y) =',E_star_Y))
```
$E^*$ is also two for the shadow manifold of $Y$.

## Constructing the manifolds $M_X$ and $M_Y$

Now that we have the optimal embedding dimensions, we can construct the shadow
manifolds $M_X$ and $M_Y$ using the `make_block` function from the rEDM package.
Use the command below to load this function into your R workspace (it will be
included in rEDM itself in future releases of the package):

```{r load make_block}
source("https://raw.githubusercontent.com/mathbio/edmTutorials/master/utilities/make_block.R")
```

Now let's create the shadow manifolds for both $X$ and $Y$. Since $E^* = 2$, we
will use two dimensions to construct the shadow manifolds $M_X$ and $M_Y$. This
means that $M_X$ will be built based on $X(t)$ and $X(t+1)$, similarly $M_Y$
is constructed from $Y(t)$ and $Y(t+1)$.

```{r make_block}
# max_lag is the optimal embedding dimension
Shadow_MXY<-make_block(XY,max_lag = 2)
Shadow_MX<-Shadow_MXY[,2:3]
Shadow_MY<-Shadow_MXY[,4:5]

head(Shadow_MXY)
```

The table above shows the $X$ and $Y$ time series and their respective lags
$X_1$ and $Y_1$. The manifold $M_X$ is thus composed of $X$ and $X_1$, and
$M_Y$ of $Y$ and $Y_1$.

## Cross-mapping from $M_X$ to $M_Y$ (X_xmap_Y)

To better understand the cross mapping, let's start by using $X$ to predict
one single value in $Y$. Here, we are using the term "prediction", but instead
of predicting future values of $X$ (as in the [simplex
tutorial](simplex.html)), we will predict values of $Y(t)$ using $X(t)$ and
vice-versa.  This "cross-prediction" is performed between different variables
but for the same point in time.

Suppose we want to predict the value $Y(t=70)$:

```{r MX X_xmap_Y_code}
predictor<-70
print(Y[predictor])
```

We are going to use the nearest neighbours of $X(t=70)$ within the manifold
$M_X$ to predict the value of $Y(t=70)$. To do this, we first calculate a
matrix of distances among all states of $M_X$ and then we select the
$E^*+1$, in this example 2+1 = 3, nearest neighbours within the $M_X$,
creating the simplex_Mx.

```{r neighbors_X}
dist.matrix_X <- as.matrix(dist(Shadow_MX, upper=TRUE))
neigb_X <- order(dist.matrix_X[predictor,])[2:4]
neigh_X_print<-c(neigb_X)
print(paste('simplex_Mx:',list(neigh_X_print)))
```

We can also plot $M_X$ with the predictor (red dot) and the respective
neighbours simplex_Mx (blue dots). Zoom in to see all the dots.

```{r MX X_xmap_Y, echo=FALSE, fig.align='center'}
p_MX_X_to_Y <- plot_ly(Shadow_MX, x=~X, y=~X_1, marker=(list(color=grey)), opacity=0.1) %>%
  layout(xaxis = list(title = 'X'),yaxis = list(title = 'X(t-1)'),title='Mx') %>%
  add_markers(text = paste("time =",1:length(X)), showlegend = FALSE) %>%
  add_trace( x = ~X, y=~X_1,data=Shadow_MX[c(predictor,neigb_X),],opacity=1,marker=list(color=c("red","blue","blue","blue")),type="scatter", mode="markers",text = paste("time =",c(predictor,neigb_X)))
p_MX_X_to_Y
```

The cross-mapping process starts by finding (mapping) the simplex_Mx onto
$M_Y$, creating the simplex_My. Note that simplex_My has the same indexes as
simplex_Mx.  The figure below shows $M_Y$ and simplex_My (green dots).

```{r MY_X_xmap_Y, echo=FALSE}
p_MY_X_to_Y <- plot_ly(Shadow_MY, x=~Y, y=~Y_1, marker=(list(color=grey)), opacity=0.1) %>%
  layout(xaxis = list(title = 'Y'),yaxis = list(title = 'Y (t-1)'),title='My') %>%
  add_markers(text = paste("time =",1:length(Y)), showlegend = FALSE) %>%
  add_trace( x = ~Y, y=~Y_1,data=Shadow_MY[c(neigb_X),],opacity=1,marker=list(color=c("green","green","green")),type="scatter", mode="markers",text = paste("time =",c(neigb_X)))
p_MY_X_to_Y

```

The simplex_My will then be used to estimate the value of $Y(t=70)$ in $M_Y$,
obtaining the predicted value $\tilde{Y}$. We can expand this idea of
predicting a single state of $Y$ based on $X$ to predict all the states of $Y$
using $X$. If we then do this prediction for every value of $Y$, we will have
a vector of the predicted states of $Y$ (using $X$), besides the real *observed*
states of $Y$. We can then compare the predicted and observed:

```{r estimating Y from X (X_xmap_Y)}
lib <- c(1, NROW(Shadow_MXY))
block_lnlp_output_XY <- block_lnlp(Shadow_MXY, lib = lib, pred = lib, columns = c("X",
 "X_1"), target_column = "Y", stats_only = FALSE, first_column_time = TRUE)
observed_all_Y <- block_lnlp_output_XY$model_output[[1]]$obs
predicted_all_Y <- block_lnlp_output_XY$model_output[[1]]$pred
pred_obs_Y<-as.data.frame(cbind(predicted_all_Y,observed_all_Y)) #
colnames(pred_obs_Y)<-c('Predicted Y','Observed Y')
head(pred_obs_Y)
```

For the case of $Y(t=70)$ we saw above, we find:
```{r predicted y}

pred_obs_Y[(predictor-2),]

```

Note that the index of the predicted point is off by two. That's because the
first $E^* = 2$ points cannot be predicted, so the prediction of $Y(t=70)$
corresponds to the value at index 68 of the table.

Below is the plot for all predictions of $Y$ (mapped from $X$: X_xmap_Y) against
the real observations, plus the Pearson's correlation coefficient.  The red
dot represents the predicted and observed $Y(t=70)$ used in the example above.

```{r plot_obs_pred_MX_MY}
fit_YX<-lm(predicted_all_Y ~ observed_all_Y)
plot_range <- range(c(observed_all_Y, predicted_all_Y), na.rm = TRUE)
plot(observed_all_Y,predicted_all_Y, xlim = plot_range, ylim = plot_range, xlab = "Observed Y",
ylab = "Predicted Y")
#abline(fit_YX$coefficients[1],fit_YX$coefficients[2])
abline(0,1,lty=2)
legend(x = "bottomright", legend = paste('r =',round(cor(observed_all_Y, predicted_all_Y)*100)/100),inset = 0.02,col = 'black')
observed_pred_Y<-observed_all_Y[predictor-2]
predicted_pred_Y<-predicted_all_Y[predictor-2]
points(observed_pred_Y,predicted_pred_Y,col='red',pch=16,cex=1.2)
```


## Cross-mapping from $M_Y$ to $M_X$ (Y_xmap_X)

Similarly to the previous mapping, we can now do the opposite: use $Y$ to
predict values of $X$. Now, we want to predict the value of $X(t=70)$:

```{r MX Y_xmap_X_code}
print(X[predictor])
```

We obtain the indexes of the $E^*+1$ nearest neighbors of the given predictor.
Since we are now cross mapping from $M_Y$ to $M_X$, the simplex_My is
generated directly from the shadow manifold $M_Y$. Again, we first create a
matrix of distances among the states of $M_Y$ and then we select the $E^*+1$
nearest neighbors.

```{r neighbors_y}
dist.matrix_Y <- as.matrix(dist(Shadow_MY, upper=TRUE))
neigb_Y <- order(dist.matrix_Y[predictor,])[2:4]
neigh_Y_print<-c(neigb_Y)
print(paste('simplex_My:',list(neigh_Y_print)))
```

The following plot presents $M_Y$ with the predictor (red dot) and respective
simplex_My (blue dots).

```{r MY Y_xmap_X, echo=FALSE, fig.align='center'}
p_MY_Y_to_X <- plot_ly(Shadow_MY, x=~Y, y=~Y_1, marker=(list(color=grey)), opacity=0.1) %>%
  layout(xaxis = list(title = 'Y'),yaxis = list(title = 'Y(t-1)'),title='My') %>%
  add_markers(text = paste("time =",1:length(X)), showlegend = FALSE) %>%
  add_trace( x = ~Y, y=~Y_1,data=Shadow_MY[c(predictor,neigb_Y),],opacity=1,marker=list(color=c("red","blue","blue","blue")),type="scatter", mode="markers",text = paste("time =",c(predictor,neigb_Y)))
p_MY_Y_to_X
```

Next, we map the simplex_My in $M_X$, creating the simplex_Mx.  Analogously,
simplex_Mx has the same indexes as simplex_My. The figure below shows $M_X$
and simplex_Mx (green dots).

```{r MX_Y_xmap_X, echo=FALSE}
p_MX_Y_to_X <- plot_ly(Shadow_MX, x=~X, y=~X_1, marker=(list(color=grey)), opacity=0.1) %>%
  layout(xaxis = list(title = 'X'),yaxis = list(title = 'X (t-1)'),title='Mx') %>%
  add_markers(text = paste("time =",1:length(Y)), showlegend = FALSE) %>%
  add_trace( x = ~X, y=~X_1,data=Shadow_MX[c(neigb_Y),],opacity=1,marker=list(color=c("green","green","green")),type="scatter", mode="markers",text = paste("time =",c(neigb_Y)))
p_MX_Y_to_X
```

The simplex_Mx will then be used to estimate the predicted value of the predictor in
$M_X$, obtaining the predicted value $\tilde{X}(t)$.

```{r estimating X from Y (Y_xmap_X)}
lib<-lib <- c(1, NROW(Shadow_MXY))
block_lnlp_output_YX <- block_lnlp(Shadow_MXY, lib = lib, pred = lib, columns = c("Y", "Y_1"),
                                   target_column = "X", stats_only = FALSE, first_column_time = TRUE)
observed_all_X <- block_lnlp_output_YX$model_output[[1]]$obs
predicted_all_X <- block_lnlp_output_YX$model_output[[1]]$pred
pred_obs_X<-as.data.frame(cbind(predicted_all_X,observed_all_X))
colnames(pred_obs_X)<-c('Predicted X','Observed X')
head(pred_obs_X)
```

Below is the plot for all predictions of $X$ (mapped from $Y$: Y_xmap_X)
against the real observations, plus the Pearson's correlation coefficient.
The red dot represents the predictor (at $t=70$) used in the example above.

```{r plot_obs_pred_MY_MX}
fit_XY<-lm(predicted_all_X ~ observed_all_X)
plot_range <- range(c(observed_all_X, predicted_all_X), na.rm = TRUE)
plot(observed_all_X, predicted_all_X, xlim = plot_range, ylim = plot_range,
     xlab = "Observed X", ylab = "Predicted X")
#abline(fit_XY$coefficients[1],fit_XY$coefficients[2])
abline(0,1,lty=2)
legend(x = "bottomright", legend = paste('r =',round(cor(observed_all_X,predicted_all_X)*100)/100),
       inset = 0.02,col = 'black')
observed_pred_X<-observed_all_X[predictor-2]
predicted_pred_X<-predicted_all_X[predictor-2]
points(observed_pred_X,predicted_pred_X,col='red',pch=16,cex=1.2)
```

Note that the Pearson correlation of Y_xmap_X (r = 0.68) is different than the
X_xmap_Y (r = -0.08). What is going on?


## But what causes what?

The Pearson correlation of Y_xmap_X is much larger than the X_xmap_Y. When we
used $Y$ to estimate (map) $X$, we obtained good predictions (r = 0.68).
However, when we used $X$ to estimate (map) $Y$, the correlation between
estimate and observed was quite low (r = -0.08).  This means that the Y
dynamics contains information about the dynamics of $X$, and therefore we can
use $Y$ to estimate $X$. This means that $X$ somehow influences (causes!) $Y$.
Indeed, since all our data is synthetic, we know this is true: we created the
variable $X$ independently of $Y$, and $X$ was causing changes in $Y$. So when
$X$ causes $Y$, the cross mapping skill from $M_Y$ to $M_X$ (Y_xmap_X) will
generally gives us good results (high correlations), but *not* X_xmap_Y.  It
is *very* important to notice that there is an inverse relation between cross
mapping and causality: if $X$ causes $Y$ ($X \rightarrow Y$), $Y$ cross map of
$X$ (Y_xmap_X) shows a positive (large) correlation.

## Convergent Cross Mapping (CCM)

Convergence means that the estimates from the cross-mapping improve as we
increase the length of the time-series. The length L is the sample size used
to construct a library.


```{r convergent}
# cross map from X to Y
X_xmap_Y<- ccm(XY, E = 2, lib_column = "X", target_column = "Y",
               lib_sizes = seq(10, 130, by = 10), num_samples = 100, random_libs = TRUE,
               replace = TRUE)
# cross map from Y to X
Y_xmap_X<- ccm(XY, E = 2, lib_column = "Y", target_column = "X",
               lib_sizes = seq(10, 130, by = 10), num_samples = 100, random_libs = TRUE,
               replace = TRUE)

# mean values
X_xmap_Y_means <- ccm_means(X_xmap_Y)
Y_xmap_X_means <- ccm_means(Y_xmap_X)

# plot graphs
plot(X_xmap_Y_means$lib_size, pmax(0, X_xmap_Y_means$rho), type = "l", col = "red",
     main='Two Species', xlab = "Library Size (L)",
     ylab = "Cross Map Skill (Pearson rho)", ylim = c(0,1))
lines(Y_xmap_X_means$lib_size, pmax(0, Y_xmap_X_means$rho), col = "blue")
legend(x = "topleft", legend = c("X_xmap_Y", "Y_xmap_X"), col = c("red", "blue"),
       cex=1.1,lwd=2, inset = 0.02)
```

This means that the more data you have (*i.e.* the larger L), the more
trajectories you have to infer the attractor, resulting thus in closer
nearest neighbors and less estimation error (*i.e.* a higher correlation
coefficient between estimated and observed). That is, if there is causation,
we expect to see the convergence of the cross mapping (CCM).


# Exercises

1. What happens when we invert the cause-effect relationship? Change the model
   so that the Y variable causes changes in X, but not the other way around.
2. What if there is no interaction between X and Y?
3. What about both variables interacting with each other (X causing X and Y
   causing X)?
4. Identifying causal links within systems is very important for effective
   policy and management recommendations on *e.g.* climate, ecosystem
   services, epidemiology and financial regulation.  In the following
   exercise, use CCM to identify causality among sardine, anchovy, and sea
   surface temperature measured at Scripps Pier and Newport Pier, California.

```{r ex4}
data(sardine_anchovy_sst)
head(sardine_anchovy_sst)
```

# Learn more 

* Sugihara G, May R, Ye H, Hsieh C-h, Deyle E, Fogarty M, Munch S. 2012. Detecting Causality in Complex Ecosystems. Science 338:496–500.


# Glossary
```{r include glossary, child = '../glossary.md'}
```


[^1]: Taken from [rEDM
vignette](https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.html)
section (*"Causality Inference and Cross Mapping"*).

[^2]: This section is adapted from: Sugihara et al., Detecting Causality in Complex Ecosystems (Supplementary Materials), Science, 2012.