--- title: "Convergent Cross Mapping (CCM)" subtitle: "School on Physics Applications in Biology, January 2018" author: "Brenno Cabella, Roberto Kraenkel, Renato Coutinho, Paulo Inácio Prado, Rafael Lopes" date: "`r format(Sys.time(), '%d de %B de %Y')`" output: rmdformats::readthedown: self_contained: true thumbnails: true lightbox: true gallery: false highlight: tango toc_depth: 4 --- ```{r setup, echo=FALSE, warning=FALSE, message=F} library(knitr) library(plotly) library(rEDM) library(dplyr) opts_chunk$set(fig.align = 'center', fig.show = 'hold', fig.height = 5, warning = FALSE, message = FALSE, error = FALSE, echo=TRUE) options(formatR.arrow = TRUE,width = 90) ``` In this tutorial we present the general idea of Convergent Cross Mapping (CCM, Ye *et al.* 2017), then show some practical examples using synthetic data, and lastly we propose a real data analysis as exercise. For these activities, you will need the most recent version of [R](https://cran.r-project.org/) and the rEDM package installed in your working computer. After running this tutorial, you can review the concepts of cross-mapping and learn details about the rEDM library in the introductory documentation of the package (*vignette*) which you can find [here](https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.html) or in your local R installation with the command ```{r call vignette, eval=FALSE} vignette("rEDM-tutorial", package="rEDM") ``` # From Chaos to Chaos-ality ## General idea [^1] One of the corollaries of the Generalized Takens's Theorem is that it should be possible to cross predict or cross map between variables that are observed from the same dynamical system. Consider two variables, X and Y, that interact in a dynamical system. Then the univariate reconstructions based on X alone should uniquely identify the system state and thus the corresponding value of Y, and *vice versa*. ## Synthetic Data First, let's simulate a deterministic discrete-time dynamics with chaotic behavior. To create a 150-step time series following this dynamics, run the commands below in the R console: ```{r generate data} ## Two vectors to store data X <- c() Y <- c() ## Initial values X[1] <- 0.1 Y[1] <- 0.1 ## Iterate the dynamics 150 time steps for(i in 2:150){ X[i] <- 3.77*X[i-1]*(1-X[i-1]) Y[i] <- 3.82*Y[i-1]*(1-Y[i-1]-0.05*X[i-1]) } XY<-as.data.frame(cbind(X,Y)) ``` Note that the variable $X$ is causing changes in $Y$. However, if we plot the $X$ and $Y$ time series of this chaotic system, they do not seem to be related: ```{r plot 1st time series} par(cex=1.1,lwd=2) plot(20:50,X[20:50],type="b", pch=18, col="blue",ylim=c(min(X,Y),max(X,Y)), main='Two Species',xlab = 'time',ylab='Population') lines(20:50,Y[20:50],pch=19, col="red", type="b",lty=2,lwd=2) legend(x = "bottomright", legend = c("X", "Y"),lty=c(1,2),pch=c(18,19), col = c("blue", "red"), inset = 0.02,lwd=2) ``` $X$ and $Y$ do not seem to be related, although we know that $X$ and $Y$ are coupled (it's synthetic data!). Moreover, we can calculate the correlation coefficient between $X$ and $Y$: ```{r plot correlation} fit<-lm(Y ~ X) plot(X,Y,main='Correlation (X,Y)') abline(0,1,lty=2) #abline(fit$coefficients[1],fit$coefficients[2]) legend(x = "bottomleft", legend = paste('r =',round(cor(X,Y)*100)/100), inset = 0.02,col = 'black') ``` The results show that $X$ and $Y$ are not correlated, even though the data comes from a model with known causality (*i.e.* $X$ is causing changes in $Y$). This is a clear example that "lack of correlation does not imply lack of causation". ## Cross Mapping[^2] How can we then extract the causality of $X$ on $Y$ from their dynamics? As we have seen in previous sections, a generic property of the reconstructed shadow manifold is that the states of $X(t)$ on the shadow manifold ($M_X$) maps one-to-one onto states in the original attractor manifold $M$, and local neighborhoods of $M_X$ map onto local neighborhoods of $M$. It follows that for two variables $X$ and $Y$ that are dynamically coupled, local neighborhoods on their lagged reconstructions ($M_X$ and $M_Y$, respectively) will map to each other since $X$ and $Y$ are essentially alternative observations of the common original attractor manifold $M$. **Convergent cross mapping** (CCM) determines how well local neighborhoods on $M_X$ correspond to local neighborhoods on $M_Y$ and vice versa. To do so, we construct a manifold $M_X$ from lags of the variable $X$ and use it to predict the states of $Y$ at the same time points. Similarly, a manifold $M_Y$ is constructed from lags of the variable $Y$ and used to predict the states of $X$ at the same times. To do so, we first need to obtain the optimal embedding dimension for both variables using the simplex function (see the tutorial about [simplex projection](simplex.html) for details). ```{r optimal embeddings X} options(warn = -1) simplex_X<-simplex(X,silent=T) plot(simplex_X$rho,type='o', ylab = "Forecast Skill (rho)", xlab="Embedding Dimension (E)") E_star_X<-which.max(simplex_X$rho) print(paste('E*(X) =',E_star_X)) ``` The optimal embedding dimension ($E^*$), that is, the one with larger prediction skill $\rho$, is two. ```{r optimal embeddings Y} simplex_Y<-simplex(Y,silent=T) plot(simplex_Y$rho,type='o', ylab = "Forecast Skill (rho)", xlab="Embedding Dimension (E)") E_star_Y<-which.max(simplex_Y$rho) print(paste('E*(Y) =',E_star_Y)) ``` $E^*$ is also two for the shadow manifold of $Y$. ## Constructing the manifolds $M_X$ and $M_Y$ Now that we have the optimal embedding dimensions, we can construct the shadow manifolds $M_X$ and $M_Y$ using the `make_block` function from the rEDM package. Use the command below to load this function into your R workspace (it will be included in rEDM itself in future releases of the package): ```{r load make_block} source("https://raw.githubusercontent.com/mathbio/edmTutorials/master/utilities/make_block.R") ``` Now let's create the shadow manifolds for both $X$ and $Y$. Since $E^* = 2$, we will use two dimensions to construct the shadow manifolds $M_X$ and $M_Y$. This means that $M_X$ will be built based on $X(t)$ and $X(t+1)$, similarly $M_Y$ is constructed from $Y(t)$ and $Y(t+1)$. ```{r make_block} # max_lag is the optimal embedding dimension Shadow_MXY<-make_block(XY,max_lag = 2) Shadow_MX<-Shadow_MXY[,2:3] Shadow_MY<-Shadow_MXY[,4:5] head(Shadow_MXY) ``` The table above shows the $X$ and $Y$ time series and their respective lags $X_1$ and $Y_1$. The manifold $M_X$ is thus composed of $X$ and $X_1$, and $M_Y$ of $Y$ and $Y_1$. ## Cross-mapping from $M_X$ to $M_Y$ (X_xmap_Y) To better understand the cross mapping, let's start by using $X$ to predict one single value in $Y$. Here, we are using the term "prediction", but instead of predicting future values of $X$ (as in the [simplex tutorial](simplex.html)), we will predict values of $Y(t)$ using $X(t)$ and vice-versa. This "cross-prediction" is performed between different variables but for the same point in time. Suppose we want to predict the value $Y(t=70)$: ```{r MX X_xmap_Y_code} predictor<-70 print(Y[predictor]) ``` We are going to use the nearest neighbours of $X(t=70)$ within the manifold $M_X$ to predict the value of $Y(t=70)$. To do this, we first calculate a matrix of distances among all states of $M_X$ and then we select the $E^*+1$, in this example 2+1 = 3, nearest neighbours within the $M_X$, creating the simplex_Mx. ```{r neighbors_X} dist.matrix_X <- as.matrix(dist(Shadow_MX, upper=TRUE)) neigb_X <- order(dist.matrix_X[predictor,])[2:4] neigh_X_print<-c(neigb_X) print(paste('simplex_Mx:',list(neigh_X_print))) ``` We can also plot $M_X$ with the predictor (red dot) and the respective neighbours simplex_Mx (blue dots). Zoom in to see all the dots. ```{r MX X_xmap_Y, echo=FALSE, fig.align='center'} p_MX_X_to_Y <- plot_ly(Shadow_MX, x=~X, y=~X_1, marker=(list(color=grey)), opacity=0.1) %>% layout(xaxis = list(title = 'X'),yaxis = list(title = 'X(t-1)'),title='Mx') %>% add_markers(text = paste("time =",1:length(X)), showlegend = FALSE) %>% add_trace( x = ~X, y=~X_1,data=Shadow_MX[c(predictor,neigb_X),],opacity=1,marker=list(color=c("red","blue","blue","blue")),type="scatter", mode="markers",text = paste("time =",c(predictor,neigb_X))) p_MX_X_to_Y ``` The cross-mapping process starts by finding (mapping) the simplex_Mx onto $M_Y$, creating the simplex_My. Note that simplex_My has the same indexes as simplex_Mx. The figure below shows $M_Y$ and simplex_My (green dots). ```{r MY_X_xmap_Y, echo=FALSE} p_MY_X_to_Y <- plot_ly(Shadow_MY, x=~Y, y=~Y_1, marker=(list(color=grey)), opacity=0.1) %>% layout(xaxis = list(title = 'Y'),yaxis = list(title = 'Y (t-1)'),title='My') %>% add_markers(text = paste("time =",1:length(Y)), showlegend = FALSE) %>% add_trace( x = ~Y, y=~Y_1,data=Shadow_MY[c(neigb_X),],opacity=1,marker=list(color=c("green","green","green")),type="scatter", mode="markers",text = paste("time =",c(neigb_X))) p_MY_X_to_Y ``` The simplex_My will then be used to estimate the value of $Y(t=70)$ in $M_Y$, obtaining the predicted value $\tilde{Y}$. We can expand this idea of predicting a single state of $Y$ based on $X$ to predict all the states of $Y$ using $X$. If we then do this prediction for every value of $Y$, we will have a vector of the predicted states of $Y$ (using $X$), besides the real *observed* states of $Y$. We can then compare the predicted and observed: ```{r estimating Y from X (X_xmap_Y)} lib <- c(1, NROW(Shadow_MXY)) block_lnlp_output_XY <- block_lnlp(Shadow_MXY, lib = lib, pred = lib, columns = c("X", "X_1"), target_column = "Y", stats_only = FALSE, first_column_time = TRUE) observed_all_Y <- block_lnlp_output_XY$model_output[[1]]$obs predicted_all_Y <- block_lnlp_output_XY$model_output[[1]]$pred pred_obs_Y<-as.data.frame(cbind(predicted_all_Y,observed_all_Y)) # colnames(pred_obs_Y)<-c('Predicted Y','Observed Y') head(pred_obs_Y) ``` For the case of $Y(t=70)$ we saw above, we find: ```{r predicted y} pred_obs_Y[(predictor-2),] ``` Note that the index of the predicted point is off by two. That's because the first $E^* = 2$ points cannot be predicted, so the prediction of $Y(t=70)$ corresponds to the value at index 68 of the table. Below is the plot for all predictions of $Y$ (mapped from $X$: X_xmap_Y) against the real observations, plus the Pearson's correlation coefficient. The red dot represents the predicted and observed $Y(t=70)$ used in the example above. ```{r plot_obs_pred_MX_MY} fit_YX<-lm(predicted_all_Y ~ observed_all_Y) plot_range <- range(c(observed_all_Y, predicted_all_Y), na.rm = TRUE) plot(observed_all_Y,predicted_all_Y, xlim = plot_range, ylim = plot_range, xlab = "Observed Y", ylab = "Predicted Y") #abline(fit_YX$coefficients[1],fit_YX$coefficients[2]) abline(0,1,lty=2) legend(x = "bottomright", legend = paste('r =',round(cor(observed_all_Y, predicted_all_Y)*100)/100),inset = 0.02,col = 'black') observed_pred_Y<-observed_all_Y[predictor-2] predicted_pred_Y<-predicted_all_Y[predictor-2] points(observed_pred_Y,predicted_pred_Y,col='red',pch=16,cex=1.2) ``` ## Cross-mapping from $M_Y$ to $M_X$ (Y_xmap_X) Similarly to the previous mapping, we can now do the opposite: use $Y$ to predict values of $X$. Now, we want to predict the value of $X(t=70)$: ```{r MX Y_xmap_X_code} print(X[predictor]) ``` We obtain the indexes of the $E^*+1$ nearest neighbors of the given predictor. Since we are now cross mapping from $M_Y$ to $M_X$, the simplex_My is generated directly from the shadow manifold $M_Y$. Again, we first create a matrix of distances among the states of $M_Y$ and then we select the $E^*+1$ nearest neighbors. ```{r neighbors_y} dist.matrix_Y <- as.matrix(dist(Shadow_MY, upper=TRUE)) neigb_Y <- order(dist.matrix_Y[predictor,])[2:4] neigh_Y_print<-c(neigb_Y) print(paste('simplex_My:',list(neigh_Y_print))) ``` The following plot presents $M_Y$ with the predictor (red dot) and respective simplex_My (blue dots). ```{r MY Y_xmap_X, echo=FALSE, fig.align='center'} p_MY_Y_to_X <- plot_ly(Shadow_MY, x=~Y, y=~Y_1, marker=(list(color=grey)), opacity=0.1) %>% layout(xaxis = list(title = 'Y'),yaxis = list(title = 'Y(t-1)'),title='My') %>% add_markers(text = paste("time =",1:length(X)), showlegend = FALSE) %>% add_trace( x = ~Y, y=~Y_1,data=Shadow_MY[c(predictor,neigb_Y),],opacity=1,marker=list(color=c("red","blue","blue","blue")),type="scatter", mode="markers",text = paste("time =",c(predictor,neigb_Y))) p_MY_Y_to_X ``` Next, we map the simplex_My in $M_X$, creating the simplex_Mx. Analogously, simplex_Mx has the same indexes as simplex_My. The figure below shows $M_X$ and simplex_Mx (green dots). ```{r MX_Y_xmap_X, echo=FALSE} p_MX_Y_to_X <- plot_ly(Shadow_MX, x=~X, y=~X_1, marker=(list(color=grey)), opacity=0.1) %>% layout(xaxis = list(title = 'X'),yaxis = list(title = 'X (t-1)'),title='Mx') %>% add_markers(text = paste("time =",1:length(Y)), showlegend = FALSE) %>% add_trace( x = ~X, y=~X_1,data=Shadow_MX[c(neigb_Y),],opacity=1,marker=list(color=c("green","green","green")),type="scatter", mode="markers",text = paste("time =",c(neigb_Y))) p_MX_Y_to_X ``` The simplex_Mx will then be used to estimate the predicted value of the predictor in $M_X$, obtaining the predicted value $\tilde{X}(t)$. ```{r estimating X from Y (Y_xmap_X)} lib<-lib <- c(1, NROW(Shadow_MXY)) block_lnlp_output_YX <- block_lnlp(Shadow_MXY, lib = lib, pred = lib, columns = c("Y", "Y_1"), target_column = "X", stats_only = FALSE, first_column_time = TRUE) observed_all_X <- block_lnlp_output_YX$model_output[[1]]$obs predicted_all_X <- block_lnlp_output_YX$model_output[[1]]$pred pred_obs_X<-as.data.frame(cbind(predicted_all_X,observed_all_X)) colnames(pred_obs_X)<-c('Predicted X','Observed X') head(pred_obs_X) ``` Below is the plot for all predictions of $X$ (mapped from $Y$: Y_xmap_X) against the real observations, plus the Pearson's correlation coefficient. The red dot represents the predictor (at $t=70$) used in the example above. ```{r plot_obs_pred_MY_MX} fit_XY<-lm(predicted_all_X ~ observed_all_X) plot_range <- range(c(observed_all_X, predicted_all_X), na.rm = TRUE) plot(observed_all_X, predicted_all_X, xlim = plot_range, ylim = plot_range, xlab = "Observed X", ylab = "Predicted X") #abline(fit_XY$coefficients[1],fit_XY$coefficients[2]) abline(0,1,lty=2) legend(x = "bottomright", legend = paste('r =',round(cor(observed_all_X,predicted_all_X)*100)/100), inset = 0.02,col = 'black') observed_pred_X<-observed_all_X[predictor-2] predicted_pred_X<-predicted_all_X[predictor-2] points(observed_pred_X,predicted_pred_X,col='red',pch=16,cex=1.2) ``` Note that the Pearson correlation of Y_xmap_X (r = 0.68) is different than the X_xmap_Y (r = -0.08). What is going on? ## But what causes what? The Pearson correlation of Y_xmap_X is much larger than the X_xmap_Y. When we used $Y$ to estimate (map) $X$, we obtained good predictions (r = 0.68). However, when we used $X$ to estimate (map) $Y$, the correlation between estimate and observed was quite low (r = -0.08). This means that the Y dynamics contains information about the dynamics of $X$, and therefore we can use $Y$ to estimate $X$. This means that $X$ somehow influences (causes!) $Y$. Indeed, since all our data is synthetic, we know this is true: we created the variable $X$ independently of $Y$, and $X$ was causing changes in $Y$. So when $X$ causes $Y$, the cross mapping skill from $M_Y$ to $M_X$ (Y_xmap_X) will generally gives us good results (high correlations), but *not* X_xmap_Y. It is *very* important to notice that there is an inverse relation between cross mapping and causality: if $X$ causes $Y$ ($X \rightarrow Y$), $Y$ cross map of $X$ (Y_xmap_X) shows a positive (large) correlation. ## Convergent Cross Mapping (CCM) Convergence means that the estimates from the cross-mapping improve as we increase the length of the time-series. The length L is the sample size used to construct a library. ```{r convergent} # cross map from X to Y X_xmap_Y<- ccm(XY, E = 2, lib_column = "X", target_column = "Y", lib_sizes = seq(10, 130, by = 10), num_samples = 100, random_libs = TRUE, replace = TRUE) # cross map from Y to X Y_xmap_X<- ccm(XY, E = 2, lib_column = "Y", target_column = "X", lib_sizes = seq(10, 130, by = 10), num_samples = 100, random_libs = TRUE, replace = TRUE) # mean values X_xmap_Y_means <- ccm_means(X_xmap_Y) Y_xmap_X_means <- ccm_means(Y_xmap_X) # plot graphs plot(X_xmap_Y_means$lib_size, pmax(0, X_xmap_Y_means$rho), type = "l", col = "red", main='Two Species', xlab = "Library Size (L)", ylab = "Cross Map Skill (Pearson rho)", ylim = c(0,1)) lines(Y_xmap_X_means$lib_size, pmax(0, Y_xmap_X_means$rho), col = "blue") legend(x = "topleft", legend = c("X_xmap_Y", "Y_xmap_X"), col = c("red", "blue"), cex=1.1,lwd=2, inset = 0.02) ``` This means that the more data you have (*i.e.* the larger L), the more trajectories you have to infer the attractor, resulting thus in closer nearest neighbors and less estimation error (*i.e.* a higher correlation coefficient between estimated and observed). That is, if there is causation, we expect to see the convergence of the cross mapping (CCM). # Exercises 1. What happens when we invert the cause-effect relationship? Change the model so that the Y variable causes changes in X, but not the other way around. 2. What if there is no interaction between X and Y? 3. What about both variables interacting with each other (X causing X and Y causing X)? 4. Identifying causal links within systems is very important for effective policy and management recommendations on *e.g.* climate, ecosystem services, epidemiology and financial regulation. In the following exercise, use CCM to identify causality among sardine, anchovy, and sea surface temperature measured at Scripps Pier and Newport Pier, California. ```{r ex4} data(sardine_anchovy_sst) head(sardine_anchovy_sst) ``` # Learn more * Sugihara G, May R, Ye H, Hsieh C-h, Deyle E, Fogarty M, Munch S. 2012. Detecting Causality in Complex Ecosystems. Science 338:496–500. # Glossary ```{r include glossary, child = '../glossary.md'} ``` [^1]: Taken from [rEDM vignette](https://cran.r-project.org/web/packages/rEDM/vignettes/rEDM-tutorial.html) section (*"Causality Inference and Cross Mapping"*). [^2]: This section is adapted from: Sugihara et al., Detecting Causality in Complex Ecosystems (Supplementary Materials), Science, 2012.