{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ "\"Open   \"Open" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 2: Correlations\n", "\n", "**Week 3, Day 5: Network Causality**\n", "\n", "**By Neuromatch Academy**\n", "\n", "**Content creators**: Ari Benjamin, Tony Liu, Konrad Kording\n", "\n", "**Content reviewers**: Mike X Cohen, Madineh Sarvestani, Yoni Friedman, Ella Batty, Michael Waskom\n", "\n", "**Production editors:** Gagana B, Spiros Chavlis" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial objectives\n", "\n", "*Estimated timing of tutorial: 45 min*\n", "\n", "This is Tutorial 2 on our day of examining causality. Below is the high level outline of what we'll cover today, with the sections we will focus on in this tutorial in bold:\n", "\n", "1. Master definitions of causality\n", "2. Understand that estimating causality is possible\n", "3. Learn 4 different methods and understand when they fail\n", " - perturbations\n", " - **correlations**\n", " - simultaneous fitting/regression\n", " - instrumental variables\n", "\n", "
\n", "\n", "**Tutorial 2 objectives**\n", "\n", "In Tutorial 1, we implemented and explored the dynamical system of neurons we will be working with throughout all of the tutorials today. We also learned about the \"gold standard\" of measuring causal effects through random perturbations. As random perturbations are often not possible, we will now turn to alternative methods to attempt to measure causality. We will:\n", "\n", "- Learn how to estimate connectivity from observations assuming **correlations approximate causation**\n", "- Show that this only works when the network is small\n", "\n", "
\n", "\n", "**Tutorial 2 setting**\n", "\n", "Often, we can't force neural activities or brain areas to be on or off. We just have to observe. Maybe we can get the correlation between two nodes -- is that good enough? The question we ask in this tutorial is **when is correlation a \"good enough\" substitute for causation?**\n", "\n", "The answer is not \"never\", actually, but \"sometimes\"." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Tutorial slides\n", "# @markdown These are the slides for the videos in all tutorials today\n", "from IPython.display import IFrame\n", "link_id = \"gp4m9\"\n", "print(f\"If you want to download the slides: https://osf.io/download/{link_id}/\")\n", "IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/{link_id}/?direct%26mode=render%26action=download%26mode=render\", width=854, height=480)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Install and import feedback gadget\n", "\n", "!pip3 install vibecheck datatops --quiet\n", "\n", "from vibecheck import DatatopsContentReviewContainer\n", "def content_review(notebook_section: str):\n", " return DatatopsContentReviewContainer(\n", " \"\", # No text prompt\n", " notebook_section,\n", " {\n", " \"url\": \"https://pmyvdlilci.execute-api.us-east-1.amazonaws.com/klab\",\n", " \"name\": \"neuromatch_cn\",\n", " \"user_key\": \"y1x3mpx5\",\n", " },\n", " ).render()\n", "\n", "\n", "feedback_prefix = \"W3D5_T2\"" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "both", "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Figure Settings\n", "import logging\n", "logging.getLogger('matplotlib.font_manager').disabled = True\n", "\n", "import ipywidgets as widgets # interactive display\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Plotting Functions\n", "\n", "def see_neurons(A, ax, show=False):\n", " \"\"\"\n", " Visualizes the connectivity matrix.\n", "\n", " Args:\n", " A (np.ndarray): the connectivity matrix of shape (n_neurons, n_neurons)\n", " ax (plt.axis): the matplotlib axis to display on\n", "\n", " Returns:\n", " Nothing, but visualizes A.\n", " \"\"\"\n", " A = A.T # make up for opposite connectivity\n", " n = len(A)\n", " ax.set_aspect('equal')\n", " thetas = np.linspace(0, np.pi * 2, n, endpoint=False)\n", " x, y = np.cos(thetas), np.sin(thetas),\n", " ax.scatter(x, y, c='k', s=150)\n", " A = A / A.max()\n", " for i in range(n):\n", " for j in range(n):\n", " if A[i, j] > 0:\n", " ax.arrow(x[i], y[i], x[j] - x[i], y[j] - y[i],\n", " color='k', alpha=A[i, j], head_width=.15,\n", " width = A[i,j] / 25, shape='right',\n", " length_includes_head=True)\n", " ax.axis('off')\n", " if show:\n", " plt.show()\n", "\n", "\n", "def plot_estimation_quality_vs_n_neurons(number_of_neurons):\n", " \"\"\"\n", " A wrapper function that 
calculates correlation between true and estimated connectivity\n", " matrices for each number of neurons and plots\n", "\n", " Args:\n", " number_of_neurons (list): list of different number of neurons for modeling system\n", " corr_func (function): Function for computing correlation\n", " \"\"\"\n", " corr_data = np.zeros((n_trials, len(number_of_neurons)))\n", " for trial in range(n_trials):\n", " print(f\"simulating trial {trial + 1} of {n_trials}\")\n", " for j, size in enumerate(number_of_neurons):\n", " corr = get_sys_corr(size, timesteps, trial)\n", " corr_data[trial, j] = corr\n", "\n", " corr_mean = corr_data.mean(axis=0)\n", " corr_std = corr_data.std(axis=0)\n", "\n", " plt.figure()\n", " plt.plot(number_of_neurons, corr_mean)\n", " plt.fill_between(number_of_neurons, corr_mean - corr_std,\n", " corr_mean + corr_std, alpha=.2)\n", " plt.xlabel(\"Number of neurons\")\n", " plt.ylabel(\"Correlation\")\n", " plt.title(\"Similarity between A and R as a function of network size\")\n", " plt.show()\n", "\n", "\n", "def plot_connectivity_matrix(A, ax=None, show=False):\n", " \"\"\"Plot the (weighted) connectivity matrix A as a heatmap\n", "\n", " Args:\n", " A (ndarray): connectivity matrix (n_neurons by n_neurons)\n", " ax: axis on which to display connectivity matrix\n", " \"\"\"\n", " if ax is None:\n", " ax = plt.gca()\n", " lim = np.abs(A).max()\n", " im = ax.imshow(A, vmin=-lim, vmax=lim, cmap=\"coolwarm\")\n", " ax.tick_params(labelsize=10)\n", " ax.xaxis.label.set_size(15)\n", " ax.yaxis.label.set_size(15)\n", " cbar = ax.figure.colorbar(im, ax=ax, ticks=[0], shrink=.7)\n", " cbar.ax.set_ylabel(\"Connectivity Strength\", rotation=90,\n", " labelpad=20, va=\"bottom\")\n", " ax.set(xlabel=\"Connectivity from\", ylabel=\"Connectivity to\")\n", " if show:\n", " plt.show()\n", "\n", "\n", "def plot_true_vs_estimated_connectivity(estimated_connectivity,\n", " true_connectivity,\n", " selected_neuron=None,\n", " correlation=None):\n", " \"\"\"Visualize true vs estimated connectivity matrices\n", "\n", " Args:\n", " estimated_connectivity (ndarray): estimated connectivity (n_neurons by n_neurons)\n", " true_connectivity (ndarray): ground-truth connectivity (n_neurons by n_neurons)\n", " correlation (float): correlation between true and estimated connectivity\n", " selected_neuron (int or None): None if plotting all connectivity, otherwise connectivity\n", " from selected_neuron will be shown\n", "\n", " \"\"\"\n", " fig, axs = plt.subplots(1, 2, figsize=(10, 5))\n", "\n", " if selected_neuron is not None:\n", " plot_connectivity_matrix(true_connectivity[:, [selected_neuron]], ax=axs[0])\n", " plot_connectivity_matrix(np.expand_dims(estimated_connectivity, axis=1),\n", " ax=axs[1])\n", " axs[0].set_xticks([0])\n", " axs[1].set_xticks([0])\n", " axs[0].set_xticklabels([selected_neuron])\n", " axs[1].set_xticklabels([selected_neuron])\n", " else:\n", " plot_connectivity_matrix(true_connectivity, ax=axs[0])\n", " plot_connectivity_matrix(estimated_connectivity, ax=axs[1])\n", "\n", " axs[0].set(title=\"True connectivity matrix A\")\n", " axs[1].set(title=\"Estimated connectivity matrix R\")\n", " if correlation:\n", " fig.suptitle(f\"Correlation between A and R: {correlation:.3f}\")\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Helper Functions\n", "\n", "def sigmoid(x):\n", " \"\"\"\n", " Compute sigmoid nonlinearity element-wise on x.\n", "\n", " Args:\n", " x (np.ndarray): 
the numpy data array we want to transform\n", " Returns:\n", " (np.ndarray): x with sigmoid nonlinearity applied\n", " \"\"\"\n", " return 1 / (1 + np.exp(-x))\n", "\n", "\n", "def create_connectivity(n_neurons, random_state=42, p=0.9):\n", " \"\"\"\n", " Generate our nxn causal connectivity matrix.\n", "\n", " Args:\n", " n_neurons (int): the number of neurons in our system.\n", " random_state (int): random seed for reproducibility\n", " p (float): probability that any given connection is zero (the sparsity)\n", "\n", " Returns:\n", " A (np.ndarray): the causal connectivity matrix; with the default p=0.9, roughly 10% of entries are non-zero\n", " \"\"\"\n", " np.random.seed(random_state)\n", " A_0 = np.random.choice([0, 1], size=(n_neurons, n_neurons), p=[p, 1 - p])\n", "\n", " # set the timescale of the dynamical system to about 100 steps\n", " _, s_vals, _ = np.linalg.svd(A_0)\n", " if s_vals[0] != 0 and not np.isnan(s_vals[0]):\n", " A = A_0 / (1.01 * s_vals[0])\n", " else:\n", " eps = 1e-12\n", " A = eps*np.ones_like(A_0) # if denominator is zero, set A to a small value\n", "\n", " return A\n", "\n", "\n", "def simulate_neurons(A, timesteps, random_state=42):\n", " \"\"\"\n", " Simulates a dynamical system for the specified number of neurons and timesteps.\n", "\n", " Args:\n", " A (np.array): the connectivity matrix\n", " timesteps (int): the number of timesteps to simulate our system.\n", " random_state (int): random seed for reproducibility\n", "\n", " Returns:\n", " X (np.ndarray): the simulated activity, of shape (n_neurons, timesteps).\n", " \"\"\"\n", " np.random.seed(random_state)\n", " n_neurons = len(A)\n", " X = np.zeros((n_neurons, timesteps))\n", "\n", " for t in range(timesteps - 1):\n", " epsilon = np.random.multivariate_normal(np.zeros(n_neurons),\n", " np.eye(n_neurons))\n", " X[:, t + 1] = sigmoid(A.dot(X[:, t]) + epsilon)\n", "\n", " return X\n", "\n", "\n", "def get_sys_corr(n_neurons, timesteps, random_state=42, neuron_idx=None):\n", " \"\"\"\n", " A wrapper function for our correlation calculations between A and R.\n", "\n", " Args:\n", " n_neurons (int): the number of neurons in our system.\n", " timesteps (int): the number of timesteps to simulate our system.\n", " random_state (int): seed for reproducibility\n", " neuron_idx (int): optionally provide a neuron idx to slice out\n", "\n", " Returns:\n", " A single float correlation value representing the similarity between A and R\n", " \"\"\"\n", " A = create_connectivity(n_neurons, random_state)\n", " X = simulate_neurons(A, timesteps)\n", " R = correlation_for_all_neurons(X)\n", "\n", " return np.corrcoef(A.flatten(), R.flatten())[0, 1]" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The helper functions defined above are:\n", "\n", "- `sigmoid`: computes sigmoid nonlinearity element-wise on input, from Tutorial 1.\n", "- `create_connectivity`: generates nxn causal connectivity matrix, from Tutorial 1.\n", "- `simulate_neurons`: simulates a dynamical system for the specified number of neurons and timesteps, from Tutorial 1.\n", "- `get_sys_corr`: a wrapper function for correlation calculations between $A$ and $R$."
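, "\n", "\n", "For orientation, here is a minimal usage sketch of how the first two helpers compose (the specific numbers are just for illustration; note that `get_sys_corr` additionally relies on `correlation_for_all_neurons`, which is defined in Section 1):\n", "\n", "```python\n", "A = create_connectivity(n_neurons=6)     # 6 x 6 causal connectivity, about 10% of entries non-zero\n", "X = simulate_neurons(A, timesteps=5000)  # simulated activity, shape (6, 5000)\n", "print(X.shape)\n", "```"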
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: Small systems\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Video 1: Correlation vs causation\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'vjBO-S7KNPI'), ('Bilibili', 'BV1Ak4y1m7kk')]\n", "tab_contents = display_videos(video_ids, W=854, H=480)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Correlation_vs_causation_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Coding Exercise 1: Try to approximate causation with correlation\n", "\n", "In small systems, correlation can look like causation. Let's attempt to recover the true connectivity matrix ($A$) just by correlating the neural state at each timestep with the previous state: $C = \\vec{x_t} \\vec{x_{t+1}}^\\top$.\n", "\n", "Complete this function to estimate the connectivity matrix of a single neuron by calculating the correlation coefficients with every other neuron at the next timestep. That is, please correlate two vectors:\n", "1. The activity of a selected neuron at time $t$\n", "2. 
The activity of all other neurons at time $t+1$" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def compute_connectivity_from_single_neuron(X, selected_neuron):\n", " \"\"\"\n", " Computes the estimated connectivity from a single neuron to all neurons using correlations\n", "\n", " Args:\n", " X (ndarray): the matrix of activities\n", " selected_neuron (int): the index of the selected neuron\n", "\n", " Returns:\n", " estimated_connectivity (ndarray): estimated connectivity for the selected neuron, of shape (n_neurons,)\n", " \"\"\"\n", "\n", " # Extract the current activity of selected_neuron at time t\n", " current_activity = X[selected_neuron, :-1]\n", "\n", " # Extract the observed outcomes of all the neurons at the next timestep\n", " next_activity = X[:, 1:]\n", "\n", " # Initialize estimated connectivity matrix\n", " estimated_connectivity = np.zeros(n_neurons)\n", "\n", " # Loop through all neurons\n", " for neuron_idx in range(n_neurons):\n", "\n", " # Get the activity of neuron_idx\n", " this_output_activity = next_activity[neuron_idx]\n", "\n", " ########################################################################\n", " ## TODO: Estimate the neural correlations between\n", " ## this_output_activity and current_activity\n", " ## -------------------- ----------------\n", " ##\n", " ## Note that np.corrcoef returns the full correlation matrix; we want the\n", " ## top right corner, which we have already provided.\n", " ## Fill out the function and remove the error\n", " raise NotImplementedError('Compute neural correlations')\n", " ########################################################################\n", " # Compute correlation\n", " correlation = np.corrcoef(...)[0, 1]\n", "\n", " # Store this neuron's correlation\n", " estimated_connectivity[neuron_idx] = correlation\n", "\n", " return estimated_connectivity\n", "\n", "\n", "# Simulate a 6 neuron system for 5000 timesteps again.\n", "n_neurons = 6\n", "timesteps = 5000\n", "selected_neuron = 1\n", "\n", "# Invoke a helper function that generates our nxn causal connectivity matrix\n", "A = create_connectivity(n_neurons)\n", "\n", "# Invoke a helper function that simulates the neural activity\n", "X = simulate_neurons(A, timesteps)\n", "\n", "# Estimate connectivity\n", "estimated_connectivity = compute_connectivity_from_single_neuron(X, selected_neuron)\n", "\n", "# Visualize\n", "plot_true_vs_estimated_connectivity(estimated_connectivity, A, selected_neuron)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "def compute_connectivity_from_single_neuron(X, selected_neuron):\n", " \"\"\"\n", " Computes the estimated connectivity from a single neuron to all neurons using correlations\n", "\n", " Args:\n", " X (ndarray): the matrix of activities\n", " selected_neuron (int): the index of the selected neuron\n", "\n", " Returns:\n", " estimated_connectivity (ndarray): estimated connectivity for the selected neuron, of shape (n_neurons,)\n", " \"\"\"\n", "\n", " # Extract the current activity of selected_neuron at time t\n", " current_activity = X[selected_neuron, :-1]\n", "\n", " # Extract the observed outcomes of all the neurons at the next timestep\n", " next_activity = X[:, 1:]\n", "\n", " # Initialize estimated connectivity matrix\n", " estimated_connectivity = np.zeros(n_neurons)\n", "\n", " # Loop through all neurons\n", " for neuron_idx in range(n_neurons):\n", "\n", " # Get the activity of neuron_idx\n", " this_output_activity = 
next_activity[neuron_idx]\n", "\n", " # Compute correlation\n", " correlation = np.corrcoef(this_output_activity, current_activity)[0, 1]\n", "\n", " # Store this neuron's correlation\n", " estimated_connectivity[neuron_idx] = correlation\n", "\n", " return estimated_connectivity\n", "\n", "\n", "# Simulate a 6 neuron system for 5000 timesteps again.\n", "n_neurons = 6\n", "timesteps = 5000\n", "selected_neuron = 1\n", "\n", "# Invoke a helper function that generates our nxn causal connectivity matrix\n", "A = create_connectivity(n_neurons)\n", "\n", "# Invoke a helper function that simulates the neural activity\n", "X = simulate_neurons(A, timesteps)\n", "\n", "# Estimate connectivity\n", "estimated_connectivity = compute_connectivity_from_single_neuron(X, selected_neuron)\n", "\n", "# Visualize\n", "with plt.xkcd():\n", " plot_true_vs_estimated_connectivity(estimated_connectivity, A, selected_neuron)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Approximate_causation_with_correlation_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Hopefully you saw that it pretty much worked. We wrote a function that does what you just did but in matrix form, so it's a little faster. It also does all neurons at the same time (helper function `correlation_for_all_neurons`)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to get the helper function `correlation_for_all_neurons`\n", "\n", "def correlation_for_all_neurons(X):\n", " \"\"\"Computes the connectivity matrix for all neurons using correlations\n", "\n", " Args:\n", " X: the matrix of activities\n", "\n", " Returns:\n", " R (np.ndarray): estimated connectivity matrix of shape (n_neurons, n_neurons)\n", " \"\"\"\n", " n_neurons = len(X)\n", " S = np.concatenate([X[:, 1:], X[:, :-1]], axis=0)\n", " R = np.corrcoef(S)[:n_neurons, n_neurons:]\n", " return R" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to visualize full estimated vs true connectivity\n", "\n", "R = correlation_for_all_neurons(X)\n", "\n", "fig, axs = plt.subplots(2, 2, figsize=(10, 10))\n", "see_neurons(A, axs[0, 0])\n", "plot_connectivity_matrix(A, ax=axs[0, 1])\n", "axs[0, 1].set_title(\"True connectivity matrix A\")\n", "\n", "see_neurons(R, axs[1, 0])\n", "plot_connectivity_matrix(R, ax=axs[1, 1])\n", "axs[1, 1].set_title(\"Estimated connectivity matrix R\")\n", "\n", "fig.suptitle(\"True (A) and Estimated (R) connectivity matrices\")\n", "\n", "plt.tight_layout()\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "That pretty much worked too. Let's quantify how well it worked.\n", "\n", "We'll calculate the correlation coefficient between the true connectivity $A$ and the estimated connectivity $R$:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "print(f\"Correlation between A and R: {np.corrcoef(A.flatten(), R.flatten())[0, 1]:.5f}\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "It *appears* in our system that correlation captures causality."
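, "\n", "\n", "Concretely, each entry we computed here is\n", "\n", "$$R_{ij} = \\mathrm{corr}(x_i(t+1), x_j(t)),$$\n", "\n", "which we compare element-wise to $A_{ij}$, the causal weight from neuron $j$ onto neuron $i$ in the update $x(t+1) = \\sigma(A x(t) + \\epsilon)$ used by `simulate_neurons`."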
] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Video 2: Correlation ~ causation for small systems\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'eWLOnTUe9SM'), ('Bilibili', 'BV1XZ4y1u7FR')]\n", "tab_contents = display_videos(video_ids, W=854, H=480)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Correlation_causation_for_small_systems_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "**Video correction**: the connectivity graph plots and associated explanations in this and other videos show the wrong direction of connectivity (the arrows should be pointing the opposite direction). This has been fixed in the figures above." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 2: Large systems\n", "\n", "*Estimated timing to here from start of tutorial: 15 min*\n", "\n", "As our system becomes more complex however, correlation fails to capture causality." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.1: Failure of correlation in complex systems\n", "\n", "Let's jump to a much bigger system. Instead of 6 neurons, we will now use 100 neurons. How does the estimation quality of the connectivity matrix change?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to simulate large system, estimate connectivity matrix with correlation and return estimation quality\n", "\n", "# Simulate a 100 neuron system for 5000 timesteps.\n", "n_neurons = 100\n", "timesteps = 5000\n", "random_state = 42\n", "\n", "A = create_connectivity(n_neurons, random_state)\n", "X = simulate_neurons(A, timesteps)\n", "R = correlation_for_all_neurons(X)\n", "corr = np.corrcoef(A.flatten(), R.flatten())[0, 1]\n", "\n", "fig, axs = plt.subplots(1, 2, figsize=(12, 6))\n", "plot_connectivity_matrix(A, ax=axs[0])\n", "axs[0].set_title(\"True connectivity matrix A\")\n", "plot_connectivity_matrix(R, ax=axs[1])\n", "axs[1].set_title(\"Estimated connectivity matrix R\")\n", "fig.suptitle(f\"Correlation of A and R: {corr:.5f}\")\n", "fig.tight_layout()\n", "plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Video 3: Correlation vs causation in large systems\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'U4sV-7g8T08'), ('Bilibili', 'BV1uC4y1b76C')]\n", "tab_contents = display_videos(video_ids, W=854, H=480)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Correlation_causation_in_large_systems_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Section 2.2: Correlation as a function of network size\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 2.2.1: Connectivity estimation as a function of number of neurons\n", "\n", "Instead of looking at just a few neurons (6) or a lot of neurons (100), as above, we will now systematically vary 
the number of neurons and plot the resulting changes in correlation coefficient between the true and estimated connectivity matrices." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to enable the widget\n", "@widgets.interact(n_neurons=(6, 42, 3))\n", "def plot_corrs(n_neurons=6):\n", " fig, axs = plt.subplots(1, 2, figsize=(10, 5))\n", " timesteps = 2000\n", " random_state = 42\n", " A = create_connectivity(n_neurons, random_state)\n", " X = simulate_neurons(A, timesteps)\n", " R = correlation_for_all_neurons(X)\n", " corr = np.corrcoef(A.flatten(), R.flatten())[0, 1]\n", " plot_connectivity_matrix(A, ax=axs[0])\n", " plot_connectivity_matrix(R, ax=axs[1])\n", " axs[0].set_title(\"True connectivity matrix A\")\n", " axs[1].set_title(\"Estimated connectivity matrix R\")\n", " fig.suptitle(f\"Correlation of A and R: {corr:.2f}\")\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Of course there is some variability due to randomness in $A$. Let's average over a few trials and find the relationship." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to plot connectivity estimation as a function of network size\n", "\n", "n_trials = 5\n", "timesteps = 1000 # shorter timesteps for faster running time\n", "number_of_neurons = [5, 10, 25, 50, 100]\n", "plot_estimation_quality_vs_n_neurons(number_of_neurons)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Connectivity_estimation_as_a_function_of_number_of_neurons_Interactive_Demo\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Interactive Demo 2.2.2: Connectivity estimation as a function of the sparsity of $A$\n", "\n", "You may rightly wonder if correlation only fails for large systems for certain types of $A$. In this interactive demo, you can examine connectivity estimation as a function of the sparsity of $A$. Does connectivity estimation get better or worse with less sparsity?" 
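, "\n", "\n", "As a reminder of what the slider controls: inside the helper `create_connectivity`, the value passed here sets the probability that a given entry of the connectivity matrix is *zero* (before the matrix is rescaled by its largest singular value):\n", "\n", "```python\n", "A_0 = np.random.choice([0, 1], size=(n_neurons, n_neurons), p=[p, 1 - p])\n", "```\n", "\n", "So `sparsity=0.9` (the widget's default) means each possible connection is present with probability of roughly 0.1, and moving the slider toward smaller values makes $A$ denser."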
] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to enable the widget\n", "@widgets.interact(sparsity=(0.01, 0.99, .01))\n", "def plot_corrs(sparsity=0.9):\n", " fig, axs = plt.subplots(1, 2, figsize=(10, 5))\n", " timesteps = 2000\n", " random_state = 42\n", " n_neurons = 25\n", " A = create_connectivity(n_neurons, random_state, sparsity)\n", " X = simulate_neurons(A, timesteps)\n", " R = correlation_for_all_neurons(X)\n", " corr = np.corrcoef(A.flatten(), R.flatten())[0, 1]\n", " plot_connectivity_matrix(A, ax=axs[0])\n", " plot_connectivity_matrix(R, ax=axs[1])\n", " axs[0].set_title(\"True connectivity matrix A\")\n", " axs[1].set_title(\"Estimated connectivity matrix R\")\n", " fig.suptitle(f\"Correlation of A and R: {corr:.2f}\")\n", " plt.show()" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Connectivity_estimation_as_a_function_of_the_sparsity_A_Interactive_Demo\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 3: Reflecting on causality\n", "\n", "*Estimated timing to here from start of tutorial: 34 min*" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ " ## Think! 3: Reflecting on causality\n", "\n", "Please discuss the following questions within groups for around 10 minutes.\n", "\n", "* Think of a research paper you've written. Did it use previous causal knowledge (e.g. a mechanism), or ask a causal question? Try phrasing that causal relationship in the language of an intervention. (\"*If I were to force $A$ to be $A'$, $B$ would...*\")\n", "* What methods for interventions exist in your area of neuroscience?\n", "* Think about these common \"filler\" words. Do they imply a causal relationship, in its interventional definition? (*regulates, mediates, generates, modulates, shapes, underlies, produces, encodes, induces, enables, ensures, supports, promotes, determines*)\n", "* What dimensionality would you (very roughly) estimate the brain to be? Would you expect correlations between neurons to give you their connectivity? Why?" 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Reflecting_on_causality_Discussion\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Summary\n", "\n", "*Estimated timing of tutorial: 45 min*" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Video 4: Summary\n", "from ipywidgets import widgets\n", "from IPython.display import YouTubeVideo\n", "from IPython.display import IFrame\n", "from IPython.display import display\n", "\n", "\n", "class PlayVideo(IFrame):\n", " def __init__(self, id, source, page=1, width=400, height=300, **kwargs):\n", " self.id = id\n", " if source == 'Bilibili':\n", " src = f'https://player.bilibili.com/player.html?bvid={id}&page={page}'\n", " elif source == 'Osf':\n", " src = f'https://mfr.ca-1.osf.io/render?url=https://osf.io/download/{id}/?direct%26mode=render'\n", " super(PlayVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "\n", "def display_videos(video_ids, W=400, H=300, fs=1):\n", " tab_contents = []\n", " for i, video_id in enumerate(video_ids):\n", " out = widgets.Output()\n", " with out:\n", " if video_ids[i][0] == 'Youtube':\n", " video = YouTubeVideo(id=video_ids[i][1], width=W,\n", " height=H, fs=fs, rel=0)\n", " print(f'Video available at https://youtube.com/watch?v={video.id}')\n", " else:\n", " video = PlayVideo(id=video_ids[i][1], source=video_ids[i][0], width=W,\n", " height=H, fs=fs, autoplay=False)\n", " if video_ids[i][0] == 'Bilibili':\n", " print(f'Video available at https://www.bilibili.com/video/{video.id}')\n", " elif video_ids[i][0] == 'Osf':\n", " print(f'Video available at https://osf.io/{video.id}')\n", " display(video)\n", " tab_contents.append(out)\n", " return tab_contents\n", "\n", "\n", "video_ids = [('Youtube', 'hRyAN3yak_U'), ('Bilibili', 'BV1KK4y1x74w')]\n", "tab_contents = display_videos(video_ids, W=854, H=480)\n", "tabs = widgets.Tab()\n", "tabs.children = tab_contents\n", "for i in range(len(tab_contents)):\n", " tabs.set_title(i, video_ids[i][0])\n", "display(tabs)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Summary_Video\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "Now for the takeaway. We know that for large systems correlation ≠ causation. But what about when we coarsely sample the large system? Do we get better at estimating the *effective* causal interaction between groups (=average of weights) from the correlation between the groups?\n", "\n", "From our simulation above, the answer appears to be no: as the number of neurons per group increases, we don't see any significant increase in our ability to estimate the causal interaction between groups.\n", "\n", "If you have time after completing all tutorials and are interested, please see Bonus Section 1 below for a brief discussion of correlation as a similarity metric and Bonus Section 2 to learn about causality and correlation in low resolution systems." 
] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Bonus\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Bonus Section 1: Correlation as similarity metric\n", "\n", "We'd like to note here that though we make use of Pearson correlation coefficients throughout all of our tutorials to measure similarity between our estimated connectivity matrix $R$ and the ground truth connectivity $A$, this is not strictly correct usage of Pearson correlations as elements of $A$ are not normally distributed (they are in fact binary).\n", "\n", "We use Pearson correlations as they are quick and easy to compute within the Numpy framework and provide qualitatively similar results to other correlation metrics. Other ways to compute similarities:\n", "- [Spearman rank correlations](https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient), which does not require normally distributed data\n", "- dichotomizing our estimated matrix $R$ by the median and then running concordance analysis, such as computing [Cohen's kappa](https://en.wikipedia.org/wiki/Cohen%27s_kappa)\n", "\n", "Another thing to consider: all we want is some measure of the similarity between $A$ and $R$. Element-wise comparisons are one way to do this, but are there other ways you can think of? What about matrix similarities?" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Bonus Section 2: Low resolution systems\n", "\n", "A common situation in neuroscience is that you observe the *average* activity of large groups of neurons (think fMRI, EEG, LFP, etc.). We're going to simulate this effect, and ask if correlations work to recover the average causal effect of groups of neurons or areas.\n", "\n", "**Note on the quality of the analogy**: This is not intended as a perfect analogy of the brain or fMRI. Instead, we want to ask: *in a big system in which correlations fail to estimate causality, can you at least recover average connectivity between groups?*\n", "\n", "**Some brainy differences to remember**:\n", "We are assuming that the connectivity is random. In real brains, the neurons that are averaged have correlated input and output connectivities. This will improve the correspondence between correlations and causality for the average effect because the system has a lower true dimensionality. However, in real brains the system is also order of magnitudes larger than what we examine here, and the experimenter never has the fully-observed system." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "## Simulate a large system\n", "\n", "Execute the next cell to simulate a large system of 256 neurons for 10,000 timesteps - it will take a bit of time (around 4 mins) to finish so move on as it runs." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to simulate a large system\n", "\n", "n_neurons = 256\n", "timesteps = 10000\n", "random_state = 42\n", "\n", "A = create_connectivity(n_neurons, random_state)\n", "X = simulate_neurons(A, timesteps)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "### Bonus Section 1.1: Coarsely sample the system\n", "\n", "We don't observe this system. 
Instead, we observe the average activity of groups.\n" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "#### Bonus Coding Exercise 2.2: Compute average activity across groups and compare resulting connectivity to the truth\n", "\n", "Let's get a new matrix `coarse_X` that has 16 groups, each reflecting the average activity of 16 neurons (since there are 256 neurons in total).\n", "\n", "We will then define the true coarse connectivity as the average of the neuronal connection strengths between groups. We'll compute the correlation between our coarsely sampled groups to estimate the connectivity and compare with the true connectivity." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "def get_coarse_corr(n_groups, X):\n", " \"\"\"\n", " A wrapper function for our correlation calculations between coarsely sampled\n", " A and R.\n", "\n", " Args:\n", " n_groups (int): the number of groups. should divide the number of neurons evenly\n", " X: the simulated system\n", "\n", " Returns:\n", " A single float correlation value representing the similarity between A and R\n", " ndarray: estimated connectivity matrix\n", " ndarray: true connectivity matrix\n", " \"\"\"\n", "\n", " ############################################################################\n", " ## TODO: Insert your code here to get coarsely sampled X\n", " # Fill out function then remove\n", " raise NotImplementedError('Student exercise: please complete get_coarse_corr')\n", " ############################################################################\n", " coarse_X = ...\n", "\n", " # Make sure coarse_X is the right shape\n", " assert coarse_X.shape == (n_groups, timesteps)\n", "\n", " # Estimate connectivity from coarse system\n", " R = correlation_for_all_neurons(coarse_X)\n", "\n", " # Compute true coarse connectivity\n", " coarse_A = A.reshape(n_groups, n_neurons // n_groups, n_groups, n_neurons // n_groups).mean(3).mean(1)\n", "\n", " # Compute true vs estimated connectivity correlation\n", " corr = np.corrcoef(coarse_A.flatten(), R.flatten())[0, 1]\n", "\n", " return corr, R, coarse_A\n", "\n", "\n", "n_groups = 16\n", "\n", "# Call function\n", "corr, R, coarse_A = get_coarse_corr(n_groups, X)\n", "\n", "# Visualize\n", "plot_true_vs_estimated_connectivity(R, coarse_A, correlation=corr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# to_remove solution\n", "def get_coarse_corr(n_groups, X):\n", " \"\"\"\n", " A wrapper function for our correlation calculations between coarsely sampled\n", " A and R.\n", "\n", " Args:\n", " n_groups (int): the number of groups. 
should divide the number of neurons evenly\n", " X: the simulated system\n", "\n", " Returns:\n", " A single float correlation value representing the similarity between A and R\n", " ndarray: estimated connectivity matrix\n", " ndarray: true connectivity matrix\n", " \"\"\"\n", "\n", " coarse_X = X.reshape(n_groups, n_neurons // n_groups, timesteps).mean(1)\n", "\n", " # Make sure coarse_X is the right shape\n", " assert coarse_X.shape == (n_groups, timesteps)\n", "\n", " # Estimate connectivity from coarse system\n", " R = correlation_for_all_neurons(coarse_X)\n", "\n", " # Compute true coarse connectivity\n", " coarse_A = A.reshape(n_groups, n_neurons // n_groups, n_groups, n_neurons // n_groups).mean(3).mean(1)\n", "\n", " # Compute true vs estimated connectivity correlation\n", " corr = np.corrcoef(coarse_A.flatten(), R.flatten())[0, 1]\n", "\n", " return corr, R, coarse_A\n", "\n", "\n", "n_groups = 16\n", "\n", "# Call function\n", "corr, R, coarse_A = get_coarse_corr(n_groups, X)\n", "\n", "# Visualize\n", "with plt.xkcd():\n", " plot_true_vs_estimated_connectivity(R, coarse_A, correlation=corr)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Submit your feedback\n", "content_review(f\"{feedback_prefix}_Compute_average_activity_Bonus_Exercise\")" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "The correlation `corr` tells us how close the estimated coarse connectivity matrix is to the truth." ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We will now look at the estimation quality for different levels of coarseness when averaged over 3 trials." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @markdown Execute this cell to visualize the plot\n", "\n", "n_neurons = 128\n", "timesteps = 5000\n", "n_trials = 3\n", "groups = [2 ** i for i in range(2, int(np.log2(n_neurons)))]\n", "\n", "corr_data = np.zeros((n_trials, len(groups)))\n", "\n", "for trial in range(n_trials):\n", " print(f\"simulating trial {trial + 1} out of {n_trials}\")\n", " A = create_connectivity(n_neurons, random_state=trial)\n", " X = simulate_neurons(A, timesteps, random_state=trial)\n", " for j, n_groups in enumerate(groups):\n", " corr_data[trial, j], _, _ = get_coarse_corr(n_groups, X)\n", "\n", "corr_mean = corr_data.mean(axis=0)\n", "corr_std = corr_data.std(axis=0)\n", "\n", "plt.figure()\n", "plt.plot(np.divide(n_neurons, groups), corr_mean)\n", "plt.fill_between(np.divide(n_neurons, groups), corr_mean - corr_std,\n", " corr_mean + corr_std, alpha=.2)\n", "plt.xlabel(f\"Number of neurons per group ({n_neurons} total neurons)\")\n", "plt.ylabel(\"Correlation of estimated\\neffective connectivity\")\n", "plt.title(\"Connectivity estimation performance vs coarseness of sampling\")\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "We know that for large systems **correlation $\\neq$ causation**. Here, we have looked at what happens when we coarsely sample a large system. Do we get better at estimating the *effective* causal interaction between groups (i.e., average of weights) from the correlation between the groups?\n", "\n", "From our simulation above, the answer appears to be no: as the number of neurons per group increases, we don't see any significant increase in our ability to estimate the causal interaction between groups."
] } ], "metadata": { "colab": { "collapsed_sections": [], "include_colab_link": true, "name": "W3D5_Tutorial2", "provenance": [], "toc_visible": true }, "kernel": { "display_name": "Python 3", "language": "python", "name": "python3" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.17" } }, "nbformat": 4, "nbformat_minor": 0 }