{ "cells": [ { "cell_type": "markdown", "metadata": { "colab_type": "text", "execution": {}, "id": "view-in-github" }, "source": [ " " ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "# Tutorial 1: Geometric view of data\n", "\n", "**Week 1, Day 4: Dimensionality Reduction**\n", "\n", "**By Neuromatch Academy**\n", "\n", "__Content creators:__ Alex Cayco Gajic, John Murray\n", "\n", "__Content reviewers:__ Roozbeh Farhoudi, Matt Krause, Spiros Chavlis, Richard Gao, Michael Waskom, Siddharth Suresh, Natalie Schaworonkow, Ella Batty" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Tutorial Objectives\n", "\n", "*Estimated timing of tutorial: 50 minutes*\n", "\n", "In this notebook we'll explore how multivariate data can be represented in different orthonormal bases. This will build the intuition needed to understand PCA in the following tutorial.\n", "\n", "Overview:\n", " - Generate correlated multivariate data.\n", " - Define an arbitrary orthonormal basis.\n", " - Project the data onto the new basis." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Tutorial slides\n", "\n", "# @markdown These are the slides for the videos in all tutorials today\n", "from IPython.display import IFrame\n", "IFrame(src=f\"https://mfr.ca-1.osf.io/render?url=https://osf.io/kaq2x/?direct%26mode=render%26action=download%26mode=render\", width=854, height=480)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Video 1: Geometric view of data\n", "from ipywidgets import widgets\n", "\n", "out2 = widgets.Output()\n", "with out2:\n", "    from IPython.display import IFrame\n", "    class BiliVideo(IFrame):\n", "        def __init__(self, id, page=1, width=400, height=300, **kwargs):\n", "            self.id = id\n", "            src = 'https://player.bilibili.com/player.html?bvid={0}&page={1}'.format(id, page)\n", "            super(BiliVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "    video = BiliVideo(id=\"BV1Af4y1R78w\", width=854, height=480, fs=1)\n", "    print('Video available at https://www.bilibili.com/video/{0}'.format(video.id))\n", "    display(video)\n", "\n", "out1 = widgets.Output()\n", "with out1:\n", "    from IPython.display import YouTubeVideo\n", "    video = YouTubeVideo(id=\"THu9yHnpq9I\", width=854, height=480, fs=1, rel=0)\n", "    print('Video available at https://youtube.com/watch?v=' + video.id)\n", "    display(video)\n", "\n", "out = widgets.Tab([out1, out2])\n", "out.set_title(0, 'YouTube')\n", "out.set_title(1, 'Bilibili')\n", "\n", "display(out)" ] },
{ "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Setup" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "execution": {} }, "outputs": [], "source": [ "# Imports\n", "import numpy as np\n", "import matplotlib.pyplot as plt" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Figure Settings\n", "import ipywidgets as widgets # interactive display\n", "%config InlineBackend.figure_format = 'retina'\n", "plt.style.use(\"https://raw.githubusercontent.com/NeuromatchAcademy/course-content/main/nma.mplstyle\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Plotting Functions\n", "\n", "def plot_data(X):\n", " \"\"\"\n", " Plots bivariate data. Includes a plot of each random variable, and a scatter\n", " plot of their joint activity. 
The titles indicate the sample variances and the sample correlation\n", " calculated from the data.\n", "\n", " Args:\n", " X (numpy array of floats) : Data matrix; each column corresponds to a\n", " different random variable\n", "\n", " Returns:\n", " Nothing.\n", " \"\"\"\n", "\n", " fig = plt.figure(figsize=[8, 4])\n", " gs = fig.add_gridspec(2, 2)\n", " ax1 = fig.add_subplot(gs[0, 0])\n", " ax1.plot(X[:, 0], color='k')\n", " plt.ylabel('Neuron 1')\n", " plt.title('Sample var 1: {:.1f}'.format(np.var(X[:, 0])))\n", " ax1.set_xticklabels([])\n", " ax2 = fig.add_subplot(gs[1, 0])\n", " ax2.plot(X[:, 1], color='k')\n", " plt.xlabel('Sample number')\n", " plt.ylabel('Neuron 2')\n", " plt.title('Sample var 2: {:.1f}'.format(np.var(X[:, 1])))\n", " ax3 = fig.add_subplot(gs[:, 1])\n", " ax3.plot(X[:, 0], X[:, 1], '.', markerfacecolor=[.5, .5, .5],\n", " markeredgewidth=0)\n", " ax3.axis('equal')\n", " plt.xlabel('Neuron 1 activity')\n", " plt.ylabel('Neuron 2 activity')\n", " plt.title('Sample corr: {:.1f}'.format(np.corrcoef(X[:, 0], X[:, 1])[0, 1]))\n", " plt.show()\n", "\n", "\n", "def plot_basis_vectors(X, W):\n", " \"\"\"\n", " Plots bivariate data as well as new basis vectors.\n", "\n", " Args:\n", " X (numpy array of floats) : Data matrix; each column corresponds to a\n", " different random variable\n", " W (numpy array of floats) : Square matrix representing new orthonormal\n", " basis; each column represents a basis vector\n", "\n", " Returns:\n", " Nothing.\n", " \"\"\"\n", "\n", " plt.figure(figsize=[4, 4])\n", " plt.plot(X[:, 0], X[:, 1], '.', color=[.5, .5, .5], label='Data')\n", " plt.axis('equal')\n", " plt.xlabel('Neuron 1 activity')\n", " plt.ylabel('Neuron 2 activity')\n", " plt.plot([0, W[0, 0]], [0, W[1, 0]], color='r', linewidth=3,\n", " label='Basis vector 1')\n", " plt.plot([0, W[0, 1]], [0, W[1, 1]], color='b', linewidth=3,\n", " label='Basis vector 2')\n", " plt.legend()\n", " plt.show()\n", "\n", "\n", "def plot_data_new_basis(Y):\n", " \"\"\"\n", " Plots bivariate data 
after transformation to a new basis.\n", " Similar to plot_data but with colors corresponding to projections onto\n", " basis 1 (red) and basis 2 (blue). The titles indicate the sample variances\n", " and the sample correlation calculated from the data.\n", "\n", " Args:\n", " Y (numpy array of floats): Data matrix in new basis; each column\n", " corresponds to a different random variable\n", "\n", " Returns:\n", " Nothing.\n", " \"\"\"\n", " fig = plt.figure(figsize=[8, 4])\n", " gs = fig.add_gridspec(2, 2)\n", " ax1 = fig.add_subplot(gs[0, 0])\n", " ax1.plot(Y[:, 0], 'r')\n", " plt.ylabel('Projection \\n basis vector 1')\n", " plt.title('Sample var 1: {:.1f}'.format(np.var(Y[:, 0])))\n", " ax1.set_xticklabels([])\n", " ax2 = fig.add_subplot(gs[1, 0])\n", " ax2.plot(Y[:, 1], 'b')\n", " plt.xlabel('Sample number')\n", " plt.ylabel('Projection \\n basis vector 2')\n", " plt.title('Sample var 2: {:.1f}'.format(np.var(Y[:, 1])))\n", " ax3 = fig.add_subplot(gs[:, 1])\n", " ax3.plot(Y[:, 0], Y[:, 1], '.', color=[.5, .5, .5])\n", " ax3.axis('equal')\n", " plt.xlabel('Projection basis vector 1')\n", " plt.ylabel('Projection basis vector 2')\n", " plt.title('Sample corr: {:.1f}'.format(np.corrcoef(Y[:, 0], Y[:, 1])[0, 1]))\n", " plt.show()" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "---\n", "# Section 1: Generate correlated multivariate data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": { "cellView": "form", "execution": {} }, "outputs": [], "source": [ "# @title Video 2: Multivariate data\n", "from ipywidgets import widgets\n", "\n", "out2 = widgets.Output()\n", "with out2:\n", "    from IPython.display import IFrame\n", "    class BiliVideo(IFrame):\n", "        def __init__(self, id, page=1, width=400, height=300, **kwargs):\n", "            self.id = id\n", "            src = 'https://player.bilibili.com/player.html?bvid={0}&page={1}'.format(id, page)\n", "            super(BiliVideo, self).__init__(src, width, height, **kwargs)\n", "\n", "    video = BiliVideo(id=\"BV1xz4y1D7ES\", width=854, height=480, fs=1)\n", "    print('Video available at https://www.bilibili.com/video/{0}'.format(video.id))\n", "    display(video)\n", "\n", "out1 = widgets.Output()\n", "with out1:\n", "    from IPython.display import YouTubeVideo\n", "    video = YouTubeVideo(id=\"jcTq2PgU5Vw\", width=854, height=480, fs=1, rel=0)\n", "    print('Video available at https://youtube.com/watch?v=' + video.id)\n", "    display(video)\n", "\n", "out = widgets.Tab([out1, out2])\n", "out.set_title(0, 'YouTube')\n", "out.set_title(1, 'Bilibili')\n", "\n", "display(out)" ] }, { "cell_type": "markdown", "metadata": { "execution": {} }, "source": [ "This video describes the covariance matrix and the multivariate normal distribution.\n", "\n", "