{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# SYDE 556/750: Simulating Neurobiological Systems\n", "\n", "\n", "## Lecture 10: Learning" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## Learning\n", "\n", "- What do we mean by learning?\n", " - When we use an integrator to keep track of location, is that learning?\n", " - What about the learning used to complete a pattern in the Raven's Progressive Matrices task?\n", " - Neither of these require any connection weights to change in the model\n", " - But both allow future performance to be affected by past performance\n", " - I suggest the term 'adaptation' to capture all such future-affected-by-past phenomena\n", "\n", "- So, we'll stick with a simple definition of learning\n", " - Changing connection weights between groups of neurons" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Why might we want to change connection weights?\n", "- This is what traditional neural network approaches do\n", " - Change connection weights until it performs the desired task\n", " - Once it's doing the task, stop changing the weights\n", "- But we have a method for just solving for the optimal connection weights\n", " - So why bother learning?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### Why learning might be useful\n", "\n", "- We might not know the function at the beginning of the task\n", " - Example: a creature explores its environment and learns that eating red objects is bad, but eating green objects is good\n", " - what are the inputs and outputs here?\n", "- The desired function might change\n", " - Example: an ensemble whose input is a desired hand position, but the output is the muscle tension (or joint angles) needed to get there\n", " - why would this change?\n", "- The optimal weights we solve for might not be optimal\n", " - How could they not be optimal?\n", " - What assumptions are we making?" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### The simplest approach\n", "\n", "- What's the easiest way to deal with this, given what we know?\n", "- If we need new decoders\n", " - Let's solve for them while the model's running\n", " - Gather data to build up our $\\Gamma$ and $\\Upsilon$ matrices" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "\n", "- Example: eating red but not green objects\n", " - Decoder from state to $Q$ value (utility of action) for eating\n", " - State is some high-dimensional vector that includes the colour of what we're looking for\n", " - And probably some other things, like whether it's small enough to be eaten\n", " - Initially doesn't use colour to get output\n", " - But we might experience a few bad outcomes after red, and good after green\n", " - These become new $x$ samples, with corresponding $f(x)$ outputs\n", " - Gather a few, recompute decoder\n", " - Could even do this after every timestep\n", "- Example: converting hand position to muscle commands\n", " - Send random signals to muscles\n", " - Observe hand position\n", " - Use that to train decoders\n", "- Example: going from 
optimal to even more optimal\n", " - As the model runs, we gather $x$ values\n", " - Recompute decoder for those $x$ values\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "### What's wrong with this approach\n", "\n", "- Feels like cheating\n", "- Why?\n", "- Two kinds of problems:\n", " - Not biologically realistic\n", " - How are neurons supposed to do all this?\n", " - store data\n", " - solve decoders\n", " - timing\n", " - Computationally expensive\n", " - Even if we're not worried about realism" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Traditional neural networks\n", "\n", "- Traditionally, learning is the main method of constructing a model network\n", "- Usually incremental learning (gradient descent)\n", " - As each example arrives, shift the connection weights slightly based on that example\n", " - Don't have to consider all the data when making an update\n", "- Example: Perceptron learning (1957)\n", " - $\\Delta w_i = \\alpha(y_d - y)x_i$\n", "\n", "\n", "\n", "- Problems with the perceptron\n", " - Can't do all possible functions\n", " - Effectively just linear functions of $x$ (with a threshold; i.e. 
a linear classifier)\n", " - Is that a problem (X)OR not?\n", " \n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Backprop and the NEF\n", "\n", "- How are nonlinear functions included?\n", " - Multiple layers\n", " \n", "\n", " \n", "- But now a new rule is needed\n", " - Standard answer: backprop\n", " - Same as the perceptron rule for the output layer\n", " - Backprop adds: estimate the correct \"hidden layer\" input, and repeat\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- What would this be in NEF terms?\n", "- Remember that we're already fine with linear decoding\n", " - Encoders (and $\\alpha$ and $J^{bias}$) are the input layer of weights, decoders are the output layer\n", " - Note that in the NEF, we combine many of these layers together\n", "- We can just use the standard perceptron rule for decoders\n", " - As long as there are lots of neurons, and we've initialized them well with the desired intercepts, maximum rates, and encoders, we should be able to decode lots of functions\n", " - So, what might backprop add to that?\n", " - Think about encoders" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Biologically realistic perceptron learning\n", "\n", "- [(MacNeil & Eliasmith, 2011)](http://compneuro.uwaterloo.ca/publications/macneil2011.html) derive a simple, plausible learning rule starting with a delta rule\n", "- $E = 1/2 \\int (x-\\hat{x})^2 dx$\n", "- $\\partial E/\\partial d_i = -(x-\\hat{x})a_i$ (as usual for finding decoders)\n", "- So, to move down the gradient:\n", " - $\\Delta d_i = \\kappa (x - \\hat{x})a_i$ (NEF notation)\n", " - $\\Delta d_i = \\kappa (y_d - y)a_i$ (the standard perceptron/delta rule)\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- How do we make it realistic?\n", "- Decoders don't exist in the brain\n", " - 
Need weights\n", "- The NEF tells us:\n", " - $\\omega_{ij} = \\alpha_j d_i \\cdot e_j$\n", " - $\\Delta \\omega_{ij} = \\alpha_j \\kappa (y_d - y)a_i \\cdot e_j$\n", "- Let's write $(y_d - y)$ as $E$ (for error)\n", " - $\\Delta \\omega_{ij} = \\alpha_j \\kappa a_i E \\cdot e_j$\n", " - $\\Delta \\omega_{ij} = \\kappa a_i (\\alpha_j E \\cdot e_j)$\n", "- What's $\\alpha_j E \\cdot e_j$?\n", " - That's the current that this neuron would get if it had $E$ as an input\n", " - But we don't want this current to drive the neuron\n", " - Rather, we want it to change the weight\n", " - It's a *modulatory* input\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- This is the \"Prescribed Error Sensitivity\" (PES) rule\n", " - Any model in the NEF could use this instead of computing decoders\n", " - Requires some other neural group computing the error $E$\n", " - Used in Spaun for Q-value learning (reinforcement task)\n", " - Can even be used to learn circular convolution\n", " - Only demonstrated up to 3 dimensions in [(Bekolay et al, 2013)](http://compneuro.uwaterloo.ca/publications/bekolay2013.html)\n", " - Why not more? Patience.\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "- Is this realistic?\n", " - Local information only\n", " - Need an error signal\n", " - Does anything like this happen in the brain?\n", " - Yes\n", " - Retinal slip error is computed in the oculomotor system\n", " - Dopamine seems to act as a prediction-error signal\n", " - Weight changes are proportional to pre-synaptic and post-synaptic activity (a Hebbian rule)" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Populating the interactive namespace from numpy and matplotlib\n" ] }, { "data": { "text/html": [ "\n", "