{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Dynamic Models\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "\n", "### Preliminaries\n", "\n", "- Goal \n", " - Introduction to dynamic (=temporal) Latent Variable Models, including the Hidden Markov Model and Kalman filter. \n", "- Materials\n", " - Mandatory\n", " - These lecture notes\n", " - Optional \n", " - Bishop pp.605-615 on Hidden Markov Models\n", " - Bishop pp.635-641 on Kalman filters\n", " - Faragher (2012), [Understanding the Basis of the Kalman Filter](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Faragher-2012-Understanding-the-Basis-of-the-Kalman-Filter.pdf)\n", " - Minka (1999), [From Hidden Markov Models to Linear Dynamical Systems](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Minka-1999-from-HMM-to-LDS.pdf)\n", " " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Example Problem\n", "\n", "- We consider a one-dimensional cart position tracking problem, see [Faragher 2012](https://github.com/bertdv/BMLIP/blob/master/lessons/notebooks/files/Faragher-2012-Understanding-the-Basis-of-the-Kalman-Filter.pdf). \n", "\n", "- The hidden states are the position $z_t$ and velocity $\\dot z_t$. We can apply an external acceleration/breaking force $u_t$. (Noisy) observations are represented by $x_t$. \n", "\n", "- The equations of motions are given by\n", "\n", "$$\\begin{align}\n", "\\begin{bmatrix} z_t \\\\ \\dot{z_t}\\end{bmatrix} &= \\begin{bmatrix} 1 & \\Delta t \\\\ 0 & 1\\end{bmatrix} \\begin{bmatrix} z_{t-1} \\\\ \\dot z_{t-1}\\end{bmatrix} + \\begin{bmatrix} (\\Delta t)^2/2 \\\\ \\Delta t\\end{bmatrix} u_t + \\mathcal{N}(0,\\Sigma_z) \\\\\n", "x_t &= \\begin{bmatrix} z_t \\\\ \\dot{z_t}\\end{bmatrix} + \\mathcal{N}(0,\\Sigma_x) \n", "\\end{align}$$\n", "\n", "- Task: Infer the position $z_t$ after 10 time steps. (Solution later in this lesson).\n", "\n", "
\n" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### Dynamical Models\n", "\n", "- In this lesson, we consider models where the sequence order of observations matters. \n", "\n", "- Consider the _ordered_ observation sequence $x^T \\triangleq \\left(x_1,x_2,\\ldots,x_T\\right)$.\n", " - (For brevity, in this lesson we use the notation $x_t^T$ to denote $(x_t,x_{t+1},\\ldots,x_T)$ and drop the subscript if $t=1$, so $x^T = x_1^T = \\left(x_1,x_2,\\ldots,x_T\\right)$)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- We wish to develop a generative model\n", " $$ p( x^T )$$\n", "that 'explains' the time series $x^T$. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- We cannot use the IID assumption $p( x^T ) = \\prod_t p(x_t )$. In general, we _can_ use the **chain rule** (a.k.a. **the general product rule**)\n", "\n", "$$\\begin{align*}\n", "p(x^T) &= p(x_T|x^{T-1}) \\,p(x^{T-1}) \\\\\n", " &= p(x_T|x^{T-1}) \\,p(x_{T-1}|x^{T-2}) \\cdots p(x_2|x_1)\\,p(x_1) \\\\\n", " &= p(x_1)\\prod_{t=2}^T p(x_t\\,|\\,x^{t-1})\n", "\\end{align*}$$" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Generally, we will want to limit the depth of dependencies on previous observations. For example, a $K$th-order linear **Auto-Regressive** (AR) model that is given by\n", "$$\\begin{align*}\n", " p(x_t\\,|\\,x^{t-1}) = \\mathcal{N}\\left(x_t \\,\\middle|\\, \\sum_{k=1}^K a_k x_{t-k}\\,,\\sigma^2\\,\\right) \n", "\\end{align*}$$\n", "limits the dependencies to the past $K$ samples." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### State-space Models\n", "\n", "- A limitation of AR models is that they need a lot of parameters in order to create a flexible model. E.g., if $x_t$ is an $M$-dimensional discrete variable, then a $K$th-order AR model will have about $M^{K}$ parameters. " ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- Similar to our work on Gaussian Mixture models, we can create a flexible dynamic system by introducing _latent_ (unobserved) variables $z^T \\triangleq \\left(z_1,z_2,\\dots,z_T\\right)$ (one $z_t$ for each observation $x_t$). In dynamic systems, $z_t$ are called _state variables_." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- A **state space model** is a particular latent variable dynamical model defined by\n", "$$\\begin{align*}\n", " p(x^T,z^T) &= \\underbrace{p(z_1)}_{\\text{initial state}} \\prod_{t=2}^T \\underbrace{p(z_t\\,|\\,z_{t-1})}_{\\text{state transitions}}\\,\\prod_{t=1}^T \\underbrace{p(x_t\\,|\\,z_t)}_{\\text{observations}}\n", "\\end{align*}$$\n", " - The condition $p(z_t\\,|\\,z^{t-1}) = p(z_t\\,|\\,z_{t-1})$ is called a $1$st-order Markov condition.\n", "\n", "" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "- The Forney-style factor graph for a state-space model:\n", "\n", "