{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "60318732",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from pylab import *"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6ba6bd25",
   "metadata": {},
   "source": [
    "Markov Chains\n",
    "============="
   ]
  },
  {
   "cell_type": "markdown",
   "id": "633f59d2",
   "metadata": {},
   "source": [
    "Assume we have a state space consisting of $N$ states $0...N-1$.\n",
    "\n",
    "A _Markov chain_ is an infinite sequence of random states $X_i$ such that\n",
    "\n",
    "$$ P(X_{n+1}= x | X_1=x_1 ... X_n=x_n) = P(X_{n+1} =x | X_n=x_n) $$\n",
    "\n",
    "That is, the probability of each subsequent state only depends on the previous state."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1f86e523",
   "metadata": {},
   "source": [
    "There are many variations:\n",
    "\n",
    "- usually, the probability is independent of $n$ (_time homogeneous_)\n",
    "- the probability can depend on a bounded number of previous states giving a higher order Markov chain\n",
    "- the set of states can be countably infinite\n",
    "- the state index $n$ can be continuous (continuous time Markov process)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "bf0f8e97",
   "metadata": {},
   "source": [
    "We represent time homogeneous Markov chains with a finite state space as a matrix of transition probabilities $M_{ij}$.\n",
    "\n",
    "Here, $P(j\\rightarrow i) = M_{ij}$ and $\\sum_i M_{ij} = 1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 50,
   "id": "561d1db0",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "M = rand(5,5)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 51,
   "id": "9e5b6238",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1.85666173,  0.66606344,  2.39313239,  2.00524954,  3.69764657])"
      ]
     },
     "execution_count": 51,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sum(M,axis=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 82,
   "id": "646f87a0",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 1.,  1.,  1.,  1.,  1.])"
      ]
     },
     "execution_count": 82,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M /= sum(M,axis=0)[newaxis,:]\n",
    "sum(M,axis=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "21371379",
   "metadata": {},
   "source": [
    "We call $M$ a _transition matrix_ or a _stochastic matrix_.\n",
    "\n",
    "(Note that the matrix is often written the other way around in the literature.)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7dbc4dcd",
   "metadata": {},
   "source": [
    "We often draw a transition diagram; this shows the state and the probabilities of transitions between them:\n",
    "\n",
    "![transitions](http://upload.wikimedia.org/wikipedia/commons/thumb/2/2b/Markovkate_01.svg/220px-Markovkate_01.svg.png)\n",
    "\n",
    "![transitions](http://upload.wikimedia.org/wikipedia/commons/thumb/4/47/MarkovChain1.png/200px-MarkovChain1.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "c41e181b",
   "metadata": {},
   "source": [
    "We can compute the probability of being in the next state given that we are in state $0$ with a matrix multiplication."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 83,
   "id": "46d4b479",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "array([ 0.01215787,  0.40654973,  0.16527515,  0.36084789,  0.05516936])"
      ]
     },
     "execution_count": 83,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "state = zeros(5)\n",
    "state[0] = 1\n",
    "dot (M,state)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cfd0bc71",
   "metadata": {},
   "source": [
    "Sampling from a Markov Chain\n",
    "============================="
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 53,
   "id": "009e73da",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "v = add.accumulate(M[:,0])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 54,
   "id": "e5d48573",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def binsearch(a,x,lo,hi):\n",
    "    while lo<hi:\n",
    "        mid = (lo+hi)//2\n",
    "        v = a[mid]\n",
    "        if v<x: lo = mid+1\n",
    "        elif v>x: hi = mid\n",
    "        else: return mid\n",
    "    assert lo==hi\n",
    "    return lo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 55,
   "id": "dbd8fd85",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "3"
      ]
     },
     "execution_count": 55,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "binsearch(v,0.9,0,len(M))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "id": "d0572dc1",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def rsample(dist):\n",
    "    assert amin(dist)>=0.0 and amax(dist)<=1.0\n",
    "    v = add.accumulate(dist)\n",
    "    assert abs(v[-1]-1.0)<1e-6\n",
    "    val = rand()\n",
    "    result = binsearch(v,val,0,len(v))\n",
    "    return result\n",
    "    "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "id": "b41e66a6",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def sample_chain(M,state,N):\n",
    "    result = [state]\n",
    "    for i in range(N):\n",
    "        state = rsample(M[:,state])\n",
    "        result.append(state)\n",
    "    return result"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 79,
   "id": "661b250c",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[0, 4, 3, 4, 3, 2, 0, 2, 2, 4, 1]"
      ]
     },
     "execution_count": 79,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sample_chain(M,0,10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 59,
   "id": "70c968d3",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(5, 5)"
      ]
     },
     "execution_count": 59,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "M.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9e5db3ce",
   "metadata": {},
   "source": [
    "n-grams and Markov Chains\n",
    "==========================\n",
    "\n",
    "An $n$-gram is a sequence of $n$ letters, syllables, words or other linguistic entity (usually words).\n",
    "\n",
    "An $n$-gram model is a model of the probability that $n$ words occur.\n",
    "\n",
    "Usually, $n$-gram models are expressed as a conditional probability, given a very long sequence\n",
    "of words $w_1 ... w_L$:\n",
    "\n",
    "$$P(w_i | w_(i-1)... w_i-(n-1)$$\n",
    "\n",
    "Sometimes, it might alwo be expressed as a joint probability:\n",
    "\n",
    "$$P(w_i ... w_i-(n-1)$$\n",
    "\n",
    "(Boundary conditions are handled by adding special symbols to \"fill things up\".)\n",
    "\n",
    "An $n$-gram model defines a Markov chain.\n",
    "\n",
    "Google has released a large $n$-gram model for English and other languages, and you can experiment with it here:\n",
    "\n",
    "http://books.google.com/ngrams/"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 98,
   "id": "17020959",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "s = \"\"\"Humpty Dumpty sat on a wall,\n",
    "Humpty Dumpty had a great fall;\n",
    "All the King's horses and all the King's men\n",
    "Couldn't put Humpty together again\"\"\".lower()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 99,
   "id": "1fc7c9a7",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "['humpty', 'dumpty', 'sat', 'on', 'a', 'wall', 'humpty', 'dumpty', 'had', 'a']"
      ]
     },
     "execution_count": 99,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "import re\n",
    "words = re.split(r'\\W+',s)\n",
    "words[:10]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 100,
   "id": "1818d9ce",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from collections import Counter\n",
    "bigrams = Counter()\n",
    "for i in range(len(words)-1):\n",
    "    bigrams[tuple(words[i:i+2])] += 1"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 101,
   "id": "d05a1b40",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Counter({('the', 'king'): 2, ('king', 's'): 2, ('all', 'the'): 2, ('humpty', 'dumpty'): 2, ('horses', 'and'): 1, ('humpty', 'together'): 1, ('couldn', 't'): 1, ('on', 'a'): 1, ('s', 'men'): 1, ('t', 'put'): 1, ('s', 'horses'): 1, ('had', 'a'): 1, ('wall', 'humpty'): 1, ('together', 'again'): 1, ('a', 'wall'): 1, ('a', 'great'): 1, ('men', 'couldn'): 1, ('great', 'fall'): 1, ('put', 'humpty'): 1, ('fall', 'all'): 1, ('dumpty', 'sat'): 1, ('sat', 'on'): 1, ('and', 'all'): 1, ('dumpty', 'had'): 1})"
      ]
     },
     "execution_count": 101,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "bigrams"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 106,
   "id": "7dffb429",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[(('a', 'great'), 1),\n",
       " (('a', 'wall'), 1),\n",
       " (('all', 'the'), 2),\n",
       " (('and', 'all'), 1),\n",
       " (('couldn', 't'), 1),\n",
       " (('dumpty', 'had'), 1),\n",
       " (('dumpty', 'sat'), 1),\n",
       " (('fall', 'all'), 1),\n",
       " (('great', 'fall'), 1),\n",
       " (('had', 'a'), 1),\n",
       " (('horses', 'and'), 1),\n",
       " (('humpty', 'dumpty'), 2),\n",
       " (('humpty', 'together'), 1),\n",
       " (('king', 's'), 2),\n",
       " (('men', 'couldn'), 1),\n",
       " (('on', 'a'), 1),\n",
       " (('put', 'humpty'), 1),\n",
       " (('s', 'horses'), 1),\n",
       " (('s', 'men'), 1),\n",
       " (('sat', 'on'), 1),\n",
       " (('t', 'put'), 1),\n",
       " (('the', 'king'), 2),\n",
       " (('together', 'again'), 1),\n",
       " (('wall', 'humpty'), 1)]"
      ]
     },
     "execution_count": 106,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "sorted(bigrams.items())"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "855a5a9a",
   "metadata": {},
   "source": [
    "So, we see that:\n",
    "\n",
    "$P( \\hbox{great} | \\hbox{a}) = 0.5$\n",
    "\n",
    "$P( \\hbox{wall} | \\hbox{a}) = 0.5$\n",
    "\n",
    "$P( \\hbox{(anything else)} | \\hbox{a}) = 0.0$"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "745f1fb9",
   "metadata": {},
   "source": [
    "Note that there are many pairs of words (even in this restricted vocabulary) that we don't see, even though\n",
    "their probability isn't zero.\n",
    "\n",
    "The process of fixing this is called _smoothing_.\n",
    "\n",
    "A simple mechanism is _pseudocounts_: here, we assume that anything that hasn't occurred has actually occurred\n",
    "some small number of times (1, 0.5, or smaller).\n",
    "\n",
    "Alternatively, we can compute $n$-grams for different $n$ and linearly interpolate."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8ff3063",
   "metadata": {},
   "source": [
    "Note that $n$-grams for $n>2$ give rise to higher order Markov chains.\n",
    "\n",
    "We can transform higher order Markov chains into first order Markov chains by associating states\n",
    "with each context of $n-1$ words.  However, then the emitted symbol differs from the state label."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f97d82dc",
   "metadata": {},
   "source": [
    "State Distributions and Steady State\n",
    "=============="
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9af72a9c",
   "metadata": {},
   "source": [
    "Assume we are given a vector of probabilities $p$ of states.\n",
    "\n",
    "The probability distribution after one state transition is just the product $M\\cdot p$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "id": "945c1e5d",
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "p = zeros(5)\n",
    "p[0] = 1.0"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 81,
   "id": "069d11ad",
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[ 1.  0.  0.  0.  0.]\n",
      "[ 0.01215787  0.40654973  0.16527515  0.36084789  0.05516936]\n",
      "[ 0.06709196  0.0425824   0.21430012  0.36536528  0.31066025]\n",
      "[ 0.11259592  0.11789572  0.24105906  0.240486    0.28796329]\n",
      "[ 0.10613046  0.13175719  0.23142557  0.25488324  0.27580354]\n",
      "[ 0.10373109  0.12629671  0.23029253  0.26201374  0.27766592]\n",
      "[ 0.10424015  0.12572506  0.23096495  0.26074477  0.27832507]\n",
      "[ 0.1043648   0.12608833  0.23100254  0.26036384  0.27818048]\n",
      "[ 0.1043276   0.12610693  0.23095915  0.26046004  0.27814629]\n",
      "[ 0.10432159  0.12608359  0.23095882  0.26047919  0.27815681]"
     ]
    }
   ],
   "source": [
    "for i in range(10):\n",
    "    print p\n",
    "    p = dot(M,p)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e6f80e92",
   "metadata": {},
   "source": [
    "This should look familiar: after a lot of applications of the matrix, it converges to a _steady state_ distribution.\n",
    "\n",
    "This steady state distribution is an _eigenvector_ of the matrix."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5aea0eed",
   "metadata": {},
   "source": [
    "There are a number of concepts about Markov chains that are important:\n",
    "\n",
    "- a state $j$ is _accessible_ from a state $i$ if there is some path connecting the two states with non-zero probability\n",
    "- two states _communicate_ if they are accessible from each other\n",
    "- a Markov chain is _irreducible_ if all states are accessible to each other\n",
    "- a state is _transient_ if there is a non-zero probability that we never return to it\n",
    "- a state is _recurrent_ if it is not transient\n",
    "- a state is _positive recurrent_ if the expected recurrence time is finite\n",
    "- a state is _periodic_ with period $k$ if we are guaranteed to return to it exactly every $k$ steps\n",
    "- a state is _absorbing_ if there are no transitions out of it\n",
    "- a Markov chain is _absorbing_ if every state can reach an absorbing state\n",
    "- a state is _ergodic_ if it is positive recurrent and not periodic\n",
    "\n",
    "Can we construct matrices illustrating these concepts?"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "629db931",
   "metadata": {},
   "source": [
    "Reversible Markov Chains\n",
    "========================\n",
    "\n",
    "A Markov chain is _reversible_ if there is some probability distribution $p$ such that \n",
    "\n",
    "$$p \\cdot M = M \\cdot p$$\n",
    "\n",
    "Note that $p \\cdot M$ is the Markov chain running in reverse.\n",
    "\n",
    "This means that if $p$ is the _steady state distribution_, \n",
    "running the Markov chain forwards or backwards gives the same result.\n",
    "\n",
    "Reversible Markov chains are very important for modern pattern recognition, statistics,\n",
    "physics, and simulations.\n",
    "They are used in Gibbs samplers and the Metropolis algorithm for probabilistic decision making."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "18ddbefe",
   "metadata": {},
   "source": [
    "Hidden Markov Models\n",
    "======================\n",
    "\n",
    "In Markov chains, \n",
    "\n",
    "- the state sequence is what we are directly interested in\n",
    "- we can \"observe\" the state sequence directly\n",
    "\n",
    "In _Hidden Markov Models_...\n",
    "\n",
    "- states are associated with actions or observations\n",
    "- often we can only observe the actions or observations and want to infer the state\n",
    "\n",
    "That is, each state $x$ _emits_ one symbol $y$ from a set of $k$ symbols.\n",
    "\n",
    "Extremely widely used:\n",
    "\n",
    "- speech recognition: hidden state is part-of-phoneme, observation is acoustic signal\n",
    "- handwriting recognition: hidden state is part-of-handwriting, observation is pen position\n",
    "- bioinformatics: hidden state is part-of-gene, observation is DNA \"letter\"\n",
    "- natural language processing: hidden state is part-of-speech, observation is word\n",
    "\n",
    "![hmm-example](http://upload.wikimedia.org/wikipedia/commons/thumb/4/43/HMMGraph.svg/400px-HMMGraph.svg.png)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "85d62bd1",
   "metadata": {},
   "source": [
    "To specify an HMM, we need:\n",
    "\n",
    "- sets of states, a symbol alphabet\n",
    "- a matrix of transition probabilities between states\n",
    "- a matrix of emission probabilities, one for each state"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d2d98b9f",
   "metadata": {},
   "source": [
    "HMM Algorithms\n",
    "===============\n",
    "\n",
    "Usually, we are only given a training set of observations (outputs) $Y_1...Y_M$, \n",
    "but may also be given a model (state transition matrix, emission probabilities).\n",
    "\n",
    "There are several different things we may want to compute:\n",
    "\n",
    "- given a model $M$ and a sequence of observations $Y$\n",
    "  - determine $P_M(Y)$\n",
    "  - determine the most likely state sequence $X$ (most likely explanation)\n",
    "  - determine the probability distribution of the next state (filtering)\n",
    "  - determine the probability distribution for state $X_k$ (smoothing)\n",
    "\n",
    "- given a set of training observations $Y$ and constraints on the model\n",
    "  - find a model that assigns the overall highest likelihood to the training data\n",
    "\n",
    "Why do we need constraints?  Is there a trivial example?"
   ]
  }
 ],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}