{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Supervised Learning (Classification)\n",
    "\n",
    "In supervised learning, the task is to infer hidden structure from\n",
    "labeled data, comprised of training examples $\\{(x_n, y_n)\\}$.\n",
    "Classification means the output $y$ takes discrete values.\n",
    "\n",
    "We demonstrate with an example in Edward. A webpage version is available at\n",
    "http://edwardlib.org/tutorials/supervised-classification."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "import edward as ed\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "from edward.models import Bernoulli, MultivariateNormalTriL, Normal\n",
    "from edward.util import rbf\n",
    "from observations import crabs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "\n",
    "Use the\n",
    "[crabs data set](https://stat.ethz.ch/R-manual/R-devel/library/MASS/html/crabs.html),\n",
    "which consists of morphological measurements on a crab species. We\n",
    "are interested in predicting whether a given crab has the color form\n",
    "blue (encoded as 0) or orange (encoded as 1). We use all the numeric features\n",
    "in the dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Number of data points: 100\n",
      "Number of features: 5\n"
     ]
    }
   ],
   "source": [
    "ed.set_seed(42)\n",
    "\n",
    "data, metadata = crabs(\"~/data\")\n",
    "X_train = data[:100, 3:]\n",
    "y_train = data[:100, 1]\n",
    "\n",
    "N = X_train.shape[0]  # number of data points\n",
    "D = X_train.shape[1]  # number of features\n",
    "\n",
    "print(\"Number of data points: {}\".format(N))\n",
    "print(\"Number of features: {}\".format(D))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model\n",
    "\n",
    "A Gaussian process is a powerful object for modeling nonlinear\n",
    "relationships between pairs of random variables. It defines a distribution over\n",
    "(possibly nonlinear) functions, which can be applied for representing\n",
    "our uncertainty around the true functional relationship.\n",
    "Here we define a Gaussian process model for classification\n",
    "(Rasumussen & Williams, 2006).\n",
    "\n",
    "Formally, a distribution over functions $f:\\mathbb{R}^D\\to\\mathbb{R}$ can be specified\n",
    "by a Gaussian process\n",
    "$$\n",
    "\\begin{align*}\n",
    "  p(f)\n",
    "  &=\n",
    "  \\mathcal{GP}(f\\mid \\mathbf{0}, k(\\mathbf{x}, \\mathbf{x}^\\prime)),\n",
    "\\end{align*}\n",
    "$$\n",
    "whose mean function is the zero function, and whose covariance\n",
    "function is some kernel which describes dependence between\n",
    "any set of inputs to the function.\n",
    "\n",
    "Given a set of input-output pairs\n",
    "$\\{\\mathbf{x}_n\\in\\mathbb{R}^D,y_n\\in\\mathbb{R}\\}$,\n",
    "the likelihood can be written as a multivariate normal\n",
    "\n",
    "\\begin{align*}\n",
    "  p(\\mathbf{y})\n",
    "  &=\n",
    "  \\text{Normal}(\\mathbf{y} \\mid \\mathbf{0}, \\mathbf{K})\n",
    "\\end{align*}\n",
    "\n",
    "where $\\mathbf{K}$ is a covariance matrix given by evaluating\n",
    "$k(\\mathbf{x}_n, \\mathbf{x}_m)$ for each pair of inputs in the data\n",
    "set.\n",
    "\n",
    "The above applies directly for regression where $\\mathbb{y}$ is a\n",
    "real-valued response, but not for (binary) classification, where $\\mathbb{y}$\n",
    "is a label in $\\{0,1\\}$. To deal with classification, we interpret the\n",
    "response as latent variables which is squashed into $[0,1]$. We then\n",
    "draw from a Bernoulli to determine the label, with probability given\n",
    "by the squashed value.\n",
    "\n",
    "Define the likelihood of an observation $(\\mathbf{x}_n, y_n)$ as\n",
    "\n",
    "\\begin{align*}\n",
    "  p(y_n \\mid \\mathbf{z}, x_n)\n",
    "  &=\n",
    "  \\text{Bernoulli}(y_n \\mid \\text{logit}^{-1}(\\mathbf{x}_n^\\top \\mathbf{z})).\n",
    "\\end{align*}\n",
    "\n",
    "Define the prior to be a multivariate normal\n",
    "\n",
    "\\begin{align*}\n",
    "  p(\\mathbf{z})\n",
    "  &=\n",
    "  \\text{Normal}(\\mathbf{z} \\mid \\mathbf{0}, \\mathbf{K}),\n",
    "\\end{align*}\n",
    "\n",
    "with covariance matrix given as previously stated.\n",
    "\n",
    "Let's build the model in Edward. We use a radial basis function (RBF)\n",
    "kernel, also known as the squared exponential or exponentiated\n",
    "quadratic. It returns the kernel matrix evaluated over all pairs of\n",
    "data points; we then Cholesky decompose the matrix to parameterize the\n",
    "multivariate normal distribution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "X = tf.placeholder(tf.float32, [N, D])\n",
    "f = MultivariateNormalTriL(loc=tf.zeros(N), scale_tril=tf.cholesky(rbf(X)))\n",
    "y = Bernoulli(logits=f)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, we define a placeholder `X`. During inference, we pass in\n",
    "the value for this placeholder according to data."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inference\n",
    "\n",
    "Perform variational inference.\n",
    "Define the variational model to be a fully factorized normal."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "qf = Normal(loc=tf.get_variable(\"qf/loc\", [N]),\n",
    "            scale=tf.nn.softplus(tf.get_variable(\"qf/scale\", [N])))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Run variational inference for `5000` iterations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "5000/5000 [100%] ██████████████████████████████ Elapsed: 9s | Loss: 78.369\n"
     ]
    }
   ],
   "source": [
    "inference = ed.KLqp({f: qf}, data={X: X_train, y: y_train})\n",
    "inference.run(n_iter=5000)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this case\n",
    "`KLqp` defaults to minimizing the\n",
    "$\\text{KL}(q\\|p)$ divergence measure using the reparameterization\n",
    "gradient.\n",
    "For more details on inference, see the [$\\text{KL}(q\\|p)$ tutorial](/tutorials/klqp).\n",
    "(This example happens to be slow because evaluating and inverting full\n",
    "covariances in Gaussian processes happens to be slow.)"
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}