{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Latent Space Models for Neural Data\n",
    "\n",
    "Many scientific fields involve the study of network data, including\n",
    "social networks, networks in statistical physics, biological\n",
    "networks, and information networks\n",
    "(Goldenberg, Zheng, Fienberg, & Airoldi, 2010; Newman, 2010).\n",
    "\n",
    "What we can learn about nodes in a network from their connectivity patterns?\n",
    "We can begin to study this using a latent space model (Hoff, Raftery, & Handcock, 2002).\n",
    "Latent space models embed nodes in the network in a latent space,\n",
    "where the likelihood of forming an edge between two nodes depends on\n",
    "their distance in the latent space.\n",
    "\n",
    "We will analyze network data from neuroscience.\n",
    "A webpage version is available at\n",
    "http://edwardlib.org/tutorials/latent-space-models."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from __future__ import absolute_import\n",
    "from __future__ import division\n",
    "from __future__ import print_function\n",
    "\n",
    "import edward as ed\n",
    "import numpy as np\n",
    "import tensorflow as tf\n",
    "\n",
    "from edward.models import Normal, Poisson\n",
    "from observations import celegans"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data\n",
    "\n",
    "The data comes from [Mark Newman's repository](http://www-personal.umich.edu/~mejn/netdata/).\n",
    "It is a weighted, directed network representing the neural network of\n",
    "the nematode\n",
    "[C. Elegans](https://en.wikipedia.org/wiki/Caenorhabditis_elegans)\n",
    "compiled by Watts & Strogatz (1998) using experimental data\n",
    "by White, Southgate, Thomson, & Brenner (1986).\n",
    "\n",
    "The neural network consists of around $300$ neurons. Each connection\n",
    "between neurons\n",
    "is associated with a weight (positive integer) capturing the strength\n",
    "of the connection.\n",
    "\n",
    "First, we load the data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "x_train = celegans(\"~/data\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Model\n",
    "\n",
    "What can we learn about the neurons from their connectivity patterns? Using\n",
    "a latent space model (Hoff et al., 2002), we will learn a latent\n",
    "embedding for each neuron to capture the similarities between them.\n",
    "\n",
    "Each neuron $n$ is a node in the network and is associated with a latent\n",
    "position $z_n\\in\\mathbb{R}^K$.\n",
    "We place a Gaussian prior on each of the latent positions.\n",
    "\n",
    "The log-odds of an edge between node $i$ and\n",
    "$j$ is proportional to the Euclidean distance between the latent\n",
    "representations of the nodes $|z_i- z_j|$. Here, we\n",
    "model the weights ($Y_{ij}$) of the edges with a Poisson likelihood.\n",
    "The rate is the reciprocal of the distance in latent space. The\n",
    "generative process is as follows:\n",
    "\n",
    "1. \n",
    "For each node $n=1,\\ldots,N$,\n",
    "\\begin{align}\n",
    "z_n \\sim N(0,I).\n",
    "\\end{align}\n",
    "2. \n",
    "For each edge $(i,j)\\in\\{1,\\ldots,N\\}\\times\\{1,\\ldots,N\\}$,\n",
    "\\begin{align}\n",
    "Y_{ij} \\sim \\text{Poisson}\\Bigg(\\frac{1}{|z_i - z_j|}\\Bigg).\n",
    "\\end{align}\n",
    "\n",
    "In Edward, we write the model as follows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "N = x_train.shape[0]  # number of data points\n",
    "K = 3  # latent dimensionality\n",
    "\n",
    "z = Normal(loc=tf.zeros([N, K]), scale=tf.ones([N, K]))\n",
    "\n",
    "# Calculate N x N distance matrix.\n",
    "# 1. Create a vector, [||z_1||^2, ||z_2||^2, ..., ||z_N||^2], and tile\n",
    "# it to create N identical rows.\n",
    "xp = tf.tile(tf.reduce_sum(tf.pow(z, 2), 1, keep_dims=True), [1, N])\n",
    "# 2. Create a N x N matrix where entry (i, j) is ||z_i||^2 + ||z_j||^2\n",
    "# - 2 z_i^T z_j.\n",
    "xp = xp + tf.transpose(xp) - 2 * tf.matmul(z, z, transpose_b=True)\n",
    "# 3. Invert the pairwise distances and make rate along diagonals to\n",
    "# be close to zero.\n",
    "xp = 1.0 / tf.sqrt(xp + tf.diag(tf.zeros(N) + 1e3))\n",
    "\n",
    "x = Poisson(rate=xp)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Inference\n",
    "\n",
    "Maximum a posteriori (MAP) estimation is simple in Edward. Two lines are\n",
    "required: Instantiating inference and running it."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "inference = ed.MAP([z], data={x: x_train})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See this extended tutorial about\n",
    "[MAP estimation in Edward](http://edwardlib.org/tutorials/map).\n",
    "\n",
    "One could instead run variational inference. This requires specifying\n",
    "a variational model and instantiating `KLqp`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Alternatively, run\n",
    "# qz = Normal(loc=tf.get_variable(\"qz/loc\", [N * K]),\n",
    "#             scale=tf.nn.softplus(tf.get_variable(\"qz/scale\", [N * K])))\n",
    "# inference = ed.KLqp({z: qz}, data={x: x_train})"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "See this extended tutorial about\n",
    "[variational inference in Edward](http://edwardlib.org/tutorials/variational-inference).\n",
    "\n",
    "Finally, the following line runs the inference procedure for 2500\n",
    "iterations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "2500/2500 [100%] ██████████████████████████████ Elapsed: 11s | Loss: 35984.855\n"
     ]
    }
   ],
   "source": [
    "inference.run(n_iter=2500)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Acknowledgments\n",
    "\n",
    "We thank Maja Rudolph for writing the initial version of this\n",
    "tutorial."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 2",
   "language": "python",
   "name": "python2"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 2
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython2",
   "version": "2.7.12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}