{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Network Tour of Data Science\n",
    "###       Xavier Bresson, Winter 2016/17\n",
    "## Assignment 3 : Recurrent Neural Networks"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Import libraries\n",
    "import tensorflow as tf\n",
    "import numpy as np\n",
    "import collections\n",
    "import os"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Text data: hello world! is a very simple program in most programming languages often used to illustrate the basic syntax of a programming language\n",
      "\n",
      "Single characters: ['x', 'w', 'a', 'b', '!', 'n', 'h', 'd', 'i', 's', 'v', 'y', 'e', ' ', 'r', 'u', 'g', 'f', 'c', 'l', 'm', 'p', 't', 'o']\n",
      "\n",
      "Text data has 135 characters, 24 unique.\n",
      "\n",
      "Mapping characters to numbers: {'x': 0, 's': 9, 'v': 10, 'y': 11, 't': 22, 'w': 1, 'e': 12, 'r': 14, 'u': 15, 'g': 16, 'f': 17, 'c': 18, 'l': 19, 'm': 20, 'b': 3, 'a': 2, ' ': 13, '!': 4, 'o': 23, 'n': 5, 'h': 6, 'p': 21, 'd': 7, 'i': 8}\n",
      "\n",
      "Mapping numbers to characters: {0: 'x', 1: 'w', 2: 'a', 3: 'b', 4: '!', 5: 'n', 6: 'h', 7: 'd', 8: 'i', 9: 's', 10: 'v', 11: 'y', 12: 'e', 13: ' ', 14: 'r', 15: 'u', 16: 'g', 17: 'f', 18: 'c', 19: 'l', 20: 'm', 21: 'p', 22: 't', 23: 'o'}\n"
     ]
    }
   ],
   "source": [
    "# Load text data\n",
    "data = open(os.path.join('datasets', 'text_ass_6.txt'), 'r').read() # must be simple plain text file\n",
    "print('Text data:',data)\n",
    "chars = list(set(data))\n",
    "print('\\nSingle characters:',chars)\n",
    "data_len, vocab_size = len(data), len(chars)\n",
    "print('\\nText data has %d characters, %d unique.' % (data_len, vocab_size))\n",
    "char_to_ix = { ch:i for i,ch in enumerate(chars) }\n",
    "ix_to_char = { i:ch for i,ch in enumerate(chars) }\n",
    "print('\\nMapping characters to numbers:',char_to_ix)\n",
    "print('\\nMapping numbers to characters:',ix_to_char)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Goal \n",
    "The goal is to define with TensorFlow a vanilla recurrent neural network (RNN) model:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "h_t &= \\textrm{tanh}(W_h h_{t-1} + W_x x_t + b_h)\\\\\n",
    "y_t &= W_y h_t + b_y\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "\n",
    "to predict a sequence of characters. $x_t \\in \\mathbb{R}^D$ is the input character of the RNN in a dictionary of size $D$. $y_t \\in \\mathbb{R}^D$ is the predicted character (through a distribution function) by the RNN system. $h_t \\in \\mathbb{R}^H$ is the memory of the RNN, called hidden state at time $t$. Its dimensionality is arbitrarly chosen to $H$. The variables of the system are $W_h \\in \\mathbb{R}^{H\\times H}$, $W_x \\in \\mathbb{R}^{H\\times D}$, $W_y \\in \\mathbb{R}^{D\\times H}$, $b_h \\in \\mathbb{R}^D$, and $b_y \\in \\mathbb{R}^D$. <br>\n",
    "\n",
    "The number of time steps of the RNN is $T$, that is we will learn a sequence of data of length $T$: $x_t$ for $t=0,...,T-1$."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "data_len= 135  batch_size= 3  batch_len= 45  T= 5  epoch_size= 8  D= 24\n"
     ]
    }
   ],
   "source": [
    "# hyperparameters of RNN\n",
    "batch_size = 3                                  # batch size\n",
    "batch_len = data_len // batch_size              # batch length\n",
    "T = 5                                           # temporal length\n",
    "epoch_size = (batch_len - 1) // T               # nb of iterations to get one epoch\n",
    "D = vocab_size                                  # data dimension = nb of unique characters\n",
    "H = 5*D                                         # size of hidden state, the memory layer\n",
    "\n",
    "print('data_len=',data_len,' batch_size=',batch_size,' batch_len=',\n",
    "      batch_len,' T=',T,' epoch_size=',epoch_size,' D=',D)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 1 \n",
    "Initialize input variables of the computational graph:<br>\n",
    "(1) Xin of size *batch_size x T x D* and type *tf.float32*. Each input character is encoded on a vector of size D.<br>\n",
    "(2) Ytarget of size *batch_size x T* and type *tf.int64*. Each target character is encoded by a value in {0,...,D-1}.<br>\n",
    "(3) hin of size *batch_size x H* and type *tf.float32*<br>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# input variables of computational graph (CG)\n",
    "Xin = tf.placeholder(tf.float32, [batch_size,T,D]); #print('Xin=',Xin) # Input\n",
    "Ytarget = tf.placeholder(tf.int64, [batch_size,T]); #print('Y_=',Y_) # target \n",
    "hin = tf.placeholder(tf.float32, [batch_size,H]); #print('hin=',hin.get_shape())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 2\n",
    "Define the variables of the computational graph:<br>\n",
    "(1) $W_x$ is a random variable of shape *D x H* with normal distribution of variance $\\frac{6}{D+H}$<br>\n",
    "(2) $W_h$ is an identity matrix multiplies by constant $0.01$<br>\n",
    "(3) $W_y$ is a random variable of shape *H x D* with normal distribution of variance $\\frac{6}{D+H}$<br>\n",
    "(4) $b_h$, $b_y$ are zero vectors of size *H*, and *D*<br>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Wx= (24, 120)\n",
      "Wh= (120, 120)\n",
      "Wy= (120, 24)\n",
      "bh= (120,)\n",
      "by= (24,)\n"
     ]
    }
   ],
   "source": [
    "# Model variables\n",
    "Wx = tf.Variable(tf.random_normal([D,H], stddev=tf.sqrt(6./tf.to_float(D+H)))); print('Wx=',Wx.get_shape())\n",
    "Wh = tf.Variable(0.01*np.identity(H, np.float32)); print('Wh=',Wh.get_shape())\n",
    "Wy = tf.Variable(tf.random_normal([H,D], stddev=tf.sqrt(6./tf.to_float(H+D)))); print('Wy=',Wy.get_shape())\n",
    "bh = tf.Variable(tf.zeros([H])); print('bh=',bh.get_shape())\n",
    "by = tf.Variable(tf.zeros([D])); print('by=',by.get_shape())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 3\n",
    "Implement the recursive formula:\n",
    "\n",
    "$$\n",
    "\\begin{aligned}\n",
    "h_t &= \\textrm{tanh}(W_h h_{t-1} + W_x x_t + b_h)\\\\\n",
    "y_t &= W_y h_t + b_y\n",
    "\\end{aligned}\n",
    "$$\n",
    "\n",
    "with $h_{t=0}=hin$.<br>\n",
    "\n",
    "Hints: <br> \n",
    "(1) You may use functions *tf.split()*, *enumerate()*, *tf.squeeze()*, *tf.matmul()*, *tf.tanh()*, *tf.transpose()*, *append()*, *pack()*.<br>\n",
    "(2) You may use a matrix Y of shape *batch_size x T x D*. We recall that Ytarget should have the shape *batch_size x T*.<br>\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Y= (3, 5, 24)\n",
      "Ytarget= (3, 5)\n"
     ]
    }
   ],
   "source": [
    "# Vanilla RNN implementation\n",
    "Y = []\n",
    "ht = hin\n",
    "for t, xt in enumerate(tf.split(1, T, Xin)): \n",
    "    if batch_size>1:\n",
    "        xt = tf.squeeze(xt); #print('xt=',xt) \n",
    "    else:\n",
    "        xt = tf.squeeze(xt)[None,:] \n",
    "    ht = tf.matmul(ht, Wh); #print('ht1=',ht) \n",
    "    ht += tf.matmul(xt, Wx); #print('ht2=',ht) \n",
    "    \n",
    "    ht += bh; #print('ht3=',ht) \n",
    "    ht = tf.tanh(ht); #print('ht4=',ht) \n",
    "    \n",
    "    yt = tf.matmul(ht, Wy); #print('yt1=',yt)\n",
    "    yt += by; #print('yt2=',yt)\n",
    "    \n",
    "    Y.append(yt)\n",
    "#print('Y=',Y) \n",
    "\n",
    "Y = tf.pack(Y); \n",
    "if batch_size>1:\n",
    "    Y = tf.squeeze(Y); \n",
    "Y = tf.transpose(Y, [1, 0, 2])\n",
    "print('Y=',Y.get_shape())\n",
    "print('Ytarget=',Ytarget.get_shape())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 4\n",
    "Perplexity loss is implemented as:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# perplexity\n",
    "logits = tf.reshape(Y,[batch_size*T,D])\n",
    "weights = tf.ones([batch_size*T])\n",
    "cross_entropy_perplexity = tf.nn.seq2seq.sequence_loss_by_example([logits],[Ytarget],[weights])\n",
    "cross_entropy_perplexity = tf.reduce_sum(cross_entropy_perplexity) / batch_size\n",
    "loss = cross_entropy_perplexity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 5\n",
    "Implement the optimization of the loss function.\n",
    "\n",
    "Hint: You may use function *tf.train.GradientDescentOptimizer()*.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# Optimization\n",
    "train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 6\n",
    "Implement the prediction scheme: from an input character e.g. \"h\" then the RNN should predict \"ello\". <br>\n",
    "\n",
    "Hints: <br> \n",
    "(1) You should use the learned RNN.<br>\n",
    "(2) You may use functions *tf.one_hot()*, *tf.nn.softmax()*, *tf.argmax()*.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Predict\n",
    "idx_pred = tf.placeholder(tf.int64) # input seed\n",
    "xtp = tf.one_hot(idx_pred,depth=D); #print('xtp1=',xtp.get_shape())\n",
    "htp = tf.zeros([1,H])\n",
    "Ypred = []\n",
    "for t in range(T):\n",
    "    htp = tf.matmul(htp, Wh); #print('htp1=',htp) \n",
    "    htp += tf.matmul(xtp, Wx); #print('htp2=',htp) \n",
    "    htp += bh; #print('htp3=',htp) # (1, 100)\n",
    "    htp = tf.tanh(htp); #print('htp4=',htp) # (1, 100)\n",
    "    ytp = tf.matmul(htp, Wy); #print('ytp1=',ytp)\n",
    "    ytp += by; #print('ytp2=',ytp)\n",
    "    ytp = tf.nn.softmax(ytp); #print('yt1=',ytp)\n",
    "    ytp = tf.squeeze(ytp); #print('yt2=',ytp)  \n",
    "    seed_idx = tf.argmax(ytp,dimension=0); #print('seed_idx=',seed_idx)\n",
    "    xtp = tf.one_hot(seed_idx,depth=D)[None,:]; #print('xtp2=',xtp.get_shape())\n",
    "    Ypred.append(seed_idx)\n",
    "Ypred = tf.convert_to_tensor(Ypred)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "original train set shape (135,)\n",
      "pre-processed train set shape (3, 45)\n"
     ]
    }
   ],
   "source": [
    "# Prepare train data matrix of size \"batch_size x batch_len\"\n",
    "data_ix = [char_to_ix[ch] for ch in data[:data_len]]\n",
    "train_data = np.array(data_ix)\n",
    "print('original train set shape',train_data.shape)\n",
    "train_data = np.reshape(train_data[:batch_size*batch_len], [batch_size,batch_len])\n",
    "print('pre-processed train set shape',train_data.shape)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "# The following function tansforms an integer value d between {0,...,D-1} into an one hot vector, that is a \n",
    "# vector of dimension D x 1 which has value 1 for index d-1, and 0 otherwise\n",
    "from scipy.sparse import coo_matrix\n",
    "def convert_to_one_hot(a,max_val=None):\n",
    "    N = a.size\n",
    "    data = np.ones(N,dtype=int)\n",
    "    sparse_out = coo_matrix((data,(np.arange(N),a.ravel())), shape=(N,max_val))\n",
    "    return np.array(sparse_out.todense())"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Step 7\n",
    "Run the computational graph with batches of training data.<br> \n",
    "Predict the sequence of characters starting from the character \"h\".<br> \n",
    "\n",
    "Hints:<br>\n",
    "(1) Initial memory is $h_{t=0}$ is 0.<br>\n",
    "(2) Run the computational graph to optimize the perplexity loss, and to predict the the sequence of characters starting from the character \"h\".<br>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "collapsed": false,
    "scrolled": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\n",
      "n= 0 , perplexity value= 29.5535130421\n",
      "starting char= h , predicted sequences= bguct\n",
      "\n",
      "n= 1 , perplexity value= 27.9625002039\n",
      "starting char= h , predicted sequences= epepe\n",
      "\n",
      "n= 2 , perplexity value= 28.2821211393\n",
      "starting char= h , predicted sequences=  toe \n",
      "\n",
      "n= 3 , perplexity value= 27.6789270581\n",
      "starting char= h , predicted sequences=  la i\n",
      "\n",
      "n= 4 , perplexity value= 26.0211196019\n",
      "starting char= h , predicted sequences=  aa  \n",
      "\n",
      "n= 5 , perplexity value= 23.946255338\n",
      "starting char= h , predicted sequences=  a  a\n",
      "\n",
      "n= 6 , perplexity value= 21.8258818021\n",
      "starting char= h , predicted sequences=  a ps\n",
      "\n",
      "n= 7 , perplexity value= 21.1786920954\n",
      "starting char= h , predicted sequences=  osra\n",
      "\n",
      "n= 8 , perplexity value= 9.46772242765\n",
      "starting char= h , predicted sequences=  in i\n",
      "\n",
      "n= 9 , perplexity value= 10.4062497066\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 10 , perplexity value= 10.4282811726\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 11 , perplexity value= 11.502705008\n",
      "starting char= h , predicted sequences=  lan \n",
      "\n",
      "n= 12 , perplexity value= 11.6051496676\n",
      "starting char= h , predicted sequences=  lan \n",
      "\n",
      "n= 13 , perplexity value= 10.6132289482\n",
      "starting char= h , predicted sequences=  of a\n",
      "\n",
      "n= 14 , perplexity value= 9.6058489675\n",
      "starting char= h , predicted sequences=  ls p\n",
      "\n",
      "n= 15 , perplexity value= 9.37241584996\n",
      "starting char= h , predicted sequences= e pro\n",
      "\n",
      "n= 16 , perplexity value= 5.31786451328\n",
      "starting char= h , predicted sequences=  lang\n",
      "\n",
      "n= 17 , perplexity value= 5.71144402466\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 18 , perplexity value= 5.68151127767\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 19 , perplexity value= 6.46925124888\n",
      "starting char= h , predicted sequences= elln \n",
      "\n",
      "n= 20 , perplexity value= 6.58066032529\n",
      "starting char= h , predicted sequences= e bas\n",
      "\n",
      "n= 21 , perplexity value= 6.13328235991\n",
      "starting char= h , predicted sequences= e pf \n",
      "\n",
      "n= 22 , perplexity value= 5.65600492947\n",
      "starting char= h , predicted sequences= e pro\n",
      "\n",
      "n= 23 , perplexity value= 5.61072280515\n",
      "starting char= h , predicted sequences= el of\n",
      "\n",
      "n= 24 , perplexity value= 3.5418044391\n",
      "starting char= h , predicted sequences= ellu \n",
      "\n",
      "n= 25 , perplexity value= 3.61793218568\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 26 , perplexity value= 3.56632600989\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 27 , perplexity value= 3.98579038998\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 28 , perplexity value= 4.01507879557\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 29 , perplexity value= 3.81765365683\n",
      "starting char= h , predicted sequences= e bas\n",
      "\n",
      "n= 30 , perplexity value= 3.60902281626\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 31 , perplexity value= 3.66261425863\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 32 , perplexity value= 2.58569042517\n",
      "starting char= h , predicted sequences= ellu \n",
      "\n",
      "n= 33 , perplexity value= 2.71294759439\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 34 , perplexity value= 2.69944025877\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 35 , perplexity value= 2.93036847301\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 36 , perplexity value= 2.8802468858\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 37 , perplexity value= 2.74185760328\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 38 , perplexity value= 2.60001086299\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 39 , perplexity value= 2.63901644204\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 40 , perplexity value= 2.10067073021\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 41 , perplexity value= 2.21468356725\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 42 , perplexity value= 2.19319889961\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 43 , perplexity value= 2.36261821496\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 44 , perplexity value= 2.31412813917\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 45 , perplexity value= 2.22972349843\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 46 , perplexity value= 2.12354689044\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 47 , perplexity value= 2.12585888398\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 48 , perplexity value= 1.87534942843\n",
      "starting char= h , predicted sequences= ello \n",
      "\n",
      "n= 49 , perplexity value= 1.9234134446\n",
      "starting char= h , predicted sequences= ello \n"
     ]
    }
   ],
   "source": [
    "# Run CG\n",
    "init = tf.initialize_all_variables()\n",
    "sess = tf.Session()\n",
    "sess.run(init)\n",
    "\n",
    "h0 = np.zeros([batch_size,H])\n",
    "indices = collections.deque()\n",
    "costs = 0.0; epoch_iters = 0\n",
    "for n in range(50):\n",
    "    \n",
    "    # Batch extraction\n",
    "    if len(indices) < 1:\n",
    "        indices.extend(range(epoch_size))\n",
    "        costs = 0.0; epoch_iters = 0\n",
    "    i = indices.popleft() \n",
    "    batch_x = train_data[:,i*T:(i+1)*T]\n",
    "    batch_x = convert_to_one_hot(batch_x,D); batch_x = np.reshape(batch_x,[batch_size,T,D])\n",
    "    batch_y = train_data[:,i*T+1:(i+1)*T+1]\n",
    "    #print(batch_x.shape,batch_y.shape)\n",
    "\n",
    "    # Train\n",
    "    idx = char_to_ix['h'];\n",
    "    loss_value,_,Ypredicted = sess.run([loss,train_step,Ypred], feed_dict={Xin: batch_x, Ytarget: batch_y, hin: h0, idx_pred: [idx]})\n",
    "   \n",
    "    # Perplexity\n",
    "    costs += loss_value\n",
    "    epoch_iters += T\n",
    "    perplexity = np.exp(costs/epoch_iters)\n",
    "    \n",
    "    if not n%1:\n",
    "        idx_char = Ypredicted\n",
    "        txt = ''.join(ix_to_char[ix] for ix in list(idx_char))\n",
    "        print('\\nn=',n,', perplexity value=',perplexity)\n",
    "        print('starting char=',ix_to_char[idx], ', predicted sequences=',txt)\n",
    "    \n",
    "sess.close()    "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}