{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# A Network Tour of Data Science\n",
"### Xavier Bresson, Winter 2016/17\n",
"## Assignment 3 : Recurrent Neural Networks"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Import libraries\n",
"import tensorflow as tf\n",
"import numpy as np\n",
"import collections\n",
"import os"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Load text data\n",
"data = open(os.path.join('datasets', 'text_ass_6.txt'), 'r').read() # must be simple plain text file\n",
"print('Text data:',data)\n",
"chars = list(set(data))\n",
"print('\\nSingle characters:',chars)\n",
"data_len, vocab_size = len(data), len(chars)\n",
"print('\\nText data has %d characters, %d unique.' % (data_len, vocab_size))\n",
"char_to_ix = { ch:i for i,ch in enumerate(chars) }\n",
"ix_to_char = { i:ch for i,ch in enumerate(chars) }\n",
"print('\\nMapping characters to numbers:',char_to_ix)\n",
"print('\\nMapping numbers to characters:',ix_to_char)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Goal \n",
"The goal is to define with TensorFlow a vanilla recurrent neural network (RNN) model:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"h_t &= \\textrm{tanh}(W_h h_{t-1} + W_x x_t + b_h)\\\\\n",
"y_t &= W_y h_t + b_y\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"\n",
"to predict a sequence of characters. $x_t \\in \\mathbb{R}^D$ is the input character of the RNN in a dictionary of size $D$. $y_t \\in \\mathbb{R}^D$ is the predicted character (through a distribution function) by the RNN system. $h_t \\in \\mathbb{R}^H$ is the memory of the RNN, called hidden state at time $t$. Its dimensionality is arbitrarly chosen to $H$. The variables of the system are $W_h \\in \\mathbb{R}^{H\\times H}$, $W_x \\in \\mathbb{R}^{H\\times D}$, $W_y \\in \\mathbb{R}^{D\\times H}$, $b_h \\in \\mathbb{R}^D$, and $b_y \\in \\mathbb{R}^D$.
\n",
"\n",
"The number of time steps of the RNN is $T$, that is we will learn a sequence of data of length $T$: $x_t$ for $t=0,...,T-1$."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# hyperparameters of RNN\n",
"batch_size = 3 # batch size\n",
"batch_len = data_len // batch_size # batch length\n",
"T = 5 # temporal length\n",
"epoch_size = (batch_len - 1) // T # nb of iterations to get one epoch\n",
"D = vocab_size # data dimension = nb of unique characters\n",
"H = 5*D # size of hidden state, the memory layer\n",
"\n",
"print('data_len=',data_len,' batch_size=',batch_size,' batch_len=',\n",
" batch_len,' T=',T,' epoch_size=',epoch_size,' D=',D)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 1 \n",
"Initialize input variables of the computational graph:
\n",
"(1) Xin of size *batch_size x T x D* and type *tf.float32*. Each input character is encoded on a vector of size D.
\n",
"(2) Ytarget of size *batch_size x T* and type *tf.int64*. Each target character is encoded by a value in {0,...,D-1}.
\n",
"(3) hin of size *batch_size x H* and type *tf.float32*
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# input variables of computational graph (CG)\n",
"YOUR CODE HERE"
]
},
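{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible Step 1 solution; the placeholder names Xin, Ytarget, and hin follow the assignment statement:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Possible sketch: input placeholders of the CG\n",
"Xin = tf.placeholder(tf.float32, [batch_size, T, D]) # one-hot input characters\n",
"Ytarget = tf.placeholder(tf.int64, [batch_size, T]) # target character indices\n",
"hin = tf.placeholder(tf.float32, [batch_size, H]) # initial hidden state"
]
},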
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 2\n",
"Define the variables of the computational graph:
\n",
"(1) $W_x$ is a random variable of shape *D x H* with normal distribution of variance $\\frac{6}{D+H}$
\n",
"(2) $W_h$ is an identity matrix multiplies by constant $0.01$
\n",
"(3) $W_y$ is a random variable of shape *H x D* with normal distribution of variance $\\frac{6}{D+H}$
\n",
"(4) $b_h$, $b_y$ are zero vectors of size *H*, and *D*
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Model variables\n",
"YOUR CODE HERE"
]
},
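{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible Step 2 solution, using the TF 0.x API used elsewhere in this notebook; a standard deviation of $\\sqrt{6/(D+H)}$ gives the requested variance $\\frac{6}{D+H}$:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Possible sketch: model variables\n",
"stddev = np.sqrt(6.0/(D+H)) # so that the variance is 6/(D+H)\n",
"Wx = tf.Variable(tf.random_normal([D,H], stddev=stddev)) # input-to-hidden\n",
"Wh = tf.Variable(0.01*np.identity(H, dtype=np.float32)) # hidden-to-hidden, scaled identity\n",
"Wy = tf.Variable(tf.random_normal([H,D], stddev=stddev)) # hidden-to-output\n",
"bh = tf.Variable(tf.zeros([H])) # hidden bias\n",
"by = tf.Variable(tf.zeros([D])) # output bias"
]
},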
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 3\n",
"Implement the recursive formula:\n",
"\n",
"$$\n",
"\\begin{aligned}\n",
"h_t &= \\textrm{tanh}(W_h h_{t-1} + W_x x_t + b_h)\\\\\n",
"y_t &= W_y h_t + b_y\n",
"\\end{aligned}\n",
"$$\n",
"\n",
"with $h_{t=0}=hin$.
\n",
"\n",
"Hints:
\n",
"(1) You may use functions *tf.split()*, *enumerate()*, *tf.squeeze()*, *tf.matmul()*, *tf.tanh()*, *tf.transpose()*, *append()*, *pack()*.
\n",
"(2) You may use a matrix Y of shape *batch_size x T x D*. We recall that Ytarget should have the shape *batch_size x T*.
\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Vanilla RNN implementation\n",
"Y = []\n",
"ht = hin\n",
"\n",
"YOUR CODE HERE\n",
"\n",
"print('Y=',Y.get_shape())\n",
"print('Ytarget=',Ytarget.get_shape())"
]
},
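{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible Step 3 solution, assuming the TF 0.x argument order *tf.split(axis, num, value)* and the variable names from the previous sketches:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Possible sketch: unrolled vanilla RNN\n",
"Y = []\n",
"ht = hin\n",
"for t, xt in enumerate(tf.split(1, T, Xin)): # T slices of shape batch_size x 1 x D\n",
"    xt = tf.squeeze(xt, [1]) # batch_size x D\n",
"    ht = tf.tanh(tf.matmul(ht, Wh) + tf.matmul(xt, Wx) + bh) # batch_size x H\n",
"    Y.append(tf.matmul(ht, Wy) + by) # batch_size x D\n",
"Y = tf.transpose(tf.pack(Y), [1,0,2]) # T x batch_size x D -> batch_size x T x D"
]
},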
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 4\n",
"Perplexity loss is implemented as:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# perplexity\n",
"logits = tf.reshape(Y,[batch_size*T,D])\n",
"weights = tf.ones([batch_size*T])\n",
"cross_entropy_perplexity = tf.nn.seq2seq.sequence_loss_by_example([logits],[Ytarget],[weights])\n",
"cross_entropy_perplexity = tf.reduce_sum(cross_entropy_perplexity) / batch_size\n",
"loss = cross_entropy_perplexity"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 5\n",
"Implement the optimization of the loss function.\n",
"\n",
"Hint: You may use function *tf.train.GradientDescentOptimizer()*.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Optimization\n",
"YOUR CODE HERE"
]
},
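{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible Step 5 solution; the learning rate $0.1$ is an assumed value, not one fixed by the assignment:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Possible sketch: gradient descent on the perplexity loss\n",
"learning_rate = 0.1 # assumed value, to be tuned\n",
"train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss)"
]
},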
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 6\n",
"Implement the prediction scheme: from an input character e.g. \"h\" then the RNN should predict \"ello\".
\n",
"\n",
"Hints:
\n",
"(1) You should use the learned RNN.
\n",
"(2) You may use functions *tf.one_hot()*, *tf.nn.softmax()*, *tf.argmax()*.\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Predict\n",
"idx_pred = tf.placeholder(tf.int64) # input seed\n",
"\n",
"YOUR CODE HERE\n",
"\n",
"Ypred = tf.convert_to_tensor(Ypred)"
]
},
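{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of one possible Step 6 solution, reusing the learned variables from the Step 2 sketch and feeding each predicted character back as the next input:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# Possible sketch: predict T characters from a seed index\n",
"xp = tf.reshape(tf.one_hot(idx_pred, D, 1.0, 0.0), [1,D]) # seed as one-hot row vector\n",
"hp = tf.zeros([1,H]) # initial memory of the predictor\n",
"Ypred = []\n",
"for t in range(T):\n",
"    hp = tf.tanh(tf.matmul(hp, Wh) + tf.matmul(xp, Wx) + bh)\n",
"    yp = tf.nn.softmax(tf.matmul(hp, Wy) + by) # distribution over the D characters\n",
"    ip = tf.argmax(yp, 1) # most likely character index\n",
"    Ypred.append(tf.squeeze(ip))\n",
"    xp = tf.one_hot(ip, D, 1.0, 0.0) # feed the prediction back, shape 1 x D\n",
"Ypred = tf.convert_to_tensor(Ypred)"
]
},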
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# Prepare train data matrix of size \"batch_size x batch_len\"\n",
"data_ix = [char_to_ix[ch] for ch in data[:data_len]]\n",
"train_data = np.array(data_ix)\n",
"print('original train set shape',train_data.shape)\n",
"train_data = np.reshape(train_data[:batch_size*batch_len], [batch_size,batch_len])\n",
"print('pre-processed train set shape',train_data.shape)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# The following function tansforms an integer value d between {0,...,D-1} into an one hot vector, that is a \n",
"# vector of dimension D x 1 which has value 1 for index d-1, and 0 otherwise\n",
"from scipy.sparse import coo_matrix\n",
"def convert_to_one_hot(a,max_val=None):\n",
" N = a.size\n",
" data = np.ones(N,dtype=int)\n",
" sparse_out = coo_matrix((data,(np.arange(N),a.ravel())), shape=(N,max_val))\n",
" return np.array(sparse_out.todense())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Step 7\n",
"Run the computational graph with batches of training data.
\n",
"Predict the sequence of characters starting from the character \"h\".
\n",
"\n",
"Hints:
\n",
"(1) Initial memory is $h_{t=0}$ is 0.
\n",
"(2) Run the computational graph to optimize the perplexity loss, and to predict the the sequence of characters starting from the character \"h\".
"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false,
"scrolled": false
},
"outputs": [],
"source": [
"# Run CG\n",
"init = tf.initialize_all_variables()\n",
"sess = tf.Session()\n",
"sess.run(init)\n",
"\n",
"h0 = np.zeros([batch_size,H])\n",
"indices = collections.deque()\n",
"costs = 0.0; epoch_iters = 0\n",
"for n in range(50):\n",
" \n",
" # Batch extraction\n",
" if len(indices) < 1:\n",
" indices.extend(range(epoch_size))\n",
" costs = 0.0; epoch_iters = 0\n",
" i = indices.popleft() \n",
" batch_x = train_data[:,i*T:(i+1)*T]\n",
" batch_x = convert_to_one_hot(batch_x,D); batch_x = np.reshape(batch_x,[batch_size,T,D])\n",
" batch_y = train_data[:,i*T+1:(i+1)*T+1]\n",
" #print(batch_x.shape,batch_y.shape)\n",
"\n",
" # Train\n",
" idx = char_to_ix['h'];\n",
" loss_value,_,Ypredicted = sess.run(YOUR CODE HERE)\n",
" \n",
" # Perplexity\n",
" costs += loss_value\n",
" epoch_iters += T\n",
" perplexity = np.exp(costs/epoch_iters)\n",
" \n",
" if not n%1:\n",
" idx_char = Ypredicted\n",
" txt = ''.join(ix_to_char[ix] for ix in list(idx_char))\n",
" print('\\nn=',n,', perplexity value=',perplexity)\n",
" print('starting char=',ix_to_char[idx], ', predicted sequences=',txt)\n",
" \n",
"sess.close() "
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 0
}