{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 306 }, "colab_type": "code", "executionInfo": { "elapsed": 2167, "status": "ok", "timestamp": 1524358132767, "user": { "displayName": "Ming-Yu Liu", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "113235319461992470380" }, "user_tz": 420 }, "id": "bcCHLv3Ia3KN", "outputId": "610364d9-e543-4c5c-c309-d743b8177da6" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# code for loading the format for the notebook\n", "import os\n", "\n", "# path : store the current path to convert back to it later\n", "path = os.getcwd()\n", "os.chdir(os.path.join('..', '..', 'notebook_format'))\n", "from formats import load_style\n", "load_style(css_style = 'custom2.css', plot_style = False)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 170 }, "colab_type": "code", "executionInfo": { "elapsed": 1165, "status": "ok", "timestamp": 1524358133954, "user": { "displayName": "Ming-Yu Liu", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "113235319461992470380" }, "user_tz": 420 }, "id": "LziH-_z1a8oK", "outputId": "1a4dc680-10b6-46cc-d520-118f36efe1c7" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Ethen 2018-04-21 22:30:43 \n", "\n", "CPython 3.6.4\n", "IPython 6.3.1\n", "\n", "keras 2.1.5\n", "numpy 1.14.2\n", "tensorflow 1.7.0\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] } ], "source": [ "os.chdir(path)\n", "\n", "import os\n", "import warnings\n", "import numpy as np\n", "import tensorflow as tf\n", "from time import time\n", "from keras.datasets import mnist\n", "from keras.utils import to_categorical\n", "\n", "# 1. magic so that the notebook will reload external python modules\n", "# 2. magic to print version\n", "%load_ext autoreload\n", "%autoreload 2\n", "%load_ext watermark\n", "%watermark -a 'Ethen' -d -t -v -p keras,numpy,tensorflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# RNN (Recurrent Neural Network)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The idea behind RNN is to make use of sequential information that exists in our dataset. In feed forward neural network, we assume that all inputs and outputs are independent of each other. But for some tasks, this might not be the best way to tackle the problem. For example, in Natural Language Processing (NLP) applications, if we wish to predict the next word in a sentence (one business application of this is [Swiftkey](https://en.wikipedia.org/wiki/SwiftKey)), we could imagine that knowing the word that comes before it can come in handy. RNNs are called recurrent because they perform the same task for every element of a sequence (sharing the weights), with the output being depended on the previous computations. Another way to think about RNNs is that they have a \"memory\" which captures information about what has been calculated so far.\n", "\n", "The following diagram shows what a typical RNN-type network looks like:\n", "\n", "\n", "\n", "We can think of RNN-type networks as networks with loops. 
During the forward pass, the RNN is unrolled/unfolded into a full network. By unrolling, we are referring to the fact that we will be performing the computation for the complete sequence. For example:\n", "\n", "- If the input sequence is a sentence of 5 words, the network (RNN cell) would be unrolled into 5 copies, one copy for each word.\n", "- If we consider each row of an image as a sequence of pixels (e.g. an MNIST image's shape is 28*28 pixels), we would then be handling 28 time steps, each having a feature size of 28, for every sample.\n", "\n", "The formulas for the computation happening in an RNN cell are as follows:\n", "\n", "- $x_t$: The input at time step $t$, taking the size of the feature space, e.g. a one-hot vector or embedding of the input word.\n", "- $s_t$: The hidden state at time step $t$. This is essentially the \"memory\" of the network. $s_t$ is calculated based on the previous hidden state and the input at the current step: $s_t = f(U x_t + W s_{t - 1})$. The function $f$ is usually a nonlinearity such as tanh or relu. At the first step, the previous hidden state is usually initialized to all zeros in order to calculate the first hidden state (a small numpy sketch of this recurrence is shown at the start of the implementation section below).\n", "- $o_t$: The output at step $t$. For example, if we wish to predict the most probable word in a sentence, i.e. a classification problem, then the computation can be a linear layer followed by a softmax, $o_t = \\text{softmax}(V s_t)$.\n", "\n", "A few things to note:\n", "\n", "- Unlike a traditional deep neural network, which uses different parameters at each layer, an RNN shares the same parameters ($U$, $V$, $W$ above) across all steps. This reflects the fact that we are performing the same task at each step, just with different inputs. Such a design greatly reduces the total number of parameters we need to learn.\n", "- The above diagram has outputs at each time step, but depending on the task this may not be necessary. For example, in basic sequence classification, we can assume the last hidden state has accumulated the information representing the entire sequence. A concrete example: when predicting the sentiment of a sentence, we may only care about the final output, not the sentiment after each word." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Implementation" ] }
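, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the recurrence $s_t = \\tanh(U x_t + W s_{t - 1})$ concrete before the TensorFlow version, below is a minimal numpy sketch of an unrolled forward pass. This is an illustrative sketch only: the shapes, random weights and inputs are made up, and no training happens." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# illustrative numpy sketch of the recurrence s_t = tanh(U x_t + W s_{t-1});\n", "# the sizes below are made up and unrelated to the MNIST example that follows\n", "import numpy as np\n", "\n", "feature_size, hidden_size, seq_len = 4, 3, 5\n", "rng = np.random.RandomState(1)\n", "U = rng.randn(feature_size, hidden_size) * 0.01 # input to hidden weight\n", "W = rng.randn(hidden_size, hidden_size) * 0.01 # hidden to hidden weight\n", "xs = rng.randn(seq_len, feature_size) # a fake input sequence of 5 steps\n", "\n", "s = np.zeros(hidden_size) # initial hidden state, all zeros\n", "for x in xs:\n", "    # the same U and W are re-used at every time step\n", "    s = np.tanh(x @ U + s @ W)\n", "\n", "# the final hidden state summarizes the entire sequence\n", "print(s.shape)" ] }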
] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "VJglBxOafCmc" }, "outputs": [], "source": [ "class DataLoader:\n", " \"\"\"Container for a dataset.\"\"\"\n", "\n", " def __init__(self, images, labels, num_classes):\n", " if images.shape[0] != labels.shape[0]:\n", " raise ValueError('images.shape: %s labels.shape: %s' % (images.shape, labels.shape))\n", " \n", " self.num_classes = num_classes\n", " self._images = images\n", " self._labels = labels\n", "\n", " self._num_examples = images.shape[0]\n", " self._epochs_completed = 0\n", " self._index_in_epoch = 0\n", "\n", " def next_batch(self, batch_size, shuffle = True):\n", " \"\"\"Return the next `batch_size` examples from this data set.\"\"\"\n", "\n", " # shuffle for the first epoch\n", " start = self._index_in_epoch\n", " if self._epochs_completed == 0 and start == 0 and shuffle:\n", " self._shuffle_images_and_labels()\n", "\n", " if start + batch_size > self._num_examples:\n", " # retrieve the rest of the examples that does not add up to a full batch size\n", " self._epochs_completed += 1\n", " rest_num_examples = self._num_examples - start\n", " rest_images = self._images[start:self._num_examples]\n", " rest_labels = self._labels[start:self._num_examples]\n", " if shuffle:\n", " self._shuffle_images_and_labels()\n", "\n", " # complete the batch size from the next epoch\n", " start = 0\n", " self._index_in_epoch = batch_size - rest_num_examples\n", " end = self._index_in_epoch\n", " new_images = self._images[start:end]\n", " new_labels = self._labels[start:end]\n", " images = np.concatenate((rest_images, new_images), axis = 0)\n", " labels = np.concatenate((rest_labels, new_labels), axis = 0)\n", " return images, to_categorical(labels, self.num_classes)\n", " else:\n", " self._index_in_epoch += batch_size\n", " end = self._index_in_epoch\n", " return (self._images[start:end],\n", " to_categorical(self._labels[start:end], self.num_classes))\n", "\n", " def _shuffle_images_and_labels(self):\n", " permutated = np.arange(self._num_examples)\n", " np.random.shuffle(permutated)\n", " self._images[permutated]\n", " self._labels[permutated]" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 796, "status": "ok", "timestamp": 1524358135109, "user": { "displayName": "Ming-Yu Liu", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "113235319461992470380" }, "user_tz": 420 }, "id": "71b43RtcxqFi", "outputId": "a55df026-eccd-4f2c-d1f8-d981faff86f0" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mnist data shape: (60000, 28, 28)\n" ] } ], "source": [ "(X_train, Y_train), (X_test, Y_test) = mnist.load_data()\n", "X_train = X_train.astype('float32')\n", "X_test = X_test.astype('float32')\n", "\n", "# images takes values between 0 - 255, we can normalize it\n", "# by dividing every number by 255\n", "X_train /= 255\n", "X_test /= 255\n", "\n", "print('mnist data shape: ', X_train.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It can be helpful to write down the dimensions of our input and weights. 
Here our MNIST image's feature size is 28, the number of possible outputs/targets is 10, and assume we've set our hidden layer size to 128 (this is a hyperparameter that we can tune; increasing it makes the \"memory\" capable of memorizing more complex patterns, but also results in additional computation and raises the risk of overfitting). Then we have:\n", "\n", "- $x_t \\in \\mathbb{R}^{28}$\n", "- $o_t \\in \\mathbb{R}^{10}$\n", "- $s_t \\in \\mathbb{R}^{128}$\n", "- $U \\in \\mathbb{R}^{28 \\times 128}$\n", "- $W \\in \\mathbb{R}^{128 \\times 128}$\n", "- $V \\in \\mathbb{R}^{128 \\times 10}$" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 51 }, "colab_type": "code", "executionInfo": { "elapsed": 422, "status": "ok", "timestamp": 1524358135549, "user": { "displayName": "Ming-Yu Liu", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "113235319461992470380" }, "user_tz": 420 }, "id": "hhf0iI3JxrUC", "outputId": "1c7a4f55-14cf-4c1f-f11e-3ea1ad2178e1" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "label shape: (128, 10)\n", "data shape: (128, 28, 28)\n" ] } ], "source": [ "# Define some parameters\n", "element_size = 28\n", "time_steps = 28\n", "num_classes = 10\n", "batch_size = 128\n", "hidden_layer_size = 128\n", "\n", "# example of generating a batch of data using the\n", "# DataLoader class\n", "data_loader = DataLoader(X_train, Y_train, num_classes)\n", "X_batch, y_batch = data_loader.next_batch(batch_size)\n", "print('label shape: ', y_batch.shape)\n", "print('data shape: ', X_batch.shape)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "c24nUczfhbb-" }, "outputs": [], "source": [ "# the first dimension holds the batch size\n", "inputs = tf.placeholder(tf.float32, shape = [None, time_steps, element_size], name = 'inputs')\n", "labels = tf.placeholder(tf.float32, shape = [None, num_classes], name = 'labels')\n", "\n", "# Wx = U from the diagram, input weight\n", "# Wh = W from the diagram, hidden weight\n", "# bx = bias term for the input weight\n", "Wx = tf.Variable(tf.zeros([element_size, hidden_layer_size]))\n", "Wh = tf.Variable(tf.zeros([hidden_layer_size, hidden_layer_size]))\n", "bx = tf.Variable(tf.zeros([hidden_layer_size]))\n", "\n", "def rnn_step(previous_hidden_state, x):\n", " current_hidden_state = tf.tanh(\n", " tf.matmul(previous_hidden_state, Wh) +\n", " tf.matmul(x, Wx) + bx)\n", "\n", " return current_hidden_state" ] }
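, { "cell_type": "markdown", "metadata": {}, "source": [ "Since `tf.scan` may be unfamiliar, here is a tiny standalone sketch of what it does (the numbers are made up purely for illustration): it walks over the first axis of `elems`, repeatedly applying a callable whose first argument is the previous result (starting from `initializer`) and whose second argument is the current element, and it stacks all the intermediate results." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# toy sketch of tf.scan: a running sum over [1, 2, 3, 4]\n", "# carry = previous accumulated value (starts at the initializer),\n", "# x = current element of the sequence\n", "elems = tf.constant([1, 2, 3, 4])\n", "running_sum = tf.scan(lambda carry, x: carry + x, elems, initializer = tf.constant(0))\n", "\n", "with tf.Session() as sess:\n", "    # prints [ 1  3  6 10], one partial result per element\n", "    print(sess.run(running_sum))" ] }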
, { "cell_type": "code", "execution_count": 7, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 34 }, "colab_type": "code", "executionInfo": { "elapsed": 1258, "status": "ok", "timestamp": 1524358137230, "user": { "displayName": "Ming-Yu Liu", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "113235319461992470380" }, "user_tz": 420 }, "id": "W-alJzyUk2mb", "outputId": "91c89c4b-b978-4dbd-a3cb-55dc9e90d055" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(28, 128, 128)\n" ] } ], "source": [ "# the original input batch's shape is [batch_size, time_steps, element_size];\n", "# we permute the order to [time_steps, batch_size, element_size], putting\n", "# time_steps up front in order to leverage tf.scan's functionality\n", "input_reshaped = tf.transpose(inputs, perm = [1, 0, 2])\n", "\n", "# we initialize a hidden state to begin with and apply the rnn steps using tf.scan,\n", "# which repeatedly applies a callable to our inputs\n", "initial_hidden = tf.zeros([batch_size, hidden_layer_size])\n", "all_hidden_states = tf.scan(\n", " rnn_step, input_reshaped, initializer = initial_hidden, name = 'hidden_states')\n", "\n", "# if we do a fake run, we can see that the output at this point is the hidden state\n", "# for every time step [time_steps, batch_size, hidden_layer_size]\n", "with tf.Session() as sess:\n", " sess.run(tf.global_variables_initializer())\n", "\n", " data_loader = DataLoader(X_train, Y_train, num_classes)\n", " X_batch, y_batch = data_loader.next_batch(batch_size)\n", " temp = sess.run(all_hidden_states, feed_dict = {inputs: X_batch, labels: y_batch})\n", " print(temp.shape)" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "ccZrawStlhkA" }, "outputs": [], "source": [ "# output linear layer's weight and bias, V from the diagram\n", "Wl = tf.Variable(tf.truncated_normal(\n", " [hidden_layer_size, num_classes],\n", " mean = 0, stddev = .01))\n", "bl = tf.Variable(tf.truncated_normal(\n", " [num_classes], mean = 0, stddev = .01))\n", "\n", "# apply linear layer to state vector;\n", "# instead of calculating the output vector for every hidden state,\n", "# in basic classification, we can assume the last hidden state\n", "# has accumulated the information representing the entire sequence\n", "output = tf.matmul(all_hidden_states[-1], Wl) + bl" ] }, { "cell_type": "code", "execution_count": 0, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 } }, "colab_type": "code", "id": "bzjr74Rzffat" }, "outputs": [], "source": [ "learning_rate = 0.001\n", "\n", "# specify the cross entropy loss, the optimizer to train the loss,\n", "# and the accuracy measurement\n", "cross_entropy = tf.reduce_mean(\n", " tf.nn.softmax_cross_entropy_with_logits_v2(\n", " logits = output, labels = labels))\n", "train_step = tf.train.RMSPropOptimizer(learning_rate).minimize(cross_entropy)\n", "correct_prediction = tf.equal(tf.argmax(labels, axis = 1), tf.argmax(output, axis = 1))\n", "accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) * 100" ] }
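, { "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check on what `softmax_cross_entropy_with_logits_v2` computes, here is a small numpy sketch of the same formula, $-\\sum_i y_i \\log(\\text{softmax}(o)_i)$, on a single made-up example (the `_demo` variables are purely illustrative):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# numpy sketch of softmax cross entropy on one made-up example\n", "logits_demo = np.array([2.0, 1.0, 0.1])\n", "labels_demo = np.array([1.0, 0.0, 0.0]) # one-hot encoded true class\n", "\n", "# softmax turns the logits into a probability distribution,\n", "# cross entropy then penalizes low probability on the true class\n", "softmax_demo = np.exp(logits_demo) / np.sum(np.exp(logits_demo))\n", "loss_demo = -np.sum(labels_demo * np.log(softmax_demo))\n", "print(loss_demo) # roughly 0.417" ] }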
, { "cell_type": "code", "execution_count": 11, "metadata": { "colab": { "autoexec": { "startup": false, "wait_interval": 0 }, "base_uri": "https://localhost:8080/", "height": 153 }, "colab_type": "code", "executionInfo": { "elapsed": 76331, "status": "ok", "timestamp": 1524358227134, "user": { "displayName": "Ming-Yu Liu", "photoUrl": "https://lh3.googleusercontent.com/a/default-user=s128", "userId": "113235319461992470380" }, "user_tz": 420 }, "id": "NSeCPQlusgQ_", "outputId": "b0806101-723f-4035-b94e-d0c031a0fb1a" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iter 0, Minibatch Loss = 2.301946, Training Accuracy = 10.15625\n", "Iter 1000, Minibatch Loss = 1.153001, Training Accuracy = 56.25000\n", "Iter 2000, Minibatch Loss = 0.672977, Training Accuracy = 75.78125\n", "Iter 3000, Minibatch Loss = 0.278527, Training Accuracy = 90.62500\n", "Iter 4000, Minibatch Loss = 0.095026, Training Accuracy = 99.21875\n", "Optimization finished!\n", "Test Accuracy: 95.3125\n", "elapsed time: 75.74224495887756\n" ] } ], "source": [ "X_test_batch = X_test[:batch_size]\n", "y_test_batch = to_categorical(Y_test[:batch_size], num_classes)\n", "\n", "data_loader = DataLoader(X_train, Y_train, num_classes)\n", "\n", "# number of training iterations (batches), not full passes over the data\n", "iterations = 5000\n", "with tf.Session() as sess:\n", " sess.run(tf.global_variables_initializer())\n", "\n", " start = time()\n", " for i in range(iterations):\n", " X_batch, y_batch = data_loader.next_batch(batch_size)\n", " sess.run(train_step, feed_dict = {inputs: X_batch, labels: y_batch})\n", "\n", " if i % 1000 == 0:\n", " acc, loss = sess.run([accuracy, cross_entropy],\n", " feed_dict = {inputs: X_batch, labels: y_batch})\n", " print('Iter ' + str(i) + ', Minibatch Loss =',\n", " '{:.6f}'.format(loss) + ', Training Accuracy =',\n", " '{:.5f}'.format(acc))\n", "\n", " print('Optimization finished!')\n", " acc_test = sess.run(accuracy, feed_dict = {inputs: X_test_batch, labels: y_test_batch})\n", " print('Test Accuracy: ', acc_test)\n", " print('elapsed time: ', time() - start)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reference\n", "\n", "- [Jupyter Notebook: Recurrent Neural Network Example](http://nbviewer.jupyter.org/github/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/recurrent_network.ipynb)\n", "- [Blog: Recurrent Neural Networks Tutorial, Part 1 – Introduction to RNNs](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-1-introduction-to-rnns/)\n", "- [Blog: Recurrent Neural Networks Tutorial, Part 2 – Implementing a RNN with Python, Numpy and Theano](http://www.wildml.com/2015/09/recurrent-neural-networks-tutorial-part-2-implementing-a-language-model-rnn-with-python-numpy-and-theano/)" ] } ], "metadata": { "accelerator": "GPU", "colab": { "collapsed_sections": [], "default_view": {}, "name": "tensorflow_rnn.ipynb", "provenance": [], "version": "0.3.2", "views": {} }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" }, "toc": { "nav_menu": {}, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": { "height": "calc(100% - 180px)", "left": "10px", "top": "150px", "width": "251px" }, "toc_section_display": true, "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }