{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolutional Neural Networks\n", "- CS231n Convolutional Neural Networks for Visual Recognition: https://git.io/vKlww" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracting MNIST_data/train-images-idx3-ubyte.gz\n", "Extracting MNIST_data/train-labels-idx1-ubyte.gz\n", "Extracting MNIST_data/t10k-images-idx3-ubyte.gz\n", "Extracting MNIST_data/t10k-labels-idx1-ubyte.gz\n" ] } ], "source": [ "import tensorflow as tf\n", "from tensorflow.examples.tutorials.mnist import input_data\n", "mnist = input_data.read_data_sets(\"MNIST_data/\", one_hot=True)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "collapsed": false }, "outputs": [], "source": [ "x = tf.placeholder(\"float\", [None, 784])\n", "y = tf.placeholder(\"float\", [None, 10])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- Image reshape to 4D [-1, 28, 28, 1]\n", " - The MNIST images have just one gray color value (the number of color channels is one), so that the last value is one." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(?, 28, 28, 1)\n" ] } ], "source": [ "x_image = tf.reshape(x, [-1, 28, 28, 1])\n", "print x_image.get_shape()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- **stride**\n", " - we must specify the stride with which we slide the filter. \n", " - stride = 1: we slide the filters one pixel at a time. \n", " - stride = 2: we slide the filters two pixel at a time. \n", " - This will produce smaller output volumes spatially.\n", "- **padding**\n", " - we need pad the input volume with zeros around the border. \n", " - zero padding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- The CONV layer’s parameters consist of a set of learnable filters.\n", "- The convolution will compute 32 kernels (features) for each 5x5 patch. \n", "- Its weight tensor will have a shape of [5, 5, 1, 32]. \n", " - The first two dimensions are the patch size\n", " - The third is the number of input channels (here, it is one channel)\n", " - The last one is the number of kernels (features). \n", "- A bias vector has a component per kernel." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#conv2d\n", "- https://www.tensorflow.org/versions/r0.11/api_docs/python/nn.html#max_pool" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": false }, "outputs": [], "source": [ "W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 1, 32], stddev=0.1))\n", "b_conv1 = tf.Variable(tf.constant(0.1, shape=[32]))\n", "h_conv1 = tf.nn.relu(tf.nn.conv2d(x_image, W_conv1, strides=[1, 1, 1, 1], padding='SAME') + b_conv1)\n", "h_pool1 = tf.nn.max_pool(h_conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')" ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "collapsed": true }, "outputs": [], "source": [ "W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1))\n", "b_conv2 = tf.Variable(tf.constant(0.1, shape=[64]))\n", "h_conv2 = tf.nn.relu(tf.nn.conv2d(h_pool1, W_conv2, strides=[1, 1, 1, 1], padding='SAME') + b_conv2)\n", "h_pool2 = tf.nn.max_pool(h_conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "h_pool2_flat = tf.reshape(h_pool2, [-1, 7 * 7 * 64])" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "W_fc1 = tf.Variable(tf.truncated_normal([7 * 7 * 64, 1024], stddev=0.1))\n", "b_fc1 = tf.Variable(tf.constant(0.1, shape=[1024]))\n", "h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "collapsed": true }, "outputs": [], "source": [ "keep_prob = tf.placeholder(\"float\")\n", "h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob=keep_prob)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "collapsed": true }, "outputs": [], "source": [ "W_fc2 = tf.Variable(tf.truncated_normal([1024, 10], stddev=0.1))\n", "b_fc2 = tf.Variable(tf.constant(0.1, shape=[10]))\n", "y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "collapsed": false }, "outputs": [], "source": [ "cross_entropy = -tf.reduce_sum(y * tf.log(y_conv))\n", "train_step = tf.train.AdamOptimizer(1e-4).minimize(cross_entropy)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "collapsed": false }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "total batch: 550\n", "Epoch 0 Finished - Accuracy 0.9337\n", "Epoch 1 Finished - Accuracy 0.9567\n", "Epoch 2 Finished - Accuracy 0.9673\n", "Epoch 3 Finished - Accuracy 0.9714\n", "Epoch 4 Finished - Accuracy 0.9735\n", "Epoch 5 Finished - Accuracy 0.9767\n", "Epoch 6 Finished - Accuracy 0.9802\n", "Epoch 7 Finished - Accuracy 0.9801\n", "Epoch 8 Finished - Accuracy 0.9832\n", "Epoch 9 Finished - Accuracy 0.9821\n", "Epoch 10 Finished - Accuracy 0.9854\n", "Epoch 11 Finished - Accuracy 0.9867\n", "Epoch 12 Finished - Accuracy 0.9868\n", "Epoch 13 Finished - Accuracy 0.9851\n", "Epoch 14 Finished - Accuracy 0.9892\n", "Epoch 15 Finished - Accuracy 0.9886\n", "Epoch 16 Finished - Accuracy 0.9859\n", "Epoch 17 Finished - Accuracy 0.987\n", "Epoch 18 Finished - Accuracy 0.9877\n", "Epoch 19 Finished - Accuracy 0.9876\n", "Epoch 20 Finished - Accuracy 0.9884\n", "Epoch 21 Finished - Accuracy 0.989\n", "Epoch 22 Finished - Accuracy 0.988\n", "Epoch 23 Finished - Accuracy 0.9893\n", "Epoch 24 Finished - Accuracy 0.9872\n", "Epoch 25 Finished - Accuracy 0.9881\n", "Epoch 26 Finished - Accuracy 0.9873\n", "Epoch 27 Finished - Accuracy 0.9889\n", "Epoch 28 Finished - Accuracy 0.9877\n", "Epoch 29 Finished - Accuracy 0.9893\n", "Epoch 30 Finished - Accuracy 0.9896\n", "Epoch 31 Finished - Accuracy 0.9896\n", "Epoch 32 Finished - Accuracy 0.9901\n", "Epoch 33 Finished - Accuracy 0.9878\n", "Epoch 34 Finished - Accuracy 0.9895\n", "Epoch 35 Finished - Accuracy 0.9884\n", "Epoch 36 Finished - Accuracy 0.9895\n", "Epoch 37 Finished - Accuracy 0.9891\n", "Epoch 38 Finished - Accuracy 0.99\n", "Epoch 39 Finished - Accuracy 0.9893\n", "Optimization Finished!\n" ] } ], "source": [ "# Initializing the variables\n", "init = tf.initialize_all_variables()\n", "\n", "# Parameters\n", "training_epochs = 40\n", "learning_rate = 0.001\n", "batch_size = 100\n", "display_step = 1\n", "\n", "# Calculate accuracy with a Test model \n", "prediction_and_ground_truth = tf.equal(tf.argmax(y_conv, 1), tf.argmax(y, 1))\n", "accuracy = tf.reduce_mean(tf.cast(prediction_and_ground_truth, \"float\"))\n", "\n", "# Launch the tensorflow graph\n", "with tf.Session() as sess:\n", " sess.run(init)\n", " total_batch = int(mnist.train.num_examples/batch_size)\n", " \n", " print \"total batch: %d\" % total_batch\n", "\n", " # Training cycle\n", " for epoch in range(training_epochs):\n", "\n", " # Loop over all batches\n", " for i in range(total_batch):\n", " batch_images, batch_labels = mnist.train.next_batch(batch_size)\n", " sess.run(train_step, feed_dict={x: batch_images, y: batch_labels, keep_prob: 0.5})\n", " print \"Epoch %d Finished - Accuracy %g\" % (epoch, sess.run(accuracy, feed_dict={x: mnist.test.images, y: mnist.test.labels, keep_prob: 0.5}))\n", "\n", " print(\"Optimization Finished!\")" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python [Root]", "language": "python", "name": "Python [Root]" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.12" } }, "nbformat": 4, "nbformat_minor": 0 }