{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] }, { "data": { "text/plain": [ "'2.0.8'" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import keras\n", "keras.__version__" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# 5.1 - Introduction to convnets\n", "\n", "This notebook contains the code sample found in Chapter 5, Section 1 of [Deep Learning with Python](https://www.manning.com/books/deep-learning-with-python?a_aid=keras&a_bid=76564dff). Note that the original text features far more content, in particular further explanations and figures: in this notebook, you will only find source code and related comments.\n", "\n", "----\n", "\n", "First, let's take a practical look at a very simple convnet example. We will use our convnet to classify MNIST digits, a task that you've already been \n", "through in Chapter 2, using a densely-connected network (our test accuracy then was 97.8%). Even though our convnet will be very basic, its \n", "accuracy will still blow out of the water that of the densely-connected model from Chapter 2.\n", "\n", "The 6 lines of code below show you what a basic convnet looks like. It's a stack of `Conv2D` and `MaxPooling2D` layers. We'll see in a \n", "minute what they do concretely.\n", "Importantly, a convnet takes as input tensors of shape `(image_height, image_width, image_channels)` (not including the batch dimension). \n", "In our case, we will configure our convnet to process inputs of size `(28, 28, 1)`, which is the format of MNIST images. We do this via \n", "passing the argument `input_shape=(28, 28, 1)` to our first layer." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "from keras import layers\n", "from keras import models\n", "\n", "model = models.Sequential()\n", "model.add(layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))\n", "model.add(layers.MaxPooling2D((2, 2)))\n", "model.add(layers.Conv2D(64, (3, 3), activation='relu'))\n", "model.add(layers.MaxPooling2D((2, 2)))\n", "model.add(layers.Conv2D(64, (3, 3), activation='relu'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's display the architecture of our convnet so far:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "conv2d_1 (Conv2D) (None, 26, 26, 32) 320 \n", "_________________________________________________________________\n", "max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0 \n", "_________________________________________________________________\n", "conv2d_2 (Conv2D) (None, 11, 11, 64) 18496 \n", "_________________________________________________________________\n", "max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64) 0 \n", "_________________________________________________________________\n", "conv2d_3 (Conv2D) (None, 3, 3, 64) 36928 \n", "=================================================================\n", "Total params: 55,744\n", "Trainable params: 55,744\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "You can see above that the output of every `Conv2D` and `MaxPooling2D` layer is a 3D tensor of shape `(height, width, channels)`. The width \n", "and height dimensions tend to shrink as we go deeper in the network. The number of channels is controlled by the first argument passed to \n", "the `Conv2D` layers (e.g. 32 or 64).\n", "\n", "The next step would be to feed our last output tensor (of shape `(3, 3, 64)`) into a densely-connected classifier network like those you are \n", "already familiar with: a stack of `Dense` layers. These classifiers process vectors, which are 1D, whereas our current output is a 3D tensor. \n", "So first, we will have to flatten our 3D outputs to 1D, and then add a few `Dense` layers on top:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "collapsed": true }, "outputs": [], "source": [ "model.add(layers.Flatten())\n", "model.add(layers.Dense(64, activation='relu'))\n", "model.add(layers.Dense(10, activation='softmax'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are going to do 10-way classification, so we use a final layer with 10 outputs and a softmax activation. 
Now here's what our network \n", "looks like:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "conv2d_1 (Conv2D) (None, 26, 26, 32) 320 \n", "_________________________________________________________________\n", "max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32) 0 \n", "_________________________________________________________________\n", "conv2d_2 (Conv2D) (None, 11, 11, 64) 18496 \n", "_________________________________________________________________\n", "max_pooling2d_2 (MaxPooling2 (None, 5, 5, 64) 0 \n", "_________________________________________________________________\n", "conv2d_3 (Conv2D) (None, 3, 3, 64) 36928 \n", "_________________________________________________________________\n", "flatten_1 (Flatten) (None, 576) 0 \n", "_________________________________________________________________\n", "dense_1 (Dense) (None, 64) 36928 \n", "_________________________________________________________________\n", "dense_2 (Dense) (None, 10) 650 \n", "=================================================================\n", "Total params: 93,322\n", "Trainable params: 93,322\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As you can see, our `(3, 3, 64)` outputs were flattened into vectors of shape `(576,)`, before going through two `Dense` layers.\n", "\n", "Now, let's train our convnet on the MNIST digits. We will reuse a lot of the code we have already covered in the MNIST example from Chapter \n", "2." 
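] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before training, here is an optional sanity check (not part of the original text) of where the numbers in the summary above come from. It is plain arithmetic mirroring the usual formulas for `Conv2D` and `Dense` parameter counts; nothing here affects the model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Conv2D params = (kernel_h * kernel_w * input_channels + 1 bias) * filters\n", "print(3 * 3 * 1 * 32 + 32)    # conv2d_1 -> 320\n", "print(3 * 3 * 32 * 64 + 64)   # conv2d_2 -> 18496\n", "print(3 * 3 * 64 * 64 + 64)   # conv2d_3 -> 36928\n", "\n", "# Flatten only reshapes, so it adds no parameters: 3 * 3 * 64 values per image\n", "print(3 * 3 * 64)             # flatten_1 output size -> 576\n", "\n", "# Dense params = (input_units + 1 bias) * output_units\n", "print(576 * 64 + 64)          # dense_1 -> 36928\n", "print(64 * 10 + 10)           # dense_2 -> 650" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "These match the `model.summary()` figures above. Now, back to training:"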
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "from keras.datasets import mnist\n", "from keras.utils import to_categorical\n", "\n", "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()\n", "\n", "train_images = train_images.reshape((60000, 28, 28, 1))\n", "train_images = train_images.astype('float32') / 255\n", "\n", "test_images = test_images.reshape((10000, 28, 28, 1))\n", "test_images = test_images.astype('float32') / 255\n", "\n", "train_labels = to_categorical(train_labels)\n", "test_labels = to_categorical(test_labels)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Epoch 1/5\n", "60000/60000 [==============================] - 8s - loss: 0.1766 - acc: 0.9440 \n", "Epoch 2/5\n", "60000/60000 [==============================] - 7s - loss: 0.0462 - acc: 0.9855 \n", "Epoch 3/5\n", "60000/60000 [==============================] - 7s - loss: 0.0322 - acc: 0.9902 \n", "Epoch 4/5\n", "60000/60000 [==============================] - 7s - loss: 0.0241 - acc: 0.9926 \n", "Epoch 5/5\n", "60000/60000 [==============================] - 7s - loss: 0.0187 - acc: 0.9943 \n" ] }, { "data": { "text/plain": [ "" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "model.compile(optimizer='rmsprop',\n", " loss='categorical_crossentropy',\n", " metrics=['accuracy'])\n", "model.fit(train_images, train_labels, epochs=5, batch_size=64)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's evaluate the model on the test data:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 9536/10000 [===========================>..] - ETA: 0s" ] } ], "source": [ "test_loss, test_acc = model.evaluate(test_images, test_labels)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.99129999999999996" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_acc" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While our densely-connected network from Chapter 2 had a test accuracy of 97.8%, our basic convnet has a test accuracy of 99.3%: we \n", "decreased our error rate by 68% (relative). Not bad! " ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }