{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A First Look at Keras\n", "\n", "### Q1\n", "What was the accuracy on the MNIST test data for our one layer perceptron:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's import the Keras library" ] }, { "cell_type": "code", "execution_count": 27, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'2.1.3'" ] }, "execution_count": 27, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import keras\n", "keras.__version__" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A first look at a neural network\n", "\n", "Let's examine how Keras learns to classify \n", "hand-written digits. Unless you already have experience with Keras or similar libraries, you will not understand everything about this \n", "first example right away. \n", "\n", "Remember, the problem we are trying to solve here is to classify grayscale images of handwritten digits (28 pixels by 28 pixels), into their 10 \n", "categories (0 to 9). The dataset we will use is the MNIST dataset, a classic dataset in the machine learning community, which has been \n", "around for almost as long as the field itself and has been very intensively studied. It's a set of 60,000 training images, plus 10,000 test \n", "images, assembled by the National Institute of Standards and Technology (the NIST in MNIST) in the 1980s. You can think of \"solving\" MNIST \n", "as the \"Hello World\" of deep learning -- it's what you do to verify that your algorithms are working as expected. As you become a machine \n", "learning practitioner, you will see MNIST come up over and over again, in scientific papers, blog posts, and so on." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The MNIST dataset comes pre-loaded in Keras, in the form of a set of four Numpy arrays:" ] }, { "cell_type": "code", "execution_count": 28, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from keras.datasets import mnist\n", "\n", "(train_images, train_labels), (test_images, test_labels) = mnist.load_data()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`train_images` and `train_labels` form the \"training set\", the data that the model will learn from. The model will then be tested on the \n", "\"test set\", `test_images` and `test_labels`. Our images are encoded as Numpy arrays, and the labels are simply an array of digits, ranging \n", "from 0 to 9. 
"from 0 to 9. There is a one-to-one correspondence between the images and the labels.\n", "\n", "Let's have a look at the training data:" ] }, { "cell_type": "code", "execution_count": 29, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(60000, 28, 28)" ] }, "execution_count": 29, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_images.shape" ] }, { "cell_type": "code", "execution_count": 30, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "60000" ] }, "execution_count": 30, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(train_labels)" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([5, 0, 4, ..., 5, 6, 8], dtype=uint8)" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "train_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's have a look at the test data:" ] }, { "cell_type": "code", "execution_count": 31, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "(10000, 28, 28)" ] }, "execution_count": 31, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_images.shape" ] }, { "cell_type": "code", "execution_count": 32, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "10000" ] }, "execution_count": 32, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(test_labels)" ] }, { "cell_type": "code", "execution_count": 33, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([7, 2, 1, ..., 4, 5, 6], dtype=uint8)" ] }, "execution_count": 33, "metadata": {}, "output_type": "execute_result" } ], "source": [ "test_labels" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# A Review:\n", "### Q2 Can you display the first image of the test dataset?" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [], "source": [ "# Your code here\n", "import matplotlib.pyplot as plt\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Our workflow will be as follows: first we will present our neural network with the training data, `train_images` and `train_labels`. The \n", "network will then learn to associate images and labels. Finally, we will ask the network to produce predictions for `test_images`, and we \n", "will verify whether these predictions match the labels from `test_labels`.\n", "\n", "Let's build our network -- again, remember that you aren't supposed to understand everything about this example just yet."
] }, { "cell_type": "code", "execution_count": 34, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from keras import models\n", "from keras import layers\n", "\n", "network = models.Sequential()\n", "network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))\n", "network.add(layers.Dense(10, activation='softmax'))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "The core building block of neural networks is the \"layer\", a data-processing module which you can conceive as a \"filter\" for data. Some \n", "data comes in, and comes out in a more useful form. Precisely, layers extract _representations_ out of the data fed into them -- hopefully \n", "representations that are more meaningful for the problem at hand. Most of deep learning really consists of chaining together simple layers \n", "which will implement a form of progressive \"data distillation\". A deep learning model is like a sieve for data processing, made of a \n", "succession of increasingly refined data filters -- the \"layers\".\n", "\n", "Here our network consists of a sequence of two `Dense` layers, which are densely-connected (also called \"fully-connected\") neural layers. \n", "The second (and last) layer is a 10-way \"softmax\" layer, which means it will return an array of 10 probability scores (summing to 1). Each \n", "score will be the probability that the current digit image belongs to one of our 10 digit classes.\n", "\n", "To make our network ready for training, we need to pick three more things, as part of \"compilation\" step:\n", "\n", "* A loss function: the is how the network will be able to measure how good a job it is doing on its training data, and thus how it will be \n", "able to steer itself in the right direction.\n", "* An optimizer: this is the mechanism through which the network will update itself based on the data it sees and its loss function.\n", "* Metrics to monitor during training and testing. Here we will only care about accuracy (the fraction of the images that were correctly \n", "classified).\n", "\n", "When we were working with Tensorflow we had to define each of these metrics. Keras makes this task much easier:" ] }, { "cell_type": "code", "execution_count": 35, "metadata": { "collapsed": true }, "outputs": [], "source": [ "network.compile(optimizer='rmsprop',\n", " loss='categorical_crossentropy',\n", " metrics=['accuracy'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Before training, we will preprocess our data by reshaping it into the shape that the network expects, and scaling it so that all values are in \n", "the `[0, 1]` interval. Previously, our training images for instance were stored in an array of shape `(60000, 28, 28)` of type `uint8` with \n", "values in the `[0, 255]` interval. We transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train_images = train_images.reshape((60000, 28 * 28))\n", "train_images = train_images.astype('float32') / 255\n", "\n", "test_images = test_images.reshape((10000, 28 * 28))\n", "test_images = test_images.astype('float32') / 255" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q3: Why are we dividing each pixel by 255? 
, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Before training, we will preprocess our data by reshaping it into the shape that the network expects, and scaling it so that all values are in \n", "the `[0, 1]` interval. Previously, our training images for instance were stored in an array of shape `(60000, 28, 28)` of type `uint8` with \n", "values in the `[0, 255]` interval. We transform it into a `float32` array of shape `(60000, 28 * 28)` with values between 0 and 1." ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "collapsed": true }, "outputs": [], "source": [ "train_images = train_images.reshape((60000, 28 * 28))\n", "train_images = train_images.astype('float32') / 255\n", "\n", "test_images = test_images.reshape((10000, 28 * 28))\n", "test_images = test_images.astype('float32') / 255" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q3: Why are we dividing each pixel by 255? What did the original number represent?\n", "\n", "\n", "We also need to categorically encode the labels:" ] }, { "cell_type": "code", "execution_count": 37, "metadata": { "collapsed": true }, "outputs": [], "source": [ "from keras.utils import to_categorical\n", "\n", "train_labels = to_categorical(train_labels)\n", "test_labels = to_categorical(test_labels)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q4 What does the first test label look like? (print it out)\n", "And what is this called?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We are now ready to train our network, which in Keras is done via a call to the `fit` method of the network: \n", "we \"fit\" the model to its training data." ] }, { "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [], "source": [ "network.fit(train_images, train_labels, epochs=5, batch_size=128)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Two quantities are being displayed during training: the \"loss\" of the network over the training data, and the accuracy of the network over \n", "the training data.\n", "\n", "We quickly reach an accuracy of 0.989 (i.e. 98.9%) on the training data. Now let's check that our model performs well on the test set too:" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [], "source": [ "test_loss, test_acc = network.evaluate(test_images, test_labels)" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [], "source": [ "print('test_acc:', test_acc)" ] }
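, { "cell_type": "markdown", "metadata": {}, "source": [ "Earlier we said the final \"softmax\" layer returns an array of 10 probability scores. As an optional sketch, we can verify that claim by asking the trained network for a prediction on the first test image: `predict` returns one row of 10 scores per input image, the scores should sum to (approximately) 1, and the index of the largest score is the predicted digit:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optional sketch: inspect the softmax scores for the first test image\n", "predictions = network.predict(test_images[:1])\n", "print(predictions)              # an array of 10 probability scores\n", "print(predictions.sum())        # the scores sum to (approximately) 1\n", "print(predictions.argmax())     # the most probable digit class\n", "print(test_labels[0].argmax())  # the true class (one-hot encoded above)" ] }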
, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Our test set accuracy turns out to be 97.8% -- that's quite a bit lower than the training set accuracy. \n", "This gap between training accuracy and test accuracy is an example of \"overfitting\", \n", "the fact that machine learning models tend to perform worse on new data than on their training data. \n", "\n", "\n", "### Q5: \n", "What is the accuracy on our test data if we use 7 epochs? (edit this cell and put in the answer)\n", "\n", "### Q6 \n", "What is the accuracy on our test data using our network before we do any training (before `fit`)?" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q7 A single-layer perceptron in Keras\n", "Can you construct and train a new network (call it `network2` or whatever you like) that has a single layer of 10 softmax neurons? What is its accuracy on the test data? " ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [], "source": [ "# TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Q8 A two-hidden-layer network in Keras\n", "Can you construct and train a new network that has two hidden layers \n", "(as before, the first layer can have 512 neurons; the second should have 256)? What is its accuracy on the test data? " ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [], "source": [ "# TBD" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Remix\n", "Remix by Ron Zacharski. Original Python notebook by François Chollet\n", "\n", "### MIT License\n", "\n", "Copyright (c) 2017 François Chollet\n", "\n", "Permission is hereby granted, free of charge, to any person obtaining a copy\n", "of this software and associated documentation files (the \"Software\"), to deal\n", "in the Software without restriction, including without limitation the rights\n", "to use, copy, modify, merge, publish, distribute, sublicense, and/or sell\n", "copies of the Software, and to permit persons to whom the Software is\n", "furnished to do so, subject to the following conditions:\n", "\n", "The above copyright notice and this permission notice shall be included in all\n", "copies or substantial portions of the Software.\n", "\n", "THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n", "IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n", "FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE\n", "AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n", "LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,\n", "OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE\n", "SOFTWARE." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 2 }