{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# code for loading the format for the notebook\n", "import os\n", "\n", "# path : store the current path to convert back to it later\n", "path = os.getcwd()\n", "os.chdir(os.path.join('..', 'notebook_format'))\n", "\n", "from formats import load_style\n", "load_style(plot_style=False)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Ethen 2018-08-26 22:56:05 \n", "\n", "CPython 3.6.4\n", "IPython 6.4.0\n", "\n", "numpy 1.14.1\n", "pandas 0.23.0\n", "keras 2.2.2\n" ] } ], "source": [ "os.chdir(path)\n", "\n", "# 1. magic to print version\n", "# 2. magic so that the notebook will reload external python modules\n", "%load_ext watermark\n", "%load_ext autoreload \n", "%autoreload 2\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from keras.datasets import mnist\n", "from keras.utils import np_utils\n", "from keras.optimizers import RMSprop\n", "from keras.models import Sequential, load_model\n", "from keras.layers.core import Dense, Dropout, Activation\n", "\n", "%watermark -a 'Ethen' -d -t -v -p numpy,pandas,keras" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Keras Basics" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basic Keras API to build a simple multi-layer neural network." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "60000 train samples\n", "10000 test samples\n" ] } ], "source": [ "n_classes = 10\n", "n_features = 784 # mnist is a 28 * 28 image\n", "\n", "# load the dataset and some preprocessing step that can be skipped\n", "(X_train, y_train), (X_test, y_test) = mnist.load_data()\n", "X_train = X_train.reshape(60000, n_features)\n", "X_test = X_test.reshape(10000, n_features)\n", "X_train = X_train.astype('float32')\n", "X_test = X_test.astype('float32')\n", "\n", "# images takes values between 0 - 255, we can normalize it\n", "# by dividing every number by 255\n", "X_train /= 255\n", "X_test /= 255\n", "\n", "print(X_train.shape[0], 'train samples')\n", "print(X_test.shape[0], 'test samples')" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# convert class vectors to binary class matrices (one-hot encoding)\n", "# note: you HAVE to to this step\n", "Y_train = np_utils.to_categorical(y_train, n_classes)\n", "Y_test = np_utils.to_categorical(y_test , n_classes)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Basics of training a model:\n", "\n", "The easiest way to build models in keras is to use `Sequential` model and the `.add()` method to stack layers together in sequence to build up our network.\n", "\n", "- We start with `Dense` (fully-connected layers), where we specify how many nodes you wish to have for the layer. Since the first layer that we're going to add is the input layer, we have to make sure that the `input_dim` parameter matches the number of features (columns) in the training set. Then after the first layer, we don't need to specify the size of the input anymore.\n", "- Then we specify the `Activation` function for that layer, and add a `Dropout` layer if we wish.\n", "- For the last `Dense` and `Activation` layer we need to specify the number of class as the output and softmax to tell it to output the predicted class's probability." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# define the model\n", "model = Sequential()\n", "model.add(Dense(512, input_dim = n_features))\n", "model.add(Activation('relu'))\n", "model.add(Dropout(0.2))\n", "model.add(Dense(512))\n", "model.add(Activation('relu'))\n", "model.add(Dropout(0.2))\n", "model.add(Dense(n_classes))\n", "model.add(Activation('softmax'))" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "_________________________________________________________________\n", "Layer (type) Output Shape Param # \n", "=================================================================\n", "dense_1 (Dense) (None, 512) 401920 \n", "_________________________________________________________________\n", "activation_1 (Activation) (None, 512) 0 \n", "_________________________________________________________________\n", "dropout_1 (Dropout) (None, 512) 0 \n", "_________________________________________________________________\n", "dense_2 (Dense) (None, 512) 262656 \n", "_________________________________________________________________\n", "activation_2 (Activation) (None, 512) 0 \n", "_________________________________________________________________\n", "dropout_2 (Dropout) (None, 512) 0 \n", "_________________________________________________________________\n", "dense_3 (Dense) (None, 10) 5130 \n", "_________________________________________________________________\n", "activation_3 (Activation) (None, 10) 0 \n", "=================================================================\n", "Total params: 669,706\n", "Trainable params: 669,706\n", "Non-trainable params: 0\n", "_________________________________________________________________\n" ] } ], "source": [ "# we can check the summary to check the number of parameters\n", "model.summary()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once our model looks good, we can configure its learning process with `.compile()`, where you need to specify which `optimizer` to use, and the `loss` function ( `categorical_crossentropy` is the typical one for multi-class classification) and the `metrics` to track. \n", "\n", "Finally, `.fit()` the model by passing in the training, validation set, the number of epochs and batch size. For the batch size, we typically specify this number to be power of 2 for computing efficiency." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train on 60000 samples, validate on 10000 samples\n", "Epoch 1/10\n", "60000/60000 [==============================] - 5s 76us/step - loss: 0.2464 - acc: 0.9243 - val_loss: 0.1205 - val_acc: 0.9610\n", "Epoch 2/10\n", "60000/60000 [==============================] - 4s 73us/step - loss: 0.1029 - acc: 0.9691 - val_loss: 0.0954 - val_acc: 0.9684\n", "Epoch 3/10\n", "60000/60000 [==============================] - 4s 72us/step - loss: 0.0741 - acc: 0.9774 - val_loss: 0.0746 - val_acc: 0.9776\n", "Epoch 4/10\n", "60000/60000 [==============================] - 4s 71us/step - loss: 0.0610 - acc: 0.9812 - val_loss: 0.0724 - val_acc: 0.9798\n", "Epoch 5/10\n", "60000/60000 [==============================] - 4s 73us/step - loss: 0.0508 - acc: 0.9846 - val_loss: 0.0709 - val_acc: 0.9815\n", "Epoch 6/10\n", "60000/60000 [==============================] - 5s 79us/step - loss: 0.0429 - acc: 0.9876 - val_loss: 0.0687 - val_acc: 0.9810\n", "Epoch 7/10\n", "60000/60000 [==============================] - 4s 73us/step - loss: 0.0380 - acc: 0.9888 - val_loss: 0.0728 - val_acc: 0.9827\n", "Epoch 8/10\n", "60000/60000 [==============================] - 4s 72us/step - loss: 0.0347 - acc: 0.9900 - val_loss: 0.0888 - val_acc: 0.9815\n", "Epoch 9/10\n", "60000/60000 [==============================] - 4s 73us/step - loss: 0.0308 - acc: 0.9907 - val_loss: 0.0765 - val_acc: 0.9831\n", "Epoch 10/10\n", "60000/60000 [==============================] - 5s 75us/step - loss: 0.0294 - acc: 0.9915 - val_loss: 0.0831 - val_acc: 0.9834\n" ] } ], "source": [ "model.compile(loss = 'categorical_crossentropy', optimizer = RMSprop(), metrics = ['accuracy'])\n", "\n", "n_epochs = 10\n", "batch_size = 128 \n", "history = model.fit(\n", " X_train, \n", " Y_train,\n", " batch_size = batch_size, \n", " epochs = n_epochs,\n", " verbose = 1, # set it to 0 if we don't want to have progess bars\n", " validation_data = (X_test, Y_test)\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'val_loss': [0.12054000333249569,\n", " 0.09538325125724077,\n", " 0.0746443172362633,\n", " 0.07235066388248233,\n", " 0.07087009320242796,\n", " 0.06869937015355099,\n", " 0.07279114324423717,\n", " 0.08881157057585878,\n", " 0.07649361667371704,\n", " 0.08308388180260735],\n", " 'val_acc': [0.961,\n", " 0.9684,\n", " 0.9776,\n", " 0.9798,\n", " 0.9815,\n", " 0.981,\n", " 0.9827,\n", " 0.9815,\n", " 0.9831,\n", " 0.9834],\n", " 'loss': [0.24640614926020304,\n", " 0.1029205946157376,\n", " 0.07406270666122436,\n", " 0.06099520227760077,\n", " 0.0507756207327048,\n", " 0.042891469335804386,\n", " 0.03797459708166619,\n", " 0.03471902108440796,\n", " 0.030829096661073467,\n", " 0.029378291251541427],\n", " 'acc': [0.9242833333015442,\n", " 0.9690666666666666,\n", " 0.9773999999682108,\n", " 0.9811666666666666,\n", " 0.9846333333015442,\n", " 0.9875666666348776,\n", " 0.9887666666984558,\n", " 0.9899833333651225,\n", " 0.9906999999682109,\n", " 0.9914833333015441]}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# history attribute stores the training and validation score and loss\n", "history.history" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "metrics: ['loss', 'acc']\n" ] }, { "data": { "text/plain": [ "[0.08308388501826912, 0.9834]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# .evaluate gives the loss and metric evaluation score for the dataset,\n", "# here the result matches the validation set's history above\n", "print('metrics: ', model.metrics_names)\n", "score = model.evaluate(X_test, Y_test, verbose = 0)\n", "score" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "6\n", "(784, 512)\n", "(512,)\n" ] } ], "source": [ "# stores the weight of the model,\n", "# it's a list, note that the length is 6 because we have 3 dense layer\n", "# and each one has it's associated bias term\n", "weights = model.get_weights()\n", "print(len(weights))\n", "\n", "# W1 should have 784, 512 for the 784\n", "# feauture column and the 512 the number \n", "# of dense nodes that we've specified\n", "W1, b1, W2, b2, W3, b3 = weights\n", "print(W1.shape)\n", "print(b1.shape)" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "valid accuracy: 98.34\n" ] } ], "source": [ "# predict the accuracy\n", "y_pred = model.predict_classes(X_test, verbose = 0)\n", "accuracy = np.sum(y_test == y_pred) / X_test.shape[0]\n", "print('valid accuracy: %.2f' % (accuracy * 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Saving and loading the models" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is not recommended to use pickle or cPickle to save a Keras model. By saving it as a HDF5 file, we can preserve the configuration and weights of the model." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [], "source": [ "model.save('my_model.h5') # creates a HDF5 file 'my_model.h5'\n", "del model # deletes the existing model\n", "\n", "# returns a compiled model\n", "# identical to the previous one\n", "model = load_model('my_model.h5')" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "valid accuracy: 98.34\n" ] } ], "source": [ "# testing: predict the accuracy using the loaded model\n", "y_pred = model.predict_classes(X_test, verbose = 0)\n", "accuracy = np.sum(y_test == y_pred) / X_test.shape[0]\n", "print('valid accuracy: %.2f' % (accuracy * 100))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Keras Documentation](http://keras.io/) \n", "- [Keras Documentation: mnist_mlp example](https://github.com/fchollet/keras/blob/master/examples/mnist_mlp.py)\n", "- [Keras Documentation: Saving Keras Model](http://keras.io/getting-started/faq/#how-can-i-save-a-keras-model)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" }, "toc": { "nav_menu": { "height": "141px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": "block", "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], "window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }