{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "

Table of Contents

\n", "
" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n" ], "text/plain": [ "" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# code for loading the format for the notebook\n", "import os\n", "\n", "# path : store the current path so we can change back to it later\n", "path = os.getcwd()\n", "os.chdir(os.path.join('..', 'notebook_format'))\n", "\n", "from formats import load_style\n", "load_style(plot_style=False)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using TensorFlow backend.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Ethen 2017-03-24 10:55:48 \n", "\n", "CPython 3.5.2\n", "IPython 5.3.0\n", "\n", "numpy 1.12.1\n", "pandas 0.19.2\n", "keras 2.0.2\n", "sklearn 0.18\n", "tensorflow 1.0.1\n" ] } ], "source": [ "os.chdir(path)\n", "\n", "# 1. magic to print package versions\n", "# 2. magic so that the notebook will reload external python modules\n", "%load_ext watermark\n", "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import numpy as np\n", "import pandas as pd\n", "from keras.regularizers import l2\n", "from keras.models import Sequential\n", "from keras.callbacks import EarlyStopping\n", "from keras.layers.advanced_activations import PReLU\n", "from keras.wrappers.scikit_learn import KerasClassifier\n", "from keras.layers.core import Dense, Dropout, Activation\n", "from keras.layers.normalization import BatchNormalization\n", "from sklearn.metrics import accuracy_score\n", "from sklearn.model_selection import RandomizedSearchCV\n", "from tensorflow.examples.tutorials.mnist import input_data\n", "\n", "%watermark -a 'Ethen' -d -t -v -p numpy,pandas,keras,sklearn,tensorflow" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Keras Hyperparameter Tuning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We'll use the MNIST dataset. 
The downloaded data is split into three parts: 55,000 data points of training data (`mnist.train`), 10,000 points of test data (`mnist.test`), and 5,000 points of validation data (`mnist.validation`).\n", "\n", "Each part contains the images and their labels, which we can access via `.images` and `.labels`. For example, the training images are `mnist.train.images` and the training labels are `mnist.train.labels` (one-hot encoded)." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Extracting MNIST_data/train-images-idx3-ubyte.gz\n", "Extracting MNIST_data/train-labels-idx1-ubyte.gz\n", "Extracting MNIST_data/t10k-images-idx3-ubyte.gz\n", "Extracting MNIST_data/t10k-labels-idx1-ubyte.gz\n", "\n", "number of training observations: 55000\n", "number of validation observations: 5000\n", "number of testing observations: 5000\n", "feature num: 784\n", "class num: 10\n" ] } ], "source": [ "# convenient one-liner to load the dataset\n", "mnist = input_data.read_data_sets('MNIST_data', one_hot = True)\n", "\n", "# extract the training, validation and test sets\n", "X_train = mnist.train.images\n", "y_train = mnist.train.labels\n", "X_val = mnist.validation.images\n", "y_val = mnist.validation.labels\n", "X_test = mnist.test.images\n", "y_test = mnist.test.labels\n", "print()\n", "print('number of training observations: ', X_train.shape[0])\n", "print('number of validation observations: ', X_val.shape[0])\n", "print('number of testing observations: ', X_test.shape[0])\n", "\n", "# the labels have already been one-hot encoded\n", "n_input = X_train.shape[1]\n", "n_class = y_train.shape[1]\n", "print('feature num: ', n_input)\n", "print('class num: ', n_class)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Keras provides a wrapper class `KerasClassifier` that allows us to use our deep learning models with scikit-learn. This is especially useful when we want to tune hyperparameters 
using scikit-learn's [RandomizedSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.RandomizedSearchCV.html) or [GridSearchCV](http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.GridSearchCV.html).\n", "\n", "To use it, we first define a function that takes the arguments we wish to tune. Inside the function, we define the network's structure as usual and compile it. The function is then passed to `KerasClassifier`'s `build_fn` parameter. Note that, as with all other scikit-learn estimators, `build_fn` should provide default values for its arguments, so that the estimator can be created without passing in a value for every parameter." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def build_keras_base(hidden_layers = [64, 64, 64], dropout_rate = 0,\n", "                     l2_penalty = 0.1, optimizer = 'adam',\n", "                     n_input = 100, n_class = 2):\n", "    \"\"\"\n", "    Keras multi-layer neural network. Fixed settings include:\n", "    1. activation function (PReLU)\n", "    2. always uses batch normalization before the activation\n", "    3. uses adam as the default optimizer\n", "\n", "    Parameters\n", "    ----------\n", "    Tunable parameters (the ones commonly tuned)\n", "\n", "    hidden_layers: list\n", "        the length of the list is the number of hidden layers,\n", "        and each element is the size of that hidden layer\n", "\n", "    dropout_rate: float 0 ~ 1\n", "        if greater than 0, a dropout layer is added after each hidden layer\n", "\n", "    l2_penalty: float\n", "        strength of the L2 weight regularization\n", "\n", "    optimizer: string or keras optimizer\n", "        method used to train the network\n", "\n", "    n_input: int\n", "        number of input features\n", "\n", "    n_class: int\n", "        number of classes, which sets the output layer size and the loss\n", "\n", "    Returns\n", "    -------\n", "    model :\n", "        a compiled keras model\n", "\n", "    Reference\n", "    ---------\n", "    https://keras.io/scikit-learn-api/\n", "    \"\"\"\n", "    model = Sequential()\n", "    for index, layers in enumerate(hidden_layers):\n", "        if not index:\n", "            # specify the input_dim to be the number of features for the first layer\n", "            model.add(Dense(layers, input_dim = n_input, kernel_regularizer = l2(l2_penalty)))\n", "        else:\n", "            model.add(Dense(layers, kernel_regularizer = l2(l2_penalty)))\n", "\n", "        # insert BatchNorm layer immediately after fully connected layers\n", "        # and before activation layer\n", "        model.add(BatchNormalization())\n", "        model.add(PReLU())\n", "        if dropout_rate:\n", "            model.add(Dropout(rate = dropout_rate))\n", "\n", "    model.add(Dense(n_class))\n", "    model.add(Activation('softmax'))\n", "\n", "    # the loss for binary and multi-class classification is different\n", "    loss = 'binary_crossentropy'\n", "    if n_class > 2:\n", "        loss = 'categorical_crossentropy'\n", "\n", "    model.compile(loss = loss, optimizer = optimizer, metrics = ['accuracy'])\n", "    return model" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# pass in fixed parameters n_input and n_class\n", "model_keras = KerasClassifier(\n", "    build_fn = build_keras_base,\n", "    n_input = n_input,\n", "    n_class = n_class,\n", ")\n", "\n", "# specify the other parameters to pass to .fit;\n", "# the number of epochs is set to a large number, we'll\n", "# let 
early stopping terminate the training process\n", "early_stop = EarlyStopping(\n", "    monitor = 'val_loss', min_delta = 0.1, patience = 5, verbose = 0)\n", "\n", "callbacks = [early_stop]\n", "keras_fit_params = {\n", "    'callbacks': callbacks,\n", "    'epochs': 200,\n", "    'batch_size': 2048,\n", "    'validation_data': (X_val, y_val),\n", "    'verbose': 0\n", "}\n", "\n", "# random search's parameter space:\n", "# specify the options for each hyperparameter and store them in a dictionary;\n", "# batch size and the optimizer (training method) could also be tuned,\n", "# but they are kept fixed here\n", "dropout_rate_opts = [0, 0.2, 0.5]\n", "hidden_layers_opts = [[64, 64, 64, 64], [32, 32, 32, 32, 32], [100, 100, 100]]\n", "l2_penalty_opts = [0.01, 0.1, 0.5]\n", "keras_param_options = {\n", "    'hidden_layers': hidden_layers_opts,\n", "    'dropout_rate': dropout_rate_opts,\n", "    'l2_penalty': l2_penalty_opts\n", "}" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Fitting 3 folds for each of 3 candidates, totalling 9 fits\n", "18333/18333 [==============================] - 2s \n", "36192/36667 [============================>.] - ETA: 0s" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=-1)]: Done 6 out of 9 | elapsed: 3.0min remaining: 1.5min\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "18333/18333 [==============================] - 1s \n", "36256/36667 [============================>.] 
- ETA: 0s" ] }, { "name": "stderr", "output_type": "stream", "text": [ "[Parallel(n_jobs=-1)]: Done 9 out of 9 | elapsed: 3.1min finished\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Best score obtained: -0.2284632457149802\n", "Parameters:\n", "\tdropout_rate: 0.2\n", "\thidden_layers: [100, 100, 100]\n", "\tl2_penalty: 0.5\n" ] } ], "source": [ "# `verbose` = 2 would print information for every cross-validation fit,\n", "# which is a bit too much\n", "rs_keras = RandomizedSearchCV(\n", "    model_keras,\n", "    param_distributions = keras_param_options,\n", "    fit_params = keras_fit_params,\n", "    scoring = 'neg_log_loss',\n", "    n_iter = 3,\n", "    cv = 3,\n", "    n_jobs = -1,\n", "    verbose = 1\n", ")\n", "rs_keras.fit(X_train, y_train)\n", "\n", "# 'neg_log_loss' is the negated log loss, so a higher score (closer to 0) is better\n", "print('Best score obtained: {0}'.format(rs_keras.best_score_))\n", "print('Parameters:')\n", "for param, value in rs_keras.best_params_.items():\n", "    print('\\t{}: {}'.format(param, value))" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4960/5000 [============================>.] - ETA: 0s" ] }, { "data": { "text/plain": [ "0.95979999999999999" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# convert the one-hot encoded labels back to class indices\n", "# to compute the prediction accuracy on the test set\n", "y_true = np.nonzero(y_test)[1]\n", "y_pred = rs_keras.predict(X_test)\n", "accuracy_score(y_true, y_pred)" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "54304/55000 [============================>.] 
- ETA: 0sloss : 0.339452602564\n", "acc : 0.956618181818\n" ] } ], "source": [ "# rs_keras.best_estimator_ returns the sklearn-wrapped version of the best model;\n", "# rs_keras.best_estimator_.model returns the underlying (unwrapped) keras model\n", "best_model = rs_keras.best_estimator_.model\n", "metric_names = best_model.metrics_names\n", "\n", "# evaluate the best model on the training set\n", "metric_values = best_model.evaluate(X_train, y_train)\n", "for metric, value in zip(metric_names, metric_values):\n", "    print(metric, ': ', value)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Reference" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [Keras Documentation: Wrappers for the Scikit-Learn API](https://keras.io/scikit-learn-api/)\n", "- [Blog: Use Keras Deep Learning Models with Scikit-Learn in Python](http://machinelearningmastery.com/use-keras-deep-learning-models-scikit-learn-python/)" ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.4" }, "toc": { "nav_menu": { "height": "81px", "width": "252px" }, "number_sections": true, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": true, "toc_position": {}, "toc_section_display": "block", "toc_window_display": true }, "varInspector": { "cols": { "lenName": 16, "lenType": 16, "lenVar": 40 }, "kernels_config": { "python": { "delete_cmd_postfix": "", "delete_cmd_prefix": "del ", "library": "var_list.py", "varRefreshCmd": "print(var_dic_list())" }, "r": { "delete_cmd_postfix": ") ", "delete_cmd_prefix": "rm(", "library": "var_list.r", "varRefreshCmd": "cat(var_dic_list()) " } }, "types_to_exclude": [ "module", "function", "builtin_function_or_method", "instance", "_Feature" ], 
"window_display": false } }, "nbformat": 4, "nbformat_minor": 1 }