{ "cells": [
{ "cell_type": "markdown", "metadata": {}, "source": [ "# Building and training a multi-layer network with Keras" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "from keras.models import Sequential\n", "from keras.layers import Dense, Activation\n", "from keras.optimizers import SGD\n", "%matplotlib inline" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Classifying Iris versicolor\n", "\n", "Let us now try a slightly different problem: identifying the species *versicolor* instead of *setosa*.\n", "\n", "This is a more challenging problem, because the species *versicolor* is very close to the related species *virginica*, as shown in the plot below." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Load data\n", "df = pd.read_csv('./data/versicolor/train.csv')\n", "X = df[['petal length (cm)', 'petal width (cm)']].values\n", "y = df['versicolor'].values" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def plot_keras_model( model=None ):\n", "    \"Plot the Keras model, along with data\"\n", "    plt.clf()\n", "    # Calculate the probability on a mesh\n", "    if model is not None:\n", "        petal_width_mesh, petal_length_mesh = \\\n", "            np.meshgrid( np.linspace(0,3,100), np.linspace(0,8,100) )\n", "        petal_width_mesh = petal_width_mesh.flatten()\n", "        petal_length_mesh = petal_length_mesh.flatten()\n", "        p = model.predict( np.stack( (petal_length_mesh, petal_width_mesh), axis=1 ) )\n", "        p = p.reshape((100,100))\n", "        # Plot the probability on the mesh\n", "        plt.imshow( p.T, extent=[0,8,0,3], origin='lower',\n", "                    vmin=0, vmax=1, cmap='RdBu', aspect='auto', alpha=0.7 )\n", "    # Plot the data points\n", "    plt.scatter( df['petal length (cm)'], df['petal width (cm)'], c=df['versicolor'], cmap='RdBu')\n", "    plt.xlabel('petal length (cm)')\n", "    plt.ylabel('petal width (cm)')\n", "    cb = plt.colorbar()\n", "    cb.set_label('versicolor')\n", "plot_keras_model()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Single-layer network\n", "\n", "Let us see how a single-layer neural network performs in this case. Here we are building exactly the same kind of network as we did in the previous notebook." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Build the model\n", "single_layer_model = Sequential()\n", "single_layer_model.add( Dense( units=1, input_dim=2 ) )\n", "single_layer_model.add( Activation( 'sigmoid' ) )" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prepare the model for training\n", "single_layer_model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.1), metrics=['accuracy'])" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train the network\n", "single_layer_model.fit( X, y, batch_size=16, epochs=1000, verbose=0 )" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_keras_model( model=single_layer_model )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The network is unable to make the correct prediction, even after 1000 epochs of training.\n", "\n", "This is because, as we saw when tuning the weights by hand, a single-layer network is only capable of producing a single linear boundary between two areas of the plane. For a more complicated model, we need several layers." ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Two-layer network\n", "\n", "A two-layer network looks like this:\n", "\n", "![Two-layer network](./images/Two_layer.png)\n", "\n", "where the number of units in the intermediate layer (4 in the diagram above) is a parameter that the user needs to choose."
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Build the model: pick 8 units in the intermediate layer\n", "two_layer_model = Sequential()\n", "two_layer_model.add( Dense( input_dim=2, units=8 ) )\n", "two_layer_model.add( Activation( 'sigmoid' ) )\n", "two_layer_model.add( Dense( input_dim=8, units=1 ) )\n", "two_layer_model.add( Activation( 'sigmoid' ) )" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Compile the model\n", "two_layer_model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.1), metrics=['accuracy'])" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Train it\n", "two_layer_model.fit( X, y, batch_size=16, epochs=1000, verbose=0 )" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "plot_keras_model( model=two_layer_model )" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## A few more remarks on Keras and neural networks\n", "\n", "Keras allows us to build and train a number of neural network architectures:\n", "- fully-connected networks\n", "- convolutional networks\n", "- recurrent networks\n", "\n", "The corresponding code with the Keras interface is much less verbose than code written directly against the TensorFlow interface (but also less flexible).\n", "\n", "Keras still requires the user to make many educated guesses:\n", "- Structure of the network (architecture, number of layers, number of nodes in each layer, activation functions)\n", "- Training parameters (loss function, optimizer and learning rate, batch size, number of epochs)" ] }
], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", 
"pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }