{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Building and training a single-layer neural network with keras\n",
    "\n",
    "Keras is a high-level Python interface for building and training neural networks. \n",
    "\n",
    "It can use Tensorflow or Theano as backend, so as to make efficient use of the CPU and (if available) GPU."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "%matplotlib inline\n",
    "\n",
    "# Keras imports\n",
    "from keras.models import Sequential\n",
    "from keras.layers import Dense, Activation\n",
    "from keras.optimizers import SGD"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Defining the model with keras\n",
    "\n",
    "Keras describes a neural network by a set of `Dense` layers (which contain the weights) and `Activation` layers (which contain the non-linear functions).\n",
    "\n",
    "![keras layers](./images/keras_layers.png)\n",
    "\n",
    "In the case of our simple single-layer network, the API for building the network is the following."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build the model with keras\n",
    "model = Sequential()\n",
    "model.add( Dense( input_dim=2, units=1 ) )\n",
    "model.add( Activation( 'sigmoid' ) )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Print the summary\n",
    "model.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Training the network\n",
    "\n",
    "Again, we will load the training data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load data\n",
    "df = pd.read_csv('./data/setosa/train.csv')\n",
    "X = df[['petal length (cm)', 'petal width (cm)']].values\n",
    "y = df['setosa'].values"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "and define a function to look at the predictions of the model (which for the moment is untrained)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "def plot_keras_model():\n",
    "    \"Plot the results of the model, along with the data points\"\n",
    "    # Calculate the probability on a mesh\n",
    "    petal_width_mesh, petal_length_mesh = \\\n",
    "        np.meshgrid( np.linspace(0,3,100), np.linspace(0,8,100) )\n",
    "    petal_width_mesh = petal_width_mesh.flatten()\n",
    "    petal_length_mesh = petal_length_mesh.flatten()\n",
    "    p = model.predict( np.stack( (petal_length_mesh, petal_width_mesh), axis=1 ) )\n",
    "    p = p.reshape((100,100))\n",
    "    # Plot the probability on the mesh\n",
    "    plt.clf()\n",
    "    plt.imshow( p.T, extent=[0,8,0,3], origin='lower', \n",
    "               vmin=0, vmax=1, cmap='RdBu', aspect='auto', alpha=0.7 )\n",
    "    # Plot the data points\n",
    "    plt.scatter( df['petal length (cm)'], df['petal width (cm)'], c=df['setosa'], cmap='RdBu')\n",
    "    plt.xlabel('petal length (cm)')\n",
    "    plt.ylabel('petal width (cm)')\n",
    "    cb = plt.colorbar()\n",
    "    cb.set_label('setosa')\n",
    "plot_keras_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Keras will then automatically adjust the weights by trying to minimize a given **loss function**.\n",
    "\n",
    "When the network output is a probability, a good loss function is the binary cross-entropy:\n",
    "\n",
    "$ L(p_{setosa}) = - y\\log(p_{setosa}) - (1-y) \\log(1-p_{setosa}) $"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Prepare the model for training\n",
    "model.compile(loss='binary_crossentropy', optimizer=SGD(lr=0.1), metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Train the network\n",
    "model.fit( X, y, batch_size=16, epochs=20, verbose=1 )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "plot_keras_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### What just happened?\n",
    "\n",
    "Just like we did by hand, keras tried to find the **best set of weights** for the network.\n",
    "\n",
    "In order to do so, Keras divided the dataset into **batches** of 16 examples (16 specimens of iris at a time). \n",
    "For each batch, it computed how the weights should be changed in order to **make the loss function lower**, and correspondingly ajusted them by a small amount (which is proportional to the learning rate `lr`).\n",
    "\n",
    "The process of going through the whole dataset (batch by batch) is called an **epoch**. The network needs to \"see\" the dataset several times (i.e. it needs several epochs) in order to find the right weights."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### A few remarks on training\n",
    "\n",
    "**The training depends on a number of parameters, that have to be \"skillfully\" chosen by the user:**\n",
    "- loss function\n",
    "- optimizer and learning rate\n",
    "- batch size\n",
    "- number of epochs\n",
    "\n",
    "A \"bad\" choice may result in the network training more slowly, or not properly training at all.\n",
    "\n",
    "**The training is stochastic.**\n",
    "This is because:\n",
    "- The initial values of the weights are chosen randomly.\n",
    "- The way the data is divided into batch is random."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Performing predictions on the test sets\n",
    "\n",
    "The optimal weights are stored in the `model` object. As before, we can thus use them to make new predictions on the test dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_test = pd.read_csv('./data/setosa/test.csv')\n",
    "df_test.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.predict( np.array([[4.2, 1.5]]) )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_test['probability_setosa_predicted'] = model.predict( df_test[['petal length (cm)', 'petal width (cm)']].values )"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df_test"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Next steps\n",
    "\n",
    "Let us know look at how to train a network with multiple layers [here](./Multi_layer_keras.ipynb)."
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": {},
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}