# Introduction to Keras for Deep Learning

In [1]:
import addutils.toc ; addutils.toc.js(ipy_notebook=True)

In [54]:
import numpy as np
import pandas as pd
from utilities import cifar10
from addutils import css_notebook
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import mean_squared_error
import random
import time
css_notebook()

In [41]:
import bokeh.plotting as bk
from bokeh.io import push_notebook
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource, Range1d
bk.output_notebook()

## 1 Keras

Keras is designed to be modular, minimalist and easily extensible. Francois Chollet, the author of Keras, says:

>The library was developed with a focus on enabling fast experimentation. Being able to go from
idea to result with the least possible delay is key to doing good research.

Keras defines high-level neural network modules on top of either TensorFlow or Theano (the latter nowadays receives less attention than TensorFlow). Layers can be composed in a modular fashion, and the framework can even be extended with user-defined layers and models.

### 1.1 Installation

Installation is easy. By now you should have the *addfor_tutorials* environment with TensorFlow installed. If not, refer to *README.md* to install Anaconda and to the notebook *ml25v04_tensorflow_basic_concepts.ipynb* to install TensorFlow.

Activate your addfor_tutorials environment and from the command-line type:
```
pip install keras
```
Now you can import Keras and check that it is using TensorFlow as its backend.

In [4]:
import keras

Using TensorFlow backend.


In [5]:
print(keras.__version__)

2.1.4


### 1.2 Keras architecture

### 1.2.1 Introduction

Keras uses the backend to perform efficient symbolic computation on Tensors. There are two ways to compose models in Keras: 
 - **Sequential** composition
 - **Functional** composition

The Sequential API builds a linear stack of layers that constitutes the architecture of the network. For example, a simple feed-forward neural network for MNIST can be written as:
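```python
from keras.models import Sequential
from keras.layers import Dense, Activation

model = Sequential()
model.add(Dense(32, input_shape=(784,)))  # 784 = 28 x 28 flattened MNIST pixels
model.add(Activation('relu'))
model.add(Dense(10))                      # one output per digit class
model.add(Activation('softmax'))
```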

The functional API treats each layer as a function and lets you compose these functions into a complex neural network. Writing $f(x) = W_1 x + b_1$ for the first `Dense` layer and $g(h) = W_2 h + b_2$ for the second, the network defined above computes:
$$y = \mathrm{softmax}\big(g(\mathrm{relu}(f(x)))\big)$$
With the functional API, the same network is defined as:
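```python
from keras.layers import Input, Dense, Activation
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(32)(inputs)                    # first fully connected layer
x = Activation("relu")(x)
x = Dense(10)(x)                         # second fully connected layer
predictions = Activation("softmax")(x)
model = Model(inputs=inputs, outputs=predictions)
```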
Each layer is a function, and since a model is a composition of layers, a model is also a function: it can be treated as another layer and called on an appropriately shaped input tensor.
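To see this composition at work, the sketch below rebuilds the functional model above, recomputes its output in NumPy from the weights of the two `Dense` layers, and then calls the whole model on a new input tensor as if it were a layer. The random input batch and the helper `numpy_forward` are illustrative assumptions, not part of Keras.

```python
import numpy as np
from keras.layers import Input, Dense, Activation
from keras.models import Model

inputs = Input(shape=(784,))
x = Dense(32)(inputs)
x = Activation("relu")(x)
x = Dense(10)(x)
predictions = Activation("softmax")(x)
model = Model(inputs=inputs, outputs=predictions)


def numpy_forward(model, x):
    """Hypothetical helper: recompute softmax(dense2(relu(dense1(x)))) in NumPy."""
    W1, b1, W2, b2 = model.get_weights()   # Keras kernels have shape (in, out)
    h = np.maximum(x @ W1 + b1, 0.0)       # first Dense layer followed by ReLU
    z = h @ W2 + b2                        # second Dense layer
    z = z - z.max(axis=1, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


batch = np.random.rand(4, 784).astype("float32")
keras_out = model.predict(batch, verbose=0)
numpy_out = numpy_forward(model, batch)

# A model is callable on a tensor, so it can be reused like a layer
new_inputs = Input(shape=(784,))
wrapped = Model(inputs=new_inputs, outputs=model(new_inputs))
wrapped_out = wrapped.predict(batch, verbose=0)
```

Because the wrapped model reuses the same layer objects, it shares their weights and produces the same output.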

The functional API can express every network that the Sequential API can, and some kinds of networks can be defined only with the functional API, for example networks with multiple inputs and outputs, or networks that use shared layers. A network with multiple inputs and outputs is defined as:

```python
model = Model(inputs=[input1, input2], outputs=[output1, output2])
```
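A minimal sketch of such a network, which also uses a shared layer, is shown below. The input size (16), the layer sizes and the two heads (a 3-class classifier and a scalar regression) are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from keras.layers import Input, Dense, concatenate
from keras.models import Model

input1 = Input(shape=(16,))
input2 = Input(shape=(16,))

# Shared layer: the same weights process both inputs
shared = Dense(8, activation='relu')
encoded1 = shared(input1)
encoded2 = shared(input2)

merged = concatenate([encoded1, encoded2])         # 8 + 8 = 16 features
output1 = Dense(3, activation='softmax')(merged)   # e.g. a 3-class label
output2 = Dense(1)(merged)                         # e.g. a regression target

model = Model(inputs=[input1, input2], outputs=[output1, output2])
# One loss per output, listed in the same order as `outputs`
model.compile(optimizer='adam', loss=['categorical_crossentropy', 'mse'])

a = np.random.rand(5, 16).astype('float32')
b = np.random.rand(5, 16).astype('float32')
pred1, pred2 = model.predict([a, b], verbose=0)
```

Since the shared `Dense` layer is counted once, the model has 136 + 51 + 17 = 204 parameters rather than 340.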

### 1.2.2 Layers Overview

A *Dense* layer is a fully connected neural network layer:

```python
keras.layers.Dense
```

The main *convolutional* and *pooling* layers are:
```python
keras.layers.convolutional.Conv1D
keras.layers.convolutional.Conv2D
keras.layers.pooling.MaxPooling1D
keras.layers.pooling.MaxPooling2D
```

*Regularization layers:*
```python
keras.layers.core.Dropout
keras.layers.normalization.BatchNormalization
```
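As a sketch, the layers listed above can be combined into a small image classifier. The input size (32×32 RGB), the number of filters and the dropout rate are arbitrary choices for illustration.

```python
import numpy as np
from keras.layers import (Input, Conv2D, MaxPooling2D, BatchNormalization,
                          Dropout, Flatten, Dense)
from keras.models import Model

inputs = Input(shape=(32, 32, 3))
x = Conv2D(16, (3, 3), padding='same', activation='relu')(inputs)  # -> 32x32x16
x = BatchNormalization()(x)             # normalizes activations over the batch
x = MaxPooling2D(pool_size=(2, 2))(x)   # -> 16x16x16
x = Flatten()(x)                        # -> 4096 values
x = Dropout(0.5)(x)                     # active only during training
outputs = Dense(10, activation='softmax')(x)
model = Model(inputs=inputs, outputs=outputs)

probs = model.predict(np.zeros((2, 32, 32, 3), dtype='float32'), verbose=0)
```

The parameter count is 448 (convolution) + 64 (batch normalization, including its non-trainable moving statistics) + 40,970 (dense) = 41,482.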

*Activation functions*: all the common activation functions, such as ReLU, sigmoid, tanh and softmax, are supported.

*Losses*: cross-entropy, mean squared error and the other popular losses are supported.

*Metrics*: functions that measure how the model is performing, such as accuracy. Unlike losses, metrics are only reported and are not used to train the model.

*Optimizers*: Adam, Adagrad, RMSprop, plain SGD and others are available in the `keras.optimizers` module.
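Optimizers can be given to `compile` either as a string (e.g. `'sgd'`) or as an object, which lets you configure their hyperparameters. A minimal sketch with a toy model; the momentum value is an arbitrary example:

```python
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD

inputs = Input(shape=(20,))
outputs = Dense(3, activation='softmax')(inputs)
model = Model(inputs=inputs, outputs=outputs)

# An optimizer object exposes hyperparameters that a string cannot set
optimizer = SGD(momentum=0.9, nesterov=True)
model.compile(optimizer=optimizer,
              loss='categorical_crossentropy',  # loss for one-hot encoded labels
              metrics=['accuracy'])             # reported, not optimized
```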

### 1.2.3 Training

Before training a model, it is necessary to `compile` it. `compile` takes three main arguments: an optimizer, a loss function and a list of metrics.

```python
# For a multi-class classification problem
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```

Once the model is compiled, it is trained by calling its `fit` method, for example:
```python
# Train the model, iterating on the data in batches of 32 samples
model.fit(data, labels, epochs=10, batch_size=32)
```
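Putting the pieces together, the sketch below compiles and trains the MNIST-style network from section 1.2.1 end to end. Random arrays stand in for real `data` and `labels` (an assumption to keep it self-contained), so the reported accuracy is meaningless.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Activation
from keras.utils import to_categorical

# Random stand-in data: 256 samples with 784 features and 10 classes
data = np.random.rand(256, 784).astype('float32')
labels = to_categorical(np.random.randint(10, size=256), num_classes=10)

model = Sequential()
model.add(Dense(32, input_shape=(784,)))
model.add(Activation('relu'))
model.add(Dense(10))
model.add(Activation('softmax'))

model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# validation_split holds out the last 20% of the samples for validation
history = model.fit(data, labels, epochs=3, batch_size=32,
                    validation_split=0.2, verbose=0)
```

`history.history` then contains one loss value per epoch for both the training and the validation data.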

### 1.2.4 Additional operations

In Keras it is possible to save a model's architecture (without its weights) in JSON or YAML format and to rebuild the model from it:

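```python
from keras.models import model_from_json, model_from_yaml

# model saving (architecture only); YAML support was removed in later Keras releases
json_string = model.to_json()
yaml_string = model.to_yaml()
# model reconstruction (the weights are freshly initialized)
model = model_from_json(json_string)
model = model_from_yaml(yaml_string)
```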

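The whole model (architecture, weights and optimizer state) can be saved to a single HDF5 file with `model.save`, while `model.save_weights` saves only the weights. For example: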
```python
model.save('my_model.h5')
```
and restoring with:
```python
from keras.models import load_model

model = load_model('my_model.h5')
```
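The sketch below checks both approaches on a toy model: JSON architecture plus separately saved weights, and the single-file HDF5 model. The temporary directory and the file names are assumptions; the `.weights.h5` suffix is chosen because newer Keras releases require it for `save_weights`, and it is also a valid HDF5 name for older ones.

```python
import os
import tempfile

import numpy as np
from keras.layers import Input, Dense
from keras.models import Model, load_model, model_from_json

inputs = Input(shape=(8,))
outputs = Dense(2, activation='softmax')(Dense(4, activation='relu')(inputs))
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='categorical_crossentropy')

x = np.random.rand(3, 8).astype('float32')
expected = model.predict(x, verbose=0)

tmpdir = tempfile.mkdtemp()

# 1) Architecture as JSON + weights in a separate HDF5 file
weights_path = os.path.join(tmpdir, 'model.weights.h5')
model.save_weights(weights_path)
rebuilt = model_from_json(model.to_json())   # new, randomly initialized weights
rebuilt.load_weights(weights_path)           # now identical to the original

# 2) Whole model (architecture + weights + optimizer state) in one HDF5 file
model_path = os.path.join(tmpdir, 'my_model.h5')
model.save(model_path)
restored = load_model(model_path)
```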

One of the most useful features of Keras is the possibility of attaching callbacks to training. For example, the `EarlyStopping` callback stops training when a monitored quantity stops improving, and the `TensorBoard` callback writes the loss history to log files that can later be viewed with TensorBoard. Keras also supports model checkpointing (the `ModelCheckpoint` callback) in a similar way to TensorFlow.
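A minimal sketch of early stopping on a toy regression problem; the data, the network and the `patience` value are arbitrary assumptions:

```python
import numpy as np
from keras.callbacks import EarlyStopping
from keras.layers import Input, Dense
from keras.models import Model

# Random stand-in data: the target is the sum of the five inputs
x = np.random.rand(200, 5).astype('float32')
y = x.sum(axis=1, keepdims=True)

inputs = Input(shape=(5,))
outputs = Dense(1)(Dense(16, activation='relu')(inputs))
model = Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mse')

# Stop when the validation loss has not improved for 3 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=3)
history = model.fit(x, y, epochs=50, batch_size=32, validation_split=0.2,
                    callbacks=[early_stopping], verbose=0)
epochs_run = len(history.history['loss'])
```

`epochs_run` is at most 50 and is smaller whenever the validation loss stalls.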

## 2 Simple example

In [29]:
from keras.models import Sequential
from keras.layers.core import Dense, Dropout, Activation, Flatten
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras.optimizers import SGD, Adam, RMSprop
from keras.callbacks import TensorBoard

In [None]:
IMG_CHANNELS = 3
IMG_ROWS = 32
IMG_COLS = 32

In [None]:
cifar10.data_path = "example_data/CIFAR-10/"
cifar10.maybe_download_and_extract()

In [None]:
class_names = cifar10.load_class_names()
images_train, cls_train, labels_train = cifar10.load_training_data()
images_test, cls_test, labels_test = cifar10.load_test_data()

In [None]:
# constants
BATCH_SIZE = 128
NB_EPOCH = 6
NB_CLASSES = 10
VERBOSE = 0
VALIDATION_SPLIT = 0.2

In [None]:
model = Sequential()
model.add(Conv2D(32, (3, 3), padding='same', 
                 input_shape=(IMG_ROWS, IMG_COLS, IMG_CHANNELS)))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(64, (3, 3), padding='same'))
model.add(Activation('relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(512))
model.add(Activation('relu'))
model.add(Dropout(0.7))
model.add(Dense(NB_CLASSES))
model.add(Activation('softmax'))

In [None]:
model.summary()
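The parameter counts reported by `model.summary()` can be checked by hand. Activation, pooling, flattening and dropout layers have no parameters, so only the two convolutions and the two dense layers contribute. A pure-Python sketch of the arithmetic:

```python
# Conv2D: (kernel_h * kernel_w * in_channels + 1 bias) * filters
def conv2d_params(kernel, in_channels, filters):
    return (kernel[0] * kernel[1] * in_channels + 1) * filters


# Dense: (inputs + 1 bias) * units
def dense_params(inputs, units):
    return (inputs + 1) * units


conv1 = conv2d_params((3, 3), 3, 32)    # 32x32x3 -> 32x32x32 ('same' padding)
conv2 = conv2d_params((3, 3), 32, 64)   # 16x16x32 -> 16x16x64 after the first pooling
flat = 8 * 8 * 64                       # the second 2x2 pooling gives 8x8x64
dense1 = dense_params(flat, 512)
dense2 = dense_params(512, 10)          # NB_CLASSES = 10
total = conv1 + conv2 + dense1 + dense2
print(conv1, conv2, dense1, dense2, total)
# 896 18496 2097664 5130 2122186
```

Almost all the parameters sit in the first dense layer, which connects the 4,096 flattened features to 512 units.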

In [None]:
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(), metrics=['accuracy'])

Custom callbacks can be written by subclassing `keras.callbacks.Callback`, for example one that records the loss and accuracy after every batch:

In [None]:
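class LossHistory(keras.callbacks.Callback):
    """Record the loss and accuracy after every training batch."""

    def on_train_begin(self, logs=None):
        self.losses = []
        self.accuracy = []

    def on_batch_end(self, batch, logs=None):
        logs = logs or {}
        self.losses.append(logs.get('loss'))
        # Keras 2.1 logs the accuracy metric as 'acc' (newer releases use 'accuracy')
        self.accuracy.append(logs.get('acc', logs.get('accuracy')))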

Several callbacks can be passed to the model at once; for example, we can also write logs to be read later by TensorBoard.

In [None]:
callbacks = []
callbacks.append(TensorBoard(log_dir='temp/keras/logs'))

In [None]:
history_new = LossHistory()
callbacks.append(history_new)

`fit` returns a `History` object whose `history` attribute holds the per-epoch values of the loss and metrics recorded during training, independently of the callbacks passed to it.

In [None]:
hist = model.fit(images_train, labels_train, batch_size=BATCH_SIZE,
                 epochs=NB_EPOCH, validation_split=VALIDATION_SPLIT, 
                 verbose=VERBOSE, callbacks=callbacks)

In [None]:
score = model.evaluate(images_test, labels_test, batch_size=BATCH_SIZE, verbose=VERBOSE)
print('Test accuracy: {}'.format(score[1])) 

In [None]:
print(hist.history)

In [None]:
fig = bk.figure(plot_width=600, plot_height=350, title=None)
fig.line(np.array(range(len(history_new.losses))), np.array(history_new.losses))
fig.line(np.array(range(len(history_new.accuracy))), np.array(history_new.accuracy), 
         color='red')
bk.show(fig)

## 3 Transfer Learning with Keras

For this example you need to download the [Stanford Dogs dataset](http://vision.stanford.edu/aditya86/ImageNetDogs/images.tar) (with the corresponding [annotations](http://vision.stanford.edu/aditya86/ImageNetDogs/annotation.tar) and train/test [split](http://vision.stanford.edu/aditya86/ImageNetDogs/lists.tar)). Extract the archives into the example_data directory and then execute the following cells to split the data into training and evaluation sets.

In [None]:
import scipy.io
import os
import shutil

In [None]:
test_list = scipy.io.loadmat('example_data/lists/test_list.mat')
train_list = scipy.io.loadmat('example_data/lists/train_list.mat')

In [None]:
def copy_split(file_list, dst_root, src_root='example_data/Images'):
    """Copy every image of a .mat file list into dst_root/<class_dir>/."""
    for el in file_list:
        relpath = el[0][0]  # path relative to src_root: '<class_dir>/<image file>'
        dstdir = os.path.join(dst_root, os.path.dirname(relpath))
        os.makedirs(dstdir, exist_ok=True)
        shutil.copy(os.path.join(src_root, relpath), dstdir)

copy_split(train_list['file_list'], 'example_data/data/train')
copy_split(test_list['file_list'], 'example_data/data/test')

Now import the necessary modules and functions:

In [None]:
from keras.preprocessing.image import ImageDataGenerator
from keras.applications.vgg16 import preprocess_input, VGG16
from keras.layers import Dense, GlobalAveragePooling2D
from keras.models import Model

We do not load the data into NumPy arrays. Instead we use `ImageDataGenerator`, a data generator similar in spirit to `tf.data`, which loads the images directly from disk in batches. The generator can also preprocess each image with an additional function: here we pass `preprocess_input` so that the images undergo the same transformation that was applied when VGG16 was trained.
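For VGG16, `preprocess_input` uses the so-called "caffe" mode: it converts images from RGB to BGR and subtracts the mean ImageNet pixel from each channel, without rescaling to [0, 1]. The NumPy function `vgg16_preprocess_sketch` below is a hypothetical reimplementation for illustration only, assuming channels-last images with values in [0, 255]; in practice use the Keras function.

```python
import numpy as np

# Mean ImageNet pixel in BGR order, as used by Keras' 'caffe'-style preprocessing
IMAGENET_MEAN_BGR = np.array([103.939, 116.779, 123.68])


def vgg16_preprocess_sketch(images):
    """Hypothetical reimplementation: RGB -> BGR, then zero-center each channel.

    `images` has shape (..., height, width, 3) with RGB values in [0, 255].
    """
    bgr = np.asarray(images, dtype='float64')[..., ::-1]  # reverse channel order
    return bgr - IMAGENET_MEAN_BGR                        # no scaling to [0, 1]


# A pixel equal to the mean ImageNet color (written in RGB order) maps to zero
pixel = np.array([[[123.68, 116.779, 103.939]]])
print(vgg16_preprocess_sketch(pixel))
```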

In [None]:
train_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)
validation_datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

batch_size = 32
train_generator = train_datagen.flow_from_directory('example_data/data/train', target_size=(224,224),
                                                    class_mode='categorical', shuffle=True,
                                                    batch_size=batch_size)

validation_generator = validation_datagen.flow_from_directory('example_data/data/test', target_size=(224,224),
                                                              class_mode='categorical', shuffle=False,
                                                              batch_size=batch_size)

Now we load a version of VGG16 pretrained on ImageNet, excluding its top (fully connected) layers, and add new layers on top to account for the different number of classes: in this case we have 120 classes.

In [None]:
base_model = VGG16(weights='imagenet', include_top=False)
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(120, activation='softmax')(x)    
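The new head can be checked with some arithmetic. For 224×224 inputs the VGG16 convolutional base ends after five 2×2 max-pooling layers, so it outputs a 7×7×512 feature map (224 / 2^5 = 7), and `GlobalAveragePooling2D` averages each of the 512 channels over the 7×7 positions. The NumPy sketch below mimics that averaging on random stand-in features (an assumption, not real VGG16 output) and counts the parameters of the two new `Dense` layers.

```python
import numpy as np

# Stand-in for the VGG16 base output: a batch of 2 feature maps of size 7x7x512
features = np.random.rand(2, 7, 7, 512)

# GlobalAveragePooling2D (channels-last): mean over the two spatial axes
pooled = features.mean(axis=(1, 2))     # shape (2, 512)

# Parameters of the new head: (inputs + 1 bias) * units for each Dense layer
dense_1024 = (512 + 1) * 1024           # 525,312
dense_120 = (1024 + 1) * 120            # 123,000
head_params = dense_1024 + dense_120    # 648,312
```

With the base frozen, these 648,312 parameters are the only ones updated during the first training run.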

In [None]:
model = Model(inputs=base_model.input, outputs=predictions)

for layer in base_model.layers:
    layer.trainable = False

model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])

In [None]:
model.fit_generator(train_generator, epochs=10, validation_data=validation_generator)

Now the new top layers should have learned the mapping from the features of the pretrained network to the correct classes. You can explore all the layers of the model and decide which ones to keep frozen and which ones to fine-tune. Remember that after changing which layers are frozen you have to compile the model again for the change to take effect.

In [None]:
for i, layer in enumerate(base_model.layers):
    print(i, layer.name)
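Layer names in the Keras VGG16 model follow the pattern `block<N>_conv<M>` and `block<N>_pool`, which makes it convenient to choose what to fine-tune by name. The helper `unfreeze_layers` below is a hypothetical sketch, not a Keras function: it makes trainable only the layers whose name starts with one of the given prefixes. It is demonstrated on a tiny stand-in model so that it runs without downloading the VGG16 weights.

```python
from keras.layers import Input, Dense
from keras.models import Model
from keras.optimizers import SGD


def unfreeze_layers(model, prefixes):
    """Hypothetical helper: train only the layers whose name starts with a prefix."""
    for layer in model.layers:
        layer.trainable = layer.name.startswith(tuple(prefixes))


# Tiny stand-in with VGG-like layer names (the real base_model has many more)
inputs = Input(shape=(4,), name='input')
x = Dense(4, name='block4_dense')(inputs)
x = Dense(4, name='block5_dense')(x)
toy_base = Model(inputs=inputs, outputs=x)

unfreeze_layers(toy_base, ['block5'])
# Re-compile so that the new trainable flags take effect; SGD with momentum
# (typically with a small learning rate) is a common choice for fine-tuning
toy_base.compile(optimizer=SGD(momentum=0.9), loss='mse')
trainable_names = [layer.name for layer in toy_base.layers if layer.trainable]
```

With the real model, calling `unfreeze_layers(base_model, ['block5'])` and then compiling `model` again would fine-tune the last convolutional block together with the new top layers, which are not part of `base_model` and stay trainable.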

## 4 LSTM with Keras

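This example is analogous to the one in the previous notebook: an LSTM is trained to predict the value of column `y` `steps_forward` rows ahead, from windows of `timesteps` consecutive rows of input features.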

In [8]:
data = pd.read_csv('example_data/data2.csv', parse_dates=['X0'])

In [9]:
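prediction = 1          # not used below
steps_forward = 12      # forecast horizon: predict y this many rows ahead
steps_backward = 0      # first offset of the shifted columns in input_range
inputs_default = 0      # past offsets used for columns not listed in input_range
hidden = 128            # number of LSTM units
batch_size = 1024
timesteps = 12          # rows per input window fed to the LSTM
epochs = 30
test_size = 0.4         # fraction of weeks held out for testing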

In [26]:
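def rolling_past(X, y, size):
    """Build windows of `size` consecutive rows for the LSTM.

    Returns (X_seq, y, X_seq_shuffled, y_shuffled), where X_seq has shape
    (n_windows, size, n_features). Within each window, step 0 is the current
    row (shift 0) and step size-1 is the oldest one (shift size-1).
    """
    X = pd.DataFrame(X)
    y = pd.DataFrame(y)
    # Column blocks: X(t), X(t-1), ..., X(t-size+1)
    dfs = [X.shift(i) for i in range(size)]
    res = pd.concat(dfs, axis=1)
    res['target'] = y
    # The first size-1 rows have incomplete windows (NaNs): drop them
    res.dropna(inplace=True, axis=0)
    # Shuffled copy of the same windows
    res_shuffle = res.iloc[np.random.permutation(len(res))]
    res_y = res['target']
    res = res.drop(['target'], axis=1)
    res_y_shuffle = res_shuffle['target']
    res_shuffle = res_shuffle.drop(['target'], axis=1)
    return (res.values.reshape(res.shape[0], size, -1), res_y.values.reshape((-1, 1)),
            res_shuffle.values.reshape(res_shuffle.shape[0], size, -1), res_y_shuffle.values.reshape((-1, 1)))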
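`rolling_past` turns a 2-D table into the 3-D `(samples, timesteps, features)` array expected by the LSTM; within each window the first step is the current row and the last step is the oldest one. The hypothetical NumPy function below reproduces the unshuffled windows, assuming data without missing values, and makes the ordering visible on a toy column:

```python
import numpy as np


def windows_most_recent_first(X, size):
    """Hypothetical NumPy equivalent of the unshuffled windows of rolling_past.

    Row t (t >= size - 1) becomes the window [X[t], X[t-1], ..., X[t-size+1]],
    so the output has shape (len(X) - size + 1, size, n_features).
    """
    X = np.asarray(X)
    n = len(X)
    # Slice i holds the rows shifted back by i steps, aligned on the current row
    return np.stack([X[size - 1 - i:n - i] for i in range(size)], axis=1)


toy = np.arange(5).reshape(5, 1)  # a single feature with values 0..4
windows = windows_most_recent_first(toy, 3)
print(windows[:, :, 0])
# [[2 1 0]
#  [3 2 1]
#  [4 3 2]]
```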

In [10]:
input_range = {
    'X109': [steps_backward, steps_forward],
    'X110': [steps_backward, steps_forward],
    'X111': [steps_backward, steps_forward],
    'X112': [steps_backward, steps_forward],
    'X70': [steps_backward, steps_forward],
    'X71': [steps_backward, steps_forward],
    'X73': [steps_backward, steps_forward],
    'X91': [steps_backward, steps_forward],
    'X92': [steps_backward, steps_forward],
    'X94': [steps_backward, steps_forward],
}

In [11]:
X_columns = ['X109',  'X54', 'X53', 'X71', 'X112', 'X59', 'X111', 'X92', 'X66', 
             'X94', 'X73', 'X91', 'X110', 'X40', 'y', 'X47', 'X48', 'X70', 'X60']
y_column = 'y'

In [19]:
def transform(source, y_column, X_columns, inputs_per_column,
              inputs_default, steps_forward, dates='X0'):
    dates = source[dates].iloc[:-steps_forward]

    # Target: value of y_column steps_forward rows ahead
    y = pd.DataFrame()
    y[y_column] = source[y_column].shift(-steps_forward)

    # Standardize the inputs (the scaler is fitted on the whole dataset)
    scaler = StandardScaler()

    new_X = pd.DataFrame(scaler.fit_transform(source[X_columns]), columns=X_columns)
    X = pd.DataFrame()

    for column in X_columns:
        if inputs_per_column:
            inputs = inputs_per_column.get(column, None)
            if inputs:
                inputs_list = range(inputs[0], inputs[1] + 1)
            else:
                inputs_list = range(-inputs_default, 1)
        else:
            inputs_list = range(-inputs_default, 1)

        for i in inputs_list:
            col_name = "%s_%s" % (column, i)
            # shift(-i) moves values up: column "<name>_<i>" holds the value at row t + i
            X[col_name] = new_X[column].shift(-i)

    # Drop the rows whose target is missing (Series.nonzero() no longer exists in pandas)
    null_indices = y.index[y.isnull().any(axis=1)]
    X.drop(null_indices, axis=0, inplace=True)
    X.dropna(inplace=True, axis=0)
    y.dropna(inplace=True, axis=0)

    return X, y, dates, X.index
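In `transform`, a column listed in `input_range` with offsets `[0, 12]` is expanded into 13 columns, and the column with suffix `_i` holds the value `i` rows ahead because `shift(-i)` moves values up. A toy pandas example with offsets 0 to 2 (the values are arbitrary):

```python
import pandas as pd

# Toy standardized column (values chosen arbitrarily)
col = pd.Series([10.0, 20.0, 30.0, 40.0])

# As in `transform`, offsets 0..2 become columns with the values at t, t+1, t+2
lead = pd.DataFrame({'x_%d' % i: col.shift(-i) for i in range(0, 3)})
print(lead)

# Rows whose future values are missing are dropped, like X.dropna() in `transform`
complete = lead.dropna()
```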

In [20]:
def split(X, y, dates, test_size):
    """Randomly assign whole calendar weeks to the training or the test set."""
    X.set_index(dates, inplace=True)
    y.set_index(dates, inplace=True)
    # ISO week number of each row (DatetimeIndex.week was removed in pandas 2.0)
    weeks = X.index.isocalendar().week.to_numpy(dtype=int)
    X_group = X.groupby(weeks)
    y_group = y.groupby(weeks)

    a = list(X_group.groups.keys())
    random.shuffle(a)
    sp = int(len(a) * test_size)
    train_weeks = a[sp:]
    test_weeks = a[:sp]

    print('train_weeks: ', train_weeks)

    X_train = pd.concat([X_group.get_group(i).reset_index(drop=True) for i in train_weeks])
    X_test = pd.concat([X_group.get_group(i).reset_index(drop=True) for i in test_weeks])
    y_train = pd.concat([y_group.get_group(i).reset_index(drop=True) for i in train_weeks])
    y_test = pd.concat([y_group.get_group(i).reset_index(drop=True) for i in test_weeks])

    return (X_train.values, y_train.values, X_test.values, y_test.values)

In [21]:
X, y, dates, _ = transform(source=data, y_column=y_column, X_columns=X_columns,
                           inputs_per_column=input_range, inputs_default=inputs_default,
                           steps_forward=steps_forward)

In [24]:
X_train, y_train, X_test, y_test = split(X, y, dates, test_size)

train_weeks:  [24, 21, 17, 16, 15, 18, 25, 14]


In [27]:
X_train, y_train, X_train_shuffled, y_train_shuffled = rolling_past(X_train, y_train, timesteps)
X_test, y_test, X_test_shuffled, y_test_shuffled = rolling_past(X_test, y_test, timesteps)
X, y, _, _ = rolling_past(X, y, timesteps)

In [31]:
from keras.layers import LSTM

# Build the model
model = Sequential()
model.add(LSTM(hidden,
               batch_input_shape=(None,
                                  timesteps,
                                  X.shape[2])))
model.add(Dense(y.shape[1]))
model.compile(loss='mean_squared_error', optimizer='adam')
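With the settings above, each window has `timesteps = 12` rows of 139 features: the 10 columns in `input_range` contribute 13 shifted copies each (offsets 0 to 12), and the other 9 columns of `X_columns` one copy each. An LSTM layer has four internal transformations (input, forget and output gates plus the cell candidate), each with an input kernel, a recurrent kernel and a bias, so the parameter count shown by `model.summary()` can be worked out as follows:

```python
def lstm_params(units, input_dim):
    # 4 transformations, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units) and a bias (units)
    return 4 * (input_dim * units + units * units + units)


def dense_params(inputs, units):
    return (inputs + 1) * units


n_features = 10 * 13 + 9              # shifted columns + single-copy columns
lstm = lstm_params(128, n_features)   # hidden = 128
dense = dense_params(128, 1)          # one target column
print(n_features, lstm, dense, lstm + dense)
# 139 137216 129 137345
```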

In [36]:
# Fit the model
t0 = time.time()
model.fit(X_train_shuffled,
          y_train_shuffled,
          epochs=epochs,
          batch_size=batch_size,
          shuffle=False)
print('Training time: {:3.6f} s'.format(time.time() - t0))

Epoch 1/100
Epoch 2/100
...
Epoch 100/100
Training time: 658.307858 s


In [37]:
# Predict all dataset
t0 = time.time()
y_hat = model.predict(X, batch_size=batch_size)
print('Prediction time whole dataset: {:3.6f} s'.format(time.time() - t0))

Prediction time whole dataset: 3.522112 s


In [49]:
# Predict training set only
t0 = time.time()
y_train_predicted = model.predict(X_train, batch_size=batch_size)
print('Prediction time train set: {:3.6f} s'.format(time.time() - t0))

Prediction time train set: 1.974006 s


In [50]:
# Predict test set only
t0 = time.time()
y_test_predicted = model.predict(X_test, batch_size=batch_size)
print('Prediction time test set: {:3.6f} s'.format(time.time() - t0))

Prediction time test set: 1.323014 s


In [55]:
trainScore = np.sqrt(mean_squared_error(y_train, y_train_predicted))
print('Train Score: %.2f RMSE' % (trainScore))
testScore = np.sqrt(mean_squared_error(y_test, y_test_predicted))
print('Test Score: %.2f RMSE' % (testScore))

Train Score: 0.40 RMSE
Test Score: 0.42 RMSE
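The scores above are root mean squared errors: the square root of the mean of the squared differences between true and predicted values, expressed in the same units as the target. A minimal NumPy version, equivalent to `np.sqrt(mean_squared_error(...))` for a single target column:

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.ravel(y_true)
    y_pred = np.ravel(y_pred)
    return np.sqrt(np.mean((y_true - y_pred) ** 2))


# Toy check: errors (0, 0, 2) give sqrt(4 / 3)
print(rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```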


In [52]:
tools = 'pan,wheel_zoom,box_zoom,reset,save'
fig_b = bk.figure(plot_width=800, plot_height=350, 
                  x_axis_label='time',
                  y_axis_label='value',
                  tools=tools)
fig_b.line(data['X0'].values[:len(y)], np.ravel(y), legend='true value')
fig_b.line(data['X0'].values[:len(y_hat)], np.ravel(y_hat), color='orange', legend='predicted value')
bk.show(fig_b)

---

Visit [www.add-for.com](<http://www.add-for.com/IT>) for more tutorials and updates.

This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/">Creative Commons Attribution-ShareAlike 4.0 International License</a>.