# The Art of Training Neural Networks with Keras

<br />

## Tools in the Deep Learning Ecosystem

<img src="deep_learning_ecosystem.jpeg" />

<br />

* Hardware (CPU vs GPU): At the lowest level, CPUs and GPUs are the processors that execute the instructions.

    <br />

    * A CPU executes complex, sequential instruction streams quickly, but it can only run a small number of them in parallel
    
    <br />
    
    * A GPU can execute hundreds of small instructions in parallel
    
<img src = 'cpu_vs_gpu.png' />

* For deep learning, where we have to perform many linear algebra computations in parallel, GPUs are typically much faster than CPUs

<br />

* Compute frameworks such as BLAS and CUDA optimise routine computations for the instruction set of a specific processor, which accelerates them

    <br />
    
    * Basic Linear Algebra Subprograms (BLAS) is a specification of low-level routines for common linear algebra operations. R, NumPy and MATLAB use BLAS implementations to accelerate linear algebra operations. Example: Intel's implementation of BLAS ships as part of the Intel Math Kernel Library (MKL)
    
    <br />
    
    * CUDA is a framework created by NVIDIA that helps programmers write software for general-purpose computing on GPUs. Most deep learning libraries use CUDA
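
As a rough illustration of why these optimised routines matter, the sketch below multiplies two small matrices twice: once with a plain Python triple loop and once with NumPy's `@` operator, which typically hands the work to whichever BLAS implementation NumPy was built against. The helper name `naive_matmul` and the matrix sizes are made up for this example.

```python
import numpy as np

def naive_matmul(a, b):
    """Multiply two matrices with explicit Python loops (illustrative helper)."""
    n, k = a.shape
    k2, m = b.shape
    assert k == k2, "inner dimensions must match"
    out = np.zeros((n, m))
    for i in range(n):
        for j in range(m):
            for p in range(k):
                out[i, j] += a[i, p] * b[p, j]
    return out

rng = np.random.default_rng(0)
a = rng.normal(size=(8, 5))
b = rng.normal(size=(5, 3))

slow = naive_matmul(a, b)   # one multiply-add at a time
fast = a @ b                # a single call into NumPy's optimised routine

print(np.allclose(slow, fast))
```

Both paths compute the same product. The difference is only in how the work is scheduled on the processor, which is exactly the part that BLAS and CUDA take care of.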

<br />

* Libraries that support automatic differentiation, such as Theano, TensorFlow, CNTK and PyTorch, were developed to help compute gradients through stacked layers

* These libraries work at the level of individual operations (dot product, etc.), so the code is low level and slow to write when quickly prototyping new networks
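
To give a feel for what automatic differentiation does, here is a toy reverse-mode sketch for scalars. It is not how Theano, TensorFlow, CNTK or PyTorch are implemented; the `Value` class and its attributes are hypothetical names invented for this illustration, and only addition and multiplication are supported.

```python
class Value:
    """A scalar that remembers which values it was computed from."""

    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # inputs of the operation that produced this value
        self._local_grads = local_grads  # d(this value) / d(each parent)

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other), (other.data, self.data))

    def backward(self, upstream=1.0):
        # Chain rule: accumulate the upstream gradient, then pass it on to the parents
        self.grad += upstream
        for parent, local_grad in zip(self._parents, self._local_grads):
            parent.backward(upstream * local_grad)

w, x, b = Value(3.0), Value(2.0), Value(1.0)
y = w * x + b      # forward pass: 3 * 2 + 1
y.backward()       # backward pass fills in the gradients

print(y.data, w.grad, x.grad, b.grad)   # 7.0 2.0 3.0 1.0
```

The gradient of `y` with respect to `w` is `x`, and with respect to `x` it is `w`, which is what the backward pass recovers without any hand-written derivative for the full expression.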

<br />

<img src='deep_learning_frameworks.png' width='550px'/>

<br />

* That is where Keras comes into the picture. It is a high-level library whose abstraction is at the __layer__ level. It was built as an abstraction over Theano, and support for TensorFlow and CNTK was added later

* So you can write code in Keras that can run on any of these three deep learning library __backends__

## The Building Blocks of Neural Networks

### Layers, Data and Learning Representations

<br />

* __Layers__: logically grouped operations in a neural network. The parameters of the operations in a layer are learned so that the layer generates the best features for predicting the target
 
<img src='nn_layers.jpeg' width='400px'/>

<img src='nn_layer_operations.png' width='650px'/>

<br />

* The network needs to have __input data__ and corresponding __targets (y)__

<br />

* In traditional machine learning we see that changing the representation of the data (kernel trick, etc.) helps ease the process of learning from data

<br />

* An __activation function__ adds non-linearity, and in combination with the weights (parameters) of a layer, it lets the network learn a better representation of the data at each layer
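
A small NumPy sketch of why the non-linearity matters: without an activation, two stacked layers collapse into a single linear map, so stacking adds nothing. The shapes, the random weights and the omission of bias terms are simplifying assumptions for this illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(4,))        # one input sample with 4 features
W1 = rng.normal(size=(5, 4))     # weights of a first layer
W2 = rng.normal(size=(3, 5))     # weights of a second layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Without an activation, two layers are equivalent to one layer with weights W2 @ W1
stacked_linear = W2 @ (W1 @ x)
single_linear = (W2 @ W1) @ x
print(np.allclose(stacked_linear, single_linear))

# With a sigmoid in between, the stack is no longer that single linear map
stacked_nonlinear = W2 @ sigmoid(W1 @ x)
print(np.allclose(stacked_nonlinear, single_linear))
```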

<br />

### So basically, each layer takes data as input and outputs transformed data, simple as that. Now, let's dive into the details

<img src='learning_representations_mlp.jpg' />

<br />

* The goal of training a neural network is to find these ideal representations of the data, which we get by "learning" the right weights

<img src='learning_weights.jpg' />

<br />
 
* The loss function, which defines the feedback signal used for learning, helps gauge __how different the predicted targets are from the true targets__

<img src='loss_function.jpg' />

<br />
 
* The optimizer, based on the feedback signal from the loss function, changes the parameters / weights of the network so that the predictions get as close to the targets as possible (minimizing the loss function)
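
The interplay of predictions, loss and optimizer can be sketched with a one-weight model in plain Python. This is ordinary full-batch gradient descent on a mean squared error, not Keras code, and the data, the starting weight and the learning rate are made up for the example.

```python
# Toy data generated from y = 2 * x, so the "right" weight is 2.0
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

w = 0.0               # initial guess for the single weight
learning_rate = 0.01
losses = []

for step in range(200):
    # Forward pass: predictions with the current weight
    preds = [w * x for x in xs]
    # Loss function: mean squared error between predictions and targets
    loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)
    losses.append(loss)
    # Gradient of the loss with respect to w
    grad = sum(2 * (p - y) * x for p, y, x in zip(preds, ys, xs)) / len(xs)
    # Optimizer step: move w against the gradient
    w -= learning_rate * grad

print(round(w, 3), losses[0], losses[-1])
```

The loss starts at 30.0 and shrinks towards zero as `w` approaches 2.0, which is the same feedback loop a Keras optimizer runs over all the weights of a network.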

<br />

<img src='building_blocks_of_neural_networks.jpg' />

## The Keras Interface

<br />

* There are two major ways to define and run neural networks using the Keras API
    
    <br />
    
    * Sequential API
    
    <br />
    
    * Functional API

<br />

### The Sequential API

* The Sequential API allows us to __quickly stack layers__ and build networks

<img src="keras_interface.jpg" width='550px'/>

<br />

<img src='keras_sequential_api.jpg' />

<br />

### The Functional API

* The Functional API allows us to build complex graph networks. We can keep chaining the layers as functions, and finally the `Model(inputs, outputs)` class connects all the various inputs and outputs

<br />

<img src="functional_api_bimodal_network.png" width='450px'/>
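
A minimal sketch of a two-input network written with the Functional API, assuming a Keras installation like the one used in the rest of this notebook. The input sizes, layer widths and variable names are placeholders chosen for illustration and are not taken from the figure.

```python
from keras.layers import Input, Dense, concatenate
from keras.models import Model

# Two separate inputs (the sizes 64 and 16 are placeholders)
input_a = Input(shape=(64,))
input_b = Input(shape=(16,))

# Each input gets its own branch; a layer is called like a function on a tensor
branch_a = Dense(32, activation='sigmoid')(input_a)
branch_b = Dense(8, activation='sigmoid')(input_b)

# Merge the two branches and predict a single probability
merged = concatenate([branch_a, branch_b])
output = Dense(1, activation='sigmoid')(merged)

# Model ties the inputs and outputs into one network
bimodal_model = Model(inputs=[input_a, input_b], outputs=output)
bimodal_model.summary()
```

A graph like this, with two inputs feeding one output, cannot be expressed as a single stack of layers, which is the case the Functional API is meant for.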



## Layers in Keras

* There are different categories of layers in Keras. The most commonly used ones are "Dense Layers", "Convolution Layers", "Recurrent Layers", etc.

In [1]:
#!pip install tensorflow



In [1]:
from keras.layers import Dense

Using TensorFlow backend.


In [2]:
import keras

In [3]:
dir(keras.layers)

['Activation',
 'ActivityRegularization',
 'Add',
 'AlphaDropout',
 'AtrousConvolution1D',
 'AtrousConvolution2D',
 'Average',
 'AveragePooling1D',
 'AveragePooling2D',
 'AveragePooling3D',
 'AvgPool1D',
 'AvgPool2D',
 'AvgPool3D',
 'BatchNormalization',
 'Bidirectional',
 'Concatenate',
 'Conv1D',
 'Conv2D',
 'Conv2DTranspose',
 'Conv3D',
 'Conv3DTranspose',
 'ConvLSTM2D',
 'ConvLSTM2DCell',
 'ConvRecurrent2D',
 'Convolution1D',
 'Convolution2D',
 'Convolution3D',
 'Cropping1D',
 'Cropping2D',
 'Cropping3D',
 'CuDNNGRU',
 'CuDNNLSTM',
 'Deconvolution2D',
 'Deconvolution3D',
 'Dense',
 'DepthwiseConv2D',
 'Dot',
 'Dropout',
 'ELU',
 'Embedding',
 'Flatten',
 'GRU',
 'GRUCell',
 'GaussianDropout',
 'GaussianNoise',
 'GlobalAveragePooling1D',
 'GlobalAveragePooling2D',
 'GlobalAveragePooling3D',
 'GlobalAvgPool1D',
 'GlobalAvgPool2D',
 'GlobalAvgPool3D',
 'GlobalMaxPool1D',
 'GlobalMaxPool2D',
 'GlobalMaxPool3D',
 'GlobalMaxPooling1D',
 'GlobalMaxPooling2D',
 'GlobalMaxPooling3D',
 'High

In [5]:
layer1 = Dense(units = 32, activation = 'sigmoid')

* Let's break down the above layer that we just built using the Dense class from keras

<br />

* The OUTPUT of the above layer is given by `sigmoid(dot(W, INPUT) + b)`. Before the first round of training, W (the weights) is randomly "initialized"

<br />

* Every input to the Dense layer (also known as a fully connected layer) is connected to every unit in that layer, as shown in the figure below

<br />

<img src ='fc_dense_layers_keras.jpg' />
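
The Dense computation can be reproduced in a few lines of NumPy. This sketch assumes 784 input features and a batch of 4 samples (both made up here), and it stores the weights with shape `(inputs, units)` so that a whole batch is transformed with one matrix product; the random initialization scale is also just an assumption.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

batch = rng.normal(size=(4, 784))             # 4 samples with 784 features each
W = rng.normal(scale=0.05, size=(784, 32))    # randomly initialized weights, one column per unit
b = np.zeros(32)                              # one bias per unit

# Every input feature contributes to every one of the 32 units
output = sigmoid(batch @ W + b)

print(output.shape)   # (4, 32)
```

Each row of `output` holds the 32 activations for one sample, and because of the sigmoid every value lies between 0 and 1.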

<br />

* We will see many more categories and types of __Layers__ in Keras. The __"Dense"__ layer is just one such class

## Stacking the layers together : The Keras Sequential API

<br />

* The Keras Sequential API enables us to build common yet complex neural network architectures flexibly

<br />

* Objects of the Keras Sequential class can have multiple neural network layers stacked on top of one another

<br />

<img src='keras_sequential_api.jpg' />



* You can create a Keras sequential model by passing in a list of layers to the Sequential object

In [6]:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(32, input_shape=(784,), activation = 'sigmoid'),
    Dense(10, activation = 'sigmoid'),
    Dense(9,activation='sigmoid'),
    Dense(1, activation = 'sigmoid')
])


* Passing every layer in a single list soon becomes unwieldy as the network grows, so the `add` method of the Sequential object can be used to add layers to the neural network one at a time

In [7]:
neural_network = Sequential()

neural_network.add(Dense(32, input_dim=784, activation = 'sigmoid'))

neural_network.add(Dense(10, activation = 'sigmoid'))

neural_network.add(Dense(1, activation = 'sigmoid'))

* The summary method on the neural network object gives us basic information about the structure of the network

In [8]:
neural_network.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_5 (Dense)              (None, 32)                25120     
_________________________________________________________________
dense_6 (Dense)              (None, 10)                330       
_________________________________________________________________
dense_7 (Dense)              (None, 1)                 11        
Total params: 25,461
Trainable params: 25,461
Non-trainable params: 0
_________________________________________________________________
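
The `Param #` column can be checked by hand: a Dense layer has one weight per (input, unit) pair plus one bias per unit. The helper `dense_params` below is a hypothetical name used only for this check.

```python
def dense_params(n_inputs, n_units):
    """Weights (n_inputs * n_units) plus one bias per unit."""
    return n_inputs * n_units + n_units

layer_sizes = [784, 32, 10, 1]   # input size followed by the units of each Dense layer
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts, sum(counts))   # [25120, 330, 11] 25461
```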


* That's all! Keras is that simple: initialize an object of the Sequential class, then use the `add` method on that object to add layers to the network sequentially. But wait, what about the loss function? What about the optimizer?

## Defining, Compiling and Running a Neural Network in Keras

<br />

### The process of learning in a Neural Network

<br />

### Loss Score

<br />

* The loss score is the feedback signal that says how far the output of your network is from the ground truth

<br />

<img src='loss_function.jpg' />

<br />

* Two such loss scores that we use quite frequently are :
    
    1) Binary Cross-Entropy: For Two-Class Classification problems
    
    2) Mean Squared Error: For Regression problems
    
<br />

* We have already come across mean squared error before, so let's dive deeper into Binary Cross Entropy

<br />

$$\begin{eqnarray} 
  C = -\frac{1}{n} \sum_x \left[y \ln p + (1-y ) \ln (1-p) \right]
\end{eqnarray}$$

<br />

* In the above equation, p is the output of the network, n is the total number of samples in the training data, the sum is over all training inputs, x, and y is the corresponding desired output

<br />

* Below, we see how the value of the cross-entropy (sometimes referred to as the log loss) changes with the predicted probability. When the true label is 1, the loss grows steeply as the predicted probability approaches 0 and falls towards 0 as it approaches 1. Confidently wrong predictions are therefore penalized heavily, which often helps the network converge faster than it would with a traditional mean squared error

<br />

<img src='img/binary_cross_entropy.png' width='300px'/>
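
The formula above translates almost directly into Python. In this sketch the function name `binary_cross_entropy` and the small clipping constant `eps` (used to keep the logarithm away from 0) are assumptions added for the example; they are not part of the equation.

```python
import math

def binary_cross_entropy(targets, probs, eps=1e-12):
    """Mean binary cross-entropy over paired targets (0 or 1) and predicted probabilities."""
    total = 0.0
    for y, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)   # keep log() away from 0
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(targets)

# True label is 1 in every case; only the predicted probability changes
for p in (0.1, 0.5, 0.9, 0.99):
    print(p, round(binary_cross_entropy([1], [p]), 4))
```

The printed loss shrinks as the predicted probability moves towards the true label, from about 2.3 at `p = 0.1` down to about 0.01 at `p = 0.99`.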

## Optimizers

<br />

* An optimizer is an algorithm that uses the feedback signal from the loss function to actually update the weights, so that the output of the network gets closer to the ground truth. The first optimizer that we use is Stochastic Gradient Descent (SGD); we will gradually come across many more optimizers

<br />

* We can import classes from the optimizers module of Keras and customize a specific optimizer to our liking

<br />

<img src='building_blocks_of_neural_networks.jpg' /> 

In [9]:
from keras.optimizers import SGD

# lr is the learning rate (newer Keras releases name this argument learning_rate)
customized_optimizer = SGD(lr = 0.0001)

* We already know how the learning rate can affect convergence, and the graph below provides a decent intuition. Hence, having the flexibility to change the learning rate is very important

![](img/learning_rate.jpg)
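
The effect of the learning rate can also be seen numerically. The sketch below minimizes the toy function `f(w) = w ** 2` with three different learning rates; the function, the starting point and the three rates are assumptions picked to make the behaviours easy to tell apart.

```python
def gradient_descent(learning_rate, steps=20, w=1.0):
    """Minimize f(w) = w ** 2, whose gradient is 2 * w, and return the final w."""
    for _ in range(steps):
        w -= learning_rate * 2 * w
    return w

small = gradient_descent(0.01)   # creeps towards the minimum at 0
good = gradient_descent(0.3)     # gets very close to 0
large = gradient_descent(1.1)    # overshoots further on every step

print(abs(small), abs(good), abs(large))
```

After 20 steps the small rate has barely moved, the moderate rate is essentially at the minimum, and the large rate has ended up further from the minimum than where it started.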

## Compiling the neural network (loss function + optimizer)

In [10]:
neural_network.compile(loss = 'binary_crossentropy', optimizer = 'sgd', metrics = ['accuracy'])

In [11]:
neural_network.compile(loss = 'binary_crossentropy', optimizer = customized_optimizer, metrics = ['accuracy'])

* As we can see from the compile step above, we need to specify the loss function and the optimization algorithm, and we can also list the metrics that we want to monitor while training the neural network

<br />

* Also, please __note__ that the optimizer argument can either be a string or a customizable object from the optimizers module in Keras

## So are we done? What about learning the weights?

<br />

* Training the network in Keras is also very simple: we call the `.fit()` method and pass in the training data and the training arguments

<br />

* Two important terms for training neural networks are epochs and batch_size

* An __Epoch__ is one pass of the ENTIRE dataset forward and backward through the neural network.

<br />

* Since the entire dataset is usually too large to fit in memory at once, we divide the data into batches and compute the gradient on one batch for each forward and backward pass

<br />

* __Batch size__ is the number of samples that are propagated through the network in one forward and backward pass.
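
A quick arithmetic sketch of how epochs and batch size relate, using made-up numbers: with 1000 samples and a batch size of 32, one epoch consists of 32 batches, the last of which is only partly full.

```python
import math

n_samples = 1000     # size of a made-up training set
batch_size = 32
epochs = 5

batches_per_epoch = math.ceil(n_samples / batch_size)
last_batch_size = n_samples - (batches_per_epoch - 1) * batch_size
total_updates = batches_per_epoch * epochs

print(batches_per_epoch, last_batch_size, total_updates)   # 32 8 160
```

Each batch leads to one weight update, so five epochs over this dataset mean 160 updates in total.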

<br />

<img src='learning_rate_epochs_rel.png' width='400px'/>
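
Putting the pieces together, a call to `.fit()` might look like the sketch below. The data is random and exists only to make the example self-contained, and the network size, the name `toy_network`, the number of epochs and the batch size are arbitrary assumptions.

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Made-up data: 256 samples, 20 features, random binary targets
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 20)).astype('float32')
y = rng.integers(0, 2, size=(256, 1)).astype('float32')

toy_network = Sequential()
toy_network.add(Dense(8, input_shape=(20,), activation='sigmoid'))
toy_network.add(Dense(1, activation='sigmoid'))
toy_network.compile(loss='binary_crossentropy', optimizer='sgd', metrics=['accuracy'])

# 3 epochs with batches of 32 samples: 8 weight updates per epoch
history = toy_network.fit(X, y, epochs=3, batch_size=32, verbose=0)

print(len(history.history['loss']))   # 3, one loss value per epoch
```

The object returned by `.fit()` keeps the loss and the monitored metrics for every epoch, which is handy for plotting training curves afterwards.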

## Neural Network Architectures for Basic ML Tasks

<br />

<img src='nn_for_basic_ml_tasks.jpg' width='800px'/>