"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dims = X.shape[1]\n",
"print(dims, 'dims')\n",
"print(\"Building model...\")\n",
"\n",
"nb_classes = Y.shape[1]\n",
"print(nb_classes, 'classes')\n",
"\n",
"model = Sequential()\n",
"model.add(Dense(nb_classes, input_shape=(dims,)))\n",
"model.add(Activation('softmax'))\n",
"model.compile(optimizer='sgd', loss='categorical_crossentropy')\n",
"model.fit(X, Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Simplicity is pretty impressive, right? :)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now let's understand what we did.\n",
"The core data structure of Keras is a **model**, a way to organize layers. The main type of model is the **Sequential** model, a linear stack of layers."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we did here is stack a fully connected (**Dense**) layer of trainable weights from the input to the output, with an **Activation** layer on top of the weights layer."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Dense"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"```python\n",
"from keras.layers.core import Dense\n",
"\n",
"Dense(output_dim, init='glorot_uniform', activation='linear', \n",
" weights=None, W_regularizer=None, b_regularizer=None,\n",
" activity_regularizer=None, W_constraint=None, \n",
" b_constraint=None, bias=True, input_dim=None)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Activation"
]
},
{
"cell_type": "markdown",
"metadata": {
"collapsed": true
},
"source": [
"```python\n",
"from keras.layers.core import Activation\n",
"\n",
"Activation(activation)\n",
"```"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Optimizer"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you need to, you can further configure your optimizer. A core principle of Keras is to make things reasonably simple, while allowing the user to be fully in control when they need to (the ultimate control being the easy extensibility of the source code).\n",
"Here we used **SGD** (stochastic gradient descent) as an optimization algorithm for our trainable weights. "
]
},
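{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, the string `'sgd'` can be replaced with a configured optimizer object. A minimal sketch against the same-era Keras API used above (the parameters follow that version's `keras.optimizers.SGD` signature):\n",
"\n",
"```python\n",
"from keras.optimizers import SGD\n",
"\n",
"# a hand-tuned SGD: learning rate, weight decay, and Nesterov momentum\n",
"sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)\n",
"model.compile(optimizer=sgd, loss='categorical_crossentropy')\n",
"```"
]
},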
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\"Data Sciencing\" this example a little bit more\n",
"====="
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"What we did here is nice; however, in the real world it is not directly usable because of overfitting.\n",
"Let's try to mitigate it with a held-out validation set."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"##### Overfitting"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In overfitting, a statistical model describes random error or noise instead of the underlying relationship. Overfitting occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. \n",
"\n",
"A model that has been overfit has poor predictive performance, as it overreacts to minor fluctuations in the training data."
]
},
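{
"cell_type": "markdown",
"metadata": {},
"source": [
"Overfitting is easy to demonstrate outside of neural networks, too. A minimal sketch, assuming only `numpy`: a degree-9 polynomial has as many parameters as we have training points, so it can fit the noise exactly while missing held-out points in between.\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"rng = np.random.RandomState(0)\n",
"x_train = np.linspace(0, 1, 10)\n",
"y_train = np.sin(2 * np.pi * x_train) + 0.1 * rng.randn(10)\n",
"\n",
"# 10 parameters for 10 observations: the model can interpolate the noise\n",
"coefs = np.polyfit(x_train, y_train, deg=9)\n",
"\n",
"x_test = np.linspace(0.05, 0.95, 10)  # points between the training points\n",
"y_test = np.sin(2 * np.pi * x_test)\n",
"\n",
"train_err = np.mean((np.polyval(coefs, x_train) - y_train) ** 2)\n",
"test_err = np.mean((np.polyval(coefs, x_test) - y_test) ** 2)\n",
"print(train_err, test_err)  # the test error is much larger than the training error\n",
"```"
]
},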
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To avoid overfitting, we will first split our data into a training set and a test set, and evaluate the model on the test set.\n",
"Next, we will use two of Keras's callbacks: **EarlyStopping** and **ModelCheckpoint**."
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train on 19835 samples, validate on 3501 samples\n",
"Epoch 1/20\n",
"19835/19835 [==============================] - 0s - loss: 0.6391 - val_loss: 0.6680\n",
"Epoch 2/20\n",
"19835/19835 [==============================] - 0s - loss: 0.6386 - val_loss: 0.6689\n",
"Epoch 3/20\n",
"19835/19835 [==============================] - 0s - loss: 0.6384 - val_loss: 0.6695\n",
"Epoch 4/20\n",
"19835/19835 [==============================] - 0s - loss: 0.6381 - val_loss: 0.6702\n",
"Epoch 5/20\n",
"19835/19835 [==============================] - 0s - loss: 0.6378 - val_loss: 0.6709\n",
"Epoch 6/20\n",
"19328/19835 [============================>.] - ETA: 0s - loss: 0.6380Epoch 00005: early stopping\n",
"19835/19835 [==============================] - 0s - loss: 0.6375 - val_loss: 0.6716\n"
]
},
{
"data": {
"text/plain": [
""
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X, X_test, Y, Y_test = train_test_split(X, Y, test_size=0.15, random_state=42)\n",
"\n",
"fBestModel = 'best_model.h5' \n",
"early_stop = EarlyStopping(monitor='val_loss', patience=4, verbose=1) \n",
"best_model = ModelCheckpoint(fBestModel, verbose=0, save_best_only=True)\n",
"model.fit(X, Y, validation_data=(X_test, Y_test), nb_epoch=20, \n",
"          batch_size=128, verbose=True, \n",
"          callbacks=[best_model, early_stop])  # validation_split is ignored when validation_data is given"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Multi-Layer Perceptron and Fully Connected"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"So, how hard can it be to build a multi-layer perceptron with Keras?\n",
"It is basically the same: just add more layers!"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = Sequential()\n",
"model.add(Dense(100, input_shape=(dims,)))\n",
"model.add(Activation('sigmoid'))  # nonlinearity between layers; without it the stack is still one linear map\n",
"model.add(Dense(nb_classes))\n",
"model.add(Activation('softmax'))\n",
"model.compile(optimizer='sgd', loss='categorical_crossentropy')\n",
"model.fit(X, Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Your Turn!"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Hands On - Keras Fully Connected\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Take a couple of minutes and try to optimize the number of layers and the number of parameters per layer to get the best results. "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"model = Sequential()\n",
"model.add(Dense(100, input_shape=(dims,)))\n",
"\n",
"# ...\n",
"# ...\n",
"# Play with it! Add as many layers as you want and try to get better results.\n",
"\n",
"model.add(Dense(nb_classes))\n",
"model.add(Activation('softmax'))\n",
"model.compile(optimizer='sgd', loss='categorical_crossentropy')\n",
"model.fit(X, Y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Building a question answering system, an image classification model, a Neural Turing Machine, a word2vec embedder or any other model is just as fast. The ideas behind deep learning are simple, so why should their implementation be painful?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Theoretical Motivations for depth"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">The depth of neural networks has been studied extensively. It has been shown mathematically [1] and empirically that convolutional neural networks benefit from depth. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[1] - On the Expressive Power of Deep Learning: A Tensor Analysis - Cohen et al., 2015"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### The Universal Approximation Theorem"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"One much-quoted theorem about neural networks states:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
">Universal approximation theorem states[1] that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron), can approximate continuous functions on compact subsets of $\\mathbb{R}^n$, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; however, it does not touch upon the algorithmic learnability of those parameters."
]
},
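{
"cell_type": "markdown",
"metadata": {},
"source": [
"In symbols (following Hornik's formulation [1]): for any continuous $f$ on a compact $K \\subset \\mathbb{R}^n$, any non-constant, bounded, continuous activation $\\sigma$, and any $\\varepsilon > 0$, there exist a width $N$ and parameters $v_i, b_i \\in \\mathbb{R}$, $w_i \\in \\mathbb{R}^n$ such that\n",
"\n",
"$$\\sup_{x \\in K}\\left|f(x) - \\sum_{i=1}^{N} v_i\\,\\sigma(w_i^\\top x + b_i)\\right| < \\varepsilon$$"
]
},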
{
"cell_type": "markdown",
"metadata": {},
"source": [
"[1] - Approximation Capabilities of Multilayer Feedforward Networks - Kurt Hornik, 1991"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.4.3"
}
},
"nbformat": 4,
"nbformat_minor": 0
}