{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Deep Autoencoders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### by Khaled Nasr as a part of a GSoC 2014 project mentored by Theofanis Karaletsos and Sergey Lisitsyn " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This notebook illustrates how to train and evaluate a deep autoencoder using Shogun. We'll look at both regular fully-connected autoencoders and convolutional autoencoders." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A (single layer) [autoencoder](http://deeplearning.net/tutorial/dA.html#autoencoders) is a neural network that has three layers: an input layer, a hidden (encoding) layer, and a decoding layer. The network is trained to reconstruct its inputs, which forces the hidden layer to try to learn good representations of the inputs.\n", "\n", "In order to encourage the hidden layer to learn good input representations, certain variations on the simple autoencoder exist. Shogun currently supports two of them: Denoising Autoencoders [1] and Contractive Autoencoders [2]. In this notebook we'll focus on denoising autoencoders. \n", "\n", "For denoising autoencoders, each time a new training example is introduced to the network, it's randomly corrupted in some mannar, and the target is set to the original example. The autoencoder will try to recover the orignal data from it's noisy version, which is why it's called a denoising autoencoder. This process will force the hidden layer to learn a good representation of the input, one which is not affected by the corruption process.\n", "\n", "A deep autoencoder is an autoencoder with multiple hidden layers. Training such autoencoders directly is usually difficult, however, they can be pre-trained as a stack of single layer autoencoders. That is, we train the first hidden layer to reconstruct the input data, and then train the second hidden layer to reconstruct the states of the first hidden layer, and so on. After pre-training, we can train the entire deep autoencoder to fine-tune all the parameters together. We can also use the autoencoder to initialize a regular neural network and train it in a supervised manner.\n", "\n", "In this notebook we'll apply deep autoencoders to the USPS dataset for handwritten digits. 
We'll start by loading the data and dividing it into a training set and a test set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "%pylab inline\n", "import os\n", "SHOGUN_DATA_DIR=os.getenv('SHOGUN_DATA_DIR', '../../../data')\n", "from scipy.io import loadmat\n", "from shogun import RealFeatures, MulticlassLabels, Math\n", "\n", "# load the dataset\n", "dataset = loadmat(os.path.join(SHOGUN_DATA_DIR, 'multiclass/usps.mat'))\n", "\n", "Xall = dataset['data']\n", "# the usps dataset has the digits labeled from 1 to 10\n", "# we'll subtract 1 to make them in the 0-9 range instead\n", "Yall = np.array(dataset['label'].squeeze(), dtype=np.double)-1\n", "\n", "# the first 4000 examples for training\n", "Xtrain = RealFeatures(Xall[:,0:4000])\n", "Ytrain = MulticlassLabels(Yall[0:4000])\n", "\n", "# the rest for testing\n", "Xtest = RealFeatures(Xall[:,4000:])\n", "Ytest = MulticlassLabels(Yall[4000:])\n", "\n", "# initialize the random number generator with a fixed seed, for repeatability\n", "Math.init_random(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Similar to regular neural networks in Shogun, we create a [deep autoencoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html) using an array of [NeuralLayer](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CNeuralLayer.html)-based classes, which can be created using the utility class [NeuralLayers](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CNeuralLayers.html). However, for deep autoencoders there's a restriction that the layer sizes in the network have to be symmetric, that is, the first layer has to have the same size as the last layer, the second layer has to have the same size as the second-to-last layer, and so on. This restriction is necessary for pre-training to work. More details on that can be found in the following section.\n", "\n", "We'll create a 5-layer deep autoencoder with the following layer sizes: 256->512->128->512->256. We'll use [rectified linear neurons](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CNeuralRectifiedLinearLayer.html) for the hidden layers and [linear neurons](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CNeuralLinearLayer.html) for the output layer." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from shogun import NeuralLayers, DeepAutoencoder\n", "\n", "layers = NeuralLayers()\n", "layers = layers.input(256).rectified_linear(512).rectified_linear(128).rectified_linear(512).linear(256).done()\n", "\n", "ae = DeepAutoencoder(layers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Pre-training" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can pre-train the network. To illustrate exactly what's going to happen, we'll give the layers some labels: L1 for the input layer, L2 for the first hidden layer, and so on up to L5 for the output layer.\n", "\n", "In pre-training, an autoencoder will be formed for each encoding layer (the layers up to and including the middle layer of the network). So here we'll have two autoencoders: L1->L2->L5 and L2->L3->L4. The first autoencoder will be trained on the raw data and used to initialize the weights and biases of layers L2 and L5 in the deep autoencoder. After the first autoencoder is trained, we use it to transform the raw data into the states of L2. These states will then be used to train the second autoencoder, which will be used to initialize the weights and biases of layers L3 and L4 in the deep autoencoder.\n", "\n", "
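To see the greedy layer-wise scheme in one place, the next cell sketches it in plain numpy. This is a simplified illustration (sigmoid encoder, linear decoder, no biases, no noise corruption) and not Shogun's API; pre_train() below does the real work:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# illustrative numpy sketch of greedy layer-wise pre-training\n", "# (simplified: sigmoid encoder, linear decoder, no biases, no noise; not Shogun's API)\n", "import numpy as np\n", "\n", "def train_layer_autoencoder(X, n_hidden, epochs=20, lr=0.01):\n", "    # train a single-layer autoencoder to reconstruct X (features x examples)\n", "    n_in, m = X.shape\n", "    W1 = np.random.randn(n_hidden, n_in) * 0.01  # encoder weights\n", "    W2 = np.random.randn(n_in, n_hidden) * 0.01  # decoder weights\n", "    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))\n", "    for _ in range(epochs):\n", "        H = sigmoid(W1.dot(X))          # hidden states\n", "        E = W2.dot(H) - X               # reconstruction error\n", "        dH = W2.T.dot(E) * H * (1 - H)  # backpropagate through the decoder\n", "        W2 -= lr * E.dot(H.T) / m       # gradient descent steps\n", "        W1 -= lr * dH.dot(X.T) / m\n", "    return W1, sigmoid\n", "\n", "# each autoencoder is trained on the states of the layer below it\n", "states = np.random.rand(256, 100)  # a stand-in for the training data\n", "for n_hidden in [512, 128]:        # the two encoding layers\n", "    W, act = train_layer_autoencoder(states, n_hidden)\n", "    states = act(W.dot(states))    # these states train the next autoencoder" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "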
The operations described above are performed by the [pre_train()](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#acf6896cb166afbba063fd1257cb8bc97) function. Pre-training parameters for each autoencoder can be controlled using the [pt_* public attributes](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#a6389a6f19b8854c64e1b6be5aa0c1fc4) of [DeepAutoencoder](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html). Each of those attributes is an [SGVector](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1SGVector.html) whose length is the number of autoencoders in the deep autoencoder (2 in our case), so it can be used to set a parameter for each autoencoder individually. [SGVector's set_const()](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1SGVector.html#a8bce01a1fc41a734d9b5cf1533fd7a2a) method can also be used to assign the same parameter value to all autoencoders.\n", "\n", "Different noise types can be used to corrupt the inputs in a denoising autoencoder. Shogun currently supports two [noise types](http://www.shogun-toolbox.org/doc/en/latest/namespaceshogun.html#af95cf5d3778127a87c8a67516405d863): dropout noise, where a random portion of the inputs is set to zero at each iteration in training, and Gaussian noise, where the inputs are corrupted with random Gaussian noise. The noise type and strength can be controlled using [pt_noise_type](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#af6e5d2ade5cb270cc50565d590f929ae) and [pt_noise_parameter](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#adbdff6c07fa7dd70aaf547e192365075). Here, we'll use dropout noise." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from shogun import AENT_DROPOUT, NNOM_GRADIENT_DESCENT\n", "\n", "ae.pt_noise_type.set_const(AENT_DROPOUT) # use dropout noise\n", "ae.pt_noise_parameter.set_const(0.5) # each input has a 50% chance of being set to zero\n", "\n", "ae.pt_optimization_method.set_const(NNOM_GRADIENT_DESCENT) # train using gradient descent\n", "ae.pt_gd_learning_rate.set_const(0.01)\n", "ae.pt_gd_mini_batch_size.set_const(128)\n", "\n", "ae.pt_max_num_epochs.set_const(50)\n", "ae.pt_epsilon.set_const(0.0) # disable automatic convergence testing\n", "\n", "# uncomment this line to allow the training progress to be printed on the console\n", "#from shogun import MSG_INFO; ae.io.set_loglevel(MSG_INFO)\n", "\n", "# start pre-training. this might take some time\n", "ae.pre_train(Xtrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fine-tuning" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After pre-training, we can train the autoencoder as a whole to fine-tune all the parameters. Training the whole autoencoder is performed using the [train()](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CAutoencoder.html#ace3eb6cc545affcbfa31d754ffd087dc) function. Training parameters are controlled through the [public attributes](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#pub-attribs), just as for a regular neural network."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "ae.set_noise_type(AENT_DROPOUT) # same noise type we used for pre-training\n", "ae.set_noise_parameter(0.5)\n", "\n", "ae.set_max_num_epochs(50)\n", "ae.set_optimization_method(NNOM_GRADIENT_DESCENT)\n", "ae.set_gd_mini_batch_size(128)\n", "ae.set_gd_learning_rate(0.0001)\n", "ae.set_epsilon(0.0)\n", "\n", "# start fine-tuning. this might take some time\n", "_ = ae.train(Xtrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Evaluation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can evaluate the autoencoder that we trained. We'll start by providing it with corrupted inputs and looking at how it will reconstruct them. The function [reconstruct()](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#ae8c2d565cf2ea809103d0557c57689c7) is used to obtain the reconstructions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# get a 50-example subset of the test set\n", "subset = Xtest[:,0:50].copy()\n", "\n", "# corrupt the first 25 examples with multiplicative noise\n", "subset[:,0:25] *= (random.random((256,25))>0.5)\n", "\n", "# corrupt the other 25 examples with additive noise \n", "subset[:,25:50] += random.random((256,25))\n", "\n", "# obtain the reconstructions\n", "reconstructed_subset = ae.reconstruct(RealFeatures(subset))\n", "\n", "# plot the corrupted data and the reconstructions\n", "figure(figsize=(10,10))\n", "for i in range(50):\n", " ax1=subplot(10,10,i*2+1)\n", " ax1.imshow(subset[:,i].reshape((16,16)), interpolation='nearest', cmap = cm.Greys_r)\n", " ax1.set_xticks([])\n", " ax1.set_yticks([])\n", "\n", " ax2=subplot(10,10,i*2+2)\n", " ax2.imshow(reconstructed_subset[:,i].reshape((16,16)), interpolation='nearest', cmap = cm.Greys_r)\n", " ax2.set_xticks([])\n", " ax2.set_yticks([])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure shows the corrupted examples and their reconstructions. The top half of the figure shows the ones corrupted with multiplicative noise, the bottom half shows the ones corrupted with additive noise. We can see that the autoencoders can provide decent reconstructions despite the heavy noise.\n", "\n", "Next we'll look at the weights that the first hidden layer has learned. To obtain the weights, we can call the [get_layer_parameters()]() function, which will return a vector containing both the weights and the biases of the layer. The biases are stored first in the array followed by the weights matrix in column-major format." 
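, "\n", "As a quick sanity check on that layout, the next cell is a tiny plain-numpy example (illustrative only) showing that a row-major reshape to the transposed shape, followed by a transpose, recovers a matrix that was flattened in column-major order; this is exactly the trick used below:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# illustrative numpy check: recovering a column-major-flattened matrix\n", "import numpy as np\n", "\n", "W = np.arange(6).reshape(2,3)      # a 2x3 matrix (think neurons x inputs)\n", "flat = W.flatten(order='F')        # column-major (Fortran) flattening, as Shogun stores it\n", "W_recovered = flat.reshape(3,2).T  # row-major reshape to the transposed shape, then transpose\n", "assert (W == W_recovered).all()" ] }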
, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# obtain the weights matrix of the first hidden layer\n", "# the first 512 entries are the layer's biases (one per neuron), so we skip them\n", "# the transpose is because numpy stores matrices in row-major format, and Shogun stores\n", "# them in column-major format\n", "w1 = ae.get_layer_parameters(1)[512:].reshape(256,512).T\n", "\n", "# visualize the weights between the first 100 neurons in the hidden layer\n", "# and the neurons in the input layer\n", "figure(figsize=(10,10))\n", "for i in range(100):\n", "    ax1=subplot(10,10,i+1)\n", "    ax1.imshow(w1[i,:].reshape((16,16)), interpolation='nearest', cmap = cm.Greys_r)\n", "    ax1.set_xticks([])\n", "    ax1.set_yticks([])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can use the autoencoder to initialize a supervised neural network. The network will have all the layers of the autoencoder up to (and including) the middle layer. We'll also add a softmax output layer, so the network will look like: L1->L2->L3->Softmax. The network is obtained by calling [convert_to_neural_network()](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CDeepAutoencoder.html#a8c179cd9a503b2fa78b9bfe10ae473e5):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from shogun import NeuralSoftmaxLayer\n", "\n", "nn = ae.convert_to_neural_network(NeuralSoftmaxLayer(10))\n", "\n", "nn.set_max_num_epochs(50)\n", "\n", "nn.set_labels(Ytrain)\n", "_ = nn.train(Xtrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we'll evaluate the accuracy on the test set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from shogun import MulticlassAccuracy\n", "\n", "predictions = nn.apply_multiclass(Xtest)\n", "accuracy = MulticlassAccuracy().evaluate(predictions, Ytest) * 100\n", "\n", "print \"Classification accuracy on the test set =\", accuracy, \"%\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolutional Autoencoders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Convolutional autoencoders [3] are the adaptation of autoencoders to images (or other spatially-structured data). They are built with convolutional layers, where each layer consists of a number of feature maps. Each feature map is produced by convolving a small filter with the layer's inputs, adding a bias, and then applying some non-linear activation function. Additionally, a max-pooling operation can be performed on each feature map by dividing it into small non-overlapping regions and taking the maximum over each region. In this section we'll pre-train a [convolutional network](http://deeplearning.net/tutorial/lenet.html) as a stacked autoencoder and use it for classification.\n", "\n", "
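As a quick illustration of the max-pooling step, the next cell shows 2x2 pooling on a single feature map in plain numpy (an illustrative sketch only; Shogun's convolutional layers perform this internally):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "# illustrative numpy sketch of 2x2 max-pooling on one feature map\n", "# (Shogun's NeuralConvolutionalLayer does this internally)\n", "import numpy as np\n", "\n", "fmap = np.random.rand(16,16)                    # a 16x16 feature map\n", "pooled = fmap.reshape(8,2,8,2).max(axis=(1,3))  # maximum over each 2x2 block\n", "assert pooled.shape == (8,8)                    # pooling halves each dimension" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "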
In Shogun, convolutional autoencoders are constructed and trained just like regular autoencoders, except that we build the autoencoder using [NeuralConvolutionalLayer](http://www.shogun-toolbox.org/doc/en/latest/classshogun_1_1CNeuralConvolutionalLayer.html) objects:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "from shogun import DynamicObjectArray, NeuralInputLayer, NeuralConvolutionalLayer, CMAF_RECTIFIED_LINEAR\n", "\n", "conv_layers = DynamicObjectArray()\n", "# 16x16 single channel images\n", "conv_layers.append_element(NeuralInputLayer(16,16,1))\n", "\n", "# the first encoding layer: 5 feature maps, filters with radius 2 (5x5 filters)\n", "# and max-pooling in a 2x2 region: its output will be 5 8x8 feature maps\n", "conv_layers.append_element(NeuralConvolutionalLayer(CMAF_RECTIFIED_LINEAR, 5, 2, 2, 2, 2))\n", "\n", "# the second encoding layer: 15 feature maps, filters with radius 2 (5x5 filters)\n", "# and max-pooling in a 2x2 region: its output will be 15 4x4 feature maps\n", "conv_layers.append_element(NeuralConvolutionalLayer(CMAF_RECTIFIED_LINEAR, 15, 2, 2, 2, 2))\n", "\n", "# the first decoding layer: same structure as the first encoding layer, without pooling\n", "conv_layers.append_element(NeuralConvolutionalLayer(CMAF_RECTIFIED_LINEAR, 5, 2, 2))\n", "\n", "# the second decoding layer: same structure as the input layer, without pooling\n", "conv_layers.append_element(NeuralConvolutionalLayer(CMAF_RECTIFIED_LINEAR, 1, 2, 2))\n", "\n", "conv_ae = DeepAutoencoder(conv_layers)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we'll pre-train the autoencoder:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "conv_ae.pt_noise_type.set_const(AENT_DROPOUT) # use dropout noise\n", "conv_ae.pt_noise_parameter.set_const(0.3) # each input has a 30% chance of being set to zero\n", "\n", "conv_ae.pt_optimization_method.set_const(NNOM_GRADIENT_DESCENT) # train using gradient descent\n", "conv_ae.pt_gd_learning_rate.set_const(0.002)\n", "conv_ae.pt_gd_mini_batch_size.set_const(100)\n", "\n", "conv_ae.pt_max_num_epochs[0] = 30 # max number of epochs for pre-training the first encoding layer\n", "conv_ae.pt_max_num_epochs[1] = 10 # max number of epochs for pre-training the second encoding layer\n", "conv_ae.pt_epsilon.set_const(0.0) # disable automatic convergence testing\n", "\n", "# start pre-training. this might take some time\n", "conv_ae.pre_train(Xtrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And then convert the autoencoder to a regular neural network for classification:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "conv_nn = conv_ae.convert_to_neural_network(NeuralSoftmaxLayer(10))\n", "\n", "# set the training parameters\n", "conv_nn.set_epsilon(0.0)\n", "conv_nn.set_max_num_epochs(50)\n", "conv_nn.set_labels(Ytrain)\n", "\n", "# start training. 
this might take some time\n", "_ = conv_nn.train(Xtrain)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And evaluate it on the test set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": false }, "outputs": [], "source": [ "predictions = conv_nn.apply_multiclass(Xtest)\n", "accuracy = MulticlassAccuracy().evaluate(predictions, Ytest) * 100\n", "\n", "print \"Classification accuracy on the test set =\", accuracy, \"%\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## References" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "- [1] [Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, Vincent, 2010](http://jmlr.org/papers/volume11/vincent10a/vincent10a.pdf)\n", "- [2] [Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, Rifai, 2011](http://machinelearning.wustl.edu/mlpapers/paper_files/ICML2011Rifai_455.pdf)\n", "- [3] [Stacked Convolutional Auto-Encoders for Hierarchical Feature Extraction, J. Masci, 2011](http://www.idsia.ch/~ciresan/data/icann2011.pdf)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 2", "language": "python", "name": "python2" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 2 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython2", "version": "2.7.6" } }, "nbformat": 4, "nbformat_minor": 0 }