{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# TensorFlow Assignment: Multilayer Perceptron (MLP) Optimizer Sandbox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**[Duke Community Standard](http://integrity.duke.edu/standard.html): By typing your name below, you are certifying that you have adhered to the Duke Community Standard in completing this assignment.**\n", "\n", "Name: [YOUR NAME HERE]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Multilayer Perceptron (MLP)\n", "\n", "### Imports and helper functions\n", "\n", "Let's play around with some optimizers. First some imports and helper functions:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "import tensorflow as tf\n", "from tensorflow.examples.tutorials.mnist import input_data\n", "\n", "# Import data\n", "mnist = input_data.read_data_sets(\"MNIST_data/\", one_hot=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Helper functions for creating weight variables\n", "def weight_variable(shape):\n", " \"\"\"weight_variable generates a weight variable of a given shape.\"\"\"\n", " initial = tf.truncated_normal(shape, stddev=0.1)\n", " return tf.Variable(initial)\n", "\n", "def bias_variable(shape):\n", " \"\"\"bias_variable generates a bias variable of a given shape.\"\"\"\n", " initial = tf.constant(0.1, shape=shape)\n", " return tf.Variable(initial)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Models\n", "\n", "And here's the forward pass of the computation graph definition of the completed TensorFlow MLP assignment:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# Model Inputs\n", "x = tf.placeholder(tf.float32, [None, 784])\n", "y_ = tf.placeholder(tf.float32, [None, 10])\n", "\n", "# Define the graph\n", "# First fully connected layer\n", "W_fc1 = weight_variable([784, 500])\n", "b_fc1 = bias_variable([500])\n", "# h_fc1 = tf.nn.sigmoid(tf.matmul(x, W_fc1) + b_fc1)\n", "h_fc1 = tf.nn.relu(tf.matmul(x, W_fc1) + b_fc1)\n", "\n", "# Second fully connected layer\n", "W_fc2 = weight_variable([500, 10])\n", "b_fc2 = bias_variable([10])\n", "y_mlp = tf.matmul(h_fc1, W_fc2) + b_fc2\n", "\n", "# Loss \n", "cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_mlp))\n", "\n", "# Evaluation\n", "correct_prediction = tf.equal(tf.argmax(y_mlp, 1), tf.argmax(y_, 1))\n", "accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optimizers\n", "\n", "Instead of the optimizer being given though, let's try out a few. Here we have optimizers implementing algorithms for [Stochastic Gradient Descent](https://www.tensorflow.org/api_docs/python/tf/train/GradientDescentOptimizer) (SGD), [Stochastic Gradient Descent with Momentum](https://www.tensorflow.org/api_docs/python/tf/train/MomentumOptimizer) (momentum), and [Adaptive Moments](https://www.tensorflow.org/api_docs/python/tf/train/AdamOptimizer) (ADAM). Try out different parameter settings (e.g. learning rate) for each of them." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Optimizers: Try out a few different parameters for SGD and SGD momentum\n", "train_step_SGD = tf.train.GradientDescentOptimizer(learning_rate=**PICK_ONE**).minimize(cross_entropy)\n", "train_step_momentum = tf.train.MomentumOptimizer(learning_rate=**PICK_ONE**, momentum=**mass_x_velocity**).minimize(cross_entropy)\n", "train_step_ADAM = tf.train.AdamOptimizer().minimize(cross_entropy)\n", "\n", "# Op for initializing all variables\n", "initialize_all = tf.global_variables_initializer()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Training\n", "\n", "Because we'll be repeating training a few times, let's move our training regimen into function. Note that we pass which optimization algorithm we're running as an argument. In addition to printing out the validation accuracy and final test accuracy, we'll also return the lists of accuracies at each validation step and the training losses at each iteration." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [ "def train_MLP(train_step_optimizer, iterations=4000):\n", " with tf.Session() as sess:\n", " # Initialize (or reset) all variables\n", " sess.run(initialize_all)\n", " \n", " # Initialize arrays to track losses and validation accuracies\n", " valid_accs = [] \n", " losses = []\n", " \n", " for i in range(iterations):\n", " # Validate every 250th batch\n", " if i % 250 == 0:\n", " validation_accuracy = 0\n", " for v in range(10):\n", " batch = mnist.validation.next_batch(50)\n", " validation_accuracy += (1/10) * accuracy.eval(feed_dict={x: batch[0], y_: batch[1]})\n", " print('step %d, validation accuracy %g' % (i, validation_accuracy))\n", " valid_accs.append(validation_accuracy)\n", " \n", " # Train \n", " batch = mnist.train.next_batch(50)\n", " loss, _ = sess.run([cross_entropy, train_step_optimizer], feed_dict={x: batch[0], y_: batch[1]})\n", " losses.append(loss)\n", " \n", " print('test accuracy %g' % accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))\n", " \n", " return valid_accs, losses" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, let's train the MLP using all three optimizers and compare the results:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(\"SGD:\")\n", "valid_accs_SGD, losses_SGD = train_MLP(train_step_SGD)\n", "print(\"Momentum:\")\n", "valid_accs_momentum, losses_momentum = train_MLP(train_step_momentum)\n", "print(\"ADAM:\")\n", "valid_accs_ADAM, losses_ADAM = train_MLP(train_step_ADAM)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Visualization\n", "\n", "Plotting things:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, ax = plt.subplots(1, 2)\n", "fig.tight_layout()\n", "\n", "ax[0].plot(valid_accs_SGD)\n", "ax[0].plot(valid_accs_momentum)\n", "ax[0].plot(valid_accs_ADAM)\n", "\n", "ax[0].set_ylabel('Validation Accuracy')\n", "ax[0].legend(['SGD', 'Momentum', 'ADAM'], loc='lower right')\n", "\n", "ax[1].plot(losses_SGD)\n", "ax[1].plot(losses_momentum)\n", "ax[1].plot(losses_ADAM)\n", "\n", "ax[1].set_ylabel('Cross Entropy')\n", "ax[1].legend(['SGD', 'Momentum', 'ADAM'], loc='upper right')\n", "# ax[1].set_ylim([0,1.5]) # <- Use this to change y-axis limits" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Short Answer\n", "\n", "How do SGD, SGD 
{ "cell_type": "markdown", "metadata": {}, "source": [ "***\n", "\n", "[Your answer here]\n", "\n", "***" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convolutional Neural Network (CNN): *Optional*\n", "\n", "Adapt the MLP code above to train a CNN instead (*Hint: you can adapt the code from 01D_MLP_CNN_Assignment_Solutions.ipynb for the CNN just like I did for the MLP*), and again compare the optimizers. The CNN's larger and more complex parameter space means that the differences between optimizers should be much more significant." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.1" } }, "nbformat": 4, "nbformat_minor": 2 }