{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Brief History of Perceptrons" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Frank Rosenblatt, godfather of the perceptron, popularized it as a device rather than an algorithm. The perceptron first entered the world as hardware.1 Rosenblatt, a psychologist who studied and later lectured at Cornell University, received funding from the U.S. Office of Naval Research to build a machine that could learn. His machine, the Mark I perceptron, looked like this.\n", "\n", "![image](./images/perceptron.png)\n", "\n", "A perceptron is a linear classifier; that is, it is an algorithm that classifies input by separating two categories with a straight line.Rosenblatt built a single-layer perceptron. That is, his hardware-algorithm did not include multiple layers, which allow neural networks to model a feature hierarchy. It was, therefore, a shallow neural network, which prevented his perceptron from performing non-linear classification, such as the XOR function (an XOR operator trigger when input exhibits either one trait or another, but not both; it stands for “exclusive OR”), as Minsky and Papert showed in their book.\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Multilayer Perceptron Layer" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Subsequent work with multilayer perceptrons has shown that they are capable of approximating an XOR operator as well as many other non-linear functions.\n", "\n", "A multilayer perceptron (MLP) is a deep, artificial neural network. It is composed of more than one perceptron. They are composed of an input layer to receive the signal, an output layer that makes a decision or prediction about the input, and in between those two, an arbitrary number of hidden layers that are the true computational engine of the MLP. 
Even an MLP with a single hidden layer can approximate any continuous function, a result known as the universal approximation theorem.\n", "\n", "Multilayer perceptrons are often applied to supervised learning problems: they train on a set of input-output pairs and learn to model the correlation (or dependencies) between those inputs and outputs. Training involves adjusting the parameters, or the weights and biases, of the model in order to minimize error. Backpropagation is used to make those weight and bias adjustments relative to the error, and the error itself can be measured in a variety of ways, including by root mean squared error (RMSE); the code below uses cross-entropy." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import tensorflow as tf" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Parameters\n", "learning_rate = 0.001\n", "training_steps = 3000\n", "batch_size = 100\n", "display_step = 300" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()\n", "# Convert to float32.\n", "x_train, x_test = np.array(x_train, np.float32), np.array(x_test, np.float32)\n", "# Flatten images to 1-D vectors of 784 features (28*28).\n", "x_train, x_test = x_train.reshape([-1, 784]), x_test.reshape([-1, 784])\n", "# Normalize image values from [0, 255] to [0, 1].\n", "x_train, x_test = x_train / 255., x_test / 255.\n", "\n", "# Use tf.data API to shuffle and batch data.\n", "train_data = tf.data.Dataset.from_tensor_slices((x_train, y_train))\n", "train_data = train_data.repeat().shuffle(5000).batch(batch_size).prefetch(1)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "# Network Parameters\n", "n_hidden_1 = 256 # 1st layer number of neurons\n", "n_hidden_2 = 256 # 2nd layer number of neurons\n", "n_input = 784 # MNIST data input (img shape: 28*28)\n", "n_classes = 10 # MNIST total 
classes (0-9 digits)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Store layers weight & bias\n", "weights = {\n", " 'h1': tf.Variable(tf.random.normal([n_input, n_hidden_1])),\n", " 'h2': tf.Variable(tf.random.normal([n_hidden_1, n_hidden_2])),\n", " 'out': tf.Variable(tf.random.normal([n_hidden_2, n_classes]))\n", "}\n", "biases = {\n", " 'b1': tf.Variable(tf.random.normal([n_hidden_1])),\n", " 'b2': tf.Variable(tf.random.normal([n_hidden_2])),\n", " 'out': tf.Variable(tf.random.normal([n_classes]))\n", "}" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "# Create model\n", "def multilayer_perceptron(x):\n", " # Hidden fully connected layer with 256 neurons\n", " layer_1 = tf.add(tf.matmul(x, weights['h1']), biases['b1'])\n", " layer_1 = tf.nn.sigmoid(layer_1)\n", " # Hidden fully connected layer with 256 neurons\n", " layer_2 = tf.add(tf.matmul(layer_1, weights['h2']), biases['b2'])\n", " layer_2 = tf.nn.sigmoid(layer_2)\n", " # Output fully connected layer with a neuron for each class\n", " output = tf.matmul(layer_2, weights['out']) + biases['out']\n", " return tf.nn.softmax(output)\n", "\n", "# Cross-Entropy loss function.\n", "def cross_entropy(y_pred, y_true):\n", " # Encode label to a one hot vector.\n", " y_true = tf.one_hot(y_true, depth=10)\n", " # Clip prediction values to avoid log(0) error.\n", " y_pred = tf.clip_by_value(y_pred, 1e-9, 1.)\n", " # Compute cross-entropy.\n", " return tf.reduce_mean(-tf.reduce_sum(y_true * tf.math.log(y_pred)))\n", "\n", "# Accuracy metric.\n", "def accuracy(y_pred, y_true):\n", " # Predicted class is the index of highest score in prediction vector (i.e. 
argmax).\n", " correct_prediction = tf.equal(tf.argmax(y_pred, 1), tf.cast(y_true, tf.int64))\n", " return tf.reduce_mean(tf.cast(correct_prediction, tf.float32), axis=-1)\n", "\n", "# Stochastic gradient descent optimizer.\n", "optimizer = tf.optimizers.SGD(learning_rate)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# Optimization process. \n", "def train_step(x, y):\n", " # Wrap computation inside a GradientTape for automatic differentiation.\n", " with tf.GradientTape() as tape:\n", " pred = multilayer_perceptron(x)\n", " loss = cross_entropy(pred, y)\n", " \n", " # Variables to update, i.e. trainable variables.\n", " trainable_variables = list(weights.values()) + list(biases.values())\n", "\n", " # Compute gradients.\n", " gradients = tape.gradient(loss, trainable_variables)\n", " \n", " # Update W and b following gradients.\n", " optimizer.apply_gradients(zip(gradients, trainable_variables))" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "step: 300, loss: 77.175262, accuracy: 0.770000\n", "step: 600, loss: 69.724609, accuracy: 0.780000\n", "step: 900, loss: 67.467102, accuracy: 0.810000\n", "step: 1200, loss: 34.210907, accuracy: 0.890000\n", "step: 1500, loss: 48.025967, accuracy: 0.860000\n", "step: 1800, loss: 51.137367, accuracy: 0.880000\n", "step: 2100, loss: 62.240646, accuracy: 0.800000\n", "step: 2400, loss: 37.719868, accuracy: 0.880000\n", "step: 2700, loss: 30.107677, accuracy: 0.870000\n", "step: 3000, loss: 28.086235, accuracy: 0.900000\n" ] } ], "source": [ "# Run training for the given number of steps.\n", "for step, (batch_x, batch_y) in enumerate(train_data.take(training_steps), 1):\n", " # Run the optimization to update W and b values.\n", " train_step(batch_x, batch_y)\n", " \n", " if (step+1) % display_step == 0:\n", " pred = multilayer_perceptron(batch_x)\n", " loss = cross_entropy(pred, batch_y)\n", " 
acc = accuracy(pred, batch_y)\n", " print(\"step: %i, loss: %f, accuracy: %f\" % (step+1, loss, acc))" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.5.2" } }, "nbformat": 4, "nbformat_minor": 2 }