{ "cells": [ { "cell_type": "markdown", "metadata": { "toc": true }, "source": [ "
\n",
"\n",
"Each image is 28 pixels by 28 pixels, which is essentially a $28 \\times 28$ array of numbers. To use it in a context of a machine learning problem, we can flatten this array into a vector of $28 \\times 28 = 784$, this will be the number of features for each image. It doesn't matter how we flatten the array, as long as we're consistent between images. Note that, flattening the data throws away information about the 2D structure of the image. Isn't that bad? Well, the best computer vision methods do exploit this structure. But the simple method we will be using here, a softmax regression (defined below), won't.\n",
"\n",
"The dataset also includes labels for each image, telling us the each image's label. For example, the labels for the above images are 5, 0, 4, and 1. Here we're going to train a softmax model to look at images and predict what digits they are. The possible label values in the MNIST dataset are numbers between 0 and 9, hence this will be a 10-class classification problem."
]
},
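{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the flattening step, the snippet below is a minimal sketch using plain NumPy, with a made-up `img` array standing in for a single MNIST image; it reshapes a $28 \\times 28$ array into a length-784 vector:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# a stand-in for one 28 x 28 grayscale image (pixel values 0 - 255)\n",
"img = np.random.randint(0, 256, size=(28, 28))\n",
"\n",
"# flatten the 2D array into a 1D feature vector of length 784;\n",
"# reshape(-1) uses row-major order, and the same convention must\n",
"# be used for every image so the features stay consistent\n",
"flat = img.reshape(-1)\n",
"print(flat.shape)  # (784,)\n",
"```"
]
},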
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"60000 train samples\n",
"10000 test samples\n"
]
}
],
"source": [
"n_class = 10\n",
"n_features = 784 # mnist is a 28 * 28 image\n",
"\n",
"# load the dataset and some preprocessing step that can be skipped\n",
"(X_train, y_train), (X_test, y_test) = mnist.load_data()\n",
"X_train = X_train.reshape(60000, n_features)\n",
"X_test = X_test.reshape(10000, n_features)\n",
"X_train = X_train.astype('float32')\n",
"X_test = X_test.astype('float32')\n",
"\n",
"# images takes values between 0 - 255, we can normalize it\n",
"# by dividing every number by 255\n",
"X_train /= 255\n",
"X_test /= 255\n",
"\n",
"print(X_train.shape[0], 'train samples')\n",
"print(X_test.shape[0], 'test samples')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# convert class vectors to binary class matrices (one-hot encoding)\n",
"# note: you HAVE to to this step\n",
"Y_train = np_utils.to_categorical(y_train, n_class)\n",
"Y_test = np_utils.to_categorical(y_test , n_class)"
]
},
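{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the one-hot encoding concrete, here is a small sketch (plain NumPy rather than `np_utils`, purely for illustration) of the vector that `to_categorical` produces for a label of 5:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"label = 5\n",
"n_class = 10\n",
"\n",
"# one-hot vector: all zeros except a 1 at the index of the true class\n",
"one_hot = np.zeros(n_class)\n",
"one_hot[label] = 1\n",
"print(one_hot)  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]\n",
"```"
]
},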
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the following code chunk, we define the overall computational graph/structure for the softmax classifier using the cross entropy cost function as the objective. Recall that the formula for this function can be denoted as:\n",
"\n",
"$$L = -\\sum_i y'_i \\log(y_i)$$\n",
"\n",
"Where y is our predicted probability distribution, and y′ is the true distribution."
]
},
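{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sanity check on the formula, the sketch below (plain NumPy, with made-up numbers) computes the cross entropy for a single observation whose true class is the third one:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# hypothetical predicted distribution over 3 classes\n",
"y_pred = np.array([0.1, 0.2, 0.7])\n",
"# one-hot true distribution: the true class is the third one\n",
"y_true = np.array([0.0, 0.0, 1.0])\n",
"\n",
"# L = -sum_i y'_i * log(y_i)\n",
"loss = -np.sum(y_true * np.log(y_pred))\n",
"print(loss)  # ~0.357, i.e. -log(0.7)\n",
"```"
]
},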
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# define some global variables\n",
"learning_rate = 0.1 \n",
"n_iterations = 400\n",
"\n",
"# define the input and output \n",
"# here None means that a dimension can be of any length,\n",
"# which is what we want, since the number of observations\n",
"# we have can vary;\n",
"# note that the shape argument to placeholder is optional, \n",
"# but it allows TensorFlow to automatically catch bugs stemming \n",
"# from inconsistent tensor shapes\n",
"X = tf.placeholder(tf.float32, [None, n_features])\n",
"y = tf.placeholder(tf.float32, [None, n_class])\n",
"\n",
"# initialize both W and b as tensors full of zeros. \n",
"# these are parameters that the model is later going to learn,\n",
"# Notice that W has a shape of [784, 10] because we want to multiply \n",
"# the 784-dimensional image vectors by it to produce 10-dimensional \n",
"# vectors of evidence for the difference classes. b has a shape of [10] \n",
"# so we can add it to the output.\n",
"W = tf.Variable(tf.zeros([n_features, n_class]))\n",
"b = tf.Variable(tf.zeros([n_class]))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"```python\n",
"# to define the softmax classifier and cross entropy cost\n",
"# we can do the following\n",
"\n",
"# matrix multiplication using the .matmul command\n",
"# and add the softmax output\n",
"output = tf.nn.softmax(tf.matmul(X, W) + b)\n",
"\n",
"# cost function: cross entropy, the reduce mean is simply the average of the\n",
"# cost function across all observations\n",
"cross_entropy = tf.reduce_mean(-tf.reduce_sum(y * tf.log(output), axis = 1))\n",
"\n",
"```"
]
},
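{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an aside, computing the softmax and the log separately like this can be numerically unstable when a predicted probability gets very close to 0. TensorFlow also offers a fused op that works directly on the raw logits; the snippet below is an alternative sketch of the same cost, not the exact code used in this notebook:\n",
"\n",
"```python\n",
"# logits: the raw scores before the softmax is applied\n",
"logits = tf.matmul(X, W) + b\n",
"\n",
"# fused, numerically more stable softmax + cross entropy\n",
"cross_entropy = tf.reduce_mean(\n",
"    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))\n",
"```"
]
},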
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"WARNING:tensorflow:From