{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#
cs231n 2019. [A1, part 3](http://cs231n.github.io/assignments2018/assignment1/). Softmax exercise\n", "\n", " \n", "####
**Solution by [Yury Kashnitsky](https://www.kaggle.com/kashnitsky) (@yorko)**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*\n", "\n", "This exercise is analogous to the SVM exercise. You will:\n", "\n", "- implement a fully-vectorized **loss function** for the Softmax classifier\n", "- implement the fully-vectorized expression for its **analytic gradient**\n", "- **check your implementation** with a numerical gradient\n", "- use a validation set to **tune the learning rate and regularization** strength\n", "- **optimize** the loss function with **SGD**\n", "- **visualize** the final learned weights\n" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import random\n", "import numpy as np\n", "from cs231n.data_utils import load_CIFAR10\n", "from tqdm import tqdm_notebook\n", "import matplotlib.pyplot as plt\n", "\n", "%matplotlib inline\n", "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", "plt.rcParams['image.interpolation'] = 'nearest'\n", "plt.rcParams['image.cmap'] = 'gray'\n", "\n", "# for auto-reloading external modules\n", "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Train data shape: (49000, 3073)\n", "Train labels shape: (49000,)\n", "Validation data shape: (1000, 3073)\n", "Validation labels shape: (1000,)\n", "Test data shape: (1000, 3073)\n", "Test labels shape: (1000,)\n", "dev data shape: (500, 3073)\n", "dev labels shape: (500,)\n" ] } ], "source": [ "def 
get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000, num_dev=500):\n", " \"\"\"\n", " Load the CIFAR-10 dataset from disk and perform preprocessing to prepare\n", " it for the linear classifier. These are the same steps as we used for the\n", " SVM, but condensed to a single function. \n", " \"\"\"\n", " # Load the raw CIFAR-10 data\n", " cifar10_dir = '/home/yorko/data/cifar-10-batches-py'\n", " X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)\n", " \n", " # subsample the data\n", " mask = list(range(num_training, num_training + num_validation))\n", " X_val = X_train[mask]\n", " y_val = y_train[mask]\n", " mask = list(range(num_training))\n", " X_train = X_train[mask]\n", " y_train = y_train[mask]\n", " mask = list(range(num_test))\n", " X_test = X_test[mask]\n", " y_test = y_test[mask]\n", " mask = np.random.choice(num_training, num_dev, replace=False)\n", " X_dev = X_train[mask]\n", " y_dev = y_train[mask]\n", " \n", " # Preprocessing: reshape the image data into rows\n", " X_train = np.reshape(X_train, (X_train.shape[0], -1))\n", " X_val = np.reshape(X_val, (X_val.shape[0], -1))\n", " X_test = np.reshape(X_test, (X_test.shape[0], -1))\n", " X_dev = np.reshape(X_dev, (X_dev.shape[0], -1))\n", " \n", " # Normalize the data: subtract the mean image\n", " mean_image = np.mean(X_train, axis = 0)\n", " X_train -= mean_image\n", " X_val -= mean_image\n", " X_test -= mean_image\n", " X_dev -= mean_image\n", " \n", " # add bias dimension and transform into columns\n", " X_train = np.hstack([X_train, np.ones((X_train.shape[0], 1))])\n", " X_val = np.hstack([X_val, np.ones((X_val.shape[0], 1))])\n", " X_test = np.hstack([X_test, np.ones((X_test.shape[0], 1))])\n", " X_dev = np.hstack([X_dev, np.ones((X_dev.shape[0], 1))])\n", " \n", " return X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev\n", "\n", "\n", "# Invoke the above function to get our data.\n", "X_train, y_train, X_val, y_val, X_test, y_test, X_dev, y_dev = 
get_CIFAR10_data()\n", "print('Train data shape: ', X_train.shape)\n", "print('Train labels shape: ', y_train.shape)\n", "print('Validation data shape: ', X_val.shape)\n", "print('Validation labels shape: ', y_val.shape)\n", "print('Test data shape: ', X_test.shape)\n", "print('Test labels shape: ', y_test.shape)\n", "print('dev data shape: ', X_dev.shape)\n", "print('dev labels shape: ', y_dev.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Softmax Classifier\n", "\n", "Your code for this section will all be written inside **cs231n/classifiers/softmax.py**. \n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "loss: 2.364941\n", "sanity check: 2.302585\n" ] } ], "source": [ "# First implement the naive softmax loss function with nested loops.\n", "# Open the file cs231n/classifiers/softmax.py and implement the\n", "# softmax_loss_naive function.\n", "\n", "from cs231n.classifiers.softmax import softmax_loss_naive\n", "import time\n", "\n", "# Generate a random softmax weight matrix and use it to compute the loss.\n", "W = np.random.randn(3073, 10) * 0.0001\n", "loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)\n", "\n", "# As a rough sanity check, our loss should be something close to -log(0.1).\n", "print('loss: %f' % loss)\n", "print('sanity check: %f' % (-np.log(0.1)))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Inline Question #1\n", "Why do we expect our loss to be close to -log(0.1)? 
Explain briefly.\n", "\n", "$\\color{blue}{\\textit Your Answer:}$ *The weights are initialized to small random values, so all ten class scores are close to zero and roughly equal. The softmax probability of the correct class is then $\\approx 1/10$, and the per-example loss is $\\approx -\\log(0.1) \\approx 2.3$.*\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "numerical: 4.414098 analytic: 4.414098, relative error: 8.407148e-09\n", "numerical: 0.115917 analytic: 0.115917, relative error: 2.561369e-07\n", "numerical: -1.337843 analytic: -1.337843, relative error: 2.223465e-10\n", "numerical: 1.663433 analytic: 1.663433, relative error: 2.801221e-09\n", "numerical: 0.345482 analytic: 0.345482, relative error: 2.811123e-08\n", "numerical: -2.875655 analytic: -2.875655, relative error: 1.403804e-08\n", "numerical: 3.905061 analytic: 3.905061, relative error: 7.269116e-09\n", "numerical: -1.111623 analytic: -1.111623, relative error: 1.084694e-08\n", "numerical: 1.578582 analytic: 1.578581, relative error: 1.587520e-08\n", "numerical: -2.557901 analytic: -2.557901, relative error: 4.136910e-09\n", "numerical: -0.080359 analytic: -0.080359, relative error: 3.252329e-07\n", "numerical: -1.137684 analytic: -1.137684, relative error: 2.093039e-08\n", "numerical: 0.979711 analytic: 0.979711, relative error: 9.203386e-08\n", "numerical: 1.420139 analytic: 1.420139, relative error: 1.984016e-08\n", "numerical: 3.553916 analytic: 3.553915, relative error: 1.270606e-08\n", "numerical: -3.555908 analytic: -3.555908, relative error: 7.909795e-09\n", "numerical: 0.037696 analytic: 0.037696, relative error: 2.082735e-07\n", "numerical: -2.779842 analytic: -2.779842, relative error: 3.970456e-09\n", "numerical: -1.119180 analytic: -1.119180, relative error: 2.000238e-09\n", "numerical: -1.375382 analytic: -1.375382, relative error: 5.295108e-10\n" ] } ], "source": [ "# Complete the implementation of softmax_loss_naive and implement a (naive)\n", "# version of the gradient that uses nested loops.\n", 
"loss, grad = softmax_loss_naive(W, X_dev, y_dev, 0.0)\n", "\n", "# As we did for the SVM, use numeric gradient checking as a debugging tool.\n", "# The numeric gradient should be close to the analytic gradient.\n", "from cs231n.gradient_check import grad_check_sparse\n", "f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 0.0)[0]\n", "grad_numerical = grad_check_sparse(f, W, grad, 10)\n", "\n", "# similar to SVM case, do another gradient check with regularization\n", "loss, grad = softmax_loss_naive(W, X_dev, y_dev, 5e1)\n", "f = lambda w: softmax_loss_naive(w, X_dev, y_dev, 5e1)[0]\n", "grad_numerical = grad_check_sparse(f, W, grad, 10)" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "naive loss: 2.364941e+00 computed in 0.132490s\n", "vectorized loss: 2.364941e+00 computed in 0.002477s\n", "Loss difference: 0.000000\n", "Gradient difference: 0.000000\n" ] } ], "source": [ "# Now that we have a naive implementation of the softmax loss function and its gradient,\n", "# implement a vectorized version in softmax_loss_vectorized.\n", "# The two versions should compute the same results, but the vectorized version should be\n", "# much faster.\n", "tic = time.time()\n", "loss_naive, grad_naive = softmax_loss_naive(W, X_dev, \n", " y_dev, 0.000005)\n", "toc = time.time()\n", "print('naive loss: %e computed in %fs' % (loss_naive, toc - tic))\n", "\n", "from cs231n.classifiers.softmax import softmax_loss_vectorized\n", "tic = time.time()\n", "loss_vectorized, grad_vectorized = softmax_loss_vectorized(W, X_dev, y_dev, 0.000005)\n", "toc = time.time()\n", "print('vectorized loss: %e computed in %fs' % (loss_vectorized, toc - tic))\n", "\n", "# As we did for the SVM, we use the Frobenius norm to compare the two versions\n", "# of the gradient.\n", "grad_difference = np.linalg.norm(grad_naive - grad_vectorized, ord='fro')\n", "print('Loss difference: %f' % np.abs(loss_naive - 
loss_vectorized))\n", "print('Gradient difference: %f' % grad_difference)" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "114d12a38f7d46c9a7ed168edb8bb2c6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=3), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "66651a622483449fbb8a526c65619f4c", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=3), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "lr 3.000000e-07 reg 5.000000e+03 train accuracy: 0.372755 val accuracy: 0.384000\n", "lr 3.000000e-07 reg 2.750000e+04 train accuracy: 0.321755 val accuracy: 0.331000\n", "lr 3.000000e-07 reg 5.000000e+04 train accuracy: 0.300612 val accuracy: 0.320000\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "bfe70fa2c14443e2ab100bf8f900439e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=3), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "lr 4.000000e-07 reg 5.000000e+03 train accuracy: 0.375265 val accuracy: 0.382000\n", "lr 4.000000e-07 reg 2.750000e+04 train accuracy: 0.327633 val accuracy: 0.335000\n", "lr 4.000000e-07 reg 5.000000e+04 train accuracy: 0.298449 val accuracy: 0.313000\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0b7289454cb141d88b481d414c2097eb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=3), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "lr 5.000000e-07 reg 
5.000000e+03 train accuracy: 0.375714 val accuracy: 0.386000\n", "lr 5.000000e-07 reg 2.750000e+04 train accuracy: 0.318204 val accuracy: 0.335000\n", "lr 5.000000e-07 reg 5.000000e+04 train accuracy: 0.293245 val accuracy: 0.305000\n", "\n", "best validation accuracy achieved during cross-validation: 0.386000\n", "CPU times: user 19min 26s, sys: 21.3 s, total: 19min 48s\n", "Wall time: 4min 16s\n" ] } ], "source": [ "%%time\n", "# Use the validation set to tune hyperparameters (regularization strength and\n", "# learning rate). You should experiment with different ranges for the learning\n", "# rates and regularization strengths; if you are careful you should be able to\n", "# get a classification accuracy of over 0.35 on the validation set.\n", "from cs231n.classifiers import Softmax\n", "results = {}\n", "best_val = -1\n", "best_softmax_clf = None\n", "learning_rates = np.linspace(3e-7, 5e-7, 3)\n", "regularization_strengths = np.linspace(5e3, 5e4, 3)\n", "\n", "################################################################################\n", "# TODO: #\n", "# Use the validation set to set the learning rate and regularization strength. #\n", "# This should be identical to the validation that you did for the SVM; save #\n", "# the best trained softmax classifier in best_softmax_clf. 
#\n", "################################################################################\n", "for lr in tqdm_notebook(learning_rates):\n", "    for reg in tqdm_notebook(regularization_strengths):\n", "        softmax_clf = Softmax()\n", "        _ = softmax_clf.train(X_train, y_train, learning_rate=lr,\n", "                              reg=reg,\n", "                              num_iters=1500, verbose=False)\n", "        y_train_pred = softmax_clf.predict(X_train)\n", "        train_acc = np.mean(y_train == y_train_pred)\n", "        y_val_pred = softmax_clf.predict(X_val)\n", "        val_acc = np.mean(y_val == y_val_pred)\n", "        results[(lr, reg)] = (train_acc, val_acc)\n", "        print('lr %e reg %e train accuracy: %f val accuracy: %f' % (\n", "            lr, reg, train_acc, val_acc))\n", "        # keep the classifier with the highest validation accuracy so far\n", "        if val_acc > best_val:\n", "            best_val = val_acc\n", "            best_softmax_clf = softmax_clf\n", "################################################################################\n", "# END OF YOUR CODE #\n", "################################################################################\n", " \n", "print('best validation accuracy achieved during cross-validation: %f' % best_val)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Softmax on raw pixels final test set accuracy: 0.377000\n" ] } ], "source": [ "# Evaluate the best softmax classifier on the test set\n", "y_test_pred = best_softmax_clf.predict(X_test)\n", "test_accuracy = np.mean(y_test == y_test_pred)\n", "print('Softmax on raw pixels final test set accuracy: %f' % (test_accuracy, ))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Inline Question 2** - *True or False*\n", "\n", "Suppose the overall training loss is defined as the sum of the per-datapoint loss over all training examples. 
It is possible to add a new datapoint to a training set that would leave the SVM loss unchanged, but this is not the case with the Softmax classifier loss.\n", "\n", "$\\color{blue}{\\textit Your Answer:}$ *True*\n", "\n", "\n", "$\\color{blue}{\\textit Your Explanation:}$ The hinge (SVM) loss is exactly zero for any datapoint whose correct-class score exceeds every other class score by at least the margin, so adding such a point leaves the total loss unchanged. The softmax (logarithmic) loss is strictly positive for every datapoint with finite scores, because the predicted probability of the correct class is always below 1, so any added point increases the total loss.\n" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "# Visualize the learned weights for each class\n", "w = best_softmax_clf.W[:-1,:] # strip out the bias\n", "w = w.reshape(32, 32, 3, 10)\n", "\n", "w_min, w_max = np.min(w), np.max(w)\n", "\n", "classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\n", "for i in range(10):\n", " plt.subplot(2, 5, i + 1)\n", " \n", " # Rescale the weights to be between 0 and 255\n", " wimg = 255.0 * (w[:, :, :, i].squeeze() - w_min) / (w_max - w_min)\n", " plt.imshow(wimg.astype('uint8'))\n", " plt.axis('off')\n", " plt.title(classes[i])" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 1 }