{ "cells": [ { "cell_type": "markdown", "metadata": { "tags": [ "pdf-title" ] }, "source": [ "# Image features exercise\n", "*Complete and hand in this completed worksheet (including its outputs and any supporting code outside of the worksheet) with your assignment submission. For more details see the [assignments page](http://vision.stanford.edu/teaching/cs231n/assignments.html) on the course website.*\n", "\n", "We have seen that we can achieve reasonable performance on an image classification task by training a linear classifier on the pixels of the input image. In this exercise we will show that we can improve our classification performance by training linear classifiers not on raw pixels but on features that are computed from the raw pixels.\n", "\n", "All of your work for this exercise will be done in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "pdf-ignore" ] }, "outputs": [], "source": [ "import random\n", "import numpy as np\n", "from cs231n.data_utils import load_CIFAR10\n", "import matplotlib.pyplot as plt\n", "\n", "\n", "%matplotlib inline\n", "plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots\n", "plt.rcParams['image.interpolation'] = 'nearest'\n", "plt.rcParams['image.cmap'] = 'gray'\n", "\n", "# for auto-reloading external modules\n", "# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython\n", "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "pdf-ignore" ] }, "source": [ "## Load data\n", "Similar to previous exercises, we will load CIFAR-10 data from disk." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "pdf-ignore" ] }, "outputs": [], "source": [ "from cs231n.features import color_histogram_hsv, hog_feature\n", "\n", "def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):\n", "    # Load the raw CIFAR-10 data\n", "    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'\n", "\n", "    # Cleaning up variables to prevent loading data multiple times (which may cause memory issue)\n", "    try:\n", "        del X_train, y_train\n", "        del X_test, y_test\n", "        print('Clear previously loaded data.')\n", "    except:\n", "        pass\n", "\n", "    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)\n", "\n", "    # Subsample the data\n", "    mask = list(range(num_training, num_training + num_validation))\n", "    X_val = X_train[mask]\n", "    y_val = y_train[mask]\n", "    mask = list(range(num_training))\n", "    X_train = X_train[mask]\n", "    y_train = y_train[mask]\n", "    mask = list(range(num_test))\n", "    X_test = X_test[mask]\n", "    y_test = y_test[mask]\n", "\n", "    return X_train, y_train, X_val, y_val, X_test, y_test\n", "\n", "X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "pdf-ignore" ] }, "source": [ "## Extract Features\n", "For each image we will compute a Histogram of Oriented\n", "Gradients (HOG) as well as a color histogram using the hue channel in HSV\n", "color space. We form our final feature vector for each image by concatenating\n", "the HOG and color histogram feature vectors.\n", "\n", "Roughly speaking, HOG should capture the texture of the image while ignoring\n", "color information, and the color histogram represents the color of the input\n", "image while ignoring texture. As a result, we expect that using both together\n", "ought to work better than using either alone. 
Verifying this assumption would\n", "be a good thing to try for your own interest.\n", "\n", "The `hog_feature` and `color_histogram_hsv` functions both operate on a single\n", "image and return a feature vector for that image. The `extract_features`\n", "function takes a set of images and a list of feature functions and evaluates\n", "each feature function on each image, storing the results in a matrix where\n", "each row is the concatenation of all feature vectors for a single image." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true, "tags": [ "pdf-ignore" ] }, "outputs": [], "source": [ "from cs231n.features import *\n", "\n", "num_color_bins = 10 # Number of bins in the color histogram\n", "feature_fns = [hog_feature, lambda img: color_histogram_hsv(img, nbin=num_color_bins)]\n", "X_train_feats = extract_features(X_train, feature_fns, verbose=True)\n", "X_val_feats = extract_features(X_val, feature_fns)\n", "X_test_feats = extract_features(X_test, feature_fns)\n", "\n", "# Preprocessing: Subtract the mean feature\n", "mean_feat = np.mean(X_train_feats, axis=0, keepdims=True)\n", "X_train_feats -= mean_feat\n", "X_val_feats -= mean_feat\n", "X_test_feats -= mean_feat\n", "\n", "# Preprocessing: Divide by standard deviation. 
This ensures that each feature\n", "# has roughly the same scale.\n", "std_feat = np.std(X_train_feats, axis=0, keepdims=True)\n", "X_train_feats /= std_feat\n", "X_val_feats /= std_feat\n", "X_test_feats /= std_feat\n", "\n", "# Preprocessing: Add a bias dimension\n", "X_train_feats = np.hstack([X_train_feats, np.ones((X_train_feats.shape[0], 1))])\n", "X_val_feats = np.hstack([X_val_feats, np.ones((X_val_feats.shape[0], 1))])\n", "X_test_feats = np.hstack([X_test_feats, np.ones((X_test_feats.shape[0], 1))])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Train SVM on features\n", "Using the multiclass SVM code developed earlier in the assignment, train SVMs on top of the features extracted above; this should achieve better results than training SVMs directly on top of raw pixels." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "code" ] }, "outputs": [], "source": [ "# Use the validation set to tune the learning rate and regularization strength\n", "\n", "from cs231n.classifiers.linear_classifier import LinearSVM\n", "\n", "learning_rates = [1e-9, 1e-8, 1e-7]\n", "regularization_strengths = [5e4, 5e5, 5e6]\n", "\n", "results = {}\n", "best_val = -1\n", "best_svm = None\n", "\n", "################################################################################\n", "# TODO: #\n", "# Use the validation set to set the learning rate and regularization strength. #\n", "# This should be identical to the validation that you did for the SVM; save #\n", "# the best trained classifier in best_svm. You might also want to play #\n", "# with different numbers of bins in the color histogram. If you are careful #\n", "# you should be able to get an accuracy near 0.44 on the validation set. 
#\n", "################################################################################\n", "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n\n", "pass\n", "\n# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n", "\n", "# Print out results.\n", "for lr, reg in sorted(results):\n", "    train_accuracy, val_accuracy = results[(lr, reg)]\n", "    print('lr %e reg %e train accuracy: %f val accuracy: %f' % (\n", "        lr, reg, train_accuracy, val_accuracy))\n", "\n", "print('best validation accuracy achieved during cross-validation: %f' % best_val)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Evaluate your trained SVM on the test set\n", "y_test_pred = best_svm.predict(X_test_feats)\n", "test_accuracy = np.mean(y_test == y_test_pred)\n", "print(test_accuracy)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# An important way to gain intuition about how an algorithm works is to\n", "# visualize the mistakes that it makes. In this visualization, we show examples\n", "# of images that are misclassified by our current system. 
The first column\n", "# shows images that our system labeled as \"plane\" but whose true label is\n", "# something other than \"plane\".\n", "\n", "examples_per_class = 8\n", "classes = ['plane', 'car', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']\n", "for cls, cls_name in enumerate(classes):\n", "    idxs = np.where((y_test != cls) & (y_test_pred == cls))[0]\n", "    idxs = np.random.choice(idxs, examples_per_class, replace=False)\n", "    for i, idx in enumerate(idxs):\n", "        plt.subplot(examples_per_class, len(classes), i * len(classes) + cls + 1)\n", "        plt.imshow(X_test[idx].astype('uint8'))\n", "        plt.axis('off')\n", "        if i == 0:\n", "            plt.title(cls_name)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "tags": [ "pdf-inline" ] }, "source": [ "### Inline question 1:\n", "Describe the misclassification results that you see. Do they make sense?\n", "\n", "\n", "$\\color{blue}{\\textit Your Answer:}$\n", "\n", "\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Neural Network on image features\n", "Earlier in this assignment we saw that training a two-layer neural network on raw pixels achieved better classification performance than linear classifiers on raw pixels. In this notebook we have seen that linear classifiers on image features outperform linear classifiers on raw pixels. \n", "\n", "For completeness, we should also try training a neural network on image features. This approach should outperform all previous approaches: you should easily be able to achieve over 55% classification accuracy on the test set; our best model achieves about 60% classification accuracy." 
] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "pdf-ignore" ] }, "outputs": [], "source": [ "# Preprocessing: Remove the bias dimension\n", "# Make sure to run this cell only ONCE\n", "print(X_train_feats.shape)\n", "X_train_feats = X_train_feats[:, :-1]\n", "X_val_feats = X_val_feats[:, :-1]\n", "X_test_feats = X_test_feats[:, :-1]\n", "\n", "print(X_train_feats.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [ "code" ] }, "outputs": [], "source": [ "from cs231n.classifiers.neural_net import TwoLayerNet\n", "\n", "input_dim = X_train_feats.shape[1]\n", "hidden_dim = 500\n", "num_classes = 10\n", "\n", "net = TwoLayerNet(input_dim, hidden_dim, num_classes)\n", "best_net = None\n", "\n", "################################################################################\n", "# TODO: Train a two-layer neural network on image features. You may want to #\n", "# cross-validate various parameters as in previous sections. Store your best #\n", "# model in the best_net variable. #\n", "################################################################################\n", "# *****START OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n\n", "pass\n", "\n# *****END OF YOUR CODE (DO NOT DELETE/MODIFY THIS LINE)*****\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Run your best neural net classifier on the test set. You should be able\n", "# to get more than 55% accuracy.\n", "\n", "test_acc = (best_net.predict(X_test_feats) == y_test).mean()\n", "print(test_acc)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.1" } }, "nbformat": 4, "nbformat_minor": 1 }