{ "cells": [ { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "075631be66c2978b2d434e0232d09471", "grade": false, "grade_id": "cell-c85ad046711cfabb", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "# Transfer Learning and Fine Tuning" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "6e9cde26cf0e76bdc442222c5ca2e39b", "grade": false, "grade_id": "cell-7212ec48a57532a1", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "__Before starting, we recommend you enable GPU acceleration if you're running on Colab.__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "88c6e9705c2e02a8edd0198ba5e9e916", "grade": false, "grade_id": "cell-a5158fd078d07951", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "# Execute this code block to install dependencies when running on colab\n", "try:\n", " import torch\n", "except:\n", " from os.path import exists\n", " from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag\n", " platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())\n", " cuda_output = !ldconfig -p|grep cudart.so|sed -e 's/.*\\.\\([0-9]*\\)\\.\\([0-9]*\\)$/cu\\1\\2/'\n", " accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'\n", "\n", " !pip install -q http://download.pytorch.org/whl/{accelerator}/torch-1.0.0-{platform}-linux_x86_64.whl torchvision\n", "\n", "try: \n", " import torchbearer\n", "except:\n", " !pip install torchbearer" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "fb49fd29bdf8c5c031d13418343f8d16", "grade": false, "grade_id": "cell-945ef0d7026ec3ea", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Getting started \n", "\n", "Start by 
downloading and unzipping the data set. The code in the following block will download and unzip the data we're going to use." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "b506aa57373622598f26aa6971f1ec37", "grade": false, "grade_id": "cell-cf326d4e1417ed60", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "from os.path import exists\n", "if not exists('data'):\n", " !wget -O boat-data.zip https://artist-cloud.ecs.soton.ac.uk/index.php/s/eAhIkhhdxgmhRHj/download\n", " !unzip boat-data.zip" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "fb5c7860ce433295f3b3b23fafe6c96b", "grade": false, "grade_id": "cell-4cdb8f91c289006a", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We'll start by exploring the data, and look at how we can get that data loaded into memory through Python code. We can just use the `ls` command from within this notebook to explore the data set:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "592f0ac72200d1a9280bd02e36bd6bcf", "grade": false, "grade_id": "cell-198f03f582659c48", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "!ls data\n", "!ls data/test" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "23cc1ac60bb63996d0e16c1afa283d04", "grade": false, "grade_id": "cell-106f682e33179d43", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "If you explore the `data` directory you should see three folders:\n", "\n", "\t- The `train` folder contains the training data & is broken into subdirectories for each class. \n", "\t- The `valid` folder contains the validation data & is broken into subdirectories for each class. 
\n", "\t- The `test` folder contains the testing data & is broken into subdirectories for each class. \n", " \n", "The following displays one of the images:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "4b007287a84d0ff7b07a2f33af11ffe5", "grade": false, "grade_id": "cell-28b5931a811121fa", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "from IPython.display import Image\n", "Image(\"data/test/Alilaguna/20130412_064059_20202.jpg\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "b376786779742d0ffa9fed69f7d017dc", "grade": false, "grade_id": "cell-c5ce4a26e04b4d70", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "The `torchvision` library has support for directly reading images from a directory structure like the one we have using the `torchvision.datasets.ImageFolder` class. In addition to loading the images directly, `torchvision` provides a mechanism to dynamically augment the data being read by applying random transformations (flipping, rotating, etc), as well as cropping and scaling the images. 
The following code will generate a visualisation of the first batch of images produced by the data loader:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "71e1257edcf70e0a62db671fa6b2c528", "grade": false, "grade_id": "cell-aac23c7f12a0e77e", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "# Plot ad hoc data instances\n", "from torchvision.datasets import ImageFolder\n", "from torch.utils.data import DataLoader\n", "from torchvision import transforms \n", "import matplotlib\n", "import matplotlib.pyplot as plt\n", "import numpy\n", "\n", "transform = transforms.Compose([\n", " transforms.Resize((240, 800)),\n", " transforms.ToTensor() # convert to tensor\n", "])\n", "\n", "train_dataset = ImageFolder(\"data/train\", transform)\n", "train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True)\n", "\n", "# generate the first batch\n", "(batch_images, batch_labels) = train_loader.__iter__().__next__()\n", "\n", "# plot 4 images\n", "plt.subplot(221).set_title(train_dataset.classes[batch_labels[0]])\n", "plt.imshow(batch_images[0].permute(1, 2, 0), aspect='equal')\n", "plt.subplot(222).set_title(train_dataset.classes[batch_labels[1]])\n", "plt.imshow(batch_images[1].permute(1, 2, 0), aspect='equal')\n", "plt.subplot(223).set_title(train_dataset.classes[batch_labels[2]])\n", "plt.imshow(batch_images[2].permute(1, 2, 0), aspect='equal')\n", "plt.subplot(224).set_title(train_dataset.classes[batch_labels[3]])\n", "plt.imshow(batch_images[3].permute(1, 2, 0), aspect='equal')\n", "\n", "# show the plot\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "620edbb4fae3212d8c660eb55ffbb11d", "grade": false, "grade_id": "cell-b1931beaae104400", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## A simple CNN for boat 
classification\n", "\n", "Now let's try something a little more challenging and take our `BetterCNN` convolutional network from the experiments with MNIST last week, and apply it to the problem of boat classification. Firstly we need to set up the data for training, and it would also be sensible to load the validation data to monitor performance during training, as well as load the test data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "c1154c6604d52b4491f8fe8576d8b1d8", "grade": false, "grade_id": "cell-26708de0ffdcce1e", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "# the number of images that will be processed in a single step\n", "batch_size=128\n", "# the size of the images that we'll learn on - we'll shrink them from the original size for speed\n", "image_size=(30, 100)\n", "\n", "transform = transforms.Compose([\n", "    transforms.Resize(image_size),\n", "    transforms.ToTensor() # convert to tensor\n", "])\n", "\n", "train_dataset = ImageFolder(\"data/train\", transform)\n", "train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)\n", "\n", "val_dataset = ImageFolder(\"data/valid\", transform)\n", "val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False)\n", "\n", "test_dataset = ImageFolder(\"data/test\", transform)\n", "test_loader = DataLoader(test_dataset, batch_size=batch_size, shuffle=False)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "48e9a50b111cdae9523c6c447f8f798f", "grade": false, "grade_id": "cell-7a92c71762aab7cb", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Note that for now we're not using any data augmentation on the training data; however, we've structured the code so that we can easily add it by manipulating the transforms that are applied. 
Note that if we did that, we probably wouldn't want to augment the validation/test sets, so we would need a separate set of transforms for those. \n", "\n", "Now we can add the network definition from last week. We'll make a slight change to the previous `BetterCNN` class so that it allows us to specify the number of input channels and the number of output classes in the constructor. In the following block you'll need to paste your implementation of the `BetterCNN` forward method from last week:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "e5f4ae82205592bf691b6e2873a8106f", "grade": true, "grade_id": "cell-bffd4bd9e7e1cc22", "locked": false, "points": 0, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "import torch \n", "import torch.nn.functional as F\n", "from torch import nn\n", "\n", "# Model Definition\n", "class BetterCNN(nn.Module):\n", "    def __init__(self, n_channels_in, n_classes):\n", "        super(BetterCNN, self).__init__()\n", "        self.conv1 = nn.Conv2d(n_channels_in, 30, (5, 5), padding=0)\n", "        self.conv2 = nn.Conv2d(30, 15, (3, 3), padding=0)\n", "        self.fc1 = nn.Linear(1725, 128)\n", "        self.fc2 = nn.Linear(128, 50)\n", "        self.fc3 = nn.Linear(50, n_classes)\n", "    \n", "    def forward(self, x):\n", "        # YOUR CODE HERE\n", "        raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "5dfeb519f9c6d704ed78126e6f267d67", "grade": false, "grade_id": "cell-c5f081021e23a8c7", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We're now in a position to add the code to fit the model - we'll use `torchbearer` to help us out:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "cba1c1d7e7ba51e9671aa6b83f45e305", "grade": false, "grade_id": "cell-f5f8fedf684a983a", "locked": true, 
"schema_version": 1, "solution": false } }, "outputs": [], "source": [ "import torchbearer\n", "from torchbearer import Trial\n", "from torch import optim\n", "\n", "model = BetterCNN(3, len(train_dataset.classes))\n", "\n", "# define the loss function and the optimiser\n", "loss_function = nn.CrossEntropyLoss()\n", "optimiser = optim.Adam(model.parameters())\n", "\n", "device = \"cuda:0\" if torch.cuda.is_available() else \"cpu\"\n", "trial = Trial(model, optimiser, loss_function, metrics=['loss', 'accuracy']).to(device)\n", "trial.with_generators(train_loader, val_generator=val_loader, test_generator=test_loader)\n", "trial.run(epochs=10)\n", "results = trial.evaluate(data_key=torchbearer.VALIDATION_DATA)\n", "print()\n", "print(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once the model is trained, we can investigate the performance on the test data. Rather than just look at the average accuracy we need to dig into a little more detail, so we'll use the `classification_report` method in scikit-learn to compute per-class precision and recall values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "b545486735a6e28e089b597cb0d49344", "grade": false, "grade_id": "cell-4aa1283f8cbbddfe", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "predictions = trial.predict()\n", "predicted_classes = predictions.argmax(1).cpu()\n", "true_classes = list(x for (_,x) in test_dataset.samples)\n", "\n", "from sklearn import metrics\n", "print(metrics.classification_report(true_classes, predicted_classes, target_names=train_dataset.classes))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "d812aa8c982413a75906473af54a94b3", "grade": false, "grade_id": "cell-1c5a0fd0d8866367", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "In this particular case 
you should observe that the overall accuracies are a bit mixed; whilst the average is high, it is clear that our model doesn't work well for some classes. Be aware though that we're using a relatively small set of both training and validation data, and that there is a very high bias in the class distribution which inevitably could lead to higher accuracies because of common classes. As we mentioned in the lab last week, this network architecture isn't anything like one that has won the image classification challenges." ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "c02058c5bfadbdded219bb30f71f1887", "grade": false, "grade_id": "cell-81619bed36921f4e", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Using a better network model - transferring and finetuning a pretrained ResNet\n", "\n", "Training a network from scratch can be a lot of work. Is there some way we could take an existing network trained on some data with one set of labels, and adapt it to work on a different data set with different labels? Assuming that the inputs of the network are equivalent (for example, images with the same number of bands and size), then the answer is an emphatic yes! This process of \"finetuning\" a pre-trained network has become commonplace as it's much faster and easier than starting from scratch. This approach will also help us better work with the small amount of data that we have to train with.\n", "\n", "Let's try this in practice - we'll start by loading a pre-trained network architecture called a Deep Residual Network (or ResNet for short) that has been trained on the 1000-class ImageNet dataset. The ResNet architecture is very deep - it has many (in our case 50) convolutional layers and is currently one of the best performing architectures on the ImageNet challenge. 
The PyTorch model zoo contains code that implements the resnet50 architecture, and automatically downloads the pre-trained model weights. We'll start by using this to load the model and test it by classifying an image:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "964bfed4a3e2042bc13652222518eda9", "grade": false, "grade_id": "cell-287b78a01e335e03", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "from torchvision.models import resnet50\n", "from urllib.request import urlopen\n", "\n", "imagenet_labels = urlopen(\"https://raw.githubusercontent.com/Lasagne/Recipes/master/examples/resnet50/imagenet_classes.txt\").read().decode('utf-8').split(\"\\n\")\n", "\n", "model = resnet50(pretrained=True)\n", "model.eval()\n", "\n", "preprocess_input = transforms.Compose([\n", " transforms.Resize(224),\n", " transforms.CenterCrop(224),\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", " std=[0.229, 0.224, 0.225]),\n", " ])\n", "\n", "from PIL import Image as PImage\n", "img_path = 'data/mf.jpg'\n", "img = PImage.open(img_path)\n", "\n", "print(preprocess_input(img))\n", "preds = model(preprocess_input(img).unsqueeze(0))\n", "\n", "_, indexes = preds.topk(5)\n", "for i in indexes[0]:\n", " print('Predicted:', imagenet_labels[i])" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "1d2c14847a8c53d4f62b8a2ad021a486", "grade": false, "grade_id": "cell-697c601dae23fcf6", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "If you run the above code (it will take a little longer the first time as the model is downloaded), it should indicate that our input image was likely to contain a fish! 
Run the following block to display the image and verify this.\n", "\n", "The `preprocess_input` transformation in the above is important - it's responsible for applying the same normalisation operations to our input image as were applied to the images when the network was trained, as well as getting the image to the 224x224 size expected by the network." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "0a00c1d41d7aac118682a25f18470ffd", "grade": false, "grade_id": "cell-9a06591746c94d90", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "Image(\"data/mf.jpg\")" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "51d1912db209330e6db31aacc7dfe63d", "grade": false, "grade_id": "cell-7f846ba428d1dc2e", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "In the above code to load the pretrained ResNet model, after the model is loaded we call `model.eval()`. Why is this? What happens if you don't make this call before using the model? __Write your answers in the following block:__" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "checksum": "48e029a4ee1c0f55c0ad6905e567215b", "grade": true, "grade_id": "cell-5cbf588bbdc529b1", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "826829f3ff9cb155aea3c18bb92d10e4", "grade": false, "grade_id": "cell-191458497767e7e9", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "We're now in a position to start to hack the model structure. Fundamentally we need to first remove the classification layer at the end of the model and replace it with a new one (with a different number of classes). 
Let's print out the model structure:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "407df9afd7caafc57826fb2c735a9b16", "grade": false, "grade_id": "cell-dcb5be4d7ed3e21e", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "print(model)" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "81cf51fa8f14ca4a06e14c99c97f4538", "grade": false, "grade_id": "cell-97edcaa45ce5c45a", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Looking at the model we can see that really what we want to do is just replace the last fully connected layer with one with a different number of features. There are many ways in which PyTorch allows us to do this, but the simplest is to just do a direct replacement. \n", "\n", "We'll also make a small change to the pooling layer before the last fully connected layer; in the default ResNet model 2D average pooling is used to reduce a 2048x7x7 tensor to 2048x1x1 by using a 7x7 pooling window. This is precisely the definition of Global Average Pooling, however it is implemented with a fixed size window. As our boat images are not square, it would be nice if we could use them at a high resolution without cropping. 
We can replace the pooling layer with an `nn.AdaptiveAvgPool2d` layer that will perform real Global Average Pooling by dynamically scaling the window to whatever spatial size the input feature maps have:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "559d270eaf01ba3dd6973c88e22d9697", "grade": false, "grade_id": "cell-fcd0e2300471e082", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "model = resnet50(pretrained=True)\n", "model.avgpool = nn.AdaptiveAvgPool2d((1,1))\n", "model.fc = nn.Linear(2048, len(train_dataset.classes))\n", "model.train()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "aa3fd4dee5e59227b0c84905ee181b03", "grade": false, "grade_id": "cell-f724d96dd892688e", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "The actual process of finetuning involves us now training the model with our own data. As the network is already largely trained, we'll likely want to use a small learning rate so as not to make big changes in weights. 
Often we'll first \"freeze\" the weights of the already trained layers whilst we learn initial weights for our new layer to avoid overfitting while training:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "e6de92ab884d3ffc4b33b7757268661a", "grade": false, "grade_id": "cell-01d15125266f4bcb", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "# Freeze layers by not tracking gradients\n", "for param in model.parameters():\n", "    param.requires_grad = False\n", "model.fc.weight.requires_grad = True #unfreeze last layer weights\n", "model.fc.bias.requires_grad = True #unfreeze last layer biases\n", "\n", "optimiser = optim.Adam(filter(lambda p: p.requires_grad, model.parameters()), lr=1e-4) #only optimise non-frozen layers" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "5d5951450c8a455a0b6fde2cd4f5aee3", "grade": false, "grade_id": "cell-aba07f0f08d01caa", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "If we have lots of training data we could then unlock these layers and perform end-to-end finetuning afterwards. The Stanford CS231n course pages have lots of useful hints on fine-tuning: http://cs231n.github.io/transfer-learning/\n", "\n", "__Use the following block to try finetuning the resnet50 with the boat data. 
You'll need a GPU to do this effectively as it's _rather_ slow!__" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "cb4d06cba9bead1623d69c9ef3c6d5b9", "grade": true, "grade_id": "cell-5bdd554860b3ba46", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "c857362570e8e94b022f99495b0313e7", "grade": false, "grade_id": "cell-7884760fe185fba1", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Use the following block to note down the performance you achieve by finetuning the model. How does this compare to the `BetterCNN` model used earlier?" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "checksum": "ab979a5055a27ed6066e25f65218366c", "grade": true, "grade_id": "cell-cf0b692e1dd62f46", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "4c7daa9fcc011e7aacd2df393ebe08db", "grade": false, "grade_id": "cell-0830b9249bdad550", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "## Extracting features from a model\n", "\n", "Sometimes you want to do things that are not so easily accomplished with a deep network. You might want to build classifiers using very small amounts of data, or you might want a way of finding things in photographs that are in some way semantically similar, but don't have exactly the same classes. CNNs can actually help here using a technique often called transfer learning (and related to the fine tuning that we just looked at). 
If we assume we have a trained network, then by extracting vectors from the layers before the final classifier we should have a means of achieving these tasks, as the vectors are likely to strongly encode semantic information about the content of the input image. If we wanted to quickly train new classifiers for new classes, we could for instance just use a relatively simple linear classifier trained on these vectors. If we wanted to find semantically similar images, we could just compare the Euclidean distances between these vectors.\n", "\n", "PyTorch makes it pretty easy to get these vector representations. The following code gets a resnet model in which a feature vector is computed from the penultimate layer of the network by applying a _global average pooling_ operation over the feature maps:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "784d9ba2ba6e69cbb039a6438bec9e47", "grade": false, "grade_id": "cell-9538f74d127ff036", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "from torch import nn\n", "model = resnet50(pretrained=True)\n", "feature_extractor_model = nn.Sequential(*list(model.children())[:-2], nn.AdaptiveAvgPool2d((1,1)))\n", "feature_extractor_model.eval()\n", "feature_extractor_model = feature_extractor_model.to(device)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With this model, we can extract features for some inputs. 
To demonstrate, we can put the whole thing together and generate a vector from an image:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "fde4c00c8010435b3e8f84fd8af9760d", "grade": false, "grade_id": "cell-9f3a6ed567057514", "locked": true, "schema_version": 1, "solution": false } }, "outputs": [], "source": [ "transform = transforms.Compose([\n", "    transforms.Resize((240, 800)),\n", "    transforms.ToTensor(),\n", "    transforms.Normalize(mean=[0.485, 0.456, 0.406],\n", "                         std=[0.229, 0.224, 0.225]),\n", "])\n", "\n", "img_path = 'data/test/Alilaguna/20130412_064059_20202.jpg'\n", "img = PImage.open(img_path)\n", "# apply the transform defined above (full-width 240x800 image, no cropping)\n", "feature = feature_extractor_model(transform(img).unsqueeze(0).to(device))\n", "print('Feature shape:', feature.shape)\n", "print('Feature data:', feature[0].reshape(-1))" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "1546aab02e723b5442384ca8643c7add", "grade": false, "grade_id": "cell-29976f9663d07a8b", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Use the following block to generate some features for some different images in the test set, and calculate the Euclidean distances between these features. 
" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "31d466b9c60b1ce7c3d1557a148efa67", "grade": true, "grade_id": "cell-f946a8fe42c5e62e", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "c6132b5a4111d64f5a6caed93b916094", "grade": false, "grade_id": "cell-f57d9ecd4453171b", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "In the following block describe how the Euclidean distances compare to your perception of similarity between the images:" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "checksum": "319e561084e3b932bba69d4de8906755", "grade": true, "grade_id": "cell-90270dfd9d4d4401", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "source": [ "YOUR ANSWER HERE" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "14b53d43a0adb59b5442eb0838793e19", "grade": false, "grade_id": "cell-d59ce90c3c416dd4", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Let's now explore how features from a ResNet could be used with a classifier that works well with small amounts of training data - a Support Vector Machine. Use the following block to train a multiclass SVM using the `sklearn.svm.SVC` class to learn the boat classes using the features extracted by the ResNet50 with the default imagenet weights. Once you've trained it (and optimised the parameters) print out a classification report. 
\n", "\n", "We've provided all the features and labels in the form of `numpy` arrays so you don't have to extract them (which can take a few minutes; you can view the code we used [here](https://github.com/ecs-vlc/COMP6258/blob/master/docs/labs/lab6/genfeats.py)):" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "deletable": false, "nbgrader": { "checksum": "78b26fda1c740bc9d01af86a7489bf17", "grade": true, "grade_id": "cell-4fb9cc66b42a5911", "locked": false, "points": 10, "schema_version": 1, "solution": true } }, "outputs": [], "source": [ "import numpy as np\n", "\n", "if not exists('training_features.npy'):\n", " !wget -O Resnet50Features.zip https://artist-cloud.ecs.soton.ac.uk/index.php/s/P68OB07DquOwSR7/download\n", " !unzip Resnet50Features.zip\n", "\n", "training_features = np.load('training_features.npy')\n", "training_labels = np.load('training_labels.npy')\n", "\n", "valid_features = np.load('valid_features.npy')\n", "valid_labels = np.load('valid_labels.npy')\n", "\n", "testing_features = np.load('testing_features.npy')\n", "testing_labels = np.load('testing_labels.npy')\n", "\n", "# YOUR CODE HERE\n", "raise NotImplementedError()" ] }, { "cell_type": "markdown", "metadata": { "deletable": false, "editable": false, "nbgrader": { "checksum": "128b352148003f71f3f6d871dcdd4328", "grade": false, "grade_id": "cell-180cc4348d118b8f", "locked": true, "schema_version": 1, "solution": false } }, "source": [ "Use the following block to reflect on how the test performance compares to the finetuned ResNet. What is the difference in time taken to train and evaluate the different approaches?" 
] }, { "cell_type": "markdown", "metadata": { "deletable": false, "nbgrader": { "checksum": "b304991b1611dd38841143f01bc862f3", "grade": true, "grade_id": "cell-6c2f3d1c291704ff", "locked": false, "points": 5, "schema_version": 1, "solution": true } }, "source": [ "YOUR ANSWER HERE" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.7" } }, "nbformat": 4, "nbformat_minor": 2 }