{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Lesson 1 - What's your pet" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Welcome to lesson 1! For those of you who are using a Jupyter Notebook for the first time, you can learn about this useful tool in a tutorial we prepared especially for you: click `File`->`Open` now and select `notebook_tutorial.ipynb`.\n", "\n", "In this lesson we will build our first image classifier from scratch, and see if we can achieve world-class results. Let's dive in!\n", "\n", "Every notebook starts with the following three lines; they ensure that any edits you make to libraries are reloaded here automatically, and that any charts or images are displayed in this notebook." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We import all the necessary packages. We are going to work with the [fastai V1 library](http://www.fast.ai/2018/10/02/fastai-ai/), which sits on top of [PyTorch 1.0](https://hackernoon.com/pytorch-1-0-468332ba5163). The fastai library provides many useful functions that enable us to quickly and easily build neural networks and train our models." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from fastai import *\n", "from fastai.vision import *" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Looking at the data" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "We are going to use the [Oxford-IIIT Pet Dataset](http://www.robots.ox.ac.uk/~vgg/data/pets/) by [O. M. Parkhi et al., 2012](http://www.robots.ox.ac.uk/~vgg/publications/2012/parkhi12a/parkhi12a.pdf), which features 12 cat breeds and 25 dog breeds. Our model will need to learn to differentiate between these 37 distinct categories. According to their paper, the best accuracy they could get in 2012 was 59.21%, using a complex model that was specific to pet detection, with separate \"Image\", \"Head\", and \"Body\" models for the pet photos. Let's see how accurate we can be using deep learning!\n", "\n", "We are going to use the `untar_data` function, which takes a URL as its argument, downloads and extracts the data, and returns the path where it was extracted." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "help(untar_data)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# DEBUG\n", "URLs.PETS" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path = untar_data(URLs.PETS)\n", "path" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path.ls()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "path_anno = path / 'annotations'\n", "path_img = path / 'images'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first thing we do when we approach a problem is to take a look at the data. We _always_ need to understand very well what the problem is and what the data looks like before we can figure out how to solve it. Taking a look at the data means understanding how the data directories are structured, what the labels are and what some sample images look like.\n", "\n", "Image classification datasets differ mainly in how their labels are stored. In this particular dataset, the labels are stored in the filenames themselves, so we need to extract them to be able to classify the images into the correct categories. Fortunately, the fastai library has a handy function made exactly for this: `ImageDataBunch.from_name_re` gets the labels from the filenames using a [regular expression](https://docs.python.org/3.6/library/re.html)."
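] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the regular expression concrete, here is a minimal sketch that uses only Python's standard `re` module. The file paths are hypothetical examples written in the dataset's `breed_number.jpg` naming style (not files read from the download), and `label_pat` is our own copy of the `pat` pattern defined below, written with the `.` before `jpg` escaped so that it only matches a literal dot. The first captured group is the label." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import re\n", "\n", "# Capture everything between the last '/' and the trailing '_<digits>.jpg'.\n", "# [^/]+ is greedy, so breed names that contain underscores are kept whole.\n", "label_pat = r'/([^/]+)_\\d+\\.jpg$'\n", "\n", "# Hypothetical paths in the dataset's naming style\n", "example_fnames = ['images/great_pyrenees_173.jpg', 'images/Bengal_12.jpg', 'images/american_pit_bull_terrier_5.jpg']\n", "\n", "# group(1) is the breed name, i.e. the label\n", "example_labels = [re.search(label_pat, f).group(1) for f in example_fnames]\n", "example_labels"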
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# DEBUG\n", "path_anno, path_img" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fnames = get_image_files(path_img)\n", "fnames[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(2)  # make the random train/validation split reproducible\n", "pat = r'/([^/]+)_\\d+\\.jpg$'  # capture the breed name before the trailing _<number>.jpg" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224)\n", "data.normalize(imagenet_stats)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "data.show_batch(rows=3, figsize=(7,6))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "print(data.classes)\n", "len(data.classes), data.c  # both give the number of classes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training: resnet34" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will start training our model. We will use a [convolutional neural network](http://cs231n.github.io/convolutional-networks/) backbone and a fully connected head with a single hidden layer as a classifier. Don't know what these things mean? Not to worry, we will dive deeper in the coming lessons. For the moment you need to know that we are building a model which will take images as input and will output the predicted probability for each of the categories (in this case, it will have 37 outputs).\n", "\n", "We will train for 4 epochs (4 cycles through all our data)." 
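] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As a rough illustration of what *one predicted probability per category* means, here is a plain-Python sketch of the softmax function, the usual way a classifier's raw output scores are turned into probabilities that sum to 1. We are assuming a softmax-style output here; the 37 scores below are made up rather than produced by our model, and `softmax` is our own helper, not a fastai function." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import math\n", "\n", "def softmax(scores):\n", "    # Subtracting the max avoids overflow in exp() and does not change the result\n", "    m = max(scores)\n", "    exps = [math.exp(s - m) for s in scores]\n", "    total = sum(exps)\n", "    return [e / total for e in exps]\n", "\n", "# Made-up raw scores for 37 classes, with class 5 scoring highest\n", "scores = [0.0] * 37\n", "scores[5] = 4.0\n", "probs = softmax(scores)\n", "\n", "# 37 probabilities that sum to 1; the predicted class is the most probable one\n", "len(probs), round(sum(probs), 6), probs.index(max(probs))"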
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn = ConvLearner(data, models.resnet34, metrics=error_rate)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit_one_cycle(4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.save('stage-1')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see what results we have got.\n", "\n", "We will first look at which categories the model confused most often with one another, and check whether those predictions were reasonable. In this case the mistakes look reasonable (none of them seems obviously naive), which suggests that our classifier is working correctly.\n", "\n", "Furthermore, when we plot the confusion matrix, we can see that the distribution is heavily skewed: the model makes the same mistakes over and over again but rarely confuses other categories. This suggests that it simply finds it difficult to distinguish certain specific categories from each other; this is normal behaviour." 
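] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To make the idea of *most confused* pairs concrete, here is a hypothetical plain-Python helper, `most_confused_pairs`, that counts how often each (actual, predicted) pair occurs among the wrong predictions, i.e. the off-diagonal cells of a confusion matrix, and keeps the pairs seen at least `min_val` times. It is only meant to mirror the kind of list that `interp.most_confused(min_val=2)` reports further down; it is not fastai's implementation, and the labels are made up." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from collections import Counter\n", "\n", "def most_confused_pairs(actual, predicted, min_val=1):\n", "    # Count each off-diagonal (actual, predicted) cell of the confusion matrix\n", "    counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)\n", "    # Keep the pairs seen at least min_val times, most frequent first\n", "    pairs = [(a, p, n) for (a, p), n in counts.items() if n >= min_val]\n", "    return sorted(pairs, key=lambda t: t[2], reverse=True)\n", "\n", "# Made-up labels for a tiny validation set\n", "actual    = ['Bengal', 'Bengal', 'Egyptian_Mau', 'Egyptian_Mau', 'Egyptian_Mau', 'boxer', 'boxer']\n", "predicted = ['Bengal', 'Egyptian_Mau', 'Bengal', 'Bengal', 'Egyptian_Mau', 'american_bulldog', 'boxer']\n", "most_confused_pairs(actual, predicted, min_val=2)"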
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "interp = ClassificationInterpretation.from_learner(learn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "interp.plot_top_losses(9, figsize=(15,11))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "doc(interp.plot_top_losses)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "interp.plot_confusion_matrix(figsize=(12,12), dpi=60)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "interp.most_confused(min_val=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Unfreezing, fine-tuning, and learning rates" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since our model is working as we expect it to, we will *unfreeze* our model and train some more." 
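] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Further down we pass `max_lr=slice(1e-6, 1e-4)` to `fit_one_cycle`. As we understand fastai v1, a slice like this is spread across the model's layer groups so that the earliest (most general) layers get the smallest learning rate and the final layers get the largest, with the rates evenly spaced on a log scale. The hypothetical helper `spread_lrs` below reproduces that spacing under two assumptions: geometric spacing, and a model split into 3 layer groups. Check `doc(learn.fit_one_cycle)` and `learn.layer_groups` for the exact behaviour of your installed version." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def spread_lrs(start, stop, n_groups):\n", "    # Hypothetical helper: geometric spacing, each group's rate is a constant multiple of the previous one\n", "    if n_groups == 1:\n", "        return [stop]\n", "    step = (stop / start) ** (1 / (n_groups - 1))\n", "    return [start * step ** i for i in range(n_groups)]\n", "\n", "# Assumption: the model is split into 3 layer groups\n", "lrs = spread_lrs(1e-6, 1e-4, 3)\n", "[f'{lr:.0e}' for lr in lrs]  # expected: ['1e-06', '1e-05', '1e-04']"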
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.unfreeze()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit_one_cycle(1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.load('stage-1')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "learn.lr_find()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**If the previous cell raised an error, update the fastai library to the latest version and run `lr_find` again.**" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.lr_find()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.recorder.plot()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.unfreeze()\n", "learn.fit_one_cycle(2, max_lr=slice(1e-6, 1e-4))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "doc(learn.fit_one_cycle)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.recorder.plot()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "That's a pretty accurate model!" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Training: resnet50" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we will train in the same way as before but with one change: instead of using resnet34 as our backbone we will use resnet50 (resnet34 is a 34-layer residual network while resnet50 has 50 layers; later in the course you can learn the details in the [resnet paper](https://arxiv.org/pdf/1512.03385.pdf)).\n", "\n", "resnet50 usually performs better because it is a deeper network with more parameters. Let's see if we can achieve higher accuracy here." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=299, bs=48)\n", "data.normalize(imagenet_stats)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn = ConvLearner(data, models.resnet50, metrics=error_rate)" ] }, { "cell_type": "code", "execution_count": 34, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HBox(children=(IntProgress(value=0, max=5), HTML(value='0.00% [0/5 00:00<00:00]'))), HTML(value…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total time: 17:07\n", "epoch train loss valid loss error_rate\n", "1 0.641380 0.236811 0.068556 (03:56)\n", "2 0.341766 0.211440 0.069875 (03:18)\n", "3 0.242888 0.207498 0.069875 (03:17)\n", "4 0.159888 0.167932 0.054054 (03:17)\n", "5 0.107927 0.163711 0.050758 (03:17)\n", "\n" ] } ], "source": [ "learn.fit_one_cycle(5)" ] }, { "cell_type": "code", "execution_count": 35, "metadata": {}, "outputs": [], "source": [ "learn.save('stage-1-50')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's astonishing that it's possible to recognize pet breeds so accurately! 
Let's see if full fine-tuning helps:" ] }, { "cell_type": "code", "execution_count": 36, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HBox(children=(IntProgress(value=0, max=1), HTML(value='0.00% [0/1 00:00<00:00]'))), HTML(value…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total time: 04:24\n", "epoch train loss valid loss error_rate\n", "1 0.103458 0.155929 0.051417 (04:24)\n", "\n" ] } ], "source": [ "learn.unfreeze()\n", "learn.fit_one_cycle(1, max_lr=slice(1e-6,1e-4))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case it doesn't, so let's go back to our previous model." ] }, { "cell_type": "code", "execution_count": 37, "metadata": {}, "outputs": [], "source": [ "learn.load('stage-1-50')" ] }, { "cell_type": "code", "execution_count": 38, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=16), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=16), HTML(value='0.00% [0/16 00:00<00:00]')))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "interp = ClassificationInterpretation.from_learner(learn)" ] }, { "cell_type": "code", "execution_count": 39, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[('Egyptian_Mau', 'Bengal', 7),\n", " ('american_bulldog', 'staffordshire_bull_terrier', 4),\n", " ('staffordshire_bull_terrier', 'american_bulldog', 4),\n", " ('boxer', 'american_bulldog', 4),\n", " ('Russian_Blue', 'British_Shorthair', 4),\n", " 
('american_pit_bull_terrier', 'staffordshire_bull_terrier', 4),\n", " ('Ragdoll', 'Birman', 3)]" ] }, "execution_count": 39, "metadata": {}, "output_type": "execute_result" } ], "source": [ "interp.most_confused(min_val=2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Other data formats" ] }, { "cell_type": "code", "execution_count": 40, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Downloading http://files.fast.ai/data/examples/mnist_sample\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e2eca45b5e3a4a01820ba9cd24f7a0fa", "version_major": 2, "version_minor": 0 }, "text/plain": [ "HBox(children=(IntProgress(value=0, max=3214948), HTML(value='')))" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/plain": [ "PosixPath('/home/ubuntu/.fastai/data/mnist_sample')" ] }, "execution_count": 40, "metadata": {}, "output_type": "execute_result" } ], "source": [ "path = untar_data(URLs.MNIST_SAMPLE)\n", "path" ] }, { "cell_type": "code", "execution_count": 42, "metadata": {}, "outputs": [], "source": [ "tfms = get_transforms(do_flip=False)\n", "data = ImageDataBunch.from_folder(path, ds_tfms=tfms, size=26)" ] },
{ "cell_type": "code", "execution_count": 43, "metadata": {}, "outputs": [], "source": [ "data.show_batch(rows=3, figsize=(5,5))" ] }, { "cell_type": "code", "execution_count": 44, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HBox(children=(IntProgress(value=0, max=2), HTML(value='0.00% [0/2 00:00<00:00]'))), HTML(value…" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "Total time: 00:23\n", "epoch train loss valid loss accuracy\n", "1 0.114837 0.023115 0.993621 (00:12)\n", "2 0.067166 0.014504 0.995584 (00:10)\n", "\n" ] } ], "source": [ "learn = ConvLearner(data, models.resnet18, metrics=accuracy)\n", "learn.fit(2)" ] },
{ "cell_type": "code", "execution_count": 45, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " name label\n", "0 train/3/7463.png 0\n", "1 train/3/21102.png 0\n", "2 train/3/31559.png 0\n", "3 train/3/46882.png 0\n", "4 train/3/26209.png 0" ] }, "execution_count": 45, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df = pd.read_csv(path / 'labels.csv')\n", "df.head()" ] }, { "cell_type": "code", "execution_count": 46, "metadata": {}, "outputs": [], "source": [ "data = ImageDataBunch.from_csv(path, ds_tfms=tfms, size=28)" ] }, { "cell_type": "code", "execution_count": 47, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1]" ] }, "execution_count": 47, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data.show_batch(rows=3, figsize=(5,5))\n", "data.classes" ] },
{ "cell_type": "code", "execution_count": 48, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[0, 1]" ] }, "execution_count": 48, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = ImageDataBunch.from_df(path, df, ds_tfms=tfms, size=24)\n", "data.classes" ] }, { "cell_type": "code", "execution_count": 49, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[PosixPath('/home/ubuntu/.fastai/data/mnist_sample/train/3/7463.png'),\n", " PosixPath('/home/ubuntu/.fastai/data/mnist_sample/train/3/21102.png')]" ] }, "execution_count": 49, "metadata": {}, "output_type": "execute_result" } ], "source": [ "fn_paths = [path / name for name in df['name']]; fn_paths[:2]" ] }, { "cell_type": "code", "execution_count": 50, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['3', '7']" ] }, "execution_count": 50, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pat = r\"/(\\d)/\\d+\\.png$\"\n", "data = ImageDataBunch.from_name_re(path, fn_paths, pat=pat, ds_tfms=tfms, size=24)\n", "data.classes" ] }, { "cell_type": "code", "execution_count": 51, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['3', '7']" ] }, "execution_count": 51, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = ImageDataBunch.from_name_func(path, fn_paths, ds_tfms=tfms, size=24,\n", " label_func = lambda x: '3' if '/3/' in str(x) else '7')\n", "data.classes" ] }, { "cell_type": "code", "execution_count": 52, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "['3', '3', '3', '3', '3']" ] }, "execution_count": 52, "metadata": {}, "output_type": "execute_result" } ], "source": [ "labels = [('3' if '/3/' in str(x) else '7') for x in fn_paths]\n", "labels[:5]" ] }, { "cell_type": "code", "execution_count": 53, "metadata": {}, "outputs": [ { "data": { 
"text/plain": [ "['3', '7']" ] }, "execution_count": 53, "metadata": {}, "output_type": "execute_result" } ], "source": [ "data = ImageDataBunch.from_lists(path, fn_paths, labels=labels, ds_tfms=tfms, size=24)\n", "data.classes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.6" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": false, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }