{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from nb_003 import *\n", "import nb_002\n", "\n", "import operator\n", "from random import sample\n", "from torch.utils.data.sampler import Sampler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA_PATH = Path('data')\n", "PATH = DATA_PATH/'caltech101' # http://www.vision.caltech.edu/Image_Datasets/Caltech101/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Caltech 101" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create validation set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step will be to create a dataset from our files. We need to separate a definite amount of files to be used as our validation set. We will do this randomly by setting a percentage apart, in this case 0.2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classes = [\"airplanes\", \"Motorbikes\", \"BACKGROUND_Google\", \"Faces\", \"watch\", \"Leopards\", \"bonsai\",\n", " \"car_side\", \"ketch\", \"chandelier\", \"hawksbill\", \"grand_piano\", \"brain\", \"butterfly\", \"helicopter\", \"menorah\",\n", " \"trilobite\", \"starfish\", \"kangaroo\", \"sunflower\", \"ewer\", \"buddha\", \"scorpion\", \"revolver\", \"laptop\", \"ibis\", \"llama\",\n", " \"minaret\", \"umbrella\", \"electric_guitar\", \"crab\", \"crayfish\",]\n", "\n", "np.random.seed(42)\n", "train_ds,valid_ds = ImageDataset.from_folder(PATH, test_pct=0.2)\n", "\n", "x = train_ds[1114][0]\n", "def xi(): return Image(train_ds[1114][0])\n", "classes = train_ds.classes\n", "c = len(classes)\n", "\n", "len(train_ds),len(valid_ds),c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rectangular affine fix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(x, figsize=(6,3), hide_axis=False)\n", "print(x.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rot_m = np.array(rotate.func(40.)); rot_m" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rotate(xi(), 40.).show(figsize=(6,3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def affine_mult(c,m):\n", " if m is None: return c\n", " size = c.size()\n", " _,h,w,_ = size\n", " m[0,1] *= h/w\n", " m[1,0] *= w/h\n", " c = c.view(-1,2)\n", " c = torch.addmm(m[:2,2], c, m[:2,:2].t())\n", " return c.view(size)\n", "\n", "nb_002.affine_mult = affine_mult" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rotate(xi(), 40.).show(figsize=(6,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crop with padding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to add padding or crop automatically according to a desired final size. The best way to do this is to integrate both transforms into the same function. \n", "\n", "We will do the padding necessary to achieve a _size x size_ (square) image. If _size_ is greater than either the height or width dimension of our image, we know we will need to add padding. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Crop with padding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to add padding or crop automatically according to a desired final size. The best way to do this is to integrate both transforms into the same function.\n", "\n", "We will do the padding necessary to achieve a _size x size_ (square) image. If _size_ is greater than either the _height_ or _width_ of our image, we know we will need to add padding. If _size_ is smaller than either the _height_ or _width_ of our image, we will have to crop. We might have to do one, the other, both or neither. In the example below we are only adding padding, since both our _height_ and _width_ are smaller than 300, our desired dimension for the new _height_ and _width_.\n", "\n", "As is the case with our original function, we can pass a *row_pct* or *col_pct* to our transform to focus on different parts of the image instead of the center, which is the default.\n", "\n", "**Crop_pad**\n", "\n", "Crop_pad crops and pads our image to produce an output image of a given target size.\n", "\n", "_Parameters_\n", "\n", "1. **Size** The target size of each side in pixels. If only one number *s* is specified, the image is made square with dimensions *s* \\* *s*.\n", "\n", "    Domain: Positive integers.\n", "\n", "2. **Padding_mode** The type of padding used by the transform.\n", "\n", "    Domain: 'reflect', 'zeros', 'border'\n", "\n", "3. **Row_pct** Determines where the crop window sits vertically (which rows are left out): 0 keeps the top of the image, 1 keeps the bottom, and 0.5 keeps the center (varies linearly in between).\n", "\n", "    Domain: Real numbers between 0 and 1.\n", "\n", "4. **Col_pct** Determines where the crop window sits horizontally (which columns are left out): 0 keeps the left side of the image, 1 keeps the right side, and 0.5 keeps the center (varies linearly in between).\n", "\n", "    Domain: Real numbers between 0 and 1.\n", "\n", "Note: While experimenting, take into account that this example image has a thin black border in the original. This affects our transforms and is visible when we use reflect padding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class TfmCrop(TfmPixel): order=99\n", "\n", "@TfmCrop\n", "def crop_pad(x, size, padding_mode='reflect',\n", "             row_pct:uniform = 0.5, col_pct:uniform = 0.5):\n", "    size = listify(size,2)\n", "    rows,cols = size\n", "    # pad first if the image is smaller than the target in either dimension\n", "    if x.size(1)<rows or x.size(2)<cols:\n", "        row_pad = max((rows-x.size(1)+1)//2, 0)\n", "        col_pad = max((cols-x.size(2)+1)//2, 0)\n", "        x = F.pad(x[None], (col_pad,col_pad,row_pad,row_pad), mode=padding_mode)[0]\n", "    # then crop a (rows,cols) window positioned by row_pct/col_pct\n", "    row = int((x.size(1)-rows+1)*row_pct)\n", "    col = int((x.size(2)-cols+1)*col_pct)\n", "    x = x[:, row:row+rows, col:col+cols]\n", "    return x.contiguous()" ] },
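{ "cell_type": "markdown", "metadata": {}, "source": [ "A couple of quick demos of `crop_pad` (the target sizes and percentages here are illustrative choices, following the call pattern of the `rotate` demos above). First, padding up to 300 by 300 with the default reflect padding:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Both sides of this image are smaller than 300, so this only pads.\n", "crop_pad(xi(), 300).show(figsize=(6,6))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A 100x100 target forces a crop; row_pct=0, col_pct=0 keeps the top-left corner.\n", "crop_pad(xi(), 100, row_pct=0., col_pct=0.).show(figsize=(3,3))" ] },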
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def round_multiple(x, mult): return (int(x/mult+0.5)*mult)\n", "\n", "def get_crop_target(target_px, target_aspect=None, mult=32):\n", "    target_px = listify(target_px, 2)\n", "    target_r,target_c = target_px\n", "    if target_aspect:\n", "        target_r = math.sqrt(target_r*target_c/target_aspect)\n", "        target_c = target_r*target_aspect\n", "    return round_multiple(target_r,mult),round_multiple(target_c,mult)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop_target(220)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop_target((220,110))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "crop_target = get_crop_target(220, 2.)\n", "target_r,target_c = crop_target\n", "crop_target, target_r*target_c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,r,c = x.shape; x.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@partial(Transform, order=99)\n", "def crop_pad(img, size=None, mult=32, padding_mode=None,\n", "             row_pct:uniform = 0.5, col_pct:uniform = 0.5):\n", "    aspect = img.aspect if hasattr(img, 'aspect') else 1.\n", "    if not size and hasattr(img, 'size'): size = img.size\n", "    if not padding_mode:\n", "        if hasattr(img, 'sample_kwargs') and ('padding_mode' in img.sample_kwargs):\n", "            padding_mode = img.sample_kwargs['padding_mode']\n", "        else: padding_mode='reflect'\n", "    if padding_mode=='zeros': padding_mode='constant'\n", "\n", "    rows,cols = get_crop_target(size, aspect, mult=mult)\n", "    x = img.px\n", "    # pad first if the image is smaller than the target in either dimension\n", "    if x.size(1)<rows or x.size(2)<cols:\n", "        row_pad = max((rows-x.size(1)+1)//2, 0)\n", "        col_pad = max((cols-x.size(2)+1)//2, 0)\n", "        x = F.pad(x[None], (col_pad,col_pad,row_pad,row_pad), mode=padding_mode)[0]\n", "    # then crop a (rows,cols) window positioned by row_pct/col_pct\n", "    row = int((x.size(1)-rows+1)*row_pct)\n", "    col = int((x.size(2)-cols+1)*col_pct)\n", "    img.px = x[:, row:row+rows, col:col+cols].contiguous()\n", "    return img" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To resize our image towards a target size without distorting it, we first resize it to intermediate dimensions (h_m, w_m) that keep the input's aspect ratio, and then crop or pad to the target (h_o, w_o). To pick (h_m, w_m), we compute the ratio of each input dimension to its target, r_ratio = h_i/h_o and c_ratio = w_i/w_o, and divide both input dimensions by a single one of these ratios. If we want to **crop**, we will equate the dimension with **the smallest ratio** since that will guarantee that (h_m, w_m) >= (h_o, w_o) which is exactly what we want (a larger area). Conversely, if we want to **pad**, we will equate the dimension with **the largest ratio** since that will guarantee that (h_m, w_m) <= (h_o, w_o) (a smaller area).\n", "\n", "As an example, say we have an image with dimensions h_i = 192 and w_i = 128 and our target dimensions are h_o = 160, w_o = 320. That is, we have to turn a vertical rectangle into a horizontal rectangle. We can do this in two ways:\n", "\n", "1. Padding the borders so we make our image wider\n", "2. Cropping the top and bottom so we keep only a wide slice of our image\n", "\n", "If we intend to crop, our intermediate dimensions will be (h_m, w_m) = (480, 320). If we intend to pad, (h_m, w_m) = (160, 107). Note that 480/320 ≈ 160/107 ≈ 192/128; that is, our intermediate image's aspect ratio is always equal to our input image's aspect ratio." ] },
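{ "cell_type": "markdown", "metadata": {}, "source": [ "We can check the numbers from the example above directly (a quick illustrative computation, not part of the exported code):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Worked check of the example: 192x128 input, 160x320 target.\n", "h_i,w_i,h_o,w_o = 192,128,160,320\n", "r_ratio,c_ratio = h_i/h_o, w_i/w_o        # 1.2, 0.4\n", "# smallest ratio -> crop (intermediate >= target); largest -> pad (intermediate <= target)\n", "crop = round(h_i/min(r_ratio,c_ratio)), round(w_i/min(r_ratio,c_ratio))\n", "pad  = round(h_i/max(r_ratio,c_ratio)), round(w_i/max(r_ratio,c_ratio))\n", "crop, pad   # -> ((480, 320), (160, 107))" ] },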
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r_ratio = r/target_r\n", "c_ratio = c/target_c\n", "# min -> crop; max -> pad\n", "ratio = max(r_ratio,c_ratio)\n", "r_ratio,c_ratio,ratio" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2,c2 = round(r/ratio),round(c/ratio); r2,c2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def get_resize_target(img, crop_target, do_crop=False):\n", "    if crop_target is None: return None\n", "    ch,r,c = img.shape\n", "    target_r,target_c = crop_target\n", "    ratio = (min if do_crop else max)(r/target_r, c/target_c)\n", "    return ch,round(r/ratio),round(c/ratio)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_resize_target(x, crop_target, False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_resize_target(x, crop_target, True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@partial(Transform, order=TfmAffine.order-2)\n", "def resize_image(x, *args, **kwargs): return x.resize(*args, **kwargs)\n", "\n", "def _resize(self, size=None, do_crop=False, mult=32):\n", "    assert self._flow is None\n", "    if not size and hasattr(self, 'size'): size = self.size\n", "    aspect = self.aspect if hasattr(self, 'aspect') else None\n", "    crop_target = get_crop_target(size, aspect, mult=mult)\n", "    target = get_resize_target(self, crop_target, do_crop)\n", "    self.flow = affine_grid(target)\n", "    return self\n", "\n", "Image.resize=_resize" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = xi()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img.aspect = 2\n", "img.resize(220)\n", "img.show(figsize=(9,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = xi()\n", "img.aspect = 2\n", "img.resize(220, do_crop=True)\n", "img.show(figsize=(9,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def is_listy(x)->bool: return isinstance(x, (tuple,list))\n", "\n", "def apply_tfms(tfms, x, do_resolve=True, xtra=None, aspect=None, size=None,\n", "               padding_mode='reflect', **kwargs):\n", "    if not tfms: return x\n", "    if not xtra: xtra={}\n", "    tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)\n", "    if do_resolve: resolve_tfms(tfms)\n", "    x = Image(x.clone())\n", "    x.set_sample(padding_mode=padding_mode, **kwargs)\n", "    x.aspect = aspect\n", "    x.size = size\n", "\n", "    for tfm in tfms:\n", "        if tfm.tfm in xtra: x = tfm(x, **xtra[tfm.tfm])\n", "        else: x = tfm(x)\n", "    return x.px\n", "\n", "nb_002.apply_tfms = apply_tfms\n", "\n", "import nb_002b\n", "nb_002b.apply_tfms = apply_tfms" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [resize_image(size=crop_target),\n", "        rotate(degrees=(40.,40.))]\n", "\n", "img = apply_tfms(tfms, x)\n", "show_image(img, figsize=(6,3))\n", "crop_target,img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [resize_image(size=crop_target, do_crop=True),\n", "        rotate(degrees=(40.,40.))]\n", "\n", "img = apply_tfms(tfms, x, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] },
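{ "cell_type": "markdown", "metadata": {}, "source": [ "To make the pad/crop choice easy to compare, here is a small side-by-side figure (an illustrative extra, reusing the transforms defined above): the left image is padded to reach the target, the right one is cropped." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Same target (size=220, aspect 2); only do_crop differs.\n", "_,axes = plt.subplots(1,2, figsize=(12,3))\n", "for ax,do_crop in zip(axes.flat, [False,True]):\n", "    tfms = [resize_image(size=220, do_crop=do_crop), crop_pad(size=220)]\n", "    show_image(apply_tfms(tfms, x, aspect=2), ax)" ] },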
"img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [resize_image(size=220),\n", " rotate(degrees=(40.,40.))]\n", "\n", "img = apply_tfms(tfms, x, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "get_crop_target(220, 2),img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), crop_pad(size=220)]\n", "\n", "img = apply_tfms(tfms, x, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)),\n", " resize_image(),\n", " crop_pad()]\n", "\n", "img = apply_tfms(tfms, x, aspect=2, size=220)\n", "show_image(img, figsize=(6,3))\n", "get_crop_target(220,2), img.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def resize_crop(size=None, do_crop=False, mult=32, rand_crop=False):\n", " crop_kw = {'row_pct':(0,1.),'col_pct':(0,1.)} if rand_crop else {}\n", " return [resize_image(size=size, do_crop=do_crop, mult=mult),\n", " crop_pad(size=size, mult=mult, **crop_kw)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), *resize_crop()]\n", "\n", "img = apply_tfms(tfms, x, aspect=2, size=220)\n", "show_image(img, figsize=(6,3))\n", "get_crop_target(220,2), img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), *resize_crop(do_crop=True)]\n", "img = apply_tfms(tfms, x, size=220, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), *resize_crop(do_crop=False)]\n", "img = apply_tfms(tfms, x, size=220, aspect=2, padding_mode='zeros')\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how our transforms look for different values of zoom, rotate and crop_pad." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transform" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def rand_zoom(*args, **kwargs): return zoom(*args, row_pct=(0,1), col_pct=(0,1), **kwargs)\n", "def rand_crop(*args, **kwargs): return crop_pad(*args, row_pct=(0,1), col_pct=(0,1), **kwargs)\n", "def zoom_crop(scale, do_rand=False, p=1.0):\n", "    zoom_fn = rand_zoom if do_rand else zoom\n", "    crop_fn = rand_crop if do_rand else crop_pad\n", "    return [zoom_fn(scale=scale, p=p), crop_fn()]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [\n", "    rotate(degrees=(-20,20.)),\n", "    rand_zoom(scale=(1.,1.95)),\n", "    *resize_crop(size=100, rand_crop=True, do_crop=False)\n", "]\n", "\n", "_,axes = plt.subplots(1,4, figsize=(12,3))\n", "for ax in axes.flat:\n", "    show_image(apply_tfms(tfms, x, padding_mode='zeros'), ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [\n", "    rotate(degrees=(-20,20.)),\n", "    rand_zoom(scale=(1.,1.95)),\n", "    *resize_crop(size=100, rand_crop=True, do_crop=True)\n", "]\n", "\n", "_,axes = plt.subplots(1,4, figsize=(12,3))\n", "for ax in axes.flat:\n", "    show_image(apply_tfms(tfms, x), ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, with our choice of transforms and parameters, we are going to fit our Darknet model and check our results. To fit the model we need to resize all our images to a common size so we can feed them to it in batches. We face the same decisions as before.\n", "\n", "In this case we choose to crop our images by passing do_crop=True to resize_crop in the transforms below (the default, do_crop=False, would pad instead).\n", "\n", "We also decided to make our images square, with dimensions size x size. If we wanted a rectangle with width-to-height ratio *a*, we could have passed aspect=*a* to train_tds."
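] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before fitting, here is a quick sanity check of the `zoom_crop` helper defined above, which the training cells below don't otherwise exercise (the scale and size here are illustrative):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative: random zoom up to 1.5x, then a random crop to a 150-pixel target.\n", "_,axes = plt.subplots(1,4, figsize=(12,3))\n", "for ax in axes.flat:\n", "    show_image(apply_tfms(zoom_crop(scale=(1.,1.5), do_rand=True), x, size=150), ax)"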
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[PIL.Image.open(fn).size for fn in np.random.choice(train_ds.x, 5)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "size = 150" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_tfms = [\n", " rotate(degrees=(-20,20.)),\n", " rand_zoom(scale=(1.,1.5)),\n", " *resize_crop(size=size, rand_crop=True, do_crop=True)\n", "]\n", "valid_tfms = [\n", " *resize_crop(size=size, rand_crop=False, do_crop=True)\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(1,4, figsize=(10,5))\n", "for ax in axes.flat: show_image(apply_tfms(train_tfms, x), ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(apply_tfms(valid_tfms, x, size=size))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bs = 128" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_tds = DatasetTfm(valid_ds, valid_tfms, padding_mode='zeros')\n", "data = DataBunch(valid_tds, valid_tds, bs=bs, num_workers=0)\n", "xb,yb = next(iter(data.train_dl))\n", "b = xb.transpose(1,0).reshape(3,-1)\n", "data_mean=b.mean(1).cpu()\n", "data_std=b.std(1).cpu()\n", "data_mean,data_std" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image_batch(data.train_dl, train_ds.classes, 4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_tds = DatasetTfm(valid_ds, valid_tfms, padding_mode='zeros')\n", "train_tds = DatasetTfm(train_ds, train_tfms, padding_mode='zeros')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "norm,denorm = normalize_funcs(data_mean,data_std)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = DataBunch(train_tds, valid_tds, bs=bs, num_workers=12, tfms=norm)\n", "len(data.train_dl),len(data.valid_dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = Darknet([1, 2, 4, 4, 2], num_classes=c, nf=16)\n", "learn = Learner(data, model)\n", "opt_fn = partial(optim.SGD, momentum=0.9)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(1, 0.1, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(1, 0.2, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(5, 0.4, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(5, 0.1, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(5, 0.01, opt_fn=opt_fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fin" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }