{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from nb_003 import *\n", "import nb_002\n", "\n", "import operator\n", "from random import sample\n", "from torch.utils.data.sampler import Sampler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA_PATH = Path('data')\n", "PATH = DATA_PATH/'caltech101' # http://www.vision.caltech.edu/Image_Datasets/Caltech101/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Caltech 101" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create validation set" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The first step will be to create a dataset from our files. We need to separate a definite amount of files to be used as our validation set. We will do this randomly by setting a percentage apart, in this case 0.2." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classes = [\"airplanes\", \"Motorbikes\", \"BACKGROUND_Google\", \"Faces\", \"watch\", \"Leopards\", \"bonsai\",\n", " \"car_side\", \"ketch\", \"chandelier\", \"hawksbill\", \"grand_piano\", \"brain\", \"butterfly\", \"helicopter\", \"menorah\",\n", " \"trilobite\", \"starfish\", \"kangaroo\", \"sunflower\", \"ewer\", \"buddha\", \"scorpion\", \"revolver\", \"laptop\", \"ibis\", \"llama\",\n", " \"minaret\", \"umbrella\", \"electric_guitar\", \"crab\", \"crayfish\",]\n", "\n", "np.random.seed(42)\n", "train_ds,valid_ds = ImageDataset.from_folder(PATH, test_pct=0.2)\n", "\n", "x = train_ds[1114][0]\n", "def xi(): return Image(train_ds[1114][0])\n", "classes = train_ds.classes\n", "c = len(classes)\n", "\n", "len(train_ds),len(valid_ds),c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Rectangular affine fix" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(x, figsize=(6,3), hide_axis=False)\n", "print(x.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rot_m = np.array(rotate.func(40.)); rot_m" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rotate(xi(), 40.).show(figsize=(6,3))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def affine_mult(c,m):\n", " if m is None: return c\n", " size = c.size()\n", " _,h,w,_ = size\n", " m[0,1] *= h/w\n", " m[1,0] *= w/h\n", " c = c.view(-1,2)\n", " c = torch.addmm(m[:2,2], c, m[:2,:2].t())\n", " return c.view(size)\n", "\n", "nb_002.affine_mult = affine_mult" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rotate(xi(), 40.).show(figsize=(6,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Crop with padding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to add padding or crop automatically according to a desired final size. The best way to do this is to integrate both transforms into the same function. \n", "\n", "We will do the padding necessary to achieve a _size x size_ (square) image. If _size_ is greater than either the height or width dimension of our image, we know we will need to add padding. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Crop with padding" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we are going to add padding or crop automatically according to a desired final size. The best way to do this is to integrate both transforms into the same function.\n", "\n", "We will do the padding necessary to achieve a _size x size_ (square) image. If _size_ is greater than either the _height_ or _width_ of our image, we know we will need to add padding. If _size_ is smaller than either the _height_ or _width_ of our image, we will have to crop. We might have to do one, the other, both or neither. In the example below we are only adding padding, since both our _height_ and _width_ are smaller than 300, our desired dimension for the new _height_ and _width_.\n", "\n", "As is the case with our original function, we can pass a *row_pct* or *col_pct* to our transform to focus on different parts of the image instead of the center, which is the default.\n", "\n", "**Crop_pad**\n", "\n", "Crop_pad crops and pads our image to produce an output image of a given target size.\n", "\n", "_Parameters_\n", "\n", "1. **Size** The target size of each side in pixels. If only one number *s* is specified, the image is made square with dimensions *s* \\* *s*.\n", "\n", "    Domain: Positive integers.\n", "\n", "2. **Padding_mode** The type of padding used by the transform.\n", "\n", "    Domain: 'reflect', 'zeros', 'border'\n", "\n", "3. **Row_pct** Determines where the crop window sits vertically (which rows are left out): 0 keeps the top of the image, 1 keeps the bottom, and 0.5 keeps the center (varies linearly in between).\n", "\n", "    Domain: Real numbers between 0 and 1.\n", "\n", "4. **Col_pct** Determines where the crop window sits horizontally (which columns are left out): 0 keeps the left side of the image, 1 keeps the right side, and 0.5 keeps the center (varies linearly in between).\n", "\n", "    Domain: Real numbers between 0 and 1.\n", "\n", "Note: While experimenting, take into account that this example image has a thin black border in the original. This affects our transforms and is visible when we use reflect padding." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class TfmCrop(TfmPixel): order=99\n", "\n", "@TfmCrop\n", "def crop_pad(x, size, padding_mode='reflect',\n", "             row_pct:uniform = 0.5, col_pct:uniform = 0.5):\n", "    size = listify(size,2)\n", "    rows,cols = size\n", "    # pad first if the image is smaller than the target in either dimension\n", "    if x.size(1)<rows or x.size(2)<cols:\n", "        row_pad = max((rows-x.size(1)+1)//2, 0)\n", "        col_pad = max((cols-x.size(2)+1)//2, 0)\n", "        x = F.pad(x[None], (col_pad,col_pad,row_pad,row_pad), mode=padding_mode)[0]\n", "    # then crop a (rows,cols) window positioned by row_pct/col_pct\n", "    row = int((x.size(1)-rows+1)*row_pct)\n", "    col = int((x.size(2)-cols+1)*col_pct)\n", "    x = x[:, row:row+rows, col:col+cols]\n", "    return x.contiguous()" ] },
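{ "cell_type": "markdown", "metadata": {}, "source": [ "A couple of quick demos of `crop_pad` (the target sizes and percentages here are illustrative choices, following the call pattern of the `rotate` demos above). First, padding up to 300 by 300 with the default reflect padding:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Both sides of this image are smaller than 300, so this only pads.\n", "crop_pad(xi(), 300).show(figsize=(6,6))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# A 100x100 target forces a crop; row_pct=0, col_pct=0 keeps the top-left corner.\n", "crop_pad(xi(), 100, row_pct=0., col_pct=0.).show(figsize=(3,3))" ] },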
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def round_multiple(x, mult): return (int(x/mult+0.5)*mult)\n", "\n", "def get_crop_target(target_px, target_aspect=None, mult=32):\n", "    target_px = listify(target_px, 2)\n", "    target_r,target_c = target_px\n", "    if target_aspect:\n", "        target_r = math.sqrt(target_r*target_c/target_aspect)\n", "        target_c = target_r*target_aspect\n", "    return round_multiple(target_r,mult),round_multiple(target_c,mult)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop_target(220)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop_target((220,110))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "crop_target = get_crop_target(220, 2.)\n", "target_r,target_c = crop_target\n", "crop_target, target_r*target_c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,r,c = x.shape; x.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@partial(Transform, order=99)\n", "def crop_pad(img, size=None, mult=32, padding_mode=None,\n", "             row_pct:uniform = 0.5, col_pct:uniform = 0.5):\n", "    aspect = img.aspect if hasattr(img, 'aspect') else 1.\n", "    if not size and hasattr(img, 'size'): size = img.size\n", "    if not padding_mode:\n", "        if hasattr(img, 'sample_kwargs') and ('padding_mode' in img.sample_kwargs):\n", "            padding_mode = img.sample_kwargs['padding_mode']\n", "        else: padding_mode='reflect'\n", "    if padding_mode=='zeros': padding_mode='constant'\n", "\n", "    rows,cols = get_crop_target(size, aspect, mult=mult)\n", "    x = img.px\n", "    # pad first if the image is smaller than the target in either dimension\n", "    if x.size(1)<rows or x.size(2)<cols:\n", "        row_pad = max((rows-x.size(1)+1)//2, 0)\n", "        col_pad = max((cols-x.size(2)+1)//2, 0)\n", "        x = F.pad(x[None], (col_pad,col_pad,row_pad,row_pad), mode=padding_mode)[0]\n", "    # then crop a (rows,cols) window positioned by row_pct/col_pct\n", "    row = int((x.size(1)-rows+1)*row_pct)\n", "    col = int((x.size(2)-cols+1)*col_pct)\n", "    img.px = x[:, row:row+rows, col:col+cols].contiguous()\n", "    return img" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To resize our image towards a target size without distorting it, we first resize it to intermediate dimensions (h_m, w_m) that keep the input's aspect ratio, and then crop or pad to the target (h_o, w_o). To pick (h_m, w_m), we compute the ratio of each input dimension to its target, r_ratio = h_i/h_o and c_ratio = w_i/w_o, and divide both input dimensions by a single one of these ratios. If we want to **crop**, we will equate the dimension with **the smallest ratio** since that will guarantee that (h_m, w_m) >= (h_o, w_o) which is exactly what we want (a larger area). Conversely, if we want to **pad**, we will equate the dimension with **the largest ratio** since that will guarantee that (h_m, w_m) <= (h_o, w_o) (a smaller area).\n", "\n", "As an example, say we have an image with dimensions h_i = 192 and w_i = 128 and our target dimensions are h_o = 160, w_o = 320. That is, we have to turn a vertical rectangle into a horizontal rectangle. We can do this in two ways:\n", "\n", "1. Padding the borders so we make our image wider\n", "2. Cropping the top and bottom so we keep only a wide slice of our image\n", "\n", "If we intend to crop, our intermediate dimensions will be (h_m, w_m) = (480, 320). If we intend to pad, (h_m, w_m) = (160, 107). Note that 480/320 ≈ 160/107 ≈ 192/128; that is, our intermediate image's aspect ratio is always equal to our input image's aspect ratio." ] },
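{ "cell_type": "markdown", "metadata": {}, "source": [ "We can check the numbers from the example above directly (a quick illustrative computation, not part of the exported code):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Worked check of the example: 192x128 input, 160x320 target.\n", "h_i,w_i,h_o,w_o = 192,128,160,320\n", "r_ratio,c_ratio = h_i/h_o, w_i/w_o        # 1.2, 0.4\n", "# smallest ratio -> crop (intermediate >= target); largest -> pad (intermediate <= target)\n", "crop = round(h_i/min(r_ratio,c_ratio)), round(w_i/min(r_ratio,c_ratio))\n", "pad  = round(h_i/max(r_ratio,c_ratio)), round(w_i/max(r_ratio,c_ratio))\n", "crop, pad   # -> ((480, 320), (160, 107))" ] },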
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r_ratio = r/target_r\n", "c_ratio = c/target_c\n", "# min -> crop; max -> pad\n", "ratio = max(r_ratio,c_ratio)\n", "r_ratio,c_ratio,ratio" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "r2,c2 = round(r/ratio),round(c/ratio); r2,c2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def get_resize_target(img, crop_target, do_crop=False):\n", "    if crop_target is None: return None\n", "    ch,r,c = img.shape\n", "    target_r,target_c = crop_target\n", "    ratio = (min if do_crop else max)(r/target_r, c/target_c)\n", "    return ch,round(r/ratio),round(c/ratio)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_resize_target(x, crop_target, False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_resize_target(x, crop_target, True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@partial(Transform, order=TfmAffine.order-2)\n", "def resize_image(x, *args, **kwargs): return x.resize(*args, **kwargs)\n", "\n", "def _resize(self, size=None, do_crop=False, mult=32):\n", "    assert self._flow is None\n", "    if not size and hasattr(self, 'size'): size = self.size\n", "    aspect = self.aspect if hasattr(self, 'aspect') else None\n", "    crop_target = get_crop_target(size, aspect, mult=mult)\n", "    target = get_resize_target(self, crop_target, do_crop)\n", "    self.flow = affine_grid(target)\n", "    return self\n", "\n", "Image.resize=_resize" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = xi()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img.aspect = 2\n", "img.resize(220)\n", "img.show(figsize=(9,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = xi()\n", "img.aspect = 2\n", "img.resize(220, do_crop=True)\n", "img.show(figsize=(9,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def is_listy(x)->bool: return isinstance(x, (tuple,list))\n", "\n", "def apply_tfms(tfms, x, do_resolve=True, xtra=None, aspect=None, size=None,\n", "               padding_mode='reflect', **kwargs):\n", "    if not tfms: return x\n", "    if not xtra: xtra={}\n", "    tfms = sorted(listify(tfms), key=lambda o: o.tfm.order)\n", "    if do_resolve: resolve_tfms(tfms)\n", "    x = Image(x.clone())\n", "    x.set_sample(padding_mode=padding_mode, **kwargs)\n", "    x.aspect = aspect\n", "    x.size = size\n", "\n", "    for tfm in tfms:\n", "        if tfm.tfm in xtra: x = tfm(x, **xtra[tfm.tfm])\n", "        else: x = tfm(x)\n", "    return x.px\n", "\n", "nb_002.apply_tfms = apply_tfms\n", "\n", "import nb_002b\n", "nb_002b.apply_tfms = apply_tfms" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [resize_image(size=crop_target),\n", "        rotate(degrees=(40.,40.))]\n", "\n", "img = apply_tfms(tfms, x)\n", "show_image(img, figsize=(6,3))\n", "crop_target,img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [resize_image(size=crop_target, do_crop=True),\n", "        rotate(degrees=(40.,40.))]\n", "\n", "img = apply_tfms(tfms, x, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] },
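{ "cell_type": "markdown", "metadata": {}, "source": [ "To make the pad/crop choice easy to compare, here is a small side-by-side figure (an illustrative extra, reusing the transforms defined above): the left image is padded to reach the target, the right one is cropped." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Same target (size=220, aspect 2); only do_crop differs.\n", "_,axes = plt.subplots(1,2, figsize=(12,3))\n", "for ax,do_crop in zip(axes.flat, [False,True]):\n", "    tfms = [resize_image(size=220, do_crop=do_crop), crop_pad(size=220)]\n", "    show_image(apply_tfms(tfms, x, aspect=2), ax)" ] },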
"img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [resize_image(size=220),\n", " rotate(degrees=(40.,40.))]\n", "\n", "img = apply_tfms(tfms, x, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "get_crop_target(220, 2),img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), crop_pad(size=220)]\n", "\n", "img = apply_tfms(tfms, x, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)),\n", " resize_image(),\n", " crop_pad()]\n", "\n", "img = apply_tfms(tfms, x, aspect=2, size=220)\n", "show_image(img, figsize=(6,3))\n", "get_crop_target(220,2), img.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def resize_crop(size=None, do_crop=False, mult=32, rand_crop=False):\n", " crop_kw = {'row_pct':(0,1.),'col_pct':(0,1.)} if rand_crop else {}\n", " return [resize_image(size=size, do_crop=do_crop, mult=mult),\n", " crop_pad(size=size, mult=mult, **crop_kw)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), *resize_crop()]\n", "\n", "img = apply_tfms(tfms, x, aspect=2, size=220)\n", "show_image(img, figsize=(6,3))\n", "get_crop_target(220,2), img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), *resize_crop(do_crop=True)]\n", "img = apply_tfms(tfms, x, size=220, aspect=2)\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [rotate(degrees=(40.,40.)), *resize_crop(do_crop=False)]\n", "img = apply_tfms(tfms, x, size=220, aspect=2, padding_mode='zeros')\n", "show_image(img, figsize=(6,3))\n", "img.shape" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's see how our transforms look for different values of zoom, rotate and crop_pad." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Transform" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def rand_zoom(*args, **kwargs): return zoom(*args, row_pct=(0,1), col_pct=(0,1), **kwargs)\n", "def rand_crop(*args, **kwargs): return crop_pad(*args, row_pct=(0,1), col_pct=(0,1), **kwargs)\n", "def zoom_crop(scale, do_rand=False, p=1.0):\n", "    zoom_fn = rand_zoom if do_rand else zoom\n", "    crop_fn = rand_crop if do_rand else crop_pad\n", "    return [zoom_fn(scale=scale, p=p), crop_fn()]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [\n", "    rotate(degrees=(-20,20.)),\n", "    rand_zoom(scale=(1.,1.95)),\n", "    *resize_crop(size=100, rand_crop=True, do_crop=False)\n", "]\n", "\n", "_,axes = plt.subplots(1,4, figsize=(12,3))\n", "for ax in axes.flat:\n", "    show_image(apply_tfms(tfms, x, padding_mode='zeros'), ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfms = [\n", "    rotate(degrees=(-20,20.)),\n", "    rand_zoom(scale=(1.,1.95)),\n", "    *resize_crop(size=100, rand_crop=True, do_crop=True)\n", "]\n", "\n", "_,axes = plt.subplots(1,4, figsize=(12,3))\n", "for ax in axes.flat:\n", "    show_image(apply_tfms(tfms, x), ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fit" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, with our choice of transforms and parameters, we are going to fit our Darknet model and check our results. To fit the model we need to resize all our images to a common size so we can feed them to it in batches. We face the same decisions as before.\n", "\n", "In this case we choose to crop our images by passing do_crop=True to resize_crop in the transforms below (the default, do_crop=False, would pad instead).\n", "\n", "We also decided to make our images square, with dimensions size x size. If we wanted a rectangle with width-to-height ratio *a*, we could have passed aspect=*a* to train_tds."
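] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before fitting, here is a quick sanity check of the `zoom_crop` helper defined above, which the training cells below don't otherwise exercise (the scale and size here are illustrative):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Illustrative: random zoom up to 1.5x, then a random crop to a 150-pixel target.\n", "_,axes = plt.subplots(1,4, figsize=(12,3))\n", "for ax in axes.flat:\n", "    show_image(apply_tfms(zoom_crop(scale=(1.,1.5), do_rand=True), x, size=150), ax)"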
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "[PIL.Image.open(fn).size for fn in np.random.choice(train_ds.x, 5)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "size = 150" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_tfms = [\n", " rotate(degrees=(-20,20.)),\n", " rand_zoom(scale=(1.,1.5)),\n", " *resize_crop(size=size, rand_crop=True, do_crop=True)\n", "]\n", "valid_tfms = [\n", " *resize_crop(size=size, rand_crop=False, do_crop=True)\n", "]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(1,4, figsize=(10,5))\n", "for ax in axes.flat: show_image(apply_tfms(train_tfms, x), ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(apply_tfms(valid_tfms, x, size=size))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bs = 128" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_tds = DatasetTfm(valid_ds, valid_tfms, padding_mode='zeros')\n", "data = DataBunch(valid_tds, valid_tds, bs=bs, num_workers=0)\n", "xb,yb = next(iter(data.train_dl))\n", "b = xb.transpose(1,0).reshape(3,-1)\n", "data_mean=b.mean(1).cpu()\n", "data_std=b.std(1).cpu()\n", "data_mean,data_std" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image_batch(data.train_dl, train_ds.classes, 4)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "valid_tds = DatasetTfm(valid_ds, valid_tfms, padding_mode='zeros')\n", "train_tds = DatasetTfm(train_ds, train_tfms, padding_mode='zeros')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "norm,denorm = normalize_funcs(data_mean,data_std)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = DataBunch(train_tds, valid_tds, bs=bs, num_workers=12, tfms=norm)\n", "len(data.train_dl),len(data.valid_dl)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = Darknet([1, 2, 4, 4, 2], num_classes=c, nf=16)\n", "learn = Learner(data, model)\n", "opt_fn = partial(optim.SGD, momentum=0.9)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(1, 0.1, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(1, 0.2, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(5, 0.4, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(5, 0.1, opt_fn=opt_fn)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "learn.fit(5, 0.01, opt_fn=opt_fn)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Fin" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }