{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from nb_002 import *\n", "\n", "import operator\n", "from random import sample\n", "from torch.utils.data.sampler import Sampler" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA_PATH = Path('data')\n", "PATH = DATA_PATH/'caltech101'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Caltech 101" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "classes = [\"airplanes\", \"Motorbikes\", \"BACKGROUND_Google\", \"Faces\", \"watch\", \"Leopards\", \"bonsai\",\n", "           \"car_side\", \"ketch\", \"chandelier\", \"hawksbill\", \"grand_piano\", \"brain\", \"butterfly\", \"helicopter\", \"menorah\",\n", "           \"trilobite\", \"starfish\", \"kangaroo\", \"sunflower\", \"ewer\", \"buddha\", \"scorpion\", \"revolver\", \"laptop\", \"ibis\", \"llama\",\n", "           \"minaret\", \"umbrella\", \"electric_guitar\", \"crab\", \"crayfish\",]\n", "\n", "np.random.seed(42)\n", "train_ds,valid_ds = ImageDataset.from_folder(PATH, test_pct=0.2)\n", "\n", "x = train_ds[1][0]\n", "classes = train_ds.classes\n", "c = len(classes)\n", "\n", "len(train_ds),len(valid_ds),c" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(x, figsize=(6,3), hide_axis=False)\n", "print(x.shape)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "rot_m = np.array(rotate(40.)); rot_m" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def affine_grid(x, matrix, size=None):\n", "    h,w = x.shape[1:]\n", "    if size is None: size=x.shape\n", "    # the grid uses normalized coordinates, so rescale the off-diagonal terms by the aspect ratio\n", "    matrix[0,1] *= h/w; matrix[1,0] *= w/h\n", 
"    return F.affine_grid(matrix[None,:2], torch.Size((1,)+size))\n", "\n", "import nb_002\n", "nb_002.affine_grid = affine_grid" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(apply_affine(rot_m)(x), figsize=(6,3))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The goal is to replicate the [RandomResizedCrop function](https://pytorch.org/docs/stable/torchvision/transforms.html#torchvision.transforms.RandomResizedCrop) from torchvision. First we take a crop of the picture with a random size and aspect ratio, and then we resize it (via interpolation) to the desired output size.\n", "\n", "The _scale_ parameter sets the fraction of the original area that the cropped image should cover, and _ratio_ sets the aspect ratio it should have. Together these two parameters determine a new height and width, and we then randomize where the crop sits in the image (how much we cut from the top, bottom, left and right borders to reach those dimensions). If either the new width or the new height is bigger than the original, we discard those dimensions, since we would otherwise have to pad. We make 10 attempts to find a valid width and height; if none works, we fall back to a centered square crop by cutting equally from both ends of the longer side." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_crop(t, scale, ratio):\n", "    for attempt in range(10):\n", "        area = t.size(1) * t.size(2)\n", "        target_area = random.uniform(*scale) * area\n", "        aspect_ratio = random.uniform(*ratio)\n", "\n", "        w = int(round(math.sqrt(target_area * aspect_ratio)))\n", "        h = int(round(math.sqrt(target_area / aspect_ratio)))\n", "\n", "        if random.random() < 0.5: w, h = h, w # swap width and height half the time (the invert step used later)\n", "\n", "        if w <= t.size(2) and h <= t.size(1):\n", "            i = random.randint(0, t.size(1) - h)\n", "            j = random.randint(0, t.size(2) - w)\n", "            return np.s_[:,i:i+h,j:j+w]\n", "\n", "    # Fallback: centered square crop (h is reset too, otherwise it keeps the last rejected value)\n", "    h = w = min(t.size(1), t.size(2))\n", "    i = (t.size(1) - w) // 2\n", "    j = (t.size(2) - w) // 2\n", "    return np.s_[:,i:i+h,j:j+w]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop(x, (0.08,1.), (3./4.,4./3.))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(4,4,figsize=(8,8))\n", "for ax in axs.flatten():\n", "    crop_slice = get_crop(x, (0.08,1.), (3./4.,4./3.))\n", "    y = x[crop_slice]\n", "    y = F.interpolate(y[None], size=(224,224), mode='bilinear')\n", "    show_image(y[0], ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. With a start tfm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "One way to implement this transformation in the pipeline is to create a transform of type start and then run the rest of the pipeline with the target size.\n", "\n", "Note that inside the function scale, ratio and invert are arrays, with one entry for each attempt made before the fallback. 
Also note that after this transform we use interpolation (in this case bilinear interpolation) to resize the output image to a standard size.\n", "\n", "**Crop_with_ratio**\n", "\n", "Crop_with_ratio is a transform that crops a region out of an image so that the result has a target area and a target aspect ratio.\n", "\n", "_Parameters_\n", "\n", "1. **Scale** The fraction of the original area that the cropped image should cover.\n", "\n", "   Domain: Real numbers from 0 to 1.\n", "\n", "2. **Ratio** The aspect ratio that the cropped image should have.\n", "\n", "   Domain: Positive real numbers.\n", "\n", "3. **Invert** Whether to swap the target number of rows and the target number of columns implied by *Scale* and *Ratio*.\n", "\n", "   Domain: 0 or 1.\n", "\n", "4. **Row_pct** Determines where the image is cut vertically (which rows are left out at the top and bottom). If <0.5, more rows are cut from the bottom than from the top, and vice versa (varies linearly).\n", "\n", "   Domain: Real numbers between 0 and 1.\n", "\n", "5. **Col_pct** Determines where the image is cut horizontally (which columns are left out on the left and right). If <0.5, more columns are cut from the right than from the left, and vice versa (varies linearly).\n", "\n", "   Domain: Real numbers between 0 and 1.\n", "\n", "These parameters determine which rows and columns are kept, where target_height and target_width are the dimensions implied by *Scale*, *Ratio* and *Invert*:\n", "\n", "1. `output_rows = [row_pct*(input_rows-target_height) : target_height+row_pct*(input_rows-target_height)]`\n", "\n", 
"2. `output_cols = [col_pct*(input_cols-target_width) : target_width+col_pct*(input_cols-target_width)]`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@reg_transform\n", "def crop_with_ratio(x, scale:uniform, ratio:uniform, invert:rand_bool, row_pct:uniform, col_pct:uniform) -> TfmType.Start:\n", "    #scale, ratio and invert are supposed to have a size corresponding to the number of attempts before fallback.\n", "    for s,r,i in zip(scale, ratio, invert):\n", "        area = x.size(1) * x.size(2)\n", "        target_area = area * s\n", "        cols = int(round(math.sqrt(target_area * r)))\n", "        rows = int(round(math.sqrt(target_area / r)))\n", "\n", "        if i: cols,rows = rows,cols\n", "\n", "        if cols <= x.size(2) and rows <= x.size(1):\n", "            row = int((x.size(1)-rows+1)*row_pct)\n", "            col = int((x.size(2)-cols+1)*col_pct)\n", "            return x[:, row:row+rows, col:col+cols].contiguous()\n", "    # Fallback\n", "    rows = min(x.size(1), x.size(2))\n", "    row = (x.size(1) - rows) // 2\n", "    col = (x.size(2) - rows) // 2\n", "    return x[:, row:row+rows, col:col+rows].contiguous()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random_resized_crop = crop_with_ratio_tfm(scale=(0.08,1.,10), ratio=(0.75,1.33,10),invert=(0.5,10),\n", "                                          row_pct=(0,1.), col_pct=(0,1.))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(4,4,figsize=(8,12))\n", "for ax in axs.flatten():\n", "    #Crop\n", "    y = random_resized_crop()(x)\n", "    #Then resize to the output size.\n", "    y = F.interpolate(y[None], size=(224,224), mode='bilinear')\n", "    show_image(y[0], ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(4,4,figsize=(8,12))\n", "for ax in axs.flatten():\n", "    y = apply_tfms([random_resized_crop])(x, size=(3,224,224))\n", "    show_image(y, ax)" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "## 2. The affine way" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Not working yet**" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Cropping to a specific ratio is just an affine transformation that zooms in and squishes the picture in a given way. The randomness of the crop position corresponds to a zoom center different from (0,0). So all of this (including the interpolation step) can be replaced with a single affine transformation, which can then be combined with others, such as a rotation.\n", "\n", "**Zoom_squish**\n", "\n", "Zoom_squish squishes an image and then performs a random zoom into the image.\n", "\n", "_Parameters_\n", "\n", "1. **Scale** / **TODO**\n", "\n", "   Domain: Real numbers from 0 to 1.\n", "\n", "2. **Squish** / **TODO**\n", "\n", "   Domain: Positive real numbers.\n", "\n", "3. **Invert** / **TODO**\n", "\n", "   Domain: True or False.\n", "\n", "4. **Row_pct** / **TODO**\n", "\n", "   Domain: Real numbers between 0 and 1.\n", "\n", "5. **Col_pct** / **TODO**\n", "\n", "   Domain: Real numbers between 0 and 1." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@reg_affine\n", "def zoom_squish(scale: uniform = 1.0, squish: uniform=1.0, invert: rand_bool = False, \n", "                row_pct:uniform = 0.5, col_pct:uniform = 0.5) -> TfmType.Affine:\n", "    for s,r, i in zip(scale,squish, invert):\n", "        s,r = math.sqrt(s),math.sqrt(r)\n", "        if s * r <= 1 and s / r < 1:\n", "            w,h = (s/r, s*r) if i else (s*r,s/r)\n", "            col_c = (1-w) * (2*col_pct - 1)\n", "            row_c = (1-h) * (2*row_pct - 1)\n", "            return [[w, 0, col_c],\n", "                    [0, h, row_c],\n", 
"                    [0, 0, 1.   ]]\n", "    return [[1, 0, 0.],\n", "            [0, 1, 0.],\n", "            [0, 0, 1.]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "random_resized_crop = zoom_squish_tfm(scale=(0.08,1.,10), squish=(0.75,1.33, 10), invert=(0.5,10), row_pct=(0,1.), col_pct=(0,1.))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fig, axs = plt.subplots(4,4,figsize=(8,12))\n", "for ax in axs.flatten():\n", "    #Crop\n", "    y = apply_tfms([random_resized_crop])(x, size=(3,224,224))\n", "    show_image(y, ax)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Deterministic RandomResizedCrop" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Finally, we test the validity of our implementations. If they are right, a hand-written deterministic crop, our original crop_with_ratio and zoom_squish should all produce the same transformation when given the same parameters." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.size()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "area = x.size(1) * x.size(2)\n", "target_area = 0.5 * area\n", "aspect_ratio = 0.8\n", "w = int(round(math.sqrt(target_area * aspect_ratio)))\n", "h = int(round(math.sqrt(target_area / aspect_ratio)))\n", "w,h" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def crop_v1(img):\n", "    area = img.size(1) * img.size(2)\n", "    target_area = 0.5 * area\n", "    aspect_ratio = 0.8\n", "\n", "    w = int(round(math.sqrt(target_area * aspect_ratio)))\n", "    h = int(round(math.sqrt(target_area / aspect_ratio)))\n", "\n", "    w, h = h, w\n", "\n", "    i = int(0.2 * (img.size(1) - h))\n", "    j = int(0.4 * (img.size(2) - w))\n", "    x = img[:,i:i+h, j:j+w]\n", "    return F.interpolate(x[None], size=(224,224), mode='bilinear')" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "show_image(crop_v1(x)[0])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def crop_v2(img):\n", "    x = crop_with_ratio(img, [0.5], [0.8], [True], 0.2, 0.4)\n", "    x = F.interpolate(x[None], size=(224,224), mode='bilinear')\n", "    return x[0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(crop_v2(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orig_ratio = math.sqrt(x.size(2)/x.size(1))\n", "orig_ratio" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def get_zoom_mat(sw, sh, c, r):\n", "    return [[sw, 0, c],\n", "            [0, sh, r],\n", "            [0, 0, 1.]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "@reg_affine\n", "def zoom_squish1(scale: uniform = 1.0, squish: uniform=1.0, invert: rand_bool = False, \n", "                 row_pct:uniform = 0.5, col_pct:uniform = 0.5) -> TfmType.Affine:\n", "    for s,r, i in zip(scale,squish, invert):\n", "        s,r = math.sqrt(s),math.sqrt(r)\n", "        if s * r <= 1 and s / r < 1:\n", "            w,h = (s/r, s*r) if i else (s*r,s/r)\n", "            w /= orig_ratio\n", "            h *= orig_ratio\n", "            col_c = (1-w) * (2*col_pct - 1)\n", "            row_c = (1-h) * (2*row_pct - 1)\n", "            return get_zoom_mat(w, h, col_c, row_c)\n", "\n", "    if orig_ratio > 1: return get_zoom_mat(1/orig_ratio**2, 1, 0, 0.)\n", "    else: return get_zoom_mat(1, orig_ratio**2, 0, 0.)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def crop_v3(img):\n", "    return apply_affine(zoom_squish1([0.5], [0.8], [True], 0.2, 0.4))(img, size=(3,224,224))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(crop_v3(x))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { 
"kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }