{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Rectangular data loader" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from nb_003a import *\n", "\n", "from itertools import groupby" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA_PATH = Path('data')\n", "PATH = DATA_PATH/'caltech101'\n", "\n", "np.random.seed(42)\n", "train_ds,valid_ds = ImageDataset.from_folder(PATH, test_pct=0.2)\n", "\n", "x = train_ds[-1][0]\n", "classes = train_ds.classes\n", "c = len(classes)\n", "\n", "len(train_ds),len(valid_ds),c" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Closest ntile" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Since we are dealing with different sized rectangular images we need to standarize size to be able to train our network. We will start by comparing the dimension ratio for each of the images in our dataset. The dimension ratio is the height of our image divided by its width. Our final objective is to group images by dimension ratio and then standarize the dimensions for all images in the same group. This is useful because it means that when training we will be able to feed our network a standarized batch of images in each iteration. This does not mean all batches have to have the same dimensions (in fact, we will have a number of distinct dimensions equal to the number of distinct groups) but we do need images in a single batch to have the same dimensions. \n", "\n", "In the following example, we chose to divide our images in 5 groups so we defined 5 percentiles to do this: 2, 20, 50, 80, 98." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(train_ds[1][0], figsize=(6,3))\n", "x.shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "asp_ratios = [operator.truediv(*PIL.Image.open(fn).size) for fn in train_ds.fns]\n", "asp_ratios[:4]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "asp_ntiles = np.percentile(asp_ratios, [2,20,50,80,98])\n", "asp_ntiles" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def closest_ntile(aspect, ntiles):\n", " return ntiles[np.argmin(abs(log(aspect)-log(ntiles)))]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "aspect = x.shape[2]/x.shape[1]\n", "nearest_aspect = closest_ntile(aspect, asp_ntiles)\n", "aspect,nearest_aspect" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop_target(128, nearest_aspect)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SortAspectBatchSampler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now sort our images by aspect ratio and then group them by percentile group. We will build a Batch Sampler that samples images from the same group in a random way.\n", "\n", "Notice that our SortAspectBatchSampler returns the image number and its group aspect ratio. When we create our DataLoader, this extra parameter will be fed into our DatasetTfm function. 
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "get_crop_target(128, nearest_aspect)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## SortAspectBatchSampler" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will now sort our images by aspect ratio and group them by nearest percentile. We will then build a batch sampler that randomly samples images from within a single group.\n", "\n", "Notice that our SortAspectBatchSampler yields, for each image, its index together with its group's aspect ratio. When we create our DataLoader, this extra parameter is passed through to our DatasetTfm wrapper (defined below). It enables the *crop_pad* transform defined in *003a_rect_images* to crop or pad each image to its group's aspect ratio, so that we end up with only 5 distinct aspect ratios in our dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "asp_nearests = [closest_ntile(o, asp_ntiles) for o in asp_ratios]\n", "asp_nearests[:10]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "bs=32" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sort_nearest = sorted(enumerate(asp_nearests), key=itemgetter(1))\n", "sort_nearest[:5]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "groups = [[(a,{'aspect':b}) for a,b in o] for _,o in groupby(sort_nearest, key=itemgetter(1))]\n", "len(groups)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "groups[0][:4]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sum(math.ceil(len(g)/bs) for g in groups)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# TODO: use actual AR for shuffle=False, and use mean AR in __iter__" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "@dataclass\n", "class SortAspectBatchSampler(Sampler):\n", "    ds:Dataset; bs:int; shuffle:bool = False\n", "\n", "    def __post_init__(self):\n", "        # assign every image to its nearest percentile group, once, up front\n", "        asp_ratios = [operator.truediv(*PIL.Image.open(img).size) for img in self.ds.fns]\n", "        asp_ntiles = np.percentile(asp_ratios, [2,20,50,80,98])\n", "        asp_nearests = [closest_ntile(o, asp_ntiles) for o in asp_ratios]\n", "        sort_nearest = sorted(enumerate(asp_nearests), key=itemgetter(1))\n", "        self.groups = [[(a,{'aspect':b}) for a,b in o]\n", "                       for _,o in groupby(sort_nearest, key=itemgetter(1))]\n", "        self.n = sum(math.ceil(len(g)/self.bs) for g in self.groups)\n", "\n", "    def __len__(self): return self.n\n", "\n", "    def __iter__(self):\n", "        # shuffle within each group, then shuffle the order of the batches themselves\n", "        if self.shuffle: groups = [sample(group, len(group)) for group in self.groups]\n", "        else: groups = self.groups\n", "        batches = [group[i:i+self.bs] for group in groups for i in range(0, len(group), self.bs)]\n", "        if self.shuffle: batches = sample(batches, len(batches))\n", "        return iter(batches)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "next(iter(SortAspectBatchSampler(train_ds, 4)))" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "it = iter(SortAspectBatchSampler(train_ds, 4, True))\n", "next(it),next(it)" ] },
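{ "cell_type": "markdown", "metadata": {}, "source": [ "As a quick sanity check (a throwaway sketch, not part of the exported module), we can verify that every batch the sampler yields contains images from a single aspect-ratio group, and that the number of batches matches *len(sampler)*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "sampler = SortAspectBatchSampler(train_ds, 4, shuffle=True)\n", "batches = list(sampler)\n", "assert len(batches) == len(sampler)\n", "# each element of a batch is an (index, {'aspect': ...}) pair; a batch must hold one aspect only\n", "assert all(len({xtra['aspect'] for _,xtra in b}) == 1 for b in batches)\n", "len(batches)" ] },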
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Rectangular dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will first build a DatasetTfm class that applies the transforms to our raw images. We will then define a DataBunch class whose *create* method returns DataLoaders that load our transformed data in batches. *create* integrates with our SortAspectBatchSampler so that all images in one batch are transformed to the same dimensions (and therefore the same aspect ratio)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "class DatasetTfm(Dataset):\n", "    def __init__(self, ds: Dataset, tfms: Collection[Callable] = None, **kwargs):\n", "        self.ds,self.tfms,self.kwargs = ds,tfms,kwargs\n", "\n", "    def __len__(self): return len(self.ds)\n", "    def __getattr__(self, k): return getattr(self.ds, k)\n", "\n", "    def __getitem__(self,idx):\n", "        # a batch sampler may pass (index, extra-kwargs); the extras override self.kwargs\n", "        if isinstance(idx, tuple): idx,xtra = idx\n", "        else: xtra={}\n", "        x,y = self.ds[idx]\n", "        return apply_tfms(self.tfms, x, **{**self.kwargs, **xtra}), y" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def rand_zoom_crop(scale, size):\n", "    return [rand_zoom(scale=scale), rand_crop(size=size)]\n", "\n", "def zoom_crop(scale, size):\n", "    return [zoom(scale=scale), crop(size=size)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_tfms = [rotate(degrees=(-20,20.)),\n", "              *rand_zoom_crop(scale=(1.,2.), size=150)]\n", "valid_tfms = [crop_pad(size=150)]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_tds = DatasetTfm(train_ds, train_tfms)\n", "valid_tds = DatasetTfm(valid_ds, valid_tfms)\n", "\n", "xtra = {'size':100}\n", "train_tds[(1,xtra)][0].shape" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(2,2, figsize=(8,6))\n", "for i,ax in enumerate(axes.flat): show_image(valid_tds[i][0], ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(2,2, figsize=(8,6))\n", "for ax in axes.flat: show_image(train_tds[1][0], ax, hide_axis=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(2,2, figsize=(8,6))\n", "for i,ax in enumerate(axes.flat):\n", "    im = train_tds[(i, xtra)][0]\n", "    print(im.shape)\n", "    show_image(im, ax, hide_axis=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "train_tds = DatasetTfm(train_ds, train_tfms, size=100, padding_mode='zeros')\n", "valid_tds = DatasetTfm(valid_ds, valid_tfms, size=100, padding_mode='zeros')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "len(train_tds), len(valid_tds)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(2,2, figsize=(8,6))\n", "for i,ax in enumerate(axes.flat):\n", "    im = train_tds[i][0]\n", "    print(im.shape)\n", "    show_image(im, ax, hide_axis=False)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "class DataBunch():\n", "    def __init__(self, train_dl, valid_dl, device=None, **kwargs):\n", "        self.device = default_device if device is None else device\n", "        self.train_dl = DeviceDataLoader(train_dl, device=self.device, **kwargs)\n", "        self.valid_dl = DeviceDataLoader(valid_dl, device=self.device, **kwargs)\n", "\n", "    @classmethod\n", "    def create(cls, train_ds, valid_ds, bs=64, device=None, num_workers=4, progress_func=tqdm,\n", "               train_tfm=None, valid_tfm=None, sample_func=None, dl_tfms=None, **kwargs):\n", "        # wrap the raw datasets so the transforms are applied lazily, per item\n", "        if train_tfm is not None: train_ds = DatasetTfm(train_ds, train_tfm, **kwargs)\n", "        if valid_tfm is not None: valid_ds = DatasetTfm(valid_ds, valid_tfm, **kwargs)\n", "        if sample_func is None:\n", "            train_dl = DataLoader(train_ds, bs, shuffle=True, num_workers=num_workers)\n", "            valid_dl = DataLoader(valid_ds, bs*2, shuffle=False, num_workers=num_workers)\n", "        else:\n", "            train_samp = sample_func(train_ds, bs, True)\n", "            valid_samp = sample_func(valid_ds, bs*2, False)\n", "            train_dl = DataLoader(train_ds, num_workers=num_workers, batch_sampler=train_samp)\n", "            valid_dl = DataLoader(valid_ds, num_workers=num_workers, batch_sampler=valid_samp)\n", "        return cls(train_dl, valid_dl, device, tfms=dl_tfms, progress_func=progress_func)\n", "\n", "    @property\n", "    def train_ds(self): return self.train_dl.dl.dataset\n", "    @property\n", "    def valid_ds(self): return self.valid_dl.dl.dataset" ] },
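{ "cell_type": "markdown", "metadata": {}, "source": [ "How does the extra *{'aspect': ...}* dict travel from the batch sampler to *DatasetTfm.__getitem__*? A PyTorch DataLoader built with a *batch_sampler* passes every element of a batch straight to the dataset as an index, so a tuple rides along untouched. Below is a minimal, self-contained sketch of that mechanism; *EchoDataset* is hypothetical, purely for illustration." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "class EchoDataset(Dataset):\n", "    def __len__(self): return 10\n", "    def __getitem__(self, idx):\n", "        # mirror DatasetTfm: unpack an (index, extra-kwargs) tuple if we get one\n", "        if isinstance(idx, tuple): idx,xtra = idx\n", "        else: xtra = {}\n", "        return idx, xtra.get('aspect', 1.)\n", "\n", "dl = DataLoader(EchoDataset(), batch_sampler=[[(0,{'aspect':2.}), (1,{'aspect':2.})]])\n", "next(iter(dl))" ] },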
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = DataBunch.create(train_ds, valid_ds, bs, num_workers=8,\n", "                        train_tfm=train_tfms, valid_tfm=valid_tfms, size=100, padding_mode='zeros')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x,y = next(iter(data.train_dl))\n", "print(x[0].shape)\n", "\n", "_,axes = plt.subplots(2,4, figsize=(9,3))\n", "for i,ax in enumerate(axes.flat): show_image(x[i], ax)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data = DataBunch.create(train_ds, valid_ds, bs, num_workers=8, sample_func=SortAspectBatchSampler,\n", "                        train_tfm=train_tfms, valid_tfm=valid_tfms, padding_mode='zeros')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x,y = next(iter(data.train_dl))\n", "print(x[0].shape)\n", "_,axes = plt.subplots(2,4, figsize=(12,4))\n", "for i,ax in enumerate(axes.flat): show_image(x[i], ax, hide_axis=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## fin" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }