{ "cells": [ { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%reload_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "from nb_002b import *" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "DATA_PATH = Path('data')\n", "PATH = DATA_PATH/'cifar10_dog_air'\n", "TRAIN_PATH = PATH/'train'\n", "\n", "train_ds = ImageDataset.from_folder(PATH/'train')\n", "valid_ds = ImageDataset.from_folder(PATH/'test')\n", "\n", "x = train_ds[1][0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Perspective warping" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Perspective warping is about bending or curving the image. This allows us to create multiple new images which contain the object but seen from a different perspective, thus enriching our original dataset.\n", "Mathematically, warping is finding the function that maps one arbitrary 2D quadrilateral into another. \n", "\n", "We will apply the transformation:\n", "\n", "`\n", "(x',y') = ((a*x + b*y + c) / (g*x + h*y + 1), (d*x + e*y + f) / (g*x + h*y + 1))\n", "`\n", "\n", "to the coordinates, where (a,b,c,d,e,f,g,h) are 8 coefficients we need to find. To do this we solve a system of 8 equations given by where we want to send four points (with two coordinates each). Usually we use the four corners of the picture.\n", "\n", "The derivation of how to find what the values we need using PW is explained with mathematical detail [here](https://web.archive.org/web/20150222120106/xenia.media.mit.edu/~cwren/interpolator/) and [here](http://www.math.ubc.ca/~cass/graphics/Perspective.pdf)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "Point=Tuple[float,float]\n", "Points=Collection[Point]\n", "def find_coeffs(orig_pts:Points, targ_pts:Points)->Tensor:\n", " \"Find 8 coeff mentioned [here](https://web.archive.org/web/20150222120106/xenia.media.mit.edu/~cwren/interpolator/)\"\n", " matrix = []\n", " #The equations we'll need to solve.\n", " for p1, p2 in zip(targ_pts, orig_pts):\n", " matrix.append([p1[0], p1[1], 1, 0, 0, 0, -p2[0]*p1[0], -p2[0]*p1[1]])\n", " matrix.append([0, 0, 0, p1[0], p1[1], 1, -p2[1]*p1[0], -p2[1]*p1[1]])\n", "\n", " A = FloatTensor(matrix)\n", " B = FloatTensor(orig_pts).view(8)\n", " #The 8 scalars we seek are solution of AX = B\n", " return torch.gesv(B,A)[0][:,0]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "orig_pts1 = [[-1,-1], [1,-1], [-1,1], [1,1]]\n", "targ_pts1 = [[-1,-1], [1,-0.5], [-1,1], [1,1]]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "coeffs1 = find_coeffs(orig_pts1, targ_pts1); coeffs1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Apply perspective" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After we find the coefficients we have to perform the transformation to get the new coordinates. Remember we have to do\n", "`\n", "(x',y') = ((a*x + b*y + c) / (g*x + h*y + 1), (d*x + e*y + f) / (g*x + h*y + 1))\n", "`\n", "to a lot of (x,y) coordinates that will be in a matrix c. \n", "\n", "Let's say c is of shape N \\* 2 to make it simple. 
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Apply perspective" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "After we find the coefficients, we have to perform the transformation to get the new coordinates. Remember we have to apply\n", "`\n", "(x',y') = ((a*x + b*y + c) / (g*x + h*y + 1), (d*x + e*y + f) / (g*x + h*y + 1))\n", "`\n", "to a lot of (x,y) coordinates stored in a matrix c.\n", "\n", "Let's say c is of shape N \* 2 to make it simple. If we add ones to the second dimension to make the matrix N \* 3, and if we rewrite the coeffs as a matrix\n", "`\n", "[[a,b,c], [d,e,f], [g,h,1]]\n", "`\n", "then the matrix product c @ coeffs.t() will be N \* 3, and it will be\n", "`\n", "[ax + by + c, dx + ey + f, gx + hy + 1]\n", "`\n", "With this matrix, we just need to divide the first two columns by the last one to get the new coordinates." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def add_ones(coords):\n", "    #Flatten to N * 2, then append a column of ones to get N * 3.\n", "    coords = coords.view(-1,2)\n", "    ones = torch.ones(coords.size(0)).unsqueeze(1)\n", "    return torch.cat([coords, ones], 1)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def apply_perspective(coords, coeffs):\n", "    size = coords.size()\n", "    #compress all the dims except the last one and add ones: coords becomes N * 3\n", "    coords = add_ones(coords)\n", "    #Turn the coeffs into a 3*3 matrix with a 1 at the bottom right\n", "    coeffs = torch.cat([coeffs, FloatTensor([1])]).view(3,3)\n", "    coords = torch.mm(coords, coeffs.t())\n", "    coords.mul_(1/coords[:,2].unsqueeze(1))\n", "    return coords[:,:2].view(size)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "m = torch.eye(3)[:2]\n", "coords = F.affine_grid(m[None], torch.Size((1,) + x.shape))" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit res = apply_perspective(coords, coeffs1)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = apply_perspective(coords, coeffs1)\n", "y = F.grid_sample(x.data[None], res)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The top right corner should be lowered by one quarter of the image." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "show_image(y[0])" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x.show()" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def apply_perspective(coords:FlowField, coeffs:Points)->FlowField:\n", "    \"Transform `coords` with `coeffs`\"\n", "    size = coords.size()\n", "    #compress all the dims except the last one: coords becomes N * 2\n", "    coords = coords.view(-1,2)\n", "    #Turn the coeffs into a 3*3 matrix with a 1 at the bottom right\n", "    coeffs = torch.cat([coeffs, FloatTensor([1])]).view(3,3)\n", "    coords = torch.addmm(coeffs[:,2], coords, coeffs[:,:2].t())\n", "    coords.mul_(1/coords[:,2].unsqueeze(1))\n", "    return coords[:,:2].view(size)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%timeit res = apply_perspective(coords, coeffs1)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "This version is a bit faster: instead of adding the ones, we compute `coords @ (first two columns).t() + (last column)` in a single `addmm` call." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "res = apply_perspective(coords, coeffs1)\n", "y = F.grid_sample(x.data[None], res)\n", "show_image(y[0])" ] },
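{ "cell_type": "markdown", "metadata": {}, "source": [ "As another quick check (an illustrative cell, not part of the exported module; the names `full_m` and `explicit` are local to this demo), we can confirm that the `addmm` shortcut produces the same flow field as the explicit add-ones-then-matrix-product formulation described above." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Illustrative check, not exported: append ones, multiply by the full 3*3 matrix,\n", "#divide by the last column, and compare with the addmm-based apply_perspective.\n", "full_m = torch.cat([coeffs1, FloatTensor([1])]).view(3,3)\n", "explicit = torch.mm(add_ones(coords), full_m.t())\n", "explicit = explicit[:,:2] / explicit[:,2].unsqueeze(1)\n", "(explicit.view(coords.size()) - apply_perspective(coords, coeffs1)).abs().max()" ] },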
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Transforms" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The first thing we can try is moving each of the corners by a different amount. We implement this by adding a uniformly distributed random magnitude to each of our four corners.\n", "\n", "**Perspective warp**\n", "\n", "Maps one 2D quadrilateral into another one by modifying the four coordinates that define it.\n", "\n", "_Parameters_\n", "\n", "1. **Magnitude** Displacement of each coordinate.\n", "\n", "    Domain: Real numbers.\n", "   \n", "2. **Img_size** Size of our image in pixels.\n", "\n", "    Domain: Positive integers." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def perspective_warp(c, img_size, magnitude:uniform=0):\n", "    magnitude = magnitude.view(4,2)\n", "    orig_pts = [[-1,-1], [-1,1], [1,-1], [1,1]]\n", "    targ_pts = [[x+m for x,m in zip(xs, ms)] for xs, ms in zip(orig_pts, magnitude)]\n", "    coeffs = find_coeffs(orig_pts, targ_pts)\n", "    return apply_perspective(c, coeffs)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "_orig_pts = [[-1,-1], [-1,1], [1,-1], [1,1]]\n", "\n", "def _perspective_warp(c:FlowField, targ_pts:Points):\n", "    \"Apply the warp that sends `_orig_pts` to `targ_pts` to the `c` `FlowField`\"\n", "    return apply_perspective(c, find_coeffs(_orig_pts, targ_pts))\n", "\n", "@TfmCoord\n", "def perspective_warp(c, img_size, magnitude:partial(uniform,size=8)=0):\n", "    \"Apply warp to `c` with size `img_size` and `magnitude` amount\"\n", "    magnitude = magnitude.view(4,2)\n", "    targ_pts = [[x+m for x,m in zip(xs, ms)] for xs, ms in zip(_orig_pts, magnitude)]\n", "    return _perspective_warp(c, targ_pts)\n", "\n", "@TfmCoord\n", "def symmetric_warp(c, img_size, magnitude:partial(uniform,size=4)=0):\n", "    \"Apply symmetric warp to `c` with size `img_size` and `magnitude` amount\"\n", "    m = listify(magnitude, 4)\n", "    targ_pts = [[-1-m[3],-1-m[1]], [-1-m[2],1+m[1]], [1+m[3],-1-m[0]], [1+m[2],1+m[0]]]\n", "    return _perspective_warp(c, targ_pts)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfm = perspective_warp(magnitude=(-0.4,0.4))\n", "_,axes = plt.subplots(4,4, figsize=(12,12))\n", "for ax in axes.flatten():\n", "    apply_tfms(tfm, x, padding_mode='zeros').show(ax)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "tfm = symmetric_warp(magnitude=(-0.4,0.4))\n", "_,axes = plt.subplots(4,4, figsize=(12,12))\n", "for ax in axes.flatten():\n", "    apply_tfms(tfm, x, padding_mode='zeros').show(ax)" ] },
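{ "cell_type": "markdown", "metadata": {}, "source": [ "To see how `symmetric_warp` pairs the corners, here is a small illustrative cell (not part of the exported module) that prints the target corners it would build for a hand-picked magnitude `mag`: with only the first value non-zero, the two corners at x = 1 are pushed apart symmetrically while the other two stay put." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#Illustrative cell, not exported: the target corners symmetric_warp builds for a\n", "#hand-picked magnitude. Only mag[0] is non-zero, so only the two x = 1 corners move.\n", "mag = [0.4, 0., 0., 0.]\n", "[[-1-mag[3],-1-mag[1]], [-1-mag[2],1+mag[1]], [1+mag[3],-1-mag[0]], [1+mag[2],1+mag[0]]]" ] },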
{ "cell_type": "markdown", "metadata": {}, "source": [ "To be a bit less messy, we split perspective warps into two types: tilts and skews. A tilt changes the perspective from which we see the image, while a skew moves one corner only.\n", "\n", "**Tilt**\n", "\n", "Changes the perspective from which we see the image (on the left, right, top or bottom).\n", "\n", "_Parameters_\n", "\n", "1. **Magnitude** Displacement of each coordinate.\n", "\n", "    Domain: Real numbers.\n", "   \n", "2. **Direction** Direction in which the image must be tilted. 0 stands for right, 1 stands for left, 2 stands for top and 3 stands for bottom.\n", "\n", "    Domain: [0, 1, 2, 3]\n", "   \n", "3. **Img_size** Size of our image in pixels.\n", "\n", "    Domain: Positive integers.\n", "   \n", "**Skew**\n", "\n", "Skew warps the image by changing one of the coordinates of one corner.\n", "\n", "_Parameters_\n", "\n", "1. **Magnitude** Displacement of each coordinate.\n", "\n", "    Domain: Real numbers.\n", "   \n", "2. **Direction** Direction in which the image should be skewed.\n", "\n", "    Domain: [0, 1, 2, 3, 4, 5, 6, 7]\n", "   \n", "3. **Img_size** Size of our image in pixels.\n", "\n", "    Domain: Positive integers." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "#export\n", "def rand_int(low:int,high:int)->int: return random.randint(low, high)\n", "\n", "@TfmCoord\n", "def tilt(c, img_size, direction:rand_int, magnitude:uniform=0):\n", "    \"Tilt `c` field and resize to `img_size` with random `direction` and `magnitude`\"\n", "    orig_pts = [[-1,-1], [-1,1], [1,-1], [1,1]]\n", "    if direction == 0:   targ_pts = [[-1,-1], [-1,1], [1,-1-magnitude], [1,1+magnitude]]\n", "    elif direction == 1: targ_pts = [[-1,-1-magnitude], [-1,1+magnitude], [1,-1], [1,1]]\n", "    elif direction == 2: targ_pts = [[-1,-1], [-1-magnitude,1], [1,-1], [1+magnitude,1]]\n", "    elif direction == 3: targ_pts = [[-1-magnitude,-1], [-1,1], [1+magnitude,-1], [1,1]]\n", "    coeffs = find_coeffs(orig_pts, targ_pts)\n", "    return apply_perspective(c, coeffs)\n", "\n", "@TfmCoord\n", "def skew(c, img_size, direction:rand_int, magnitude:uniform=0):\n", "    \"Skew `c` field and resize to `img_size` with random `direction` and `magnitude`\"\n", "    orig_pts = [[-1,-1], [-1,1], [1,-1], [1,1]]\n", "    if direction == 0:   targ_pts = [[-1-magnitude,-1], [-1,1], [1,-1], [1,1]]\n", "    elif direction == 1: targ_pts = [[-1,-1-magnitude], [-1,1], [1,-1], [1,1]]\n", "    elif direction == 2: targ_pts = [[-1,-1], [-1-magnitude,1], [1,-1], [1,1]]\n", "    elif direction == 3: targ_pts = [[-1,-1], [-1,1+magnitude], [1,-1], [1,1]]\n", "    elif direction == 4: targ_pts = [[-1,-1], [-1,1], [1+magnitude,-1], [1,1]]\n", "    elif direction == 5: targ_pts = [[-1,-1], [-1,1], [1,-1-magnitude], [1,1]]\n", "    elif direction == 6: targ_pts = [[-1,-1], [-1,1], [1,-1], [1+magnitude,1]]\n", "    elif direction == 7: targ_pts = [[-1,-1], [-1,1], [1,-1], [1,1+magnitude]]\n", "    coeffs = find_coeffs(orig_pts, targ_pts)\n", "    return apply_perspective(c, coeffs)" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = train_ds[1][0]\n", "x.shape" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "img = open_image(DATA_PATH/'caltech101/airplanes/image_0054.jpg')\n", "img.show()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The four deterministic tilts, going to the back of the image on the first row, and to the front on the second one." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(2,4, figsize=(12,3))\n", "for i,ax in enumerate(axes.flatten()):\n", "    magns = [-0.4,0.4]\n", "    tilt(img.clone(), direction=i%4, magnitude=magns[i//4]\n", "        ).set_sample(padding_mode='zeros').show(ax)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "The 8 types of skew, again back or front." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "_,axes = plt.subplots(4,4, figsize=(12,6))\n", "for i,ax in enumerate(axes.flatten()):\n", "    magns = [-0.4,0.4]\n", "    skew(img.clone(), direction=i%8, magnitude=magns[i//8]\n", "        ).set_sample(padding_mode='zeros').show(ax)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Fin" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 2 }