{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "Copyright (c) Microsoft Corporation. All rights reserved.\n", "\n", "Licensed under the MIT License." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Image annotation UI" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Open-source annotation tools for object detection and for image segmentation exist, however for image classification are less common. When there is only one object per image, labeling can be done by moving images manually into separate folders for each image class. This stategy however is manual, and does not work when it's possible to have multiple different objects in a single image. For such cases, either this notebook can be used, or e.g. this cloud-based [labeling tool](https://docs.microsoft.com/en-us/azure/machine-learning/service/how-to-label-images).\n", "\n", "This notebook provides a simple UI to assist in labeling images. Each image can be annotated with one or more classes or be marked as \"Exclude\" to indicate that the image should not be used for model training or evaluation. " ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Ensure edits to libraries are loaded and plotting is shown in the notebook.\n", "%reload_ext autoreload\n", "%autoreload 2\n", "%matplotlib inline" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "import os\n", "import sys\n", "\n", "import scrapbook as sb\n", "\n", "sys.path.append(\"../../\")\n", "from utils_cv.classification.widget import AnnotationWidget\n", "from utils_cv.classification.data import Urls\n", "from utils_cv.common.data import unzip_url" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Set the location of the images to annotate and path to save the annotations. 
Here `unzip_url` is used to download the example data, if not already present, and to set the path.\n", "\n", "See [FAQ.md](../FAQ.md) for a brief discussion on how to scrape images from the internet." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [ "parameters" ] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Using images in directory: /data/home/pabuehle/Desktop/ComputerVision/data/fridgeObjectsTiny/can.\n" ] } ], "source": [ "IM_DIR = os.path.join(unzip_url(Urls.fridge_objects_tiny_path, exist_ok=True), 'can')\n", "ANNO_PATH = \"cvbp_ic_annotation.txt\"\n", "print(f\"Using images in directory: {IM_DIR}.\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Start the UI. Check the \"Allow multi-class labeling\" box to allow images to be annotated with multiple classes. When in doubt about what the annotation for an image should be, or when an image should be discarded for any other reason (e.g. blur or over-exposure), mark it as \"EXCLUDE\". All annotations are saved to (and loaded from) the file specified by `anno_path`. Note that the toy dataset in this notebook only contains images of cans. 
\n", "\n", "" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "67d4f7c8d01d4eacbd437b0c1637bd6e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Tab(children=(VBox(children=(HBox(children=(Button(description='Previous', layout=Layout(width='80px'), style=…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "w_anno_ui = AnnotationWidget(\n", " labels=[\"can\", \"carton\", \"milk_bottle\", \"water_bottle\"],\n", " im_dir=IM_DIR,\n", " anno_path=ANNO_PATH,\n", " im_filenames=None, # Set to None to annotate all images in IM_DIR\n", ")\n", "\n", "display(w_anno_ui.show())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Below is an example how to create a fast.ai `ImageList` object using the ground truth annotations generated by the `AnnotationWidget`. Fast.ai does not support the `Exclude` flag, hence we handle this by removing these images before calling the `from_df()` and `label_from_df()` functions. \n", "\n", "For this example, we create a toy annotation file at `example_annotation.csv` rather than using `ANNO_PATH`. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Overwriting example_annotation.csv\n" ] } ], "source": [ "%%writefile example_annotation.csv\n", "IM_FILENAME\tEXCLUDE\tLABELS\n", "10.jpg\tFalse\tcan\n", "12.jpg\tFalse\tcan,carton\n", "13.jpg\tTrue\t\n", "14.jpg\tFalse\tcarton\n", "15.jpg\tFalse\tcarton,milk_bottle\n", "18.jpg\tFalse\tcan\n", "19.jpg\tTrue\t\n", "20.jpg\tFalse\tcan" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
namelabel
010.jpgcan
112.jpgcan,carton
214.jpgcarton
315.jpgcarton,milk_bottle
418.jpgcan
520.jpgcan
\n", "
" ], "text/plain": [ " name label\n", "0 10.jpg can\n", "1 12.jpg can,carton\n", "2 14.jpg carton\n", "3 15.jpg carton,milk_bottle\n", "4 18.jpg can\n", "5 20.jpg can" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "LabelLists;\n", "\n", "Train: LabelList (3 items)\n", "x: ImageList\n", "Image (3, 665, 499),Image (3, 665, 499),Image (3, 665, 499)\n", "y: MultiCategoryList\n", "carton,carton;milk_bottle,can\n", "Path: /data/home/pabuehle/Desktop/ComputerVision/data/fridgeObjectsTiny/can;\n", "\n", "Valid: LabelList (3 items)\n", "x: ImageList\n", "Image (3, 665, 499),Image (3, 665, 499),Image (3, 665, 499)\n", "y: MultiCategoryList\n", "can,can,can;carton\n", "Path: /data/home/pabuehle/Desktop/ComputerVision/data/fridgeObjectsTiny/can;\n", "\n", "Test: None\n" ] } ], "source": [ "import pandas as pd\n", "\n", "from fastai.vision import ImageList, ImageDataBunch\n", "\n", "\n", "# Load annotation, discard excluded images, and convert to format fast.ai expects\n", "data = []\n", "with open(\"example_annotation.csv\", \"r\") as f:\n", " for line in f.readlines()[1:]:\n", " vec = line.strip().split(\"\\t\")\n", " exclude = vec[1] == \"True\"\n", " if not exclude and len(vec) > 2:\n", " data.append((vec[0], vec[2]))\n", "\n", "df = pd.DataFrame(data, columns=[\"name\", \"label\"])\n", "display(df)\n", "\n", "data = (\n", " ImageList.from_df(path=IM_DIR, df=df)\n", " .split_by_rand_pct(valid_pct=0.5)\n", " .label_from_df(cols=\"label\", label_delim=\",\")\n", ")\n", "print(data)" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/scrapbook.scrap.json+json": { "data": 6, "encoder": "json", "name": "num_images", "version": 1 } }, "metadata": { "scrapbook": { "data": true, "display": false, "name": "num_images" } }, "output_type": "display_data" } ], "source": [ "# Preserve some of the notebook outputs\n", "num_images = len(data.valid) + len(data.train)\n", 
"sb.glue(\"num_images\", num_images)" ] } ], "metadata": { "kernelspec": { "display_name": "Python (cv)", "language": "python", "name": "cv" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.8" } }, "nbformat": 4, "nbformat_minor": 2 }