{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# ipyannotate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A Jupyter Notebook widget for checking and annotating data. A simplified version of Prodigy for Jupyter Notebook.\n", "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Examples of annotation tasks that can be solved using `ipyannotate`:\n", "- Assigning one or more topics to a group of news articles\n", "- Filtering out toxic reviews from your dataset\n", "- Marking adult content in a set of screenshots\n", "- Counting the errors in a NER markup, grouped by type\n", "\n", "Examples of tasks for which `ipyannotate` is not suitable:\n", "- Drawing masks or selecting background vs foreground\n", "- Detecting eyes, ears, nose, etc. in a set of portraits\n", "- Other tasks that involve graphically annotating an image\n", "- Marking up spans in a text with names, locations, organizations\n", "\n", "`ipyannotate` is well suited to assigning one or several labels to each object in a set (pictures, texts, anything that Jupyter can display).\n", "\n", "A number of tools do the same thing, for example Prodigy and Yandex.Toloka. `ipyannotate` is a Jupyter widget, which makes it convenient for three reasons:\n", "\n", "1. Input, output\n", "\n", "Often the whole process of working with data happens inside Jupyter Notebook. It is inconvenient to export the data to a JSON or XML file, send it to another service, then download and parse the results. With `ipyannotate` you can annotate the data directly in Jupyter:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2a35d9287a28406c8b2dd8e4e5a75363", "version_major": 2, "version_minor": 0 }, "text/html": [ "

\n" ], "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, , , , , , , ]), toolbar=Toolbar(buttons=[ValueButton(color='gray', icon=None, label='even'), ValueButton(color='gray', icon=None, label='odd'), BackButton(color='gray', icon='← ', label='back', shortcut='k'), NextButton(color='gray', icon='→ ', label='next', shortcut='j')]))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from ipyannotate import annotate\n", "from ipyannotate.buttons import (\n", " ValueButton as Button,\n", " NextButton as Next,\n", " BackButton as Back\n", ")\n", "\n", "data = [1, 15, 62, 33, 83, 12, 949, 71]\n", "annotation = annotate(data, buttons=[Button('even'), Button('odd'), Back(), Next()])\n", "annotation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Task(output=1, value=odd),\n", " Task(output=15, value=odd),\n", " Task(output=62, value=even),\n", " Task(output=33, value=odd),\n", " Task(output=83, value=odd),\n", " Task(output=12, value=even),\n", " Task(output=949, value=odd),\n", " Task(output=71, value=odd)]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annotation.tasks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2. Data presentation\n", "\n", "Jupyter has a convenient data presentation system. If the result of a command is a plot, the user sees the picture rather than the object's text representation, as a regular console would show. `ipyannotate` uses the same mechanism, so you can, for example, annotate images directly:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "b49dcd1bd9d049e59dac94d52014b7cb", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from glob import glob\n", "from PIL import Image\n", "\n", "data = [Image.open(path) for path in glob('i/dogs_cats/*.jpg')]\n", "annotation = annotate(data, buttons=[Button('dog'), Button('cat'), Back(), Next()])\n", "annotation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "3. Interactivity\n", "\n", "When labeling texts by category, the full set of topics is often not known in advance, so you cannot fix the list of buttons passed to `ipyannotate.annotate` before starting work. Widgets are interactive, so you can add a button on the fly. You can also save intermediate results at any point, because `annotation.tasks` is updated in memory as you annotate." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "56dc8cff03894eb5a4973dcf53a0a7a4", "version_major": 2, "version_minor": 0 }, "text/html": [ "

\n" ], "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, , , , , , , , ]), toolbar=Toolbar(buttons=[ValueButton(color='gray', icon=None, label='place'), ValueButton(color='gray', icon=None, label='hotel'), BackButton(color='gray', icon='← ', label='back', shortcut='k'), NextButton(color='gray', icon='→ ', label='next', shortcut='j')]))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data = [\n", "    'tours to turkey',\n", "    'lake garda september',\n", "    'holets at black sea',\n", "    'Smartline Konaktepe Hotel 4*',\n", "    'amsterdam where to go',\n", "    'hotel dream in a forest',\n", "    'kabardinka where to go',\n", "    'dubovichi restaurant',\n", "    'dali museum'\n", "]\n", "\n", "buttons = [\n", "    Button('place'),\n", "    Button('hotel')\n", "]\n", "controls = [\n", "    Back(),\n", "    Next()\n", "]\n", "annotation = annotate(data, buttons=buttons + controls)\n", "annotation" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "# Add a new label button to the widget that is already displayed\n", "fun = Button('fun')\n", "annotation.toolbar.buttons = buttons + [fun] + controls" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The main user interface is the `annotate` function. It takes one required argument, `tasks` (a list of tasks), and three optional ones: `buttons`, `display` and `multi`." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "67eaea9f1a744e90af85fd99c6e76077", "version_major": 2, "version_minor": 0 }, "text/html": [ "

\n" ], "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, , , , , , , , , ]), toolbar=Toolbar(buttons=[OkButton(color='green', icon='👌', label='ok', shortcut='1'), ErrorButton(color='red', icon='❌', label='err', shortcut='2'), BackButton(color='gray', icon='← ', label='back', shortcut='k'), NextButton(color='gray', icon='→ ', label='next', shortcut='j')]))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "data = range(10)\n", "\n", "annotate(data, buttons=None, display=None, multi=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, `buttons` is a list of four buttons: \"ok\", \"err\", \"back\" and \"next\". This set is enough for a simple check of a classifier's output. You can also define your own buttons and customize their labels, colors, icons and shortcuts:" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e8fb50e2b54d487481e1944a983ddc77", "version_major": 2, "version_minor": 0 }, "text/html": [ "

\n" ], "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, , , , , , , , , ]), toolbar=Toolbar(buttons=[ValueButton(color='blue', icon='½ ', label='divisible by 2', shortcut='1'), ValueButton(color='red', icon='⅓ ', label='by 3', shortcut='2'), ValueButton(color='green', icon='⅕ ', label='by 5', shortcut='3'), BackButton(color='gray', icon='← ', label='back', shortcut='k'), NextButton(color='gray', icon='→ ', label='next', shortcut='j')]))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from ipyannotate.buttons import (\n", "    ValueButton as Button,\n", "    BackButton as Back,\n", "    NextButton as Next\n", ")\n", "\n", "buttons = [\n", "    Button(2, label='divisible by 2', color='blue', icon='½ ', shortcut='1'),\n", "    Button(3, label='by 3', color='red', icon='⅓ ', shortcut='2'),\n", "    Button(5, label='by 5', color='green', icon='⅕ ', shortcut='3'),\n", "]\n", "controls = [\n", "    Back(),\n", "    Next()\n", "]\n", "annotate(data, buttons=buttons + controls)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "When `multi=False`, the widget works in radio-button mode: the user can assign only one label to each object. When `multi=True`, the user can assign several labels to the same object:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "11dfd0517f454275b987685cffb44068", "version_major": 2, "version_minor": 0 }, "text/html": [ "

\n" ], "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, , , , , , , , , ]), toolbar=MultiToolbar(buttons=[ValueButton(color='blue', icon='½ ', label='divisible by 2', shortcut='1'), ValueButton(color='red', icon='⅓ ', label='by 3', shortcut='2'), ValueButton(color='green', icon='⅕ ', label='by 5', shortcut='3'), BackButton(color='gray', icon='← ', label='back', shortcut='k'), NextButton(color='gray', icon='→ ', label='next', shortcut='j')]))" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "buttons = [\n", "    Button(2, label='divisible by 2', color='blue', icon='½ ', shortcut='1'),\n", "    Button(3, label='by 3', color='red', icon='⅓ ', shortcut='2'),\n", "    Button(5, label='by 5', color='green', icon='⅕ ', shortcut='3'),\n", "]\n", "controls = [\n", "    Back(),\n", "    Next()\n", "]\n", "annotation = annotate(data, buttons=buttons + controls, multi=True)\n", "annotation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With `multi=True`, the `value` of each task in `annotation.tasks` is a `set` of the selected labels:" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[MultiTask(output=0, value={2, 3, 5}),\n", " MultiTask(output=1, value=set()),\n", " MultiTask(output=2, value={2}),\n", " MultiTask(output=3, value={3}),\n", " MultiTask(output=4, value={2}),\n", " MultiTask(output=5, value={5}),\n", " MultiTask(output=6, value={2, 3}),\n", " MultiTask(output=7, value=set()),\n", " MultiTask(output=8, value={2}),\n", " MultiTask(output=9, value={3})]" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "annotation.tasks" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "By default, the standard `IPython.display` is used for output. You can pass your own function instead, for example to print the path to the image file above the picture:" ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "fc51cc68236c42dbb48c575744ef4f0b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Annotation(canvas=OutputCanvas(), progress=Progress(atoms=[, …" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from IPython.display import display\n", "\n", "# Map each file path to its opened image\n", "data = {path: Image.open(path) for path in glob('i/dogs_cats/*.jpg')}\n", "\n", "\n", "def display_item(item):\n", "    # Each task is a (path, image) pair taken from data.items()\n", "    path, image = item\n", "    print(path)\n", "    display(image)\n", "\n", "\n", "buttons = [\n", "    Button('dog', shortcut='1'),\n", "    Button('cat', shortcut='2'),\n", "    Back(),\n", "    Next()\n", "]\n", "annotation = annotate(data.items(), buttons=buttons, display=display_item)\n", "annotation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "See the ipyannotate-examples repo for more examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 2 }