{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "k4bVoXnmx6Nz" }, "source": [ "This notebook is adapted from the Tensorflow\n", "research notebook\n", "[hosted on colab](https://colab.research.google.com/github/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb#scrollTo=nyHoF4mUrv5-)\n", "and\n", "[shared (r) at github](https://github.com/tensorflow/models/blob/master/research/object_detection/colab_tutorials/eager_few_shot_od_training_tf2_colab.ipynb)\n", "\n", "This notebook adds to the tutorial an option for android lawn statue images and tests on a YouTube video.\n", "\n", "The model configuration used by the original duckies tutorial is a TPU trained model.\n", "Other pretrained models to explore and more information:\n", "\n", "* https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/install.html#tensorflow-object-detection-api-installation\n", "* https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/auto_examples/object_detection_camera.html#sphx-glr-auto-examples-object-detection-camera-py\n", "* https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_training_and_evaluation.md\n" ] }, { "cell_type": "markdown", "metadata": { "id": "rOvvWAVTkMR7" }, "source": [ "# Eager Few Shot Object Detection Colab\n", "\n", "Welcome to the Eager Few Shot Object Detection Colab --- in this colab we demonstrate fine tuning of a (TF2 friendly) RetinaNet architecture on very few examples of a novel class after initializing from a pre-trained COCO checkpoint.\n", "Training runs in eager mode.\n", "\n", "Estimated time to run through this colab (with GPU): < 5 minutes." ] }, { "cell_type": "markdown", "metadata": { "id": "vPs64QA1Zdov" }, "source": [ "## Imports" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "LBZ9VWZZFUCT" }, "outputs": [], "source": [ "#!pip install -U --pre tensorflow==\"2.2.0\"" ] }, { "cell_type": "markdown", "source": [ "## Choose dataname by uncommenting one of the following:" ], "metadata": { "id": "CIRY3RTYg-HL" } }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "DV1wlZUZ2sAA" }, "outputs": [], "source": [ "'''\n", "duckies:\n", " this is adapted from the original tensorflow notebook (referenced above)\n", "\n", "statues:\n", " training data: 7 images, containing a mix of 8 android mascots\n", " test data: a short YouTube video of mavy android mascots\n", "\n", "gingerbread_man:\n", " training data: 3 images containing the gingerbread man mascot\n", " test data: 2 images containing the gingerbread man mascot\n", "\n", "gingerbread_man_2:\n", " training data: 5 images containing the gingerbread man mascot\n", " test data: a short YouTube video of mavy android mascots\n", "'''\n", "#dataname = \"duckies\"\n", "dataname = \"statues\"\n", "#dataname = \"gingerbread_man\"\n", "#dataname = \"gingerbread_man_2\"\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "oi28cqGGFWnY" }, "outputs": [], "source": [ "import os\n", "import pathlib\n", "import math\n", "\n", "# Clone the tensorflow models repository if it doesn't already exist\n", "if \"models\" in pathlib.Path.cwd().parts:\n", " while \"models\" in pathlib.Path.cwd().parts:\n", " os.chdir('..')\n", "elif not pathlib.Path('models').exists():\n", " !git clone --depth 1 https://github.com/tensorflow/models" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NwdsBdGhFanc" }, "outputs": [], "source": [ "# Install the Object Detection API\n", "#\n", "# if you are running a notebook in an environment missing protoc through protobuf,\n", "# there are several ways to install protoc. one is to download a binary and put\n", "# it in your path (add that binary path to $PATH in .bash_profile or other shell init file)\n", "# https://github.com/protocolbuffers/protobuf/releases\n", "# if you're installing on macos, you may need to follow these directions to give the app permission to run\n", "# https://support.apple.com/guide/mac-help/apple-cant-check-app-for-malicious-software-mchleab3a043/mac\n", "%%bash\n", "cd models/research/\n", "protoc object_detection/protos/*.proto --python_out=.\n", "cp object_detection/packages/tf2/setup.py .\n", "python -m pip install ." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "uZcqD4NLdnf4" }, "outputs": [], "source": [ "import matplotlib\n", "import matplotlib.pyplot as plt\n", "\n", "import os\n", "import random\n", "import io\n", "import imageio\n", "import glob\n", "import scipy.misc\n", "import numpy as np\n", "from six import BytesIO\n", "from PIL import Image, ImageDraw, ImageFont\n", "from IPython.display import display, Javascript\n", "from IPython.display import Image as IPyImage\n", "\n", "import tensorflow as tf\n", "\n", "from object_detection.utils import label_map_util\n", "from object_detection.utils import config_util\n", "from object_detection.utils import visualization_utils as viz_utils\n", "from object_detection.utils import colab_utils\n", "from object_detection.builders import model_builder\n", "\n", "%matplotlib inline\n", "\n", "FIGSIZE = (8, 6)\n", "THRESH = 0.5" ] }, { "cell_type": "markdown", "metadata": { "id": "IogyryF2lFBL" }, "source": [ "# Utilities" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-y9R0Xllefec" }, "outputs": [], "source": [ "def load_image_into_numpy_array(path : str):\n", " \"\"\"Load an image from file into a numpy array.\n", "\n", " Puts image into numpy array to feed into tensorflow graph.\n", " Note that by convention we put it into a numpy array with shape\n", " (height, width, channels), where channels=3 for RGB.\n", "\n", " Args:\n", " path: a file path.\n", "\n", " Returns:\n", " uint8 numpy array with shape (img_height, img_width, 3)\n", " \"\"\"\n", " img_data = tf.io.gfile.GFile(path, 'rb').read()\n", " image = Image.open(BytesIO(img_data))\n", " (im_width, im_height) = image.size\n", " return np.array(image.getdata()).reshape(\n", " (im_height, im_width, 3)).astype(np.uint8)\n", "\n", "def plot_detections(image_np : np.array,\n", " boxes,\n", " classes,\n", " scores,\n", " category_index,\n", " figsize=FIGSIZE, thresh=THRESH,\n", " image_name=None):\n", " \"\"\"Wrapper function to visualize detections.\n", "\n", " Args:\n", " image_np: uint8 numpy array with shape (img_height, img_width, 3)\n", " boxes: a numpy array of shape [N, 4]\n", " classes: a numpy array of shape [N]. Note that class indices are 1-based,\n", " and match the keys in the label map.\n", " scores: a numpy array of shape [N] or None. If scores=None, then\n", " this function assumes that the boxes to be plotted are groundtruth\n", " boxes and plot all boxes as black with no classes or scores.\n", " category_index: a dict containing category dictionaries (each holding\n", " category index `id` and category name `name`) keyed by category indices.\n", " figsize: size for the figure.\n", " image_name: a name for the image file.\n", " \"\"\"\n", " #print(f'classes={classes}')\n", " #print(f'category_index={category_index}')\n", " #print(f'image_np.shape={image_np.shape}')\n", " image_np_with_annotations = image_np.copy()\n", " viz_utils.visualize_boxes_and_labels_on_image_array(\n", " image_np_with_annotations,\n", " boxes,\n", " classes,\n", " scores,\n", " category_index,\n", " use_normalized_coordinates=True,\n", " max_boxes_to_draw=25,\n", " min_score_thresh=THRESH)\n", " if image_name:\n", " plt.imsave(image_name, image_np_with_annotations)\n", " else:\n", " plt.imshow(image_np_with_annotations)\n", " # consider import google.colab.patches import cv2_imshow\n" ] }, { "cell_type": "markdown", "metadata": { "id": "sSaXL28TZfk1" }, "source": [ "# Load Rubber Ducky or android statues or gingerbread man data\n", "\n", "We will start with some toy (literally) data consisting of 5 images of a rubber\n", "ducky. Note that the [coco](https://cocodataset.org/#explore) dataset contains a number of animals, but notably, it does *not* contain rubber duckies (or even ducks for that matter), so this is a novel class." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "SQy3ND7EpFQM" }, "outputs": [], "source": [ "# Load images and visualize\n", "train_images_np = []\n", "test_images_np = [] #this will be empty for video tests\n", "\n", "if dataname == \"duckies\" :\n", " train_image_dir = 'models/research/object_detection/test_images/ducky/train/'\n", " for i in range(1, 6):\n", " image_path = os.path.join(train_image_dir, 'robertducky' + str(i) + '.jpg')\n", " train_images_np.append(load_image_into_numpy_array(image_path))\n", "\n", " num_classes = 1\n", " duck_class_id = 0\n", " class_mapping = {duck_class_id:'rubber_ducky'}\n", " category_index = {i: {'id':i, 'name':cls}\n", " for i, cls in class_mapping.items()}\n", " gt_boxes = [\n", " np.array([[0.436, 0.591, 0.629, 0.712]], dtype=np.float32),\n", " np.array([[0.539, 0.583, 0.73, 0.71]], dtype=np.float32),\n", " np.array([[0.464, 0.414, 0.626, 0.548]], dtype=np.float32),\n", " np.array([[0.313, 0.308, 0.648, 0.526]], dtype=np.float32),\n", " np.array([[0.256, 0.444, 0.484, 0.629]], dtype=np.float32)\n", " ]\n", " class_ids = [[duck_class_id], [duck_class_id], [duck_class_id], [duck_class_id], [duck_class_id]]\n", "\n", " class_ids_test = []\n", " gt_boxes_test = [] #empty as we don't have bounding boxes.\n", " test_image_dir = 'models/research/object_detection/test_images/ducky/test/'\n", " for i in range(1, 50):\n", " image_path = os.path.join(test_image_dir, 'out' + str(i) + '.jpg')\n", " #test_images_np.append(np.expand_dims(\n", " # load_image_into_numpy_array(image_path), axis=0))\n", " test_images_np.append(load_image_into_numpy_array(image_path))\n", " class_ids_test.append([duck_class_id])\n", "\n", "else:\n", " #statues or gingerbread_man or gingerbread_man_2\n", "\n", " # define bounding boxes and load images\n", "\n", " # gt_boxes.shape=(batch, num_classes_in_image, 4)\n", " # [1,6,4]\n", " # [1,4,4]\n", " # [1,4,4]\n", " # [1,6,4]\n", " # [1,2,4]\n", " # [1,3,4]\n", " # [1,2,4]\n", " '''\n", " android statues 01 is from:\n", " https://www.flickr.com/photos/67287915@N00/8570385915\n", " android statues 02 is from:\n", " https://www.flickr.com/photos/quinnanya/5847206255\n", " android statues 03 and 04 are from:\n", " https://github.com/nking/curvature-scale-space-corners-and-transformations.git\n", " android statues 05 is from:\n", " https://commons.wikimedia.org/wiki/File:IceCream_Sandwich_%287791561448%29.jpg\n", " android statues 06 is from:\n", " https://upload.wikimedia.org/wikipedia/commons/thumb/3/3c/Sculpture_for_Android_Donut_at_Google_Mountain_View.jpg/320px-Sculpture_for_Android_Donut_at_Google_Mountain_View.jpg\n", " android statues 07 is from:\n", " https://upload.wikimedia.org/wikipedia/commons/thumb/e/ef/Android_Jelly_Bean_Lawn_Statue_%2812757851595%29.jpg/320px-Android_Jelly_Bean_Lawn_Statue_%2812757851595%29.jpg\n", "\n", " ytop xleft ybottom xright w h\n", " android 1 (tr) w=640 h=279\n", " 0 cupcake 134, 18, 191, 70,\n", " 1 euclair 176, 101, 244, 229,\n", " 2 icecream 129, 125, 188, 175,\n", " 3 gingerbread_man 122, 212, 193, 262,\n", " 4 icecream_sandwich 101, 280, 213, 365,\n", " 5 honey_comb 128, 391, 190, 455,\n", "\n", " android 2 (tr) w=640, h=427\n", " 0 cupcake 125, 128, 213, 212\n", " 1 euclair 186, 61, 358, 207,\n", " 2 icecream 105, 274, 233, 376\n", " 3 gingerbread_man 59, 493, 272, 594\n", "\n", " android 3 (te) w=1280, h=960\n", " 1 euclair 396, 278, 467, 497,\n", " 3 gingerbread_man 234, 57, 525, 303,\n", " 4 icecream_sandwich 227, 501, 494, 740,\n", " (obscured)\n", " 5 honeycomb 279, 702, 447, 788,\n", " 6 kitkat 228, 494, 590, 714,\n", " 7 jellybean 255, 932, 477, 1046,\n", "\n", " android 4 (tr) w=1280, h=960\n", " 0 cupcake 309, 311, 615, 605,\n", " 1 euclair 456, 998, 507, 1131,\n", " 2 icecream 329, 671, 575, 848,\n", " 3 gingerbread_man 318, 875, 552, 1041,\n", "\n", " android 5 (te) w=450, h=600\n", " 3 gingerbread_man 210, 107, 413, 177,\n", " 4 icecream_sandwich 161, 152, 507, 480,\n", "\n", " android 6 (tr) w=320, h=181\n", " 0 cupcake 33, 0, 121, 41,\n", " 6 kitkat 44, 239, 97, 261,\n", " 8 donut 0, 58, 180, 244,\n", "\n", " android 7 (tr) w=320, h=213\n", " 7 jellybean 15, 151, 197, 274,\n", " 8 donut 64, 80, 110, 112,\n", " ytop xleft ybottom xright\n", " '''\n", "\n", " def make_bb(ytop, xleft, ybottom, xright, w, h) :\n", " return [ytop/h, xleft/w, ybottom/h, xright/w]\n", "\n", " class_mapping = {0:'cupcake', 1:'euclair', 2:'icecream', 3:'gingerbread_man',\n", " 4:'icecream_sandwich', 5:'honey_comb', 6:'kitkat',\n", " 7:'jellybean', 8:'donut'}\n", "\n", " class_ids_01 = [0, 1, 2, 3, 4, 5]\n", " bb_01 = [make_bb(134, 18, 191, 70, 640, 279),\n", " make_bb(176, 101, 244, 229, 640, 279),\n", " make_bb(129, 125, 188, 175, 640, 279),\n", " make_bb(122, 212, 193, 262, 640, 279),\n", " make_bb(101, 280, 213, 365, 640, 279),\n", " make_bb(128, 391, 190, 455, 640, 279),\n", " ]\n", "\n", " class_ids_02 = [0, 1, 2, 3]\n", " bb_02 = [make_bb(125, 128, 213, 212, 640, 427),\n", " make_bb(186, 61, 358, 207, 640, 427),\n", " make_bb(105, 274, 233, 376, 640, 427),\n", " make_bb(59, 493, 272, 594, 640, 427),\n", " ]\n", "\n", " class_ids_03 = [1, 3, 4, 5, 6, 7]\n", " bb_03 = [make_bb(396, 278, 467, 497, 1280, 960),\n", " make_bb(234, 57, 525, 303, 1280, 960),\n", " make_bb(227, 501, 494, 740, 1280, 960),\n", " make_bb(279, 702, 447, 788, 1280, 960),\n", " make_bb(228, 494, 590, 714, 1280, 960),\n", " make_bb(255, 932, 477, 1046, 1280, 960),\n", " ]\n", "\n", " class_ids_04 = [0,1,2,3]\n", " bb_04 = [make_bb(309, 311, 615, 605, 1280, 960),\n", " make_bb(456, 998, 507, 1131, 1280, 960),\n", " make_bb(329, 671, 575, 848, 1280, 960),\n", " make_bb(318, 875, 552, 1041, 1280, 960),\n", " ]\n", "\n", " class_ids_05 = [3, 4]\n", " bb_05 = [make_bb(210, 107, 413, 177, 450, 600),\n", " make_bb(161, 152, 507, 480, 450, 600),\n", " ]\n", "\n", " class_ids_06 = [0, 6, 8]\n", " bb_06 = [make_bb(33, 0, 121, 41, 320, 181),\n", " make_bb(44, 239, 97, 261, 320, 181),\n", " make_bb(0, 58, 180, 244, 320, 181)\n", " ]\n", "\n", " class_ids_07 = [7, 8]\n", " bb_07 = [make_bb(15, 151, 197, 274, 320, 213),\n", " make_bb(64, 80, 110, 112, 320, 213)\n", " ]\n", "\n", " import tempfile\n", " from tempfile import TemporaryDirectory\n", "\n", " remote_url = \"https://raw.githubusercontent.com/nking/curvature-scale-space-corners-and-transformations/master/testresources/\"\n", "\n", " data_dir = tempfile.mkdtemp()\n", " print(f'temp_dataset_dir={data_dir}')\n", "\n", " # download and store images\n", " for i in range(1, 8):\n", " file_name = 'android_statues_0' + str(i) + '.jpg'\n", " req_url = remote_url + file_name\n", " !wget {req_url} -P {data_dir}\n", " image_path = os.path.join(data_dir, file_name)\n", " # statues train is all images, 1 - 7\n", " # gingerbread_man train is images 1,2,4, test is 3,5\n", " # gingerbread_man_2 train is images 1-5\n", " if dataname == \"statues\":\n", " train_images_np.append(load_image_into_numpy_array(image_path))\n", " elif dataname == \"gingerbread_man\":\n", " if i == 1 or i == 2 or i == 4:\n", " train_images_np.append(load_image_into_numpy_array(image_path))\n", " elif i == 3 or i == 5:\n", " test_images_np.append(load_image_into_numpy_array(image_path))\n", " elif dataname == \"gingerbread_man_2\":\n", " if i < 6:\n", " train_images_np.append(load_image_into_numpy_array(image_path))\n", "\n", " if dataname == \"statues\":\n", " gt_boxes = [\n", " np.array(bb_01, dtype=np.float32),\n", " np.array(bb_02, dtype=np.float32),\n", " np.array(bb_03, dtype=np.float32),\n", " np.array(bb_04, dtype=np.float32),\n", " np.array(bb_05, dtype=np.float32),\n", " np.array(bb_06, dtype=np.float32),\n", " np.array(bb_07, dtype=np.float32)\n", " ]\n", " class_ids = [class_ids_01, class_ids_02, class_ids_03, class_ids_04,\n", " class_ids_05, class_ids_06, class_ids_07]\n", " category_index = {i: {'id':i, 'name':cls}\n", " for i, cls in class_mapping.items()}\n", " # test is video\n", " elif dataname == \"gingerbread_man_2\":\n", " gt_boxes = [\n", " np.array([bb_01[3]], dtype=np.float32),\n", " np.array([bb_02[3]], dtype=np.float32),\n", " np.array([bb_03[3]], dtype=np.float32),\n", " np.array([bb_04[3]], dtype=np.float32),\n", " np.array([bb_05[3]], dtype=np.float32),\n", " ]\n", " class_ids = [[0], [0], [0], [0], [0]]\n", " category_index = {0: {'id': 0, 'name': 'gingerbread_man'}}\n", " # test is video\n", " elif dataname == \"gingerbread_man\":\n", " # gingerbread_man\n", " gt_boxes = [\n", " np.array([bb_01[3]], dtype=np.float32),\n", " np.array([bb_02[3]], dtype=np.float32),\n", " np.array([bb_04[3]], dtype=np.float32),\n", " ]\n", " class_ids = [[0], [0], [0]]\n", " gt_boxes_test = [\n", " np.array([bb_03[1]], dtype=np.float32),\n", " np.array([bb_05[0]], dtype=np.float32),\n", " ]\n", " class_ids_test = [[0], [0]]\n", " category_index = {0: {'id': 0, 'name': 'gingerbread_man'}}\n", "\n", " num_classes = len(category_index)\n", "\n", "print(f'num_classes={num_classes}')\n", "print(f'category_index={category_index}')\n", "print(f'len(gt_boxes)={len(gt_boxes)}')\n", "print(f'class_ids={class_ids}')\n", "\n", "plt.rcParams['axes.grid'] = False\n", "plt.rcParams['xtick.labelsize'] = False\n", "plt.rcParams['ytick.labelsize'] = False\n", "plt.rcParams['xtick.top'] = False\n", "plt.rcParams['xtick.bottom'] = False\n", "plt.rcParams['ytick.left'] = False\n", "plt.rcParams['ytick.right'] = False\n", "plt.rcParams['figure.figsize'] = [FIGSIZE[0], FIGSIZE[1]]\n" ] }, { "cell_type": "markdown", "metadata": { "id": "cbKXmQoxcUgE" }, "source": [ "# Annotate images with bounding boxes\n", "\n", "In this cell you will annotate the rubber duckies --- draw a box around the rubber ducky in each image; click `next image` to go to the next image and `submit` when there are no more images.\n", "\n", "If you'd like to skip the manual annotation step, we totally understand. In this case, simply skip this cell and run the next cell instead, where we've prepopulated the groundtruth with pre-annotated bounding boxes.\n", "\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-nEDRoUEcUgL" }, "outputs": [], "source": [ "# nice tool, but can only label one object per image\n", "#gt_boxes = []\n", "#colab_utils.annotate(train_images_np, box_storage_pointer=gt_boxes)" ] }, { "cell_type": "markdown", "metadata": { "id": "wTP9AFqecUgS" }, "source": [ "# In case you didn't want to label...\n", "\n", "Run this cell only if you didn't annotate anything above and\n", "would prefer to just use our preannotated boxes. Don't forget\n", "to uncomment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "wIAT6ZUmdHOC" }, "outputs": [], "source": [ "# moved this to a higher cell and skipped the interactive labelling\n", "# gt_boxes = [\n", "# np.array([[0.436, 0.591, 0.629, 0.712]], dtype=np.float32),\n", "# np.array([[0.539, 0.583, 0.73, 0.71]], dtype=np.float32),\n", "# np.array([[0.464, 0.414, 0.626, 0.548]], dtype=np.float32),\n", "# np.array([[0.313, 0.308, 0.648, 0.526]], dtype=np.float32),\n", "# np.array([[0.256, 0.444, 0.484, 0.629]], dtype=np.float32)\n", "# ]" ] }, { "cell_type": "markdown", "metadata": { "id": "Dqb_yjAo3cO_" }, "source": [ "# Prepare data for training\n", "\n", "Below we add the class annotations (for simplicity, we assume a single class in this colab; though it should be straightforward to extend this to handle multiple classes). We also convert everything to the format that the training\n", "loop below expects (e.g., everything converted to tensors, classes converted to one-hot representations, etc.)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HWBqFVMcweF-" }, "outputs": [], "source": [ "\n", "\n", "# Convert class labels to one-hot; convert everything to tensors.\n", "train_image_tensors = []\n", "gt_classes_one_hot_tensors = []\n", "gt_box_tensors = []\n", "for idx, train_image_np in enumerate(train_images_np):\n", " gt_box_np = gt_boxes[idx]\n", " n = gt_box_np.shape[0]\n", " box_label_idxs = class_ids[idx] #TODO check on this\n", " train_image_tensors.append(tf.expand_dims(tf.convert_to_tensor(\n", " train_image_np, dtype=tf.float32), axis=0))\n", " gt_box_tensors.append(tf.convert_to_tensor(gt_box_np, dtype=tf.float32))\n", "\n", " #print(f'{n} labeled boxes. num_classes={num_classes}')\n", " print(f'gt_box_tensors={gt_box_tensors}')\n", "\n", " #zero_indexed_groundtruth_classes = tf.convert_to_tensor(\n", " # np.ones(shape=[gt_box_np.shape[0]], dtype=np.int32) - label_id_offset)\n", " cls_arr = np.array(box_label_idxs, dtype=np.int32)\n", " zero_indexed_groundtruth_classes = tf.convert_to_tensor(cls_arr)\n", " #print(f'zero_indexed_groundtruth_classes={zero_indexed_groundtruth_classes}')\n", "\n", " one_hot_per_box_tensor = tf.one_hot(\n", " indices=zero_indexed_groundtruth_classes, depth=num_classes)\n", " gt_classes_one_hot_tensors.append(one_hot_per_box_tensor)\n", "\n", " print(f'gt_classes_one_hot_tensors={gt_classes_one_hot_tensors}')\n", "\n", "print('Done prepping data.')" ] }, { "cell_type": "markdown", "metadata": { "id": "b3_Z3mJWN9KJ" }, "source": [ "# Let's just visualize the rubber duckies (or statues) as a sanity check\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "YBD6l-E4N71y" }, "outputs": [], "source": [ "nt = len(train_images_np)\n", "ncols = 3\n", "nrows = int(math.ceil(nt/ncols))\n", "plt_h = int(nrows * FIGSIZE[1]/2)\n", "fig = plt.figure(figsize=(FIGSIZE[0], plt_h))\n", "\n", "for idx in range(nt):\n", " n = gt_boxes[idx].shape[0]\n", "\n", " fake_pred_classes = np.array(class_ids[idx])\n", " fake_pred_scores = np.ones(shape=[n], dtype=np.float32) # give boxes a score of 100%\n", "\n", " print(f'[{idx}] box labels = {fake_pred_classes}')\n", " print(f' scores={fake_pred_scores}')\n", " print(f'gt_boxes[{idx}]={gt_boxes[idx]}')\n", "\n", " ax = plt.subplot(nrows, ncols, idx+1)\n", " ax.set_title('train ' + str(idx))\n", " plot_detections(train_images_np[idx], gt_boxes[idx],\n", " fake_pred_classes, fake_pred_scores, category_index)\n", "\n", "has_test_bounding_boxes = False\n", "try:\n", " has_test_bounding_boxes = len(gt_boxes_test) == len(test_images_np)\n", "except NameError:\n", " pass\n", "\n", "if has_test_bounding_boxes:\n", " nt = len(test_images_np)\n", " ncols = 3\n", " nrows = int(math.ceil(nt/ncols))\n", " for idx in range(len(test_images_np)):\n", " n = gt_boxes_test[idx].shape[0]\n", "\n", " fake_pred_classes = np.array(class_ids_test[idx])\n", " fake_pred_scores = np.ones(shape=[n], dtype=np.float32) # give boxes a score of 100%\n", "\n", " #print(f'[{idx}] box labels = {fake_pred_classes}')\n", " #print(f' scores={fake_pred_scores}')\n", " #print(f'gt_boxes_test[{idx}]={gt_boxes_test[idx]}')\n", "\n", " ax = plt.subplot(nrows, ncols, idx+1)\n", " ax.set_title('test ' + str(idx))\n", " plot_detections(test_images_np[idx], gt_boxes_test[idx],\n", " fake_pred_classes, fake_pred_scores, category_index)\n", "else :\n", " # a peek at subset of test images. will do noting for empty test array\n", " j = 1\n", " nt = len(test_images_np)\n", " ncols = 3\n", " nrows = int(math.ceil((nt/10.)/ncols))\n", " for idx in range(0, nt, 10):\n", " ax = plt.subplot(nrows, ncols, j)\n", " ax.set_title('test ' + str(idx))\n", " plt.imshow(test_images_np[idx])\n", " j += 1\n" ] }, { "cell_type": "markdown", "metadata": { "id": "ghDAsqfoZvPh" }, "source": [ "# Create model and restore weights for all but last layer\n", "\n", "In this cell we build a single stage detection architecture (RetinaNet) and restore all but the classification layer at the top (which will be automatically randomly initialized).\n", "\n", "For simplicity, we have hardcoded a number of things in this colab for the specific RetinaNet architecture at hand (including assuming that the image size will always be 640x640), however it is not difficult to generalize to other model configurations." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "9J16r3NChD-7" }, "outputs": [], "source": [ "# Download the checkpoint and put it into models/research/object_detection/test_data/\n", "\n", "#TODO: consider pretrained SSD models trained on architecture runnable on desktop\n", "# https://pytorch.org/hub/nvidia_deeplearningexamples_ssd/ if not jetson...\n", "# https://docs.nvidia.com/metropolis/TLT/tlt-user-guide/text/object_detection/ssd.html\n", "# https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md\n", "\n", "#consider checking for existence of models/research/object_detection/test_data/checkpoint\n", "# if not pathlib.Path('checkpoint').exists():\n", "## however, the download is less than 10 sec on collab.\n", "\n", "!wget http://download.tensorflow.org/models/object_detection/tf2/20200711/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz\n", "!tar -xf ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.tar.gz\n", "!mv ssd_resnet50_v1_fpn_640x640_coco17_tpu-8/checkpoint models/research/object_detection/test_data/\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "RyT4BUbaMeG-" }, "outputs": [], "source": [ "tf.keras.backend.clear_session()\n", "\n", "#https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/auto_examples/object_detection_camera.html#sphx-glr-auto-examples-object-detection-camera-py\n", "\n", "print('Building model and restoring weights for fine-tuning...', flush=True)\n", "pipeline_config = 'models/research/object_detection/configs/tf2/ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config'\n", "checkpoint_path = 'models/research/object_detection/test_data/checkpoint/ckpt-0'\n", "\n", "# Lists the checkpoint keys and shapes of variables in a checkpoint.\n", "# Returns: List of tuples (key, shape).\n", "#cp_list = tf.compat.v2.train.list_variables(checkpoint_path)\n", "#for cp_key, cp_shape in cp_list:\n", "# print(f'checkpoint key={cp_key}')\n", "\n", "# Load pipeline config and build a detection model.\n", "#\n", "# Since we are working off of a COCO architecture which predicts 90\n", "# class slots by default, we override the `num_classes` field here to be just\n", "# one (for our new rubber ducky class).\n", "configs = config_util.get_configs_from_pipeline_file(pipeline_config)\n", "'''\n", "config key=model\n", "config key=train_config\n", "config key=train_input_config\n", "config key=eval_config\n", "config key=eval_input_configs\n", "config key=eval_input_config\n", "'''\n", "\n", "'''\n", "configs[model] = ssd {\n", " num_classes: 90\n", " image_resizer {\n", " fixed_shape_resizer {\n", " height: 640\n", " width: 640\n", " }\n", " }\n", " feature_extractor {\n", " type: \"ssd_resnet50_v1_fpn_keras\"\n", " depth_multiplier: 1.0\n", " min_depth: 16\n", " conv_hyperparams {\n", " regularizer {\n", " l2_regularizer {\n", " weight: 0.00039999998989515007\n", " }\n", " }\n", " initializer {\n", " truncated_normal_initializer {\n", " mean: 0.0\n", " stddev: 0.029999999329447746\n", " }\n", " }\n", " activation: RELU_6\n", " batch_norm {\n", " decay: 0.996999979019165\n", " scale: true\n", " epsilon: 0.0010000000474974513\n", " }\n", " }\n", " override_base_feature_extractor_hyperparams: true\n", " #FPN is Feature Pyramid Network. FPN constructions uses features maps starting from fpn_min_levelupto the fpn_max_level.\n", " fpn {\n", " min_level: 3\n", " max_level: 7\n", " }\n", " }\n", " box_coder {\n", " faster_rcnn_box_coder {\n", " y_scale: 10.0\n", " x_scale: 10.0\n", " height_scale: 5.0\n", " width_scale: 5.0\n", " }\n", " }\n", " matcher {\n", " argmax_matcher {\n", " matched_threshold: 0.5\n", " unmatched_threshold: 0.5\n", " ignore_thresholds: false\n", " negatives_lower_than_unmatched: true\n", " force_match_for_each_row: true\n", " use_matmul_gather: true\n", " }\n", " }\n", " similarity_calculator {\n", " iou_similarity {\n", " }\n", " }\n", " box_predictor {\n", " weight_shared_convolutional_box_predictor {\n", " conv_hyperparams {\n", " regularizer {\n", " l2_regularizer {\n", " weight: 0.00039999998989515007\n", " }\n", " }\n", " initializer {\n", " random_normal_initializer {\n", " mean: 0.0\n", " stddev: 0.009999999776482582\n", " }\n", " }\n", " activation: RELU_6\n", " batch_norm {\n", " decay: 0.996999979019165\n", " scale: true\n", " epsilon: 0.0010000000474974513\n", " }\n", " }\n", " depth: 256\n", " num_layers_before_predictor: 4\n", " kernel_size: 3\n", " class_prediction_bias_init: -4.599999904632568\n", " }\n", " }\n", " anchor_generator {\n", " multiscale_anchor_generator {\n", " min_level: 3\n", " max_level: 7\n", " anchor_scale: 4.0\n", " aspect_ratios: 1.0\n", " aspect_ratios: 2.0\n", " aspect_ratios: 0.5\n", " scales_per_octave: 2\n", " }\n", " }\n", " post_processing {\n", " batch_non_max_suppression {\n", " score_threshold: 9.99999993922529e-09\n", " iou_threshold: 0.6000000238418579\n", " max_detections_per_class: 100\n", " max_total_detections: 100\n", " }\n", " score_converter: SIGMOID\n", " }\n", " normalize_loss_by_num_matches: true\n", " loss {\n", " localization_loss {\n", " weighted_smooth_l1 {\n", " }\n", " }\n", " classification_loss {\n", " # sigmoid for multi-label classification\n", " weighted_sigmoid_focal {\n", " gamma: 2.0\n", " alpha: 0.25\n", " }\n", " }\n", " classification_weight: 1.0\n", " localization_weight: 1.0\n", " }\n", " encode_background_as_zeros: true\n", " normalize_loc_loss_by_codesize: true\n", " inplace_batchnorm_update: true\n", " freeze_batchnorm: false\n", "}\n", "'''\n", "model_config = configs['model']\n", "model_config.ssd.num_classes = num_classes\n", "model_config.ssd.freeze_batchnorm = True\n", "\n", "#TODO: deprecated, so update to use createModel (Context context, String modelPath, Model.Options options)\n", "detection_model = model_builder.build(model_config=model_config, is_training=True)\n", "\n", "trainable_variables = detection_model.trainable_variables\n", "for i in range(len(trainable_variables) // 2):\n", " print(f'trainable variable[{i}] = {trainable_variables[i]}')\n", " #trainable_variables[i].trainable = False\n", "\n", "# Set up object-based checkpoint restore\n", "# RetinaNet has two prediction `heads`\n", "# --- one for classification,\n", "# --- the other for box regression.\n", "# We will restore the box regression head but initialize the classification head\n", "# from scratch (we show the omission below by commenting out the line that\n", "# we would add if we wanted to restore both heads)\n", "fake_box_predictor = tf.compat.v2.train.Checkpoint(\n", " _base_tower_layers_for_heads=detection_model._box_predictor._base_tower_layers_for_heads,\n", " # to restore the classification weights requires us to use\n", " # configs['model'].ssd.num_classes = 90 to match the checkpoint\n", " # else uncommenting the next line leads to ValueError from shape in WeightSharedConvolutionalClassHead\n", " #_prediction_heads=detection_model._box_predictor._prediction_heads,\n", " # (i.e., _prediction_heads includes the classification head that we *will not* restore)\n", " _box_prediction_head=detection_model._box_predictor._box_prediction_head,\n", " )\n", "\n", "# fake_model loads the pre-trained weights. the feature extractor is used by our classification training\n", "fake_model = tf.compat.v2.train.Checkpoint(\n", " _feature_extractor = detection_model._feature_extractor,\n", " _box_predictor = fake_box_predictor)\n", "ckpt = tf.compat.v2.train.Checkpoint(model=fake_model)\n", "ckpt.restore(checkpoint_path).expect_partial()\n", "\n", "# Run model through a dummy image so that variables are created\n", "# arg is a rank 4 image tensor: [1, height, width, channels]\n", "image, shapes = detection_model.preprocess(tf.zeros([1, 640, 640, 3]))\n", "# implies we are using batch_size=1; argument features shape is[batch_size, height, width, channels]\n", "\n", "#print(f'fake shapes from preprocess: \\n {shapes}') #[[640 640 3]]\n", "\n", "prediction_dict = detection_model.predict(image, shapes)\n", "\n", "#print(f'fake prediction from predict: \\n {prediction_dict}')\n", "# {'preprocessed_inputs'\n", "#'feature_maps'\n", "#'anchors'\n", "#'final_anchors'\n", "#'box_encodings'\n", "#'class_predictions_with_background'\n", "\n", "#class_predictions_with_background: A float tensor of shape\n", "# [batch_size, 1, num_class_slots] representing the class predictions for\n", "# the proposals.\n", "\n", "_ = detection_model.postprocess(prediction_dict, shapes)\n", "\n", "#print(f'fake postprocess = {_}')\n", "#fake postprocess = {'detection_boxes'\n", "#'detection_scores' # a tensor of shape=(1, 100). for 100 coco classes I think. follow up on this\n", "#'detection_classes': 1-1=0 so best scores are very small numbers. min_iou=0 => 1\n", " # (1-0.99) should be resolvable as an integer so will use fctr >= 100\n", " fctr = 1000.\n", " for i in range(m):\n", " for j in range(n):\n", " iou_i_j = intersection_over_union(gt_b[i], detected_b[j])\n", " costs[i][j] = round(fctr*(1. - iou_i_j))\n", "\n", " solver = pywraplp.Solver.CreateSolver(\"SCIP\")\n", "\n", " if not solver:\n", " print(f'no solver found')\n", " exit()\n", "\n", " num_workers = m\n", " num_tasks = n\n", "\n", " # Variables\n", " # x[i, j] is an array of 0-1 variables, which will be 1\n", " # if worker i is assigned to task j.\n", " x = {}\n", " for i in range(num_workers):\n", " for j in range(num_tasks):\n", " x[i, j] = solver.IntVar(0, 1, \"\")\n", "\n", " # Constraints\n", " # Each worker is assigned to at most 1 task.\n", " for i in range(num_workers):\n", " solver.Add(solver.Sum([x[i, j] for j in range(num_tasks)]) <= 1)\n", "\n", " # Each task is assigned to exactly one worker.\n", " for j in range(num_tasks):\n", " solver.Add(solver.Sum([x[i, j] for i in range(num_workers)]) == 1)\n", "\n", " # Objective\n", " objective_terms = []\n", " for i in range(num_workers):\n", " for j in range(num_tasks):\n", " objective_terms.append(costs[i][j] * x[i, j])\n", " solver.Minimize(solver.Sum(objective_terms))\n", "\n", " # Solve\n", " status = solver.Solve()\n", "\n", " # Print solution.\n", " if status == pywraplp.Solver.OPTIMAL or status == pywraplp.Solver.FEASIBLE:\n", " print(f\"Total cost = {solver.Objective().Value()}\\n\")\n", " matched = {}\n", " for i in range(num_workers):\n", " for j in range(num_tasks):\n", " # Test if x[i,j] is 1 (with tolerance for floating point arithmetic).\n", " if x[i, j].solution_value() > 0.5:\n", " if swap:\n", " matched[j] = i\n", " print(f\"Worker {j} assigned to task {i}.\" + f\" Cost: {costs[i][j]}\")\n", " else:\n", " matched[i] = j\n", " print(f\"Worker {i} assigned to task {j}.\" + f\" Cost: {costs[i][j]}\")\n", " return matched\n", " else:\n", " print(\"No solution found.\")\n", " return None\n", "\n", "def classification_loss(matched_idxs : dict, gt_class_ids : list,\n", " detection_class_ids : list, detection_scores : list):\n", " '''\n", " for use if want to calculate the classification losses based up the\n", " bipartite unbalanced weighted matching by bounding boxes. Not completely\n", " consistent to use this method because we use the detection scores and not what\n", " went into making them.\n", " '''\n", " # MCCE: -sum_over_k_classes(y_k * log(p_k))\n", " # where y_k is the indicator function. it is 1 when y_pred matches y_true, else is 0.\n", " # the p_k are the scores from the predictions. they have already been\n", " # converted to probabilities using softmax and so from_logits=False in use\n", " # of tensorflow's SparseCategoricalCrossEntropy.\n", "\n", " loss = 1E-7\n", " for idx1, idx2 in matched_idxs.items():\n", " c1 = gt_class_ids[idx1]\n", " c2 = detection_class_ids[idx2]\n", " loss += -1.*math.log(c2*c1)\n", " loss /= (len(matched_idxs))\n", " return loss\n", "\n", "# if wanted to do the mAP analysis:\n", "# use a range of thresholds of iou to decide about the class labeling:\n", "# thresh 0.5 to 0.95 with d=0.05.\n", "# - calc confusion matrix\n", "# - calc precision and recall\n", "# - calc area under precision recall curve\n", "# use iou as PR_AUC = integral from 0 to 1 of precision(rev) * d(rec)\n", "# - measure the average precision for each class, AP_i\n", "# - mAP = (1/N) * sum over i (AP_i) where N is the number of classes\n", "\n", "'''\n", "Got this error from installing ortools, but the protobuf version conflict did\n", "not prevent use of the solver:\n", "\n", "ERROR: pip's dependency resolver does not currently take into account all the\n", "packages that are installed. This behaviour is the source of the following\n", "dependency conflicts.\n", "tensorflow-metadata 1.14.0 requires protobuf<4.21,>=3.20.3,\n", "but you have protobuf 4.25.0 which is incompatible.\n", "apache-beam 2.51.0 requires protobuf!=4.0.*,!=4.21.*,!=4.22.0,!=4.23.*,!=4.24.0,!=4.24.1,!=4.24.2,<4.25.0,>=3.20.3,\n", "but you have protobuf 4.25.0 which is incompatible.\n", "'''\n", "!pip install ortools\n", "import ortools\n", "from ortools.linear_solver import pywraplp\n", "\n", "def calc_losses(gt_b : list, gt_class_ids: list, detected_b : list,\n", " detected_class_ids : list, detected_scores : list) :\n", " '''\n", " given the ground truth bounding boxes and the detected bounding boxes,\n", " and their respective classification indexes,\n", " find the best matches and\n", " return the matched indexes, ious as localication loss, classification loss.\n", " TODO: consider adding mAP too.\n", "\n", " Note: It is not completely consistent to use this method because we use the\n", " detection scores and not what went into making them.\n", "\n", " Args:\n", " gt_b (list) : list of ground truth bounding boxes. each bounding box is\n", " a list of floating point values of length 4\n", " gt_class_ids (list) : list of ground truth class ids\n", " detected_b (list) : list of detected bounding boxes\n", " detected_class_ids (list) : list of detected class ids\n", " detection_scores (list) : list of detected scores for the bounding boxes and class ids.\n", "\n", " Returns: a dictionary with keys 'localization_loss', 'classification_loss'\n", " '''\n", "\n", " matched_idxs = calc_matched_indexes(gt_b, detected_b)\n", "\n", " if matched_idxs is None:\n", " return None\n", "\n", " classification_loss = classification_loss(matched_idxs, gt_class_ids,\n", " detected_class_ids, detected_scores)\n", "\n", " localization_loss = 0.\n", " for idx1, idx2 in matched_idxs.items():\n", " iou = intersection_over_union(gt_b[idx1], detected_b[idx2])\n", " localization_loss += (1. - iou)\n", "\n", " localization_loss /= (len(matched_idxs))\n", "\n", " return {'localization_loss': localization_loss, 'classification_loss': classification_loss}\n", "\n" ] }, { "cell_type": "markdown", "metadata": { "id": "WHlXL1x_Z3tc" }, "source": [ "# Load any test video and run inference with new model!" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WcE6OwrHQJya" }, "outputs": [], "source": [ "# @title\n", "\n", "# @tf.function is used to tell TensorFlow that a function should be compiled\n", "# into a graph that can be executed by TensorFlow's optimized runtime\n", "\n", "# Again, uncomment this decorator if you want to run inference eagerly\n", "@tf.function\n", "def detect(input_tensor : tf.Tensor):\n", " \"\"\"Run detection on an input image.\n", "\n", " Args:\n", " input_tensor: A [1, height, width, 3] Tensor of type tf.float32.\n", " Note that height and width can be anything since the image will be\n", " immediately resized according to the needs of the model within this\n", " function.\n", "\n", " Returns:\n", " A dict containing 3 Tensors (`detection_boxes`, `detection_classes`,\n", " and `detection_scores`).\n", " \"\"\"\n", " preprocessed_image, shapes = detection_model.preprocess(input_tensor)\n", " # model.predict returns Numpy array(s) of predictions.\n", " prediction_dict = detection_model.predict(preprocessed_image, shapes)\n", " detections = detection_model.postprocess(prediction_dict, shapes)\n", " return detections, prediction_dict\n", "\n", "# Note that the first frame will trigger tracing of the tf.function, which will\n", "# take some time, after which inference should be fast.\n", "\n", "if dataname == \"duckies\":\n", " # there are 50, so plot only 7\n", " for i in range(0, len(test_images_np), 7):\n", "\n", " input_tensor = tf.convert_to_tensor(np.expand_dims(test_images_np[i], axis=0), dtype=tf.float32)\n", " detections, prediction_dict = detect(input_tensor)\n", "\n", " #print(f\"detections['detection_boxes']={detections['detection_boxes']}\")\n", " #print(f\"detections['detection_classes']={detections['detection_classes']}\")\n", " #print(f\"detections['detection_scores']={detections['detection_scores']}\")\n", "\n", " print(f'({i}) next plot_detections')\n", " plot_detections(\n", " test_images_np[i],\n", " detections['detection_boxes'][0].numpy(),\n", " detections['detection_classes'][0].numpy().astype(np.uint32),\n", " detections['detection_scores'][0].numpy(),\n", " category_index, figsize=FIGSIZE, thresh=THRESH\n", " #, image_name=\"gif_frame_\" + ('%02d' % i) + \".jpg\"\n", " )\n", "\n", "elif dataname == \"gingerbread_man\":\n", " for i in range(len(test_images_np)):\n", " input_tensor = tf.convert_to_tensor(np.expand_dims(test_images_np[i], axis=0), dtype=tf.float32)\n", " detections, prediction_dict = detect(input_tensor)\n", "\n", " print(f\"detections['detection_boxes']={detections['detection_boxes']}\")\n", " print(f\"detections['detection_classes']={detections['detection_classes']}\")\n", " print(f\"detections['detection_scores']={detections['detection_scores']}\")\n", "\n", " bb = gt_boxes_test[i].tolist()\n", " labels = class_ids_test[i]\n", " pred = detections['detection_boxes'][0].numpy().tolist()\n", " plot_detections(\n", " test_images_np[i],\n", " detections['detection_boxes'][0].numpy(),\n", " detections['detection_classes'][0].numpy().astype(np.uint32),\n", " detections['detection_scores'][0].numpy(),\n", " category_index, figsize=FIGSIZE, thresh=THRESH)\n", " # find matching bounding boxes\n", " #breakpoint()\n", " matched_idxs = calc_matched_indexes(bb, pred)\n", " print(f'matched_idxs={matched_idxs}')\n", "else:\n", " #android statues video as test\n", " # video title: \"Google HQ Android Statues\"\n", " # creator: Polsky Morillo\n", " video_url = \"https://www.youtube.com/watch?v=BRKLw_16Lac\"\n", "\n", " !pip install opencv-python\n", " !pip install --upgrade yt-dlp\n", " !apt -y install ffmpeg lame\n", "\n", " import imutils\n", " import cv2\n", "\n", " def get_frame(time, frame_count, filepath) :\n", " vid_cap.set(cv2.CAP_PROP_POS_MSEC, time)\n", " # youtube frame rate options: 24 to 60 frames/sec\n", " frame_det, frame = vid_cap.read()\n", " print(f'read correctly={frame_det}')\n", " if frame_det:\n", " frame = imutils.resize(frame, width=640) #smaller width of 128?\n", " print(f'writing to {filepath}')\n", " return cv2.imwrite(filepath, frame) #return True if written\n", " else:\n", " return None\n", "\n", " stream_uri = os.path.join(data_dir, \"android_statues.mp4\")\n", "\n", " #https://pypi.org/project/yt-dlp/\n", " #Video Format Options\n", " # --check-formats\n", "\n", " #!yt-dlp $video_url --list-formats\n", " '''\n", " [info] Available formats for BRKLw_16Lac:\n", " ID EXT RESOLUTION FPS CH │ FILESIZE TBR PROTO │ VCODEC VBR ACODEC ABR ASR MORE INFO\n", " ────────────────────────────────────────────────────────────────────────────────────────────────────────────────\n", " sb3 mhtml 48x27 3 │ mhtml │ images storyboard\n", " sb2 mhtml 79x45 1 │ mhtml │ images storyboard\n", " sb1 mhtml 159x90 1 │ mhtml │ images storyboard\n", " sb0 mhtml 319x180 1 │ mhtml │ images storyboard\n", " 233 mp4 audio only │ m3u8 │ audio only unknown Default\n", " 234 mp4 audio only │ m3u8 │ audio only unknown Default\n", " 599 m4a audio only 2 │ 110.08KiB 31k https │ audio only mp4a.40.5 31k 22k ultralow, m4a_dash\n", " 600 webm audio only 2 │ 13.08KiB 4k https │ audio only opus 4k 48k ultralow, webm_dash\n", " 139 m4a audio only 2 │ 174.06KiB 49k https │ audio only mp4a.40.5 49k 22k low, m4a_dash\n", " 249 webm audio only 2 │ 13.08KiB 4k https │ audio only opus 4k 48k low, webm_dash\n", " 250 webm audio only 2 │ 13.08KiB 4k https │ audio only opus 4k 48k low, webm_dash\n", " 140 m4a audio only 2 │ 459.75KiB 130k https │ audio only mp4a.40.2 130k 44k medium, m4a_dash\n", " 251 webm audio only 2 │ 13.08KiB 4k https │ audio only opus 4k 48k medium, webm_dash\n", " 17 3gp 176x144 7 1 │ 248.81KiB 70k https │ mp4v.20.3 mp4a.40.2 22k 144p\n", " 597 mp4 256x144 15 │ 113.67KiB 32k https │ avc1.4d400b 32k video only 144p, mp4_dash\n", " 602 mp4 256x144 15 │ ~375.83KiB 104k m3u8 │ vp09.00.10.08 104k video only\n", " 598 webm 256x144 15 │ 141.00KiB 40k https │ vp9 40k video only 144p, webm_dash\n", " 269 mp4 256x144 30 │ ~617.65KiB 170k m3u8 │ avc1.4D400C 170k video only\n", " 160 mp4 256x144 30 │ 393.15KiB 111k https │ avc1.4D400C 111k video only 144p, mp4_dash\n", " 603 mp4 256x144 30 │ ~668.17KiB 184k m3u8 │ vp09.00.11.08 184k video only\n", " 278 webm 256x144 30 │ 362.32KiB 103k https │ vp09.00.11.08 103k video only 144p, webm_dash\n", " 229 mp4 426x240 30 │ ~ 1.13MiB 318k m3u8 │ avc1.4D4015 318k video only\n", " 133 mp4 426x240 30 │ 871.24KiB 247k https │ avc1.4D4015 247k video only 240p, mp4_dash\n", " 604 mp4 426x240 30 │ ~ 1.23MiB 347k m3u8 │ vp09.00.20.08 347k video only\n", " 242 webm 426x240 30 │ 803.84KiB 228k https │ vp09.00.20.08 228k video only 240p, webm_dash\n", " 230 mp4 638x360 30 │ ~ 2.83MiB 800k m3u8 │ avc1.4D401E 800k video only\n", " 134 mp4 638x360 30 │ 2.18MiB 631k https │ avc1.4D401E 631k video only 360p, mp4_dash\n", " 18 mp4 638x360 30 2 │ 2.55MiB 735k https │ avc1.42001E mp4a.40.2 44k 360p\n", " 605 mp4 638x360 30 │ ~ 2.37MiB 668k m3u8 │ vp09.00.21.08 668k video only\n", " 243 webm 638x360 30 │ 1.47MiB 426k https │ vp09.00.21.08 426k video only 360p, webm_dash\n", " '''\n", "\n", " if (not os.path.exists(stream_uri)):\n", " print(f'downloading youtube file')\n", " # default is ffmpeg, can choose mp4 with -f\n", " !yt-dlp $video_url -vU -f mp4 -o $stream_uri\n", " '''\n", " [youtube] Extracting URL: https://www.youtube.com/watch?v=BRKLw_16Lac\n", " [youtube] BRKLw_16Lac: Downloading webpage\n", " [youtube] BRKLw_16Lac: Downloading ios player API JSON\n", " [youtube] BRKLw_16Lac: Downloading android player API JSON\n", " [youtube] BRKLw_16Lac: Downloading m3u8 information\n", " ERROR: [youtube] BRKLw_16Lac: Requested format is not available.\n", " Use --list-formats for a list of available formats\n", " ...\n", " [debug] Sort order given by extractor: quality, res, fps, hdr:12, source, vcodec:vp9.2, channels, acodec, lang, proto\n", " [debug] Formats sorted by: hasvid, ie_pref, quality, res, fps, hdr:12(7), source, vcodec:vp9.2(10), channels, acodec, lang, proto, size, br, asr, vext, aext, hasaud, id\n", " [info] BRKLw_16Lac: Downloading 1 format(s): 18\n", " [debug] Invoking http downloader on \"https://rr2---sn-npoldn7s.googlevideo.com/videoplayback?expire=1700180177&ei=cVxWZb78DIOZ3LUPq7WkIA&ip=34.124.190.55&id=o-ABYYepNgKahbwzodzQLY1cI5jo4X2FgV_7-a4v57a-2k&itag=18&source=youtube&requiressl=yes&xpc=EgVo2aDSNQ%3D%3D&mh=-v&mm=31%2C26&mn=sn-npoldn7s%2Csn-a5m7lnl6&ms=au%2Conr&mv=m&mvi=2&pl=23&initcwndbps=5193750&spc=UWF9f8kgIgBkhETrsm2hHAQAVuedUjg&vprv=1&svpuc=1&mime=video%2Fmp4&gir=yes&clen=2669045&ratebypass=yes&dur=29.048&lmt=1694608020841733&mt=1700157908&fvip=4&fexp=24007246&beids=24350018&c=ANDROID&txp=5318224&sparams=expire%2Cei%2Cip%2Cid%2Citag%2Csource%2Crequiressl%2Cxpc%2Cspc%2Cvprv%2Csvpuc%2Cmime%2Cgir%2Cclen%2Cratebypass%2Cdur%2Clmt&sig=ANLwegAwRgIhAMAqjIs-alzT8ZigFN1dr-txOKvx54pQ93O3f2h0X-d-AiEAquZ2bo8ZKdBYRtyUYOJlmRozRkF1WZPREMBT3gIDy5k%3D&lsparams=mh%2Cmm%2Cmn%2Cms%2Cmv%2Cmvi%2Cpl%2Cinitcwndbps&lsig=AM8Gb2swRQIgVn2fb2Xc4YPnMR9JpFyQ84cHuGtLFXT9AvwsOQcYgdICIQCWxhVd3Sc9iRaR4zkRYwr-dFiy0IOAXRBC9BpyoPB2Ng%3D%3D\"\n", " [download] Destination: /tmp/tmphwbs6im8/android_statues.mp4\n", " [download] 100% of 2.55MiB in 00:00:01 at 1.44MiB/s\n", " '''\n", " else:\n", " print(f'have already downloaded YouTube mp4 file')\n", "\n", " # the video is 35 sec long\n", " vid_cap = cv2.VideoCapture(stream_uri)\n", " if not vid_cap.isOpened():\n", " print(f'video capture is not opened yet. try now:')\n", " vid_cap.open(video_url)\n", " print(f'vid_cap.isOpened()={vid_cap.isOpened()}')\n", "\n", " #vid_cap.set(cv2.CV_CAP_PROP_FRAME_WIDTH, 640)\n", "\n", " fps = vid_cap.get(cv2.CAP_PROP_FPS)\n", " print(f'fps={fps}')\n", " print(f'CAP_PROP_FRAME_COUNT={vid_cap.get(cv2.CAP_PROP_FRAME_COUNT)}')\n", " num_frames = int(vid_cap.get(cv2.CAP_PROP_FRAME_COUNT))\n", " if (fps > 0):\n", " end_time = int(num_frames/fps)*1000\n", " else:\n", " end_time = 35*1000\n", " print(f'end_time={end_time}')\n", "\n", " pred_all = []\n", " scores_all = []\n", "\n", " start_time = 0\n", " frame_rate = 2000 #in millisec = 2 sec\n", " frame_count = 1\n", " for time in range(start_time, end_time, frame_rate):\n", " time = round(time, 2)\n", " filepath = os.path.join(data_dir, str(frame_count) + '.jpg')\n", " result = get_frame(time, frame_count, filepath)\n", " frame_count += 1\n", " if result is not None:\n", " img = load_image_into_numpy_array(filepath)\n", " input_tensor = tf.convert_to_tensor(np.expand_dims(img, axis=0), dtype=tf.float32)\n", " detections, prediction_dict = detect(input_tensor)\n", " dc = detections['detection_classes'][0].numpy().astype(np.uint32)\n", " ds = detections['detection_scores'][0].numpy()\n", " plot_detections(\n", " img,\n", " detections['detection_boxes'][0].numpy(),\n", " dc, ds,\n", " category_index, figsize=(15, 20), thresh=THRESH\n", " #, image_name=filepath\n", " )\n", " plt.show()\n", " # useful for making confusion matrix, prec, recall, mAP, etc:\n", " p = []\n", " s = []\n", " for ii in range(len(ds)):\n", " if ds[ii] >= THRESH:\n", " p.append(dc[ii])\n", " s.append(ds[ii])\n", " print(f\"pred_{frame_count-1}={p}\")\n", " print(f\"score_{frame_count-1}=\",[\"{0:0.2f}\".format(ii) for ii in s])\n", " pred_all.extend(p)\n", " scores_all.extend(s)\n", " else:\n", " print(f'get_frame failed')\n", "\n", " vid_cap.release()\n", " cv2.destroyAllWindows()\n", "\n", " print(f'the id to class mapping={class_mapping}')\n", " print(f\"pred_all= {pred_all}\")\n", " print(\"scores_all=\",[\"{0:0.2f}\".format(ii) for ii in scores_all])\n", "\n", "plt.show() #plt.show() should be called only once per python session\n", "print('done')" ] }, { "cell_type": "markdown", "metadata": { "id": "AYePKT31_7ms" }, "source": [ "# **Misc Notes**\n", "\n", "---\n", "Variations tried for android lawn statues:\n", "* (1) For a train set being android statues 01, 02, and 04, and a test set being android statues 03 and 05:\n", "There were 100 detections for each of the test images.\n", "The train set includes classes 0 through 5, inclusive, but not 6 or 7. The detector did not report any classes as 6 or 7, as expected because those were not in the train set.\n", "> android statues 03 test had 3 scores above 0.5. Of those with scores >= 0.5, only the bounding box for detection[1] matched the expected bounding box and classification. To look at details further, I used an unbalanced weighted bipartite matching for the bounding boxes where the cost was 1-iou between 2 boxes. IOU is intersection over union a.k.a. Jaccard index. The matches by IOU showed matched_idxs={1: 1, 5: 14, 0: 21, 2: 22, 4: 45, 3: 66} which correspond to classes 3:3, 7:5, 1:1, ... The 3rd match is correct too, but has score 0.117.\n", "> android statues 05 test. Looks like non-max supression failed for the first 2 detection\n", "bounding boxes. predicted 1st class correctly, but with wrong bounding box.\n", "> * If I truly wanted to keep the training set this small, then a way to improve the detections might be to\n", "reduce the number of classes as that would improve the scores for the expected detections. e.g. Could reduce the classes to gingerbread man only, as the duckies example does. (In progress...)\n", "> Increasing the training set number of images and using a video to test, that is, put all my old train and test images into train, and then use a short video from YouTube to test. (done, can see this now when dataname == \"statues\")\n", "\n", "* (2) For a train set being android statues 01, 02, 03, 04, and 05, and a test set being several frames from a short YouTube video of the android statues:\n", "> The detector sometimes identified non-statue objects in the image as statues, such as a truck or a bench.\n", "> * We are currently using transfer learning as a pre-trained model built from ssd_resnet50_v1_fpn_640x640_coco17_tpu-8.config.\n", "We use the feature extractor and bounding box detector from the pretrained model and re-train for classification (which uses feature extractor).\n", "To identify the non-android statue objects we could use the object detector without retraining (it identifies COCO class objects). We note those\n", "identifications and scores.\n", "And then identify android statues with our re-trained model. The identified android statue bounding boxes can be compared to the identified COCO bounding boxes and one chosen over the other.\n", "(note: COCO has 80 classes (640x480).\n", "PascalVOC has 20 classes ().\n", "imagenet1K has 1000 classes (~470x~390).\n", "open-image has 600 classes.)\n", "> * I added train image(s) for the android donut, lolli-pop and jelly-bean to improve the training.\n", "\n", "* (3) For a train set being android statues 01, 02, 03, 04, 05, 06, and 07 and a test set being several frames from a short YouTube video of the android statues:\n", "> precision, recall, f1 score and accuracy were calculated as a function of score threshold and a threshold of 0.5 was found to result in a macro averaged precision, recall, f1 score and accuracy > 0.72. For that reason, THRESH above has been set to 0.5 now.\n", "Here are the statistics derived from THRESH 0.2 to 1.0, which can also be found [here](https://github.com/nking/curvature-scale-space-corners-and-transformations/raw/master/doc/statues_transfer_learning_object_detection.pdf)\n", "```\n", "class id class name #of images w/ class in it\n", "0 cupcake : 4\n", "1 euclair: 4\n", "2 icecream: 3\n", "3 gingerbread_man: 5\n", "4 icecream_sandwich: 3\n", "5 honeycomb: 2\n", "6 kitkat: 2\n", "7 jellybean: 2\n", "8 donut: 2\n", "```\n", "```\n", "14 test frames were taken from a YouTube video.\n", "```\n", "ground_truth_labels_all=[3, 9, 2, 2, 3, 9, 5, 2, 2, 2, 2, 2, 3, 2, 8, 8, 2, 8, 8, 3, 8, 9, 9, 3, 6, 3, 5, 8, 9, 8, 0, 0, 9, 4, 9, 9, 9, 9, 8, 0, 0, 4, 0, 4, 4, 9, 9, 0, 9, 0, 9, 0, 3, 0, 3, 7, 1, 6, 9, 1, 9, 6]\n", "```\n", "predicted_labels_all= [3, 7, 0, 0, 3, 7, 5, 1, 0, 2, 0, 2, 3, 1, 8, 8, 2, 1, 8, 3, 1, 4, 7, 5, 4, 3, 5, 8, 7, 1, 0, 0, 0, 7, 0, 0, 7, 0, 0, 7, 7, 7, 0, 8, 7, 5, 0, 0, 7, 0, 4, 7, 3, 7, 8, 2, 1, 7, 1, 1, 7, 6]\n", "```\n", "scores_all= ['0.29', '0.25', '0.55', '0.45', '0.39', '0.33', '0.32', '0.32', '0.30', '0.29', ‘0.25', '0.25', '0.53', '0.38', '0.29', '0.62', '0.32', '0.31', '0.66', '0.45', '0.43', '0.29', '0.28', '0.27', '0.32', '0.63', '0.47', '0.44', '0.42', '0.32', '0.29', '0.28', '0.26', '0.52', '0.48', '0.45', '0.37', '0.33', '0.30', '0.28', '0.28', '0.44', '0.34', '0.39', '0.36', '0.35', '0.33', '0.31', '0.31', '0.28', '0.27', '0.27', '0.61', '0.48', '0.35', '0.28', '0.35', '0.28', '0.50', '0.47', '0.38', '0.35']\n", "```\n", "\n", "![statues.png]()\n", "---" ] }, { "cell_type": "markdown", "source": [ "\n", "# *In Conclusion* next steps could be:\n", "* add more training data\n", "* try other pre-trained models followed by training for classification of android statues.\n", "* try other variants of transfer-learning with object detection models.\n", "* for the test frames from the video, could consider applying HMM inference to the best scoring identifications for “smoothing” inference where missing data. can use “filtering” queries for the next observations… can use particle filtering for similar reasons. **can build a better training set by adding the inferred missing identifications (bounding boxes and new ground truth labels) to the existing training dataset and then get a new test dataset.** consider whether using optical flow for a small number of frames after good identifications would be a fast addition to the boot-strapping of bounding boxes between frames. consider multi-object tracking models." ], "metadata": { "id": "v7LxXMtKWfCb" } } ], "metadata": { "accelerator": "TPU", "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "name": "python3" } }, "nbformat": 4, "nbformat_minor": 0 }