{ "cells": [ { "cell_type": "markdown", "id": "2bd24310", "metadata": {}, "source": [ "\n", "\n", " \n", " \n", " \n", " \n", "
\n", " \n", " \n", " Try in Google Colab\n", " \n", " \n", " \n", " \n", " Share via nbviewer\n", " \n", " \n", " \n", " \n", " View on GitHub\n", " \n", " \n", " \n", " \n", " Download notebook\n", " \n", "
\n" ] }, { "cell_type": "markdown", "id": "3NPxYbvIG5wc", "metadata": { "id": "3NPxYbvIG5wc" }, "source": [ "## Exploring Football Player Segmentation dataset and using Segment Anything Model for prediction of segmentations\n", "In this notebook, we will be exploring the [football player segmentation](https://www.kaggle.com/datasets/ihelon/football-player-segmentation\n", ") dataset. The notebook goes through steps of loading the dataset, filtering and using [FiftyOne Brain](https://docs.voxel51.com/user_guide/brain.html) similarity method to find images of different scenarios on during a football game. In the final step, [Segment Anything Model](https://github.com/facebookresearch/segment-anything) is used for predicting segmentations on a subset of dataset and evaluate the predictions against ground_truth" ] }, { "cell_type": "markdown", "id": "ZrgoW-DxMqDH", "metadata": { "id": "ZrgoW-DxMqDH" }, "source": [ "Run the below code cell to get the required python libraries and restart the notebook" ] }, { "cell_type": "code", "execution_count": null, "id": "K9_emTVWG8X9", "metadata": { "id": "K9_emTVWG8X9" }, "outputs": [], "source": [ "%%shell\n", "\n", "pip install fiftyone pycocotools umap-learn kaggle torchvision wget opencv-python shapely\n", "pip install git+https://github.com/facebookresearch/segment-anything.git" ] }, { "cell_type": "markdown", "id": "I3Zc1aYmNqtJ", "metadata": { "id": "I3Zc1aYmNqtJ" }, "source": [ "## Import" ] }, { "cell_type": "code", "execution_count": 2, "id": "14586f36-763d-4424-a122-59e8be8ae20f", "metadata": { "id": "c45c0c95-3fd5-4e6c-a360-701488fe10d3" }, "outputs": [], "source": [ "import fiftyone as fo\n", "import fiftyone.brain as fob\n", "from fiftyone import ViewField as F\n", "import os\n", "import cv2\n", "import wget\n", "import matplotlib.pyplot as plt\n", "from zipfile import ZipFile\n", "import torch\n", "import torchvision\n", "import numpy as np\n", "from segment_anything import SamPredictor, sam_model_registry, SamAutomaticMaskGenerator\n", "import PIL" ] }, { "cell_type": "markdown", "id": "741b97e1-a287-4195-ad93-71692de32d31", "metadata": {}, "source": [ "## Get current working directory" ] }, { "cell_type": "code", "execution_count": 3, "id": "92b889d9-28dc-4fd7-aba6-34c349b9df61", "metadata": {}, "outputs": [], "source": [ "# Get the current working directory\n", "cwd = os.path.abspath(os.getcwd())" ] }, { "cell_type": "markdown", "id": "9iJCP4pVGMQu", "metadata": { "id": "9iJCP4pVGMQu" }, "source": [ "## Download and Extract Dataset from Kaggle\n", "\n", "1. If you are not a Kaggle user, you will first need to create a Kaggle account. After creation of the account, go to your Kaggle Account page and scroll down to API section.\n", "2. Click on Create New API Token. A new API token in the form of kaggle.json file will be created which you can save locally. The kaggle.json file contains your Kaggle username and key.\n", "3. 
{ "cell_type": "code", "execution_count": null, "id": "h2LhUGjbLiwI", "metadata": { "id": "h2LhUGjbLiwI" }, "outputs": [], "source": [ "os.environ['KAGGLE_CONFIG_DIR'] = cwd\n", "\n", "# Download the dataset\n", "!kaggle datasets download -d ihelon/football-player-segmentation\n", "\n", "# Extract the dataset to the current working directory\n", "!unzip football-player-segmentation.zip" ] }, { "cell_type": "markdown", "id": "Xn35TeWHNuMT", "metadata": { "id": "Xn35TeWHNuMT" }, "source": [ "## Load the dataset\n", "The football player segmentation dataset is already in COCO format, so we can import it using fo.types.COCODetectionDataset" ] }, { "cell_type": "code", "execution_count": 4, "id": "5d1a163c-5165-4850-81f0-f85f74c2dcf8", "metadata": { "id": "5d1a163c-5165-4850-81f0-f85f74c2dcf8" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |█████████████████| 512/512 [29.4s elapsed, 0s remaining, 21.7 samples/s] \n" ] } ], "source": [ "# The directory containing the source images\n", "data_path = \"./images\"\n", "\n", "# The path to the COCO labels JSON file\n", "labels_path = \"./annotations/instances_default.json\"\n", "\n", "# Name of the dataset\n", "name = \"football-player-segmentation\"\n", "\n", "# Import the dataset\n", "dataset = fo.Dataset.from_dir(\n", "    dataset_type=fo.types.COCODetectionDataset,\n", "    data_path=data_path,\n", "    labels_path=labels_path,\n", "    name=name\n", ")\n", "dataset.compute_metadata()" ] }, { "cell_type": "markdown", "id": "znVo4w4PcECR", "metadata": { "id": "znVo4w4PcECR" }, "source": [ "## Add Embeddings\n", "We are going to use [FiftyOne Brain's embedding similarity capability](https://docs.voxel51.com/user_guide/brain.html#brain-similarity) to visualize some scenarios in a football game." ] }, { "cell_type": "code", "execution_count": null, "id": "w22VrI4abph8", "metadata": { "id": "w22VrI4abph8" }, "outputs": [], "source": [ "fob.compute_visualization(\n", "    dataset,\n", "    model=\"clip-vit-base32-torch\",\n", "    brain_key=\"img_sim\",\n", ")" ] },
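{ "cell_type": "markdown", "id": "ad00e001", "metadata": {}, "source": [ "As a programmatic complement to the interactive lasso selection used later in this notebook, we can also build a similarity index and query it directly. The cell below is a minimal sketch, assuming the default similarity backend; the sim_index brain key and the choice of query sample are our own." ] }, { "cell_type": "code", "execution_count": null, "id": "ad00e002", "metadata": {}, "outputs": [], "source": [ "# Index the dataset for similarity search using the same CLIP model\n", "fob.compute_similarity(\n", "    dataset,\n", "    model=\"clip-vit-base32-torch\",\n", "    brain_key=\"sim_index\",\n", ")\n", "\n", "# Find the 15 images most similar to an example query sample\n", "query_id = dataset.first().id\n", "similar_view = dataset.sort_by_similarity(query_id, k=15, brain_key=\"sim_index\")" ] },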
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session = fo.launch_app(dataset=dataset)" ] }, { "cell_type": "code", "execution_count": 7, "id": "26b1c475-2756-4409-8bc2-335f3edd6431", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "Jhp16dUYdryg", "metadata": { "id": "Jhp16dUYdryg" }, "source": [ "## Filter by detection field" ] }, { "cell_type": "code", "execution_count": 11, "id": "qMDznTKwdyjy", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 804 }, "id": "qMDznTKwdyjy", "outputId": "a8043090-16a8-403e-b643-45b4642cec60" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "detection_view = dataset.select_fields('detections')\n", "session.view=detection_view" ] }, { "cell_type": "code", "execution_count": 12, "id": "d99c0d55-1559-47f8-a4b8-beb47acc7edc", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "KFyXirCqhZI_", "metadata": { "id": "KFyXirCqhZI_" }, "source": [ "## Filter by segmentations field" ] }, { "cell_type": "code", "execution_count": 13, "id": "w8ycrAZGgaMu", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 804 }, "id": "w8ycrAZGgaMu", "outputId": "2de494d4-67c6-4362-9f45-9c2914335c28" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "segmentation_view = dataset.select_fields('segmentations')\n", "session.view=segmentation_view" ] }, { "cell_type": "code", "execution_count": 14, "id": "d942a096-1f18-44ce-8544-30dc4b2a204e", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "HgLiCTkMka5-", "metadata": { "id": "HgLiCTkMka5-" }, "source": [ "## Filtering by id\n", "In the following case, filtering different persons detected in an image. We have filtered the referee in the below view" ] }, { "cell_type": "code", "execution_count": 17, "id": "wCUjxbxbkG6g", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 804 }, "id": "wCUjxbxbkG6g", "outputId": "102f3acf-40ba-49b8-d10f-5964b300c677" }, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view=None" ] }, { "cell_type": "code", "execution_count": 18, "id": "9b7fc721-a8fa-4395-9d0e-88aee73e6985", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "auanyADOkwkW", "metadata": { "id": "auanyADOkwkW" }, "source": [ "## Embeddings - Similarity\n", "\n", "Let's check out some different scenarios by selecting different clusters in the embeddings" ] }, { "cell_type": "code", "execution_count": null, "id": "LDUl0AfvkhjZ", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 804 }, "id": "LDUl0AfvkhjZ", "outputId": "79ced08a-8729-4a13-c855-f5b4d858524f" }, "outputs": [], "source": [ "session.view=None" ] }, { "cell_type": "markdown", "id": "1333dd85-cc28-4da5-8fcf-e9942daa7904", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "TyPsWaaFV66O", "metadata": { "id": "TyPsWaaFV66O" }, "source": [ "The above cluster with lasso selection shows 13 samples of what looks like positions of footballers during a corner kick at a certain side. This set of similar images helps to track the positions of the players before and after the kick and similar clusters can be used to analyse the player tracking for other corner kicks taken on same and opposite sides during the game." ] }, { "cell_type": "code", "execution_count": null, "id": "44eb9a9b-cf23-4a59-820c-f91824c12a50", "metadata": {}, "outputs": [], "source": [ "session.view = None" ] }, { "cell_type": "markdown", "id": "0354f465-185b-4b88-9b50-c8469479e7d1", "metadata": {}, "source": [ "" ] }, { "cell_type": "markdown", "id": "d370dbc7-c9b0-4863-bf26-17224d0d6d44", "metadata": {}, "source": [ "In the second cluster lasso selection, set of similar images show the player positions during a throw-in" ] }, { "cell_type": "markdown", "id": "6bfa0eda-a522-4f95-93b7-5fad6b32eff1", "metadata": {}, "source": [ "## Let's add segmentations predictions to subset of dataset with the help of SAM and evaluate them against ground_truths" ] }, { "cell_type": "markdown", "id": "b0e809a1-2ff1-424a-9bac-6b0bdf36d32d", "metadata": {}, "source": [ "### Download the Segment Anything Model" ] }, { "cell_type": "code", "execution_count": 7, "id": "45545e8f-4bfa-42a8-873b-c10c1073d87c", "metadata": {}, "outputs": [], "source": [ "# Create a new directory named 'model'\n", "os.mkdir(cwd+'/model')\n", "\n", "checkpoint = \"sam_vit_b_01ec64.pth\"\n", "model_url = \"https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth\"\n", "model_type = \"default\"\n", "\n", "# Set the path to the checkpoint\n", "checkpoint_path = cwd+'/model/sam_vit_h_4b8939.pth'\n", "\n", "# Download the files to their respective paths\n", "wget.download(model_url, out = checkpoint_path)" ] }, { "cell_type": "markdown", "id": "796cb183-c51e-4e78-9a03-27875939071e", "metadata": {}, "source": [ "### Create predictions view from a subset of dataset" ] }, { "cell_type": "code", "execution_count": 14, "id": "97f86913-2db0-46ba-9128-d83303661815", "metadata": {}, "outputs": [], "source": [ "predictions_view = dataset.take(30)" ] }, { "cell_type": "code", "execution_count": 15, "id": "7660b3d2-04b4-4e26-a39b-ef90fc6a5963", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view=predictions_view" ] }, { "cell_type": "code", "execution_count": 16, "id": "00e375ea-061d-4b31-a599-a0ae9a394703", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "4ac66a16-370e-47dd-9445-5199daaaf05f", "metadata": {}, "source": [ "### Load SAM model and predictor" ] }, { "cell_type": "code", "execution_count": 12, "id": "ede2642e-de02-4741-8e5a-7dc686b18597", "metadata": {}, "outputs": [], "source": [ "# Set path to the checkpoint running on CPU\n", "sam = sam_model_registry[model_type](checkpoint=checkpoint_path)\n", "device=torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n", "sam.to(device)\n", "# Instantiate SAM predictor model\n", "predictor = SamPredictor(sam)" ] }, { "cell_type": "markdown", "id": "468baa36-e8a3-4809-b0e8-6fa464b8fe36", "metadata": {}, "source": [ "Since we have the bounding box available from detection, we can use the bounding boxes to generate segmentation masks. Detailed explanation on the code and instance segmentation with SAM can be found in this [article](https://towardsdatascience.com/see-what-you-sam-4eea9ad9a5de#37d5) written by Jacob Marks" ] }, { "cell_type": "code", "execution_count": 13, "id": "ac91abb5-b0cf-4596-b841-4c3ebfead9ca", "metadata": {}, "outputs": [], "source": [ "# Converts from fiftyone relative coordinates to absolute\n", "def fo_to_sam(box, img_width, img_height):\n", " new_box = np.copy(np.array(box))\n", " new_box[0] *= img_width\n", " new_box[2] *= img_width\n", " new_box[1] *= img_height\n", " new_box[3] *= img_height\n", " new_box[2] += new_box[0]\n", " new_box[3] += new_box[1]\n", " return np.round(new_box).astype(int)\n", "\n", "def add_SAM_mask_to_detection(detection, mask, img_width, img_height):\n", " y0, x0, y1, x1 = fo_to_sam(detection.bounding_box, img_width, img_height) \n", " mask_trimmed = mask[x0:x1+1, y0:y1+1]\n", " detection[\"mask\"] = np.array(mask_trimmed)\n", " return detection\n", "\n", "def add_SAM_instance_segmentation(sample):\n", " w, h = sample.metadata.width, sample.metadata.height\n", " image = np.array(PIL.Image.open(sample.filepath))\n", " # process the image to produce an image embedding\n", " predictor.set_image(image)\n", " \n", " if sample.detections is None:\n", " return\n", "\n", " dets = sample.detections.detections\n", " boxes = [d.bounding_box for d in dets]\n", " sam_boxes = np.array([fo_to_sam(box, w, h) for box in boxes])\n", " \n", " input_boxes = torch.tensor(sam_boxes, device=predictor.device)\n", " transformed_boxes = predictor.transform.apply_boxes_torch(input_boxes, image.shape[:2])\n", " \n", " masks, _, _ = predictor.predict_torch(\n", " point_coords=None,\n", " point_labels=None,\n", " boxes=transformed_boxes,\n", " multimask_output=False,\n", " )\n", " \n", " new_dets = []\n", " for i, det in enumerate(dets):\n", " mask = masks[i, 0]\n", " new_dets.append(add_SAM_mask_to_detection(det, mask, w, h))\n", "\n", " sample['predictions'] = fo.Detections(detections = new_dets)\n", " sample.save() \n", "\n", "def add_SAM_instance_segmentations(dataset):\n", " for sample in dataset.iter_samples(autosave=True, progress=True):\n", " add_SAM_instance_segmentation(sample) " ] }, { "cell_type": "code", "execution_count": 14, "id": "b44c0440-5f9b-46ef-af92-9f38479bc4a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |███████████████████| 30/30 [15.7m elapsed, 0s remaining, 
{ "cell_type": "code", "execution_count": 14, "id": "b44c0440-5f9b-46ef-af92-9f38479bc4a2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " 100% |███████████████████| 30/30 [15.7m elapsed, 0s remaining, 0.0 samples/s] \n" ] } ], "source": [ "add_SAM_instance_segmentations(predictions_view)" ] }, { "cell_type": "code", "execution_count": 15, "id": "uYtsB6SX5qUP", "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 804 }, "id": "uYtsB6SX5qUP", "outputId": "52aa6c05-ee00-43fb-da45-a584d26572f9" }, "outputs": [],
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "session.view = predictions_view" ] }, { "cell_type": "code", "execution_count": 16, "id": "af6dfe2f-b753-4f83-a7c1-596c62a0299b", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "ca885d35-331f-4cc1-ae46-d5a48977e7ae", "metadata": {}, "source": [ "### Evaluate predictions" ] }, { "cell_type": "code", "execution_count": 24, "id": "fbe526d0-5f6a-4da7-ab7f-b7c66e379731", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |███████████████████| 30/30 [8.3s elapsed, 0s remaining, 3.9 samples/s] \n" ] } ], "source": [ "results = predictions_view.evaluate_detections(\n", " \"predictions\", gt_field=\"segmentations\", eval_key=\"eval\", use_masks = True\n", ")" ] }, { "cell_type": "code", "execution_count": 25, "id": "6a4dc852-5875-4042-aa28-a0299989670a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: football-player-segmentation\n", "Media type: image\n", "Num patches: 438\n", "Patch fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " sample_id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.ImageMetadata)\n", " segmentations: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " crowd: fiftyone.core.fields.BooleanField\n", " type: fiftyone.core.fields.StringField\n", " iou: fiftyone.core.fields.FloatField\n", "View stages:\n", " 1. Take(size=30, seed=None)\n", " 2. ToEvaluationPatches(eval_key='eval', config=None)\n" ] } ], "source": [ "# Convert to evaluation patches\n", "eval_patches = predictions_view.to_evaluation_patches(\"eval\")\n", "print(eval_patches)" ] }, { "cell_type": "code", "execution_count": 26, "id": "ad987579-a59e-4c1a-81bb-f99b56107041", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'fn': 9, 'fp': 9, 'tp': 420}\n" ] } ], "source": [ "print(eval_patches.count_values(\"type\"))" ] }, { "cell_type": "code", "execution_count": 23, "id": "d7401af5-130c-47d0-a846-3ec199696c4e", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "
\n", "
\n", " \n", "
\n", " \n", "
\n", "\n", "" ], "text/plain": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# View patches in the App\n", "session.view = eval_patches" ] }, { "cell_type": "code", "execution_count": 27, "id": "7e719053-b967-4eba-be46-e5056c71e6b7", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "code", "execution_count": null, "id": "dd89f628-e8cc-499d-94b4-f1bdf6d7828a", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "colab": { "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }