{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 2: Adding Instance Segmentation to a FiftyOne Dataset\n", "\n", "We will explore how to enrich your dataset by adding **instance segmentation predictions**.\n", "\n", "In this notebook, we’ll cover:\n", "- Using the FiftyOne Model Zoo to apply instance segmentation\n", "- Integrating predictions from a custom model (e.g., a model deployed via Intel Geti)\n", "\n", "---\n", "\n", "## Using a Instance Segmentation Dataset \n", "\n", "For education purposes, use this link in Drive for downloading an upgraded dataset with 100+ annotated unique images.\n", "\n", "Download the dataset with this [Link](https://cdn.voxel51.com/dataset/colombian_coffee-dataset_1600.zip) and unzip in your work folder.\n", "\n", "Let’s kick things off by loading the **colombian_coffee-dataset_1600**: (This is a new dataset, different from the one used in the last notebook.)\n" ] }, { "cell_type": "code", "execution_count": null, "id": "cc022a9c", "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "from fiftyone.utils.coco import COCODetectionDatasetImporter\n", "\n", "dataset = fo.Dataset.from_dir(\n", " dataset_type=fo.types.COCODetectionDataset,\n", " dataset_dir=\"./colombian_coffee-dataset_1600\",\n", " data_path=\"images/default\",\n", " labels_path=\"annotations/instances_default.json\",\n", " label_types=\"segmentations\",\n", " label_field=\"ground_truth\",\n", " name=\"coffe_1600\",\n", " include_id=True,\n", " overwrite=True\n", ")\n", "\n", "view = dataset.shuffle()\n", "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": { "vscode": { "languageId": "bat" } }, "source": [ "----\n", "## Loading predictions using SAM2\n", "\n", "With FiftyOne, you have tons of pretrained models at your disposal to use via the [FiftyOne Model Zoo](https://docs.voxel51.com/model_zoo/index.html) or using one of our [integrations](https://docs.voxel51.com/integrations/index.html) such as [HuggingFace](https://docs.voxel51.com/integrations/huggingface.html)! To get started using them, first load the model in and pass it into the apply_model function. \n", "\n", "\n", "Install SAM2 following the instuctions from this [Repo](https://github.com/facebookresearch/sam2). You can also jump to the next step of this tutorials to understand how SAM2 works with FiftyOne" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!python -m pip install \"sam2\" " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install 'git+https://github.com/facebookresearch/sam2.git'" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you encounter any issues, please refer to the main SAM2 repository to verify the installation process [Repo](https://github.com/facebookresearch/sam2). \n", "\n", "Now apply Segment Anything [SAM2](https://voxel51.com/blog/sam-2-is-now-available-in-fiftyone/) from the FiftyOne Model Zoo. As you can see, some images in the dataset include ground truth annotations, but not all of them. With SAM2, we will apply segmentation across the entire dataset. 
{ "cell_type": "code", "execution_count": null, "id": "16486845", "metadata": {}, "outputs": [], "source": [ "import fiftyone.zoo as foz\n", "\n", "model = foz.load_zoo_model(\"segment-anything-2-hiera-tiny-image-torch\")\n", "\n", "# Run SAM2 over the full images (automatic mask generation, no prompts)\n", "dataset.apply_model(\n", "    model,\n", "    label_field=\"sam2_predictions\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Alternatively, you can apply SAM2 only to the images that already have ground truth segmentations." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Prompt SAM2 with the existing ground truth annotations\n", "dataset.apply_model(\n", "    model,\n", "    label_field=\"sam2_predictions\",\n", "    prompt_field=\"ground_truth_segmentations\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![sam2](https://cdn.voxel51.com/getting_started_segmentation/notebook2/sam2.webp)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This prompts SAM2 with the existing ground truth annotations, so masks are only generated for the samples that already have them." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading predictions using a custom model (Intel Geti Example)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let’s now simulate the pipeline with a custom instance segmentation model. If you want to run inference with the same setup, follow the example below.\n", "\n", "Assuming you’ve already set up inference with a model (e.g., via OpenVINO + Intel Geti SDK):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install geti-sdk==2.10.*" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Preparing the models for inference\n", "\n", "The Intel Geti SDK will be used to run inference with Intel Geti models. The deployment folder of the best model must be downloaded and unzipped in the same folder as the project.\n", "\n", "Download and unzip the [model](https://cdn.voxel51.com/model/geti_sdk-deployment_90.zip)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Generating instance segmentation masks from polygons and bounding boxes\n", "\n", "This function extracts instance segmentation masks from polygon annotations, combining **detection (bounding boxes)** and **segmentation (masks)** in the same instance using `fo.Detection`.\n", "\n", "1. **Load Image** – Reads and converts the image to RGB. \n", "2. **Process Annotations** – Extracts polygon points, computes bounding boxes, and normalizes coordinates (see the small example below). \n", "3. **Generate Masks** – Creates and crops binary masks for each annotation. \n", "4. **Save & Return** – Stores masks as temp files and returns `fo.Detection` objects, ensuring the bounding box and mask belong to the same instance. \n", "\n", "This enables accurate visualization and analysis in FiftyOne, preserving both object localization and shape details.\n", "\n", "Useful for visualizing or processing segmentation data in FiftyOne." ] },
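{ "cell_type": "markdown", "metadata": {}, "source": [ "FiftyOne expects bounding boxes in relative `[x, y, width, height]` coordinates in `[0, 1]`. Here is a tiny worked example of the normalization step, using hypothetical pixel values:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# FiftyOne stores bounding boxes as relative [x, y, width, height] in [0, 1]\n", "img_width, img_height = 640, 480  # hypothetical image size\n", "x, y, w, h = 64, 48, 128, 96  # hypothetical absolute pixel box\n", "\n", "bounding_box = [x / img_width, y / img_height, w / img_width, h / img_height]\n", "print(bounding_box)  # [0.1, 0.1, 0.2, 0.2]" ] },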
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import cv2\n", "import fiftyone as fo\n", "from PIL import Image as PILImage\n", "from tempfile import NamedTemporaryFile\n", "from geti_sdk.deployment import Deployment\n", "from geti_sdk.data_models.shapes import Polygon\n", "\n", "def generate_mask_from_polygon_and_bboxes(sample, prediction):\n", " image = cv2.imread(sample.filepath)\n", " image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)\n", " img_height, img_width = image.shape[:2]\n", " print(f\"Image size: {img_width}x{img_height}\")\n", " detections = []\n", " for annotation in prediction.annotations:\n", " if isinstance(annotation.shape, Polygon):\n", " polygon_points = [(point.x, point.y) for point in annotation.shape.points]\n", " polygon_points = np.array(polygon_points, dtype=np.int32)\n", " label = annotation.labels[0].name\n", " confidence = annotation.labels[0].probability\n", " x, y, w, h = cv2.boundingRect(polygon_points)\n", " scaled_x = x / img_width\n", " scaled_y = y / img_height\n", " scaled_w = w / img_width\n", " scaled_h = h / img_height\n", " bounding_box = [scaled_x, scaled_y, scaled_w, scaled_h]\n", " mask = np.zeros((img_height, img_width), dtype=np.uint8)\n", " cv2.fillPoly(mask, [polygon_points], 255)\n", " cropped_mask = mask[y:y + h, x:x + w]\n", " mask_resized = cv2.resize(cropped_mask, (w, h), interpolation=cv2.INTER_NEAREST)\n", " print(f\"Mask size: {mask_resized.shape} (expected: {h}x{w})\")\n", " with NamedTemporaryFile(delete=False, suffix='.png') as temp_mask_file:\n", " mask_path = temp_mask_file.name\n", " cv2.imwrite(mask_path, mask_resized)\n", " detection = fo.Detection(\n", " label=label,\n", " confidence=confidence,\n", " bounding_box=bounding_box,\n", " mask_path=mask_path\n", " )\n", " detections.append(detection)\n", " return detections" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For education purposes check what is happening in the first or last sample. Then you can apply this to the whole dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from openvino.runtime import Core\n", "\n", "ie = Core()\n", "devices = ie.available_devices\n", "\n", "for device in devices:\n", " device_name = ie.get_property(device, \"FULL_DEVICE_NAME\")\n", " print(f\"{device}: {device_name}\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Update the folder path to match the location where the model was downloaded and unzipped\n", "deployment_inference = Deployment.from_folder(\"geti_sdk-deployment_90\")\n", "deployment_inference.load_inference_models(device=\"CPU\")" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Test on one image\n", "sample = dataset.first()\n", "image_path = sample.filepath\n", "image_data = PILImage.open(image_path)\n", "image_data = np.array(image_data)\n", "prediction = deployment_inference.infer(image_data) \n", "detections = generate_mask_from_polygon_and_bboxes(sample, prediction)\n", "sample['predicted_segmentations_test'] = fo.Detections(detections=detections)\n", "sample.save()\n", "dataset.reload()\n", "print(dataset)\n", "print(sample)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "📝 Tip: Replace ```prediction.objects``` with your real output structure and masks." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Run the prediction in the whole dataset\n", "\n", "This loop processes each sample in the dataset by loading the image, running inference using Geti SDK, and generating instance segmentation masks. The function extracts detections with both bounding boxes and masks, ensuring they belong to the same instance. These predictions are then stored in the sample under `\"predictions_model\"` using `fo.Detections`. Finally, the dataset is reloaded to reflect the updates." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Iterate over the samples in the dataset\n", "for sample in dataset:\n", " # Load the image as a NumPy array using PIL or OpenCV\n", " image_path = sample.filepath # Path to the image file\n", " image_data = PILImage.open(image_path)\n", " image_data = np.array(image_data) # Convert the image to NumPy array\n", "\n", " # Run inference on the sample (using Geti SDK's inference)\n", " prediction = deployment_inference.infer(image_data)\n", "\n", " # Generate the segmentation mask and detections using the annotations from the prediction\n", " detections = generate_mask_from_polygon_and_bboxes(sample, prediction)\n", "\n", " # Add the detections as predicted segmentations\n", " sample[\"predictions_geti_sdk\"] = fo.Detections(detections=detections) \n", "\n", " # Save the updated sample\n", " sample.save()\n", "\n", "# Reload the dataset to reflect the changes\n", "dataset.reload()" ] }, { "cell_type": "markdown", "id": "bf19b2f0", "metadata": {}, "source": [ "## Compare Predictions in FiftyOne App\n", "Toggle between `ground_truth_segmentations`, `sam2_predictions`, and `predictions_geti_sdk` in the App to explore and compare different segmentations side-by-side!" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![compare_prediction](https://cdn.voxel51.com/getting_started_segmentation/notebook2/compare_prediction.webp)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [] } ], "metadata": { "kernelspec": { "display_name": "OSS310", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.16" } }, "nbformat": 4, "nbformat_minor": 2 }