{ "cells": [ { "cell_type": "markdown", "id": "d7e55397-6322-4afa-9a01-bf1b75bb88d0", "metadata": {}, "source": [ "# Fine-tune YOLOv8 models for custom use cases with the help of FiftyOne" ] }, { "cell_type": "markdown", "id": "14c4148b-c792-4165-b87d-c8120196fa02", "metadata": {}, "source": [ "Since its [initial release back in 2015](https://arxiv.org/abs/1506.02640), the You Only Look Once (YOLO) family of computer vision models has been one of the most popular in the field. In late 2022, [Ultralytics](https://github.com/ultralytics/ultralytics) announced [YOLOv8](https://docs.ultralytics.com/#ultralytics-yolov8), which comes with a new [backbone](https://arxiv.org/abs/2206.08016#:~:text=Many%20networks%20have%20been%20proposed,before%20and%20demonstrates%20its%20effectiveness.).\n", "\n", "The basic YOLOv8 detection and segmentation models, however, are general purpose, which means for custom use cases they may not be suitable out of the box. With FiftyOne, we can visualize and evaluate YOLOv8 model predictions, and better understand where the model's predictive power breaks down.\n", "\n", "In this walkthrough, we will show you how to load YOLOv8 model predictions into FiftyOne, and use insights from model evaluation to fine-tune a YOLOv8 model for your custom use case.\n", "\n", "Specifically, this walkthrough covers:\n", "\n", "* Loading YOLOv8 model predictions into FiftyOne\n", "* Evaluating YOLOv8 model predictions\n", "* Curating a dataset for fine-tuning\n", "* Fine-tuning YOLOv8 models\n", "* Comparing the performance of out-of-the-box and fine-tuned YOLOv8 models.\n", "\n", "**So, what's the takeaway?**\n", "\n", "FiftyOne can help you to achieve better performance using YOLOv8 models on real-time inference tasks for custom use cases." ] }, { "cell_type": "markdown", "id": "a80e01a8-93c0-4e05-90f4-43bb40b3ce55", "metadata": { "tags": [] }, "source": [ "## Setup" ] }, { "cell_type": "markdown", "id": "3a017676-09ef-4034-b1b2-e41f7c5e6c02", "metadata": {}, "source": [ "To get started, you need to install [FiftyOne](https://docs.voxel51.com/getting_started/install.html) and [Ultralytics](https://github.com/ultralytics/ultralytics):" ] }, { "cell_type": "code", "execution_count": null, "id": "6459573e-1cd7-4418-9b0f-1bf82f4c0746", "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone ultralytics" ] }, { "cell_type": "code", "execution_count": null, "id": "955e9f90-ef26-4a91-a119-06afb76f71c2", "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "from fiftyone import ViewField as F" ] }, { "cell_type": "code", "execution_count": null, "id": "784f2d2c", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import os\n", "from tqdm import tqdm" ] }, { "cell_type": "markdown", "id": "7d7f3799-7921-418b-a74a-dd6c006b5901", "metadata": {}, "source": [ "We will import the YOLO object from Ultralytics and use this to instantiate pretrained detection and segmentation models in Python. Along with the YOLOv8 architecture, Ultralytics released a set of pretrained models, with different sizes, for classification, detection, and segmentation tasks.\n", "\n", "For the purposes of illustration, we will use the smallest version, YOLOv8 Nano (YOLOv8n), but the same syntax will work for any of the pretrained models on the [Ultralytics YOLOv8 GitHub repo](https://github.com/ultralytics/ultralytics)." 
] }, { "cell_type": "code", "execution_count": null, "id": "107ab27d-935c-41fb-b71a-081baa0a8551", "metadata": {}, "outputs": [], "source": [ "from ultralytics import YOLO\n", "\n", "detection_model = YOLO(\"yolov8n.pt\")\n", "seg_model = YOLO(\"yolov8n-seg.pt\")" ] }, { "cell_type": "markdown", "id": "e48ed457-60f1-4502-ac21-49f31ac51439", "metadata": {}, "source": [ "In Python, we can apply a YOLOv8 model to an individual image by passing the file path into the model call. For an image with file path `path/to/image.jpg`, running `detection_model(\"path/to/image.jpg\")` will generate a list containing a single `ultralytics.yolo.engine.results.Results` object. \n", "\n", "We can see this by applying the detection model to Ultralytics' test image:" ] }, { "cell_type": "code", "execution_count": null, "id": "1fb8fa56-804c-41e2-97b6-04857569e3fe", "metadata": {}, "outputs": [], "source": [ "results = detection_model(\"https://ultralytics.com/images/bus.jpg\")" ] }, { "cell_type": "markdown", "id": "b41eca08-9342-45db-8426-86a309d585c9", "metadata": {}, "source": [ "A similar result can be obtained if we apply the segmentation model to an image. These results contain bounding boxes, class confidence scores, and integers representing class labels. For a complete discussion of these results objects, see the Ultralytics YOLOv8 [Results API Reference](https://docs.ultralytics.com/reference/results/)." ] }, { "cell_type": "markdown", "id": "a8da286b-7507-4fa7-b7a7-9fd4f00aa5cf", "metadata": {}, "source": [ "If we want to run tasks on all images in a directory, then we can do so from the command line with the YOLO Command Line Interface by specifying the task `[detect, segment, classify]` and mode `[train, val, predict, export]`, along with other arguments." ] }, { "cell_type": "markdown", "id": "b8133ea3-35ab-43a6-93b6-32cabf5fe787", "metadata": {}, "source": [ "To run inference on a set of images, we must first put the data in the appropriate format. The best way to do so is to load your images into a FiftyOne `Dataset`, and then export the dataset in [YOLOv5Dataset](https://docs.voxel51.com/user_guide/dataset_creation/datasets.html#yolov5dataset) format, as YOLOv5 and YOLOv8 use the same data formats." ] }, { "cell_type": "markdown", "id": "60c6b1ea", "metadata": {}, "source": [ "💡 FiftyOne's Ultralytics Integration\n", "\n", "\n", "If you just want to run inference on your FiftyOne dataset with an existing YOLOv8 model, you can do so by passing this `ultralytics.YOLO` model directly into your FiftyOne dataset's `apply_model()` method:\n", "\n", "```python\n", "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "# Load a dataset\n", "dataset = foz.load_zoo_dataset(\"quickstart\")\n", "\n", "# Load a YOLOv8 model\n", "from ultralytics import YOLO\n", "model = YOLO(\"yolov8l.pt\")\n", "\n", "# Apply the model to the dataset\n", "dataset.apply_model(model, label_field=\"yolov8l\")\n", "\n", "# Launch the App to visualize the results\n", "session = fo.launch_app(dataset)\n", "```\n", "\n", "For more details, check out the [FiftyOne Ultralytics Integration docs](https://docs.voxel51.com/integrations/ultralytics.html)!" 
] }, { "cell_type": "markdown", "id": "b9526c81-5229-484b-a4c7-66621e9cec7f", "metadata": {}, "source": [ "## Load YOLOv8 predictions in FiftyOne" ] }, { "cell_type": "markdown", "id": "bcb32c2e-8f9d-4093-a656-ea571bc85009", "metadata": {}, "source": [ "In this walkthrough, we will look at YOLOv8’s predictions on a subset of the [MS COCO](https://cocodataset.org/#home) dataset. This is the dataset on which these models were trained, which means that they are likely to show close to peak performance on this data. Additionally, working with COCO data makes it easy for us to map model outputs to class labels." ] }, { "cell_type": "markdown", "id": "f777503d-203c-4aaa-be1e-fb67ac20360c", "metadata": {}, "source": [ "Load the images and ground truth object detections in COCO’s validation set from the [FiftyOne Dataset Zoo](https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html)." ] }, { "cell_type": "code", "execution_count": null, "id": "eb16e311-d189-44f2-9845-3224518136fd", "metadata": {}, "outputs": [], "source": [ "dataset = foz.load_zoo_dataset(\n", " 'coco-2017',\n", " split='validation',\n", ")" ] }, { "cell_type": "markdown", "id": "cb3448e8-8406-4a67-ad6a-0a3cf4bdd4b4", "metadata": {}, "source": [ "We then generate a mapping from YOLO class predictions to COCO class labels. [COCO has 91 classes](https://cocodataset.org/#home), and YOLOv8, just like YOLOv3 and YOLOv5, ignores all of the numeric classes and [focuses on the remaining 80](https://imageai.readthedocs.io/en/latest/detection/)." ] }, { "cell_type": "code", "execution_count": null, "id": "4c929893-60a9-4be5-8308-1b87c57bdf50", "metadata": {}, "outputs": [], "source": [ "coco_classes = [c for c in dataset.default_classes if not c.isnumeric()]" ] }, { "attachments": {}, "cell_type": "markdown", "id": "77831bcd", "metadata": {}, "source": [ "### Generate predictions" ] }, { "cell_type": "markdown", "id": "ce3d00e0-276c-4825-93ef-cdd621c7b612", "metadata": {}, "source": [ "Export the dataset into a directory `coco_val` in YOLO format:" ] }, { "cell_type": "code", "execution_count": 3, "id": "717140d1-2705-442a-b608-3b3d6c5f278d", "metadata": {}, "outputs": [], "source": [ "def export_yolo_data(\n", " samples, \n", " export_dir, \n", " classes, \n", " label_field = \"ground_truth\", \n", " split = None\n", " ):\n", "\n", " if type(split) == list:\n", " splits = split\n", " for split in splits:\n", " export_yolo_data(\n", " samples, \n", " export_dir, \n", " classes, \n", " label_field, \n", " split\n", " ) \n", " else:\n", " if split is None:\n", " split_view = samples\n", " split = \"val\"\n", " else:\n", " split_view = samples.match_tags(split)\n", "\n", " split_view.export(\n", " export_dir=export_dir,\n", " dataset_type=fo.types.YOLOv5Dataset,\n", " label_field=label_field,\n", " classes=classes,\n", " split=split\n", " )" ] }, { "cell_type": "code", "execution_count": null, "id": "6ec8adf2-f689-4599-9c58-57501fbf40c5", "metadata": {}, "outputs": [], "source": [ "coco_val_dir = \"coco_val\"\n", "export_yolo_data(dataset, coco_val_dir, coco_classes)" ] }, { "cell_type": "markdown", "id": "84bb1121-e2b9-43e5-aa82-bc5594193ff1", "metadata": {}, "source": [ "Then run inference on these images:" ] }, { "cell_type": "code", "execution_count": null, "id": "bfcde98a-0afd-40bb-86f7-3fab2515c8eb", "metadata": {}, "outputs": [], "source": [ "!yolo task=detect mode=predict model=yolov8n.pt source=coco_val/images/val save_txt=True save_conf=True" ] }, { "cell_type": "markdown", "id": "4a720074-6ab3-4b72-8bc6-96163983428b", 
"metadata": {}, "source": [ "Running this inference generates a directory `runs/detect/predict/labels`, which will contain a separate `.txt` file for each image in the dataset, and a line for each object detection." ] }, { "cell_type": "markdown", "id": "80b85b8f-c0f8-4866-ae87-cc44d41df1da", "metadata": {}, "source": [ "Each line is in the form: an integer for the class label, a class confidence score, and four values representing the bounding box." ] }, { "cell_type": "code", "execution_count": 22, "id": "1a5ad723-f6a2-4fe7-9c29-cb656426496e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "56 0.663281 0.619718 0.0640625 0.201878 0.265856\n", "60 0.55625 0.619718 0.184375 0.225352 0.266771\n", "74 0.710938 0.307512 0.01875 0.0469484 0.277868\n", "60 0.860156 0.91784 0.279687 0.159624 0.278297\n", "72 0.744531 0.539906 0.101562 0.295775 0.356417\n", "75 0.888281 0.820423 0.0609375 0.241784 0.391675\n", "58 0.385156 0.457746 0.0640625 0.084507 0.420693\n", "56 0.609375 0.620892 0.090625 0.21831 0.50562\n", "56 0.650781 0.619718 0.0859375 0.215962 0.508265\n", "56 0.629687 0.619718 0.128125 0.220657 0.523211\n", "0 0.686719 0.535211 0.0828125 0.333333 0.712339\n", "56 0.505469 0.624413 0.0953125 0.230047 0.854189\n", "62 0.125 0.502347 0.23125 0.225352 0.927385\n", "\n" ] } ], "source": [ "label_file = \"runs/detect/predict/labels/000000000139.txt\"\n", "\n", "with open(label_file) as f: \n", " print(f.read())" ] }, { "attachments": {}, "cell_type": "markdown", "id": "952d1236", "metadata": {}, "source": [ "### Load detections" ] }, { "cell_type": "markdown", "id": "f2240717-18a2-4354-8c2b-c652dac4e8a4", "metadata": {}, "source": [ "We can read a YOLOv8 detection prediction file with $N$ detections into an $(N, 6)$ numpy array:" ] }, { "cell_type": "code", "execution_count": 23, "id": "77a5e02b-e7a4-4c02-99af-07a387c9b624", "metadata": {}, "outputs": [], "source": [ "def read_yolo_detections_file(filepath):\n", " detections = []\n", " if not os.path.exists(filepath):\n", " return np.array([])\n", " \n", " with open(filepath) as f:\n", " lines = [line.rstrip('\\n').split(' ') for line in f]\n", " \n", " for line in lines:\n", " detection = [float(l) for l in line]\n", " detections.append(detection)\n", " return np.array(detections)" ] }, { "cell_type": "markdown", "id": "efd685bb-f7bf-4c28-bc02-1d9c4966f13e", "metadata": {}, "source": [ "From here, we need to convert these detections into FiftyOne’s [Detections](https://docs.voxel51.com/user_guide/using_datasets.html#object-detection) format." ] }, { "cell_type": "markdown", "id": "fbb96e27-c923-42dd-b5d8-46b6e6511972", "metadata": {}, "source": [ "YOLOv8 represents bounding boxes in a centered format with coordinates `[center_x, center_y, width, height]`, whereas [FiftyOne stores bounding boxes](https://docs.voxel51.com/user_guide/using_datasets.html#object-detection) in `[top-left-x, top-left-y, width, height]` format. We can make this conversion by \"un-centering\" the predicted bounding boxes:" ] }, { "cell_type": "code", "execution_count": 24, "id": "aff712e4-10b2-470c-86d0-42cd324258e7", "metadata": {}, "outputs": [], "source": [ "def _uncenter_boxes(boxes):\n", " '''convert from center coords to corner coords'''\n", " boxes[:, 0] -= boxes[:, 2]/2.\n", " boxes[:, 1] -= boxes[:, 3]/2." 
] }, { "cell_type": "markdown", "id": "992890d9-6ab7-45d7-a11b-04a51944bc15", "metadata": {}, "source": [ "Additionally, we can convert a list of class predictions (indices) to a list of class labels (strings) by passing in the class list:\n", "\n" ] }, { "cell_type": "code", "execution_count": 25, "id": "3957e8af-dfd0-4e81-8ddb-4ef2ec191598", "metadata": {}, "outputs": [], "source": [ "def _get_class_labels(predicted_classes, class_list):\n", " labels = (predicted_classes).astype(int)\n", " labels = [class_list[l] for l in labels]\n", " return labels" ] }, { "cell_type": "markdown", "id": "c67cfeed-02c0-4264-9dbf-349c3049f2b8", "metadata": {}, "source": [ "Given the output of a `read_yolo_detections_file()` call, `yolo_detections`, we can generate the FiftyOne `Detections` object that captures this data:" ] }, { "cell_type": "code", "execution_count": 26, "id": "24527175-44c3-42c7-9813-76fc2349b140", "metadata": {}, "outputs": [], "source": [ "def convert_yolo_detections_to_fiftyone(\n", " yolo_detections, \n", " class_list\n", " ):\n", "\n", " detections = []\n", " if yolo_detections.size == 0:\n", " return fo.Detections(detections=detections)\n", " \n", " boxes = yolo_detections[:, 1:-1]\n", " _uncenter_boxes(boxes)\n", " \n", " confs = yolo_detections[:, -1]\n", " labels = _get_class_labels(yolo_detections[:, 0], class_list) \n", " \n", " for label, conf, box in zip(labels, confs, boxes):\n", " detections.append(\n", " fo.Detection(\n", " label=label,\n", " bounding_box=box.tolist(),\n", " confidence=conf\n", " )\n", " )\n", "\n", " return fo.Detections(detections=detections)" ] }, { "cell_type": "markdown", "id": "22f0e43b-95de-4838-a840-2517017fb98b", "metadata": {}, "source": [ "The final ingredient is a function that takes in the file path of an image, and returns the file path of the corresponding YOLOv8 detection prediction text file." ] }, { "cell_type": "code", "execution_count": null, "id": "665346d8-1b06-4a6b-8084-6339553a6c43", "metadata": {}, "outputs": [], "source": [ "def get_prediction_filepath(filepath, run_number = 1):\n", " run_num_string = \"\"\n", " if run_number != 1:\n", " run_num_string = str(run_number)\n", " filename = filepath.split(\"/\")[-1].split(\".\")[0]\n", " return f\"runs/detect/predict{run_num_string}/labels/{filename}.txt\"" ] }, { "cell_type": "markdown", "id": "073df59d-01a5-45a5-b458-c8aeaaaeab48", "metadata": {}, "source": [ "If you run multiple inference calls for the same task, the prediction results are stored in a directory with the next available integer appended to `predict` in the file path. You can account for this in the above function by passing in the `run_number` argument.\n", "\n", "Putting the pieces together, we can write a function that adds these YOLOv8 detections to all of the samples in our dataset efficiently by batching the read and write operations to the underlying [MongoDB database](https://docs.voxel51.com/environments/index.html#connecting-to-a-localhost-database)."
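] }, { "cell_type": "markdown", "id": "per-sample-loop-note", "metadata": {}, "source": [ "For comparison, a naive per-sample loop (a sketch using the helpers above; we don't actually run this) would accomplish the same thing, but each `sample.save()` call issues its own database write, which is much slower on large datasets:\n", "\n", "```python\n", "# Sketch: per-sample alternative to the batched approach below (slower)\n", "for sample in dataset.iter_samples(progress=True):\n", "    pred_path = get_prediction_filepath(sample.filepath)\n", "    yolo_dets = read_yolo_detections_file(pred_path)\n", "    sample[\"yolov8n\"] = convert_yolo_detections_to_fiftyone(yolo_dets, coco_classes)\n", "    sample.save()  # one database round trip per sample\n", "```"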
] }, { "cell_type": "code", "execution_count": 31, "id": "b23b8c8f-ac9d-40f4-8e0d-f0c2bd77f374", "metadata": {}, "outputs": [], "source": [ "def add_yolo_detections(\n", " samples,\n", " prediction_field,\n", " prediction_filepath,\n", " class_list\n", " ):\n", "\n", " prediction_filepaths = samples.values(prediction_filepath)\n", " yolo_detections = [read_yolo_detections_file(pf) for pf in prediction_filepaths]\n", " detections = [convert_yolo_detections_to_fiftyone(yd, class_list) for yd in yolo_detections]\n", " samples.set_values(prediction_field, detections)" ] }, { "cell_type": "markdown", "id": "bf7ced49-85ca-4280-969c-452a8eb7c8ef", "metadata": {}, "source": [ "Now we can rapidly add the detections in a few lines of code:" ] }, { "cell_type": "code", "execution_count": null, "id": "0b661fea-d3fa-488d-8683-d62d90eefd74", "metadata": {}, "outputs": [], "source": [ "filepaths = dataset.values(\"filepath\")\n", "prediction_filepaths = [get_prediction_filepath(fp) for fp in filepaths]\n", "dataset.set_values(\n", " \"yolov8n_det_filepath\", \n", " prediction_filepaths\n", ")\n", "\n", "add_yolo_detections(\n", " dataset, \n", " \"yolov8n\", \n", " \"yolov8n_det_filepath\", \n", " coco_classes\n", ")" ] }, { "cell_type": "markdown", "id": "9bdb93d3-775b-4110-9b41-a2ca9c5b2d02", "metadata": {}, "source": [ "Now we can visualize these YOLOv8 model predictions on the samples in our dataset in the FiftyOne App:" ] }, { "cell_type": "code", "execution_count": null, "id": "7be5c69c-608e-41b6-9117-87d544ef1a53", "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "27032b1f", "metadata": {}, "source": [ "![yolov8-base-predictions](images/yolov8_coco_val_predictions.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "8f1c08cc", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "306b252e", "metadata": {}, "source": [ "### Load segmentation masks" ] }, { "cell_type": "markdown", "id": "97dbec94-c2f2-4379-a7fa-154fbd1f0a70", "metadata": {}, "source": [ "It is also worth noting that it is possible to convert YOLOv8 predictions directly from the output of a YOLO model call in Python, without first generating external prediction files and reading them in. Let’s see how this can be done for instance segmentations." ] }, { "cell_type": "markdown", "id": "77794bf3-85a4-46e5-bfc9-c55e841b1caf", "metadata": {}, "source": [ "Like detections, YOLOv8 stores instance segmentations with centered bounding boxes. In addition, [YOLOv8 stores a mask](https://docs.ultralytics.com/reference/results/#masks-api-reference) that covers the entire image, with only a rectangular region of that mask containing nonzero values. 
FiftyOne, on the other hand, [stores instance segmentations](https://docs.voxel51.com/user_guide/using_datasets.html#instance-segmentations) as `Detection` labels with a mask that only covers the given bounding box.\n", "\n", "We can convert from YOLOv8 instance segmentations to FiftyOne instance segmentations with this `convert_yolo_segmentations_to_fiftyone()` function:" ] }, { "cell_type": "code", "execution_count": 32, "id": "cf0a3c12-236e-48a4-802d-19b85b807690", "metadata": {}, "outputs": [], "source": [ "def convert_yolo_segmentations_to_fiftyone(\n", " yolo_segmentations, \n", " class_list\n", " ):\n", "\n", " detections = []\n", " boxes = yolo_segmentations.boxes.xywhn\n", " if not boxes.shape or yolo_segmentations.masks is None:\n", " return fo.Detections(detections=detections)\n", " \n", " _uncenter_boxes(boxes)\n", " masks = yolo_segmentations.masks.masks\n", " labels = _get_class_labels(yolo_segmentations.boxes.cls, class_list)\n", "\n", " for label, box, mask in zip(labels, boxes, masks):\n", " ## convert to absolute indices to index mask\n", " w, h = mask.shape\n", " tmp = np.copy(box)\n", " tmp[2] += tmp[0]\n", " tmp[3] += tmp[1]\n", " tmp[0] *= h\n", " tmp[2] *= h\n", " tmp[1] *= w\n", " tmp[3] *= w\n", " tmp = [int(b) for b in tmp]\n", " y0, x0, y1, x1 = tmp\n", " sub_mask = mask[x0:x1, y0:y1]\n", " \n", " detections.append(\n", " fo.Detection(\n", " label=label,\n", " bounding_box = list(box),\n", " mask = sub_mask.astype(bool)\n", " )\n", " )\n", "\n", " return fo.Detections(detections=detections)" ] }, { "cell_type": "markdown", "id": "80547f08-2ead-44bc-9eb0-98209a4568cf", "metadata": {}, "source": [ "Looping through all samples in the dataset, we can add the predictions from our `seg_model`, and then view these predicted masks in the FiftyOne App." ] }, { "cell_type": "code", "execution_count": null, "id": "f9248e31", "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "0190fc96", "metadata": {}, "source": [ "![yolov8-segmentation](images/yolov8_coco_val_segmentation.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "84b30082", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "60ddd44e-2b8b-41ca-94f6-1a28e1e102fa", "metadata": {}, "source": [ "## Evaluate YOLOv8 model predictions" ] }, { "cell_type": "markdown", "id": "5b1c4e37-9086-4bac-bff9-3f51a85a0492", "metadata": {}, "source": [ "Now that we have YOLOv8 predictions loaded onto the images in our dataset, we can evaluate the quality of these predictions using FiftyOne’s [Evaluation API](https://docs.voxel51.com/user_guide/evaluation.html)."
] }, { "cell_type": "markdown", "id": "a5d78170-220c-4015-a667-55c2beb4defc", "metadata": {}, "source": [ "To evaluate the object detections in the `yolov8n` field relative to the `ground_truth` detections field, we can run:" ] }, { "cell_type": "code", "execution_count": null, "id": "600051bb-18c1-4118-ba00-419a22bbaddf", "metadata": {}, "outputs": [], "source": [ "detection_results = dataset.evaluate_detections(\n", " \"yolov8n\", \n", " eval_key=\"eval\",\n", " compute_mAP=True,\n", " gt_field=\"ground_truth\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8dbcaaf4", "metadata": {}, "source": [ "### Compute summary statistics" ] }, { "cell_type": "markdown", "id": "980c8901-5407-454b-a2eb-ba7c03d08f57", "metadata": {}, "source": [ "We can then get the [mean average precision](https://jonathan-hui.medium.com/map-mean-average-precision-for-object-detection-45c121a31173) (mAP) of the model’s predictions:" ] }, { "cell_type": "code", "execution_count": 44, "id": "91fee641-f588-4d9f-9177-7b84e1a7823b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mAP = 0.3121319189417518\n" ] } ], "source": [ "mAP = detection_results.mAP()\n", "print(f\"mAP = {mAP}\")" ] }, { "cell_type": "markdown", "id": "3f093f2c-dc78-4096-8cae-46d740554f12", "metadata": {}, "source": [ "We can also look at the model’s performance on the 20 most common object classes in the dataset, where it has seen the most examples so the statistics are most meaningful:" ] }, { "cell_type": "code", "execution_count": 45, "id": "cf2f85cc-c2c6-4a97-ba80-6441612e9fb2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " person 0.85 0.68 0.76 11573\n", " car 0.71 0.52 0.60 1971\n", " chair 0.62 0.34 0.44 1806\n", " book 0.61 0.12 0.20 1182\n", " bottle 0.68 0.39 0.50 1051\n", " cup 0.61 0.44 0.51 907\n", " dining table 0.54 0.42 0.47 697\n", "traffic light 0.66 0.36 0.46 638\n", " bowl 0.63 0.49 0.55 636\n", " handbag 0.48 0.12 0.19 540\n", " bird 0.79 0.39 0.52 451\n", " boat 0.58 0.29 0.39 430\n", " truck 0.57 0.35 0.44 415\n", " bench 0.58 0.27 0.37 413\n", " umbrella 0.65 0.52 0.58 423\n", " cow 0.81 0.61 0.70 397\n", " banana 0.68 0.34 0.45 397\n", " carrot 0.56 0.29 0.38 384\n", " motorcycle 0.77 0.58 0.66 379\n", " backpack 0.51 0.16 0.24 371\n", "\n", " micro avg 0.76 0.52 0.61 25061\n", " macro avg 0.64 0.38 0.47 25061\n", " weighted avg 0.74 0.52 0.60 25061\n", "\n" ] } ], "source": [ "counts = dataset.count_values(\"ground_truth.detections.label\")\n", "\n", "top20_classes = sorted(\n", " counts, \n", " key=counts.get, \n", " reverse=True\n", ")[:20]\n", "\n", "detection_results.print_report(classes=top20_classes)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f3c8c9ce-aba1-46f1-b9bb-3c872bd06ae9", "metadata": {}, "source": [ "From the output of `print_report()`, we can see that this model performs decently well, but certainly has its limitations. While its precision is relatively good on average, it is lacking when it comes to recall. This is especially pronounced for certain classes like the `book` class." ] }, { "attachments": {}, "cell_type": "markdown", "id": "be0ae2f3", "metadata": {}, "source": [ "### Inspect individual predictions" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3d06c02c", "metadata": {}, "source": [ "Fortunately, we can dig deeper into these results with FiftyOne. ", "
Using the FiftyOne App, we can for instance filter by class for both ground truth and predicted detections so that only `book` detections appear in the samples." ] }, { "attachments": {}, "cell_type": "markdown", "id": "f2963e05-b9ff-40ea-a751-2057bc540edb", "metadata": {}, "source": [ "![yolov8-book-predictions](images/yolov8_coco_val_books_modal.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "35113a7d", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "11c6aec1-c1f6-427c-82e1-1cd5d8d50c78", "metadata": {}, "source": [ "Scrolling through the samples in the sample grid, we can see that a lot of the time, COCO’s purported *ground truth* labels for the `book` class appear to be imperfect. Sometimes, individual books are bounded, other times rows or whole bookshelves are encompassed in a single box, and yet other times books are entirely unlabeled. Unless our desired computer vision application specifically requires good `book` detection, this should probably not be a point of concern when we are assessing the quality of the model. After all, the quality of a model is limited by the quality of the data it is trained on - this is why data-centric approaches to computer vision are so important!\n", "\n", "For other classes like the `bird` class, however, there appear to be challenges. One way to see this is to filter for `bird` ground truth detections and then convert to an [EvaluationPatchesView](https://docs.voxel51.com/api/fiftyone.core.patches.html#fiftyone.core.patches.EvaluationPatchesView). Some of these recall errors appear to be related to small features, where the resolution is poor.\n", "\n", "In other cases though, quick inspection confirms that the object is clearly a bird. This means that there is likely room for improvement." ] }, { "attachments": {}, "cell_type": "markdown", "id": "02067175-04c4-4855-a58e-17389ca6a381", "metadata": {}, "source": [ "![yolov8-base-bird_patches](images/yolov8_coco_val_bird_patch_view.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "cfdf24a2", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "9d66d63a-b831-451a-8460-be4dd30cc1ee", "metadata": {}, "source": [ "## Curate data for fine-tuning" ] }, { "cell_type": "markdown", "id": "a1ccf433-5ceb-43d5-8a52-2ce06b9588f9", "metadata": {}, "source": [ "For the remainder of this walkthrough, we will pretend that we are working for a bird conservancy group, putting computer vision models in the field to track and protect endangered species. Our goal is to fine-tune a YOLOv8 detection model to detect birds." ] }, { "attachments": {}, "cell_type": "markdown", "id": "451e1d31", "metadata": {}, "source": [ "### Generate test set " ] }, { "cell_type": "markdown", "id": "40c060f7-628f-4630-80e2-d9e0c940f4a0", "metadata": {}, "source": [ "We will use the COCO validation dataset above as our test set. Since we are only concerned with detecting birds, we can filter out all non-`bird` ground truth detections using `filter_labels()`. We will also filter out the non- `bird` predictions, but will pass the `only_matches = False` argument into `filter_labels()` to make sure we keep images that have ground truth `bird` detections without YOLOv8n `bird` predictions." 
] }, { "cell_type": "code", "execution_count": null, "id": "c33281ed-f75e-4ad8-ab11-385fee9009b3", "metadata": {}, "outputs": [], "source": [ "test_dataset = dataset.filter_labels(\n", " \"ground_truth\", \n", " F(\"label\") == \"bird\"\n", ").filter_labels(\n", " \"yolov8n\", \n", " F(\"label\") == \"bird\",\n", " only_matches=False\n", ").clone()\n", "\n", "test_dataset.name = \"birds-test-dataset\"\n", "test_dataset.persistent = True\n", "\n", "## set classes to just include birds\n", "classes = [\"bird\"]" ] }, { "cell_type": "markdown", "id": "c1da5a78-50b9-4a5f-a192-73398516dcb2", "metadata": {}, "source": [ "We then give the dataset a name, make it persistent, and save it to the underlying database. This test set has only 125 images, which we can visualize in the FiftyOne App." ] }, { "cell_type": "code", "execution_count": null, "id": "5b5a9af8-fe89-4f30-ac33-43583c41fd58", "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8f7ed1d6-26bf-48d0-9e62-dc61c81763fc", "metadata": {}, "source": [ "![yolov8-birds-test-view](images/yolov8_bird_test_view.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "74a3c553", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "markdown", "id": "bb854711-4467-497f-9883-acafbde5483b", "metadata": {}, "source": [ "We can also run `evaluate_detections()` on this data to evaluate the YOLOv8n model's performance on images with ground truth bird detections. We will store the results under the `base` evaluation key:" ] }, { "cell_type": "code", "execution_count": 49, "id": "a58db000-e3e9-4683-a067-49c91a7dcf4f", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |█████████████████| 125/125 [886.0ms elapsed, 0s remaining, 141.1 samples/s] \n", "Performing IoU sweep...\n", " 100% |█████████████████| 125/125 [619.1ms elapsed, 0s remaining, 201.9 samples/s] \n" ] } ], "source": [ "base_bird_results = test_dataset.evaluate_detections(\n", " \"yolov8n\", \n", " eval_key=\"base\",\n", " compute_mAP=True,\n", ")" ] }, { "cell_type": "code", "execution_count": 54, "id": "43b4781f-8abe-4997-9649-ae2bf8c04d91", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Base mAP = 0.24897924786479841\n" ] } ], "source": [ "mAP = base_bird_results.mAP()\n", "print(f\"Base mAP = {mAP}\")" ] }, { "cell_type": "code", "execution_count": 56, "id": "1b0a0ad1-2653-4699-90f8-82a0f509edf2", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " bird 0.87 0.39 0.54 451\n", "\n", " micro avg 0.87 0.39 0.54 451\n", " macro avg 0.87 0.39 0.54 451\n", "weighted avg 0.87 0.39 0.54 451\n", "\n" ] } ], "source": [ "base_bird_results.print_report(classes=classes)" ] }, { "cell_type": "markdown", "id": "699a0bc2-80dc-46f4-9295-1f72a9eede39", "metadata": {}, "source": [ "We note that while the recall is the same as in the initial evaluation report over the entire COCO validation split, the precision is higher. This means there are images that have YOLOv8n `bird` predictions but not ground truth `bird` detections.\n", "\n", "The final step in preparing this test set is exporting the data into YOLOv8 format so we can run inference on just these samples with our fine-tuned model when we are done training. We will do so using the `export_yolo_data()` function we defined earlier." 
] }, { "cell_type": "code", "execution_count": null, "id": "bf275148-8344-4a1b-be6c-4c3ae28d1701", "metadata": {}, "outputs": [], "source": [ "export_yolo_data(\n", " test_dataset, \n", " \"birds_test\", \n", " classes\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "33cf3ac1", "metadata": {}, "source": [ "### Generate training set" ] }, { "cell_type": "markdown", "id": "e9f710e1-3d00-45d4-96a6-ccc7dee4e138", "metadata": {}, "source": [ "Now we choose the data on which we will fine-tune the base YOLOv8 model. Our goal is to generate a high-quality training dataset whose examples cover all expected scenarios in that subset. \n", "\n", "In general, this is both an art and a science, and it can involve a variety of techniques, including \n", "\n", "* pulling in data from other datasets\n", "* annotating more data that you’ve already collected with ground truth labels, \n", "* augmenting your data with tools like [Albumentations](https://albumentations.ai/)\n", "* generating synthetic data with [diffusion models](https://blog.roboflow.com/synthetic-data-with-stable-diffusion-a-guide/) or [GANs](https://towardsai.net/p/l/gans-for-synthetic-data-generation)." ] }, { "cell_type": "markdown", "id": "62ac58e1-d763-4e03-a8da-bbd95ad5dc4e", "metadata": {}, "source": [ "We’ll take the first approach and incorporate existing high-quality data from Google’s [Open Images dataset](https://storage.googleapis.com/openimages/web/index.html). For a thorough tutorial on how to work with Open Images data, see [Loading Open Images V6 and custom datasets with FiftyOne](https://medium.com/voxel51/loading-open-images-v6-and-custom-datasets-with-fiftyone-18b5334851c3)." ] }, { "cell_type": "markdown", "id": "551906a4-f586-4639-b4fe-b4d7c3812313", "metadata": {}, "source": [ "The COCO training data on which YOLOv8 was trained contains $3,237$ images with `bird` detections. Open Images is more expansive, with the train, test, and validation splits together housing $20k+$ images with `Bird` detections.\n", "\n", "Let’s create our training dataset. First, we’ll create a dataset, `train_dataset`, by loading the `bird` detection labels from the COCO train split using the [FiftyOne Dataset Zoo](https://docs.voxel51.com/user_guide/dataset_zoo/datasets.html), and cloning this into a new `Dataset` object:" ] }, { "cell_type": "code", "execution_count": null, "id": "f4a75282-224e-41a0-b53d-e5c259bdc0c8", "metadata": {}, "outputs": [], "source": [ "train_dataset = foz.load_zoo_dataset(\n", " 'coco-2017',\n", " split='train',\n", " classes=classes\n", ").clone()\n", "\n", "train_dataset.name = \"birds-train-data\"\n", "train_dataset.persistent = True\n", "train_dataset.save()" ] }, { "cell_type": "markdown", "id": "a3dfc870-3cfe-46df-9e27-95b2e63d735a", "metadata": {}, "source": [ "Then, we’ll load Open Images samples with `Bird` detection labels, passing in `only_matching=True` to only load the `Bird` labels. We then map these labels into COCO label format by changing `Bird` into `bird`." 
] }, { "cell_type": "code", "execution_count": null, "id": "acd9e94f-8073-42b3-8d4f-bac43609fc2e", "metadata": {}, "outputs": [], "source": [ "oi_samples = foz.load_zoo_dataset(\n", " \"open-images-v6\",\n", " classes = [\"Bird\"],\n", " only_matching=True,\n", " label_types=\"detections\"\n", ").map_labels(\n", " \"ground_truth\",\n", " {\"Bird\":\"bird\"}\n", ")" ] }, { "cell_type": "markdown", "id": "76b08e8f-56d1-4205-abba-3096643abc39", "metadata": {}, "source": [ "We can add these new samples into our training dataset with `merge_samples()`:" ] }, { "cell_type": "code", "execution_count": null, "id": "e4871287-5f5f-40fb-bc28-ffa34391c8cf", "metadata": {}, "outputs": [], "source": [ "train_dataset.merge_samples(oi_samples)" ] }, { "cell_type": "markdown", "id": "e5fa2773-2699-44d7-a7f9-26ac02281ff7", "metadata": {}, "source": [ "This dataset contains $24,226$ samples with `bird` labels, or more than seven times as many birds as the base YOLOv8n model was trained on. In the next section, we'll demonstrate how to fine-tune the model on this data using the [YOLO Trainer class](https://docs.ultralytics.com/reference/base_trainer/)." ] }, { "cell_type": "markdown", "id": "f73d8cd2-b8dd-49f6-8a28-235d22a76401", "metadata": {}, "source": [ "## Fine-tune a YOLOv8 detection model" ] }, { "attachments": {}, "cell_type": "markdown", "id": "913370fe-ee12-4d89-a8d9-9f6c10d7a3ce", "metadata": {}, "source": [ "The final step in preparing our data is splitting it into training and validation sets and exporting it into YOLO format. We will use an 80–20 train-val split, which we will select randomly using [FiftyOne’s random utils](https://docs.voxel51.com/api/fiftyone.utils.random.html)." ] }, { "cell_type": "code", "execution_count": null, "id": "a925e6f7-16a2-4c0d-9bd2-2dd8d51fabc8", "metadata": {}, "outputs": [], "source": [ "import fiftyone.utils.random as four\n", "\n", "## delete existing tags to start fresh\n", "train_dataset.untag_samples(train_dataset.distinct(\"tags\"))\n", "\n", "## split into train and val\n", "four.random_split(\n", " train_dataset,\n", " {\"train\": 0.8, \"val\": 0.2}\n", ")\n", "\n", "## export in YOLO format\n", "export_yolo_data(\n", " train_dataset, \n", " \"birds_train\", \n", " classes, \n", " split = [\"train\", \"val\"]\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6f8840f4", "metadata": {}, "source": [ "Now all that is left is to do the fine-tuning! We will use [YOLO command line syntax](https://docs.ultralytics.com/cli/), with `mode=train`. We will specify the initial weights as the starting point for training, the number of epochs, image size, and batch size." 
] }, { "cell_type": "code", "execution_count": null, "id": "a5f515b1", "metadata": {}, "outputs": [], "source": [ "!yolo task=detect mode=train model=yolov8n.pt data=birds_train/dataset.yaml epochs=60 imgsz=640 batch=16" ] }, { "attachments": {}, "cell_type": "markdown", "id": "885d5529-2e1f-4a3c-98e9-db3de828ac70", "metadata": {}, "source": [ " Image sizes 640 train, 640 val\n", " Using 8 dataloader workers\n", " Logging results to runs/detect/train\n", " Starting training for 60 epochs...\n", "\n", " Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n", " 1/60 6.65G 1.392 1.627 1.345 22 640: 1\n", " Class Images Instances Box(P R mAP50 m\n", " all 4845 12487 0.677 0.524 0.581 0.339\n", "\n", " Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n", " 2/60 9.58G 1.446 1.407 1.395 30 640: 1\n", " Class Images Instances Box(P R mAP50 m\n", " all 4845 12487 0.669 0.47 0.54 0.316\n", "\n", " Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n", " 3/60 9.58G 1.54 1.493 1.462 29 640: 1\n", " Class Images Instances Box(P R mAP50 m\n", " all 4845 12487 0.529 0.329 0.349 0.188\n", "\n", " ......\n", "\n", " Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n", " 58/60 9.59G 1.263 0.9489 1.277 47 640: 1\n", " Class Images Instances Box(P R mAP50 m\n", " all 4845 12487 0.751 0.631 0.708 0.446\n", "\n", " Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n", " 59/60 9.59G 1.264 0.9476 1.277 29 640: 1\n", " Class Images Instances Box(P R mAP50 m\n", " all 4845 12487 0.752 0.631 0.708 0.446\n", "\n", " Epoch GPU_mem box_loss cls_loss dfl_loss Instances Size\n", " 60/60 9.59G 1.257 0.9456 1.274 41 640: 1\n", " Class Images Instances Box(P R mAP50 m\n", " all 4845 12487 0.752 0.631 0.709 0.446" ] }, { "attachments": {}, "cell_type": "markdown", "id": "081e52cd", "metadata": {}, "source": [ "For this walkthrough, $60$ epochs of training was sufficient to achieve convergence. If you are fine-tuning on a different dataset, you may need to change these parameters." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "cf499bbc", "metadata": {}, "source": [ "With fine-tuning complete, we can generate predictions on our test data with the “best” weights found during the training process, which are stored at `runs/detect/train/weights/best.pt`:" ] }, { "cell_type": "code", "execution_count": null, "id": "c3e9a106", "metadata": {}, "outputs": [], "source": [ "!yolo task=detect mode=predict model=runs/detect/train/weights/best.pt source=birds_test/images/val save_txt=True save_conf=True" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3538a65b", "metadata": {}, "source": [ "Then we can load these predictions onto our data and visualize the predictions in the FiftyOne App:" ] }, { "cell_type": "code", "execution_count": null, "id": "e86d0e6f", "metadata": {}, "outputs": [], "source": [ "filepaths = test_dataset.values(\"filepath\")\n", "prediction_filepaths = [get_prediction_filepath(fp, run_number=2) for fp in filepaths]\n", "\n", "test_dataset.set_values(\n", " \"yolov8n_bird_det_filepath\",\n", " prediction_filepaths\n", ")\n", "\n", "add_yolo_detections(\n", " birds_test_dataset, \n", " \"yolov8n_bird\", \n", " \"yolov8n_bird_det_filepath\", \n", " classes\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "c9a555fd", "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(test_dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c2dd8096-1f3e-47fd-b412-0dabcf38123b", "metadata": {}, "source": [ "![yolov8-finetune-predictions](images/yolov8_finetune_predictions_app.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "e5c04dae", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "991814fc", "metadata": {}, "source": [ "## Assess improvement from fine-tuning" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bb3f3c08", "metadata": {}, "source": [ "On a holistic level, we can compare the performance of the fine-tuned model to the original, pretrained model by stacking their standard metrics against each other. The easiest way to get these metrics is with FiftyOne’s Evaluation API:" ] }, { "cell_type": "code", "execution_count": 55, "id": "b8be56d6-1f49-4c9e-9cfc-aa683835c2e1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Evaluating detections...\n", " 100% |█████████████████| 125/125 [954.4ms elapsed, 0s remaining, 131.0 samples/s] \n", "Performing IoU sweep...\n", " 100% |█████████████████| 125/125 [751.8ms elapsed, 0s remaining, 166.3 samples/s] \n" ] } ], "source": [ "finetune_bird_results = test_dataset.evaluate_detections(\n", " \"yolov8n_bird\", \n", " eval_key=\"finetune\",\n", " compute_mAP=True,\n", ")" ] }, { "cell_type": "markdown", "id": "7a640b18", "metadata": {}, "source": [ "From this, we can immediately see improvement in the mean average precision (mAP):" ] }, { "cell_type": "code", "execution_count": 3, "id": "ff3ad392", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "yolov8n mAP: 0.24897924786479841\n", "fine-tuned mAP: 0.31339033693212076\n" ] } ], "source": [ "print(\"yolov8n mAP: {}.format(base_bird_results.mAP()))\n", "print(\"fine-tuned mAP: {}.format(finetune_bird_results.mAP()))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "24f8ed8e", "metadata": {}, "source": [ "Printing out a report, we can see that the recall has improved from $0.39$ to $0.56$. 
This major improvement offsets a minor dip in precision, giving an overall higher F1 score ($0.67$ compared to $0.54$)." ] }, { "cell_type": "code", "execution_count": 56, "id": "e601002b-5956-49e3-8ac7-ddcf53a97d7b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " precision recall f1-score support\n", "\n", " bird 0.81 0.56 0.67 506\n", "\n", " micro avg 0.81 0.56 0.67 506\n", " macro avg 0.81 0.56 0.67 506\n", "weighted avg 0.81 0.56 0.67 506\n", "\n" ] } ], "source": [ "finetune_bird_results.print_report()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fabea891", "metadata": {}, "source": [ "We can also look more closely at individual images to see where the fine-tuned model is having trouble. In particular, we can look at images with the most false negatives, or the most false positives:" ] }, { "cell_type": "code", "execution_count": null, "id": "31917193", "metadata": {}, "outputs": [], "source": [ "fn_view = test_dataset.sort_by(\"finetune_fn\", reverse=True)\n", "session.view = fn_view" ] }, { "attachments": {}, "cell_type": "markdown", "id": "63983d59-fa15-42e8-a68a-8354cd13fb59", "metadata": {}, "source": [ "![yolov8-finetune-fp](images/yolov8_finetune_fp_predictions.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "d0328cc2", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "cell_type": "code", "execution_count": null, "id": "f926a4e1", "metadata": {}, "outputs": [], "source": [ "fp_view = test_dataset.sort_by(\"finetune_fp\", reverse=True)\n", "session.view = fp_view" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f2f80940-4a4c-4f91-a42d-a96fb39c5445", "metadata": {}, "source": [ "![yolov8-finetune_fn](images/yolov8_finetune_fn_predictions.png)" ] }, { "cell_type": "code", "execution_count": null, "id": "97e180d1", "metadata": {}, "outputs": [], "source": [ "session.freeze()" ] }, { "attachments": {}, "cell_type": "markdown", "id": "223c4620", "metadata": {}, "source": [ "Looking at both the false positives and false negatives, we can see that the model struggles to correctly handle small features. This poor performance could be in part due to the quality of the data, as many of these features are grainy. It could also be due to the training parameters, as both the pre-training and fine-tuning for this model used an image size of $640$ pixels, which might not allow for fine-grained details to be captured." ] }, { "attachments": {}, "cell_type": "markdown", "id": "54a7fcce", "metadata": {}, "source": [ "To further improve the model’s performance, we could try a variety of approaches, including:\n", "\n", "* Using image augmentation to increase the proportion of images with small birds\n", "* Gathering and annotating more images with small birds\n", "* Increasing the image size during fine-tuning" ] }, { "attachments": {}, "cell_type": "markdown", "id": "377e2a2e", "metadata": {}, "source": [ "## Summary" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3d068626", "metadata": {}, "source": [ "While YOLOv8 represents a step forward for real-time object detection and segmentation models, out-of-the-box it’s aimed at general purpose uses. Before deploying the model, it is essential to understand how it performs on your data. Only then can you effectively fine-tune the YOLOv8 architecture to suit your specific needs.\n", "\n", "You can use FiftyOne to visualize, evaluate, and better understand YOLOv8 model predictions. ", "
After all, while YOLO may only look once, a conscientious computer vision engineer or researcher certainly looks twice (or more)!" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }