{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "bGh0Lado1Uif" }, "source": [ "# Creating Views\n", "\n", "[FiftyOne datasets](https://voxel51.com/docs/fiftyone/user_guide/using_datasets.html) provide the flexibility to store large, complex data. While it is helpful that data can be imported and exported easily, the real potential of FiftyOne comes from its powerful query language that you can use to define custom **views** into your datasets.\n", "\n", "A [dataset view](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) can be thought of as a pipeline of operations that is applied to a dataset to extract a subset of the dataset whose samples and fields are filtered, sorted, shuffled, etc. Check out [this page](https://voxel51.com/docs/fiftyone/user_guide/using_views.html) for an extended discussion of dataset views.\n", "\n", "In this notebook, we'll do a brief walkthrough of creating and using dataset views." ] }, { "cell_type": "markdown", "metadata": { "id": "UAhVXC2kL5ZJ" }, "source": [ "## Setup\n", "\n", "If you haven't already, install FiftyOne:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "G250JmNsJy-q" }, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "metadata": { "id": "KF-UPFUaq-lN" }, "source": [ "## Overview\n", "\n", "To start out, lets import FiftyOne, load up a dataset, and evaluate some predicted object detections." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "CzXiRDEu8cHn" }, "outputs": [], "source": [ "import fiftyone as fo\n", "import fiftyone.zoo as foz" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "V3z0U6bSqAdM" }, "outputs": [], "source": [ "dataset = foz.load_zoo_dataset(\"quickstart\")\n", "dataset.evaluate_detections(\"predictions\", gt_field=\"ground_truth\", eval_key=\"eval\")" ] }, { "cell_type": "markdown", "metadata": { "id": "94WJ_iVzp-K0" }, "source": [ "Dataset views can range from as simple as \"select a slice of the dataset\" to \"filter sample that have at least two large bounding boxes of people or dogs with high confidence and that were evaluated to be a false positive, then crop all images to those bounding boxes\":" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "e5yck2hfqtF-" }, "outputs": [], "source": [ "from fiftyone import ViewField as F\n", "\n", "# Slice dataset\n", "simple_view = dataset[51:151]\n", "\n", "# Complex filtering and conversion\n", "complex_view = (\n", " dataset\n", " .filter_labels(\n", " \"predictions\", (\n", " (F(\"confidence\") > 0.7)\n", " & ((F(\"bounding_box\")[2] * F(\"bounding_box\")[3]) > 0.3)\n", " & (F(\"eval\") == \"fp\")\n", " & (F(\"label\").is_in([\"person\", \"dog\"]))\n", " )\n", " ).match(\n", " F(\"predictions.detections\").length() > 2\n", " ).to_patches(\"predictions\")\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "WSQMKOhuv8bg" }, "source": [ "The goal is that, by the end of this notebook, creating complex views like the one above will be as straight forward as the simple views." ] }, { "cell_type": "markdown", "metadata": { "id": "vBYW3JSQwQFB" }, "source": [ "## View basics\n", "\n", "\"Creating a view from a dataset\" is simply the process of performing an operation on a dataset that returns a `DatasetView`. The most basic way to turn a `Dataset` into a `DatasetView` is to just call `view()`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ITKGTBc_yp5i" }, "outputs": [], "source": [ "# A view that contains the entire dataset\n", "view = dataset.view()" ] }, { "cell_type": "markdown", "metadata": { "id": "yAgcifA_y3O0" }, "source": [ "Within FiftyOne, views and datasets are largely interchangable in nearly all operations. Anything you can do to a dataset, you can also do to a view." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ZpWoJoL4yrQ_", "outputId": "e4ba663c-a6a2-45c6-a546-b38bedc13f93" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: quickstart\n", "Media type: image\n", "Num samples: 200\n", "Tags: ['validation']\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " uniqueness: fiftyone.core.fields.FloatField\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " eval_tp: fiftyone.core.fields.IntField\n", " eval_fp: fiftyone.core.fields.IntField\n", " eval_fn: fiftyone.core.fields.IntField\n", " is_cloudy: fiftyone.core.fields.BooleanField\n", " classification: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " classifications: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "View stages:\n", " ---\n" ] } ], "source": [ "print(view)" ] }, { "cell_type": "markdown", "metadata": { "id": "WC3j8694zABK" }, "source": [ "To create some more interesting views, you need to apply a view stage operation to the dataset. The list of available view stages can be printed as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GE3FoRg8y_Br", "outputId": "890c936a-eaac-4319-d57b-17c7efeb2b1b" }, "outputs": [ { "data": { "text/plain": [ "['exclude',\n", " 'exclude_by',\n", " 'exclude_fields',\n", " 'exclude_frames',\n", " 'exclude_labels',\n", " 'exists',\n", " 'filter_field',\n", " 'filter_labels',\n", " 'filter_classifications',\n", " 'filter_detections',\n", " 'filter_polylines',\n", " 'filter_keypoints',\n", " 'geo_near',\n", " 'geo_within',\n", " 'group_by',\n", " 'limit',\n", " 'limit_labels',\n", " 'map_labels',\n", " 'set_field',\n", " 'match',\n", " 'match_frames',\n", " 'match_labels',\n", " 'match_tags',\n", " 'mongo',\n", " 'select',\n", " 'select_by',\n", " 'select_fields',\n", " 'select_frames',\n", " 'select_labels',\n", " 'shuffle',\n", " 'skip',\n", " 'sort_by',\n", " 'sort_by_similarity',\n", " 'take',\n", " 'to_patches',\n", " 'to_evaluation_patches',\n", " 'to_clips',\n", " 'to_frames']" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.list_view_stages()" ] }, { "cell_type": "markdown", "metadata": { "id": "KipRtBj7zf2q" }, "source": [ "These view stages allow you to perform many useful operations on datasets like slicing, sorting, shuffling, filtering, and more.\n", "\n", "For example, the [take()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.take) stage lets you extract a random subset of samples from the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "CkUj5Qkdz8In", "outputId": "babeb9c9-7f16-475f-be90-f16dae572b29" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "100\n" ] } ], "source": [ "random_view = dataset.take(100)\n", "\n", "print(len(random_view))" ] }, { "cell_type": "markdown", "metadata": { "id": "nyMLsTeCz08M" }, "source": [ "These view stages can also be chained together, each operating on the view returned by the previous stage:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "przrgkwW0EOV" }, "outputs": [], "source": [ "sorted_random_view = random_view.sort_by(\"filepath\")\n", "sliced_sorted_random_view = sorted_random_view[10:51]" ] }, { "cell_type": "markdown", "metadata": { "id": "2juL3Ql-6vrB" }, "source": [ "Note that the slicing syntax is simply a different representation of the [skip()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.skip) and [limit()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.limit) stages:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nsIF_uZH64Qr" }, "outputs": [], "source": [ "sliced_sorted_random_view = sorted_random_view.skip(10).limit(41)" ] }, { "cell_type": "markdown", "metadata": { "id": "Sk_ahH7f0UeJ" }, "source": [ "An example of one of the stages used in this notebook is [match()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.match). This stage will keep or remove samples in the dataset one by one based on if some expression applied to the sample resolves to True or False.\n", "\n", "For example, we can create a view that includes all samples with a uniqueness greater than 0.5:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "L3T8QAjo29jQ" }, "outputs": [], "source": [ "matched_view = dataset.match(F(\"uniqueness\") > 0.5)" ] }, { "cell_type": "markdown", "metadata": { "id": "wCIXUgShDnKA" }, "source": [ "Another useful view stage is [set_field()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.set_field). This stage will actually modify a field in your dataset based on the provided expression. Note that this modification is only within the resulting `DatasetView` and will not modify the underlying dataset.\n", "\n", "For example, lets set a boolean field called `is_cloudy` to True for all samples in the dataset. Note that when using `set_field()`, you need to ensure that the field exists on the dataset first." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "NrQeNCpaD6iN" }, "outputs": [], "source": [ "dataset.add_sample_field(\"is_cloudy\", fo.BooleanField)\n", "cloudy_view = dataset.set_field(\"is_cloudy\", True)\n", "\n", "dataset.set_values(\"is_cloudy\", [True]*len(dataset))" ] }, { "cell_type": "markdown", "metadata": { "id": "iDMcB1MjwSgJ" }, "source": [ "## View expressions\n", "\n", "At this point, you might be wondering \"what is this `F` that I keep seeing everywhere\"? That `F` defines a [ViewField](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewField) which can be used to write a [ViewExpression](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression). These expressions are what give you the power to write custom queries based on information that exists in your dataset.\n", "\n", "In this section, we go over what some view expression operations and how to write more complex views." ] }, { "cell_type": "markdown", "metadata": { "id": "e07eLI1H6LZa" }, "source": [ "Most view stages accept a [ViewExpression](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression) as input. View stages that seemingly operate on fields can also accept expressions! For example, [sort_by()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.sort_by) can accept a field name or an expression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "ozDc5gIL6Kxh", "outputId": "0774fd4a-5061-4564-9d79-76449db28390" }, "outputs": [ { "data": { "text/plain": [ "Dataset: quickstart\n", "Media type: image\n", "Num samples: 200\n", "Tags: ['validation']\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " uniqueness: fiftyone.core.fields.FloatField\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " eval_tp: fiftyone.core.fields.IntField\n", " eval_fp: fiftyone.core.fields.IntField\n", " eval_fn: fiftyone.core.fields.IntField\n", " is_cloudy: fiftyone.core.fields.BooleanField\n", "View stages:\n", " 1. SortBy(field_or_expr={'$size': {'$ifNull': [...]}}, reverse=False)" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Sort by filepaths\n", "dataset.sort_by(\"filepath\")\n", "\n", "# Sort by the number of predicted objects per sample\n", "dataset.sort_by(F(\"predictions.detections\").length())" ] }, { "cell_type": "markdown", "metadata": { "id": "axIgHdSO7x3k" }, "source": [ "The idea is to think about what is expected by a view stage, then provide the input that is needed in the form of a string or an expression.\n", "\n", "[sort_by()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.sort_by) operates on a sample-level, meaning we can either provide it the name of a sample-level field to use for sorting (`filepath`) or we can provide it an expression that *resolves to* a sample-level value. In this case the expression is counting the number of predicted objects for each sample and using those integers to sort the dataset." ] }, { "cell_type": "markdown", "metadata": { "id": "U-w0xYtg8U05" }, "source": [ "### View fields\n", "\n", "As mentioned, view expressions are built around view fields. A [ViewField](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewField) is how you inject the information stored in a specific field of your dataset into a view expression. \n", "\n", "For example, if you had a boolean field on your dataset called `is_cloudy` indicating if the image contains cloudy or not, then for each sample, `F(\"is_cloudy\")` can be thought of as being replaced with the value of `\"is_cloudy\"` for that sample. Since values in the field are themselves boolean, the view to match samples where `\"is_cloudy\"` is True is simply: \n", "\n", "```python\n", "cloudy_view = dataset.match(F(\"is_cloudy\"))\n", "```" ] }, { "cell_type": "markdown", "metadata": { "id": "ziGaycJv9Hs5" }, "source": [ "In our dataset, after performing evaluation, we populated the field `eval_tp` on each sample with is an integer containing the number of true positive predictions exist in the sample. There are multiple ways to match samples based on the `eval_tp` field.\n", "\n", "The way to think about view expressions in this case is the same as the expressions for the if-statement in Python that resolve in a boolean context." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "Y1gdmejm95tD" }, "outputs": [], "source": [ "a = True\n", "b = 51\n", "\n", "if a: # Nothing else needed\n", " pass\n", "\n", "if b > 4:\n", " # True if b > 4\n", " pass\n", "\n", "if b:\n", " # True if b != 0\n", " pass" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "iU_YC5BO9117", "outputId": "4e3690bd-4bbe-48a2-a1e4-0a3f3bc00586" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "69\n" ] } ], "source": [ "tp_view = dataset.match(F(\"eval_tp\") > 4)\n", "\n", "print(len(tp_view))" ] }, { "cell_type": "markdown", "metadata": { "id": "Q1RCJjAmAIuO" }, "source": [ "When providing just an integer in the expression in a Python if-statement, then the statement is True as long as the integer is not zero. The same logic applies with view expressions in this case:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "C9pRF1_J_59p", "outputId": "226b3411-ea2a-436c-f8b1-daa53d858f6e" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "198\n" ] } ], "source": [ "nonzero_tp_view = dataset.match(F(\"eval_tp\"))\n", "\n", "print(len(nonzero_tp_view))" ] }, { "cell_type": "markdown", "metadata": { "id": "flmo6wSMro86" }, "source": [ "We can also use `~` to negate an expression:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "z_zHIi4DrrxG", "outputId": "78a33604-8477-4826-e81a-168ebdd1c98d" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0, 0]\n" ] } ], "source": [ "zero_tp_view = dataset.match(~F(\"eval_tp\"))\n", "\n", "print(zero_tp_view.values(\"eval_tp\"))" ] }, { "cell_type": "markdown", "metadata": { "id": "vRC1yj9LwcTt" }, "source": [ "### Nested lists\n", "\n", "The most difficult/subtle aspect of creating view expressions is how to handle nested lists.\n", "\n", "To get a better idea of which samples contain lists, you can print out your sample as a dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "GdIgHxrWFJBh", "outputId": "ad4f6526-1468-48a9-d693-263ec04149e5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\n", " 'filepath': '/content/example.png',\n", " 'tags': [],\n", " 'metadata': None,\n", " 'ground_truth': {\n", " '_cls': 'Detections',\n", " 'detections': [\n", " {\n", " '_id': {'$oid': '622f67345627ae9fa020e6f9'},\n", " '_cls': 'Detection',\n", " 'attributes': {},\n", " 'tags': [],\n", " 'label': 'cat',\n", " 'bounding_box': [0.1, 0.1, 0.8, 0.8],\n", " },\n", " ],\n", " },\n", "}\n" ] } ], "source": [ "sample = fo.Sample(\n", " filepath=\"example.png\",\n", " ground_truth=fo.Detections(\n", " detections=[\n", " fo.Detection(label=\"cat\", bounding_box=[0.1, 0.1, 0.8, 0.8])\n", " ]\n", " ),\n", ")\n", "\n", "fo.pprint(sample.to_dict())\n" ] }, { "cell_type": "markdown", "metadata": { "id": "F1eWnIhMFPkl" }, "source": [ "Here you can see that `ground_truth.detections` is a list." ] }, { "cell_type": "markdown", "metadata": { "id": "X2i_MQaMFH6R" }, "source": [ "If you have a field containing a primitive value, then it rarely requires more than one operation to get the value that is needed by the view stage. However, when working with a list of values in a field, then there can be multiple different operations that need to be performed to get to the desired value.\n", "\n", "The most important operations for working with lists are:\n", "\n", "* [filter()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression.filter): apply a boolean to each element of a list to determine what to keep, resolving to a list\n", "* [map()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression.map): apply a function to each element of a list, resolving to a list\n", "* [reduce()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression.reduce): operates on a list and resolves to a single value" ] }, { "cell_type": "markdown", "metadata": { "id": "b1cecut4C8ej" }, "source": [ "### Filtering list fields\n", "\n", "The [filter()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression.filter) operation is quite useful to allow for fine-grained access to the information that is to be kept and removed from the view. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "HoleiHQgC82A" }, "outputs": [], "source": [ "# Only include predictions with `confidence` of at least 0.9\n", "view = dataset.set_field(\n", " \"predictions.detections\",\n", " F(\"detections\").filter(F(\"confidence\") > 0.9)\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "OeLyUu66C9Bh" }, "source": [ "Note that the [filter_labels()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.filter_labels) operation is simply a simplification of the filter operation and [set_field()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.set_field). This operation will automatically apply the given expression to the corresponding list field of the label if applicable (`Detections`, `Classifications`, etc) or will apply the expression as a match operation for non-list labels (`Detection`, `Classification`, etc).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "Y9jUH8VkSLjb", "outputId": "7d4cdc7a-0a77-41e1-ebca-ec4848119ae4" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14\n", "14\n" ] } ], "source": [ "# Filter detections\n", "view1 = dataset.filter_labels(\"ground_truth\", F(\"label\") == \"cat\")\n", "\n", "# Equivalently\n", "view2 = (\n", " dataset\n", " .set_field(\"ground_truth.detections\", F(\"detections\").filter(F(\"label\") == \"cat\"))\n", " .match(F(\"ground_truth.detections\").length() > 0)\n", ")\n", "\n", "print(len(view1))\n", "print(len(view2))" ] }, { "cell_type": "markdown", "metadata": { "id": "5GVsT97ESnlT" }, "source": [ "The match operation above was added since by default, [filter_labels()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.filter_labels) sets the keyword argument `only_matches=True`." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "LidIHFGHP9V5", "outputId": "86381020-c4c7-468c-f55f-04d18b842606" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "200\n", "200\n" ] } ], "source": [ "# Add example classification labels\n", "dataset.set_values(\"classifications\", [fo.Classification(label=\"cat\")]*len(dataset))\n", "\n", "# Filter classification\n", "view1 = dataset.filter_labels(\"classifications\", F(\"label\") == \"cat\")\n", "\n", "# Equivalently\n", "view2 = dataset.match(F(\"classifications.label\") == \"cat\")\n", "\n", "print(len(view1))\n", "print(len(view2))" ] }, { "cell_type": "markdown", "metadata": { "id": "2806tSJOTlJZ" }, "source": [ "### Mapping list fields\n", "\n", "The [map()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression.map) operation can be used to apply an expression to every element of a list. For example, we can update the tags to set every tag to uppercase:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "c_AdTnM_Tu_7", "outputId": "495d89df-d14e-4534-a419-c5fefb0d2102" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Dataset: quickstart\n", "Media type: image\n", "Num samples: 200\n", "Tags: ['VALIDATION']\n", "Sample fields:\n", " id: fiftyone.core.fields.ObjectIdField\n", " filepath: fiftyone.core.fields.StringField\n", " tags: fiftyone.core.fields.ListField(fiftyone.core.fields.StringField)\n", " metadata: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.metadata.Metadata)\n", " ground_truth: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " uniqueness: fiftyone.core.fields.FloatField\n", " predictions: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Detections)\n", " eval_tp: fiftyone.core.fields.IntField\n", " eval_fp: fiftyone.core.fields.IntField\n", " eval_fn: fiftyone.core.fields.IntField\n", " is_cloudy: fiftyone.core.fields.BooleanField\n", " classification: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", " classifications: fiftyone.core.fields.EmbeddedDocumentField(fiftyone.core.labels.Classification)\n", "View stages:\n", " 1. SetField(field='tags', expr={'$map': {'as': 'this', 'in': {...}, 'input': '$tags'}})\n" ] } ], "source": [ "transform_tag = F().upper()\n", "view = dataset.set_field(\"tags\", F(\"tags\").map(transform_tag))\n", "\n", "print(view)" ] }, { "cell_type": "markdown", "metadata": { "id": "F0GwCUgHT3qd" }, "source": [ "Note that the `F()` above is empty, indicating that [upper()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression.upper) is applied to the primitives stored in each element of the field. In this case, the primitives are the string tags. In general, `F()` references the root of the current context." ] }, { "cell_type": "markdown", "metadata": { "id": "DG12yOEnZl0X" }, "source": [ "### Reducing list fields\n", "\n", "The [reduce()](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewField.reduce) operation lets you take a list, operate on each element of it, and return some value. Reduce expressions generally involve some `VALUE` that is being aggregated as each element is iterated over. For example, this could be some float that values are added to, a string that gets concatenated each iteration, or even a list to which elements are appended.\n", "\n", "Say that we want to set a field on our predictions containing the IDs of the corresponding ground truth objects that were matched to the true positives. We can use filter and reduce to accomplish this as follows:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "WeblvK0FWuuD" }, "outputs": [], "source": [ "from fiftyone.core.expressions import VALUE" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "55V3wdUPWh8Q", "outputId": "bdc1bd22-82c8-41ca-e9bd-09f67caa206b" }, "outputs": [ { "data": { "text/plain": [ "['5f452471ef00e6374aac53c8', '5f452471ef00e6374aac53ca']" ] }, "execution_count": 69, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Get all of the matched gt object ids\n", "view = (\n", " dataset\n", " .set_field(\n", " \"predictions.gt_ids\",\n", " F(\"detections\")\n", " .filter(F(\"eval\") == \"tp\")\n", " .reduce(VALUE.append(F(\"eval_id\")), init_val=[])\n", " )\n", ")\n", "view.first().predictions.gt_ids" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Referencing root fields\n", "\n", "Another useful property of expressions is prepending your field names with `$` to refer to the root of the document. This can be used, for example, to use sample-level information like `metadata` when filtering at a detection-level:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "W6UQ9rK8VZ4e" }, "outputs": [], "source": [ "dataset.compute_metadata()\n", "\n", "# Computes the area of each bounding box in pixels\n", "bbox_area = (\n", " F(\"$metadata.width\") * F(\"bounding_box\")[2] *\n", " F(\"$metadata.height\") * F(\"bounding_box\")[3]\n", ")\n", "\n", "# Only contains boxes whose area is between 32^2 and 96^2 pixels\n", "medium_boxes_view = dataset.filter_labels(\n", " \"predictions\", (32 ** 2 < bbox_area) & (bbox_area < 96 ** 2)\n", ")" ] }, { "cell_type": "markdown", "metadata": { "id": "9HuQJ7_bafUy" }, "source": [ "For a complete listing of all operations that can be performed to create view expressions and examples of each, [check out the API documentation](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html)." ] }, { "cell_type": "markdown", "metadata": { "id": "dZVwX-CQwTtW" }, "source": [ "## Aggregations\n", "\n", "[Aggregations](https://voxel51.com/docs/fiftyone/user_guide/using_aggregations.html) provide a convenient syntax to compute aggregate statistics or extract values across a dataset or view.\n", "\n", "For example, you can use aggregations to get information like:\n", "\n", "- The boundary values of a field\n", "- The unique label names in your dataset\n", "- The standard deviation of a value across your samples\n", "- Extract a slice of field values across a view\n", "\n", "You can view the available aggregations like so:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "cRjfl51ek4hI", "outputId": "bdaa009f-fd77-44f2-e36e-002352754378" }, "outputs": [ { "data": { "text/plain": [ "['bounds',\n", " 'count',\n", " 'count_values',\n", " 'distinct',\n", " 'histogram_values',\n", " 'mean',\n", " 'std',\n", " 'sum',\n", " 'values']" ] }, "execution_count": 70, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.list_aggregations()" ] }, { "cell_type": "markdown", "metadata": { "id": "qBS8r2uYk7HG" }, "source": [ "[The documentation](https://voxel51.com/docs/fiftyone/user_guide/using_aggregations.html) already contains plenty of detailed information about aggregations. This section just highlights how view expressions can be used with aggregations.\n", "\n", "In the simplest case, aggregations can be performed by providing the name of a field you want to compute on:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "T5X6OGiBk6XQ", "outputId": "57130f15-586d-4c32-9119-f93110f40f26" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['airplane', 'apple', 'backpack', 'banana', 'baseball glove', 'bear', 'bed', 'bench', 'bicycle', 'bird', 'boat', 'book', 'bottle', 'bowl', 'broccoli', 'bus', 'cake', 'car', 'carrot', 'cat', 'cell phone', 'chair', 'clock', 'couch', 'cow', 'cup', 'dining table', 'dog', 'donut', 'elephant', 'fire hydrant', 'fork', 'frisbee', 'giraffe', 'hair drier', 'handbag', 'horse', 'hot dog', 'keyboard', 'kite', 'knife', 'laptop', 'microwave', 'motorcycle', 'mouse', 'orange', 'oven', 'parking meter', 'person', 'pizza', 'potted plant', 'refrigerator', 'remote', 'sandwich', 'scissors', 'sheep', 'sink', 'skateboard', 'skis', 'snowboard', 'spoon', 'sports ball', 'stop sign', 'suitcase', 'surfboard', 'teddy bear', 'tennis racket', 'tie', 'toaster', 'toilet', 'toothbrush', 'traffic light', 'train', 'truck', 'tv', 'umbrella', 'vase', 'wine glass', 'zebra']\n" ] } ], "source": [ "print(dataset.distinct(\"predictions.detections.label\"))" ] }, { "cell_type": "markdown", "metadata": { "id": "SsW1p1hVlikB" }, "source": [ "However, you can also pass a [ViewExpression](https://voxel51.com/docs/fiftyone/api/fiftyone.core.expressions.html#fiftyone.core.expressions.ViewExpression) to the aggregation method, in which case the expression will be evaluated and then aggregated as requested:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "2PVf7k_RlxPJ", "outputId": "14aaed91-5769-4f36-a418-26157c7c00f5" }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.15, 0.16, 0.17, 0.18, 0.19, 0.2, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.34, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.7, 0.72, 0.73, 0.74, 0.75, 0.78, 0.8, 0.82, 0.92, 1.0]\n" ] } ], "source": [ "print(dataset.distinct(F(\"uniqueness\").round(2)))" ] }, { "cell_type": "markdown", "metadata": { "id": "T3dejQsznenG" }, "source": [ "## Summary\n", "\n", "Dataset views and the view expressions language are powerful and flexible aspects of FiftyOne.\n", "\n", "Getting comfortable with using views and expressions to slice and dice your datasets based on the questions you have will allow you to work efficiently to curate high quality datasets." ] } ], "metadata": { "colab": { "collapsed_sections": [], "name": "FiftyOne_Enterprise_View_Expressions.ipynb", "provenance": [] }, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.11" } }, "nbformat": 4, "nbformat_minor": 1 }