{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "3232eaf0",
   "metadata": {},
   "source": [
    "# Loading and Exploring Datasets\n",
    "[Explore more about VAD](https://github.com/abc-125/vad?tab=readme-ov-file)\n",
    "\n",
    "Welcome to this hands-on workshop where we will learn how to load and explore datasets using FiftyOne. \n",
    "This notebook will guide you through programmatic interaction via the **FiftyOne SDK** and visualization using the **FiftyOne App**.\n",
    "\n",
    "![vad-image](https://cdn.voxel51.com/getting_started_manufacturing/notebook9/vad-image.webp)\n",
    "\n",
    "## Learning Objectives:\n",
    "- Load datasets into FiftyOne from different sources.\n",
    "- Understand the structure and metadata of datasets.\n",
    "- Use FiftyOne’s querying and filtering capabilities.\n",
    "- Interactively explore datasets in the FiftyOne App.\n",
    "\n",
    "In this example, we use Hugging Face Hub for dataset loading, but you are encouraged to explore other sources like local files, cloud storage, or custom dataset loaders.\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2ad8719f",
   "metadata": {},
   "source": [
    "## In this notebook, we cover:\n",
    "1. Loading datasets from Hugging Face Hub (extendable to other sources).\n",
    "2. Exploring dataset structure and metadata.\n",
    "3. Applying filtering and querying techniques to analyze data.\n",
    "4. Utilizing the FiftyOne App for interactive visualization.\n",
    "5. Cloning dataset views and exporting your data in FiftyOne format.\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Requirements and FiftyOne Installation\n",
    "\n",
    "First, create a Python environment on your system. If you are not familiar with that, take a look at this [README](https://github.com/voxel51/fiftyone-examples?tab=readme-ov-file#-prerequisites-for-beginners-), which explains how to create one. Then activate the environment and install FiftyOne in it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Install FiftyOne"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install fiftyone huggingface_hub gdown"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "4ee1b1c2",
   "metadata": {},
   "source": [
    "\n",
    "## Loading a Dataset into FiftyOne"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Alternative - Download from Google Drive\n",
    "\n",
    "If you run into any issues downloading the dataset from Hugging Face, use the following code cell to download it from Google Drive instead."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import gdown\n",
    "# Download the VAD dataset archive from Google Drive\n",
    "url = \"https://drive.google.com/uc?id=1LbHHJHCdkvhzVqekAIRdWjBWaBHxPjuu\"\n",
    "\n",
    "gdown.download(url, output=\"vad.zip\", quiet=False)\n",
    "!unzip vad.zip"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import fiftyone as fo\n",
    "from fiftyone import Sample\n",
    "from pathlib import Path\n",
    "\n",
    "# Path to your dataset root (adjust if necessary)\n",
    "DATASET_DIR = \"vad\"\n",
    "\n",
    "# Create a fresh FiftyOne dataset, deleting any existing one with the same name\n",
    "dataset_name = \"vad-dataset\"\n",
    "if dataset_name in fo.list_datasets():\n",
    "    fo.delete_dataset(dataset_name)\n",
    "\n",
    "dataset = fo.Dataset(dataset_name)\n",
    "\n",
    "# Helper: load all images with metadata from dir structure\n",
    "def add_samples_from_dir(dataset, root_dir):\n",
    "    for split in [\"train\", \"test\"]:\n",
    "        split_dir = Path(root_dir) / split\n",
    "        for label in os.listdir(split_dir):\n",
    "            label_dir = split_dir / label\n",
    "            if not label_dir.is_dir():\n",
    "                continue\n",
    "\n",
    "            for img_file in label_dir.glob(\"*\"):\n",
    "                if img_file.suffix.lower() not in [\".jpg\", \".jpeg\", \".png\", \".bmp\", \".tif\", \".tiff\"]:\n",
    "                    continue\n",
    "\n",
    "                sample = Sample(\n",
    "                    filepath=str(img_file.resolve()),\n",
    "                    metadata=None,  # populated later by compute_metadata()\n",
    "                    tags=[split],\n",
    "                )\n",
    "                sample[\"split\"] = split\n",
    "                sample[\"label\"] = label\n",
    "                dataset.add_sample(sample)\n",
    "\n",
    "# Ingest all samples\n",
    "add_samples_from_dir(dataset, DATASET_DIR)\n",
    "\n",
    "# Optionally compute metadata (dimensions, etc.)\n",
    "dataset.compute_metadata()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Launch the FiftyOne App\n",
    "session = fo.launch_app(dataset, port=5152, auto=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert `label` string field to a proper Classification label\n",
    "for sample in dataset:\n",
    "    sample[\"ground_truth\"] = fo.Classification(label=sample[\"label\"])\n",
    "    sample.save()\n",
    "\n",
    "# Optionally delete the old string label field if not needed\n",
    "# dataset.delete_sample_field(\"label\")\n",
    "\n",
    "# Refresh the app session\n",
    "session.refresh()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset.persistent = True"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a13ff8a6",
   "metadata": {},
   "source": [
    "\n",
    "## Exploring the Dataset\n",
    "\n",
    "Once the dataset is loaded, we can inspect its structure using FiftyOne’s SDK.\n",
    "We will explore:\n",
    "- The number of samples in the dataset.\n",
    "- Available metadata and labels.\n",
    "- How images/videos are structured.\n",
    "\n",
    "**Relevant Documentation:** [Inspecting Datasets in FiftyOne](https://docs.voxel51.com/user_guide/using_datasets.html#using-fiftyone-datasets). You can also retrieve the [first sample](https://docs.voxel51.com/api/fiftyone.core.dataset.html#fiftyone.core.dataset.Dataset.first) of the dataset to see what its fields look like:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "print(dataset)\n",
    "print(dataset.first())  # Inspect the first sample (dataset.last() returns the last one)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b2f27130",
   "metadata": {},
   "source": [
    "\n",
    "## Querying and Filtering\n",
    "\n",
    "FiftyOne provides a powerful querying engine to filter and analyze datasets efficiently.\n",
    "We can apply filters to:\n",
    "- Retrieve specific labels (e.g., all images with \"cat\" labels).\n",
    "- Apply confidence thresholds to object detections.\n",
    "- Filter data based on metadata (e.g., image size, timestamp).\n",
    "\n",
    "**Relevant Documentation:** [Dataset views](https://docs.voxel51.com/user_guide/using_views.html#dataset-views), [Querying Samples](https://docs.voxel51.com/user_guide/using_views.html#querying-samples), [Common filters](https://docs.voxel51.com/user_guide/using_views.html#common-filters)\n",
    "\n",
    "### Examples:\n",
    "- Show all images containing a particular class.\n",
    "- Retrieve samples with object detection confidence above a threshold.\n",
    "- Filter out low-quality images based on metadata.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import fiftyone.core.expressions as foe\n",
    "\n",
    "# Select the samples that belong to the test split\n",
    "view = dataset.match(foe.ViewField(\"split\") == \"test\")\n",
    "print(view)\n",
    "\n",
    "# Launch FiftyOne App with the filtered dataset\n",
    "session = fo.launch_app(view, port=5152, auto=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Narrow the test-split view to samples labeled \"bad_unseen_defects\"\n",
    "filtered_view = view.match(foe.ViewField(\"ground_truth.label\") == \"bad_unseen_defects\")\n",
    "session.view = filtered_view\n",
    "print(filtered_view)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Launch FiftyOne App with the filtered view\n",
    "session = fo.launch_app(filtered_view, port=5152, auto=False)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7a0766d0",
   "metadata": {},
   "source": [
    "\n",
    "## Interactive Exploration with the FiftyOne App\n",
    "\n",
    "The **FiftyOne App** allows users to interactively browse, filter, and analyze datasets.\n",
    "This visual interface is an essential tool for understanding dataset composition and refining data exploration workflows.\n",
    "\n",
    "Key features of the FiftyOne App:\n",
    "- Interactive filtering of images/videos.\n",
    "- Object detection visualization.\n",
    "- Dataset statistics and metadata overview.\n",
    "\n",
    "**Relevant Documentation:** [Using the FiftyOne App](https://voxel51.com/docs/fiftyone/user_guide/app.html)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interacting with Plugins to Understand the Dataset\n",
    "\n",
    "FiftyOne provides a powerful [plugin framework](https://docs.voxel51.com/plugins/index.html) that lets you extend and customize the tool to suit your specific needs. Here we use the [@voxel51/dashboard](https://github.com/voxel51/fiftyone-plugins/blob/main/plugins/dashboard/README.md) plugin, which lets you build custom dashboards that display statistics of interest about the current dataset (and beyond)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!fiftyone plugins download https://github.com/voxel51/fiftyone-plugins --plugin-names @voxel51/dashboard"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## New dataset\n",
    "\n",
    "Calling `clone()` on a view creates a new dataset containing a copy of the contents of the view."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "new_dataset = view.clone()\n",
    "print(new_dataset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exporting Dataset to FiftyOneDataset\n",
    "\n",
    "FiftyOne supports various dataset formats. In this notebook, we’ve worked with a custom dataset from Hugging Face Hub. Now, we export the cloned dataset in the `FiftyOneDataset` format, which preserves its samples, fields, and labels so it can be reloaded into FiftyOne later.\n",
    "\n",
    "For more details on the dataset types supported by FiftyOne, refer to this [documentation](https://docs.voxel51.com/api/fiftyone.types.dataset_types.html?highlight=dataset%20type#module-fiftyone.types.dataset_types)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "export_dir = \"VAD_test\"\n",
    "new_dataset.export(\n",
    "    export_dir=export_dir,\n",
    "    dataset_type=fo.types.FiftyOneDataset,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Next Steps:\n",
    "Try modifying the dataset loading parameters, applying different filters, and exploring the FiftyOne App’s visualization features!"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "manu_env",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.17"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}