{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Loading Depth Data\n", "\n", "In this tutorial, we will explore how to load and visualize depth estimation datasets in FiftyOne. We will work with two popular depth datasets that use different storage formats: DIODE (with NumPy arrays) and NYU Depth V2 (with image files).\n", "\n", "## Installation\n", "\n", "Some packages are required to load and process the depth data:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install kagglehub pandas numpy Pillow tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Representing Depth Data in FiftyOne\n", "\n", "FiftyOne's [Heatmap](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Heatmap) class is ideal for representing depth data:\n", "\n", "```python\n", "fo.Heatmap(\n", " map=None, # 2D numpy array containing the data\n", " map_path=None, # OR path to the heatmap image on disk\n", " range=None # Optional [min, max] range for proper visualization\n", ")\n", "```\n", "\n", "There are two ways to load depth data:\n", "\n", "1. Passing a 2D NumPy array via the `map` parameter\n", "\n", "2. Pointing to a depth map image on disk via the `map_path` parameter\n", "\n", "The optional `range` parameter specifies the min/max values of the heatmap. By default:\n", "\n", "- Floating point arrays use [0, 1]\n", "\n", "- Integer arrays use [0, 255]\n", "\n", "- Image files use their native data type range\n", "\n", "This tutorial shows how to load depth data in both scenarios using two datasets:\n", "\n", "* DIODE dataset\n", "\n", "* NYU Depth Dataset V2\n", "\n", "## DIODE Dataset\n", "\n", "DIODE (Dense Indoor and Outdoor DEpth) is a dataset of high-resolution color images with accurate, dense, far-range depth measurements. The DIODE dataset was created by researchers from TTI-Chicago, University of Chicago, and Beihang University, and is released under the MIT license. It was last updated on March 31, 2020.\n", "\n", "It's the first public dataset to include RGBD images of both indoor and outdoor scenes captured with a single sensor suite.\n", "\n", "* [Paper on arXiv](https://arxiv.org/abs/1908.00463)\n", "\n", "* [Project page](https://diode-dataset.org/)\n", "\n", "\n", "### File Naming Conventions and Formats\n", "\n", "The dataset consists of RGB images, depth maps, and depth validity masks. Their formats are as follows:\n", "\n", "* RGB images (`*.png`): RGB images with a resolution of 1024 × 768.\n", "\n", "* Depth maps (`*_depth.npy`): Depth ground truth with the same resolution as the images.\n", "\n", "* Depth masks (`*_depth_mask.npy`): Binary depth validity masks where 1 indicates valid sensor returns and 0 otherwise.\n", "\n", "The relationship between depth maps and depth validity masks is important for working with depth data:\n", "\n", "- **Depth Maps** contain the actual distance measurements from the camera to surfaces in the scene. Each pixel value represents how far away that point is (usually in meters). However, depth sensors often have limitations.\n", "\n", "- **Depth Validity Masks** indicate which pixels in the depth map have reliable measurements:\n", " - A value of 1 means the depth value is valid and can be trusted\n", " - A value of 0 means the depth sensor couldn't get a reliable reading at that pixel\n", "\n", "These invalid readings typically occur because:\n", "1. Some surfaces are too reflective, transparent, or absorptive\n", "2. 
Areas may be too far away or outside the sensor's range\n", "3. Occlusions where one object blocks the sensor's view of another\n", "4. Motion blur during capture\n", "\n", "Without the validity mask, you'd be treating unreliable depth values as real measurements, which would introduce significant errors in any algorithms or visualizations using the depth data.\n", "\n", "### Downloading the DIODE Dataset\n", "\n", "We will download and extract the validation split of the DIODE dataset. This contains the RGB images, depth maps, and validity masks we'll need:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget http://diode-dataset.s3.amazonaws.com/val.tar.gz\n", "!tar -xzf val.tar.gz" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Downloading DIODE Metadata\n", "\n", "Next, download the metadata associated with this dataset and parse it to a Python dictionary:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!wget https://raw.githubusercontent.com/diode-dataset/diode-devkit/refs/heads/master/diode_meta.json" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import json\n", "\n", "with open('diode_meta.json', 'r') as f:\n", " diode_meta = json.load(f)\n", "\n", "diode_meta = diode_meta['val']" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import numpy as np\n", "from PIL import Image\n", "\n", "import fiftyone as fo\n", "\n", "from tqdm import tqdm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This is a helper function to iterate and parse the file paths:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def enumerate_paths(src, prefix=\"\"):\n", " \"\"\"Flatten nested metadata dictionary into a list of paths.\n", " \n", " This function recursively traverses a nested dictionary or list structure and \n", " builds file paths by joining keys/items with the provided prefix.\n", " \n", " Args:\n", " src: The source data structure to traverse. Can be either:\n", " - A list of path components to join with the prefix\n", " - A dictionary whose keys and values should be recursively traversed\n", " prefix: Optional string prefix to prepend to all generated paths.\n", " Default is empty string.\n", " \n", " Returns:\n", " list: A flattened list of complete file paths created by joining the prefix\n", " with all path components found in the source structure.\n", " \n", " Raises:\n", " ValueError: If src is neither a list nor a dictionary.\n", "\n", " \"\"\"\n", " if isinstance(src, list):\n", " return [os.path.join(prefix, item) for item in src]\n", " elif isinstance(src, dict):\n", " results = []\n", " for k, v in src.items():\n", " new_prefix = os.path.join(prefix, k) if prefix else k\n", " results.extend(enumerate_paths(v, new_prefix))\n", " return results\n", " else:\n", " raise ValueError(f'Unexpected data type: {type(src)}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading DIODE Depth Data into FiftyOne\n", "\n", "This code converts the DIODE depth dataset into a FiftyOne dataset for visualization and exploration. \n", "\n", "#### Depth Maps as Heatmaps\n", "\n", "The function processes depth data in several important steps:\n", "\n", "1. **Loading the raw depth**: The depth maps are loaded from NumPy files, containing metric distance values in meters.\n", "\n", "2. 
**Applying the mask**: Not all pixels have valid depth measurements. The function applies the depth mask to zero out invalid measurements, ensuring we only visualize reliable data.\n", "\n", "3. **Computing visualization range**: To create meaningful visualizations, the function calculates an appropriate min/max range based on the actual depth values present in each image. It uses the minimum value and the 99th percentile (capped at 300 meters) to avoid outliers skewing the visualization. This is informed by the [source code](https://github.com/diode-dataset/diode-devkit/blob/8b1765b7d801a5f5e2877c434ffe164e62ce8c90/diode.py#L60) for the DIODE Dev Kit.\n", "\n", "4. **Creating the Heatmap**: The masked depth map is stored as a FiftyOne Heatmap with the calculated range, allowing for intuitive color-coded visualization when viewing the dataset.\n", "\n", "#### Depth Masks\n", "\n", "The depth mask indicates which depth measurements are valid:\n", "\n", "- A mask value of 1 means the depth measurement is valid and trustworthy\n", "- A mask value of 0 indicates an invalid measurement (typically due to reflective surfaces, sensor limitations, or occlusions)\n", "\n", "By storing both the masked depth map and the mask itself as separate fields, you can easily visualize which areas have valid depth readings and which don't. This is particularly important for depth estimation tasks where you need to know which ground truth values you can rely on for training or evaluation.\n", "\n", "#### Dataset Structure\n", "\n", "The resulting FiftyOne dataset contains:\n", "- RGB images as the primary media\n", "- Depth maps as heatmaps with appropriate visualization ranges\n", "- Binary depth masks indicating valid measurements\n", "- Metadata fields including scene type, split, scene ID, and scan ID\n", "\n", "This structure makes it easy to filter, sort, and visualize the dataset based on different criteria, such as scene type or depth range." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def create_fiftyone_dataset(base_dir, diode_meta, dataset_name=\"DIODE\"):\n", " \"\"\"Create a FiftyOne dataset from the DIODE depth estimation dataset.\n", " \n", " The DIODE dataset (Dense Indoor/Outdoor DEpth) contains RGB images paired with\n", " depth maps and depth masks. It includes both indoor and outdoor scenes captured\n", " using professional scanning equipment to obtain high-quality ground truth depth.\n", " \n", " Args:\n", " base_dir (str): Root directory containing the DIODE dataset files\n", " diode_meta (dict): Metadata dictionary containing the dataset structure\n", " dataset_name (str, optional): Name for the created FiftyOne dataset. 
\n", " Defaults to \"DIODE\".\n", " \n", " Returns:\n", " fo.Dataset: A FiftyOne dataset containing:\n", " - RGB images (.png)\n", " - Depth maps (.npy) with metric depth values in meters\n", " - Binary depth masks (.npy) indicating valid depth measurements\n", " - Metadata on the sample level including scene type (indoor/outdoor),\n", " scene identifier, and scan number\n", " \n", " \"\"\"\n", " dataset = fo.Dataset(dataset_name, persistent=True, overwrite=True)\n", " \n", " # Flatten the nested dictionary. Note: diode_meta here is the \"val\"\n", " # subtree, so its levels are scene_type -> scene -> scan -> file names\n", " all_paths = []\n", " for scene_type in diode_meta.keys():\n", " for scene in diode_meta[scene_type].keys():\n", " paths = enumerate_paths(diode_meta[scene_type][scene], \n", " prefix=os.path.join(scene_type, scene))\n", " all_paths.extend(paths)\n", " \n", " # Add each sample to the dataset\n", " for file_path in tqdm(all_paths, desc=\"Creating dataset\"):\n", " # Construct paths\n", " prefix = os.path.join(base_dir, file_path)\n", " rgb_path = f\"{prefix}.png\"\n", " depth_path = f\"{prefix}_depth.npy\"\n", " mask_path = f\"{prefix}_depth_mask.npy\"\n", " \n", " # Skip if any file is missing\n", " if not all(os.path.exists(p) for p in [rgb_path, depth_path, mask_path]):\n", " continue\n", " \n", " # Extract metadata from path (scene_type/scene/scan/file name)\n", " parts = file_path.split(os.sep)\n", " if len(parts) < 4:\n", " continue\n", " scene_type, scene, scan = parts[:3]\n", " \n", " # Create sample\n", " sample = fo.Sample(filepath=rgb_path)\n", " \n", " # Add metadata\n", " sample[\"split\"] = \"val\" # This is optional, and you can also add this as a tag\n", " sample[\"scene_type\"] = scene_type\n", " sample[\"scene\"] = scene\n", " sample[\"scan\"] = scan\n", " \n", " # Load depth map and mask\n", " depth = np.load(depth_path).squeeze()\n", " mask = np.load(mask_path) > 0\n", " \n", " # Apply mask to depth map\n", " masked_depth = np.where(mask, depth, 0)\n", " \n", " # Determine depth range for better visualization\n", " valid_depths = masked_depth[masked_depth > 0]\n", " if len(valid_depths) > 0:\n", " min_depth = valid_depths.min()\n", " max_depth = min(300, np.percentile(valid_depths, 99))\n", " depth_range = [min_depth, max_depth]\n", " else:\n", " depth_range = [0, 1] # Default fallback\n", " \n", " # Add depth map as a Heatmap\n", " sample[\"depth_map\"] = fo.Heatmap(map=masked_depth, range=depth_range)\n", " \n", " # Add mask as a binary Heatmap\n", " sample[\"depth_mask\"] = fo.Heatmap(map=mask.astype(float), range=[0, 1])\n", " \n", " # Add sample to dataset\n", " dataset.add_sample(sample)\n", " dataset.compute_metadata()\n", " print(f\"Created dataset with {len(dataset)} samples\")\n", " return dataset\n", "\n", "# Example usage\n", "base_dir = \"val\"\n", "\n", "# Create the FiftyOne dataset\n", "dataset = create_fiftyone_dataset(base_dir, diode_meta)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# Launch the app to visualize\n", "session = fo.launch_app(dataset)" ] }, 
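{ "cell_type": "markdown", "metadata": {}, "source": [ "With the dataset loaded, the metadata fields we added make it easy to slice the data. Here is a minimal sketch of building a filtered view; it assumes the `scene_type` values follow DIODE's directory names (e.g., \"outdoor\"):\n", "\n", "```python\n", "from fiftyone import ViewField as F\n", "\n", "# View containing only outdoor scans, sorted by scene\n", "outdoor_view = dataset.match(F(\"scene_type\") == \"outdoor\").sort_by(\"scene\")\n", "print(len(outdoor_view))\n", "```" ] }, 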
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Loading NYU Depth V2 Dataset into FiftyOne\n", "\n", "The [NYU Depth V2 dataset](https://cs.nyu.edu/~fergus/datasets/nyu_depth_v2.html) is another popular depth estimation dataset with RGB images paired with depth maps. Unlike the DIODE dataset, where we loaded depth maps from NumPy arrays, the NYU dataset stores depth maps as PNG images.\n", "\n", "When working with the NYU dataset, the main difference is how we access and load the depth information:\n", "\n", "### Depth Maps as Image Files\n", "\n", "In the NYU dataset, depth maps are stored as PNG image files rather than NumPy arrays. These PNG files typically store depth values as 16-bit grayscale images to preserve precision.\n", "\n", "Unlike the DIODE example where we passed the depth array directly, we'll now use the `map_path` parameter of the `Heatmap` class to reference the depth map files.\n", "\n", "When using `map_path`, FiftyOne will:\n", "\n", "1. Load the depth map image file when needed\n", "2. Handle the conversion from image to array internally\n", "3. Apply the provided range for proper visualization\n", "\n", "### Determining the Depth Range\n", "\n", "For PNG depth maps, you need to know how the depth values are encoded:\n", "\n", "- Some datasets store raw depth in millimeters or meters\n", "\n", "- Others normalize depth values to the 0-65535 range (for 16-bit PNGs)\n", "\n", "- The range may also be specified in the dataset documentation\n", "\n", "You'll need to specify the appropriate range based on the dataset's depth encoding to ensure proper visualization. In this example, we compute an explicit `range` for each sample from its valid depth values (see the implementation below), rather than relying on the default integer range of `[0, 255]`.\n", "\n", "### Example Implementation Approach\n", "\n", "To create a FiftyOne dataset from your dataframe:\n", "1. Iterate through each row in the dataframe\n", "2. Create a sample with the RGB image path\n", "3. Add the depth map as a Heatmap using the `map_path` parameter\n", "4. Add any additional metadata (scene type, room, etc.)\n", "5. Add the sample to your FiftyOne dataset\n", "\n", "This approach allows you to work with image-based depth maps just as effectively as with the array-based approach used for DIODE.\n", "\n", "Note: we will download a version of this dataset from [Kaggle](https://www.kaggle.com/datasets/sohaibanwaar1203/image-depth-estimation/data). \n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install kagglehub" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import shutil\n", "import os\n", "\n", "import kagglehub\n", "\n", "# Download latest version\n", "path = kagglehub.dataset_download(\"sohaibanwaar1203/image-depth-estimation\")\n", "\n", "# Get current working directory\n", "current_dir = os.getcwd()\n", "\n", "# Move everything from the download directory to the current directory\n", "for item in os.listdir(path):\n", " source = os.path.join(path, item)\n", " destination = os.path.join(current_dir, item)\n", " shutil.move(source, destination)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note:\n", "If the download fails, please rerun the dataset download cell. It’s important to ensure the dataset is fully and correctly downloaded in your environment." ] }, 
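{ "cell_type": "markdown", "metadata": {}, "source": [ "Before building the dataset, it's worth checking how the depth PNGs are encoded, since the encoding determines an appropriate `range`. A minimal sketch with PIL and NumPy (the sample path below is hypothetical; substitute any depth map from the download):\n", "\n", "```python\n", "import numpy as np\n", "from PIL import Image\n", "\n", "# Hypothetical sample path from the extracted dataset\n", "depth_img = Image.open(\"data/nyu2_train/living_room_0038_out/37.png\")\n", "print(depth_img.mode) # e.g., \"L\" (8-bit) or \"I;16\" (16-bit) grayscale\n", "\n", "depth_array = np.asarray(depth_img)\n", "print(depth_array.dtype, depth_array.min(), depth_array.max())\n", "```" ] }, 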
{ "cell_type": "markdown", "metadata": {}, "source": [ "We'll parse the training dataset. First, load the file `nyu2_train.csv` into a dataframe. This contains paired RGB and depth paths:\n", "\n", "- `image_path`: Points to RGB images\n", "- `depth_path`: Points to depth maps as PNG files\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas as pd\n", "\n", "data_info = pd.read_csv(\n", " 'data/nyu2_train.csv',\n", " names=['image_path', 'depth_path'],\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NYU Depth V2 Dataset in FiftyOne\n", "\n", "This code creates a structured, browsable dataset in FiftyOne from the NYU Depth V2 dataset, which is a benchmark dataset for indoor depth estimation. The function takes a DataFrame containing paths to RGB images and their corresponding depth maps, and builds a FiftyOne dataset that allows for interactive visualization and analysis.\n", "\n", "### 1. Dataset Organization\n", "\n", "The code creates a [persistent FiftyOne dataset](https://docs.voxel51.com/user_guide/using_datasets.html#dataset-persistence), meaning it will be saved to disk and can be reloaded in future sessions. It organizes the NYU Depth V2 data with meaningful metadata extracted from the file structure:\n", "\n", "- **Room Types**: Automatically extracted from directory names (e.g., \"living_room\")\n", "- **Scene IDs**: Identifies specific room instances (e.g., \"living_room_0038_out\")\n", "- **Frame Numbers**: Numeric identifiers for individual frames within a scene\n", "\n", "### 2. Depth Map Handling\n", "\n", "The depth maps are integrated as FiftyOne Heatmap objects, which enables specialized visualization. The code uses the [map_path](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Heatmap) parameter to reference the depth files directly, and computes an explicit `range` from each depth map's valid values so the App renders the depths meaningfully.\n", "\n", "### 3. Data Validation and Processing\n", "\n", "The code includes several validation steps:\n", "- Verifying required columns in the input DataFrame\n", "- Converting relative paths to absolute paths\n", "- Checking that files exist before processing\n", "- Extracting structured metadata from filenames and paths\n", "\n", "### 4. Interactive Visualization\n", "\n", "Once created, this dataset can be explored in the FiftyOne App, where you can:\n", "- Browse through RGB-depth pairs\n", "- Filter by room type, scene, or frame number\n", "- Visualize depth maps with different colormaps\n", "- Sort and group samples based on metadata" ] }, 
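{ "cell_type": "markdown", "metadata": {}, "source": [ "To make the metadata extraction concrete, here is a minimal sketch of how the function below breaks down a path from the CSV (the example path mirrors the layout noted in the code comments):\n", "\n", "```python\n", "import os\n", "\n", "path = \"data/nyu2_train/living_room_0038_out/37.jpg\"\n", "\n", "parts = path.split(\"/\") # ['data', 'nyu2_train', 'living_room_0038_out', '37.jpg']\n", "scene_id = parts[-2] # 'living_room_0038_out'\n", "frame_number = int(os.path.splitext(parts[-1])[0]) # 37\n", "room_type = \"_\".join(scene_id.split(\"_\")[:-2]) # 'living_room'\n", "```" ] }, 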
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "\n", "import fiftyone as fo\n", "import numpy as np\n", "import pandas as pd\n", "from PIL import Image\n", "from tqdm import tqdm\n", "\n", "def create_nyu_fiftyone_dataset(dataframe, dataset_name=\"NYU_Depth_V2_Train\"):\n", " \"\"\"Create a FiftyOne dataset from the NYU Depth V2 training dataset.\n", " \n", " Args:\n", " dataframe (pd.DataFrame): DataFrame containing image_path and depth_path columns\n", " dataset_name (str, optional): Name for the created FiftyOne dataset. \n", " Defaults to \"NYU_Depth_V2_Train\".\n", " \n", " Returns:\n", " fo.Dataset: A FiftyOne dataset containing RGB images and their corresponding depth maps\n", " \"\"\"\n", " # Create a new dataset\n", " dataset = fo.Dataset(dataset_name, overwrite=True, persistent=True)\n", " \n", " # Check if the DataFrame has the required columns\n", " required_cols = [\"image_path\", \"depth_path\"]\n", " if not all(col in dataframe.columns for col in required_cols):\n", " raise ValueError(f\"DataFrame must contain columns: {required_cols}\")\n", " \n", " # Process each row in the dataframe\n", " for _, row in tqdm(dataframe.iterrows(), total=len(dataframe), desc=\"Creating dataset\"):\n", " # Get paths\n", " image_path = row[\"image_path\"]\n", " depth_path = row[\"depth_path\"]\n", " \n", " # Convert to absolute paths if they are relative\n", " image_path_abs = os.path.abspath(image_path)\n", " depth_path_abs = os.path.abspath(depth_path)\n", " \n", " # Ensure paths exist\n", " if not (os.path.exists(image_path_abs) and os.path.exists(depth_path_abs)):\n", " print(f\"Skipping sample: {image_path_abs} or {depth_path_abs} not found\")\n", " continue\n", " \n", " # Create a new sample with the RGB image\n", " sample = fo.Sample(filepath=image_path_abs)\n", " \n", " # Extract metadata from the path\n", " # Example path: data/nyu2_train/living_room_0038_out/37.jpg\n", " parts = image_path.split('/')\n", " if len(parts) >= 3:\n", " # Get filename and extract frame number\n", " filename = parts[-1]\n", " # Extract frame number from filename (remove file extension)\n", " frame_number = os.path.splitext(filename)[0]\n", " try:\n", " # Convert to integer if possible\n", " frame_number = int(frame_number)\n", " sample[\"frame_number\"] = frame_number\n", " except ValueError:\n", " # If not a number, just store it as string\n", " sample[\"frame_id\"] = frame_number\n", " \n", " # Extract scene folder\n", " scene_folder = parts[-2]\n", " sample[\"scene_id\"] = scene_folder\n", " \n", " # Extract room type\n", " scene_parts = scene_folder.split('_')\n", " if len(scene_parts) >= 3:\n", " room_type = \"_\".join(scene_parts[:-2])\n", " sample[\"room_type\"] = room_type\n", " \n", " # Add the depth map as a Heatmap using map_path and an explicit\n", " # range computed from the valid (nonzero) depth values\n", " depth_array = np.asarray(Image.open(depth_path_abs), dtype=float)\n", " valid_depths = depth_array[depth_array > 0]\n", " depth_range = [float(valid_depths.min()), float(valid_depths.max())] if valid_depths.size else [0.0, 1.0]\n", " \n", " sample[\"depth\"] = fo.Heatmap(\n", " map_path=depth_path_abs,\n", " range=depth_range,\n", " )\n", " \n", " # Add sample to dataset\n", " dataset.add_sample(sample)\n", " dataset.compute_metadata()\n", " print(f\"Created dataset with {len(dataset)} samples\")\n", " return dataset" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "nyu_dataset = create_nyu_fiftyone_dataset(data_info)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo.launch_app(nyu_dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![exploring_nyu_depth](https://cdn.voxel51.com/getting_started_depth_estimation/notebook1/exploring_nyu_depth.webp)\n", "\n", "You may have noticed that both of these datasets are sequences of frames, so they could be represented as videos. However, converting frame sequences to MP4 videos is inefficient because:\n", "\n", "1. The conversion process is time-consuming\n", "\n", "2. High-resolution videos consume excessive storage space\n", "\n", "3. Machine learning tasks typically process individual frames anyway, making video conversion unnecessary\n", "\n", "Instead, you can use [group_by()](https://docs.voxel51.com/user_guide/using_views.html#sorting-and-grouping) to create a view that groups the data by scene, ordered by frame number/timestamp. When you load a dynamic grouped view in the App, you'll have the same experience as with video datasets:\n", "\n", "- You can hover over tiles in the grid to animate scenes' frame data\n", "\n", "- When you click on a tile, you'll have familiar video player controls in the modal to navigate the scene" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "view = nyu_dataset.group_by(\"scene_id\", order_by=\"frame_number\")\n", "\n", "# Save the view for easy loading in the App\n", "nyu_dataset.save_view(\"scenes\", view)" ] }, 
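{ "cell_type": "markdown", "metadata": {}, "source": [ "Since the dataset is persistent and the view is saved, you can pick up where you left off later. A minimal sketch for reloading both in a future session:\n", "\n", "```python\n", "import fiftyone as fo\n", "\n", "# Reload the persistent dataset and the saved view by name\n", "dataset = fo.load_dataset(\"NYU_Depth_V2_Train\")\n", "scenes = dataset.load_saved_view(\"scenes\")\n", "\n", "session = fo.launch_app(scenes)\n", "```" ] }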
 ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.14" } }, "nbformat": 4, "nbformat_minor": 2 }