{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Using Depth Estimation Models\n", "\n", "In this tutorial, we will explore multiple approaches to running depth estimation models in FiftyOne. We'll work with pre-trained models from different sources and learn how to integrate them into your workflow.\n", "\n", "## Installation\n", "\n", "Make sure you have FiftyOne installed in your Python environment. Additionally, you'll need:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install torch transformers datasets diffusers" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Dataset\n", "\n", "Note, we've created an in-depth tutorial in the previous notebook that discusses the methods for loading depth data into FiftyOne. As discussed in that tutorial, FiftyOne's [Heatmap](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Heatmap) class is ideal for representing depth data:\n", "\n", "```python\n", "fo.Heatmap(\n", " map=None, # 2D numpy array containing the data\n", " map_path=None, # OR path to the heatmap image on disk\n", " range=None # Optional [min, max] range for proper visualization\n", ")\n", "```\n", "\n", "Let's start by loading a dataset from the Hugging Face Hub." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from datasets import load_dataset\n", "\n", "clevr_depth = load_dataset(\n", " \"erkam/clevr-with-depth\",\n", " split=\"train\",\n", " cache_dir=\"clevr_with_depth\",\n", ")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note how this dataset is saved:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "clevr_depth[0]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The code takes a Hugging Face dataset containing image-depth pairs and converts it into a FiftyOne dataset for visualization and analysis. \n", "\n", "For each sample, it saves the RGB image to disk (since FiftyOne requires file paths) and extracts the depth information from the first channel of the RGBA depth map. Each sample in the resulting FiftyOne dataset contains the path to the RGB image, the original prompt, and the depth map stored as a `Heatmap` visualization. \n", "\n", "The depth values are scaled between 0 and 198, which represents the range of depth values in this dataset." 
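, "\n", "\n", "If you don't already know the depth range of your data, you can estimate it by scanning the raw depth maps before converting them. The snippet below is a minimal sketch; it assumes, as in this dataset, that each depth map is an RGBA image whose channels are identical, so only channel 0 is inspected:\n", "\n", "```python\n", "import numpy as np\n", "\n", "lo, hi = float(\"inf\"), float(\"-inf\")\n", "for item in clevr_depth:\n", "    depth = np.array(item[\"depth\"])[:, :, 0]  # channel 0 only\n", "    lo = min(lo, float(depth.min()))\n", "    hi = max(hi, float(depth.max()))\n", "\n", "print(lo, hi)  # pass these as the Heatmap range, e.g. [0, 198] for this dataset\n", "```"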
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "import numpy as np\n", "from PIL import Image\n", "import os\n", "\n", "def convert_dataset_to_fiftyone(hf_dataset, save_dir=\"./clevr_depth_data\"):\n", " \"\"\"\n", " Converts a Hugging Face dataset containing image-depth pairs into a FiftyOne dataset.\n", "\n", " This function takes a dataset from Hugging Face that contains RGB images and their corresponding\n", " depth maps, saves the images to disk, and creates a FiftyOne dataset with the images and depth\n", " information stored as heatmaps.\n", "\n", " Args:\n", " hf_dataset: A Hugging Face dataset containing 'image', 'depth', and 'prompt' fields\n", " save_dir (str): Directory path where images and depth maps will be saved.\n", " Defaults to \"./clevr_depth_data\"\n", "\n", " Returns:\n", " fo.Dataset: A FiftyOne dataset containing:\n", " - RGB images stored on disk\n", " - Depth maps as FiftyOne Heatmap objects (scaled 0-198)\n", " - Original prompts from the dataset\n", "\n", " Note:\n", " The depth maps are extracted from the first channel of the RGBA depth images\n", " since all channels are identical in this dataset.\n", " \"\"\"\n", " # Create directories if they don't exist\n", " os.makedirs(os.path.join(save_dir, \"images\"), exist_ok=True)\n", " os.makedirs(os.path.join(save_dir, \"depth\"), exist_ok=True)\n", " \n", " samples = []\n", " # Create a FiftyOne dataset\n", " dataset = fo.Dataset(\"clevr_depth\", overwrite=True, persistent=True)\n", " \n", " for idx, item in enumerate(hf_dataset):\n", " # Generate filenames\n", " image_filename = f\"image_{idx:06d}.png\"\n", " depth_filename = f\"depth_{idx:06d}.png\"\n", " \n", " image_path = os.path.join(save_dir, \"images\", image_filename)\n", " depth_path = os.path.join(save_dir, \"depth\", depth_filename)\n", " \n", " # Save images to disk\n", " item['image'].save(image_path)\n", " \n", " # Extract depth map from first channel (since all channels are identical in this dataset)\n", " depth_np = np.array(item['depth'])[:, :, 0] # Taking channel 0\n", "\n", " # Create a FiftyOne sample\n", " sample = fo.Sample(\n", " filepath=image_path,\n", " prompt=item['prompt']\n", " )\n", " \n", " # Add depth as Heatmap with proper range\n", " sample[\"depth\"] = fo.Heatmap(\n", " map=depth_np,\n", " range=[0, 198] # if you know the range of your dataset, use those values\n", " )\n", " # Add the sample to the dataset\n", " samples.append(sample)\n", "\n", " dataset.add_samples(samples)\n", " dataset.compute_metadata()\n", " return dataset\n", "\n", "# Usage:\n", "fo_dataset = convert_dataset_to_fiftyone(clevr_depth)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can verify the depth map was parsed by calling the Dataset:\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And inspect the values of the first map like so:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.first()['depth']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Refer to our guide for loading depth data for other examples and more detail. 
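\n\nIf you want to sanity-check the raw values outside the App, a quick sketch (assuming you have matplotlib installed) is:\n\n```python\nimport matplotlib.pyplot as plt\n\ndepth = fo_dataset.first()[\"depth\"].map  # the 2D numpy array stored on the Heatmap\nplt.imshow(depth, cmap=\"viridis\")\nplt.colorbar(label=\"depth\")\nplt.show()\n```\n\n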
Once the dataset has been parsed to FiftyOne format you can [launch the app](https://docs.voxel51.com/user_guide/app.html) and inspect its contents" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo.launch_app(fo_dataset)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![clevr-dataset](https://cdn.voxel51.com/getting_started_depth_estimation/notebook2/clevr-dataset.webp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using Depth Estimation Models in FiftyOne\n", "\n", "### As a Zoo Model\n", "\n", "You can load `transformers` depth estimation models directly from the [FiftyOne Model Zoo](https://docs.voxel51.com/user_guide/model_zoo/index.html)! \n", "\n", "To load a transformers depth estimation model from the zoo, specify `depth-estimation-transformer-torch` as the first argument, and pass in the model's name or path as a keyword argument:\n", "\n", "```python\n", "model = foz.load_zoo_model(\n", " \"depth-estimation-transformer-torch\",\n", " name_or_path=\"path/to-model\",\n", ")\n", "```\n", "\n", "Any model that can be run in a Hugging Face pipeline for the `depth-estimation` task can be loaded as a Zoo model. A non-exhaustive list of such models includes:\n", "\n", "* [Intel/dpt-large](https://huggingface.co/Intel/dpt-large)\n", "\n", "* [Intel/dpt-hybrid-midas](https://huggingface.co/Intel/dpt-hybrid-midas)\n", "\n", "* [Intel/zoedepth-nyu-kitti](https://huggingface.co/Intel/zoedepth-nyu-kitti)\n", "\n", "* [vinvino02/glpn-kitti](https://huggingface.co/vinvino02/glpn-kitti)\n", "\n", "* [LiheYoung/depth-anything-small-hf](https://huggingface.co/LiheYoung/depth-anything-small-hf)\n", "\n", "* [depth-anything/Depth-Anything-V2-Small-hf](https://huggingface.co/depth-anything/Depth-Anything-V2-Small-hf)\n", "\n", "* [depth-anything/Depth-Anything-V2-Base-hf](https://huggingface.co/depth-anything/Depth-Anything-V2-Base-hf)\n", "\n", "* [depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf](https://huggingface.co/depth-anything/Depth-Anything-V2-Metric-Indoor-Large-hf)\n", "\n", "Refer to the Hugging Face documentation on [*Monocular depth estimation*](https://huggingface.co/docs/transformers/tasks/monocular_depth_estimation) to stay up to date on which models can be run in a pipeline. 
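\n\nYou can also test whether a specific checkpoint supports the `depth-estimation` pipeline by trying to build the pipeline directly. Here's a minimal sketch using one of the checkpoints listed above:\n\n```python\nfrom transformers import pipeline\n\n# this raises if the checkpoint cannot be loaded for the depth-estimation task\ndepth_pipe = pipeline(task=\"depth-estimation\", model=\"Intel/dpt-large\")\n\nresult = depth_pipe(fo_dataset.first().filepath)\nresult[\"depth\"]  # PIL image; result[\"predicted_depth\"] is the raw tensor\n```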
\n", "\n", "**Note:** When selecting a model, it's advisable to refer to its model card and determine whether it's suitable for your dataset and use case.\n", "\n", "Below is an example of using the `depth-anything/Depth-Anything-V2-Small-hf` on the dataset we parsed earlier:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import torch\n", "\n", "import fiftyone as fo\n", "import fiftyone.zoo as foz\n", "\n", "dav2_model = foz.load_zoo_model(\n", " \"depth-estimation-transformer-torch\",\n", " name_or_path=\"depth-anything/Depth-Anything-V2-Small-hf\",\n", " device=\"cuda\" if torch.cuda.is_available() else \"cpu\"\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.apply_model(\n", " dav2_model, \n", " label_field=\"dav2_small\",\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "To verify:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.first()[\"dav2_small\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Hugging Face Model That's Not Compatible with Integration\n", "\n", "Admittedly, it's not always clear which Hugging Face model can be run as part of a pipeline. \n", "\n", "A good first entry point is to just try it and pass the model name into `name_or_path` in the [load_zoo_model](https://docs.voxel51.com/api/fiftyone.zoo.models.html#fiftyone.zoo.models.load_zoo_model) method. If a Hugging Face model is not compatible with the integration, you'll see an error to the effect of: \n", "\n", "```python\n", "ValueError: Unrecognized model in \n", "```\n", "\n", "In this case, you will need to run the model manually. All this means is that you need to instantiate the model, its processor, and write some logic to parse the model output into a FiftyOne Heatmap. 
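\n\nIn practice, the \"just try it\" check can be as simple as wrapping the load in a try/except; here's a sketch using the checkpoint from the manual example below:\n\n```python\nimport fiftyone.zoo as foz\n\ntry:\n    model = foz.load_zoo_model(\n        \"depth-estimation-transformer-torch\",\n        name_or_path=\"Intel/dpt-beit-large-512\",\n    )\nexcept ValueError:\n    print(\"Not pipeline-compatible; run this model manually instead\")\n```\n\nWhen the load does fail, the fallback is to run the model manually.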
\n", "\n", "Here's an example of how you can do this:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import torch\n", "from PIL import Image\n", "from transformers import DPTImageProcessor, DPTForDepthEstimation\n", "import fiftyone as fo\n", "\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "\n", "dpt_processor = DPTImageProcessor.from_pretrained(\"Intel/dpt-beit-large-512\")\n", "\n", "dpt_model = DPTForDepthEstimation.from_pretrained(\n", " \"Intel/dpt-beit-large-512\",\n", " device_map=device\n", " )\n", "\n", "dpt_model.eval()\n", "\n", "file_paths = fo_dataset.values(\"filepath\") # a list of all filepaths in Dataset\n", "\n", "dpt_depth_maps = [] # to store the depth maps\n", "\n", "for img in file_paths:\n", "\n", " image = Image.open(img).convert(\"RGB\")\n", " \n", " inputs = dpt_processor(images=image, return_tensors=\"pt\").to(device)\n", "\n", " with torch.no_grad():\n", " outputs = dpt_model(**inputs)\n", " predicted_depth = outputs.predicted_depth\n", " \n", " # interpolate to original size\n", " prediction = torch.nn.functional.interpolate(\n", " predicted_depth.unsqueeze(1),\n", " size=image.size[::-1],\n", " mode=\"bicubic\",\n", " align_corners=False,\n", " )\n", " \n", " output = prediction.squeeze().cpu().numpy()\n", "\n", " formatted = (output * 255 / np.max(output)).astype(\"uint8\")\n", "\n", " fo_depth_map = fo.Heatmap(map=formatted)\n", "\n", " dpt_depth_maps.append(fo_depth_map)\n", "\n", "\n", "fo_dataset.set_values(\"dpt_beit_maps\", dpt_depth_maps)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.first()[\"dpt_beit_maps\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The majority of the logic in the code above comes directly from the [model card](https://huggingface.co/Intel/dpt-beit-large-512).\n", "\n", "The only FiftyOne-specific aspects are just grabbing the filepaths for the Samples, parsing the model output as numpy arrays, loading it as a FiftyOne [Heatmap](https://docs.voxel51.com/api/fiftyone.core.labels.html#fiftyone.core.labels.Heatmap), and adding it as a [Field](https://docs.voxel51.com/api/fiftyone.core.fields.html) to the Dataset." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Plugin\n", "\n", "The FiftyOne community contributes [Plugins](https://docs.voxel51.com/plugins/index.html) which can make it easy to run a depth estimation model on your Dataset. 
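\n\nYou can check which plugins are already installed in your environment with the FiftyOne CLI:\n\n```bash\nfiftyone plugins list\n```\n\n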
For example, there is a plugin for [DepthPro](https://docs.voxel51.com/plugins/plugins_ecosystem/depth_pro_plugin.html).\n", "\n", "To use this plugin, download it and install the requirements:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!fiftyone plugins download https://github.com/harpreetsahota204/depthpro-plugin" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!fiftyone plugins requirements @harpreetsahota/depth_pro_plugin --install" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Then instantiate the operator:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import fiftyone.operators as foo\n", "\n", "depthpro = foo.get_operator(\"@harpreetsahota/depth_pro_plugin/depth_pro_estimator\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You'll need to start a [delegated service](https://docs.voxel51.com/plugins/developing_plugins.html#delegated-execution), which you can do by opening your terminal and executing the following command:\n", "\n", "```bash\n", "fiftyone delegated launch\n", "```\n", "\n", "And then run the plugin on the dataset:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "await depthpro(\n", " fo_dataset,\n", " depth_field=\"depthpro_map\",\n", " depth_type=\"inverse\", # or \"regular\" see the plugin repo for more details\n", " delegate=True\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You may have to call the `reload` method of the dataset if you don't see your field:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.reload()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.first()[\"depthpro_map_depth\"]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 🧨 Diffusers Depth Estimation \n", "\n", "You can also use the `Diffusers` library for zero-shot prediction of depth maps.\n", "\n", "Start by installing the library and instantiating the model, in this case we'll use [Marigold Depth model](https://huggingface.co/prs-eth/marigold-depth-v1-0)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install diffusers" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import diffusers\n", "import torch\n", "\n", "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n", "\n", "marigold_pipe = diffusers.MarigoldDepthPipeline.from_pretrained(\n", " \"prs-eth/marigold-depth-v1-0\", \n", " variant=\"fp16\", \n", " torch_dtype=torch.float16\n", " ).to(device)\n", "\n", "marigold_pipe.set_progress_bar_config(disable=True) # disable progress bar" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the model instantiated, we can iterate through the filepaths of our Dataset and run inference. This is an example of a model that outputs a `png` depth map. 
We'll save the depth map to disk and point to the filepath of the png via the `map_path` argument of `Heatmap`: " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "file_paths = fo_dataset.values(\"filepath\") # a list of all filepaths in Dataset\n", "\n", "marigold_depth_maps = [] # to store the depth maps\n", "\n", "for img in file_paths:\n", "\n", " # Create new filename with _marigold_map suffix, save wherever you want\n", " base_path = os.path.splitext(img)[0] # Remove extension\n", " depth_map_path = f\"{base_path}_marigold_map.png\"\n", "\n", " image = diffusers.utils.load_image(img)\n", "\n", " depth_estimate = marigold_pipe(image)\n", "\n", " depth_map = marigold_pipe.image_processor.visualize_depth(depth_estimate.prediction)\n", " depth_map[0].save(depth_map_path)\n", " \n", " # Alternatively, you can extract a 16 bit depth map\n", " # depth_16bit = marigold_pipe.image_processor.export_depth_to_16bit_png(depth_estimate.prediction)\n", " \n", " fo_depth_map = fo.Heatmap(map_path=depth_map_path)\n", "\n", " marigold_depth_maps.append(fo_depth_map)\n", "\n", "fo_dataset.set_values(\"marigold_depth\", marigold_depth_maps)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "fo_dataset.first()['marigold_depth']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now, let's launch the FiftyOne app and inspect all the depth maps we created. \n", "\n", "```python\n", "fo.launch_app(fo_dataset)\n", "```\n", "\n", "![all_predicted_depths](https://cdn.voxel51.com/getting_started_depth_estimation/notebook2/all_predicted_depths.webp)" ] } ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 2 }