{ "cells": [ { "cell_type": "markdown", "id": "8527d902-f8b0-43c2-84ad-b9ee5e83121e", "metadata": {}, "source": [ "# Build a 3D self-driving dataset from scratch with OpenAI's Point-E and FiftyOne" ] }, { "cell_type": "markdown", "id": "7b4ae8e0-5ec1-4ae3-b77e-74b6523a14cb", "metadata": {}, "source": [ "In this walkthrough, we will show you how to build your own $3D$ point cloud dataset using OpenAI's [Point-E](https://github.com/openai/point-e) for $3D$ point cloud synthesis, and FiftyOne for dataset curation and visualization.\n", "\n", "Specifically, this walkthrough covers:\n", "\n", "* Generating $3D$ point clouds from text with Point-E\n", "* Loading point cloud data into FiftyOne\n", "* Curating synthetically generated data assets\n", "* Constructing a high-quality point-cloud dataset for self-driving applications\n", "\n", "**So, what's the takeaway?**\n", "\n", "FiftyOne can help you to understand, curate, and process $3D$ point cloud data and build high quality $3D$ datasets" ] }, { "attachments": {}, "cell_type": "markdown", "id": "3192c0d9", "metadata": {}, "source": [ "![pointe-preview](images/pointe_preview.gif)" ] }, { "cell_type": "markdown", "id": "d9733f11-89e8-430e-a27d-b56f79926987", "metadata": {}, "source": [ "## Setup" ] }, { "cell_type": "markdown", "id": "131969ca-ad7d-4c47-abcc-2f02306affa0", "metadata": {}, "source": [ "To get started, you need to install [FiftyOne](https://docs.voxel51.com/getting_started/install.html) and [Point-E](https://github.com/openai/point-e):" ] }, { "cell_type": "markdown", "id": "e577b17a-0621-4559-94fa-1a7158a0f25a", "metadata": {}, "source": [ "To install FiftyOne, you can use the Python package installer `pip`:" ] }, { "cell_type": "code", "execution_count": null, "id": "7edb3f94-49df-4a33-a75b-702921ede0ae", "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone" ] }, { "cell_type": "markdown", "id": "36aa3e50-d54d-450f-a802-897835f1677b", "metadata": {}, "source": [ "To install Point-E, you will need to clone the [Point-E github repo](https://github.com/openai/point-e):" ] }, { "cell_type": "code", "execution_count": null, "id": "b63056e9-bc96-40ad-bb1c-8f38eeeb6f21", "metadata": {}, "outputs": [], "source": [ "!git clone https://github.com/openai/point-e.git" ] }, { "cell_type": "markdown", "id": "cbcc293b-79ba-47d4-bd9a-fad30d97ef6f", "metadata": {}, "source": [ "And then `cd` into the `point-e` directory and install the package locally:" ] }, { "cell_type": "code", "execution_count": null, "id": "5e3cea77-2e86-43ea-91cb-1dfa21f806b1", "metadata": {}, "outputs": [], "source": [ "!pip install -e ." 
] }, { "cell_type": "markdown", "id": "da8ca6f5-8e66-4223-bd2f-2e93eaa755d8", "metadata": {}, "source": [ "You will also need to have [Open3D](http://www.open3d.org/) and [PyTorch](https://pytorch.org/) installed:" ] }, { "cell_type": "code", "execution_count": null, "id": "87e5a662-42f4-4b8b-9909-f36ee889cec3", "metadata": {}, "outputs": [], "source": [ "!pip install open3d torch" ] }, { "cell_type": "markdown", "id": "ab618411-bb9c-40d8-a403-6f818df6245c", "metadata": {}, "source": [ "Next, we'll import all of the relevant modules that we will be using in this walkthrough:" ] }, { "cell_type": "code", "execution_count": null, "id": "1aaca266-9f32-475b-9e30-a70385fdbf1f", "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import open3d as o3d\n", "import random\n", "import torch\n", "from tqdm.auto import tqdm\n", "import uuid" ] }, { "cell_type": "code", "execution_count": 4, "id": "385f0b11-e7e1-4b3c-92e8-6fa742abcdf5", "metadata": {}, "outputs": [], "source": [ "import fiftyone as fo\n", "import fiftyone.brain as fob\n", "import fiftyone.zoo as foz\n", "import fiftyone.utils.utils3d as fou3d\n", "from fiftyone import ViewField as F" ] }, { "cell_type": "code", "execution_count": 18, "id": "7a05695d-7fc5-4d13-8814-5e040a11d5f5", "metadata": {}, "outputs": [], "source": [ "from point_e.diffusion.configs import DIFFUSION_CONFIGS, diffusion_from_config\n", "from point_e.diffusion.sampler import PointCloudSampler\n", "from point_e.models.download import load_checkpoint\n", "from point_e.models.configs import MODEL_CONFIGS, model_from_config\n", "from point_e.util.plotting import plot_point_cloud" ] }, { "cell_type": "markdown", "id": "e8d483a0-3b5c-42fc-b79d-bcd6ee8012e6", "metadata": {}, "source": [ "We will also set our device:" ] }, { "cell_type": "code", "execution_count": 5, "id": "9b11196d-d6f9-40c4-97bd-809ccbe238dc", "metadata": {}, "outputs": [], "source": [ "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')" ] }, { "cell_type": "markdown", "id": "720afca5-ba5f-42f4-b66e-6ad2ad5bfedb", "metadata": {}, "source": [ "## Generating a point cloud from text" ] }, { "cell_type": "markdown", "id": "572d089a-5365-496d-ab6a-96ee8de68927", "metadata": {}, "source": [ "Following the OpenAI's [text2pointcloud](https://github.com/openai/point-e/tree/main/point_e/examples) example notebook, we will show how to generate a $3D$ point cloud with an input text prompt." ] }, { "cell_type": "markdown", "id": "147795ee-8b94-4d01-b1da-ffd454985b6a", "metadata": {}, "source": [ "For this walkthrough, we will use OpenAI's `base40M-textvec` model, which is a model with $40M$ parameters which takes a text prompt as input and generates an embedding vector." ] }, { "cell_type": "code", "execution_count": 14, "id": "e3087681-e34c-4a4b-94be-fe9ca76205c7", "metadata": {}, "outputs": [], "source": [ "base_name = 'base40M-textvec'\n", "base_model = model_from_config(MODEL_CONFIGS[base_name], device)\n", "base_model.eval();\n", "base_model.load_state_dict(load_checkpoint(base_name, device));\n", "base_diffusion = diffusion_from_config(DIFFUSION_CONFIGS[base_name])" ] }, { "cell_type": "markdown", "id": "d0447331-0826-46e2-967c-f40124f9a302", "metadata": {}, "source": [ "Applied on its own, this model will generate a point cloud with $1024$ points. 
We will also use an upsampler to generate from this a point cloud with $4096$ points:" ] }, { "cell_type": "code", "execution_count": 15, "id": "50b23050-5f3f-4eb1-b41b-a8aae47900c8", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 15, "metadata": {}, "output_type": "execute_result" } ], "source": [ "upsampler_model = model_from_config(MODEL_CONFIGS['upsample'], device)\n", "upsampler_model.eval()\n", "upsampler_diffusion = diffusion_from_config(DIFFUSION_CONFIGS['upsample'])\n", "upsampler_model.load_state_dict(load_checkpoint('upsample', device))" ] }, { "cell_type": "markdown", "id": "38578935-7ad2-4b9d-8a24-f6cb882ccca4", "metadata": {}, "source": [ "The base diffusion model and upsampling diffusion model are joined together in a `PointCloudSampler` object, which will take in a text prompt, and output a point cloud with $4096$ points:" ] }, { "cell_type": "code", "execution_count": 16, "id": "1c9a35ae-74cb-4dba-978a-90394ff97a65", "metadata": {}, "outputs": [], "source": [ "sampler = PointCloudSampler(\n", " device=device,\n", " models=[base_model, upsampler_model],\n", " diffusions=[base_diffusion, upsampler_diffusion],\n", " num_points=[1024, 4096 - 1024],\n", " aux_channels=['R', 'G', 'B'],\n", " guidance_scale=[3.0, 0.0],\n", " model_kwargs_key_filter=('texts', ''), # Do not condition the upsampler at all\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "bc664437-5766-4bf3-9aec-3dd5efbfbe78", "metadata": {}, "source": [ "Let's see this point cloud diffusion model in action, with the text prompt 'red and silver headphones':" ] }, { "cell_type": "code", "execution_count": null, "id": "40616de0-93fd-4bd5-a9c3-1bbc393c2bd2", "metadata": {}, "outputs": [], "source": [ "# Set a prompt to condition on.\n", "prompt = 'red and silver headphones'\n", "\n", "# Produce a sample from the model.\n", "samples = None\n", "for x in tqdm(sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt]))):\n", " samples = x" ] }, { "cell_type": "markdown", "id": "478eb7d5-1771-498b-be21-ccf0c0a30297", "metadata": {}, "source": [ "We can visualize this with Point-E's native visualizer: " ] }, { "cell_type": "code", "execution_count": null, "id": "f6112fee-42ca-47e4-8da5-7e9b53aea131", "metadata": {}, "outputs": [], "source": [ "pc = sampler.output_to_point_clouds(samples)[0]\n", "fig = plot_point_cloud(pc, grid_size=3, fixed_bounds=((-0.75, -0.75, -0.75),(0.75, 0.75, 0.75)))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6f7835ea", "metadata": {}, "source": [ "\n", "![pointe-headphones](images/pointe_headphones.png)" ] }, { "cell_type": "markdown", "id": "84e29970-35d3-48c9-b241-20d3d9205e49", "metadata": {}, "source": [ "Point-E's diffusion model is probabilistic, so if we were to run the model again, we would get a different result." ] }, { "cell_type": "markdown", "id": "8736e074-25d6-4f17-83f3-b0f1005c02aa", "metadata": {}, "source": [ "## Loading a Point-E point cloud into FiftyOne" ] }, { "cell_type": "markdown", "id": "335b5ec1-a249-4e0d-aed1-646bd0ed0f25", "metadata": {}, "source": [ "It is nice that Point-E provides its own native visualizer, but these two-dimensional projections are inherently limited. We can far more thoroughly and interactively visualize point clouds in FiftyOne's [$3D$ visualizer](https://docs.voxel51.com/user_guide/groups.html#using-the-3d-visualizer). 
Let's see how to load a Point-E point cloud into FiftyOne:" ] }, { "cell_type": "markdown", "id": "cc8c11ea-b214-47d0-ae77-f793fc39f813", "metadata": {}, "source": [ "In order to load the point cloud into FiftyOne, we will convert the point cloud from Point-E's native format into a more standard Open3D format, and create a sample in FiftyOne. First, let's see what the data structure for Point-E point clouds look like:" ] }, { "cell_type": "code", "execution_count": null, "id": "f0e66dff-df7f-4374-9d27-558b44e2cb4a", "metadata": {}, "outputs": [], "source": [ "print(pc)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "08655b41", "metadata": {}, "source": [ "\n", " PointCloud(coords=array([[ 0.00992463, 0.18218482, -0.40539104],\n", " [ 0.0122245 , 0.21034709, -0.32078362],\n", " [ 0.10288931, 0.4029989 , -0.40072548],\n", " ...,\n", " [-0.06964707, -0.33723998, -0.48611435],\n", " [ 0.00664746, 0.3134488 , -0.4915944 ],\n", " [ 0.1077411 , 0.3176389 , -0.423187 ]], dtype=float32), channels={'R': array([0.03921569, 0.04313726, 0.9450981 , ..., 0.9490197 , 0.9490197 ,\n", " 0.9490197 ], dtype=float32), 'G': array([0.04313726, 0.04705883, 0.05490196, ..., 0.04705883, 0.05490196,\n", " 0.03529412], dtype=float32), 'B': array([0.04313726, 0.04705883, 0.0509804 , ..., 0.04705883, 0.0509804 ,\n", " 0.03137255], dtype=float32)})" ] }, { "cell_type": "markdown", "id": "bd24fe2a-22d2-4d0d-ac4d-e50f831e5eef", "metadata": {}, "source": [ "Position coordinates are represented by a $(4096, 3)$ array in the `coords` attribute:" ] }, { "cell_type": "code", "execution_count": 27, "id": "2b3717d1-8898-40f0-9ea6-9a4c8e5c6b59", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(4096, 3)\n" ] } ], "source": [ "print(pc.coords.shape)" ] }, { "cell_type": "markdown", "id": "0bed1f4f-43dd-4b6d-8f42-2ad336cf2fda", "metadata": {}, "source": [ "And point colors are stored in a dict object within `channels`:" ] }, { "cell_type": "code", "execution_count": 31, "id": "49365be3-9fb7-4257-931d-c7845cd9bdd9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "dict_keys(['R', 'G', 'B'])\n", "4096\n" ] } ], "source": [ "print(pc.channels.keys())\n", "print(len(pc.channels['R']))" ] }, { "cell_type": "markdown", "id": "7332fd09-b46f-4613-a3ce-34667d46bf74", "metadata": {}, "source": [ "We can write a simple function that will take in a text prompt, generate the Point-E point cloud, and convert this into a standard Open3D point cloud (`open3d.geometry.PointCloud`) object:" ] }, { "cell_type": "code", "execution_count": 32, "id": "78087933-05d7-4da7-b19b-720e52de79a2", "metadata": {}, "outputs": [], "source": [ "def generate_pcd_from_text(prompt):\n", " samples = None\n", " for x in sampler.sample_batch_progressive(batch_size=1, model_kwargs=dict(texts=[prompt])):\n", " samples = x\n", " pointe_pcd = sampler.output_to_point_clouds(samples)[0]\n", "\n", " channels = pointe_pcd.channels\n", " r, g, b = channels[\"R\"], channels[\"G\"], channels[\"B\"]\n", " colors = np.vstack((r, g, b)).T\n", " points = pointe_pcd.coords\n", "\n", " pcd = o3d.geometry.PointCloud()\n", " pcd.points = o3d.utility.Vector3dVector(points)\n", " pcd.colors = o3d.utility.Vector3dVector(colors)\n", " return pcd" ] }, { "cell_type": "markdown", "id": "bcbeb50e-0f61-453e-aba0-85236afe4bba", "metadata": {}, "source": [ "To load this Open3D point cloud into FiftyOne, we can use Open3D's `open3d.io` module to write the point cloud to a `.pcd` file, and then create a FiftyOne Sample 
object associated with this file:" ] }, { "cell_type": "code", "execution_count": null, "id": "200a990a-cd9c-4675-875b-ef85f00a34d1", "metadata": {}, "outputs": [], "source": [ "headphone_pcd = generate_pcd_from_text('red and silver headphones')\n", "headphone_file = \"headphone.pcd\"\n", "o3d.io.write_point_cloud(headphone_file, headphone_pcd)" ] }, { "cell_type": "code", "execution_count": null, "id": "b536c820-0044-4151-9b66-dbc918abd3da", "metadata": {}, "outputs": [], "source": [ "headphone_dataset = fo.Dataset(name = \"headphone_dataset\")\n", "headphone_dataset.add_sample(\n", " fo.Sample(filepath=headphone_file)\n", ")\n", "session = fo.launch_app(headphone_dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "731cc0c4-2d73-41bc-a0c8-ac98a7bc0137", "metadata": {}, "source": [ "![pointe-headphones-fo](images/pointe_headphones_fo.gif)" ] }, { "cell_type": "markdown", "id": "d2b1f830-5fcf-4070-ab3a-cec57f548a13", "metadata": {}, "source": [ "## Curating synthetic $3D$ point cloud assets" ] }, { "cell_type": "markdown", "id": "60258c1f-4835-4564-a91d-ffcd28705ae3", "metadata": {}, "source": [ "Now that we have a workflow for generating $3D$ point cloud samples in FiftyOne with Point-E, we can generate an entire dataset of synthetic $3D$ point clouds. " ] }, { "cell_type": "markdown", "id": "3aed4dde-af30-49c8-af81-f0a63c94e9f0", "metadata": {}, "source": [ "In this walkthrough, we will generate a variety of vehicles. In particular, we will generate point clouds for vehicles of type `car`, `bike`, `bus`, and `motorcycle`. We will specify what vehicle we want Point-E to generate in our text prompt. Due to the probabilistic nature of diffusion models like Point-E, merely running a simple prompt like \"a bicycle\" multiple times will generate distinct point cloud models. \n", "\n", "To add even more variety, we will instruct Point-E to paint each of the vehicles two randomly chosen colors with a prompt of the form: `\"a $(COLOR1) $(VEHICLE_TYPE) with $(COLOR2) wheels\"`. This is just an illustrative example." 
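] }, { "cell_type": "markdown", "id": "f3d1c7a2-4b8e-4f0a-9c65-1a2b3c4d5e6f", "metadata": {}, "source": [ "To make the prompt template concrete, here is a minimal sketch that simply prints a few example prompts; the `example_types` and `example_colors` lists below are just stand-ins for the `VEHICLE_TYPES` and `VEHICLE_COLORS` constants defined in the next cell:" ] }, { "cell_type": "code", "execution_count": null, "id": "a7b8c9d0-1e2f-4a3b-8c4d-5e6f7a8b9c0d", "metadata": {}, "outputs": [], "source": [ "## Illustrative only: preview a few prompts of the form\n", "## \"a $(COLOR1) $(VEHICLE_TYPE) with $(COLOR2) wheels\"\n", "example_types = [\"car\", \"bus\", \"bike\", \"motorcycle\"]\n", "example_colors = [\"red\", \"blue\", \"green\", \"yellow\", \"white\"]\n", "\n", "for _ in range(3):\n", "    vehicle_type = random.choice(example_types)\n", "    cols = random.choices(example_colors, k=2)\n", "    print(f\"a {cols[0]} {vehicle_type} with {cols[1]} wheels\")"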
] }, { "cell_type": "code", "execution_count": 1, "id": "991b2097-0fa9-4a3f-9ea9-a0f1ce9e6c79", "metadata": {}, "outputs": [], "source": [ "VEHICLE_TYPES = [\"car\", \"bus\", \"bike\", \"motorcycle\"]\n", "VEHICLE_COLORS = [\"red\", \"blue\", \"green\", \"yellow\", \"white\"]" ] }, { "cell_type": "markdown", "id": "ae2fdc98-709c-4a0c-9dd3-dab8da9b6fbf", "metadata": {}, "source": [ "In this example, we will generate random filenames for each of the point cloud models, but you can specify filenames however you'd like:" ] }, { "cell_type": "code", "execution_count": 2, "id": "e3750f61-fd09-4176-b166-0f5b85f74fec", "metadata": {}, "outputs": [], "source": [ "def generate_filename():\n", " rand_str = str(uuid.uuid1()).split('-')[0]\n", " return \"pointe_vehicles/\" + rand_str + \".pcd\"" ] }, { "cell_type": "code", "execution_count": 3, "id": "b50e06d1-f272-4080-8d55-531bb4d3ab6e", "metadata": {}, "outputs": [], "source": [ "def generate_pointe_vehicle_dataset(\n", " dataset_name = \"point-e-vehicles\",\n", " num_samples = 100\n", "):\n", " samples = []\n", " for i in tqdm(range(num_samples)):\n", " vehicle_type = random.choice(VEHICLE_TYPES)\n", " cols = random.choices(VEHICLE_COLORS, k=2)\n", " prompt = f\"a {cols[0]} {vehicle_type} with {cols[1]} wheels\"\n", " pcd = generate_pcd_from_text(prompt)\n", " ofile = generate_filename()\n", " o3d.io.write_point_cloud(ofile, pcd)\n", " \n", " sample = fo.Sample(\n", " filepath = ofile,\n", " vehicle_type = fo.Classification(label = vehicle_type)\n", " )\n", " samples.append(sample)\n", " \n", " dataset = fo.Dataset(dataset_name)\n", " dataset.add_samples(samples)\n", " return dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "831f6f5d-098a-4a7f-b684-b827da0cef7f", "metadata": {}, "outputs": [], "source": [ "vehicle_dataset = generate_pointe_vehicle_dataset(\n", " dataset_name = \"point-e-vehicles\",\n", " num_samples = 100\n", ")" ] }, { "cell_type": "markdown", "id": "ddf1eb15-28de-4384-bc87-44ddc7879792", "metadata": {}, "source": [ "We will then make the dataset persistent so that it saves to database and we can load it at a later time." ] }, { "cell_type": "code", "execution_count": null, "id": "43d62fb2-c62a-4dec-97c4-8719f704378a", "metadata": {}, "outputs": [], "source": [ "vehicle_dataset.persistent = True" ] }, { "cell_type": "markdown", "id": "5df62b79-99c5-437c-8bc2-feb5dc66f3df", "metadata": {}, "source": [ "Before viewing this in the FiftyOne App, we can use FiftyOne's $3D$ utils to generate a two dimensional image for each point cloud, which will allow us to preview our samples in the sample grid. To do this, we will project the point clouds onto a $2D$ plane with the `fou3d.compute_orthographic_projection_images()` method. We will pass in a vector for the `projection_normal` argument to specify the plane about which to perform the orthographic projection." 
] }, { "cell_type": "code", "execution_count": null, "id": "80d67f4d-6931-4922-b270-a3f74f3cba27", "metadata": {}, "outputs": [], "source": [ "size = (-1, 608) \n", "## height of images should be 608 pixels\n", "## - with aspect ratio preserved\n", "\n", "fou3d.compute_orthographic_projection_images(\n", " vehicle_dataset,\n", " size,\n", " \"/vehicle_side_view_images\",\n", " shading_mode=\"height\",\n", " projection_normal = (0, -1, 0)\n", ")" ] }, { "cell_type": "markdown", "id": "4598530f-015e-4b29-aaea-e190aba735a2", "metadata": {}, "source": [ "Now we are ready to look at our $3D$ point cloud models of vehicles:" ] }, { "cell_type": "code", "execution_count": null, "id": "6dcf3400-8c26-4b14-8482-804e19acf06d", "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(vehicle_dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "6e5a11f7-5ea4-48f9-9a33-c609b983f61b", "metadata": {}, "source": [ "![pointe-vehicle-projections](images/pointe_vehicles_projections.gif)" ] }, { "cell_type": "markdown", "id": "3b657251-7ce8-4d74-a9c2-91cb61f7a2e3", "metadata": {}, "source": [ "Taking a look at these point cloud models (or their orthographic projections), we can see that the vehicles are facing a variety of directions. For the sake of consistency (this will come in handy in the next section), let's use FiftyOne's in-app tagging capabilities to tag each of the point cloud samples with an orientation $\\in[$ `left`, `right`, `front`, `back`$]$. Once we tag a set of samples, we can omit samples with this tag from view in the app, so the we are only looking at untagged samples. " ] }, { "attachments": {}, "cell_type": "markdown", "id": "0d95c1aa-ceae-479f-85e8-06ead9faff80", "metadata": {}, "source": [ "![pointe-tag-orientations](images/pointe_tag_orientations.gif)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "54d42e19-d7c5-44ee-aa0c-1ed4a84d00b3", "metadata": {}, "source": [ "Additionally, as we are curating a high quality dataset of vehicle point cloud assets, we can identify models that do not fit our needs. This may include point cloud models that are not good enough representations of vehicles, models that are out of our desired distribution, or models we feel are too similar to other models in our dataset. We will tag these samples as \"bad\"." 
] }, { "attachments": {}, "cell_type": "markdown", "id": "bf6e15ba-829a-4c89-aacf-beaf17000cbd", "metadata": {}, "source": [ "![pointe-tag-bad](images/pointe_tag_bad.gif)" ] }, { "cell_type": "markdown", "id": "6737a7f4-b49a-4e4b-9ac3-3a3cd752cc77", "metadata": {}, "source": [ "Once we have tagged all of our samples, we can convert the orientation tags into a new `orientation` field on our samples:" ] }, { "cell_type": "code", "execution_count": null, "id": "c70e71bf-a5a6-4f2c-ab60-42cd5d1f5d06", "metadata": {}, "outputs": [], "source": [ "orientations = [\"left\", \"right\", \"front\", \"back\"]\n", "\n", "vehicle_dataset.add_sample_field(\"orientation\", fo.StringField)\n", "\n", "for orientation in orientations:\n", " view = vehicle_dataset.match_tags(orientation)\n", " view.set_values(\"orientation\", [orientation]*len(view))" ] }, { "cell_type": "markdown", "id": "9d1a88cf-33d6-4b59-b05d-3626ad807323", "metadata": {}, "source": [ "Additionally, we can pick out our desired subset of vehicle assets by matching for samples without the `bad` tag:" ] }, { "cell_type": "code", "execution_count": null, "id": "1972dcda-9510-4a0f-8911-f403a46780cf", "metadata": {}, "outputs": [], "source": [ "usable_vehicles_dataset = vehicle_dataset.match(~F(\"tags\").contains(\"bad\"))" ] }, { "cell_type": "markdown", "id": "2fc3fa51-5f87-400b-b894-ceca93c57ebc", "metadata": {}, "source": [ "## Constructing a self-driving dataset" ] }, { "cell_type": "markdown", "id": "8af33650-cd17-4d20-911a-70b7983d2623", "metadata": {}, "source": [ "Now that we have a dataset of usable $3D$ point cloud models for vehicles, we can use these to construct a $3D$ point cloud dataset of road scenes, for use in self-driving applications. In this section, we'll show you how to get started building your own simple road scenes dataset." ] }, { "cell_type": "markdown", "id": "73be25f9-04d8-466c-8614-29095c049697", "metadata": {}, "source": [ "### Constructing a road point cloud" ] }, { "cell_type": "markdown", "id": "6eec5592-8ae9-4521-a344-55e4a60e7728", "metadata": {}, "source": [ "In this walkthough, we will limit our scope to basic roads with one lane of traffic going in each direction, separated by a dashed yellow line. We will use the same road as the foundation for each scene. Let's construct a simple point cloud model for the road:" ] }, { "cell_type": "code", "execution_count": null, "id": "44db524a-1cf0-444a-93f1-9b891daeccaa", "metadata": {}, "outputs": [], "source": [ "ROAD_LENGTH = 40\n", "ROAD_WIDTH = 6\n", "FRONT = -ROAD_LENGTH/2.\n", "BACK = ROAD_LENGTH/2.\n", "LANE_WIDTH = ROAD_WIDTH/2.\n", "LINE_POINTS = 1000" ] }, { "cell_type": "markdown", "id": "8f15245d-2b5c-413d-a6e4-c123540f3478", "metadata": {}, "source": [ "We will build the point cloud model for the road out of two white lines, for the left and right edges of the scene, and a dashed yellow line as a divider. 
We will save the point cloud model in the file `road.pcd`:" ] }, { "cell_type": "code", "execution_count": null, "id": "87420cdb-bf76-4412-b7a9-de5ac9d3df25", "metadata": {}, "outputs": [], "source": [ "def generate_road_pcd():\n", " ## LEFT LINE\n", " road_line_left_points = np.vstack(\n", " (np.zeros(LINE_POINTS) - ROAD_WIDTH/2., \n", " np.linspace(FRONT, BACK, LINE_POINTS),\n", " np.zeros(LINE_POINTS))\n", " ).T\n", " \n", " ## RIGHT LINE\n", " road_line_right_points = np.copy(-road_line_left_points)\n", " \n", " ## CENTER LINE\n", " road_line_center_points_y = np.linspace(\n", " FRONT, \n", " BACK, \n", " LINE_POINTS\n", " )\n", " road_line_center_points_y = road_line_center_points_y[\n", " np.where(np.mod(road_line_center_points_y, 1) > 0.3)[0]\n", " ]\n", " num_center_points = len(road_line_center_points_y)\n", " road_line_center_points = np.vstack(\n", " (np.zeros(num_center_points), \n", " road_line_center_points_y,\n", " np.zeros(num_center_points))\n", " ).T\n", " \n", " ## CONCATENATE\n", " road_pcd_points = np.concatenate(\n", " (road_line_center_points,\n", " road_line_left_points,\n", " road_line_right_points,\n", " )\n", " )\n", " \n", " ## COLOR\n", " ## white\n", " road_pcd_colors = 255 * np.ones(road_pcd_points.shape)\n", " ## yellow\n", " road_pcd_colors[:num_center_points, 2] = 0\n", " \n", " road_pcd = o3d.geometry.PointCloud()\n", " road_pcd.points = o3d.utility.Vector3dVector(road_pcd_points)\n", " road_pcd.colors = o3d.utility.Vector3dVector(road_pcd_colors)\n", " o3d.io.write_point_cloud(\"road.pcd\", road_pcd)" ] }, { "cell_type": "code", "execution_count": null, "id": "6c03559f-e44b-4243-8de4-2e92c9b118d7", "metadata": {}, "outputs": [], "source": [ "generate_road_pcd()" ] }, { "cell_type": "markdown", "id": "a6b6e91c-f74e-4a14-9fb0-3bc4b374880f", "metadata": {}, "source": [ "### Placing a vehicle on the road" ] }, { "cell_type": "markdown", "id": "79fbf78c-e162-4410-9234-535ace2240b8", "metadata": {}, "source": [ "Next, we piece together a workflow for placing individual vehicles on the road in a scene. This workflow needs to take into account a few factors:\n", "\n", "* Point-E point clouds are not always centered at the origin\n", "* The Point-E point clouds were all generated to fill the same volume, so a `bus` point cloud model will be the same size as a `bicycle` point cloud model\n", "* The vehicle models were generated with variable orientations, which we have categorized, but need to be taken into account in our road scenes\n", "* Vehicles must face in the direction that traffic is flowing, which depends on the side of the road on which we place the vehicle" ] }, { "cell_type": "markdown", "id": "70c66e5b-39af-49c4-b508-18f09672c7e7", "metadata": {}, "source": [ "With these considerations in mind, we compose the following steps:\n", "\n", "1. Center the vehicle at the origin\n", "2. Scale the vehicle according to its vehicle type\n", "3. Pick a side of the road for the vehicle\n", "4. Orient the vehicle, given its initial orientation and selected side of the road\n", "5. 
Position the vehicle on the road\n", "\n", "In the cells below, we implement minimal functions for these steps, leveraging Open3D's comprehensive point cloud functionality:" ] }, { "cell_type": "code", "execution_count": 17, "id": "458bf99e-9dbe-421b-ab5e-da989cfc824b", "metadata": {}, "outputs": [], "source": [ "ORIGIN = (0.0, 0.0, 0.0)\n", "\n", "def center_vehicle(vehicle_pcd):\n", " vehicle_pcd.translate(ORIGIN, relative=False)" ] }, { "cell_type": "code", "execution_count": null, "id": "a42d2550-950b-4bd8-ac2f-ce2886196056", "metadata": {}, "outputs": [], "source": [ "VEHICLE_SCALE_MAP = {\n", " \"car\": 4.0,\n", " \"bus\": 6.0,\n", " \"motorcycle\": 2.5,\n", " \"bike\": 1.5\n", "}\n", "\n", "def scale_vehicle(vehicle_pcd, vehicle_type):\n", " vehicle_scale = VEHICLE_SCALE_MAP[vehicle_type]\n", " vehicle_pcd.scale(vehicle_scale, ORIGIN)" ] }, { "cell_type": "code", "execution_count": null, "id": "ea005162-83f5-4e70-af85-7812796aa722", "metadata": {}, "outputs": [], "source": [ "def choose_side_of_road():\n", " return random.choice([\"left\", \"right\"])" ] }, { "cell_type": "code", "execution_count": null, "id": "db98408c-a51f-4166-97ca-e3a7238ad517", "metadata": {}, "outputs": [], "source": [ "ORIENTATION_TO_ROTATION = {\n", " \"back\": 0.,\n", " \"right\": np.pi/2.,\n", " \"front\": np.pi,\n", " \"left\": 3 * np.pi/2.,\n", "}\n", "\n", "def orient_vehicle(vehicle_pcd, side_of_road, initial_orientation):\n", " rot_xyz = [0., 0., 0.]\n", " rot_xyz[2] += ORIENTATION_TO_ROTATION[initial_orientation]\n", " \n", " if side_of_road == \"left\":\n", " rot_xyz[2] += np.pi\n", " R = vehicle_pcd.get_rotation_matrix_from_xyz(rot_xyz)\n", " vehicle_pcd.rotate(R)" ] }, { "cell_type": "code", "execution_count": null, "id": "148ae477-2a53-4259-afa0-ffaa9c61f35d", "metadata": {}, "outputs": [], "source": [ "## randomly position the vehicle in its lane\n", "def position_vehicle(vehicle_pcd, side_of_road):\n", " ## raise vehicle so it is ON the road\n", " minz = np.amin(np.array(vehicle_pcd.points), axis = 0)[-1]\n", " \n", " xpos = np.random.normal(loc = LANE_WIDTH/2., scale = 0.4)\n", " if side_of_road == \"left\":\n", " xpos *= -1\n", " \n", " ypos = np.random.uniform(low = FRONT, high=BACK)\n", " \n", " translation = [xpos, ypos, -minz]\n", " vehicle_pcd.translate(translation)" ] }, { "cell_type": "markdown", "id": "5586c79c-8d03-484e-bb8d-0fb51c12429e", "metadata": {}, "source": [ "We can then wrap all of this up in a function which takes in a sample from the usable subset of the FiftyOne point cloud vehicle asset dataset, and returns a tuple containing the transformed point cloud for the vehicle, and a label for its vehicle type:" ] }, { "cell_type": "code", "execution_count": null, "id": "28aa99ba-fa6e-44a9-8f7c-e8e97d6968c1", "metadata": {}, "outputs": [], "source": [ "def generate_scene_vehicle(sample):\n", " vehicle_type = sample.vehicle_type.label\n", " \n", " initial_orientation = sample.orientation\n", " side_of_road = choose_side_of_road()\n", " \n", " vehicle_pcd = o3d.io.read_point_cloud(sample.filepath)\n", " center_vehicle(vehicle_pcd)\n", " scale_vehicle(vehicle_pcd, vehicle_type)\n", " orient_vehicle(vehicle_pcd, side_of_road, initial_orientation)\n", " position_vehicle(vehicle_pcd, side_of_road)\n", " return (vehicle_pcd, vehicle_type)" ] }, { "cell_type": "markdown", "id": "72746178-d5c2-4407-b257-e2e8fb4a206c", "metadata": {}, "source": [ "We can generate a \"scene\" with this vehicle placed on the road by adding this point cloud to the point cloud for the road:" ] }, { 
"cell_type": "code", "execution_count": null, "id": "b49a2da3-d18b-4f92-93cf-f24c0f1ae1c6", "metadata": {}, "outputs": [], "source": [ "sample = usable_vehicles_dataset.take(1).first()\n", "vehicle_pcd, vehicle_type = generate_scene_vehicle(sample)\n", "road_pcd = o3d.io.read_point_cloud(\"road.pcd\")\n", "vehicle_on_road_pcd = road_pcd + vehicle_pcd" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fb075ba6-42d9-4838-bded-1cef5c11846b", "metadata": {}, "source": [ "![pointe-single-vehicle-scene](images/pointe_single_vehicle_scene.png)" ] }, { "cell_type": "markdown", "id": "72911de0-692c-4356-babb-6f0060e3fa2b", "metadata": {}, "source": [ "### Constructing road scenes" ] }, { "cell_type": "markdown", "id": "b91c2c94-3020-4c1f-ace7-3389ac83ecd2", "metadata": {}, "source": [ "Now that we have a workflow for placing a single vehicle on the road, we can construct \"scenes\" including multiple vehicles on the road. " ] }, { "cell_type": "markdown", "id": "5fd3db12-b739-4d1d-b6d3-4f6d07c25c18", "metadata": {}, "source": [ "To do this, we need to ensure that none of the vehicles in our scene overlap with each other. We can implement this logic by using Open3D's `compute_point_cloud_distance()` method to compute the distance between the point cloud models for vehicles that are already in a given scene, and the point cloud model for a prospective vehicle that will potentially be placed in the scene. If the minimum distance between any of the existing scene vehicle point clouds and the candidate vehicle point cloud is below some threshold, then we retry randomly placing a vehicle in the scene." ] }, { "cell_type": "markdown", "id": "47cb3016-590f-4cbd-b2e8-dc923e0cd2c4", "metadata": {}, "source": [ "We will wrap this logic in a new `check_compatibility()` function:" ] }, { "cell_type": "code", "execution_count": null, "id": "ea0fad2e-8558-4754-aee6-594bd8512066", "metadata": {}, "outputs": [], "source": [ "def check_compatibility(\n", " vehicle,\n", " scene_vehicles,\n", " thresh = 0.2\n", "):\n", " for sv in scene_vehicles:\n", " dists = vehicle[0].compute_point_cloud_distance(sv[0])\n", " if np.amin(np.array(dists)) < thresh:\n", " return False\n", " return True" ] }, { "cell_type": "markdown", "id": "4877b23e-4b16-407b-8606-a3dd64d4af0c", "metadata": {}, "source": [ "The last ingredient is a simple function to randomly select a single sample from our usable vehicle assets dataset:" ] }, { "cell_type": "code", "execution_count": null, "id": "c254391b-012c-48dc-89a7-846336ec843b", "metadata": {}, "outputs": [], "source": [ "def choose_vehicle_sample():\n", " return usable_vehicles_dataset.take(1).first()" ] }, { "cell_type": "markdown", "id": "64c1b97e-b3ab-4bb3-9f97-dc56982f824f", "metadata": {}, "source": [ "Finally, we are ready to generate road scenes! 
The following function generates a FiftyOne point cloud sample for a road scene with `num_vehicles` vehicles, and stores the point cloud for the scene in `scene_filepath`:" ] }, { "cell_type": "code", "execution_count": null, "id": "188cdc6b-e2a5-43fd-b39e-9ce71d54275e", "metadata": {}, "outputs": [], "source": [ "def generate_scene_sample(num_vehicles, scene_filepath):\n", " ZERO_ROT = [0., 0., 0.]\n", " \n", " sample = choose_vehicle_sample()\n", " scene_vehicles = [generate_scene_vehicle(sample)]\n", " \n", " k = 1\n", " while k < num_vehicles:\n", " sample = choose_vehicle_sample()\n", " candidate_vehicle = generate_scene_vehicle(sample)\n", " if check_compatibility(\n", " candidate_vehicle,\n", " scene_vehicles\n", " ):\n", " scene_vehicles.append(candidate_vehicle)\n", " k += 1\n", " \n", " detections = []\n", " scene_pcd = o3d.io.read_point_cloud(\"road.pcd\")\n", " for vehicle in scene_vehicles:\n", " vehicle_pcd, vehicle_type = vehicle\n", " scene_pcd = scene_pcd + vehicle_pcd\n", " obb = vehicle_pcd.get_oriented_bounding_box()\n", " dim, loc = obb.extent, obb.center\n", " dim = [dim[2], dim[0], dim[1]]\n", " detection = fo.Detection(\n", " label = vehicle_type,\n", " location = list(loc),\n", " dimensions = list(dim),\n", " rotation = list(ZERO_ROT)\n", " )\n", " detections.append(detection)\n", " \n", " o3d.io.write_point_cloud(scene_filepath, scene_pcd)\n", " sample = fo.Sample(\n", " filepath = scene_filepath,\n", " ground_truth = fo.Detections(detections=detections)\n", " )\n", " \n", " return sample" ] }, { "cell_type": "markdown", "id": "e05dc19e-1250-48e3-b155-fab82a242a8e", "metadata": {}, "source": [ "This `generate_scene_sample()` function not only generates a complete point cloud scene - it also generates labeled $3D$ object detection bounding boxes for each vehicle using Open3D's `get_oriented_bounding_box()` method. You can then use these as ground truth labels to train your own $3D$ road scenes object detection model." ] }, { "cell_type": "markdown", "id": "0debf8c3-cd4c-457f-88c8-2cb660b258cf", "metadata": {}, "source": [ "All that is left to do is populate a new FiftyOne dataset with these generated road scenes." 
] }, { "cell_type": "markdown", "id": "39dbff13-4bbc-4895-a4fe-80884198e174", "metadata": {}, "source": [ "To generate this dataset, we will randomly select a number of vehicles for each scene, from within a set range:" ] }, { "cell_type": "code", "execution_count": null, "id": "0a13f1d9-331a-4d71-ae21-449a85db9939", "metadata": {}, "outputs": [], "source": [ "MIN_SCENE_VEHICLES = 1\n", "MAX_SCENE_VEHICLES = 7\n", "\n", "def choose_num_scene_vehicles():\n", " return random.randint(\n", " MIN_SCENE_VEHICLES, \n", " MAX_SCENE_VEHICLES\n", " )" ] }, { "cell_type": "markdown", "id": "aed5ee31-7b12-4227-a385-e5bce91999f0", "metadata": {}, "source": [ "And we will randomly generate filepaths for the scenes:" ] }, { "cell_type": "code", "execution_count": null, "id": "aefa84f2-e0da-4820-9dbd-5a763dd75b27", "metadata": {}, "outputs": [], "source": [ "def generate_scene_filepath():\n", " rand_str = str(uuid.uuid1()).split('-')[0]\n", " return \"pointe_road_scenes/\" + rand_str + \".pcd\"" ] }, { "cell_type": "markdown", "id": "94c5772e-670a-45e4-81c9-cc8b51e5a7fd", "metadata": {}, "source": [ "Putting it all together:" ] }, { "cell_type": "code", "execution_count": null, "id": "ebfd445c-a6ed-4c81-ae25-e0672c8179f5", "metadata": {}, "outputs": [], "source": [ "def generate_road_scenes_dataset(num_scenes):\n", " samples = []\n", " for i in range(num_scenes):\n", " num_scene_vehicles = choose_num_scene_vehicles()\n", " scene_filepath = generate_scene_filepath()\n", " \n", " sample = generate_scene_sample(\n", " num_scene_vehicles,\n", " scene_filepath\n", " )\n", " samples.append(sample)\n", " \n", " dataset = fo.Dataset(name = \"point-e-road-scenes\")\n", " dataset.add_samples(samples)\n", " dataset.persistent = True\n", " return dataset" ] }, { "cell_type": "code", "execution_count": null, "id": "f60e2744-45f9-4984-bca6-6dc221d762b9", "metadata": {}, "outputs": [], "source": [ "num_scenes = 100\n", "road_scene_dataset = generate_road_scenes_dataset(num_scenes)" ] }, { "cell_type": "markdown", "id": "d4992980-af68-479e-88d5-f4ff91a274fa", "metadata": {}, "source": [ "If you'd like, you can also generate bird's eye view projection images for these scenes, so you can preview scenes in the sample grid:" ] }, { "cell_type": "code", "execution_count": null, "id": "1ce83aeb-d4e6-40a7-a471-688b2a7846d5", "metadata": {}, "outputs": [], "source": [ "size = (-1, 608)\n", "\n", "fou3d.compute_orthographic_projection_images(\n", " road_scene_dataset,\n", " size,\n", " \"/road_scene_bev_images\",\n", " shading_mode=\"rgb\",\n", ")" ] }, { "cell_type": "code", "execution_count": null, "id": "685aec03-f09f-49d2-9344-cea8849260e1", "metadata": {}, "outputs": [], "source": [ "session = fo.launch_app(road_scene_dataset)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "d4cb1121", "metadata": {}, "source": [ "![pointe-road-scene](images/pointe_road_scenes_3d_visualizer.gif)" ] }, { "cell_type": "markdown", "id": "cefb8dc5-c09a-46a9-8455-07037f6f524a", "metadata": {}, "source": [ "## Conclusion" ] }, { "cell_type": "markdown", "id": "caa41bd3-710a-47ad-a5ec-177eaff8dc25", "metadata": {}, "source": [ "FiftyOne is a valuable tool that can help you to build high quality computer vision datasets. This is true whether you are working with images, videos, point clouds, or geo data. And this is true whether you are adapting existing datasets, or constructing your own datasets from scratch!" 
] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.13" } }, "nbformat": 4, "nbformat_minor": 5 }