{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Step 1: Loading a Self Driving Car Dataset into FiftyOne\n", "\n", "## Installation\n", "Before gettting started,it is highly suggested that you create a venv or conda environment to manage dependencies. This is a difficult one to nail right so here is the steps to be able to run the getting started experience" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!conda create -n driving_workshop python=3.10" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!conda activate driving_workshop" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install fiftyone transformers umap-learn \\\n", "torch torchvision nuscenes-devkit open3d numpy==1.26.4" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "!pip install matplotlib --upgrade" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exploring nuScenes in FiftyOne\n", "Due to the multi-sensor structure of self-driving car datasets, the dataset in FiftyOne will be a [Grouped Dataset](https://docs.voxel51.com/user_guide/groups.html?_gl=1*1uqzezz*_gcl_au*MTI3OTEyMjEwMy4xNzI2MTQ5NDk3) with some [Dynamic Group Views](https://docs.voxel51.com/user_guide/using_views.html?_gl=1*1s195kz*_gcl_au*MTI3OTEyMjEwMy4xNzI2MTQ5NDk3#view-groups) thrown in there as well. We will be using nuscenes for our examples.\n", "\n", "At a high level, we will group together our samples by their associated scene in nuScenes. At regular intervals of each keyframe or approximately every 0.5 seconds (2Hz), we incorporate data from every sensor type, including their respective detections. This amalgamation of data results in distinct groups, each representing the sensor perspective at a given keyframe. We do have each sensor input for every frame, but since only keyframes are annotated, we choose to only load those in.\n", "\n", "## Ingesting nuScenes\n", "To get started with nuScenes in FiftyOne, first we need to set up our environment for nuScenes. It will require downloading the dataset or a snippet of it as well as downloading the nuScenes python sdk. Full steps on installing can be found [here](https://www.nuscenes.org/nuscenes?tutorial=nuscenes). \n", "\n", "We want two files, `v1.0-mini.tgz` (the mini version of the full v1.0 dataset) and `nuScenes-lidarseg-mini-v1.0.tar.bz2` The tgz files sshould be dropped into the data/ folder and unpacked with:" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "vscode": { "languageId": "powershell" } }, "outputs": [], "source": [ "cd data\n", "tar zxf v1.0-mini.tgz\n", "tar xf nuScenes-lidarseg-mini-v1.0.tar.bz2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Once your nuScenes is downloaded, we can kick things off. Let’s start by initializing both nuScenes as well as our FiftyOne dataset. We define our dataset as well as add a group to initialize the dataset to expect grouped data." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nuscenes import NuScenes\n", "import fiftyone as fo\n", "\n", "# Make sure its absolute path!\n", "dataroot='/path/to/Self-Driving-Dataset/data/'\n", "nusc = NuScenes(version='v1.0-mini', dataroot=dataroot, verbose=True)\n", "\n", "dataset = fo.Dataset(\"nuscenes_cameras\",overwrite=True)\n", "dataset.add_group_field(\"group\", default=\"CAM_FRONT\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Sensor Data\n", "It is recommended that you make helper functions to load in each sensor modality. These functions should take in path or token that points to sensor data file and any associated labels or metadata. \n", "\n", "### Camera Data \n", "Camera data is a bit more straightforward than the lidar data as there is no need to do any prep on the image or save it as a different format. We can go right ahead and grab the detections to add them. The only tricky part is these are no ordinary bounding boxes, they are 3D bounding boxes given in global coordinates!\n", "\n", "Luckily for us, nuScenes provides some easy ways to convert their bounding boxes to our pixel space relative to what camera the image came from. For camera data, we load all of our boxes for our sample, check to see which ones are in the frame of our camera data, and then add the cuboids to the sample. In order to add a cuboid or a 3D bounding box, we use [polylines](https://docs.voxel51.com/user_guide/using_datasets.html?_gl=1*1vonm1t*_gcl_au*MTI3OTEyMjEwMy4xNzI2MTQ5NDk3#cuboids) and the [from_cuboid()](https://docs.voxel51.com/api/fiftyone.core.labels.html?_gl=1*z39nk3*_gcl_au*MTI3OTEyMjEwMy4xNzI2MTQ5NDk3#fiftyone.core.labels.Polyline.from_cuboid) method. Let’s take a look at how it is done:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nuscenes.utils.geometry_utils import box_in_image, view_points\n", "import numpy as np\n", "from nuscenes.utils.geometry_utils import BoxVisibility\n", "from nuscenes.scripts.export_poses import derive_latlon\n", "\n", "def camera_sample(group, filepath, sensor, token, scene):\n", " sample = fo.Sample(filepath=filepath, group=group.element(sensor))\n", " data_path, boxes, camera_intrinsic = nusc.get_sample_data(token, box_vis_level=BoxVisibility.NONE,)\n", " image = Image.open(data_path)\n", " width, height = image.size\n", " shape = (height,width)\n", " polylines = []\n", " log = nusc.get('log', scene[\"log_token\"])\n", " location = log[\"location\"]\n", " ego = nusc.get('ego_pose', data[\"ego_pose_token\"])\n", " ego_list = [ego]\n", "\n", " latlon = derive_latlon(location,ego_list)\n", " lat = latlon[0][\"latitude\"]\n", " lon = latlon[0][\"longitude\"]\n", " sample[\"location\"] = fo.GeoLocation(\n", " point = [lon,lat]\n", " )\n", " for box in boxes:\n", " if box_in_image(box,camera_intrinsic,shape,vis_level=BoxVisibility.ALL):\n", " c = np.array(nusc.colormap[box.name]) / 255.0\n", " corners = view_points(box.corners(), camera_intrinsic, normalize=True)[:2, :]\n", " front = [(corners[0][0]/width,corners[1][0]/height),\n", " (corners[0][1]/width,corners[1][1]/height),\n", " (corners[0][2]/width,corners[1][2]/height),\n", " (corners[0][3]/width,corners[1][3]/height),]\n", " back = [(corners[0][4]/width,corners[1][4]/height),\n", " (corners[0][5]/width,corners[1][5]/height),\n", " (corners[0][6]/width,corners[1][6]/height),\n", " (corners[0][7]/width,corners[1][7]/height),]\n", " 
"            polylines.append(fo.Polyline.from_cuboid(front + back, label=box.name))\n", "\n", "    sample[\"cuboids\"] = fo.Polylines(polylines=polylines)\n", "    return sample" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Adding Samples to the Dataset\n", "Next we need to add our camera samples to make our first dataset! Let's loop through all the scenes and add each camera angle, forming groups of 6 samples each." ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from PIL import Image\n", "from nuscenes.utils.geometry_utils import view_points, BoxVisibility, box_in_image\n", "import numpy as np\n", "import os\n", "\n", "# Define our sensor groups\n", "groups = [\"CAM_FRONT\", \"CAM_FRONT_RIGHT\", \"CAM_BACK_RIGHT\", \"CAM_BACK\",\n", "          \"CAM_BACK_LEFT\", \"CAM_FRONT_LEFT\",]\n", "\n", "samples = []\n", "\n", "# Iterate over each scene\n", "for scene in nusc.scene:\n", "    my_scene = scene\n", "    token = my_scene['first_sample_token']\n", "    my_sample = nusc.get('sample', token)\n", "    last_sample_token = my_scene['last_sample_token']\n", "\n", "    # Iterate over each keyframe sample in the scene\n", "    while not my_sample[\"next\"] == \"\":\n", "        scene_token = my_sample[\"scene_token\"]\n", "        group = fo.Group()\n", "\n", "        # Iterate over each sensor in the sample\n", "        for sensor in groups:\n", "            data = nusc.get('sample_data', my_sample['data'][sensor])\n", "            filepath = dataroot + data[\"filename\"]\n", "\n", "            # Check if the sensor is a camera\n", "            if data[\"sensor_modality\"] == \"camera\":\n", "                sample = camera_sample(group, filepath, sensor, my_sample['data'][sensor], scene)\n", "\n", "            # Add metadata to the sample\n", "            sample[\"token\"] = data[\"token\"]\n", "            sample[\"ego_pose_token\"] = data[\"ego_pose_token\"]\n", "            sample[\"calibrated_sensor_token\"] = data[\"calibrated_sensor_token\"]\n", "            sample[\"timestamp\"] = data[\"timestamp\"]\n", "            sample[\"is_key_frame\"] = data[\"is_key_frame\"]\n", "            sample[\"prev\"] = data[\"prev\"]\n", "            sample[\"next\"] = data[\"next\"]\n", "            sample[\"scene_token\"] = scene_token\n", "\n", "            samples.append(sample)\n", "\n", "        token = my_sample[\"next\"]\n", "        my_sample = nusc.get('sample', token)\n", "\n", "# Add the samples to the dataset, group by scene_token, and launch the app\n", "dataset.add_samples(samples)\n", "view = dataset.group_by(\"scene_token\", order_by=\"timestamp\")\n", "session = fo.launch_app(view)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "![nuscenes_view](https://cdn.voxel51.com/getting_started_self_driving/notebook1/nuscenes_view.webp)" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## Loading Point Cloud Data" ] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# New dataset for all sensors including LIDAR and RADAR\n", "all_sensor_dataset = fo.Dataset(\"nuscenes_sensors\", overwrite=True)\n", "all_sensor_dataset.add_group_field(\"group\", default=\"CAM_FRONT\")" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "Loading a LIDAR sample from nuScenes is composed of two steps: generating the point cloud and adding the detections. We must convert the binary point clouds to the standard PCD format in order to ingest them. nuScenes also offers an optional LIDAR segmentation package, which we will use to color each point in the cloud according to its class. We start with our lidar token, load the colormap and the point cloud that correspond to the token, and save the point cloud back to disk with the new coloring in standard PCD formatting."
] },
{ "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nuscenes.utils.data_io import load_bin_file\n", "from nuscenes.utils.color_map import get_colormap\n", "from nuscenes.lidarseg.lidarseg_utils import paint_points_label\n", "from nuscenes.utils.data_classes import LidarPointCloud\n", "import open3d as o3d\n", "import os\n", "\n", "\n", "def load_lidar(lidar_token):\n", "\n", "    # Grab and generate the colormap for the lidarseg labels\n", "    gt_from = \"lidarseg\"\n", "    lidarseg_filename = dataroot + nusc.get(gt_from, lidar_token)['filename']\n", "    colormap = get_colormap()\n", "    name2index = nusc.lidarseg_name2idx_mapping\n", "\n", "    coloring = paint_points_label(lidarseg_filename, None, name2index, colormap=colormap)\n", "    filepath = dataroot + nusc.get(\"sample_data\", lidar_token)['filename']\n", "    root, extension = os.path.splitext(filepath)\n", "\n", "    # Load the binary point cloud\n", "    cloud = LidarPointCloud.from_file(filepath)\n", "    pcd = o3d.geometry.PointCloud()\n", "    pcd.points = o3d.utility.Vector3dVector(cloud.points[:3, :].T)\n", "    colors = coloring[:, :3]\n", "    pcd.colors = o3d.utility.Vector3dVector(colors)\n", "\n", "    # Save the colored point cloud back to disk\n", "    # (the source file ends in .pcd.bin, so root already ends in .pcd)\n", "    o3d.io.write_point_cloud(root, pcd)\n", "\n", "    return root" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "With our point cloud file now properly prepared for ingestion, we can move along to adding detections. To do so, we grab all the detections from the keyframe. We use the nuScenes SDK’s built-in box methods to retrieve the location, rotation, and dimensions of each box. To match [FiftyOne’s 3D detection](https://docs.voxel51.com/user_guide/using_datasets.html#d-datasets) input, we take `box.orientation.yaw_pitch_roll` for rotation, `box.wlh` for width, length, and height, and `box.center` for its location. Note too that `fo.Sample(filepath=filepath, group=group.element(sensor))` will automatically detect the pcd file and ingest the sample as a point cloud! After the method below is run and detections are added, we have our LIDAR sample with detections!"
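, "\n", "Concretely, the mapping for a single nuScenes box looks like the minimal sketch below. The `box_to_detection()` helper is hypothetical and only illustrates the conversion; the full `lidar_sample()` function that follows also attaches the GPS location of the ego vehicle:\n", "\n", "```python\n", "def box_to_detection(box):\n", "    # box is a nuScenes Box already transformed into the sensor frame\n", "    yaw, pitch, roll = box.orientation.yaw_pitch_roll\n", "    w, l, h = box.wlh.tolist()\n", "    return fo.Detection(\n", "        label=box.name,\n", "        location=box.center.tolist(),  # [x, y, z] center of the cuboid\n", "        rotation=[roll, pitch, yaw],   # rotations around the x, y, z axes\n", "        dimensions=[l, w, h],          # length, width, height\n", "    )\n", "```"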
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nuscenes.utils.geometry_utils import BoxVisibility\n", "from nuscenes.scripts.export_poses import derive_latlon\n", "\n", "\n", "def lidar_sample(group, filepath, sensor, lidar_token, scene):\n", " # Get the lidar data\n", " data_path, boxes, camera_intrinsic = nusc.get_sample_data(lidar_token, box_vis_level=BoxVisibility.NONE,)\n", " data = nusc.get('sample_data', lidar_token)\n", " log = nusc.get('log', scene[\"log_token\"])\n", " location = log[\"location\"]\n", " ego = nusc.get('ego_pose', data[\"ego_pose_token\"])\n", " ego_list = [ego]\n", " latlon = derive_latlon(location,ego_list)\n", " lat = latlon[0][\"latitude\"]\n", " lon = latlon[0][\"longitude\"]\n", "\n", " # Create a sample\n", " sample = fo.Sample(filepath=filepath, group=group.element(sensor))\n", "\n", " # Add the coords to the sample\n", " sample[\"location\"] = fo.GeoLocation(\n", " point = [lon,lat]\n", " )\n", " \n", " # Add detections to the pcd\n", " detections = []\n", " for box in boxes:\n", " \n", " x, y, z = box.orientation.yaw_pitch_roll\n", " w, l, h = box.wlh.tolist()\n", "\n", " detection = fo.Detection(\n", " label=box.name,\n", " location=box.center.tolist(),\n", " rotation=[z, y, x],\n", " dimensions=[l,w,h]\n", " )\n", " detections.append(detection)\n", " sample[\"ground_truth\"] = fo.Detections(detections=detections)\n", " return sample" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "RADAR is an interesting case. Since we have already stored our 3d detections in the LIDAR sample and RADAR is laid on top of the LIDAR in the 3D visualizer, we don’t need to copy our detections for each point cloud. The simplifies loading RADAR to just:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from nuscenes.utils.data_classes import RadarPointCloud\n", "from pyquaternion import Quaternion\n", "\n", "\n", "def load_radar(filepath, data ):\n", "\n", " root, extension = os.path.splitext(filepath)\n", " \n", " #Load Point Cloud\n", " pc = RadarPointCloud.from_file(filepath)\n", "\n", " cs_record = nusc.get('calibrated_sensor', data['calibrated_sensor_token'])\n", " pc.rotate(Quaternion(cs_record['rotation']).rotation_matrix)\n", " pc.translate(np.array(cs_record['translation']))\n", "\n", " pcd = o3d.geometry.PointCloud()\n", " print(pc.points.shape)\n", " pcd.points = o3d.utility.Vector3dVector(pc.points[:3,:].T)\n", "\n", " #Save back Point Cloud\n", " o3d.io.write_point_cloud(root+\"_NEW.pcd\", pcd)\n", " \n", " return root+\"_NEW.pcd\"" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's run it back again except this time we are taking cameras, LIDAR, and RADAR samples!" 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "groups = [\"CAM_FRONT\", \"CAM_FRONT_RIGHT\", \"CAM_BACK_RIGHT\", \"CAM_BACK\",\n", " \"CAM_BACK_LEFT\", \"CAM_FRONT_LEFT\",\"LIDAR_TOP\", \"RADAR_FRONT\",\n", " \"RADAR_FRONT_LEFT\", \"RADAR_FRONT_RIGHT\", \"RADAR_BACK_LEFT\", \"RADAR_BACK_RIGHT\"]\n", "\n", "samples = []\n", "\n", "# Iterate over each scene\n", "for scene in nusc.scene:\n", " my_scene = scene\n", " token = my_scene['first_sample_token']\n", " my_sample = nusc.get('sample', token)\n", " last_sample_token = my_scene['last_sample_token']\n", " \n", " # Iterate over each sample in the scene\n", " while not my_sample[\"next\"] == \"\":\n", " scene_token = my_sample[\"scene_token\"]\n", " lidar_token = my_sample[\"data\"][\"LIDAR_TOP\"]\n", " group = fo.Group()\n", " # Iterate over each sensor in the sample\n", " for sensor in groups:\n", " data = nusc.get('sample_data', my_sample['data'][sensor])\n", " filepath = dataroot + data[\"filename\"]\n", " \n", " # Check if the sensor is lidar\n", " if data[\"sensor_modality\"] == \"lidar\":\n", " filepath = load_lidar(lidar_token)\n", " sample = lidar_sample(group,filepath, sensor, lidar_token, scene)\n", "\n", " # Check if the sensor is camera\n", " elif data[\"sensor_modality\"] == \"camera\":\n", " sample = camera_sample(group, filepath, sensor, my_sample['data'][sensor],scene)\n", "\n", " # Else its radar\n", " else:\n", " radar_filepath = load_radar(filepath,data)\n", " sample = fo.Sample(filepath=radar_filepath, group=group.element(sensor))\n", "\n", " \n", " # Add metadata to the sample\n", " sample[\"token\"] = data[\"token\"]\n", " sample[\"ego_pose_token\"] = data[\"ego_pose_token\"]\n", " sample[\"calibrated_sensor_token\"] = data[\"calibrated_sensor_token\"]\n", " sample[\"timestamp\"] = data[\"timestamp\"]\n", " sample[\"is_key_frame\"] = data[\"is_key_frame\"]\n", " sample[\"prev\"] = data[\"prev\"]\n", " sample[\"next\"] = data[\"next\"]\n", " sample[\"scene_token\"] = scene_token\n", "\n", " \n", " samples.append(sample)\n", "\n", " token = my_sample[\"next\"]\n", "\n", " my_sample = nusc.get('sample', token)\n", "\n", "# Add the samples to the dataset, group by scene_token, and launch the app\n", "all_sensor_dataset.add_samples(samples)\n", "view = all_sensor_dataset.group_by(\"scene_token\", order_by=\"timestamp\")\n", "session.dataset = all_sensor_dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "![nuscenes_view_pointcloud](https://cdn.voxel51.com/getting_started_self_driving/notebook1/point_cloud_data2.webp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Awesome work! You now have a full fledged self driving dataset in FiftyOne! In our next step, we will look at some advanced techniques for curation and how to enhance self driving datasets with [FiftyOne Brain](https://docs.voxel51.com/brain.html) and [FiftyOne Zoo](https://docs.voxel51.com/dataset_zoo/index.html)." ] } ], "metadata": { "kernelspec": { "display_name": "env", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.13" } }, "nbformat": 4, "nbformat_minor": 2 }