{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "g33_Yr52Wx9p" }, "source": [ "\n", "\n", "# 3D Reconstruction\n", "\n", "In this lab, we will experiment with 3D Structure-from-Motion systems, both feature-based and feed-forward. More specifically, you will compare the performance of a state-of-the-art feature-based approach ([Colmap](https://colmap.github.io/index.html)) with a state-of-the-art feed-forward approach ([Depth Anything 3 (DA3)](https://depth-anything-3.github.io/)).\n", "\n", "## Speed and hardware\n", "DA3 typically requires a significant amount of GPU compute. Thus, in this exercise we experiment with a small dataset only, so that computation remains feasible (within a few minutes) on the CPU.\n", "\n", "Another option is to use [Google Colab](https://colab.research.google.com/). In that case, you will find comments with additional commands to run. Note that all data (saved networks, datasets) is deleted automatically when the runtime session ends, so make sure to download your results."
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "zlrQLm8pWx9q" }, "outputs": [], "source": [ "# FOR COLAB: run this cell every time you start a session.\n", "!pip install pycolmap\n", "!pip install awesome-depth-anything-3" ] }, { "cell_type": "markdown", "metadata": { "id": "kMFXhfBlWx9r" }, "source": [ "As a first step, let's download the data.\n", "\n", "We are going to use the **Zurich Graffiti Dataset** available at https://github.com/tsattler/zurich_graffiti_dataset/ .\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "J4CNVWfMWx9s", "collapsed": true }, "outputs": [], "source": [ "# FOR COLAB: run this cell every time you start a session.\n", "# For a local setup: run it once.\n", "!wget https://github.com/tsattler/zurich_graffiti_dataset/raw/refs/heads/main/images.zip\n", "!unzip images.zip" ] }, { "cell_type": "markdown", "metadata": { "id": "40qOgisdWx9s" }, "source": [ "# Structure-from-Motion with Colmap\n", "\n", "We will now build a 3D Structure-from-Motion model using Colmap. To this end, we will use pycolmap, a Python interface to the Colmap library (the library itself is written in C++).\n", "\n", "We will need to perform the following steps:\n", "\n", "* Feature Extraction\n", "* Feature Matching & Spatial Verification\n", "* Incremental Structure-from-Motion\n", "\n", "For each step, we will essentially call ready-made functionality from pycolmap (see [here](https://colmap.github.io/pycolmap/index.html#api) for pycolmap's API).\n", "\n", "pycolmap stores all intermediate data (features, matches, etc.) in a database. We will create this database automatically when extracting features."
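, "\n", "\n", "As a rough sketch (one possible shape for these calls, not the reference solution; the argument names follow pycolmap's documented API, but double-check them against your installed version), the three steps could look like:\n", "\n", "```python\n", "pycolmap.extract_features(database_path=\"database.db\", image_path=\"images/\",\n", "                          camera_mode=camera_mode, reader_options=reader_options)\n", "pycolmap.match_exhaustive(\"database.db\")\n", "reconstructions = pycolmap.incremental_mapping(\n", "    database_path=\"database.db\", image_path=\"images/\", output_path=\"sfm_output/\")\n", "```\n", "\n", "Here, `database.db` and `sfm_output/` are example paths that you can choose freely."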
] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "KrLXi2kjWx9t" }, "outputs": [], "source": [ "import pycolmap\n", "\n", "# 1st step: Extract features and write them into a database.\n", "# Use the extract_features function from pycolmap.\n", "reader_options = pycolmap.ImageReaderOptions(camera_model=\"RADIAL\")\n", "camera_mode = \"PER_IMAGE\"\n", "pycolmap.extract_features(#TODO)\n", "\n", "# 2nd step: Exhaustive matching.\n", "# Use the match_exhaustive function from pycolmap.\n", "pycolmap.match_exhaustive(#TODO)\n", "\n", "# 3rd step: Run incremental Structure-from-Motion.\n", "# Use the incremental_mapping function from pycolmap. Note that Colmap\n", "# potentially generates multiple 3D models. Thus, incremental_mapping returns\n", "# a dictionary, where each entry consists of a model number and the\n", "# corresponding reconstruction.\n", "reconstructions = pycolmap.incremental_mapping(#TODO)\n", "\n", "# Find the reconstruction that contains the largest number of images by\n", "# looking at the number of registered images.\n", "# TODO: Implement this functionality. 
The best model should be stored\n", "# in a variable named best_recon" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "h0-lU_t3Wx9t" }, "outputs": [], "source": [ "# We will print the following statistics for the reconstruction:\n", "# * The number of registered images.\n", "# * The number of reconstructed 3D points.\n", "# * The number of observations used to reconstruct the 3D points (the number of\n", "# features that were used to triangulate the 3D points).\n", "# * The average track length for the points, i.e., the average number of\n", "# features used to triangulate the 3D point.\n", "# * The average reprojection error of the 3D points.\n", "\n", "def print_statistics(recon):\n", " # For printing the number of registered images and reconstructed 3D points, we\n", " # can directly use functions provided by the pycolmap.Reconstruction class.\n", " num_registered_images=#TODO: Get the number of registered images.\n", " num_3D_points=#TODO: Get the number of 3D points.\n", " print(f\"Number of registered images: {num_registered_images}\")\n", " print(f\"Number of reconstructed 3D points: {num_3D_points}\")\n", "\n", " # To obtain the number of observations and the average track length, we\n", " # iterate over all 3D points and look at the track length of each point.\n", " num_observations = 0\n", " # TODO: Compute the number of observations.\n", " for p_id, point3D in recon.points3D.items():\n", "\n", " print(f\"Total number of observations: {num_observations}\")\n", " print(f\"Average track length: {num_observations/max(num_3D_points,1)}\")\n", "\n", " # Similarly, for the average reprojection error, we iterate over all 3D points\n", " # and look at the error of each point.\n", " mean_reproj_error = 0.0\n", " # TODO: Compute the mean reprojection error.\n", "\n", " print(f\"Mean reprojection error: {mean_reproj_error / float(max(num_3D_points,1))}\")\n", "\n", "# Print statistics for the Colmap model\n", "print_statistics(best_recon)" 
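, "\n", "\n", "# A plain-Python sketch of the averaging logic above (hypothetical track\n", "# lengths, independent of pycolmap): three points with track lengths 5, 3,\n", "# and 4 give 5+3+4 = 12 observations and an average track length of 4.0.\n", "example_track_lengths = [5, 3, 4]\n", "assert sum(example_track_lengths) / max(len(example_track_lengths), 1) == 4.0\n"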
] }, { "cell_type": "markdown", "metadata": { "id": "K07DoV87Wx9u" }, "source": [ "You should get something close to 4,000 points triangulated from around 16,000 observations, with an average track length around 4. Of course, it is not bad if you get better numbers :)\n", "\n", "Next, we are going to reconstruct the scene using DA3. You can find details on using DA3 [here](https://github.com/Aedelon/awesome-depth-anything-3) and the Python API [here](https://github.com/Aedelon/awesome-depth-anything-3/blob/main/docs/API.md).\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "ogdxECANWx9v" }, "outputs": [], "source": [ "import torch, os, glob\n", "from depth_anything_3.api import DepthAnything3\n", "device = torch.device(\"cpu\") # set to \"cuda\" if you have a powerful GPU.\n", "# We will use the small model, which should be sufficient for such a small\n", "# scene.\n", "model = DepthAnything3.from_pretrained(\"depth-anything/DA3-SMALL\")\n", "model = model.to(device=device)\n", "# TODO: Collect the paths to all images in a list called images.\n", "# TODO: Run the network on the list of images to produce a set of predictions,\n", "# store them in a variable named prediction." ] }, { "cell_type": "markdown", "metadata": { "id": "9GY4pw03Wx9v" }, "source": [ "Next, we want to compare the quality of the reconstruction obtained with DA3 with that obtained via Colmap. Note that since both reconstructions can have arbitrary (and differing) scale factors, and since we do not have ground truth, we need to be a bit creative here.\n", "\n", "We will compare the quality of the estimated poses indirectly using the metrics for Structure-from-Motion reconstructions implemented above. To be able to apply the function to the DA3 reconstruction, we will use its predicted poses and intrinsics to initialize an \"empty\" Colmap model that does not contain any 3D points. 
We will then use the database created by Colmap to triangulate 3D points using the poses and intrinsics predicted by DA3. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "-E81b_nvWx9v" }, "outputs": [], "source": [ "# Create a Colmap reconstruction for the camera poses predicted by DA3.\n", "# For simplicity, we start off with the reconstruction generated by Colmap\n", "# and modify the camera poses and intrinsics while removing all 3D points.\n", "da3_recon = pycolmap.Reconstruction(reconstruction=best_recon)\n", "# Notice that this will also clear best_recon.\n", "da3_recon.delete_all_points2D_and_points3D()\n", "\n", "# For each image: Add a new camera and image to the reconstruction.\n", "for i in range(0, len(images)):\n", " # Finds the corresponding image ID in the da3_recon model.\n", " # Note that the image name should not contain the image\n", " # folder name, e.g., use name IMG_20151114_134548.jpg and not\n", " # name images/IMG_20151114_134548.jpg .\n", " image_id = da3_recon.find_image_with_name(images[i].split(\"/\")[1]).image_id\n", " camera_id = da3_recon.images[image_id].camera_id\n", " frame_id = da3_recon.images[image_id].frame_id\n", "\n", " # Compute the Colmap pose for this image. DA3 provides estimated poses\n", " # under prediction.extrinsics. DA3 already provides the mapping from world\n", " # to camera coordinates required by Colmap. All we need to do is create a new\n", " # Rigid3d object that stores the pose estimated for the image.\n", " pose = # TODO: Obtain the pose\n", "\n", " # Update the camera pose by updating the frame with frame ID frame_id.\n", " da3_recon.frames[frame_id].set_cam_from_world(camera_id=camera_id,\n", " cam_from_world=pose)\n", "\n", " # Update the camera intrinsics by modifying the parameters of the camera\n", " # with camera ID camera_id.\n", " # The camera parameters should be stored as focal_length_x, focal_length_y,\n", " # principal_point_x, principal_point_y, radial. 
Set radial to 0.\n", " # DA3 provides the K matrix for the images under prediction.intrinsics.\n", " # Notice that we need to rescale the intrinsics as DA3 downscales\n", " # the images before processing. The original images have size\n", " # 1024x768.\n", " scale_x = # TODO: Implement the scaling factor in x-direction (image width)\n", " scale_y = # TODO: Implement the scaling factor in y-direction (image height)\n", " K = # TODO: Obtain the intrinsic calibration matrix for this image.\n", " parameters = [# Rescale and store the parameters]\n", " # Set the parameters.\n", " da3_recon.cameras[camera_id].set_params_from_string(\n", " f\"{parameters[0]}, {parameters[1]}, {parameters[2]}, {parameters[3]}, {parameters[4]}\")\n", "\n" ] }, { "cell_type": "code", "source": [ "# Use the existing database and the newly created model to triangulate\n", "# 3D points for the DA3 poses.\n", "def retriangulate_reconstruction(recon):\n", " # Set parameters for the incremental reconstruction pipeline so as to fix\n", " # intrinsics and extrinsics.\n", " incremental_pipeline_options = pycolmap.IncrementalPipelineOptions()\n", " incremental_pipeline_options.image_path = \"images/\"\n", " incremental_pipeline_options.load_all_images = True\n", " incremental_pipeline_options.fix_existing_frames = True\n", " incremental_pipeline_options.ba_refine_focal_length = False\n", " incremental_pipeline_options.ba_refine_principal_point = False\n", " incremental_pipeline_options.ba_refine_extra_params = False\n", "\n", " # pycolmap provides functionality to triangulate 3D points from a given\n", " # 3D reconstruction. We simply call it here. You will need to set an\n", " # output path where the reconstruction is stored.\n", " # TODO: Implement this functionality. 
Store the triangulated model in\n", " # a variable named triangulated_recon .\n", "\n", " return triangulated_recon\n", "\n", "# Perform the actual retriangulation\n", "triangulated_recon = retriangulate_reconstruction(da3_recon)" ], "metadata": { "id": "lFka62t2tmcY" }, "execution_count": null, "outputs": [] }, { "cell_type": "code", "source": [ "# Finally, print statistics for the DA3 model.\n", "print_statistics(triangulated_recon)" ], "metadata": { "id": "zE-Xl5LE8JNq" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "lwNHF-28Wx9w" }, "source": [ "You should observe that the resulting model has fewer triangulated 3D points, a smaller number of observations, a smaller average track length, and a larger mean reprojection error than the Colmap model. This suggests that the poses estimated by DA3 are not as accurate as those estimated by Colmap.\n", "\n", "Finally, we try to improve the reconstruction via bundle adjustment." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "nDZ-tXHzWx9w" }, "outputs": [], "source": [ "# Bundle adjust the reconstruction we generated above. You can\n", "# use readily available pycolmap functionality.\n", "# TODO: Implement the bundle adjustment.\n", "\n", "# Print statistics for the reconstruction.\n", "print_statistics(triangulated_recon)" ] }, { "cell_type": "markdown", "metadata": { "id": "8jWTOnTkWx92" }, "source": [ "You should observe that bundle adjustment reduces the average reprojection error quite drastically, indicating that the refined poses are significantly better.\n", "\n", "As a final step, we retriangulate the refined reconstruction and print its statistics. What do you observe? Can you explain the behavior?"
] }, { "cell_type": "code", "source": [ "retriangulated_recon2 = retriangulate_reconstruction(triangulated_recon)\n", "print_statistics(retriangulated_recon2)" ], "metadata": { "id": "WyyNF_8xPAKF" }, "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "source": [ "This completes this exercise. Please do not forget to submit your results." ], "metadata": { "id": "JX1pCVOiPAnD" } } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.4" }, "colab": { "provenance": [] } }, "nbformat": 4, "nbformat_minor": 0 }