{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Object pose estimation" ] }, { "cell_type": "markdown", "metadata": { "jp-MarkdownHeadingCollapsed": true, "tags": [] }, "source": [ "## Overview\n", "\n", "In this tutorial, we will show how to optimize the pose of an object while correctly accounting for visibility discontinuities. We are going to optimize several latent variables that control the translation and rotation of the object.\n", "\n", "In differentiable rendering, we aim to evaluate the derivative of a pixel intensity integral with respect to a scene parameter $\\pi$ as follows:\n", "\n", "$$\n", "\\partial_\\pi I(\\pi) = \\partial_\\pi \\int_P f(\\textbf{x}, \\pi) ~ d\\textbf{x}\n", "$$\n", "\n", "where $\\textbf{x}$ is a light path in the path space $P$.\n", "\n", "When the function $f(\\cdot)$ is continuous w.r.t. $\\pi$, we can move the derivative into the integral and then apply Monte Carlo integration. Under this assumption, differentiating the rendering process via automatic differentiation, as in the previous tutorials, is correct.\n", "\n", "However, if $f(\\cdot)$ has discontinuities w.r.t. $\\pi$, direct application of automatic differentiation is no longer correct, as it omits an integral term given by the [Reynolds transport theorem](https://en.wikipedia.org/wiki/Reynolds_transport_theorem). This needs to be considered when differentiating shape-related parameters (e.g., position), as the discontinuities in the visibility function (the silhouette of the object) then depend on the differentiated parameter.\n", "\n", "In recent years, several works have addressed this issue (e.g., Li et al. (2018), Zhang et al. (2020), Loubet et al. (2019), Bangaru et al. (2020), Zhang et al. (2023), ...). Mitsuba provides dedicated integrators implementing the \"*projective sampling*\"-based approach (Zhang et al. (2023)).\n", "\n", "- [direct_projective][1]: projective sampling direct illumination integrator\n", "- [prb_projective][2]: projective sampling with Path Replay Backpropagation (PRB) integrator\n", "\n", "In this tutorial, we will optimize the position and rotation of a mesh in order to match a target rendering. To keep things simple, we will use the direct_projective integrator.\n", "You will learn more about this integrator in the following tutorials.\n", "\n", "\n", "
\n", "\n", "🚀 **You will learn how to:**\n", "\n", "- Perform an optimization with discontinuity-aware methods\n", "- Optimize latent variables to control the motion of an object\n", "
\n", "\n", "[1]: https://mitsuba.readthedocs.io/en/latest/src/generated/plugins_integrators.html#direct-illumination-projective-sampling-direct-projective\n", "[2]: https://mitsuba.readthedocs.io/en/latest/src/generated/plugins_integrators.html#projective-sampling-path-replay-backpropagation-prb-prb-projective" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Setup\n", "\n", "As always, let's import drjit and mitsuba and set a differentiation-aware variant." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import drjit as dr\n", "import mitsuba as mi\n", "\n", "mi.set_variant('cuda_ad_rgb')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## direct_projective and scene construction\n", "\n", "We will rely on the direct_projective integrator for this tutorial to properly handle the visibility discontinuities in our differentiable simulation. In primal rendering, this integrator is identical to the direct integrator." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "integrator = {\n", " 'type': 'direct_projective',\n", "}" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We create a simple scene with a bunny placed in front of a gray wall, illuminated by a spherical light." 
] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "from mitsuba.scalar_rgb import Transform4f as T\n", "\n", "scene = mi.load_dict({\n", " 'type': 'scene',\n", " 'integrator': integrator,\n", " 'sensor': {\n", " 'type': 'perspective',\n", " 'to_world': T.look_at(\n", " origin=(0, 0, 2),\n", " target=(0, 0, 0),\n", " up=(0, 1, 0)\n", " ),\n", " 'fov': 60,\n", " 'film': {\n", " 'type': 'hdrfilm',\n", " 'width': 64,\n", " 'height': 64,\n", " 'rfilter': { 'type': 'gaussian' },\n", " 'sample_border': True\n", " },\n", " },\n", " 'wall': {\n", " 'type': 'obj',\n", " 'filename': '../scenes/meshes/rectangle.obj',\n", " 'to_world': T.translate([0, 0, -2]).scale(2.0),\n", " 'face_normals': True,\n", " 'bsdf': {\n", " 'type': 'diffuse',\n", " 'reflectance': { 'type': 'rgb', 'value': (0.5, 0.5, 0.5) },\n", " }\n", " },\n", " 'bunny': {\n", " 'type': 'ply',\n", " 'filename': '../scenes/meshes/bunny.ply',\n", " 'to_world': T.scale(6.5),\n", " 'bsdf': {\n", " 'type': 'diffuse',\n", " 'reflectance': { 'type': 'rgb', 'value': (0.3, 0.3, 0.75) },\n", " },\n", " },\n", " 'light': {\n", " 'type': 'obj',\n", " 'filename': '../scenes/meshes/sphere.obj',\n", " 'emitter': {\n", " 'type': 'area',\n", " 'radiance': {'type': 'rgb', 'value': [1e3, 1e3, 1e3]}\n", " },\n", " 'to_world': T.translate([2.5, 2.5, 7.0]).scale(0.25)\n", " }\n", "})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Reference image\n", "\n", "Next we generate the target rendering. We will later modify the bunny's position and rotation to set the initial optimization state." 
] }, { "cell_type": "code", "execution_count": 4, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "Bitmap[\n", " pixel_format = rgb,\n", " component_format = uint8,\n", " size = [64, 64],\n", " srgb_gamma = 1,\n", " struct = Struct<3>[\n", " uint8 R; // @0, normalized, gamma, premultiplied alpha\n", " uint8 G; // @1, normalized, gamma, premultiplied alpha\n", " uint8 B; // @2, normalized, gamma, premultiplied alpha\n", " ],\n", " data = [ 12 KiB of image data ]\n", "]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "img_ref = mi.render(scene, seed=0, spp=1024)\n", "\n", "mi.util.convert_to_bitmap(img_ref)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Optimizer and latent variables\n", "\n", "As in the previous tutorials, we access the scene parameters using the traverse() mechanism. We then store a copy of the initial vertex positions. These will be used later to compute the new vertex positions at every iteration, always applying a different transformation to the same base shape.\n", "\n", "Since the vertex positions in Mesh are stored in a linear buffer (e.g., x_1, y_1, z_1, x_2, y_2, z_2, ...), we use the dr.unravel() routine to unflatten that array into a Point3f array." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "params = mi.traverse(scene)\n", "initial_vertex_positions = dr.unravel(mi.Point3f, params['bunny.vertex_positions'])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "While it would be possible to optimize the vertex positions of the bunny independently, in this example we are only going to optimize a translation and a rotation parameter. This drastically constrains the optimization process, which helps with convergence.\n", "\n", "Therefore, we instantiate an optimizer and assign two variables to it: angle and trans." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "opt = mi.ad.Adam(lr=0.025)\n", "opt['angle'] = mi.Float(0.25)\n", "opt['trans'] = mi.Point2f(0.1, -0.25)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "From the optimizer's point of view, these variables are the same as any other variables optimized in the previous tutorials, with the exception that the optimizer doesn't know how to propagate their new values to the scene parameters after an optimization step. This has to be done *manually*, and we encapsulate exactly that logic in the function defined below. A more detailed explanation can be found [here][1].\n", "\n", "After clamping the optimized variables to a proper range, this function creates a transformation object combining a translation and a rotation and applies it to the vertex positions stored previously. It then flattens those new vertex positions before assigning them to the scene parameters.\n", "\n", "[1]: https://mitsuba.readthedocs.io/en/latest/src/how_to_guides/use_optimizers.html#Optimizing-latent-variables" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "def apply_transformation(params, opt):\n", "    # Clamp the latent variables to a proper range\n", "    opt['trans'] = dr.clamp(opt['trans'], -0.5, 0.5)\n", "    opt['angle'] = dr.clamp(opt['angle'], -0.5, 0.5)\n", "\n", "    # Combine a translation in the XY plane with a rotation about the Y axis\n", "    trafo = mi.Transform4f.translate([opt['trans'].x, opt['trans'].y, 0.0]).rotate([0, 1, 0], opt['angle'] * 100.0)\n", "\n", "    # Transform the initial vertices, flatten them and update the scene\n", "    params['bunny.vertex_positions'] = dr.ravel(trafo @ initial_vertex_positions)\n", "    params.update()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It is now time to apply our first transformation to get the bunny to its initial state before starting the optimization." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/html": [ "" ], "text/plain": [ "Bitmap[\n", " pixel_format = rgb,\n", " component_format = uint8,\n", " size = [64, 64],\n", " srgb_gamma = 1,\n", " struct = Struct<3>[\n", " uint8 R; // @0, normalized, gamma, premultiplied alpha\n", " uint8 G; // @1, normalized, gamma, premultiplied alpha\n", " uint8 B; // @2, normalized, gamma, premultiplied alpha\n", " ],\n", " data = [ 12 KiB of image data ]\n", "]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "apply_transformation(params, opt)\n", "\n", "img_init = mi.render(scene, seed=0, spp=1024)\n", "\n", "mi.util.convert_to_bitmap(img_init)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the following cell we define the hyperparameters controlling the optimization, such as the number of iterations and the number of samples per pixel for the differentiable rendering simulation:" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "iteration_count = 50\n", "spp = 16" ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "nbsphinx": "hidden", "tags": [] }, "outputs": [], "source": [ "# IGNORE THIS: When running under pytest, adjust parameters to reduce computation time\n", "import os\n", "if 'PYTEST_CURRENT_TEST' in os.environ:\n", "    iteration_count = 2\n", "    spp = 1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The optimization loop below is very similar to the one used in the other tutorials, except that we need to apply the transformation to update the bunny's state and record the relation between the rendered image and the optimized parameters." 
] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Iteration 49: error=0.002856, angle=0.0287, trans=[-0.0071, -0.0043]]\r" ] } ], "source": [ "import time\n", "loss_hist = []\n", "for it in range(iteration_count):\n", "    # Apply the mesh transformation\n", "    apply_transformation(params, opt)\n", "\n", "    # Perform a differentiable rendering\n", "    img = mi.render(scene, params, seed=it, spp=spp)\n", "\n", "    # Evaluate the objective function\n", "    loss = dr.sum(dr.sqr(img - img_ref)) / len(img)\n", "\n", "    # Backpropagate through the rendering process\n", "    dr.backward(loss)\n", "\n", "    # Optimizer: take a gradient descent step\n", "    opt.step()\n", "\n", "    loss_hist.append(loss)\n", "    print(f\"Iteration {it:02d}: error={loss[0]:6f}, angle={opt['angle'][0]:.4f}, trans=[{opt['trans'].x[0]:.4f}, {opt['trans'].y[0]:.4f}]\", end='\\r')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Visualizing the results\n", "\n", "Finally, let's visualize the results and plot the loss over iterations." ] }, { "cell_type": "code", "execution_count": 12, "metadata": { "nbsphinx-thumbnail": {}, "tags": [] }, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "