{ "cells": [ { "attachments": {}, "cell_type": "markdown", "id": "2fc6b72c-989c-4dbe-b12c-468190868959", "metadata": {}, "source": [ "# Stable Fast 3D Mesh Reconstruction and OpenVINO\n", "\n", "
Important note: This notebook has problems with installing the [pynim](https://github.com/vork/PyNanoInstantMeshes/issues/2) library on macOS. The issue may be environment-dependent and may also occur on other OSes.
\n", "\n", "[Stable Fast 3D (SF3D)](https://huggingface.co/stabilityai/stable-fast-3d) is a large reconstruction model based on [TripoSR](https://huggingface.co/spaces/stabilityai/TripoSR), which takes in a single image of an object and generates a textured UV-unwrapped 3D mesh asset.\n", "\n", "You can find [the source code on GitHub](https://github.com/Stability-AI/stable-fast-3d) and read the paper [SF3D: Stable Fast 3D Mesh Reconstruction with UV-unwrapping and Illumination Disentanglement](https://arxiv.org/abs/2408.00653).\n", "\n", "![Teaser Video](https://github.com/Stability-AI/stable-fast-3d/blob/main/demo_files/teaser.gif?raw=true)\n", "\n", "\n", "Unlike most existing approaches, SF3D is explicitly trained for mesh generation, incorporating a fast UV unwrapping technique that enables swift texture generation rather than relying on vertex colors. \n", "The method also learns to predict material parameters and normal maps to enhance the visual quality of the reconstructed 3D meshes.\n", "\n", "The authors compare their results with TripoSR:\n", "\n", "![sf3d-improvements](https://github.com/user-attachments/assets/fb1277e5-610f-47d7-97e4-1267624f7f1f)\n", "\n", "> The top shows the effect of light bake-in when relighting the asset. SF3D produces a more plausible relighting. By not using vertex colors, our\n", "method is capable of encoding finer details while also having a lower polygon count. Our vertex displacement enables estimating smooth shapes, which do not introduce stair-stepping artifacts\n", "from marching cubes. Lastly, our material property prediction allows us to express a variety of different surface types.\n", "\n", "\n", "#### Table of contents:\n", "\n", "- [Prerequisites](#Prerequisites)\n", "- [Get the original model](#Get-the-original-model)\n", "- [Convert the model to OpenVINO IR](#Convert-the-model-to-OpenVINO-IR)\n", "- [Compiling models and prepare pipeline](#Compiling-models-and-prepare-pipeline)\n", "- [Interactive inference](#Interactive-inference)\n", "\n", "### Installation Instructions\n", "\n", "This is a self-contained example that relies solely on its own code.\n", "\n", "We recommend running the notebook in a virtual environment. 
You only need a Jupyter server to start.\n", "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n", "\n", "" ] }, { "attachments": {}, "cell_type": "markdown", "id": "8991ab76-9c05-45a8-a901-24692730eb01", "metadata": {}, "source": [ "## Prerequisites\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": null, "id": "05865a94-9556-4e38-b98f-d42cf8822430", "metadata": {}, "outputs": [], "source": [ "import requests\n", "from pathlib import Path\n", "\n", "if not Path(\"notebook_utils.py\").exists():\n", " r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\",\n", " )\n", " open(\"notebook_utils.py\", \"w\").write(r.text)\n", "\n", "if not Path(\"cmd_helper.py\").exists():\n", " r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/cmd_helper.py\",\n", " )\n", " open(\"cmd_helper.py\", \"w\").write(r.text)\n", "\n", "if not Path(\"pip_helper.py\").exists():\n", " r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/pip_helper.py\",\n", " )\n", " open(\"pip_helper.py\", \"w\").write(r.text)\n", "\n", "\n", "from pip_helper import pip_install\n", "\n", "\n", "pip_install(\"-q\", \"gradio>=4.19\", \"openvino>=2024.3.0\", \"wheel\", \"gradio-litmodel3d==0.0.1\", \"pydantic==2.10.6\")\n", "pip_install(\"-q\", \"torch>=2.2.2\", \"torchvision\", \"transformers>=4.42.3\", \"open_clip_torch==2.24.0\", \"--extra-index-url\", \"https://download.pytorch.org/whl/cpu\")\n", "pip_install(\"-q\", \"omegaconf==2.4.0.dev3\")\n", "pip_install(\"-q\", \"jaxtyping\", \"gpytoolbox\", \"trimesh\", \"einops\", \"pynanoinstantmeshes==0.0.3\")\n", "# rembg requires opencv-python-headless that can't be installed with opencv-python. 
So we install rembg without dependencies and install its requirements manually.\n", "pip_install(\"-q\", \"rembg\", \"--no-deps\")\n", "pip_install(\"-q\", \"onnxruntime\", \"opencv-python\", \"pymatting\", \"pooch\", \"pynim\")\n", "\n", "# Read more about telemetry collection at https://github.com/openvinotoolkit/openvino_notebooks?tab=readme-ov-file#-telemetry\n", "from notebook_utils import collect_telemetry\n", "\n", "collect_telemetry(\"stable-fast-3d.ipynb\")" ] }, { "cell_type": "code", "execution_count": null, "id": "cb66f24b-fc3a-4852-8f55-10b316e343fb", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "\n", "from cmd_helper import clone_repo\n", "\n", "\n", "clone_repo(\"https://github.com/Stability-AI/stable-fast-3d\")\n", "\n", "pip_install(\"-q\", \"--no-build-isolation\", Path(\"stable-fast-3d/texture_baker/\"))\n", "pip_install(\"-q\", \"--no-build-isolation\", Path(\"stable-fast-3d/uv_unwrapper/\"))" ] }, { "attachments": {}, "cell_type": "markdown", "id": "fde31441-436f-4317-b655-781bb958f753", "metadata": {}, "source": [ "## Get the original model\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": null, "id": "92cd23dd-ce40-4b52-a1cb-c5484aba5621", "metadata": { "test_replace": { "\"stabilityai/stable-fast-3d\",\n": "\"patrickbdevaney/stable-fast-3d\",\n" } }, "outputs": [], "source": [ "from sf3d.system import SF3D\n", "\n", "\n", "model = SF3D.from_pretrained(\n", " \"stabilityai/stable-fast-3d\",\n", " config_name=\"config.yaml\",\n", " weight_name=\"model.safetensors\",\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "c9ab6c02-6a53-41a3-9247-f16ab77934f5", "metadata": {}, "source": [ "## Convert the model to OpenVINO IR\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "891b4031-180c-4519-9e4c-b0beb68fdea4", "metadata": {}, "source": [ "SF3D is a PyTorch model. OpenVINO supports PyTorch models via conversion to the OpenVINO Intermediate Representation (IR). The [OpenVINO model conversion API](https://docs.openvino.ai/2024/openvino-workflow/model-preparation.html#convert-a-model-with-python-convert-model) should be used for this purpose. The `ov.convert_model` function accepts the original PyTorch model instance and an example input for tracing, and returns an `ov.Model` object representing the model in the OpenVINO framework. The converted model can be saved to disk with the `ov.save_model` function or loaded directly onto a device with `core.compile_model`.\n", "The `ov_stable_fast_3d_helper.py` script contains helper functions for the model conversion; please check its contents if you are interested in the conversion details.\n", "\n", "
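Below is a minimal, hedged sketch of this conversion API on a toy module; the module, example input, and file name are illustrative placeholders, not parts of the SF3D pipeline:\n", "\n", "```python\n", "import torch\n", "import openvino as ov\n", "\n", "toy = torch.nn.Linear(4, 2)  # placeholder PyTorch module\n", "example = torch.randn(1, 4)  # example input used for tracing\n", "\n", "ov_model = ov.convert_model(toy, example_input=example)  # PyTorch -> ov.Model\n", "ov.save_model(ov_model, \"toy_ir.xml\")  # serialize the IR to disk\n", "\n", "core = ov.Core()\n", "compiled = core.compile_model(ov_model, \"CPU\")  # or compile the saved IR later\n", "print(compiled(example.numpy())[0].shape)  # run inference on the example input\n", "```\n", "\n", "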
\n", " Click here for more detailed explanation of conversion steps\n", "\n", "![sf3d-overview](https://github.com/user-attachments/assets/8b37e08e-ddda-4dae-b5de-cf3adc4b79c8)\n", "\n", "As illustrated in SF3D Overview image, SF3D has 5 main components: \n", "\n", "1. An enhanced transformer network that predicts higher resolution triplanes, which helps in reducing aliasing artifacts (top left in the figure). In this part `LinearCameraEmbedder` (`camera_embedder` in the implemented pipeline) obtains camera embeddings for `DINOv2` model (`image_tokenizer`) that obtains image tokens. `TriplaneLearnablePositionalEmbedding` model (`tokenizer`) obtains triplane tokens. The transformer `TwoStreamInterleaveTransformer` (`backbone`) gets triplane tokens (`hidden_states`) and image tokens (`encoder_hidden_states`). Then `PixelShuffleUpsampleNetwork` (`post_processor`) processes the output. We will convert all these 5 models to OpenVINO format and then replace the original models by compiled OV-models in the original pipeline.\n", "Here is a specific for `DINOv2` model that calls `nn.functional.interpolate` in its method `interpolate_pos_encoding`. This method accepts a tuple of floats as `scale_factor`, but during conversion a tuple of floats converts to a tuple of tensors due to conversion specific. It raises an error. So, we need to patch it by converting in float.\n", "\n", "2. Material Estimation. `MaterialNet` is implemented in `ClipBasedHeadEstimator` model (`image_estimator`). We will convert it too.\n", " \n", "3. Illumination Modeling. It is not demonstrated in the original demo and its results are not used in the original pipeline, so we will not use it too. Thus `global_estimator` is not needed to be converted. \n", "\n", "4. Mesh Extraction and Refinement. In these part `MaterialMLP` (`decoder`) is used. The `decoder` accepts lists of include or exclude heads in forward method and uses them to choose a part of heads. We can't accept a list of strings in IR-model, but we can build 2 decoders with required structures.\n", "\n", "5. Fast UV-Unwrapping and Export. It is finalizing step and there are no models for conversion.\n", "\n", "
" ] }, { "cell_type": "code", "execution_count": null, "id": "3ef6cd5f-3db8-4729-b35a-628ecde1808b", "metadata": {}, "outputs": [], "source": [ "from ov_stable_fast_3d_helper import (\n", " convert_image_tokenizer,\n", " convert_tokenizer,\n", " convert_backbone,\n", " convert_post_processor,\n", " convert_camera_embedder,\n", " convert_image_estimator,\n", " convert_decoder,\n", ")\n", "\n", "# uncomment the code below to see the model conversion code of convert_image_tokenizer.\n", "# replace the function name if you want see the code for another model\n", "\n", "# ??convert_image_tokenizer" ] }, { "cell_type": "code", "execution_count": null, "id": "f010a06c", "metadata": {}, "outputs": [], "source": [ "IMAGE_TOKENIZER_OV_PATH = Path(\"models/image_tokenizer_ir.xml\")\n", "TOKENIZER_OV_PATH = Path(\"models/tokenizer_ir.xml\")\n", "BACKBONE_OV_PATH = Path(\"models/backbone_ir.xml\")\n", "POST_PROCESSOR_OV_PATH = Path(\"models/post_processor_ir.xml\")\n", "CAMERA_EMBEDDER_OV_PATH = Path(\"models/camera_embedder_ir.xml\")\n", "IMAGE_ESTIMATOR_OV_PATH = Path(\"models/image_estimator_ir.xml\")\n", "INCLUDE_DECODER_OV_PATH = Path(\"models/include_decoder_ir.xml\")\n", "EXCLUDE_DECODER_OV_PATH = Path(\"models/exclude_decoder_ir.xml\")\n", "\n", "\n", "convert_image_tokenizer(model.image_tokenizer, IMAGE_TOKENIZER_OV_PATH)\n", "convert_tokenizer(model.tokenizer, TOKENIZER_OV_PATH)\n", "convert_backbone(model.backbone, BACKBONE_OV_PATH)\n", "convert_post_processor(model.post_processor, POST_PROCESSOR_OV_PATH)\n", "convert_camera_embedder(model.camera_embedder, CAMERA_EMBEDDER_OV_PATH)\n", "convert_image_estimator(model.image_estimator, IMAGE_ESTIMATOR_OV_PATH)\n", "convert_decoder(model.decoder, INCLUDE_DECODER_OV_PATH, EXCLUDE_DECODER_OV_PATH)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "78044cc7-cd4d-4026-9ca7-ca715233ad49", "metadata": {}, "source": [ "## Compiling models and prepare pipeline\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "attachments": {}, "cell_type": "markdown", "id": "f1b3405c-f35b-4d73-a3c2-d2ba1df1f73f", "metadata": {}, "source": [ "Select device from dropdown list for running inference using OpenVINO." ] }, { "cell_type": "code", "execution_count": null, "id": "d65fc34c-8fd9-45ad-bd22-c27caedada91", "metadata": {}, "outputs": [], "source": [ "from notebook_utils import device_widget\n", "\n", "device = device_widget()\n", "\n", "device" ] }, { "attachments": {}, "cell_type": "markdown", "id": "5be49b39", "metadata": {}, "source": [ "`get_compiled_model` function defined in `ov_ov_stable_fast_3d.py` provides convenient way for getting compiled ov-model that is compatible with the original interface. It accepts the original model, inference device and directories with converted models as arguments." 
] }, { "cell_type": "code", "execution_count": null, "id": "8d5a3970-0727-44d6-934c-49cefdfb33c0", "metadata": {}, "outputs": [], "source": [ "from ov_stable_fast_3d_helper import get_compiled_model\n", "\n", "\n", "model = get_compiled_model(\n", " model,\n", " device,\n", " IMAGE_TOKENIZER_OV_PATH,\n", " TOKENIZER_OV_PATH,\n", " BACKBONE_OV_PATH,\n", " POST_PROCESSOR_OV_PATH,\n", " CAMERA_EMBEDDER_OV_PATH,\n", " IMAGE_ESTIMATOR_OV_PATH,\n", " INCLUDE_DECODER_OV_PATH,\n", " EXCLUDE_DECODER_OV_PATH,\n", ")" ] }, { "attachments": {}, "cell_type": "markdown", "id": "b0864266-7743-4336-af8e-994d190df1b0", "metadata": {}, "source": [ "## Interactive inference\n", "[back to top ⬆️](#Table-of-contents:)\n", "It's taken from the original `gradio_app.py`, but the model is replaced with the one defined above. " ] }, { "cell_type": "code", "execution_count": null, "id": "8f5a2031-195e-4d77-bb74-a9d4cd3dc7e4", "metadata": {}, "outputs": [], "source": [ "import requests\n", "\n", "if not Path(\"gradio_helper.py\").exists():\n", " r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/stable-fast-3d/gradio_helper.py\")\n", " open(\"gradio_helper.py\", \"w\").write(r.text)\n", "\n", "from gradio_helper import make_demo\n", "\n", "demo = make_demo(model=model)\n", "\n", "try:\n", " demo.launch(debug=True)\n", "except Exception:\n", " demo.launch(share=True, debug=True)\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# Read more in the docs: https://gradio.app/docs/" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" }, "openvino_notebooks": { "imageUrl": "https://github.com/Stability-AI/stable-fast-3d/blob/main/demo_files/teaser.gif?raw=true", "tags": { "categories": [ "Model Demos", "AI Trends" ], "libraries": [], "other": [], "tasks": [ "Image-to-3D" ] } } }, "nbformat": 4, "nbformat_minor": 5 }