{ "cells": [ { "cell_type": "markdown", "id": "4b009df1-9484-40b5-8497-c7532afdae04", "metadata": {}, "source": [ "# Image generation with Stable Diffusion v3 and OpenVINO\n", "\n",
"Stable Diffusion v3 is the next generation of the Stable Diffusion family of latent diffusion image models. It outperforms state-of-the-art text-to-image generation systems in typography and prompt adherence, based on human preference evaluations. In comparison with previous versions, it is based on the Multimodal Diffusion Transformer (MMDiT) text-to-image architecture, which features greatly improved image quality, typography, complex prompt understanding, and resource efficiency.\n", "\n",
"![mmdit.png](https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/dd079427-89f2-4d28-a10e-c80792d750bf)\n", "\n",
"More details about the model can be found in the [model card](https://huggingface.co/stabilityai/stable-diffusion-3-medium), the [research paper](https://stability.ai/news/stable-diffusion-3-research-paper) and the [Stability.AI blog post](https://stability.ai/news/stable-diffusion-3-medium).\n",
"In this tutorial, we will consider how to convert the Stable Diffusion v3 model for running with OpenVINO. An additional part demonstrates how to run optimization with [NNCF](https://github.com/openvinotoolkit/nncf/) to speed up the pipeline.\n",
"If you want to run previous Stable Diffusion versions, please check our other notebooks:\n", "\n",
"* [Stable Diffusion](../stable-diffusion-text-to-image)\n", "* [Stable Diffusion v2](../stable-diffusion-v2)\n", "* [Stable Diffusion XL](../stable-diffusion-xl)\n", "* [LCM Stable Diffusion](../latent-consistency-models-image-generation)\n", "* [Turbo SDXL](../sdxl-turbo)\n", "* [Turbo SD](../sketch-to-image-pix2pix-turbo)\n", "\n",
"#### Table of contents:\n", "\n",
"- [Prerequisites](#Prerequisites)\n", "- [Build PyTorch pipeline](#Build-PyTorch-pipeline)\n", "- [Convert models with OpenVINO](#Convert-models-with-OpenVINO)\n", "    - [Transformer](#Transformer)\n", "    - [T5 Text Encoder](#T5-Text-Encoder)\n", "    - [Clip text encoders](#Clip-text-encoders)\n", "    - [VAE](#VAE)\n", "- [Prepare OpenVINO inference pipeline](#Prepare-OpenVINO-inference-pipeline)\n", "- [Run OpenVINO model](#Run-OpenVINO-model)\n", "- [Quantization](#Quantization)\n", "    - [Prepare calibration dataset](#Prepare-calibration-dataset)\n", "    - [Run Quantization](#Run-Quantization)\n", "    - [Run Weights Compression](#Run-Weights-Compression)\n", "    - [Compare model file sizes](#Compare-model-file-sizes)\n", "    - [Compare inference time of the FP16 and optimized pipelines](#Compare-inference-time-of-the-FP16-and-optimized-pipelines)\n", "- [Interactive demo](#Interactive-demo)\n", "\n", "\n",
"### Installation Instructions\n", "\n",
"This is a self-contained example that relies solely on its own code.\n", "\n",
"We recommend running the notebook in a virtual environment. 
You only need a Jupyter server to start.\n", "For details, please refer to [Installation Guide](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/README.md#-installation-guide).\n", "\n", "\n" ] },
{ "cell_type": "markdown", "id": "2a14c937-a5c7-4830-ad80-94945928c1dd", "metadata": {}, "source": [ "## Prerequisites\n", "[back to top ⬆️](#Table-of-contents:)" ] },
{ "cell_type": "code", "execution_count": 1, "id": "5b394231-61b2-40a9-938f-c65dd81aa651", "metadata": {}, "outputs": [], "source": [ "%pip install -q \"git+https://github.com/initml/diffusers.git@clement/feature/flash_sd3\" \"gradio>=4.19\" \"torch>=2.1\" \"transformers\" \"nncf>=2.12.0\" \"datasets>=2.14.6\" \"opencv-python\" \"pillow\" \"peft>=0.7.0\" --extra-index-url https://download.pytorch.org/whl/cpu\n", "%pip install -qU \"openvino>=2024.3.0\"" ] },
{ "cell_type": "code", "execution_count": 2, "id": "910857c2", "metadata": {}, "outputs": [], "source": [ "import requests\n", "from pathlib import Path\n", "\n", "if not Path(\"sd3_helper.py\").exists():\n", "    r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/stable-diffusion-v3/sd3_helper.py\")\n", "    open(\"sd3_helper.py\", \"w\").write(r.text)\n", "\n", "if not Path(\"sd3_quantization_helper.py\").exists():\n", "    r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/stable-diffusion-v3/sd3_quantization_helper.py\")\n", "    open(\"sd3_quantization_helper.py\", \"w\").write(r.text)\n", "\n", "if not Path(\"gradio_helper.py\").exists():\n", "    r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/notebooks/stable-diffusion-v3/gradio_helper.py\")\n", "    open(\"gradio_helper.py\", \"w\").write(r.text)\n", "\n", "if not Path(\"notebook_utils.py\").exists():\n", "    r = requests.get(url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/notebook_utils.py\")\n", "    open(\"notebook_utils.py\", \"w\").write(r.text)" ] },
{ "cell_type": "markdown", "id": "f323606e-831e-4508-bcff-9636fcd7e51a", "metadata": {}, "source": [ "## Build PyTorch pipeline\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", ">**Note**: to run the model with this notebook, you will need to accept the license agreement.\n", ">You must be a registered user in 🤗 Hugging Face Hub. Please visit the [HuggingFace model card](https://huggingface.co/stabilityai/stable-diffusion-3-medium-diffusers), carefully read the terms of usage, and click the accept button. You will need to use an access token for the code below to run. For more information on access tokens, refer to [this section of the documentation](https://huggingface.co/docs/hub/security-tokens).\n", ">You can log in to the Hugging Face Hub in the notebook environment using the following code:" ] },
{ "cell_type": "code", "execution_count": 3, "id": "fa836f89", "metadata": {}, "outputs": [], "source": [ "# uncomment these lines to log in to the Hugging Face Hub to get access to the pretrained model\n", "\n", "# from huggingface_hub import notebook_login, whoami\n", "\n", "# try:\n", "#     whoami()\n", "#     print('Authorization token already provided')\n", "# except OSError:\n", "#     notebook_login()" ] },
{ "cell_type": "markdown", "id": "dcc92855", "metadata": {}, "source": [ "We will use the [Diffusers](https://huggingface.co/docs/diffusers/main/en/index) library integration for running the Stable Diffusion v3 model. 
You can find more details in the Diffusers [documentation](https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/stable_diffusion_3).\n",
"Additionally, we can apply optimizations to improve pipeline performance and memory consumption:\n", "\n",
"* **Use flash SD3**. Flash Diffusion is a diffusion distillation method proposed in [Flash Diffusion: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation](http://arxiv.org/abs/2406.02347). The model is a 90.4M parameter LoRA-distilled version of SD3 that is able to generate 1024x1024 images in 4 steps. If you want to disable it, you can clear the **Use flash SD3** checkbox.\n",
"* **Remove T5 text encoder**. Removing the memory-intensive 4.7B parameter T5-XXL text encoder during inference can significantly decrease the memory requirements for SD3 with only a slight loss in performance. If you want to keep it in the pipeline, please select the **use t5 text encoder** checkbox." ] },
{ "cell_type": "code", "execution_count": 4, "id": "27b79b9c-ca91-44be-b5ea-4fbe58760fd0", "metadata": { "test_replace": { "get_pipeline_options()": "get_pipeline_options(default_value=(False, False))" } }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "/home/ea/work/my_optimum_intel/optimum_env/lib/python3.8/site-packages/diffusers/models/transformers/transformer_2d.py:34: FutureWarning: `Transformer2DModelOutput` is deprecated and will be removed in version 1.0.0. Importing `Transformer2DModelOutput` from `diffusers.models.transformer_2d` is deprecated and this will be removed in a future version. Please use `from diffusers.models.modeling_outputs import Transformer2DModelOutput`, instead.\n", " deprecate(\"Transformer2DModelOutput\", \"1.0.0\", deprecation_message)\n", "2024-08-08 08:15:46.648328: I tensorflow/core/util/port.cc:110] oneDNN custom operations are on. You may see slightly different numerical results due to floating-point round-off errors from different computation orders. To turn them off, set the environment variable `TF_ENABLE_ONEDNN_OPTS=0`.\n", "2024-08-08 08:15:46.650527: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.\n", "2024-08-08 08:15:46.687530: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.\n", "To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.\n", "2024-08-08 08:15:47.368728: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "2bc37116218b41f0803af8e4d80d9a47", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Checkbox(value=True, description='Use flash SD3'), Checkbox(value=False, description='Use t5 te…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "from sd3_helper import get_pipeline_options\n", "\n", "pt_pipeline_options, use_flash_lora, load_t5 = get_pipeline_options()\n", "\n", "display(pt_pipeline_options)" ] },
{ "cell_type": "markdown", "id": "6585e1c8-2d71-48cd-bebe-f7eb428f01e9", "metadata": {}, "source": [ "## Convert models with OpenVINO\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Starting from the 2023.0 release, OpenVINO supports PyTorch models directly via the Model Conversion API. 
The `ov.convert_model` function accepts an instance of a PyTorch model and example inputs for tracing and returns an object of the `ov.Model` class, ready to use or to save on disk using the `ov.save_model` function.\n", "\n", "\n",
"The pipeline consists of four important parts:\n", "\n",
"* Clip and T5 Text Encoders to create a condition for generating an image from a text prompt.\n",
"* Transformer for step-by-step denoising of the latent image representation.\n",
"* Autoencoder (VAE) for decoding the latent space into an image.\n", "\n",
"We will use the `convert_sd3` helper function defined in [sd3_helper.py](./sd3_helper.py), which creates the original PyTorch models and converts each part of the pipeline using `ov.convert_model`." ] },
{ "cell_type": "code", "execution_count": 5, "id": "cda9aa51", "metadata": {}, "outputs": [], "source": [ "from sd3_helper import convert_sd3\n", "\n", "# Uncomment the line below to see the model conversion code\n", "# ??convert_sd3" ] },
{ "cell_type": "code", "execution_count": 6, "id": "eacd235f", "metadata": { "test_replace": { "convert_sd3(load_t5.value, use_flash_lora.value)": "convert_sd3(load_t5.value, use_flash_lora.value, \"katuni4ka/tiny-random-sd3\")" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "SD3 model already converted\n" ] } ], "source": [ "convert_sd3(load_t5.value, use_flash_lora.value)" ] },
{ "cell_type": "markdown", "id": "2245bed2-db07-43c4-8ce4-0621c8a0250a", "metadata": {}, "source": [ "## Prepare OpenVINO inference pipeline\n", "[back to top ⬆️](#Table-of-contents:)" ] },
{ "cell_type": "code", "execution_count": 7, "id": "5fac6f80-7f18-4391-82d4-c51e75f10e0a", "metadata": {}, "outputs": [], "source": [ "from sd3_helper import OVStableDiffusion3Pipeline, init_pipeline  # noqa: F401\n", "\n", "# Uncomment the line below to see the pipeline code\n", "# ??OVStableDiffusion3Pipeline" ] },
{ "cell_type": "markdown", "id": "a4027950-85ba-4ac1-96cf-44b8a79e8ee0", "metadata": {}, "source": [ "## Run OpenVINO model\n", "[back to top ⬆️](#Table-of-contents:)" ] },
{ "cell_type": "code", "execution_count": 8, "id": "8f168e7b-4352-4e1c-90f0-d6f6d9919215", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "0ac49f28a11a4f148c721550e0bd7557", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Dropdown(description='Device:', index=1, options=('CPU', 'AUTO'), value='AUTO')" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from notebook_utils import device_widget\n", "\n", "device = device_widget()\n", "\n", "device" ] },
{ "cell_type": "code", "execution_count": 9, "id": "47306cd3-385e-4acb-aee6-97b8f5e19718", "metadata": { "test_replace": { "init_pipeline(models_dict, device.value, use_flash_lora.value)": "init_pipeline(models_dict, device.value, use_flash_lora.value, 32)" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Models compilation\n", "transformer - Done!\n", "vae - Done!\n", "text_encoder - Done!\n", "text_encoder_2 - Done!\n" ] } ], "source": [ "from sd3_helper import TEXT_ENCODER_PATH, TEXT_ENCODER_2_PATH, TEXT_ENCODER_3_PATH, TRANSFORMER_PATH, VAE_DECODER_PATH\n", "\n", "models_dict = {\"transformer\": TRANSFORMER_PATH, \"vae\": VAE_DECODER_PATH, \"text_encoder\": TEXT_ENCODER_PATH, \"text_encoder_2\": TEXT_ENCODER_2_PATH}\n", "\n", "if load_t5.value:\n", "    models_dict[\"text_encoder_3\"] = TEXT_ENCODER_3_PATH\n", "\n", "ov_pipe = init_pipeline(models_dict, device.value, use_flash_lora.value)" ] },
{ "cell_type": "code", "execution_count": 10,
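"id": "compile-model-sketch", "metadata": {}, "outputs": [], "source": [ "# Note: an illustrative sketch, not part of the original pipeline code. `init_pipeline` above compiles the\n", "# converted models from `models_dict` on the selected device (see the Models compilation output). The guarded\n", "# snippet below shows the basic OpenVINO pattern for a single converted IR; it is disabled by default to avoid\n", "# compiling the large transformer model a second time (set the flag to True if you want to try it).\n", "import openvino as ov\n", "\n", "run_compile_sketch = False\n", "\n", "if run_compile_sketch and TRANSFORMER_PATH.exists():\n", "    core = ov.Core()\n", "    # compile_model reads the IR from disk and prepares it for the chosen device\n", "    compiled_transformer = core.compile_model(str(TRANSFORMER_PATH), device.value)\n", "    print(f\"Compiled {TRANSFORMER_PATH.name} on {device.value}\")" ] }, { "cell_type": "code", "execution_count": 10,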
"id": "dcdb8b0f-b2fb-4f70-b0e1-3ecb06e30c7d", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "22f9182c618341e296a883fa41f36230", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import torch\n", "\n", "image = ov_pipe(\n", " \"A raccoon trapped inside a glass jar full of colorful candies, the background is steamy with vivid colors\",\n", " negative_prompt=\"\",\n", " num_inference_steps=28 if not use_flash_lora.value else 4,\n", " guidance_scale=5 if not use_flash_lora.value else 0,\n", " height=512,\n", " width=512,\n", " generator=torch.Generator().manual_seed(141),\n", ").images[0]\n", "image" ] }, { "cell_type": "markdown", "id": "e3cd1ba0", "metadata": {}, "source": [ "## Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "[NNCF](https://github.com/openvinotoolkit/nncf/) enables post-training quantization by adding quantization layers into model graph and then using a subset of the training dataset to initialize the parameters of these additional quantization layers. Quantized operations are executed in `INT8` instead of `FP32`/`FP16` making model inference faster.\n", "\n", "According to `OVStableDiffusion3Pipeline` structure, the `transformer` model takes up significant portion of the overall pipeline execution time. Now we will show you how to optimize the UNet part using [NNCF](https://github.com/openvinotoolkit/nncf/) to reduce computation cost and speed up the pipeline. Quantizing the rest of the pipeline does not significantly improve inference performance but can lead to a substantial degradation of accuracy. That's why we use 4-bit weight compression for the rest of the pipeline to reduce memory footprint.\n", "\n", "Please select below whether you would like to run quantization to improve model inference speed.\n", "\n", "> **NOTE**: Quantization is time and memory consuming operation. Running quantization code below may take some time." ] }, { "cell_type": "code", "execution_count": 11, "id": "73995ea5", "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...\n", "To disable this warning, you can either:\n", "\t- Avoid using `tokenizers` before the fork if possible\n", "\t- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "INFO:nncf:NNCF initialized successfully. 
Supported frameworks detected: torch, tensorflow, onnx, openvino\n" ] }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "80e8be075e9148c785f7c32a5b879b0f", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Quantization')" ] }, "execution_count": 11, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from notebook_utils import quantization_widget\n", "from sd3_quantization_helper import TRANSFORMER_INT8_PATH, TEXT_ENCODER_INT4_PATH, TEXT_ENCODER_2_INT4_PATH, TEXT_ENCODER_3_INT4_PATH, VAE_DECODER_INT4_PATH\n", "\n", "to_quantize = quantization_widget()\n", "\n", "to_quantize" ] }, { "cell_type": "markdown", "id": "40c7e6c3", "metadata": {}, "source": [ "Let's load `skip magic` extension to skip quantization if `to_quantize` is not selected" ] }, { "cell_type": "code", "execution_count": 12, "id": "7925a97d", "metadata": {}, "outputs": [], "source": [ "# Fetch `skip_kernel_extension` module\n", "import requests\n", "\n", "r = requests.get(\n", " url=\"https://raw.githubusercontent.com/openvinotoolkit/openvino_notebooks/latest/utils/skip_kernel_extension.py\",\n", ")\n", "open(\"skip_kernel_extension.py\", \"w\").write(r.text)\n", "\n", "optimized_pipe = None\n", "\n", "opt_models_dict = {\n", " \"transformer\": TRANSFORMER_INT8_PATH,\n", " \"text_encoder\": TEXT_ENCODER_INT4_PATH,\n", " \"text_encoder_2\": TEXT_ENCODER_2_INT4_PATH,\n", " \"vae\": VAE_DECODER_INT4_PATH,\n", "}\n", "\n", "if TEXT_ENCODER_3_PATH.exists():\n", " opt_models_dict[\"text_encoder_3\"] = TEXT_ENCODER_3_INT4_PATH\n", "\n", "%load_ext skip_kernel_extension" ] }, { "cell_type": "markdown", "id": "713267b2", "metadata": {}, "source": [ "### Prepare calibration dataset\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "We use a portion of [`google-research-datasets/conceptual_captions`](https://huggingface.co/datasets/google-research-datasets/conceptual_captions) dataset from Hugging Face as calibration data. We use prompts below to guide image generation and to determine what not to include in the resulting image." ] }, { "cell_type": "markdown", "id": "dfabab36", "metadata": {}, "source": [ "To collect intermediate model inputs for calibration we should customize `CompiledModel`. We should set the height and width of the image to 512 to reduce memory consumption during quantization." ] }, { "cell_type": "code", "execution_count": 13, "id": "fb18ba2a", "metadata": {}, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "from sd3_quantization_helper import collect_calibration_data, TRANSFORMER_INT8_PATH\n", "\n", "# Uncomment the line to see calibration data collection code\n", "# ??collect_calibration_data\n" ] }, { "cell_type": "markdown", "id": "247bf1ad", "metadata": {}, "source": [ "### Run Quantization\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Quantization of the first `Convolution` layer impacts the generation results. We recommend using `IgnoredScope` to keep accuracy sensitive layers in FP16 precision." 
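, "\n", "\n", "The quantization cell below first calls `collect_calibration_data` from [sd3_quantization_helper.py](./sd3_quantization_helper.py) to gather calibration samples. The next cell is only an illustrative sketch of the general pattern such a helper can follow: a wrapper that caches the inputs passed to the compiled transformer while the pipeline runs on calibration prompts. The class and attribute names in the sketch are assumptions, not the helper's actual code." ] }, { "cell_type": "code", "execution_count": null, "id": "calibration-pattern-sketch", "metadata": {}, "outputs": [], "source": [ "# Illustrative sketch only: the general calibration-data collection pattern.\n", "# Running this cell just defines a wrapper class; it does not perform any inference.\n", "\n", "\n", "class InputCachingWrapper:\n", "    \"\"\"Wraps a callable (e.g. the compiled transformer) and records every input passed to it.\"\"\"\n", "\n", "    def __init__(self, wrapped_model):\n", "        self.wrapped_model = wrapped_model\n", "        self.cached_inputs = []\n", "\n", "    def __call__(self, *args, **kwargs):\n", "        # remember the inputs of this call, then delegate to the wrapped model\n", "        self.cached_inputs.append((args, kwargs))\n", "        return self.wrapped_model(*args, **kwargs)\n", "\n", "\n", "# Idea: temporarily replace the pipeline's transformer with InputCachingWrapper(transformer),\n", "# run the pipeline on prompts from the calibration dataset, and pass the cached inputs to nncf.Dataset(...)."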
] }, { "cell_type": "code", "execution_count": 14, "id": "8f3005bc", "metadata": { "test_replace": { "__module.model.base_model.model.pos_embed.proj.base_layer/aten::_convolution/Convolution": "__module.pos_embed.proj/aten::_convolution/Convolution" } }, "outputs": [], "source": [ "%%skip not $to_quantize.value\n", "\n", "import nncf\n", "import gc\n", "import openvino as ov\n", "\n", "core = ov.Core()\n", "\n", "\n", "if not TRANSFORMER_INT8_PATH.exists():\n", "    calibration_dataset_size = 200\n", "    print(\"Calibration data collection started\")\n", "    unet_calibration_data = collect_calibration_data(\n", "        ov_pipe,\n", "        calibration_dataset_size=calibration_dataset_size,\n", "        num_inference_steps=28 if not use_flash_lora.value else 4,\n", "        guidance_scale=5 if not use_flash_lora.value else 0,\n", "    )\n", "    print(\"Calibration data collection finished\")\n", "\n", "    del ov_pipe\n", "    gc.collect()\n", "    ov_pipe = None\n", "\n", "    transformer = core.read_model(TRANSFORMER_PATH)\n", "    quantized_model = nncf.quantize(\n", "        model=transformer,\n", "        calibration_dataset=nncf.Dataset(unet_calibration_data),\n", "        subset_size=calibration_dataset_size,\n", "        model_type=nncf.ModelType.TRANSFORMER,\n", "        ignored_scope=nncf.IgnoredScope(names=[\"__module.model.base_model.model.pos_embed.proj.base_layer/aten::_convolution/Convolution\"]),\n", "    )\n", "\n", "    ov.save_model(quantized_model, TRANSFORMER_INT8_PATH)" ] },
{ "cell_type": "markdown", "id": "7fe60a92", "metadata": {}, "source": [ "### Run Weights Compression\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n",
"Quantizing the `Text Encoders` and the `Autoencoder` does not significantly improve inference performance but can lead to a substantial degradation of accuracy.\n", "\n",
"To reduce model memory consumption, we will use weight compression. The [Weights Compression](https://docs.openvino.ai/2024/openvino-workflow/model-optimization-guide/weight-compression.html) algorithm is aimed at compressing the weights of models and can be used to optimize the footprint and performance of large models where the size of weights is relatively larger than the size of activations, for example, Large Language Models (LLMs). Compared to INT8 compression, INT4 compression improves performance even more but introduces a minor drop in prediction quality." ] },
{ "cell_type": "code", "execution_count": 15, "id": "4e732aeb", "metadata": { "test_replace": { "compress_models()": "compress_models(-1)" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Compressed text_encoder can be found in stable-diffusion-3/text_encoder_int4.xml\n", "Compressed text_encoder_2 can be found in stable-diffusion-3/text_encoder_2_int4.xml\n", "Compressed vae_decoder can be found in stable-diffusion-3/vae_decoder_int4.xml\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "from sd3_quantization_helper import compress_models\n", "\n", "compress_models()" ] },
{ "cell_type": "markdown", "id": "22d6caaf", "metadata": {}, "source": [ "Let's compare the images generated by the original and optimized pipelines."
] }, { "cell_type": "code", "execution_count": 16, "id": "b58e683f", "metadata": { "test_replace": { "init_pipeline(opt_models_dict, device.value, use_flash_lora.value)": "init_pipeline(opt_models_dict, device.value, use_flash_lora.value, 32)" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Models compilation\n", "transformer - Done!\n", "text_encoder - Done!\n", "text_encoder_2 - Done!\n", "vae - Done!\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "optimized_pipe = init_pipeline(opt_models_dict, device.value, use_flash_lora.value)" ] }, { "cell_type": "code", "execution_count": 17, "id": "e998c6d9", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "7c6d6f5156f048a9867552d37d9098fa", "version_major": 2, "version_minor": 0 }, "text/plain": [ " 0%| | 0/4 [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "from sd3_quantization_helper import visualize_results\n", "\n", "opt_image = optimized_pipe(\n", " \"A raccoon trapped inside a glass jar full of colorful candies, the background is steamy with vivid colors\",\n", " negative_prompt=\"\",\n", " num_inference_steps=28 if not use_flash_lora.value else 4,\n", " guidance_scale=5 if not use_flash_lora.value else 0,\n", " height=512,\n", " width=512,\n", " generator=torch.Generator().manual_seed(141),\n", ").images[0]\n", "\n", "visualize_results(image, opt_image)" ] }, { "cell_type": "markdown", "id": "ca37a552", "metadata": {}, "source": [ "### Compare model file sizes\n", "[back to top ⬆️](#Table-of-contents:)" ] }, { "cell_type": "code", "execution_count": 18, "id": "7af2c81e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "transformer compression rate: 1.939\n", "text_encoder compression rate: 2.714\n", "text_encoder_2 compression rate: 3.057\n", "vae_decoder compression rate: 2.007\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "from sd3_quantization_helper import compare_models_size\n", "\n", "del optimized_pipe\n", "gc.collect()\n", "\n", "compare_models_size()" ] }, { "cell_type": "markdown", "id": "734e3cd6", "metadata": {}, "source": [ "### Compare inference time of the FP16 and optimized pipelines\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "To measure the inference performance of the `FP16` and optimized pipelines, we use mean inference time on 5 samples.\n", "\n", "> **NOTE**: For the most accurate performance estimation, it is recommended to run `benchmark_app` in a terminal/command prompt after closing other applications." 
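, "\n", "\n", "As an optional illustration, the commented-out cell below shows what such a standalone `benchmark_app` run could look like; the exact arguments are an assumption, and models with dynamic input shapes may additionally require `-shape`/`-data_shape`." ] }, { "cell_type": "code", "execution_count": null, "id": "benchmark-app-example", "metadata": {}, "outputs": [], "source": [ "# Optional illustration (uncomment to try): measure raw model performance with benchmark_app,\n", "# the OpenVINO benchmarking tool. -m selects the model, -d the device, -t the run time in seconds.\n", "# Models with dynamic inputs may also need -shape/-data_shape arguments.\n", "\n", "# !benchmark_app -m {TRANSFORMER_INT8_PATH} -d {device.value} -t 15"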
] }, { "cell_type": "code", "execution_count": 19, "id": "dbbe98f9", "metadata": { "test_replace": { "compare_perf(models_dict, opt_models_dict, device.value, use_flash_lora.value, validation_size=5)": "compare_perf(models_dict, opt_models_dict, device.value, use_flash_lora.value, 5, 32)" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Load FP16 pipeline\n", "Models compilation\n", "transformer - Done!\n", "vae - Done!\n", "text_encoder - Done!\n", "text_encoder_2 - Done!\n", "Load Optimized pipeline\n", "Models compilation\n", "transformer - Done!\n", "text_encoder - Done!\n", "text_encoder_2 - Done!\n", "vae - Done!\n", "Performance speed-up: 1.540\n" ] } ], "source": [ "%%skip not $to_quantize.value\n", "\n", "from sd3_quantization_helper import compare_perf\n", "\n", "compare_perf(models_dict, opt_models_dict, device.value, use_flash_lora.value, validation_size=5)" ] }, { "cell_type": "markdown", "id": "742d515d-4565-4e2a-bfa1-1854ff6fd726", "metadata": {}, "source": [ "## Interactive demo\n", "[back to top ⬆️](#Table-of-contents:)\n", "\n", "Please select below whether you would like to use the quantized models to launch the interactive demo." ] }, { "cell_type": "code", "execution_count": 20, "id": "2307e06c", "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "5ef24a84244544c3872350059660503b", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Checkbox(value=True, description='Use quantized models')" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sd3_helper import get_pipeline_selection_option\n", "\n", "use_quantized_models = get_pipeline_selection_option(opt_models_dict)\n", "\n", "use_quantized_models" ] }, { "cell_type": "code", "execution_count": null, "id": "969f005a-2e6f-4709-b596-4fd0ea99a445", "metadata": { "tags": [] }, "outputs": [], "source": [ "from gradio_helper import make_demo\n", "\n", "ov_pipe = init_pipeline(models_dict if not use_quantized_models.value else opt_models_dict, device.value, use_flash_lora.value)\n", "demo = make_demo(ov_pipe, use_flash_lora.value)\n", "\n", "# if you are launching remotely, specify server_name and server_port\n", "# demo.launch(server_name='your server name', server_port='server port in int')\n", "# if you have any issue to launch on your platform, you can pass share=True to launch method:\n", "# demo.launch(share=True)\n", "# it creates a publicly shareable link for the interface. Read more in the docs: https://gradio.app/docs/\n", "try:\n", " demo.launch(debug=True)\n", "except Exception:\n", " demo.launch(debug=True, share=True)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" }, "openvino_notebooks": { "imageUrl": "https://github.com/openvinotoolkit/openvino_notebooks/assets/29454499/ac99098c-66ec-4b7b-9e01-e80625f1dc3f", "tags": { "categories": [ "Model Demos", "AI Trends" ], "libraries": [], "other": [ "Stable Diffusion" ], "tasks": [ "Text-to-Image" ] } }, "widgets": { "application/vnd.jupyter.widget-state+json": { "state": {}, "version_major": 2, "version_minor": 0 } } }, "nbformat": 4, "nbformat_minor": 5 }