{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Copyright 2019 NVIDIA Corporation. All Rights Reserved.\n", "#\n", "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", "# you may not use this file except in compliance with the License.\n", "# You may obtain a copy of the License at\n", "#\n", "# http://www.apache.org/licenses/LICENSE-2.0\n", "#\n", "# Unless required by applicable law or agreed to in writing, software\n", "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", "# See the License for the specific language governing permissions and\n", "# limitations under the License.\n", "# ==============================================================================" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "\n", "# Torch-TensorRT Getting Started - ResNet 50" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Overview\n", "\n", "In the practice of developing machine learning models, there are few tools as approachable as PyTorch for designing and experimenting with machine learning models. The power of PyTorch comes from its deep integration into Python, its flexibility, and its approach to automatic differentiation and execution (eager execution). However, when moving from research into production, the requirements change: we may no longer want that deep Python integration, and we want optimizations that deliver the best possible performance on our deployment platform. In PyTorch 1.0, TorchScript was introduced as a way to separate your PyTorch model from Python and make it portable and optimizable. 
TorchScript uses PyTorch's JIT compiler to transform your normal PyTorch code, which is run by the Python interpreter, into an intermediate representation (IR) that can be optimized and, at runtime, executed by the PyTorch JIT interpreter. For PyTorch this has opened up a whole new world of possibilities, including deployment in other languages like C++. It also introduces a structured, graph-based format that we can use to perform kernel-level optimization of models for inference.\n", "\n", "When deploying on NVIDIA GPUs, TensorRT, NVIDIA's Deep Learning Optimization SDK and Runtime, is able to take models from any major framework and specifically tune them to perform better on specific target hardware in the NVIDIA family, be it an A100, TITAN V, Jetson Xavier or NVIDIA's Deep Learning Accelerator. TensorRT performs two sets of optimizations to achieve this: it fuses layers and tensors in the model graph, and it then uses a large kernel library to select the implementations that perform best on the target GPU. TensorRT also has strong support for reduced-precision execution, which allows users to leverage the Tensor Cores on Volta and newer GPUs and reduces memory and computation footprints on device.\n", "\n", "Torch-TensorRT is a compiler that uses TensorRT to optimize TorchScript code, compiling standard TorchScript modules into ones that internally run with TensorRT optimizations. This lets you remain in the PyTorch ecosystem, using all the great features PyTorch has, such as module composability, its flexible tensor implementation, data loaders and more. Torch-TensorRT is available to use with both PyTorch and LibTorch."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Learning objectives\n", "\n", "This notebook demonstrates the steps for compiling a TorchScript module with Torch-TensorRT on a pretrained ResNet-50 network, and running it to test the speedup obtained.\n", "\n", "## Content\n", "1. [Requirements](#1)\n", "1. [ResNet-50 Overview](#2)\n", "1. [Creating TorchScript modules](#3)\n", "1. [Compiling with Torch-TensorRT](#4)\n", "1. [Conclusion](#5)" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Requirement already satisfied: numpy==1.21.2 in /opt/conda/lib/python3.8/site-packages (1.21.2)\n", "Requirement already satisfied: scipy==1.5.2 in /opt/conda/lib/python3.8/site-packages (1.5.2)\n", "Requirement already satisfied: Pillow==6.2.0 in /opt/conda/lib/python3.8/site-packages (6.2.0)\n", "Requirement already satisfied: scikit-image==0.17.2 in /opt/conda/lib/python3.8/site-packages (0.17.2)\n", "Requirement already satisfied: matplotlib==3.3.0 in /opt/conda/lib/python3.8/site-packages (3.3.0)\n", "Requirement already satisfied: PyWavelets>=1.1.1 in /opt/conda/lib/python3.8/site-packages (from scikit-image==0.17.2) (1.1.1)\n", "Requirement already satisfied: tifffile>=2019.7.26 in /opt/conda/lib/python3.8/site-packages (from scikit-image==0.17.2) (2021.10.12)\n", "Requirement already satisfied: networkx>=2.0 in /opt/conda/lib/python3.8/site-packages (from scikit-image==0.17.2) (2.0)\n", "Requirement already satisfied: imageio>=2.3.0 in /opt/conda/lib/python3.8/site-packages (from scikit-image==0.17.2) (2.9.0)\n", "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.3 in /opt/conda/lib/python3.8/site-packages (from matplotlib==3.3.0) (2.4.7)\n", "Requirement already satisfied: python-dateutil>=2.1 in /opt/conda/lib/python3.8/site-packages (from matplotlib==3.3.0) (2.8.2)\n", "Requirement already satisfied: kiwisolver>=1.0.1 in 
/opt/conda/lib/python3.8/site-packages (from matplotlib==3.3.0) (1.3.2)\n", "Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.8/site-packages (from matplotlib==3.3.0) (0.10.0)\n", "Requirement already satisfied: six in /opt/conda/lib/python3.8/site-packages (from cycler>=0.10->matplotlib==3.3.0) (1.16.0)\n", "Requirement already satisfied: decorator>=4.1.0 in /opt/conda/lib/python3.8/site-packages (from networkx>=2.0->scikit-image==0.17.2) (5.0.9)\n", "\u001b[33mWARNING: Running pip as the 'root' user can result in broken permissions and conflicting behaviour with the system package manager. It is recommended to use a virtual environment instead: https://pip.pypa.io/warnings/venv\u001b[0m\n", "Tue Oct 26 00:13:31 2021 \n", "+-----------------------------------------------------------------------------+\n", "| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.4 |\n", "|-------------------------------+----------------------+----------------------+\n", "| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n", "| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n", "| | | MIG M. |\n", "|===============================+======================+======================|\n", "| 0 Tesla V100-PCIE... On | 00000000:1A:00.0 Off | 0 |\n", "| N/A 37C P0 24W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 1 Tesla V100-PCIE... On | 00000000:1B:00.0 Off | 0 |\n", "| N/A 33C P0 22W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 2 Tesla V100-PCIE... On | 00000000:3D:00.0 Off | 0 |\n", "| N/A 32C P0 24W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 3 Tesla V100-PCIE... 
On | 00000000:3E:00.0 Off | 0 |\n", "| N/A 33C P0 24W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 4 Tesla V100-PCIE... On | 00000000:88:00.0 Off | 0 |\n", "| N/A 32C P0 25W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 5 Tesla V100-PCIE... On | 00000000:89:00.0 Off | 0 |\n", "| N/A 31C P0 22W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 6 Tesla V100-PCIE... On | 00000000:B1:00.0 Off | 0 |\n", "| N/A 32C P0 24W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", "| 7 Tesla V100-PCIE... On | 00000000:B2:00.0 Off | 0 |\n", "| N/A 32C P0 25W / 250W | 0MiB / 32510MiB | 0% Default |\n", "| | | N/A |\n", "+-------------------------------+----------------------+----------------------+\n", " \n", "+-----------------------------------------------------------------------------+\n", "| Processes: |\n", "| GPU GI CI PID Type Process name GPU Memory |\n", "| ID ID Usage |\n", "|=============================================================================|\n", "| No running processes found |\n", "+-----------------------------------------------------------------------------+\n" ] } ], "source": [ "!pip install numpy==1.21.2 scipy==1.5.2 Pillow==6.2.0 scikit-image==0.17.2 matplotlib==3.3.0\n", "!nvidia-smi" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 1. Requirements\n", "\n", "Follow the steps in `notebooks/README` to prepare a Docker container, within which you can run this notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 2. 
ResNet-50 Overview\n", "\n", "\n", "PyTorch has a model repository called the PyTorch Hub, which is a source for high-quality implementations of common models. We can get our ResNet-50 model, pretrained on ImageNet, from there.\n", "\n", "### Model Description\n", "\n", "This ResNet-50 model is based on the [Deep Residual Learning for Image Recognition](https://arxiv.org/pdf/1512.03385.pdf) paper, which introduces residual learning: shortcut connections let each block learn a residual function with reference to its input, which makes very deep networks easier to train. The pretrained model expects input images of at least 224x224 pixels." ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "scrolled": true }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "Using cache found in /root/.cache/torch/hub/pytorch_vision_v0.10.0\n" ] }, { "data": { "text/plain": [ "ResNet(\n", " (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)\n", " (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)\n", " (layer1): Sequential(\n", " (0): Bottleneck(\n", " (conv1): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (1): 
Bottleneck(\n", " (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (2): Bottleneck(\n", " (conv1): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " )\n", " (layer2): Sequential(\n", " (0): Bottleneck(\n", " (conv1): Conv2d(256, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)\n", " (1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, 
track_running_stats=True)\n", " )\n", " )\n", " (1): Bottleneck(\n", " (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (2): Bottleneck(\n", " (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (3): Bottleneck(\n", " (conv1): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " )\n", " (layer3): Sequential(\n", " (0): Bottleneck(\n", " (conv1): Conv2d(512, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): 
BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2), bias=False)\n", " (1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (1): Bottleneck(\n", " (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (2): Bottleneck(\n", " (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (3): Bottleneck(\n", " (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), 
bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (4): Bottleneck(\n", " (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (5): Bottleneck(\n", " (conv1): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " )\n", " (layer4): Sequential(\n", " (0): Bottleneck(\n", " (conv1): Conv2d(1024, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(512, 512, kernel_size=(3, 3), 
stride=(2, 2), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " (downsample): Sequential(\n", " (0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2), bias=False)\n", " (1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " )\n", " )\n", " (1): Bottleneck(\n", " (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " (2): Bottleneck(\n", " (conv1): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)\n", " (bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (conv3): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1), bias=False)\n", " (bn3): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)\n", " (relu): ReLU(inplace=True)\n", " )\n", " )\n", " (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))\n", " (fc): Linear(in_features=2048, out_features=1000, bias=True)\n", ")" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ 
"import torch\n", "import torchvision\n", "\n", "# Skip torch.hub's check that the repo is not a fork; that check queries the GitHub API and can fail (e.g. when rate-limited)\n", "torch.hub._validate_not_a_forked_repo=lambda a,b,c: True\n", "\n", "resnet50_model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet50', pretrained=True)\n", "resnet50_model.eval()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "All pre-trained models expect input images normalized in the same way,\n", "i.e. mini-batches of 3-channel RGB images of shape `(3 x H x W)`, where `H` and `W` are expected to be at least `224`.\n", "The images have to be loaded into a range of `[0, 1]` and then normalized using `mean = [0.485, 0.456, 0.406]`\n", "and `std = [0.229, 0.224, 0.225]`.\n", "\n", "Here's a sample execution." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "mkdir: cannot create directory ‘./data’: File exists\n", "--2021-10-26 00:13:33-- https://d17fnq9dkz9hgj.cloudfront.net/breed-uploads/2018/08/siberian-husky-detail.jpg?bust=1535566590&width=630\n", "Resolving d17fnq9dkz9hgj.cloudfront.net (d17fnq9dkz9hgj.cloudfront.net)... 13.226.251.36, 13.226.251.27, 13.226.251.107, ...\n", "Connecting to d17fnq9dkz9hgj.cloudfront.net (d17fnq9dkz9hgj.cloudfront.net)|13.226.251.36|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 24112 (24K) [image/jpeg]\n", "Saving to: ‘./data/img0.JPG’\n", "\n", "./data/img0.JPG 100%[===================>] 23.55K --.-KB/s in 0.002s \n", "\n", "2021-10-26 00:13:34 (13.1 MB/s) - ‘./data/img0.JPG’ saved [24112/24112]\n", "\n", "--2021-10-26 00:13:34-- https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg\n", "Resolving www.hakaimagazine.com (www.hakaimagazine.com)... 23.185.0.4, 2620:12a:8001::4, 2620:12a:8000::4\n", "Connecting to www.hakaimagazine.com (www.hakaimagazine.com)|23.185.0.4|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 452718 (442K) [image/jpeg]\n", "Saving to: ‘./data/img1.JPG’\n", "\n", "./data/img1.JPG 100%[===================>] 442.11K --.-KB/s in 0.02s \n", "\n", "2021-10-26 00:13:35 (28.3 MB/s) - ‘./data/img1.JPG’ saved [452718/452718]\n", "\n", "--2021-10-26 00:13:36-- https://www.artis.nl/media/filer_public_thumbnails/filer_public/00/f1/00f1b6db-fbed-4fef-9ab0-84e944ff11f8/chimpansee_amber_r_1920x1080.jpg__1920x1080_q85_subject_location-923%2C365_subsampling-2.jpg\n", "Resolving www.artis.nl (www.artis.nl)... 94.75.225.20\n", "Connecting to www.artis.nl (www.artis.nl)|94.75.225.20|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 361413 (353K) [image/jpeg]\n", "Saving to: ‘./data/img2.JPG’\n", "\n", "./data/img2.JPG 100%[===================>] 352.94K 790KB/s in 0.4s \n", "\n", "2021-10-26 00:13:38 (790 KB/s) - ‘./data/img2.JPG’ saved [361413/361413]\n", "\n", "--2021-10-26 00:13:38-- https://www.familyhandyman.com/wp-content/uploads/2018/09/How-to-Avoid-Snakes-Slithering-Up-Your-Toilet-shutterstock_780480850.jpg\n", "Resolving www.familyhandyman.com (www.familyhandyman.com)... 104.18.201.107, 104.18.202.107, 2606:4700::6812:ca6b, ...\n", "Connecting to www.familyhandyman.com (www.familyhandyman.com)|104.18.201.107|:443... connected.\n", "HTTP request sent, awaiting response... 200 OK\n", "Length: 90994 (89K) [image/jpeg]\n", "Saving to: ‘./data/img3.JPG’\n", "\n", "./data/img3.JPG 100%[===================>] 88.86K --.-KB/s in 0.009s \n", "\n", "2021-10-26 00:13:38 (9.64 MB/s) - ‘./data/img3.JPG’ saved [90994/90994]\n", "\n", "--2021-10-26 00:13:39-- https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json\n", "Resolving s3.amazonaws.com (s3.amazonaws.com)... 52.217.225.168\n", "Connecting to s3.amazonaws.com (s3.amazonaws.com)|52.217.225.168|:443... connected.\n", "HTTP request sent, awaiting response... 
200 OK\n", "Length: 35363 (35K) [application/octet-stream]\n", "Saving to: ‘./data/imagenet_class_index.json’\n", "\n", "./data/imagenet_cla 100%[===================>] 34.53K --.-KB/s in 0.07s \n", "\n", "2021-10-26 00:13:39 (486 KB/s) - ‘./data/imagenet_class_index.json’ saved [35363/35363]\n", "\n" ] } ], "source": [ "!mkdir ./data\n", "!wget -O ./data/img0.JPG \"https://d17fnq9dkz9hgj.cloudfront.net/breed-uploads/2018/08/siberian-husky-detail.jpg?bust=1535566590&width=630\"\n", "!wget -O ./data/img1.JPG \"https://www.hakaimagazine.com/wp-content/uploads/header-gulf-birds.jpg\"\n", "!wget -O ./data/img2.JPG \"https://www.artis.nl/media/filer_public_thumbnails/filer_public/00/f1/00f1b6db-fbed-4fef-9ab0-84e944ff11f8/chimpansee_amber_r_1920x1080.jpg__1920x1080_q85_subject_location-923%2C365_subsampling-2.jpg\"\n", "!wget -O ./data/img3.JPG \"https://www.familyhandyman.com/wp-content/uploads/2018/09/How-to-Avoid-Snakes-Slithering-Up-Your-Toilet-shutterstock_780480850.jpg\"\n", "\n", "!wget -O ./data/imagenet_class_index.json \"https://s3.amazonaws.com/deep-learning-models/image-models/imagenet_class_index.json\"" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "from PIL import Image\n", "from torchvision import transforms\n", "import matplotlib.pyplot as plt\n", "\n", "fig, axes = plt.subplots(nrows=2, ncols=2)\n", "\n", "for i in range(4):\n", " img_path = './data/img%d.JPG'%i\n", " img = Image.open(img_path)\n", " preprocess = transforms.Compose([\n", " transforms.Resize(256),\n", " transforms.CenterCrop(224),\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", "])\n", " input_tensor = preprocess(img) \n", " plt.subplot(2,2,i+1)\n", " plt.imshow(img)\n", " plt.axis('off')" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Number of classes in ImageNet: 1000\n" ] } ], "source": [ "import json \n", " \n", "with open(\"./data/imagenet_class_index.json\") as json_file: \n", " d = json.load(json_file)\n", " \n", "print(\"Number of classes in ImageNet: {}\".format(len(d)))" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "\n", "def rn50_preprocess():\n", " preprocess = transforms.Compose([\n", " transforms.Resize(256),\n", " transforms.CenterCrop(224),\n", " transforms.ToTensor(),\n", " transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),\n", " ])\n", " return preprocess\n", "\n", "# decode the results into ([predicted class, description], probability)\n", "def predict(img_path, model):\n", " img = Image.open(img_path)\n", " preprocess = rn50_preprocess()\n", " input_tensor = preprocess(img)\n", " input_batch = input_tensor.unsqueeze(0) # create a mini-batch as expected by the model\n", " \n", " # move the input and model to GPU for speed if available\n", " if torch.cuda.is_available():\n", " input_batch = input_batch.to('cuda')\n", " model.to('cuda')\n", "\n", " with torch.no_grad():\n", " 
output = model(input_batch)\n", " # output[0] holds 1000 unnormalized scores (logits), one per ImageNet class; softmax turns them into probabilities\n", " sm_output = torch.nn.functional.softmax(output[0], dim=0)\n", " \n", " ind = torch.argmax(sm_output)\n", " return d[str(ind.item())], sm_output[ind] #([predicted class, description], probability)" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "./data/img0.JPG - Predicted: ['n02110185', 'Siberian_husky'], Probability: 0.49787256121635437\n", "./data/img1.JPG - Predicted: ['n01820546', 'lorikeet'], Probability: 0.6447006464004517\n", "./data/img2.JPG - Predicted: ['n02481823', 'chimpanzee'], Probability: 0.9899842739105225\n", "./data/img3.JPG - Predicted: ['n01749939', 'green_mamba'], Probability: 0.4564124643802643\n" ] }, { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "for i in range(4):\n", " img_path = './data/img%d.JPG'%i\n", " img = Image.open(img_path)\n", " \n", " pred, prob = predict(img_path, resnet50_model)\n", " print('{} - Predicted: {}, Probability: {}'.format(img_path, pred, prob))\n", "\n", " plt.subplot(2,2,i+1)\n", " plt.imshow(img);\n", " plt.axis('off');\n", " plt.title(pred[1])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Benchmark utility" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us define a helper function to benchmark a model." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [], "source": [ "import time\n", "import numpy as np\n", "\n", "import torch.backends.cudnn as cudnn\n", "cudnn.benchmark = True\n", "\n", "def benchmark(model, input_shape=(1024, 3, 224, 224), dtype='fp32', nwarmup=50, nruns=10000):\n", " input_data = torch.randn(input_shape)\n", " input_data = input_data.to(\"cuda\")\n", " if dtype=='fp16':\n", " input_data = input_data.half()\n", " \n", " print(\"Warm up ...\")\n", " with torch.no_grad():\n", " for _ in range(nwarmup):\n", " features = model(input_data)\n", " torch.cuda.synchronize()\n", " print(\"Start timing ...\")\n", " timings = []\n", " with torch.no_grad():\n", " for i in range(1, nruns+1):\n", " start_time = time.perf_counter()\n", " features = model(input_data)\n", " torch.cuda.synchronize() # wait for the GPU to finish before stopping the timer\n", " end_time = time.perf_counter()\n", " timings.append(end_time - start_time)\n", " if i%10==0:\n", " print('Iteration %d/%d, ave batch time %.2f ms'%(i, nruns, np.mean(timings)*1000))\n", "\n", " print(\"Input shape:\", input_data.size())\n", " print(\"Output features size:\", features.size())\n", " print('Average batch time: %.2f ms'%(np.mean(timings)*1000))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warm up ...\n", "Start timing 
...\n", "Iteration 10/100, ave batch time 109.12 ms\n", "Iteration 20/100, ave batch time 109.08 ms\n", "Iteration 30/100, ave batch time 109.10 ms\n", "Iteration 40/100, ave batch time 109.12 ms\n", "Iteration 50/100, ave batch time 109.11 ms\n", "Iteration 60/100, ave batch time 109.10 ms\n", "Iteration 70/100, ave batch time 109.10 ms\n", "Iteration 80/100, ave batch time 109.11 ms\n", "Iteration 90/100, ave batch time 109.13 ms\n", "Iteration 100/100, ave batch time 109.13 ms\n", "Input shape: torch.Size([128, 3, 224, 224])\n", "Output features size: torch.Size([128, 1000])\n", "Average batch time: 109.13 ms\n" ] } ], "source": [ "# Model benchmark without Torch-TensorRT\n", "model = resnet50_model.eval().to(\"cuda\")\n", "benchmark(model, input_shape=(128, 3, 224, 224), nruns=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 3. Creating TorchScript modules\n", "\n", "To compile with Torch-TensorRT, the model must first be in **TorchScript**. TorchScript is a programming language included in PyTorch which removes the Python dependency that normal PyTorch models have. This conversion is done via a JIT compiler which, given a PyTorch module, generates an equivalent TorchScript module. There are two paths that can be used to generate TorchScript: **Tracing** and **Scripting**.\n", "\n", "- Tracing follows the execution of the PyTorch code and records the ops it sees as TorchScript ops.\n", "- Scripting analyzes the Python source code and generates TorchScript from it; this allows the resulting graph to include control flow, which tracing cannot capture.\n", "\n", "Tracing is more likely to compile successfully with Torch-TensorRT due to its simplicity (though both approaches are supported). We start with an example of a traced model." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Tracing\n", "\n", "Tracing follows the path of execution when the module is called and records what happens. 
This recording is what the TorchScript IR will describe. \n", "\n", "To trace an instance of the model, we can call `torch.jit.trace` with an example input. " ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [], "source": [ "model = resnet50_model.eval().to(\"cuda\")\n", "traced_model = torch.jit.trace(model, [torch.randn((128, 3, 224, 224)).to(\"cuda\")])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can save this model and use it independently of Python." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [], "source": [ "# This is just an example, and not required for the purposes of this demo\n", "torch.jit.save(traced_model, \"resnet_50_traced.jit.pt\")" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warm up ...\n", "Start timing ...\n", "Iteration 10/100, ave batch time 109.04 ms\n", "Iteration 20/100, ave batch time 109.04 ms\n", "Iteration 30/100, ave batch time 109.03 ms\n", "Iteration 40/100, ave batch time 109.05 ms\n", "Iteration 50/100, ave batch time 109.04 ms\n", "Iteration 60/100, ave batch time 109.04 ms\n", "Iteration 70/100, ave batch time 109.04 ms\n", "Iteration 80/100, ave batch time 109.04 ms\n", "Iteration 90/100, ave batch time 109.05 ms\n", "Iteration 100/100, ave batch time 109.06 ms\n", "Input shape: torch.Size([128, 3, 224, 224])\n", "Output features size: torch.Size([128, 1000])\n", "Average batch time: 109.06 ms\n" ] } ], "source": [ "# Obtain the average time taken by a batch of input\n", "benchmark(traced_model, input_shape=(128, 3, 224, 224), nruns=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 4. Compiling with Torch-TensorRT" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "TorchScript modules behave just like normal PyTorch modules and are interoperable with them. From TorchScript we can now compile a TensorRT-based module. 
This module will still be implemented in TorchScript but all the computation will be done in TensorRT.\n", "\n", "As mentioned earlier, we start with an example of Torch-TensorRT compilation with the traced model.\n", "\n", "Note that we show benchmarking results for two precisions: FP32 (single precision) and FP16 (half precision)." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### FP32 (single precision)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: [Torch-TensorRT TorchScript Conversion Context] - The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. In order to use a new logger, first destroy all existing builder, runner or refitter objects.\n", "\n", "WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter\n", "WARNING: [Torch-TensorRT] - Detected invalid timing cache, setup a local cache instead\n" ] } ], "source": [ "import torch_tensorrt\n", "\n", "# The compiled module will use the precision(s) listed in \"enabled_precisions\".\n", "# Here, it will have FP32 precision.\n", "trt_model_fp32 = torch_tensorrt.compile(traced_model, **{\n", " \"inputs\": [torch_tensorrt.Input((128, 3, 224, 224), dtype=torch.float32)],\n", " \"enabled_precisions\": {torch.float32}, # Run with FP32\n", " \"workspace_size\": 1 << 22\n", "})\n", "\n" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warm up ...\n", "Start timing ...\n", "Iteration 10/100, ave batch time 93.58 ms\n", "Iteration 20/100, ave batch time 85.57 ms\n", "Iteration 30/100, ave batch time 92.02 ms\n", "Iteration 40/100, ave batch time 89.07 ms\n", "Iteration 50/100, ave batch time 86.80 
ms\n", "Iteration 60/100, ave batch time 89.88 ms\n", "Iteration 70/100, ave batch time 88.58 ms\n", "Iteration 80/100, ave batch time 87.30 ms\n", "Iteration 90/100, ave batch time 86.28 ms\n", "Iteration 100/100, ave batch time 88.27 ms\n", "Input shape: torch.Size([128, 3, 224, 224])\n", "Output features size: torch.Size([128, 1000])\n", "Average batch time: 88.27 ms\n" ] } ], "source": [ "# Obtain the average time taken by a batch of input\n", "benchmark(trt_model_fp32, input_shape=(128, 3, 224, 224), nruns=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### FP16 (half precision)" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING: [Torch-TensorRT] - For input x.1, found user specified input dtype as Float16, however when inspecting the graph, the input type expected was inferred to be Float\n", "The compiler is going to use the user setting Float16\n", "This conflict may cause an error at runtime due to partial compilation being enabled and therefore\n", "compatibility with PyTorch's data type convention is required.\n", "If you do indeed see errors at runtime either:\n", "- Remove the dtype spec for x.1\n", "- Disable partial compilation by setting require_full_compilation to True\n", "WARNING: [Torch-TensorRT TorchScript Conversion Context] - The logger passed into createInferBuilder differs from one already provided for an existing builder, runtime, or refitter. TensorRT maintains only a single logger pointer at any given time, so the existing value, which can be retrieved with getLogger(), will be used instead. 
In order to use a new logger, first destroy all existing builder, runner or refitter objects.\n", "\n", "WARNING: [Torch-TensorRT] - Dilation not used in Max pooling converter\n", "WARNING: [Torch-TensorRT] - Detected invalid timing cache, setup a local cache instead\n" ] } ], "source": [ "import torch_tensorrt\n", "\n", "# The compiled module will use the precision(s) listed in \"enabled_precisions\".\n", "# Here, both FP32 and FP16 are enabled, so TensorRT may run layers in FP16.\n", "trt_model = torch_tensorrt.compile(traced_model, **{\n", " \"inputs\": [torch_tensorrt.Input((128, 3, 224, 224), dtype=torch.half)],\n", " \"enabled_precisions\": {torch.float, torch.half}, # Run with FP16\n", " \"workspace_size\": 1 << 22\n", "})\n" ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Warm up ...\n", "Start timing ...\n", "Iteration 10/100, ave batch time 18.76 ms\n", "Iteration 20/100, ave batch time 18.85 ms\n", "Iteration 30/100, ave batch time 18.93 ms\n", "Iteration 40/100, ave batch time 18.96 ms\n", "Iteration 50/100, ave batch time 18.92 ms\n", "Iteration 60/100, ave batch time 18.94 ms\n", "Iteration 70/100, ave batch time 18.98 ms\n", "Iteration 80/100, ave batch time 18.97 ms\n", "Iteration 90/100, ave batch time 19.08 ms\n", "Iteration 100/100, ave batch time 22.90 ms\n", "Input shape: torch.Size([128, 3, 224, 224])\n", "Output features size: torch.Size([128, 1000])\n", "Average batch time: 22.90 ms\n" ] } ], "source": [ "# Obtain the average time taken by a batch of input\n", "benchmark(trt_model, input_shape=(128, 3, 224, 224), dtype='fp16', nruns=100)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "## 5. Conclusion\n", "\n", "In this notebook, we have walked through the complete process of compiling a TorchScript ResNet-50 model with Torch-TensorRT and tested the performance impact of the optimization. 
In the runs shown above, Torch-TensorRT reduces the average batch time from 109.13 ms to 88.27 ms with FP32 (a speedup of roughly **1.2X**) and to 22.90 ms with FP16 (roughly **4.8X**).\n", "\n", "### What's next\n", "Now it's time to try Torch-TensorRT on your own model. File issues at https://github.com/NVIDIA/Torch-TensorRT. Your involvement will help future development of Torch-TensorRT.\n" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.10" } }, "nbformat": 4, "nbformat_minor": 4 }