System Overview
Torch-TensorRT is primarily a C++ library, with a Python API planned. We use Bazel as our build system and currently target Linux x86_64 and Linux aarch64 (native compilation only). We build with GCC 7.5.0; the library is untested with older compilers, so you may hit compilation errors if you use one.
The repository is structured into:

- core: Main compiler source code
- cpp: C++ API
- tests: Tests of the C++ API, the core, and converters
- py: Python API
- notebooks: Example applications built with Torch-TensorRT
- docs: Documentation
- docsrc: Documentation source
- third_party: BUILD files for dependency libraries
- toolchains: Toolchains for different platforms
The C++ API is unstable and subject to change until the library matures, though most of the work is done under the hood in the core.
The core has a couple of major parts: the top-level compiler interface, which coordinates ingesting a module, lowering it, converting it, and generating and returning a new module to the user; and the three main phases of the compiler: the lowering phase, the conversion phase, and the execution phase.
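For orientation, here is a minimal sketch of driving the top-level interface through the public C++ API (names follow the torch_tensorrt::ts API; exact signatures may differ between versions, so treat this as illustrative rather than definitive):

```cpp
#include <torch/script.h>
#include "torch_tensorrt/torch_tensorrt.h"

int main() {
  // Ingest a TorchScript module traced or scripted ahead of time
  auto mod = torch::jit::load("model.ts");  // hypothetical path
  mod.to(torch::kCUDA);
  mod.eval();

  // Describe the expected input shape(s) for the engine
  auto spec = torch_tensorrt::ts::CompileSpec(
      {torch_tensorrt::Input(std::vector<int64_t>{1, 3, 224, 224})});

  // Lower, convert, and get back a new TorchScript module
  auto trt_mod = torch_tensorrt::ts::compile(mod, spec);

  // The result behaves like any other TorchScript module
  auto in = torch::randn({1, 3, 224, 224}, torch::kCUDA);
  auto out = trt_mod.forward({in}).toTensor();
  return 0;
}
```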
Compiler Phases
Lowering
Lowering is made up of a set of passes (some from PyTorch and some specific to Torch-TensorRT) that run over the graph IR to map the large PyTorch opset to a reduced opset that is easier to convert to TensorRT.
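Many such passes are pattern rewrites over the JIT IR. As a hedged illustration (not necessarily one of the library's actual passes), a pass that decomposes aten::silu into sigmoid and multiply could be written with PyTorch's SubgraphRewriter:

```cpp
#include <torch/csrc/jit/ir/ir.h>
#include <torch/csrc/jit/passes/subgraph_rewrite.h>

// Illustrative lowering pass: decompose aten::silu into ops that
// downstream converters are more likely to support.
void DecomposeSilu(std::shared_ptr<torch::jit::Graph>& graph) {
  std::string silu_pattern = R"IR(
    graph(%x):
      %out : Tensor = aten::silu(%x)
      return (%out))IR";

  std::string decomposed = R"IR(
    graph(%x):
      %sig : Tensor = aten::sigmoid(%x)
      %out : Tensor = aten::mul(%x, %sig)
      return (%out))IR";

  // Rewrite every occurrence of the pattern in place
  torch::jit::SubgraphRewriter rewriter;
  rewriter.RegisterRewritePattern(silu_pattern, decomposed);
  rewriter.runOnGraph(graph);
}
```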
Partitioning
This phase is optional and enabled by the user. It instructs the compiler to separate nodes into those that should run in PyTorch and those that should run in TensorRT. Criteria for separation include: lack of a converter, the operator being explicitly set to run in PyTorch by the user, or the node carrying a flag that tells partitioning to run it in PyTorch (set by the module fallback passes).
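A sketch of how a user might enable partitioning through the C++ CompileSpec (the field names below follow the public API of recent releases, but check your version's headers):

```cpp
#include "torch_tensorrt/torch_tensorrt.h"

torch_tensorrt::ts::CompileSpec spec(
    {torch_tensorrt::Input(std::vector<int64_t>{1, 3, 224, 224})});

// Allow the compiler to split the graph between PyTorch and TensorRT
spec.require_full_compilation = false;

// Force a specific op to run in PyTorch regardless of converter support
spec.torch_executed_ops.push_back("aten::topk");

// Minimum number of contiguous convertible nodes to form a TensorRT segment
spec.min_block_size = 3;
```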
Conversion
In the conversion phase we traverse the lowered graph and construct an equivalent TensorRT graph. The conversion phase is made up of three main components: a context to manage compile-time data, an evaluator library which executes operations that can be resolved at compile time, and a converter library which maps ops from JIT to TensorRT.
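Converters are registered against JIT operator schemas. The following is a minimal sketch modeled on the converter registration pattern used in the core (these are internal APIs from core/conversion/converters, so helper names and signatures may differ from what ships in any given version):

```cpp
// Sketch of a converter: maps aten::relu onto a TensorRT activation layer.
auto relu_registration = RegisterNodeConversionPatterns().pattern(
    {"aten::relu(Tensor self) -> Tensor",
     [](ConversionCtx* ctx, const torch::jit::Node* n, args& args) -> bool {
       // Inputs arrive as ITensors (or as static values via the evaluator)
       auto in = args[0].ITensor();

       // Add the equivalent layer to the TensorRT network under construction
       auto layer = ctx->net->addActivation(*in, nvinfer1::ActivationType::kRELU);
       layer->setName(util::node_info(n).c_str());

       // Associate the TRT output with the JIT value so later converters can find it
       ctx->AssociateValueAndTensor(n->outputs()[0], layer->getOutput(0));
       return true;
     }});
```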
Compilation and Runtime
See also: Deploying Torch-TensorRT Programs
The final compilation phase constructs a TorchScript program to run the converted TensorRT engine. It takes a serialized engine and instantiates it within an engine manager, then the compiler builds out a JIT graph that references this engine and wraps it in a module to return to the user. When the user executes the module, the JIT program runs in the JIT runtime, extended by Torch-TensorRT, with the data provided by the user.
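Because the result is an ordinary TorchScript module, deploying it is just loading and running it; a minimal sketch (the saved filename is hypothetical, and the process must link against the Torch-TensorRT runtime library so the engine-executing op is registered with the JIT):

```cpp
#include <torch/script.h>

int main() {
  // Load a previously compiled and saved Torch-TensorRT module.
  // "trt_model.ts" is a hypothetical path from an earlier compile step.
  auto mod = torch::jit::load("trt_model.ts");

  auto in = torch::randn({1, 3, 224, 224}, torch::kCUDA);

  // forward() dispatches to the JIT runtime; the embedded TensorRT engine
  // is executed by the runtime extension Torch-TensorRT registers.
  auto out = mod.forward({in}).toTensor();
  return 0;
}
```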