torch_tensorrt.ts
Functions
-
torch_tensorrt.ts.
compile
( module: torch.jit._script.ScriptModule , inputs=[] , device=<torch_tensorrt._Device.Device object> , disable_tf32=False , sparse_weights=False , enabled_precisions={} , refit=False , debug=False , strict_types=False , capability=<EngineCapability.default: 0> , num_min_timing_iters=2 , num_avg_timing_iters=1 , workspace_size=0 , max_batch_size=0 , calibrator=None , truncate_long_and_double=False , require_full_compilation=False , min_block_size=3 , torch_executed_ops=[] , torch_executed_modules=[] ) → torch.jit._script.ScriptModule
Compile a TorchScript module for NVIDIA GPUs using TensorRT
Takes an existing TorchScript module and a set of settings to configure the compiler, and converts methods to JIT graphs which call equivalent TensorRT engines.
Specifically, converts the forward method of a TorchScript module.
- Parameters
-
module ( torch.jit.ScriptModule ) – Source module, a result of tracing or scripting a PyTorch torch.nn.Module
- Keyword Arguments
-
-
inputs ( List [ Union ( torch_tensorrt.Input , torch.Tensor ) ] ) –
Required. List of specifications of input shape, dtype and memory layout for inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.
    inputs=[
        torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
        torch_tensorrt.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channels_last,
        ),  # Dynamic input shape for input #2
        torch.randn((1, 3, 224, 244)),  # Use an example tensor and let torch_tensorrt infer settings
    ]
-
device ( Union ( torch_tensorrt.Device , torch.device , dict ) ) –
Target device for TensorRT engines to run on
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
-
disable_tf32 ( bool ) – Force FP32 layers to use the traditional FP32 format, rather than the default TF32 behavior of rounding the inputs to 10-bit mantissas before multiplying while accumulating the sum using 23-bit mantissas
-
sparse_weights ( bool ) – Enable sparsity for convolution and fully connected layers.
-
enabled_precisions ( Set ( Union ( torch.dtype , torch_tensorrt.dtype ) ) ) – The set of datatypes that TensorRT can use when selecting kernels
-
refit ( bool ) – Enable refitting
-
debug ( bool ) – Enable debuggable engine
-
strict_types ( bool ) – Kernels should strictly run in a particular operating precision. When set, enabled_precisions should contain only one type
-
capability ( torch_tensorrt.EngineCapability ) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
-
num_min_timing_iters ( int ) – Number of minimization timing iterations used to select kernels
-
num_avg_timing_iters ( int ) – Number of averaging timing iterations used to select kernels
-
workspace_size ( int ) – Maximum size of workspace given to TensorRT
-
max_batch_size ( int ) – Maximum batch size (must be >= 1 to be set, 0 means not set)
-
truncate_long_and_double ( bool ) – Truncate weights provided in int64 or double (float64) to int32 and float32
-
calibrator ( Union ( torch_tensorrt._C.IInt8Calibrator , tensorrt.IInt8Calibrator ) ) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
-
require_full_compilation ( bool ) – Require modules to be compiled end to end or return an error as opposed to returning a hybrid graph where operations that cannot be run in TensorRT are run in PyTorch
-
min_block_size ( int ) – The minimum number of contiguous TensorRT-convertible operations required in order to run a set of operations in TensorRT
-
torch_executed_ops ( List [ str ] ) – List of aten operators that must be run in PyTorch. An error will be thrown if this list is not empty but require_full_compilation is True
-
torch_executed_modules ( List [ str ] ) – List of modules that must be run in PyTorch. An error will be thrown if this list is not empty but require_full_compilation is True
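The rule stated for torch_executed_ops and torch_executed_modules can be checked before any compilation is attempted. The following is a minimal sketch, assuming a hypothetical helper named check_fallback_settings that is not part of torch_tensorrt; it only mirrors the documented rule and does not reproduce the library's own error type or message.

```python
from typing import Dict, List, Optional


def check_fallback_settings(
    require_full_compilation: bool = False,
    torch_executed_ops: Optional[List[str]] = None,
    torch_executed_modules: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Hypothetical helper (not part of torch_tensorrt).

    Mirrors the documented rule: forcing operators or modules to run in
    PyTorch is incompatible with require_full_compilation=True.
    """
    ops = list(torch_executed_ops or [])
    modules = list(torch_executed_modules or [])
    if require_full_compilation and (ops or modules):
        raise ValueError(
            "torch_executed_ops and torch_executed_modules must be empty "
            "when require_full_compilation is True"
        )
    # The returned dict uses the keyword names from the compile signature.
    return {
        "require_full_compilation": require_full_compilation,
        "torch_executed_ops": ops,
        "torch_executed_modules": modules,
    }


# Hybrid graph: keep one aten operator in PyTorch (operator name is illustrative).
settings = check_fallback_settings(torch_executed_ops=["aten::max_pool2d"])
```

The resulting dictionary could then be unpacked into the call, for example compile(module, inputs=..., **settings).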
-
- Returns
-
Compiled TorchScript module; when run, it executes via TensorRT
- Return type
-
torch.jit.ScriptModule
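A call to compile might look like the sketch below. It uses only keyword arguments listed above; the wrapper name compile_scripted_module, the input shape and the chosen precisions are illustrative assumptions. Imports are placed inside the function so the definition can be loaded on a machine without a GPU, while calling it requires torch, torch_tensorrt and an NVIDIA GPU.

```python
def compile_scripted_module(scripted_module, batch_size=1):
    """Compile the forward method of a TorchScript module with TensorRT.

    Illustrative wrapper, not part of torch_tensorrt.
    """
    import torch
    import torch_tensorrt

    return torch_tensorrt.ts.compile(
        scripted_module,
        # One static NCHW input; the shape is an assumption for this sketch.
        inputs=[torch_tensorrt.Input((batch_size, 3, 224, 224))],
        # Allow TensorRT to choose between FP32 and FP16 kernels.
        enabled_precisions={torch.float, torch.half},
    )
```

The argument is expected to be the result of torch.jit.script or torch.jit.trace applied to a torch.nn.Module, as described under Parameters.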
-
torch_tensorrt.ts.
convert_method_to_trt_engine
( module: torch.jit._script.ScriptModule , method_name: str , inputs=[] , device=<torch_tensorrt._Device.Device object> , disable_tf32=False , sparse_weights=False , enabled_precisions={} , refit=False , debug=False , strict_types=False , capability=<EngineCapability.default: 0> , num_min_timing_iters=2 , num_avg_timing_iters=1 , workspace_size=0 , max_batch_size=0 , truncate_long_and_double=False , calibrator=None ) → str
Convert a TorchScript module method to a serialized TensorRT engine
Converts a specified method of a module to a serialized TensorRT engine given a dictionary of conversion settings
- Parameters
-
-
module ( torch.jit.ScriptModule ) – Source module, a result of tracing or scripting a PyTorch torch.nn.Module
-
method_name ( str ) – Name of method to convert
-
- Keyword Arguments
-
-
inputs ( List [ Union ( torch_tensorrt.Input , torch.Tensor ) ] ) –
Required. List of specifications of input shape, dtype and memory layout for inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.
    inputs=[
        torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
        torch_tensorrt.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channels_last,
        ),  # Dynamic input shape for input #2
        torch.randn((1, 3, 224, 244)),  # Use an example tensor and let torch_tensorrt infer settings
    ]
-
device ( Union ( torch_tensorrt.Device , torch.device , dict ) ) –
Target device for TensorRT engines to run on
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
-
disable_tf32 ( bool ) – Force FP32 layers to use the traditional FP32 format, rather than the default TF32 behavior of rounding the inputs to 10-bit mantissas before multiplying while accumulating the sum using 23-bit mantissas
-
sparse_weights ( bool ) – Enable sparsity for convolution and fully connected layers.
-
enabled_precisions ( Set ( Union ( torch.dtype , torch_tensorrt.dtype ) ) ) – The set of datatypes that TensorRT can use when selecting kernels
-
refit ( bool ) – Enable refitting
-
debug ( bool ) – Enable debuggable engine
-
strict_types ( bool ) – Kernels should strictly run in a particular operating precision. When set, enabled_precisions should contain only one type
-
capability ( torch_tensorrt.EngineCapability ) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
-
num_min_timing_iters ( int ) – Number of minimization timing iterations used to select kernels
-
num_avg_timing_iters ( int ) – Number of averaging timing iterations used to select kernels
-
workspace_size ( int ) – Maximum size of workspace given to TensorRT
-
max_batch_size ( int ) – Maximum batch size (must be >= 1 to be set, 0 means not set)
-
truncate_long_and_double ( bool ) – Truncate weights provided in int64 or double (float64) to int32 and float32
-
calibrator ( Union ( torch_tensorrt._C.IInt8Calibrator , tensorrt.IInt8Calibrator ) ) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
-
- Returns
-
Serialized TensorRT engine, which can either be saved to a file or deserialized via TensorRT APIs
- Return type
-
bytes
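Since the returned engine is a bytes object, saving it to a file is a plain binary write. A minimal sketch follows, assuming illustrative helper names (build_engine, write_engine), an illustrative input shape, and a method named forward; none of these helpers are part of torch_tensorrt.

```python
from pathlib import Path


def build_engine(scripted_module, method_name="forward"):
    """Convert one method of a TorchScript module to a serialized engine.

    Requires torch_tensorrt and an NVIDIA GPU when called.
    """
    import torch_tensorrt

    return torch_tensorrt.ts.convert_method_to_trt_engine(
        scripted_module,
        method_name,
        # The input shape is an assumption for this sketch.
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    )


def write_engine(engine_bytes: bytes, path: str) -> int:
    """Write a serialized engine to disk; returns the number of bytes written."""
    return Path(path).write_bytes(engine_bytes)
```

A typical sequence would be write_engine(build_engine(scripted_module), "model.engine"), where the file name is arbitrary.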
-
torch_tensorrt.ts.
check_method_op_support
( module : torch.jit._script.ScriptModule , method_name : str ) → bool
Checks to see if a method is fully supported by torch_tensorrt
Checks whether a method of a TorchScript module can be compiled by torch_tensorrt. If it cannot, a list of the unsupported operators is printed and the function returns False; otherwise it returns True.
- Parameters
-
-
module ( torch.jit.ScriptModule ) – Source module, a result of tracing or scripting a PyTorch torch.nn.Module
-
method_name ( str ) – Name of method to check
-
- Returns
-
True if the method is supported
- Return type
-
bool
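One way to use the boolean result is to decide whether to demand end-to-end compilation. The sketch below is an assumption about usage rather than a documented workflow: the function name compile_if_supported is hypothetical, and the trt_ts parameter exists only so the logic can be exercised with a stand-in object when torch_tensorrt is not installed.

```python
def compile_if_supported(scripted_module, inputs, method_name="forward", trt_ts=None):
    """Request full compilation only when every operator is supported.

    Hypothetical helper; trt_ts defaults to the real torch_tensorrt.ts module.
    """
    if trt_ts is None:
        import torch_tensorrt

        trt_ts = torch_tensorrt.ts

    # Prints the unsupported operators and returns False if any are found.
    fully_supported = trt_ts.check_method_op_support(scripted_module, method_name)

    # Otherwise fall back to a hybrid graph (unsupported ops run in PyTorch).
    return trt_ts.compile(
        scripted_module,
        inputs=inputs,
        require_full_compilation=fully_supported,
    )
```

Note that compile converts the forward method, so checking a different method name would not tell you anything about what compile will do.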
-
torch_tensorrt.ts.
embed_engine_in_new_module
( serialized_engine: bytes , device=<torch_tensorrt._Device.Device object> ) → torch.jit._script.ScriptModule
Takes a pre-built serialized TensorRT engine and embeds it within a TorchScript module
Takes a pre-built serialized TensorRT engine (as bytes) and embeds it within a TorchScript module. Registers the forward method to execute the TensorRT engine with the function signature:
forward(Tensor[]) -> Tensor[]
The module can be saved with the engine embedded using torch.jit.save and moved / loaded according to torch_tensorrt portability rules
- Parameters
-
serialized_engine ( bytes ) – Serialized TensorRT engine from either torch_tensorrt or TensorRT APIs
- Keyword Arguments
-
device ( Union ( torch_tensorrt.Device , torch.device , dict ) ) – Target device to run engine on. Must be compatible with engine provided. Default: Current active device
- Returns
-
New TorchScript module with engine embedded
- Return type
-
torch.jit.ScriptModule
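Reading an engine from disk, embedding it and saving the resulting module could be sketched as follows. The helper name embed_and_save and both path arguments are illustrative assumptions; the device keyword is left at its default (the current active device).

```python
from pathlib import Path


def embed_and_save(engine_path, module_path):
    """Embed a serialized TensorRT engine in a TorchScript module and save it.

    Illustrative helper, not part of torch_tensorrt. Requires torch,
    torch_tensorrt and a compatible NVIDIA GPU when called.
    """
    import torch
    import torch_tensorrt

    # The engine may come from torch_tensorrt or from the TensorRT APIs.
    engine_bytes = Path(engine_path).read_bytes()
    trt_module = torch_tensorrt.ts.embed_engine_in_new_module(engine_bytes)

    # The engine travels with the module when it is saved.
    torch.jit.save(trt_module, module_path)
    return trt_module
```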
-
torch_tensorrt.ts.
TensorRTCompileSpec
( inputs=[] , device=<torch_tensorrt._Device.Device object> , disable_tf32=False , sparse_weights=False , enabled_precisions={} , refit=False , debug=False , strict_types=False , capability=<EngineCapability.default: 0> , num_min_timing_iters=2 , num_avg_timing_iters=1 , workspace_size=0 , max_batch_size=0 , truncate_long_and_double=False , calibrator=None ) → torch._C.ScriptClass
Utility to create a formatted spec dictionary for using the PyTorch TensorRT backend
- Keyword Arguments
-
-
inputs ( List [ Union ( torch_tensorrt.Input , torch.Tensor ) ] ) –
Required. List of specifications of input shape, dtype and memory layout for inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.
    inputs=[
        torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
        torch_tensorrt.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channels_last,
        ),  # Dynamic input shape for input #2
        torch.randn((1, 3, 224, 244)),  # Use an example tensor and let torch_tensorrt infer settings
    ]
-
device ( Union ( torch_tensorrt.Device , torch.device , dict ) ) –
Target device for TensorRT engines to run on
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
-
disable_tf32 ( bool ) – Force FP32 layers to use the traditional FP32 format, rather than the default TF32 behavior of rounding the inputs to 10-bit mantissas before multiplying while accumulating the sum using 23-bit mantissas
-
sparse_weights ( bool ) – Enable sparsity for convolution and fully connected layers.
-
enabled_precisions ( Set ( Union ( torch.dtype , torch_tensorrt.dtype ) ) ) – The set of datatypes that TensorRT can use when selecting kernels
-
refit ( bool ) – Enable refitting
-
debug ( bool ) – Enable debuggable engine
-
strict_types ( bool ) – Kernels should strictly run in a particular operating precision. When set, enabled_precisions should contain only one type
-
capability ( torch_tensorrt.EngineCapability ) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
-
num_min_timing_iters ( int ) – Number of minimization timing iterations used to select kernels
-
num_avg_timing_iters ( int ) – Number of averaging timing iterations used to select kernels
-
workspace_size ( int ) – Maximum size of workspace given to TensorRT
-
max_batch_size ( int ) – Maximum batch size (must be >= 1 to be set, 0 means not set)
-
truncate_long_and_double ( bool ) – Truncate weights provided in int64 or double (float64) to int32 and float32
-
calibrator ( Union ( torch_tensorrt._C.IInt8Calibrator , tensorrt.IInt8Calibrator ) ) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
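The spec produced by TensorRTCompileSpec is intended for PyTorch's backend mechanism. The sketch below makes several assumptions not stated on this page: that the spec is passed in a dictionary keyed by method name, that the backend is registered under the name "tensorrt", and that the private entry point torch._C._jit_to_backend is used to invoke it. Treat it as an illustration to check against the to_backend guide for your installed version.

```python
def compile_via_to_backend(scripted_module):
    """Lower a TorchScript module through the PyTorch backend API.

    Illustrative helper, not part of torch_tensorrt. Requires torch,
    torch_tensorrt and an NVIDIA GPU when called.
    """
    import torch
    import torch_tensorrt

    spec = {
        # Assumption: one compile spec per method name.
        "forward": torch_tensorrt.ts.TensorRTCompileSpec(
            inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
            enabled_precisions={torch.float, torch.half},
        )
    }
    # Assumption: backend name "tensorrt" and the private _jit_to_backend hook.
    return torch._C._jit_to_backend("tensorrt", scripted_module, spec)
```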
-