
# TensorRT Execution Provider

The TensorRT execution provider in the ONNX Runtime makes use of NVIDIA's TensorRT Deep Learning inference engine to accelerate ONNX models on NVIDIA GPUs. Microsoft and NVIDIA worked closely to integrate the TensorRT execution provider with ONNX Runtime.

With the TensorRT execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic GPU acceleration.

## Build

For build instructions, please see the BUILD page.

The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 6.0.1.5.

## Using the TensorRT execution provider

### C/C++

The TensorRT execution provider must be registered with ONNX Runtime to enable it in the inference session.

```c++
InferenceSession session_object{so};
session_object.RegisterExecutionProvider(std::make_unique<::onnxruntime::TensorrtExecutionProvider>());
status = session_object.Load(model_file_name);
```

The C API details are here.

#### Sample

To run the Faster R-CNN model on the TensorRT execution provider:

First, download the Faster R-CNN ONNX model from the ONNX model zoo here.

Second, infer shapes in the model by running the shape inference script here:

```
python symbolic_shape_infer.py --input /path/to/onnx/model/model.onnx --output /path/to/onnx/model/new_model.onnx --auto_merge
```

Third, replace the original model with the new model and run the onnx_test_runner tool under the ONNX Runtime build directory:

```
./onnx_test_runner -e tensorrt /path/to/onnx/model/
```

### Python

When using the Python wheel from an ONNX Runtime build with the TensorRT execution provider, it will be automatically prioritized over the default GPU or CPU execution providers. There is no need to separately register the execution provider. Python API details are here.
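As a minimal sketch (assuming a TensorRT-enabled wheel; the model path and input shape below are placeholders), inference looks the same as with any other build:

```python
import numpy as np
import onnxruntime as ort

# In a TensorRT-enabled build, the session automatically prefers the TensorRT
# execution provider, falling back to CUDA/CPU for unsupported subgraphs,
# so no explicit registration is needed.
session = ort.InferenceSession("model.onnx")  # placeholder model path

input_name = session.get_inputs()[0].name
# Placeholder input; use your model's actual shape and dtype.
data = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {input_name: data})
```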

#### Sample

Please see this Notebook for an example of running a model on GPU using ONNX Runtime through Azure Machine Learning Services.

## Performance Tuning

For performance tuning, please see the guidance on this page: ONNX Runtime Perf Tuning.

When using onnxruntime_perf_test, use the flag `-e tensorrt`.

## Configuring environment variables

There are three environment variables for the TensorRT execution provider:

* `ORT_TENSORRT_MAX_WORKSPACE_SIZE`: maximum workspace size (in bytes) for the TensorRT engine.

* `ORT_TENSORRT_MAX_PARTITION_ITERATIONS`: maximum number of iterations allowed in model partitioning for TensorRT. If the target model can't be successfully partitioned when the maximum number of iterations is reached, the whole model will fall back to other execution providers such as CUDA or CPU.

* `ORT_TENSORRT_MIN_SUBGRAPH_SIZE`: minimum number of nodes in a subgraph after partitioning. Subgraphs with fewer nodes will fall back to other execution providers.

By default, the TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, and min subgraph size = 1.

One can override these defaults by setting the environment variables `ORT_TENSORRT_MAX_WORKSPACE_SIZE`, `ORT_TENSORRT_MAX_PARTITION_ITERATIONS` and `ORT_TENSORRT_MIN_SUBGRAPH_SIZE`, e.g. on Linux:

```
# Override default max workspace size to 2 GB
export ORT_TENSORRT_MAX_WORKSPACE_SIZE=2147483648

# Override default maximum number of partition iterations to 10
export ORT_TENSORRT_MAX_PARTITION_ITERATIONS=10

# Override default minimum subgraph node size to 5
export ORT_TENSORRT_MIN_SUBGRAPH_SIZE=5
```
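The same overrides can be applied from Python. A minimal sketch, assuming the variables are read when the provider is initialized (so they must be set before the inference session is created; the model path is a placeholder):

```python
import os

# Assumption: the TensorRT execution provider reads these variables when it
# is initialized, so set them before creating the inference session.
os.environ["ORT_TENSORRT_MAX_WORKSPACE_SIZE"] = "2147483648"  # 2 GB
os.environ["ORT_TENSORRT_MAX_PARTITION_ITERATIONS"] = "10"
os.environ["ORT_TENSORRT_MIN_SUBGRAPH_SIZE"] = "5"

import onnxruntime as ort

session = ort.InferenceSession("model.onnx")  # placeholder model path
```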