update TensorRT docs (#5238)

* doc updates TensorRT

* update

* update

* fix warning

* newline

* format
George Wu 2020-09-21 15:24:20 -07:00 committed by GitHub
parent 55e4b5d302
commit 3147bc00c3
GPG key ID: 4AEE18F83AFDEB23
2 changed files with 17 additions and 10 deletions

@@ -208,12 +208,12 @@ See more information on the TensorRT Execution Provider [here](./docs/execution_
#### Prerequisites
* Install [CUDA](https://developer.nvidia.com/cuda-toolkit) and [cuDNN](https://developer.nvidia.com/cudnn)
-* The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 10.2 and cuDNN 7.6.5.
+* The TensorRT execution provider for ONNX Runtime is built and tested with CUDA 11.0 and cuDNN 8.0.
* The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home` parameter. The CUDA path should contain `bin`, `include` and `lib` directories.
* The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found.
* The path to the cuDNN installation (path to folder that contains libcudnn.so) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home` parameter.
* Install [TensorRT](https://developer.nvidia.com/nvidia-tensorrt-download)
-* The TensorRT execution provider for ONNX Runtime is built on TensorRT 7.x and is tested with TensorRT 7.0.0.11.
+* The TensorRT execution provider for ONNX Runtime is built on TensorRT 7.1 and is tested with TensorRT 7.1.3.4.
* The path to TensorRT installation must be provided via the `--tensorrt_home` parameter.
#### Build Instructions
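The path requirements in the prerequisites above can be checked before a build is started. The following is a minimal shell sketch under stated assumptions: `check_trt_prereqs` is a hypothetical helper, not part of ONNX Runtime; it only checks the directory layout the list describes and does not inspect CUDA, cuDNN or TensorRT versions. Accepting versioned names such as `libcudnn.so.8` is also an assumption.

```shell
# Hypothetical pre-build check for the TensorRT EP prerequisites listed above.
# Checks directory layout only; versions are not inspected.
check_trt_prereqs() {
  local cuda_home="$1" cudnn_home="$2" tensorrt_home="$3"
  local status=0 sub

  # The CUDA path must contain bin, include and lib directories.
  for sub in bin include lib; do
    if [ ! -d "$cuda_home/$sub" ]; then
      echo "missing: $cuda_home/$sub" >&2
      status=1
    fi
  done

  # The CUDA bin directory must be on PATH so that nvcc is found.
  case ":$PATH:" in
    *":$cuda_home/bin:"*) ;;
    *) echo "not on PATH: $cuda_home/bin" >&2; status=1 ;;
  esac

  # The cuDNN path must be the folder that contains libcudnn.so
  # (assumption: versioned names such as libcudnn.so.8 are accepted too).
  if ! ls "$cudnn_home"/libcudnn.so* >/dev/null 2>&1; then
    echo "libcudnn.so not found in: $cudnn_home" >&2
    status=1
  fi

  # The TensorRT install directory is what gets passed as --tensorrt_home.
  if [ ! -d "$tensorrt_home" ]; then
    echo "missing TensorRT directory: $tensorrt_home" >&2
    status=1
  fi

  return "$status"
}
```

For example, `check_trt_prereqs "$CUDA_PATH" /path/to/cudnn /path/to/TensorRT` returns 0 only when every check passes. The same three paths are the ones the build takes through `--cuda_home`, `--cudnn_home` and `--tensorrt_home`.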

@@ -57,19 +57,24 @@ For performance tuning, please see guidance on this page: [ONNX Runtime Perf Tun
When/if using [onnxruntime_perf_test](../../onnxruntime/test/perftest#onnxruntime-performance-test), use the flag `-e tensorrt`
## Configuring environment variables
-There are four environment variables for TensorRT execution provider.
+There are several environment variables for TensorRT execution provider.
-ORT_TENSORRT_MAX_WORKSPACE_SIZE: maximum workspace size for TensorRT engine.
+* ORT_TENSORRT_MAX_WORKSPACE_SIZE: maximum workspace size for TensorRT engine.
-ORT_TENSORRT_MAX_PARTITION_ITERATIONS: maximum number of iterations allowed in model partitioning for TensorRT. If target model can't be successfully partitioned when the maximum number of iterations is reached, the whole model will fall back to other execution providers such as CUDA or CPU.
+* ORT_TENSORRT_MAX_PARTITION_ITERATIONS: maximum number of iterations allowed in model partitioning for TensorRT. If target model can't be successfully partitioned when the maximum number of iterations is reached, the whole model will fall back to other execution providers such as CUDA or CPU.
-ORT_TENSORRT_MIN_SUBGRAPH_SIZE: minimum node size in a subgraph after partitioning. Subgraphs with smaller size will fall back to other execution providers.
+* ORT_TENSORRT_MIN_SUBGRAPH_SIZE: minimum node size in a subgraph after partitioning. Subgraphs with smaller size will fall back to other execution providers.
-ORT_TENSORRT_FP16_ENABLE: Enable FP16 mode in TensorRT
+* ORT_TENSORRT_FP16_ENABLE: Enable FP16 mode in TensorRT
-ORT_TENSORRT_ENGINE_CACHE_ENABLE: Enable TensorRT engine caching. The purpose of using engine caching is to save engine build time in the cases that TensorRT may take long time to optimize and build engine. Engine will be cached after it's built at the first time so that next time when inference session is created the engine can be loaded directly from cache. Note each engine is created for specific settings such as precision (FP32/FP16/INT8 etc), workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure those settings are not changing, otherwise the engines need to be rebuilt and cached again. Also please clean up any old engine cache files (.engine) before enabling the feature for new models. Right now engine caching is only available for static shape models (subgraphs). For dynamic shape cases, since the engine is dynamically created at run-time it's hard to reuse it from previous run without knowing the profile the engine was created from. Dyanmic shape engine caching will be addressed in the future.
+* ORT_TENSORRT_ENGINE_CACHE_ENABLE: Enable TensorRT engine caching. The purpose of using engine caching is to save engine build time in the cases that TensorRT may take long time to optimize and build engine. Engine will be cached after it's built at the first time so that next time when inference session is created the engine can be loaded directly from cache. Note each engine is created for specific settings such as precision (FP32/FP16/INT8 etc), workspace, profiles etc, and specific GPUs and it's not portable, so it's essential to make sure those settings are not changing, otherwise the engines need to be rebuilt and cached again.
+**Warning: Please clean up any old engine cache files (.engine) if any of the following changes:**
+- Model changes (if there are any changes to the model topology, opset version etc.)
+- ORT version changes (i.e. moving from ORT version 1.4 to 1.5)
+- TensorRT version changes (i.e. moving from TensorRT 7.0 to 7.1)
+- Hardware changes. (Engine files are not portable and optimized for specific Nvidia hardware)
-ORT_TENSORRT_ENGINE_CACHE_PATH: Specify path for TensorRT engine files if ORT_TENSORRT_ENGINE_CACHE_ENABLE is 1
+* ORT_TENSORRT_ENGINE_CACHE_PATH: Specify path for TensorRT engine files if ORT_TENSORRT_ENGINE_CACHE_ENABLE is 1
By default TensorRT execution provider builds an ICudaEngine with max workspace size = 1 GB, max partition iterations = 1000, min subgraph size = 1, FP16 mode is disabled and TensorRT engine caching is disabled.
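The defaults stated above can be written out explicitly so that a shell's TensorRT configuration is visible in one place. This is a minimal sketch with two assumptions not stated in this commit: ORT_TENSORRT_MAX_WORKSPACE_SIZE is taken to be in bytes (1 GB written as 1073741824), and `0` is taken to disable the flag-style variables, since the docs only show `1` to enable them.

```shell
# TensorRT EP defaults as documented, set explicitly.
# Assumption: workspace size is in bytes (1 GB = 1073741824).
export ORT_TENSORRT_MAX_WORKSPACE_SIZE=1073741824
export ORT_TENSORRT_MAX_PARTITION_ITERATIONS=1000
export ORT_TENSORRT_MIN_SUBGRAPH_SIZE=1
# Assumption: 0 disables these features (the docs only show 1 to enable).
export ORT_TENSORRT_FP16_ENABLE=0
export ORT_TENSORRT_ENGINE_CACHE_ENABLE=0

# List the TensorRT settings the current shell will hand to ONNX Runtime.
env | grep '^ORT_TENSORRT_' | sort
```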
@@ -90,6 +95,8 @@ export ORT_TENSORRT_FP16_ENABLE=1
### Enable TensorRT engine caching
export ORT_TENSORRT_ENGINE_CACHE_ENABLE=1
+* Please Note warning above. This feature is experimental. Engine cache files must be invalidated if there are any changes to the model, ORT version, TensorRT version or if the
+underlying hardware changes. Engine files are not portable across devices.
### Specify TensorRT engine cache path
-export ORT_TENSORRT_ENGINE_CACHE_PATH="cache"
+export ORT_TENSORRT_ENGINE_CACHE_PATH="/path/to/cache"
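The engine cache warning reduces to one rule: delete cached `.engine` files whenever the model, the ORT version, the TensorRT version or the GPU changes. The shell sketch below applies that rule under stated assumptions. `refresh_trt_engine_cache` and its `.fingerprint` stamp file are hypothetical and are not ONNX Runtime features. The caller passes in the version strings and the GPU name, because the helper does not detect them. The model is identified by its `cksum` checksum. Changes to precision or workspace settings are not tracked, but they could be appended to the fingerprint string in the same way.

```shell
# Hypothetical helper: remove stale TensorRT engine files when any input an
# engine depends on (model, ORT version, TensorRT version, GPU) changes.
refresh_trt_engine_cache() {
  local cache_dir="$1" model="$2" ort_version="$3" trt_version="$4" gpu="$5"
  local stamp="$cache_dir/.fingerprint"   # hypothetical stamp file
  local current

  # Combine the model checksum with the version and hardware identifiers.
  current="$(cksum < "$model") ort=$ort_version trt=$trt_version gpu=$gpu" || return 1

  mkdir -p "$cache_dir"
  if [ ! -f "$stamp" ] || [ "$(cat "$stamp")" != "$current" ]; then
    # Unknown or changed inputs: cached engines may not match, so drop them.
    find "$cache_dir" -maxdepth 1 -name '*.engine' -type f -exec rm -f {} +
    printf '%s\n' "$current" > "$stamp"
    echo "engine cache cleared: $cache_dir"
  fi
}
```

For example, `refresh_trt_engine_cache "$ORT_TENSORRT_ENGINE_CACHE_PATH" model.onnx 1.5 7.1.3.4 "$(nvidia-smi --query-gpu=name --format=csv,noheader)"` could run before each inference session is created. When no stamp file exists yet, the helper clears the cache. This matches the advice to start new models from a clean cache.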