## TensorRT Execution Provider

The TensorRT execution provider in ONNX Runtime uses NVIDIA's [TensorRT](https://developer.nvidia.com/tensorrt) deep learning inference engine to accelerate ONNX models on NVIDIA GPUs. Microsoft and NVIDIA worked closely to integrate the TensorRT execution provider with ONNX Runtime.

With the TensorRT execution provider, the ONNX Runtime delivers better inferencing performance on the same hardware compared to generic GPU acceleration. 

### Build TensorRT execution provider
Developers can now tap into the power of TensorRT through ONNX Runtime to accelerate inferencing of ONNX models. Instructions to build the TensorRT execution provider from source are available [here](https://github.com/Microsoft/onnxruntime/blob/master/BUILD.md#build). [Dockerfiles](https://github.com/microsoft/onnxruntime/tree/master/dockerfiles#tensorrt-version-preview) are available for convenience.

### Using the TensorRT execution provider
#### C/C++
The TensorRT execution provider must be registered with ONNX Runtime before it can be used in an inference session.
```
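// `so` is an existing SessionOptions instance.
InferenceSession session_object{so};
// Register the TensorRT execution provider before loading the model.
session_object.RegisterExecutionProvider(std::make_unique<::onnxruntime::TensorrtExecutionProvider>());
status = session_object.Load(model_file_name);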
```
The C API details are [here](https://github.com/Microsoft/onnxruntime/blob/master/docs/C_API.md#c-api).

#### Python
When using the Python wheel from an ONNX Runtime build with the TensorRT execution provider, the provider is automatically prioritized over the default GPU or CPU execution providers. There is no need to register it separately. Python API details are [here](https://microsoft.github.io/onnxruntime/api_summary.html).
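
As a minimal sketch of how you might check which provider would be preferred, the helper below picks the highest-priority provider from a list of names. The provider name strings and the priority order are assumptions based on the prioritization described above, and `pick_provider` is a hypothetical helper, not part of the ONNX Runtime API. With a TensorRT-enabled wheel installed, the list could come from `onnxruntime.get_available_providers()`, assuming your version exposes it.

```python
# Assumed priority order, mirroring the prioritization described above:
# TensorRT first, then the default GPU (CUDA) provider, then CPU.
ASSUMED_PRIORITY = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def pick_provider(available):
    """Return the highest-priority provider name found in `available`.

    `pick_provider` is a hypothetical helper for illustration only.
    """
    for name in ASSUMED_PRIORITY:
        if name in available:
            return name
    raise ValueError("no known execution provider in: %r" % (list(available),))


if __name__ == "__main__":
    try:
        import onnxruntime as ort  # requires an installed ONNX Runtime wheel
    except ImportError:
        print("onnxruntime is not installed; skipping the live check")
    else:
        print(pick_provider(ort.get_available_providers()))
```

If the result is not the TensorRT provider, the installed wheel was probably not built with TensorRT support.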

### Performance Tuning
To test the performance of your ONNX Model with the TensorRT execution provider, use the flag `-e tensorrt` in [onnxruntime_perf_test](https://github.com/Microsoft/onnxruntime/tree/master/onnxruntime/test/perftest#onnxruntime-performance-test).
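
For a quick sanity check from Python alongside `onnxruntime_perf_test`, a small timing loop can be enough. The sketch below is not part of ONNX Runtime: `time_callable` is a hypothetical helper that times any zero-argument callable, and the warm-up count is an assumption meant to keep one-time setup cost out of the average. In practice the callable would wrap your own inference call.

```python
import time


def time_callable(fn, warmup=3, runs=20):
    """Return the mean wall-clock seconds per call of `fn` over `runs` calls."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        fn()  # warm-up calls are executed but not timed
    start = time.perf_counter()
    for _ in range(runs):
        fn()
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Stand-in workload; replace with a lambda that runs your session.
    mean_s = time_callable(lambda: sum(range(10000)))
    print("mean latency: %.6f s" % mean_s)
```

This measures only end-to-end call time in one process, so treat it as a rough comparison rather than a substitute for the dedicated tool.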

### Sample
Please see [this Notebook](https://github.com/microsoft/onnxruntime/blob/master/docs/python/notebooks/onnx-inference-byoc-gpu-cpu-aks.ipynb) for an example of running a model on GPU using ONNX Runtime through Azure Machine Learning Services.

### Configuring Engine Max Batch Size and Workspace Size
By default, the TensorRT execution provider builds an `ICudaEngine` with a max batch size of 1 and a max workspace size of 1 GB. You can override these defaults by setting the environment variables `ORT_TENSORRT_MAX_BATCH_SIZE` and `ORT_TENSORRT_MAX_WORKSPACE_SIZE`.

For example, on Linux:

- Override the default max batch size to 10: `export ORT_TENSORRT_MAX_BATCH_SIZE=10`
- Override the default max workspace size to 2 GB (the value is in bytes): `export ORT_TENSORRT_MAX_WORKSPACE_SIZE=2147483648`
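
If the process is launched from Python, the same overrides can be set through `os.environ`. The sketch below assumes the variables are read when the TensorRT execution provider is created, so they are set before any session exists. `set_tensorrt_limits` is a hypothetical helper name, not an ONNX Runtime function.

```python
import os


def set_tensorrt_limits(max_batch_size, max_workspace_gib):
    """Set the TensorRT execution provider overrides for this process.

    Hypothetical helper: it only writes the two environment variables
    documented above and returns the values it wrote.
    """
    if max_batch_size < 1 or max_workspace_gib < 1:
        raise ValueError("both limits must be positive integers")
    workspace_bytes = max_workspace_gib * 1024 ** 3  # GiB -> bytes
    os.environ["ORT_TENSORRT_MAX_BATCH_SIZE"] = str(max_batch_size)
    os.environ["ORT_TENSORRT_MAX_WORKSPACE_SIZE"] = str(workspace_bytes)
    return max_batch_size, workspace_bytes


# Same values as the Linux example: batch size 10, 2 GB workspace.
batch, workspace = set_tensorrt_limits(10, 2)
print(batch, workspace)  # prints: 10 2147483648
```

Computing the byte count from a GiB value avoids typing the long literal by hand.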