ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator


ONNX Runtime is a performance-focused, complete scoring engine for Open Neural Network Exchange (ONNX) models, with an open, extensible architecture designed to keep pace with the latest developments in AI and deep learning. ONNX Runtime stays up to date with the ONNX standard, implementing all ONNX operators, and supports all ONNX releases (1.2+) with both forward and backward compatibility. Please refer to this page for ONNX opset compatibility details.

ONNX is an interoperable format for machine learning models supported by various ML and DNN frameworks and tools. The universal format makes it easier to interoperate between frameworks and maximize the reach of hardware optimization investments.


Key Features

Setup

Usage

Examples and Tutorials

More Info

Contributions and Feedback

License


Key Features

Run any ONNX model

ONNX Runtime provides comprehensive support for the ONNX spec and can run all models based on ONNX v1.2.1 and higher. See version compatibility details here.
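As a rough illustration of what "ONNX v1.2.1 and higher" means in practice, the sketch below checks a model's default-domain opset version. It assumes the standard ONNX release-to-opset mapping (ONNX 1.2 introduced opset 7) and uses a hypothetical `OpsetId` tuple as a stand-in for the entries of `onnx.load(path).opset_import`. The upper bound depends on which ONNX Runtime release you run, so it is a parameter rather than a hard-coded value.

```python
from collections import namedtuple

# Hypothetical stand-in for the entries of onnx.load(path).opset_import,
# which carry a `domain` and a `version` field.
OpsetId = namedtuple("OpsetId", ["domain", "version"])

# Default-domain opset introduced by each ONNX release.
ONNX_RELEASE_TO_OPSET = {"1.2": 7, "1.3": 8, "1.4": 9, "1.5": 10, "1.6": 11}

# "ONNX v1.2.1 and higher" corresponds to opset 7 and above.
MIN_SUPPORTED_OPSET = ONNX_RELEASE_TO_OPSET["1.2"]


def default_domain_opset(opset_imports):
    """Return the opset version of the default ONNX domain, or None."""
    for entry in opset_imports:
        # "" and "ai.onnx" both name the default operator domain;
        # "ai.onnx.ml" is the ONNX-ML (traditional ML) domain.
        if entry.domain in ("", "ai.onnx"):
            return entry.version
    return None


def is_opset_supported(version, max_supported_opset):
    """True if `version` lies between opset 7 and the runtime's maximum."""
    return MIN_SUPPORTED_OPSET <= version <= max_supported_opset
```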

Traditional ML support

In addition to DNN models, ONNX Runtime fully supports the ONNX-ML profile of the ONNX spec for traditional ML scenarios.

For the full set of supported operators and types, please see the operator documentation.

Note: Some operators not supported in the current ONNX version may be available as a Contrib Operator.

High Performance

ONNX Runtime supports both CPU and GPU. Using various graph optimizations and accelerators, ONNX Runtime can provide lower latency than other runtimes, enabling faster end-to-end customer experiences and lower machine utilization costs.

Currently ONNX Runtime supports the following accelerators:

Not all variations are supported in the official release builds, but they can be built from source by following these instructions. Find Dockerfiles here.

We are continuously working to integrate new execution providers for further improvements in latency and efficiency. If you are interested in contributing a new execution provider, please see this page.
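When several execution providers are compiled in, a common pattern is to request them in priority order and keep the CPU provider as the fallback for nodes an accelerator cannot handle. The sketch below assumes the provider names reported by `onnxruntime.get_available_providers()` (such as `CUDAExecutionProvider` and `TensorrtExecutionProvider`); the priority order itself is an illustrative choice, not a recommendation from this project, and `select_providers` is a hypothetical helper.

```python
# Provider names as reported by onnxruntime.get_available_providers().
# The priority order below is an illustrative assumption.
PREFERRED_PROVIDERS = (
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
)


def select_providers(available, preferred=PREFERRED_PROVIDERS):
    """Return the preferred providers that are available, highest priority first.

    The CPU provider is always kept last so that nodes an accelerator
    cannot run still have somewhere to execute.
    """
    available_set = set(available)
    chosen = [name for name in preferred
              if name in available_set and name != "CPUExecutionProvider"]
    chosen.append("CPUExecutionProvider")
    return chosen
```

The resulting list can be passed to `InferenceSession.set_providers()` in releases that expose it; check the Python API documentation for the release you install.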

Cross Platform

API documentation and package installation

ONNX Runtime is currently available for Linux, Windows, and Mac with Python, C#, C++, and C APIs. If you have specific scenarios that are not supported, please share your suggestions and scenario details via GitHub Issues.
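For the Python API, a single inference pass comes down to mapping input names to arrays and calling `run`. The helper below is a minimal sketch; it only assumes the `InferenceSession` methods `get_inputs()` (whose entries carry a `.name`) and `run(output_names, input_feed)`, so it can also be exercised with a stand-in object. `run_inference` is a hypothetical name, and the code avoids f-strings so it stays compatible with Python 3.5.

```python
def run_inference(session, inputs):
    """Run one inference pass and return all model outputs.

    `session` is assumed to behave like onnxruntime.InferenceSession:
    get_inputs() yields objects with a `.name`, and
    run(output_names, input_feed) returns a list of outputs.
    `inputs` must be ordered like session.get_inputs().
    """
    input_names = [arg.name for arg in session.get_inputs()]
    if len(inputs) != len(input_names):
        raise ValueError("model expects %d inputs, got %d"
                         % (len(input_names), len(inputs)))
    # Map each declared input name to the caller's array.
    input_feed = dict(zip(input_names, inputs))
    # output_names=None asks the runtime for every model output.
    return session.run(None, input_feed)
```

With a real session this might look like `run_inference(onnxruntime.InferenceSession("model.onnx"), [x])`, where `model.onnx` and `x` (a NumPy array of the expected shape and dtype) are placeholders.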


Installation

Quick Start: The ONNX-Ecosystem Docker container image is available on Docker Hub and includes ONNX Runtime (CPU, Python), dependencies, tools to convert from various frameworks, and Jupyter notebooks to help you get started.

Additional dockerfiles for some features can be found here.

APIs and Official Builds

API Documentation

Official Builds

Python
  • CPU (MLAS+Eigen): pypi: onnxruntime (Windows x64, Linux x64, Mac OS X x64)
  • CPU (MKL-ML): not available
  • GPU (CUDA): pypi: onnxruntime-gpu (Windows x64, Linux x64)

C#
  • CPU (MLAS+Eigen): Nuget: Microsoft.ML.OnnxRuntime (Windows x64/x86, Linux x64/x86, Mac OS X x64)
  • CPU (MKL-ML): Nuget: Microsoft.ML.OnnxRuntime.MKLML (Windows x64, Linux x64, Mac OS X x64)
  • GPU (CUDA): Nuget: Microsoft.ML.OnnxRuntime.Gpu (Windows x64, Linux x64)

C/C++ wrapper
  • CPU (MLAS+Eigen): Nuget: Microsoft.ML.OnnxRuntime; .zip, .tgz (Windows x64/x86, Linux x64/x86, Mac OS X x64)
  • CPU (MKL-ML): Nuget: Microsoft.ML.OnnxRuntime.MKLML (Windows x64, Linux x64, Mac OS X x64)
  • GPU (CUDA): Nuget: Microsoft.ML.OnnxRuntime.Gpu; .zip, .tgz (Windows x64, Linux x64)
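The official package names translate directly into install commands. The sketch below encodes only the Python and C# entries of the official builds list; `package_for` and `install_command` are hypothetical helpers, and `dotnet add package` assumes an SDK-style .NET project.

```python
# (language, flavor) -> package name, taken from the official builds list.
# There is no MKL-ML flavor of the Python package.
OFFICIAL_PACKAGES = {
    ("python", "cpu"): "onnxruntime",
    ("python", "gpu"): "onnxruntime-gpu",
    ("csharp", "cpu"): "Microsoft.ML.OnnxRuntime",
    ("csharp", "mklml"): "Microsoft.ML.OnnxRuntime.MKLML",
    ("csharp", "gpu"): "Microsoft.ML.OnnxRuntime.Gpu",
}


def package_for(language, flavor):
    """Look up the official package for a language/flavor pair."""
    try:
        return OFFICIAL_PACKAGES[(language, flavor)]
    except KeyError:
        raise ValueError("no official %s package for flavor %r"
                         % (language, flavor)) from None


def install_command(language, flavor):
    """Return an argument list suitable for subprocess.run."""
    package = package_for(language, flavor)
    if language == "python":
        # PyPI package
        return ["pip", "install", package]
    # NuGet package, added to an SDK-style .NET project
    return ["dotnet", "add", "package", package]
```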

System Requirements (pre-requisite dependencies)

  • ONNX Runtime binaries in the CPU packages use OpenMP and depend on the library being available on the system at runtime.
    • For Windows, OpenMP support comes as part of the VC runtime. It is also available as redist packages: vc_redist.x64.exe and vc_redist.x86.exe
    • For Linux, the system must have libgomp.so.1, which can be installed using apt-get install libgomp1.
  • GPU builds require the CUDA runtime libraries to be installed on the system:
    • Version: CUDA 10.0 and cuDNN 7.3
    • Linux Python packages require CUDA 10.1 and cuDNN 7.6
    • Older ONNX Runtime releases used CUDA 9.1 and cuDNN 7.1; please refer to prior release notes for more details.
  • Python binaries are compatible with Python 3.5-3.7. See Python Dev Notes. If using pip to download the Python binaries, run pip install --upgrade pip prior to downloading.
  • Certain operators make use of system locales. Installing the English language package and configuring the en_US.UTF-8 locale is required.
    • For Ubuntu, install the language-pack-en package.
    • Run the following commands: locale-gen en_US.UTF-8, then update-locale LANG=en_US.UTF-8
    • Follow a similar procedure to configure other locales on other platforms.
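The prerequisites above can be checked from Python before installing. This is a sketch under stated assumptions: the OpenMP check only looks for `libgomp` and is meaningful on Linux (on Windows, OpenMP ships with the VC runtime), and the locale check temporarily switches `LC_CTYPE` and then restores it. All function names are hypothetical.

```python
import ctypes.util
import locale
import sys


def python_version_supported(version_info=sys.version_info):
    """Official Python binaries target Python 3.5-3.7."""
    return (3, 5) <= tuple(version_info[:2]) <= (3, 7)


def openmp_available():
    """Linux only: look for libgomp (apt-get install libgomp1)."""
    return ctypes.util.find_library("gomp") is not None


def locale_available(name="en_US.UTF-8"):
    """Try to activate a locale for LC_CTYPE, then restore the previous one."""
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)
```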

Building from Source

If additional build flavors are needed, please find instructions on building from source at Build ONNX Runtime. For production scenarios, it's strongly recommended to build from an official release branch.

Dockerfiles are available here to help you get started.
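Builds from source are driven by build.sh (Linux/Mac) or build.bat (Windows). The sketch below only assembles an argument list for `subprocess.run` without running anything. It assumes the `--config`, `--parallel`, `--use_cuda`, `--cuda_home`, `--cudnn_home`, `--use_tensorrt`, `--tensorrt_home` and `--build_wheel` options described in BUILD.md, so confirm them against the branch you build; `build_command` is a hypothetical helper.

```python
import os


def build_command(config="Release", accelerator=None, cuda_home=None,
                  cudnn_home=None, tensorrt_home=None, build_wheel=False,
                  script=None):
    """Assemble (but do not run) an ONNX Runtime build invocation.

    accelerator: None for a CPU build, "cuda", or "tensorrt".
    """
    if config not in ("Debug", "MinSizeRel", "Release", "RelWithDebInfo"):
        raise ValueError("unknown config %r" % config)
    if script is None:
        script = "build.bat" if os.name == "nt" else "./build.sh"
    cmd = [script, "--config", config, "--parallel"]

    if accelerator in ("cuda", "tensorrt"):
        # Both GPU flavors need the CUDA and cuDNN install locations.
        if not (cuda_home and cudnn_home):
            raise ValueError("cuda_home and cudnn_home are required")
        if accelerator == "cuda":
            cmd.append("--use_cuda")
        cmd += ["--cuda_home", cuda_home, "--cudnn_home", cudnn_home]
        if accelerator == "tensorrt":
            if not tensorrt_home:
                raise ValueError("tensorrt_home is required")
            cmd += ["--use_tensorrt", "--tensorrt_home", tensorrt_home]
    elif accelerator is not None:
        raise ValueError("unknown accelerator %r" % accelerator)

    if build_wheel:
        cmd.append("--build_wheel")
    return cmd
```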


Usage

Getting ONNX Models

Deploying ONNX Runtime

ONNX Runtime can be deployed to the cloud for model inferencing using Azure Machine Learning Services. See detailed instructions and sample notebooks.

ONNX Runtime Server (beta) is a hosted application for serving ONNX models using ONNX Runtime, providing a REST API for prediction. Usage details can be found here, and image installation instructions are here.

Performance Tuning

ONNX Runtime is open and extensible, supporting a broad set of configurations and execution providers for model acceleration. For performance tuning guidance, please see this page.

To tune performance for ONNX models, the ONNX Go Live tool "OLive" provides an easy-to-use pipeline for converting models to ONNX and optimizing performance for inferencing with ONNX Runtime.
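When comparing configurations or execution providers, it helps to measure latency the same way each time. The sketch below times any zero-argument callable, such as `lambda: session.run(None, feed)`, after a few warm-up runs and reports the median and the nearest-rank 90th-percentile latency in milliseconds. The warm-up and iteration counts are arbitrary defaults, and `measure_latency` is a hypothetical helper that is not part of ONNX Runtime or OLive.

```python
import math
import statistics
import time


def measure_latency(run_once, warmup=5, iterations=50):
    """Time a zero-argument callable; return (median_ms, p90_ms)."""
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    # Warm-up runs keep one-time costs out of the measurements.
    for _ in range(warmup):
        run_once()
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        run_once()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 90th percentile.
    p90 = samples[math.ceil(0.9 * len(samples)) - 1]
    return statistics.median(samples), p90
```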


Examples and Tutorials

Python

Inference only

Inference with model conversion

Inference and deploy through AzureML

Inference and deploy with Azure IoT Edge

Other

C#

C/C++


Technical Design Details

Extensibility Options


Contribute

We welcome contributions! Please see the contribution guidelines.

Feedback

For any feedback or to report a bug, please file a GitHub Issue.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


License

MIT License