ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator


ONNX Runtime is a performance-focused complete scoring engine for Open Neural Network Exchange (ONNX) models, with an open extensible architecture to continually address the latest developments in AI and Deep Learning. ONNX Runtime stays up to date with the ONNX standard and supports all operators from the ONNX v1.2+ spec with both forwards and backwards compatibility. Please refer to this page for ONNX opset compatibility details.

ONNX is an interoperable format for machine learning models supported by various ML and DNN frameworks and tools. The universal format makes it easier to interoperate between frameworks and maximize the reach of hardware optimization investments.


Key Features

Samples and Tutorials

Setup

Usage

More Info

Data/Telemetry

Contributions and Feedback

License


Key Features

Run any ONNX model

ONNX Runtime provides comprehensive support of the ONNX spec and can be used to run all models based on ONNX v1.2.1 and higher. See version compatibility details here.

Traditional ML support

In addition to DNN models, ONNX Runtime fully supports the ONNX-ML profile of the ONNX spec for traditional ML scenarios.

For the full set of operators and types supported, please see operator documentation

Note: Some operators not supported in the current ONNX version may be available as a Contrib Operator

High Performance

ONNX Runtime supports both CPU and GPU. Using various graph optimizations and accelerators, ONNX Runtime can provide lower latency compared to other runtimes for faster end-to-end customer experiences and minimized machine utilization costs.

Currently ONNX Runtime supports the following accelerators:

Not all variations are supported in the official release builds, but can be built from source following these instructions.

We are continuously working to integrate new execution providers for further improvements in latency and efficiency. If you are interested in contributing a new execution provider, please see this page.

Cross Platform

ONNX Runtime is currently available for Linux, Windows, and Mac with Python, C#, C++, and C APIs. Please see API documentation and package installation.

If you have specific scenarios that are not supported, please share your suggestions and scenario details via GitHub Issues.


Installation

Quick Start: The ONNX-Ecosystem Docker container image is available on Docker Hub and includes ONNX Runtime (CPU, Python), dependencies, tools to convert from various frameworks, and Jupyter notebooks to help you get started.

Additional dockerfiles can be found here.

APIs and Official Builds

API Documentation

Official Builds

| | CPU (MLAS+Eigen) | CPU (MKL-ML) | GPU (CUDA) |
|---|---|---|---|
| Python | pypi: onnxruntime; Windows (x64), Linux (x64), Mac OS X (x64) | -- | pypi: onnxruntime-gpu; Windows (x64), Linux (x64) |
| C# | Nuget: Microsoft.ML.OnnxRuntime; Windows (x64, x86), Linux (x64, x86), Mac OS X (x64) | Nuget: Microsoft.ML.OnnxRuntime.MKLML; Windows (x64), Linux (x64), Mac OS X (x64) | Nuget: Microsoft.ML.OnnxRuntime.Gpu; Windows (x64), Linux (x64) |
| C/C++ wrapper | Nuget: Microsoft.ML.OnnxRuntime; .zip, .tgz; Windows (x64, x86), Linux (x64, x86), Mac OS X (x64) | Nuget: Microsoft.ML.OnnxRuntime.MKLML; Windows (x64), Linux (x64), Mac OS X (x64) | Nuget: Microsoft.ML.OnnxRuntime.Gpu; .zip, .tgz; Windows (x64), Linux (x64) |

System Requirements (pre-requisite dependencies)

  • ONNX Runtime binaries in the CPU packages use OpenMP and depend on the library being available at runtime in the system.
    • For Windows, OpenMP support comes as part of VC runtime. It is also available as redist packages: vc_redist.x64.exe and vc_redist.x86.exe
    • For Linux, the system must have libgomp.so.1 which can be installed using apt-get install libgomp1.
  • GPU builds require the CUDA runtime libraries to be installed on the system:
    • Version: CUDA 10.0 and cuDNN 7.6
    • Older ONNX Runtime releases used CUDA 9.1 and cuDNN 7.1; please refer to prior release notes for more details.
  • Python binaries are compatible with Python 3.5-3.7. See Python Dev Notes. If using pip to download the Python binaries, run pip install --upgrade pip prior to downloading.
  • The Java API is compatible with Java 8-13.
  • Certain operators make use of system locales. Installation of the English language package and configuration of the en_US.UTF-8 locale are required.
    • For Ubuntu, install the language-pack-en package.
    • Run the following commands in order: locale-gen en_US.UTF-8, then update-locale LANG=en_US.UTF-8
    • Follow similar procedure to configure other locales on other platforms.
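
The requirements above can be checked from Python before installing. The following is a minimal sketch using only the standard library; the function names are hypothetical and not part of ONNX Runtime. The version range is the one listed above for this release, and the OpenMP lookup is a best-effort check that is meaningful on Linux only.

```python
import ctypes.util
import locale
import sys


def python_supported(version=None):
    """True if (major, minor) falls in the 3.5-3.7 range listed above."""
    major, minor = (version or sys.version_info)[:2]
    return major == 3 and 5 <= minor <= 7


def has_openmp_runtime():
    """Best-effort lookup of the GNU OpenMP runtime (libgomp) on Linux."""
    return ctypes.util.find_library("gomp") is not None


def has_locale(name="en_US.UTF-8"):
    """True if the named locale can be activated on this system."""
    previous = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)  # restore it


def report():
    """Print one line per prerequisite."""
    print("Python version supported:", python_supported())
    print("libgomp found:", has_openmp_runtime())
    print("en_US.UTF-8 locale available:", has_locale())
```

Calling report() prints the three results. A False value for libgomp or the locale suggests running the apt-get, locale-gen, and update-locale steps listed above.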

Building from Source

If additional build flavors and/or dockerfiles are needed, please find instructions at Build ONNX Runtime. For production scenarios, it's strongly recommended to build only from an official release branch.


Usage

Getting ONNX Models

Deploying ONNX Runtime

Cloud

ONNX Runtime can be deployed to the cloud for model inferencing using Azure Machine Learning Services. See detailed instructions and sample notebooks.

ONNX Runtime Server (beta) is a hosted application for serving ONNX models using ONNX Runtime, providing a REST API for prediction. Usage details can be found here, and image installation instructions are here.

IoT and edge devices

The expanding focus and selection of IoT devices with sensors and consistent signal streams introduces new opportunities to move AI workloads to the edge.

This is particularly important when there are massive volumes of incoming data/signals that may not be efficient or useful to push to the cloud due to storage or latency considerations. Consider: surveillance tapes where 99% of footage is uneventful, or real-time person detection scenarios where immediate action is required. In these scenarios, directly executing model inferencing on the target device is crucial for optimal assistance.

To deploy AI workloads to these edge devices and take advantage of hardware acceleration capabilities on the target device, see these reference implementations.

Local applications

ONNX Runtime packages are published to PyPI and NuGet (see Official Builds) and/or can be built from source for local application development. Find samples here using the C++ API.
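
For the Python package, a local inference call might look like the sketch below. It assumes the onnxruntime and numpy packages are installed, that the model's first input is a float32 tensor, and that it is the only required input. The helper names concrete_shape and run_once are hypothetical; InferenceSession, get_inputs, and run belong to the onnxruntime Python API.

```python
def concrete_shape(shape, default=1):
    """Replace symbolic or unknown dimensions (None or a name) with a fixed size."""
    return [d if isinstance(d, int) and d > 0 else default for d in shape]


def run_once(model_path):
    """Load a model and run it once on zero-filled input (float32 assumed)."""
    # Imported here so the helper above can be used without these packages.
    import numpy as np
    import onnxruntime as ort  # pip install onnxruntime

    session = ort.InferenceSession(model_path)
    first_input = session.get_inputs()[0]
    data = np.zeros(concrete_shape(first_input.shape), dtype=np.float32)
    # Passing None as the first argument requests every model output.
    return session.run(None, {first_input.name: data})


# Example use (the file name is a placeholder):
#   outputs = run_once("model.onnx")
#   print(len(outputs))
```

Zero-filled input only confirms that the model loads and executes. Real applications would pass preprocessed data with the shape and type the model expects.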

On newer Windows 10 devices (1809+), ONNX Runtime is available by default as part of the OS and is accessible via the Windows Machine Learning APIs. Find tutorials here for building a Windows Desktop or UWP application using WinML.

Performance Tuning

ONNX Runtime is open and extensible, supporting a broad set of configurations and execution providers for model acceleration. For performance tuning guidance, please see this page.

To tune performance for ONNX models, the ONNX Go Live tool "OLive" provides an easy-to-use pipeline for converting models to ONNX and optimizing performance for inferencing with ONNX Runtime.


Technical Design Details

Extensibility Options


Data/Telemetry

This project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.


Contribute

We welcome contributions! Please see the contribution guidelines.

Feedback

For any feedback or to report a bug, please file a GitHub Issue.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.


License

MIT License