ONNX Runtime: cross-platform, high-performance ML inferencing and training accelerator


ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more. More information: aka.ms/onnxruntime

Many users can benefit from ONNX Runtime, including those looking to:

  • Improve inference performance for a wide variety of ML models
  • Reduce time and cost of training large models
  • Train in Python but deploy into a C#/C++/Java app
  • Run on different hardware and operating systems
  • Support models created in several different frameworks

ONNX Runtime inferencing APIs have been stable and production-ready since the 1.0 release in October 2019, and can enable faster customer experiences and lower costs.

The ONNX Runtime training feature was introduced in preview in May 2020. It supports acceleration of PyTorch training of transformer models on multi-node NVIDIA GPUs. Additional updates for this feature are coming soon.




Get Started

Frequently Asked Questions

Inferencing: Start

To use ONNX Runtime, refer to the table on aka.ms/onnxruntime for instructions for different build combinations.

Compatibility

Because it supports models in the standard ONNX format, the runtime is compatible with PyTorch, scikit-learn, TensorFlow, Keras, and all other frameworks and tools that support this interoperable format.

ONNX Runtime is up to date and backwards compatible with all operators (both DNN and traditional ML) from ONNX v1.2.1 onward (see the ONNX compatibility details). Newer versions of ONNX Runtime support all models that worked with prior versions, so updates should not break integrations.

Binaries

Official builds are available on PyPI (Python), NuGet (C#/C/C++), Maven Central (Java), and npm (Node.js), and include the following providers:

  • Default CPU Provider (Eigen + MLAS)
  • GPU Provider - NVIDIA CUDA
  • GPU Provider - DirectML (Windows)

Dev builds created from the master branch are available for testing newer changes between official releases. Please use these at your own risk. We strongly advise against deploying these to production workloads as support is limited for dev builds.

Package repositories:

  • PyPI (Python): if using pip, run pip install --upgrade pip prior to downloading.
    • CPU: onnxruntime / ort-nightly (dev)
    • GPU: onnxruntime-gpu / ort-gpu-nightly (dev)
  • NuGet (C#/C/C++)
    • CPU: Microsoft.ML.OnnxRuntime / ort-nightly (dev)
    • GPU: Microsoft.ML.OnnxRuntime.Gpu / ort-nightly (dev)
  • Maven Central (Java)
    • CPU: com.microsoft.onnxruntime/onnxruntime
    • GPU: com.microsoft.onnxruntime/onnxruntime_gpu
  • npm (Node.js)
    • CPU: onnxruntime
  • Other: contributed non-official packages (including Homebrew, Linuxbrew, and nixpkgs). These are not maintained by the core ONNX Runtime team and may have limited support; use at your discretion.
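For Python users, the package to install depends on two choices from the list above: CPU or GPU, and official or dev (nightly) build. The sketch below encodes that choice. The mapping and the function name pip_install_command are hypothetical illustrations, not part of any ONNX Runtime API, and the sketch assumes all four packages are installed from the default pip index; verify that for the dev builds before relying on it.

```python
# Hypothetical helper: the mapping and function name are illustrative and
# are not part of any ONNX Runtime API. Package names come from the list above.
PYPI_PACKAGES = {
    # (use_gpu, nightly) -> PyPI package name
    (False, False): "onnxruntime",
    (False, True): "ort-nightly",
    (True, False): "onnxruntime-gpu",
    (True, True): "ort-gpu-nightly",
}


def pip_install_command(use_gpu=False, nightly=False):
    """Return the pip command that installs the requested ONNX Runtime build."""
    package = PYPI_PACKAGES[(bool(use_gpu), bool(nightly))]
    return "pip install --upgrade " + package


if __name__ == "__main__":
    # Upgrade pip first, as recommended for PyPI, then install the package.
    print("pip install --upgrade pip")
    print(pip_install_command(use_gpu=True))
```

Keeping the names in one table makes it easy to update if package names change between releases.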

System Requirements

The following are required for usage of the official published packages.

  • Visual C++ Runtime (for Windows packages)

  • System language

    • Installing the English language package and configuring the en_US.UTF-8 locale is required, as certain operators make use of system locales.
    • For Ubuntu, install the language-pack-en package.
      • Then run: locale-gen en_US.UTF-8, followed by update-locale LANG=en_US.UTF-8
      • Follow a similar procedure to configure the locale on other platforms.
  • Default CPU

    • ONNX Runtime binaries in the CPU packages use OpenMP and require the OpenMP library to be available on the system at runtime.
      • For Windows, OpenMP support comes with the VC runtime. It is also available as redistributable packages: vc_redist.x64.exe and vc_redist.x86.exe
      • For Linux, the system must have libgomp.so.1 which can be installed using apt-get install libgomp1.
      • For Mac OS X, the system must have libomp.dylib which can be installed using brew install libomp.
  • Default GPU (CUDA)

    • The default GPU build requires the CUDA runtime libraries to be installed on the system:
      • Version: CUDA 10.1 and cuDNN 7.6.5
    • Version dependencies from older ONNX Runtime releases can be found in prior release notes.
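The requirements above can be checked before installing or importing ONNX Runtime. The following is a minimal pre-flight sketch using only the Python standard library; the function names are hypothetical and not part of ONNX Runtime. It assumes that a locale which Python's locale.setlocale can activate is "configured", that ctypes.util.find_library can locate libgomp (Linux) or libomp (macOS), and that only the major.minor parts of the CUDA and cuDNN versions need to match the documented 10.1 and 7.6.5. That last strictness rule is an assumption, not a documented policy.

```python
import ctypes.util
import locale
import sys

# Hypothetical pre-flight checks; none of these functions belong to ONNX Runtime.


def has_locale(name="en_US.UTF-8"):
    """Return True if `name` can be activated as the process-wide locale."""
    previous = locale.setlocale(locale.LC_ALL)  # query the current setting
    try:
        locale.setlocale(locale.LC_ALL, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_ALL, previous)  # restore the original locale


# OpenMP runtime library names passed to ctypes.util.find_library.
OPENMP_LIBS = {"linux": "gomp", "darwin": "omp"}


def find_openmp_runtime(platform=sys.platform):
    """Return the OpenMP library found on Linux/macOS, or None.

    Windows is not checked: there OpenMP ships with the VC runtime.
    """
    key = "linux" if platform.startswith("linux") else platform
    name = OPENMP_LIBS.get(key)
    return ctypes.util.find_library(name) if name else None


def _major_minor(version):
    """Turn '10.1.243' into (10, 1)."""
    return tuple(int(part) for part in version.split(".")[:2])


def gpu_versions_match(cuda, cudnn, required_cuda="10.1", required_cudnn="7.6.5"):
    """Compare installed CUDA/cuDNN versions with the documented ones.

    Assumption: only major.minor must match; patch levels are ignored.
    """
    return (_major_minor(cuda) == _major_minor(required_cuda)
            and _major_minor(cudnn) == _major_minor(required_cudnn))
```

How the installed CUDA and cuDNN version strings are obtained (for example from nvcc or from the cuDNN headers) is left to the caller.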

Build from Source

For production scenarios, it's strongly recommended to build only from an official release branch.

Docker Images

API Documentation

Supported API versions and samples:

  • Python: 3.5, 3.6, 3.7, 3.8 (3.8 excludes Windows GPU and Linux ARM); Python Dev Notes; Samples
  • C#: Samples
  • C++: Samples
  • C: Samples
  • WinRT: Windows.AI.MachineLearning; Samples
  • Java: 8+; Samples
  • Ruby (external project): 2.4-2.7; Samples
  • JavaScript (Node.js): 12.x; Samples
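The Python row above combines a version list with two platform exclusions, which is easy to misread. The sketch below encodes it as a check; the names SUPPORTED_PYTHON, python_package_listed, and current_interpreter_listed are hypothetical, and the data reflects only the versions listed here for this release, so it will go stale as newer releases add Python versions.

```python
import platform
import sys

# Hypothetical helper encoding the supported-versions list for this release;
# not an ONNX Runtime API.
SUPPORTED_PYTHON = {(3, 5), (3, 6), (3, 7), (3, 8)}


def python_package_listed(version, os_name, arch, gpu=False):
    """Return True if the list includes a Python package for this combination."""
    version = tuple(version[:2])
    if version not in SUPPORTED_PYTHON:
        return False
    if version == (3, 8):
        if os_name == "win32" and gpu:
            return False  # Python 3.8 excludes Windows GPU
        if os_name.startswith("linux") and arch.lower().startswith(("arm", "aarch64")):
            return False  # Python 3.8 excludes Linux ARM
    return True


def current_interpreter_listed(gpu=False):
    """Apply the check to the running interpreter and machine."""
    return python_package_listed(sys.version_info, sys.platform, platform.machine(), gpu)
```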

Supported Accelerators

Execution Providers

Execution providers are grouped into CPU, GPU, IoT/Edge/Mobile, and Other categories. CPU providers:
  • Default CPU - MLAS (Microsoft Linear Algebra Subprograms) + Eigen
  • Intel DNNL
  • Intel nGraph
  • Intel MKL-ML (build option)
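When a build contains more than one execution provider, an application typically decides which to try first. The sketch below is a hypothetical helper, not part of ONNX Runtime: it filters a preference list against the providers actually available and keeps the default CPU provider as the last-resort fallback, assuming the CPU provider is always present. The name strings "CUDAExecutionProvider" and "CPUExecutionProvider" follow ONNX Runtime's naming; check any other provider names against the installed build.

```python
# Hypothetical helper, not an ONNX Runtime API: order execution providers by
# preference, keeping only those reported as available, with CPU as fallback.
CPU_PROVIDER = "CPUExecutionProvider"


def order_providers(preferred, available):
    """Return preferred providers that are available, deduplicated, CPU last."""
    available = set(available)
    ordered = []
    for name in preferred:
        if name in available and name != CPU_PROVIDER and name not in ordered:
            ordered.append(name)
    ordered.append(CPU_PROVIDER)  # assumption: the default CPU provider is always present
    return ordered
```

With ONNX Runtime installed, the available list could come from onnxruntime.get_available_providers(). How the resulting list is passed to an inference session depends on the ONNX Runtime version, so check the Python API documentation for your release.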

Deploying ONNX Runtime

Cloud

IoT and edge devices

The growing number and variety of IoT devices with sensors and continuous signal streams creates new opportunities to move AI workloads to the edge. This is particularly important when massive volumes of incoming data or signals are not efficient or useful to push to the cloud because of storage or latency constraints. Consider surveillance footage where 99% of it is uneventful, or real-time person detection scenarios where immediate action is required. In these scenarios, executing model inference directly on the target device is crucial for timely, useful results.

Client applications


Training: Start

The ONNX Runtime training feature enables easy integration with existing PyTorch trainer code to accelerate execution. With a few lines of code, you can add ONNX Runtime to your existing training scripts and start seeing acceleration. The current preview version supports training acceleration for transformer models on NVIDIA GPUs.

ONNX Runtime pre-training sample: this sample is set up to pre-train the BERT-Large model and shows how ONNX Runtime training can be used to accelerate training execution.

Train PyTorch model with ONNX Runtime

ONNX Runtime (ORT) can train existing PyTorch models through its optimized backend. For this, we have introduced a Python API for PyTorch, called ORTTrainer, which can be used to switch the training backend for PyTorch models (instances of torch.nn.Module) to ORT. This requires some changes in the trainer code, such as replacing the PyTorch optimizer and, optionally, setting flags to enable additional features such as mixed-precision training. Here is a sample code fragment to integrate ONNX Runtime Training into your PyTorch pre-training script:

NOTE: The current API is experimental and expected to change significantly in the near future. Our goal is to improve the interface to provide seamless integration with PyTorch training that requires minimal changes to users' training code.

import torch
...
import onnxruntime
from onnxruntime.capi.ort_trainer import IODescription, ModelDescription, ORTTrainer

# Model definition
class Net(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        super().__init__()  # required before registering submodules
        ...

    def forward(self, x):
        ...

model = Net(D_in, H, D_out)
# Loss function: the functional API lives in torch.nn.functional (lowercase)
criterion = torch.nn.functional.cross_entropy
description = ModelDescription(...)
optimizer = 'SGDOptimizer'
trainer = ORTTrainer(model, criterion, description, optimizer, ...)

# Training loop
for t in range(1000):
    # forward + backward + weight update
    loss, y_pred = trainer.train_step(x, y, learning_rate)
    ...

Build ONNX Runtime Training from source

To use ONNX Runtime training in a custom environment, like on-prem NVIDIA DGX-2 clusters, you can use these build instructions to generate the Python package to integrate into existing trainer code.

Data/Telemetry

This project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For any feedback or to report a bug, please file a GitHub Issue.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

This project is licensed under the MIT License.