mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-24 19:43:35 +00:00

ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator


George Nash a36f627a4c Dnnl training (#6045 ) * Add ReluGrad and ConvGrad ops for the dnnl provider * the mnist sample is updated to add the --use_dnnl option that will cause the sample to use the dnnl execution provider for nodes that exist in dnnl provider. * Added the ability to find forward ops. Dnnl backward gradient ops require the forward primitive description and workspace from the forward operation. * Enable specifying the execution provider for Gradient Checker Tests * Prevent memory leak when running dnnl_provider in training mode Prevent creating a SubgraphPrimitivePool when the code is built with the ENABLE_TRAINING build flag. Instead create a SubgraphPrimitive directly. The SubgraphPrimitivePool was causing a pool of SubgraphPrimitives to be stashed in a map for reuse. Due to the way the Training Loop uses threads the pool of SubgraphPrimitives were not being reuse instead a new pool of SubgraphPrimitives being created each run. The old pool was not instantly freed. This behavior could be a language error when using thread_local memory. Signed-off-by: George Nash <george.nash@intel.com> * Added fixes to maxpoolgrad and memory leak. Maxpoolgrad will now pass all unit tests. With the conv and convgrad disabled for dnnl, mnist is able to train till 95% Signed-off-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Fixed misc issues when testing training code with dnnl provider * fix conv_grad dnnl tests with dilation to run dnnl execution provider * update mnist training sample to accept convolution type models convolution models require the input shape to be {1, 28, 28} instead of the flat {728} image that is used for the gemm models this will enable models that require the different shape by adding `--model_type conv` to the command line when running the mnist sample. 
(while testing a workaround was used see #4762) * Disable weight caching in dnnl conv operator when using training When training we can not use cached weights because the weight will be updated each run. This re-enables dnnl Conv and ConvGrad Ops. The weight caching was the source of the error from Conv when training. * Fix issues found when building grad ops on Linux * The dnnl_convgrad code was over using the scope operator causing a compilation problem. * The dnnl_maxpoolgrad code had a logic error that is was comparing with the source description when it should have been comparing with the destination despription. * Update BUILD.md so it shows DNNL for training * Updated the table of contents. Since the same providers are listed twice. Once for Infrance and again for Training an HTML anchor was added to distinguish the second header from the first for the TOC. * Fix build failure when not using --enable-training build option * reorganize the gradient operators so they are grouped together * Fix issues found when running onnx_backend_test_series.py * Pooling code only supports 2 outputs when built with --enable-training * Address code review feedback * class member variables end in underscore_ * use dst instead of dist to match pattern use elsewhere in DNNL code. * Remove workaround that was introduced to handle problems running convolution based training models. See issue #4762 Signed-off-by: George Nash <george.nash@intel.com> * Isolate training code and code cleanup * Do not build if dnnl_gpu_runtime if enable_training is set training code does not support dnnl_gpu_runtime yet. 
* Isolated Training code inside ifdefs so that they wont affect project if built without training enabled * Inadvertant changes in whitespace were removed to make code review simpler * Undid some code reordering that was not needed * comments added to closing #endif statments to simplify reading complex ifdefs * Modified the GetPrimitiveDesc functions to return shared_ptr instead of raw pointer. This matches what was done in Pool code and is safer memory code. Signed-off-by: George Nash <george.nash@intel.com> * Address code review issues - whitespace changes caused by running clang-format on the code - Several spelling errors fixed - Removed/changed some ifdefs to improve readability - other misc. changes in responce to code review. Signed-off-by: George Nash <george.nash@intel.com> * Code changes to address code review - Simplify iteration code using `auto` keyword - remove C style cast that was not needed - remove instance variable that was not needed [relugrad.h] - added the execution providers to `ComputeGradientErrorInternal()` and `ComputeTheoreticalJacobianTranspose()` instead of using a pointer to an instance varaible [gradient_checker.h/.cc] Signed-off-by: George Nash <george.nash@intel.com> * Combined the default gradient ops test and dnnl gradient ops test for ConvGrad and MaxPoolGrad into one function with the help of a helper function. This will reduce repeated code. Signed-off-by: Palangotu Keshava, Chethan's avatarChethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> * Replaced the stack used by convgrad to vector so that the vector(used as stack) can be easily cleared everytime the graph is created. This will prevent memory leak from convolution kernels being pushed constantly onto the stack. 
Signed-off-by: chethan.palangotu.keshava@intel.com * Code clean up and formating updates - Removed empty else statment - updated indentation of code that was causing double curly brackets to look unususal - Changed check for NumDimensions to Size in Relu and ReluGrad error checking code. - isolated training code Signed-off-by: George Nash <george.nash@intel.com> * Restore inadvertantly removed ConvGrad tests When combining the DNNL and CPU version of the ConvGrad tests two test were inadvertantly excluded. This adds back the Conv3d and Conv3d with strides test cases. Signed-off-by: George Nash <george.nash@intel.com> * Add validation to ConvGrad This validates the dimensions of the ConvGrad match the passed in Convolution forward primitive description. The current code for DNNL ConvGrad makes the assumption that the ConvGrad nodes will be visited in the reverse order from the corresponding Conv nodes The added validation will return an error if this assumption is not true. Signed-off-by: George Nash <george.nash@intel.com> * Do not create new execution providers in provider_test_utils This removes the code that generated new execution providers in the OpTester::Run function. This was added because the std::move was leaving the `entry` value empty so subsequent calls would cause a segfault. Problem is this potentially changed the execution_provider because it would create the default provider dropping any custom arguments. When the now removed code was originally added the std::move was causing crashes when the GradientChecker unit tests were run. However, it is no longer causing problems even with the code removed. Signed-off-by: George Nash <george.nash@intel.com> * Change the forward conv stack to a forward conv map This changes how the forward conv kernel is mapped to the bwd ConvGrad kernel the problematic stack is no longer used. 
The convolution stack made the assumption that the corresponding ConvGrad operator would be visited in reverse order of the forward Conv operators. This was always problematic and was unlikely to work for inception models. Important changes: - The weight_name is added to the ConvGrad dnnl_node making it possible to use the weight_name as a lookup key to find the Conv forward Kernel - the `std::vector fwd_conv_stack_` has been replaced with a `std::map fwd_conv_kernel_map_` - Although it is not needed lock_guards were added when writing to and reading from the fwd_conv_kernel_map_ as well as the fwd_kernel_map_. These should always be accessed by a single thread when preparing the dnnl subgraphs so the guard should not be needed but its added just in case. - Updated the comments ConvGrad.h code to no longer mention the stack. The error check is not removed. It will be good to verify there are no errors as we continue to test against more models. Signed-off-by: George Nash <george.nash@intel.com> Co-authored-by: Chethan Palangotu Keshava <chethan.palangotu.keshava@intel.com> Co-authored-by: unknown <63478620+jeyblu@users.noreply.github.com>		2021-01-29 16:05:58 -08:00
.github	Don't mark issues that are marked as enhancement as stale (#6134 )	2020-12-14 18:57:40 -08:00
cgmanifests	Op kernel type reduction infrastructure. (#6466 )	2021-01-28 07:27:19 -08:00
cmake	Dnnl training (#6045 )	2021-01-29 16:05:58 -08:00
csharp	Delete nuget extra configs (#6477 )	2021-01-27 20:25:45 -08:00
dockerfiles	OpenVino docker file changes to bypass privileged mode	2021-01-22 09:43:47 -08:00
docs	Add ability to track per operator types in reduced build config. (#6428 )	2021-01-29 07:59:51 +10:00
include/onnxruntime/core	[CoreML EP] Add CI for CoreML EP (macOS) and add coreml_flags for EP options (#6481 )	2021-01-28 12:25:46 -08:00
java	Fixing a leak in OnnxSequences with String keys or values. (#6473 )	2021-01-28 11:28:56 -08:00
nodejs	fix SDL rule (#6464 )	2021-01-27 15:32:45 -08:00
onnxruntime	Dnnl training (#6045 )	2021-01-29 16:05:58 -08:00
orttraining	Dnnl training (#6045 )	2021-01-29 16:05:58 -08:00
package/rpm	Update version to 1.6.0 (#6041 )	2020-12-08 11:09:51 -08:00
samples	Remove nGraph Execution Provider (#5858 )	2020-11-19 16:47:55 -08:00
server	Remove nGraph Execution Provider (#5858 )	2020-11-19 16:47:55 -08:00
tools	Enable dense sequence optimized version of Pytorch exported BERT-L on AMD GPU (#6504 )	2021-01-29 13:12:34 -08:00
winml	handle hr error conditions (#6449 )	2021-01-29 15:01:08 -08:00
.clang-format
.clang-tidy	Add remaining build options and make minor changes in documentation (#39 )	2018-11-27 19:59:40 -08:00
.dockerignore	Update dockerfiles (#5929 )	2020-11-25 15:38:22 -08:00
.flake8	Add ability to track per operator types in reduced build config. (#6428 )	2021-01-29 07:59:51 +10:00
.gitattributes
.gitignore	Enable the xcode build for Apple Silicon (arm64 MacOS) (#5924 )	2020-11-30 11:22:08 -08:00
.gitmodules	Op kernel type reduction infrastructure. (#6466 )	2021-01-28 07:27:19 -08:00
build.amd64.1411.bat
build.bat
BUILD.md	Dnnl training (#6045 )	2021-01-29 16:05:58 -08:00
build.sh	Add iOS test pipeline and a sample app. (#5298 )	2020-09-29 13:53:11 -07:00
CODEOWNERS	Re-enable CI tests for the new PyTorch frontend (#5017 )	2020-09-04 09:36:24 -07:00
CONTRIBUTING.md	Update documentation for contributing a PR and add deprecation notices for PyOp and ORT server. (#6172 )	2020-12-18 02:00:42 -08:00
LICENSE
NuGet.config	Delete nuget extra configs (#6477 )	2021-01-27 20:25:45 -08:00
ort.wprp	Add Tracelogging for profiling (#1639 )	2019-11-11 21:34:10 -08:00
packages.config	Add suspend handler with new telemetry event for UWP scenarios (#5907 )	2020-12-01 20:26:18 -08:00
README.md	Update the readme file	2020-12-30 20:16:45 -08:00
requirements-dev.txt	Add ability to track per operator types in reduced build config. (#6428 )	2021-01-29 07:59:51 +10:00
requirements-doc.txt	Update readme.rst for pypi, change documentation style (#1663 )	2019-10-19 18:26:34 -07:00
requirements.txt	Remove cerberus from wheel package (#4919 )	2020-08-26 09:00:03 -07:00
setup.py	[OpenVINO-EP] Remove support for OpenVINO 2020.2 (#6493 )	2021-01-28 23:00:41 -08:00
ThirdPartyNotices.txt	remove gemmlowp submodule (#6341 )	2021-01-13 15:54:37 -08:00
VERSION_NUMBER	Update version to 1.6.0 (#6041 )	2020-12-08 11:09:51 -08:00

README.md

ONNX Runtime is a cross-platform inferencing and training accelerator compatible with many popular ML/DNN frameworks, including PyTorch, TensorFlow/Keras, scikit-learn, and more. aka.ms/onnxruntime

Many users can benefit from ONNX Runtime, including those looking to:

Improve inference performance for a wide variety of ML models
Reduce time and cost of training large models
Train in Python but deploy into a C#/C++/Java app
Run on different hardware and operating systems
Support models created in several different frameworks

ONNX Runtime inferencing APIs have been stable and production-ready since the 1.0 release in October 2019 and can enable faster customer experiences and lower costs.

The ONNX Runtime training feature was introduced in preview in May 2020. This feature supports acceleration of PyTorch training on multi-node NVIDIA GPUs for transformer models. Additional updates for this feature are coming soon.

Get Started
- ONNX Runtime Inferencing
- ONNX Runtime Training
Data/Telemetry
Contributions and Feedback
License

Get Started

Frequently Asked Questions

Inferencing: Start

To use ONNX Runtime, refer to the table on aka.ms/onnxruntime for instructions for different build combinations.

Compatibility

Supporting models based on the standard ONNX format, the runtime is compatible with PyTorch, scikit-learn, TensorFlow, Keras, and all other frameworks and tools that support the interoperable format.

Getting ONNX models - tutorials

ONNX Runtime is up to date and backwards compatible with all operators (both DNN and traditional ML) from ONNX v1.2.1 onward (ONNX compatibility details). Newer versions of ONNX Runtime support all models that worked with prior versions, so updates should not break integrations.

Supported operators/types
- Operators not supported in the current ONNX spec may be available as a Contrib Operator
Extensibility: Add a custom operator/kernel

Binaries

Official builds are available on PyPI (Python), NuGet (C#/C/C++), Maven Central (Java), and npm (node.js).

Default CPU Provider (Eigen + MLAS)
GPU Provider - NVIDIA CUDA
GPU Provider - DirectML (Windows)
- On Windows, the DirectML execution provider is recommended for optimal performance and compatibility with a broad set of GPUs.

Dev builds created from the master branch are available for testing newer changes between official releases. Please use these at your own risk. We strongly advise against deploying these to production workloads as support is limited for dev builds.

Repository	Details
PyPI (Python)	If using pip, run `pip install --upgrade pip` prior to downloading. CPU: onnxruntime / ort-nightly (dev); GPU: onnxruntime-gpu / ort-gpu-nightly (dev)
NuGet (C#/C/C++)	CPU: Microsoft.ML.OnnxRuntime / ort-nightly (dev); GPU: Microsoft.ML.OnnxRuntime.Gpu / ort-nightly (dev)
Maven Central (Java)	CPU: com.microsoft.onnxruntime/onnxruntime; GPU: com.microsoft.onnxruntime/onnxruntime_gpu
npm (node.js)	CPU: onnxruntime
Other	Contributed non-official packages (including Homebrew, Linuxbrew, and nixpkgs). These are not maintained by the core ONNX Runtime team and may have limited support; use at your discretion.
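
The PyPI row of the table above can be read as a small lookup. The sketch below is illustrative only: it maps a target (CPU or GPU) and a channel (release or dev) to the package names listed in that row. The helper name `pypi_package_name` is hypothetical and not part of ONNX Runtime, and dev packages may need additional pip options that are not covered here.

```python
# Package names are copied from the PyPI row of the table above.
PYPI_PACKAGES = {
    ("cpu", "release"): "onnxruntime",
    ("cpu", "dev"): "ort-nightly",
    ("gpu", "release"): "onnxruntime-gpu",
    ("gpu", "dev"): "ort-gpu-nightly",
}


def pypi_package_name(target="cpu", channel="release"):
    """Return the PyPI package name for a target/channel pair (hypothetical helper)."""
    key = (target.lower(), channel.lower())
    if key not in PYPI_PACKAGES:
        raise ValueError(f"unknown target/channel combination: {target}/{channel}")
    return PYPI_PACKAGES[key]


if __name__ == "__main__":
    # Print the package to pass to `pip install` for an official GPU build.
    print(pypi_package_name("gpu", "release"))
```

Keeping the mapping in one dictionary makes it easy to see that dev builds use different package names from official releases, which helps avoid installing a dev build into a production environment by accident.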

System Requirements

The following are required for usage of the official published packages.

Visual C++ Runtime (for Windows packages)
- Requires Visual C++ 2019 runtime
System language
- Installation of the English language package and configuration of the en_US.UTF-8 locale are required, as certain operators make use of system locales.
- For Ubuntu, install language-pack-en package
  - Run the following commands: `locale-gen en_US.UTF-8` and then `update-locale LANG=en_US.UTF-8`
  - Follow similar procedure to configure other locales on other platforms.
Default CPU
- ONNX Runtime binaries in the CPU packages use OpenMP and depend on the library being available at runtime in the system.
  - For Windows, OpenMP support comes as part of VC runtime. It is also available as redist packages: vc_redist.x64.exe and vc_redist.x86.exe
  - For Linux, the system must have libgomp.so.1, which can be installed using `apt-get install libgomp1`.
  - For Mac OS X, the system must have libomp.dylib, which can be installed using `brew install libomp`.
Default GPU (CUDA)
- The default GPU build requires the CUDA runtime libraries to be installed on the system:
  - Version: CUDA 10.2 and cuDNN 8.0.3
- Version dependencies from older ONNX Runtime releases can be found in prior release notes.
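
Some of the requirements above can be checked from Python before installing a package. The following is a minimal sketch using only the standard library: it tests whether the en_US.UTF-8 locale can be activated and whether an OpenMP runtime library (libgomp on Linux, libomp on macOS) can be located. The function names are hypothetical, the library lookup depends on how the platform resolves shared libraries, and a negative result is a hint to investigate, not proof that ONNX Runtime will fail.

```python
import ctypes.util
import locale
import sys


def locale_available(name="en_US.UTF-8"):
    """Return True if the named locale can be activated for LC_CTYPE."""
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, name)
        return True
    except locale.Error:
        return False
    finally:
        # Restore whatever was active before the probe.
        locale.setlocale(locale.LC_CTYPE, saved)


def openmp_runtime_found():
    """Return True/False on Linux and macOS, or None where this sketch does not check."""
    if sys.platform.startswith("linux"):
        return ctypes.util.find_library("gomp") is not None
    if sys.platform == "darwin":
        return ctypes.util.find_library("omp") is not None
    return None  # e.g. Windows, where OpenMP ships with the VC runtime


if __name__ == "__main__":
    print("en_US.UTF-8 locale available:", locale_available())
    print("OpenMP runtime found:", openmp_runtime_found())
```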

Build from Source

For production scenarios, it's strongly recommended to build only from an official release branch.

Instructions for additional build flavors

Docker Images

ONNX-Ecosystem: includes ONNX Runtime (CPU, Python), dependencies, tools to convert from various frameworks, and Jupyter notebooks to help get started
Additional dockerfiles

API Documentation

API	Supported Versions	Samples
Python	3.6, 3.7, 3.8, 3.9 (3.8/3.9 excludes Win GPU and Linux ARM) Python Dev Notes	Samples
C#		Samples
C++		Samples
C		Samples
WinRT	Windows.AI.MachineLearning	Samples
Java	8+	Samples
Ruby (external project)	2.4-2.7	Samples
Javascript (node.js)	12.x	Samples
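
The Python row of the table above can be turned into a quick interpreter check. This is a sketch under the assumption that only the listed minor versions (3.6 to 3.9) are supported; it does not model the Windows GPU and Linux ARM exclusions for 3.8/3.9, and the helper name `python_supported` is hypothetical.

```python
import sys

# Minor versions of Python 3 listed in the table above.
SUPPORTED_MINORS = {6, 7, 8, 9}


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor, ...) version appears in the table."""
    return version_info[0] == 3 and version_info[1] in SUPPORTED_MINORS


if __name__ == "__main__":
    print("This interpreter is listed as supported:", python_supported())
```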

Supported Accelerators

Execution Providers

CPU	GPU	IoT/Edge/Mobile	Other
Default CPU: MLAS (Microsoft Linear Algebra Subprograms) + Eigen; Intel DNNL; Intel MKL-ML (build option)	NVIDIA CUDA; NVIDIA TensorRT; DirectML; AMD MIGraphX (preview)	Intel OpenVINO; ARM Compute Library (preview); Android Neural Networks API (preview); ARM-NN (preview); Rockchip NPU (preview)	Nuphar Model Compiler (preview); Xilinx Vitis-AI (preview)
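
Which execution providers are usable depends on how the installed package was built. The sketch below shows one way an application might choose among them: it filters a preference list against the providers reported by `onnxruntime.get_available_providers()`. The helper `pick_providers` and the preference order are assumptions made for illustration, and the fallback list used when ONNX Runtime is not installed exists only so the example runs on its own.

```python
def pick_providers(available, preferred):
    """Keep the preferred providers that are available, in preference order."""
    chosen = [name for name in preferred if name in available]
    if not chosen:
        raise RuntimeError("none of the preferred execution providers is available")
    return chosen


# Illustrative preference order: GPU first, then DNNL, then the default CPU provider.
PREFERRED = ["CUDAExecutionProvider", "DnnlExecutionProvider", "CPUExecutionProvider"]

if __name__ == "__main__":
    try:
        import onnxruntime as ort
        available = ort.get_available_providers()
    except ImportError:
        # Assumption for the demo only: pretend just the CPU provider exists.
        available = ["CPUExecutionProvider"]
    print(pick_providers(available, PREFERRED))
```

Ending the preference list with the default CPU provider keeps the selection from failing on machines without an accelerator.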

Deploying ONNX Runtime

Cloud

ONNX Runtime can be deployed to any cloud for model inferencing, including Azure Machine Learning Services.
- Detailed instructions
- AzureML sample notebooks
ONNX Runtime Server (beta) is a hosting application for serving ONNX models using ONNX Runtime, providing a REST API for prediction.
- Usage details
- Image installation instructions

IoT and edge devices

Reference implementations

The growing range of IoT devices with sensors and consistent signal streams creates new opportunities to move AI workloads to the edge. This is particularly important when massive volumes of incoming data or signals would be inefficient or pointless to push to the cloud because of storage or latency constraints. Consider surveillance tapes where 99% of the footage is uneventful, or real-time person detection scenarios where immediate action is required. In these scenarios, running model inferencing directly on the target device is crucial.

Client applications

Install or build the package you need to use in your application. (sample implementations using the C++ API)
On newer Windows 10 devices (1809+), ONNX Runtime is available by default as part of the OS and is accessible via the Windows Machine Learning APIs. (Tutorials for Windows Desktop or UWP app)

Training: Start

The ONNX Runtime training feature enables easy integration with existing PyTorch trainer code to accelerate execution. With a few lines of code, you can add ONNX Runtime into your existing training scripts and start seeing acceleration. The current preview version supports training acceleration for transformer models on NVIDIA GPUs.

ONNX Runtime pre-training sample: This sample is set up to pre-train the BERT-Large model to show how ONNX Runtime training can be used to accelerate training execution.

Train PyTorch model with ONNX Runtime

ONNX Runtime (ORT) can train existing PyTorch models through its optimized backend. For this, we have introduced a Python API for PyTorch, called ORTTrainer, which can be used to switch the training backend for PyTorch models (instances of torch.nn.Module) to orttrainer. This requires some changes in the trainer code, such as replacing the PyTorch optimizer and, optionally, setting flags to enable additional features such as mixed-precision training. Here is a sample code fragment to integrate ONNX Runtime Training in your PyTorch pre-training script:

NOTE: The current API is experimental and expected to see significant changes in the near future. Our goal is to improve the interface to provide a seamless integration with PyTorch training that requires minimal changes in users’ training code.

import torch
...
import onnxruntime
from onnxruntime.training import ORTTrainer, optim

# Model definition
class NeuralNet(torch.nn.Module):
  def __init__(self, input_size, hidden_size, num_classes):
    ...
  def forward(self, data):
    ...

model = NeuralNet(input_size=784, hidden_size=500, num_classes=10)
criterion = torch.nn.functional.cross_entropy
model_description = {'inputs':  [('data', ['in', 'batch_size']),
                                 ('target', ['label_x_batch_size'])],
                     'outputs': [('loss', [], True),
                                 ('output', ['out', 'batch_size'])]}

optimizer_config = optim.AdamConfig(lr=learning_rate)

trainer = ORTTrainer(model,              # model
                     model_description,  # model description
                     optimizer_config,   # optimizer configuration
                     criterion)          # loss function

# Training Loop
for t in range(1000):
  # forward + backward + weight update
  loss, y_pred = trainer.train_step(input_data, target_labels, learning_rate)
  total_loss += loss.item()
  ...
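
The `model_description` dictionary in the fragment above names each input and output together with its shape, and the `loss` entry carries a third element, `True`, which appears to mark that output as the loss. As a minimal sketch based only on the structure shown in the fragment, the helpers below read the input names and locate the flagged loss output. The helper names are hypothetical and are not part of the ORTTrainer API.

```python
# Same structure as the model description in the fragment above.
model_description = {
    "inputs": [("data", ["in", "batch_size"]),
               ("target", ["label_x_batch_size"])],
    "outputs": [("loss", [], True),
                ("output", ["out", "batch_size"])],
}


def input_names(description):
    """Return the input names in the order they are declared."""
    return [entry[0] for entry in description["inputs"]]


def loss_output_name(description):
    """Return the name of the single output whose third element is True."""
    flagged = [entry[0] for entry in description["outputs"]
               if len(entry) > 2 and entry[2] is True]
    if len(flagged) != 1:
        raise ValueError(f"expected exactly one loss output, found {len(flagged)}")
    return flagged[0]


if __name__ == "__main__":
    print("inputs:", input_names(model_description))
    print("loss output:", loss_output_name(model_description))
```

A check like this can catch a description with no flagged loss output, or with more than one, before the trainer is constructed.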

Build ONNX Runtime Training from source

To use ONNX Runtime training in a custom environment, like on-prem NVIDIA DGX-2 clusters, you can use these build instructions to generate the Python package to integrate into existing trainer code.

Data/Telemetry

This project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For any feedback or to report a bug, please file a GitHub Issue.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

This project is licensed under the MIT License.
