ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Tang, Cheng a81faee41e
Multi-stream execution support (#13495)
**Description**: This PR includes the following work:
1. Provide stream and related synchronization abstractions in
onnxruntime.
2. Enhance onnxruntime's execution planner, executor, and memory arena to
support executing multiple streams in parallel.
3. Deprecate the parallel executor for CPU.
4. Deprecate the Fence mechanism.
5. Update the CUDA / TensorRT EP to support the stream mechanism, so that
different requests can run on different CUDA streams.
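Item 1 introduces streams and synchronization primitives. The Python sketch below is only a rough mental model. It treats a stream as an in-order work queue served by its own thread, and a notification as a one-shot event that one stream activates and another waits on. This is loosely analogous to pairing `cudaEventRecord` with `cudaStreamWaitEvent` in CUDA. The classes `Stream` and `Notification` and their methods are hypothetical names chosen for illustration. They are not onnxruntime's C++ interfaces.

```python
import queue
import threading


class Notification:
    """Hypothetical one-shot sync primitive: one stream activates it, others wait on it."""

    def __init__(self):
        self._event = threading.Event()

    def activate(self):
        self._event.set()

    def wait(self):
        self._event.wait()


class Stream:
    """Hypothetical in-order work queue: tasks on one stream run sequentially on a
    dedicated worker thread, while different streams run concurrently."""

    def __init__(self, name):
        self.name = name
        self._tasks = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self):
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel enqueued by close()
                break
            task()

    def submit(self, fn):
        self._tasks.put(fn)

    def record(self):
        """Enqueue a notification that activates once all earlier work on this stream has run."""
        notification = Notification()
        self.submit(notification.activate)
        return notification

    def wait_on(self, notification):
        """Make all later work on this stream block until `notification` is activated."""
        self.submit(notification.wait)

    def close(self):
        """Let already-queued tasks finish, then stop the worker thread."""
        self._tasks.put(None)
        self._worker.join()


log = []
log_lock = threading.Lock()


def step(tag):
    """Return a task that appends `tag` to the shared log."""
    def run():
        with log_lock:
            log.append(tag)
    return run


s0, s1 = Stream("s0"), Stream("s1")
s0.submit(step("producer"))
done = s0.record()        # activates after "producer" has run on s0
s1.wait_on(done)          # s1 blocks here until s0 reaches that point
s1.submit(step("consumer"))
s0.close()
s1.close()
print(log)  # ['producer', 'consumer']
```

Work on the same stream needs no explicit synchronization because it already runs in order. Only the cross-stream edge (s0 to s1) needs a notification.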

**Motivation and Context**
- Why is this change required?
Currently, the execution plan is just a linear list of primitives, which
ORT executes step by step. For any given graph, ORT serializes it into a
fixed execution order. This sequential design simplifies most scenarios,
but it has the following limitations:
1. It is difficult to enable inter-node parallelization. We have a
half-baked parallel executor, but it is very difficult to make it work
with GPUs.
2. The fence mechanism works for the single GPU stream + CPU thread case,
but when extended to multiple streams, it is difficult to manage
cross-GPU-stream synchronization.
3. Our CUDA EP relies on the BFCArena to make memory management work
with asynchronous GPU kernels, but the current BFCArena is not aware of
streams, so it does not behave correctly when run with multiple streams.
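Limitation 3 is easiest to see with a toy allocator. The sketch below is a hypothetical stream-aware caching allocator, not onnxruntime's BFCArena. Integer block ids stand in for device memory.

- Each freed block remembers the stream that last used it.
- The same stream may reuse the block immediately, because its queued work runs in order.
- A different stream may take the block only after it has synchronized with the owning stream.

A stream-unaware arena would skip that check. It could then hand a block to one stream while another stream's queued kernel is still using it.

```python
from collections import defaultdict


class StreamAwareArena:
    """Hypothetical stream-aware caching allocator (block ids stand in for device memory)."""

    def __init__(self):
        self._free = defaultdict(list)  # size -> [(block_id, owner_stream, free_tick)]
        self._tick = 0                  # increases on every free
        self._next_id = 0
        self._synced_up_to = {}         # (consumer, producer) -> tick at the last sync

    def allocate(self, size, stream):
        blocks = self._free[size]
        for i, (block_id, owner, tick) in enumerate(blocks):
            # Same stream: earlier work touching the block is already queued ahead of us.
            # Other stream: safe only if the block was freed before the last sync with its owner.
            if owner == stream or tick <= self._synced_up_to.get((stream, owner), 0):
                del blocks[i]
                return block_id
        self._next_id += 1  # nothing safely reusable: carve out a fresh block
        return self._next_id

    def free(self, block_id, size, stream):
        self._tick += 1
        self._free[size].append((block_id, stream, self._tick))

    def synchronize(self, consumer, producer):
        """Record that `consumer` has waited for all work `producer` has queued so far."""
        self._synced_up_to[(consumer, producer)] = self._tick


arena = StreamAwareArena()
a = arena.allocate(256, stream=0)      # fresh block 1
arena.free(a, 256, stream=0)
b = arena.allocate(256, stream=1)      # fresh block 2: stream 0 may still be using block 1
arena.synchronize(consumer=1, producer=0)
c = arena.allocate(256, stream=1)      # block 1 again: safe after the sync
print(a, b, c)  # 1 2 1
```

A sync is recorded as a point in time rather than a permanent flag. Blocks that the producer frees after the sync are therefore not handed to the consumer until the two streams synchronize again.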

This PR enhances our existing execution plan and executor to support
multiple-stream execution. We use a unified algorithm to manage both
single-stream and multiple-stream scenarios.
This PR mainly focuses on the infrastructure support for multiple-stream
execution; that is, given a valid stream assignment, onnxruntime can
execute it correctly. How to generate a good stream assignment for a
given model will be addressed in a future PR.
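The planner sketch below makes "given a valid stream assignment" concrete. `build_plan` and its `("run" | "wait" | "notify", node)` step tuples are hypothetical names, not ORT's execution planner. The function walks nodes in topological order and appends each to its assigned stream. When an edge crosses streams, it inserts a notify step after the producer and a wait step before the consumer. If every node is assigned to stream 0, no sync steps are emitted and the result is the plain linear plan described above. This shows one way a single algorithm can cover both the single-stream and multi-stream cases.

```python
from collections import defaultdict


def build_plan(order, deps, assignment):
    """Hypothetical sketch: split a topologically ordered graph into per-stream step lists.

    order:      node names in a valid topological order
    deps:       node -> list of producer nodes whose outputs it consumes
    assignment: node -> stream index
    """
    # Producers with at least one consumer on a different stream must signal completion.
    needs_notify = {
        p for n in order for p in deps.get(n, []) if assignment[p] != assignment[n]
    }
    plan = defaultdict(list)
    waited = set()  # (stream, producer) pairs already waited on; a stream's steps run in order
    for node in order:
        s = assignment[node]
        for p in deps.get(node, []):
            if assignment[p] != s and (s, p) not in waited:
                plan[s].append(("wait", p))
                waited.add((s, p))
        plan[s].append(("run", node))
        if node in needs_notify:
            plan[s].append(("notify", node))
    return dict(plan)


order = ["A", "B", "C", "D"]
deps = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

# One stream: no cross-stream edges, so the plan is a plain linear list.
single = build_plan(order, deps, {n: 0 for n in order})
print(single)
# {0: [('run', 'A'), ('run', 'B'), ('run', 'C'), ('run', 'D')]}

# Two streams: B moves to stream 1, so B waits on A's notification and D waits on B's.
multi = build_plan(order, deps, {"A": 0, "B": 1, "C": 0, "D": 0})
print(multi)
# {0: [('run', 'A'), ('notify', 'A'), ('run', 'C'), ('wait', 'B'), ('run', 'D')],
#  1: [('wait', 'A'), ('run', 'B'), ('notify', 'B')]}
```

Every wait refers to a producer that comes earlier in the topological order. As a result, the waits in this sketch cannot form a cycle across streams.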

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: cao lei <jslhcl@gmail.com>
Co-authored-by: Lei Cao <leca@microsoft.com>
2022-12-15 07:39:29 -08:00

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms.

ONNX Runtime training can reduce model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition to existing PyTorch training scripts.

Get Started & Resources

Build Pipeline Status

[Build status badges for Windows (CPU, GPU, EPs), Linux, Mac, Android, iOS, and WebAssembly]

Data/Telemetry

Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

This project is licensed under the MIT License.