saymrwulf/onnxruntime: ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-10 17:37:14 +00:00

Chen Fu f4f2cc1a00 Add batch interface to floating point GEMM (#7323). Currently, a high-dimensional MatMul calls multiple GEMMs sequentially. This change executes those GEMMs in parallel, removing the barriers between adjacent GEMM operations. Performance was tested with the BERT and T5 models. BERT shows no noticeable performance difference, because the heavy lifting is done by the attention operator, which this PR does not change. T5 shows no regression at a low thread count (4), and the improvement is more pronounced at higher thread counts (8-16): T5 is about 10% faster with 16 threads, and profiling shows that its most expensive MatMul operators achieve around a 20% speedup with 16 threads. Co-authored-by: Chen Fu <fuchen@microsoft.com>		2021-04-23 17:34:22 -07:00
.github	Don't mark issues that are marked as enhancement as stale (#6134 )	2020-12-14 18:57:40 -08:00
cgmanifests	pick onnx release candidate (#7177 )	2021-04-22 23:57:09 -07:00
cmake	Build would fail when nccl is not under standard path (--nccl_home) (#7402 )	2021-04-23 14:04:22 -07:00
csharp	pick onnx release candidate (#7177 )	2021-04-22 23:57:09 -07:00
dockerfiles	fix for using tensorrt:20.12 base image (#7264 )	2021-04-07 08:48:43 -07:00
docs	Update docs/ContribOperators.md and the script that generates it. (#7399 )	2021-04-21 16:20:56 -07:00
include/onnxruntime/core	Add ability to allocate initialized tensor memory from non-arena memory (#7267 )	2021-04-20 20:27:48 -07:00
java	Create Android Package pipeline (#7295 )	2021-04-12 17:56:25 -07:00
js	[JS] refactor Javascript/Typescript libraries in ONNX Runtime (#7308 )	2021-04-16 01:33:10 -07:00
onnxruntime	Add batch interface to floating point GEMM (#7323 )	2021-04-23 17:34:22 -07:00
orttraining	Partial graph execution made simple. (#7324 )	2021-04-23 15:09:18 -07:00
package/rpm	Bumping up version to 1.7 (#6736 )	2021-02-17 19:07:38 -08:00
samples	Introduce ORTModule training API to ONNX Runtime	2021-03-10 10:48:10 -08:00
server	Update ORT server build pipeline (#7030 )	2021-03-16 18:02:09 -07:00
tools	Add CI pipeline to publish Python training package targeting Rocm (#7417 )	2021-04-23 17:22:31 -07:00
winml	Enabled fp16-inception-v1 test (#7406 )	2021-04-22 23:05:03 -07:00
.clang-format
.clang-tidy
.dockerignore	Update dockerfiles (#5929 )	2020-11-25 15:38:22 -08:00
.flake8	Sync ORTModule branch with master and fix tests (#6526 )	2021-02-02 08:59:56 -08:00
.gitattributes
.gitignore	Add auto doc gen for ORTModule API during CI build (#7046 )	2021-03-22 10:20:33 -07:00
.gitmodules	build ONNXRuntime into WebAssembly (#6478 )	2021-04-06 16:18:10 -07:00
build.amd64.1411.bat
build.bat
build.sh	Add iOS test pipeline and a sample app. (#5298 )	2020-09-29 13:53:11 -07:00
CODEOWNERS	Update code owners for pytorch frontend team (#6329 )	2021-02-02 11:09:10 -08:00
CONTRIBUTING.md	Add README for docs (#6626 )	2021-03-12 15:14:40 -08:00
LICENSE	Remove year from license (#6658 )	2021-02-12 00:25:56 -08:00
NuGet.config	Sync ORTModule branch with master and fix tests (#6526 )	2021-02-02 08:59:56 -08:00
ort.wprp
packages.config	Update DirectML 1.4.1 to 1.4.2 for ORT 1.7 (#6780 )	2021-02-23 10:52:10 -08:00
README.md	build ONNXRuntime into WebAssembly (#6478 )	2021-04-06 16:18:10 -07:00
requirements-dev.txt	Sync ORTModule branch with master and fix tests (#6526 )	2021-02-02 08:59:56 -08:00
requirements-doc.txt	Add auto doc gen for ORTModule API during CI build (#7046 )	2021-03-22 10:20:33 -07:00
requirements-training.txt	Add missing Python dependencies for ORT training (#7104 )	2021-03-23 18:43:19 -07:00
requirements.txt	Quantization calibration refactor (#6893 )	2021-03-19 01:09:11 -07:00
setup.py	Add CI pipeline to publish Python training package targeting Rocm (#7417 )	2021-04-23 17:22:31 -07:00
ThirdPartyNotices.txt	Enable CoreML EP for minimal extended mode (#7266 )	2021-04-08 17:45:22 -07:00
VERSION_NUMBER	Bumping up version to 1.7 (#6736 )	2021-02-17 19:07:38 -08:00
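
The latest commit listed above (#7323) describes replacing a sequence of GEMM calls with a single batched call whose work items run in parallel. The following is a minimal Python sketch of that scheduling idea only. It is not the code from the commit: the function names are hypothetical, the matrices are plain nested lists, and Python threads will not reproduce the reported speedups.

```python
from concurrent.futures import ThreadPoolExecutor


def gemm(a, b):
    """Multiply an (m x k) matrix by a (k x n) matrix, both as nested lists."""
    cols_b = list(zip(*b))  # columns of b, so each output cell is a dot product
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def batched_gemm_sequential(batch_a, batch_b):
    """Baseline: one GEMM at a time, each finishing before the next starts."""
    return [gemm(a, b) for a, b in zip(batch_a, batch_b)]


def batched_gemm_parallel(batch_a, batch_b, num_threads=4):
    """Batched variant: submit every GEMM to one shared pool, so there is
    no synchronization point between adjacent GEMMs, only one at the end."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(gemm, batch_a, batch_b))


# Two independent 2x2 products, as in a MatMul with a leading batch dimension.
batch_a = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]
batch_b = [[[5, 6], [7, 8]], [[2, 3], [4, 5]]]

seq = batched_gemm_sequential(batch_a, batch_b)
par = batched_gemm_parallel(batch_a, batch_b)
```

Both functions return the same results in the same order; the difference is only in how the independent products are scheduled.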

README.md

ONNX Runtime is a cross-platform machine-learning accelerator for inference and training. It is compatible with deep learning frameworks such as PyTorch and TensorFlow/Keras, as well as classical machine learning libraries such as scikit-learn.

ONNX Runtime uses the portable ONNX computation graph format, backed by execution providers optimized for operating systems, drivers and hardware.
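
As a rough illustration of how execution providers are selected, the sketch below uses the Python package's `get_available_providers()` and the `providers` argument of `InferenceSession`. The helper names `choose_providers` and `make_session`, the fallback policy, and any model path passed in are assumptions made for this example, not part of the README.

```python
def choose_providers(available, preferred):
    """Keep the preferred providers that are available, in preference order,
    and fall back to the CPU provider when it is available (assumed policy)."""
    chosen = [p for p in preferred if p in available]
    cpu = "CPUExecutionProvider"
    if cpu not in chosen and cpu in available:
        chosen.append(cpu)
    return chosen


def make_session(model_path, preferred=("CUDAExecutionProvider",)):
    """Create an inference session for an ONNX model file.

    onnxruntime is imported lazily so the helper above can be used
    (and tested) on machines where the package is not installed.
    """
    import onnxruntime as ort

    providers = choose_providers(ort.get_available_providers(), preferred)
    return ort.InferenceSession(model_path, providers=providers)


# Pure-Python check of the selection logic with a made-up availability list.
example = choose_providers(
    ["CUDAExecutionProvider", "CPUExecutionProvider"],
    ["TensorrtExecutionProvider", "CUDAExecutionProvider"],
)
```

With a real model file, `make_session("model.onnx")` (a hypothetical path) would return a session that can be run through `session.run(None, inputs)`, where `inputs` maps the model's input names to arrays.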

Common use cases for ONNX Runtime:

Improve inference performance for a wide variety of ML models
Reduce time and cost of training large models
Train in Python but deploy into a C#/C++/Java app
Run with optimized performance on different hardware and operating systems
Support models created in several different frameworks

ONNX Runtime inference APIs have been stable and production-ready since the 1.0 release in October 2019, and can enable faster customer experiences and lower costs.

The ONNX Runtime training feature was introduced in preview in May 2020. It supports accelerating PyTorch training of transformer models on multi-node NVIDIA GPUs. Additional updates for this feature are coming soon.