saymrwulf/onnxruntime: ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-18 18:52:16 +00:00

Chen Fu ef1aaa367a Adding interface for batched integer gemm (#7249) Parallelize MinMax, Quantize and batched quantized GEMM. A performance problem was identified in the (quantized) T5 decoder model, and the DynamicMatMul operator was found to be the culprit. This operator spends its time computing the MinMax of a tensor, quantizing a tensor, and performing a batched QGEMM, and all of these can be parallelized. GEMM itself is already parallelized, but batched GEMM calls GEMM sequentially, once per batch entry, so each call starts and ends its own parallel section, which can be slow. So we made the following changes: the parallel task partition no longer depends on the degree of parallelism, only on the shape of the matrices; a single GEMM performs a 2D partition of the multiplication along panel lines to reduce repeated packing; and for batched GEMM, all parallel tasks are executed in a single parallel section, reducing the cost of starting threads and waiting for them to finish.		2021-04-15 10:25:31 -07:00
.github	Don't mark issues that are marked as enhancement as stale (#6134)	2020-12-14 18:57:40 -08:00
cgmanifests	pull onnx latest commit (#7102)	2021-03-29 11:00:38 -07:00
cmake	Adding interface for batched integer gemm (#7249)	2021-04-15 10:25:31 -07:00
csharp	Fix Zip-Nuget-Java Packaging Pipeline (#7208)	2021-04-05 10:58:13 -07:00
dockerfiles	fix for using tensorrt:20.12 base image (#7264)	2021-04-07 08:48:43 -07:00
docs	Update ContribOperators.md (#7246)	2021-04-05 17:11:33 -07:00
include/onnxruntime/core	[OpenVINO-EP] Enabling save/Load blob feature (#7054)	2021-04-07 20:59:16 -07:00
java	Create Android Package pipeline (#7295)	2021-04-12 17:56:25 -07:00
nodejs	[Node.js binding] upgrade y18n to v4.0.1 (#7185)	2021-03-30 16:09:04 -07:00
onnxruntime	Adding interface for batched integer gemm (#7249)	2021-04-15 10:25:31 -07:00
orttraining	Propagate Cast operations to maximize lower precision (float16) computation (#7191)	2021-04-14 20:54:24 -07:00
package/rpm	Bumping up version to 1.7 (#6736)	2021-02-17 19:07:38 -08:00
samples	Introduce ORTModule training API to ONNX Runtime	2021-03-10 10:48:10 -08:00
server	Update ORT server build pipeline (#7030)	2021-03-16 18:02:09 -07:00
tools	Delete an unused var in nuget pipelines (#7345)	2021-04-15 07:29:52 -07:00
winml	fix typo in scenariotestscppwinrt.cpp (#7334)	2021-04-14 08:26:55 -07:00
.clang-format	Initial bootstrap commit.	2018-11-19 16:48:22 -08:00
.clang-tidy	Add remaining build options and make minor changes in documentation (#39)	2018-11-27 19:59:40 -08:00
.dockerignore	Update dockerfiles (#5929)	2020-11-25 15:38:22 -08:00
.flake8	Sync ORTModule branch with master and fix tests (#6526)	2021-02-02 08:59:56 -08:00
.gitattributes	Initial bootstrap commit.	2018-11-19 16:48:22 -08:00
.gitignore	Add auto doc gen for ORTModule API during CI build (#7046)	2021-03-22 10:20:33 -07:00
.gitmodules	build ONNXRuntime into WebAssembly (#6478)	2021-04-06 16:18:10 -07:00
build.amd64.1411.bat	Initial bootstrap commit.	2018-11-19 16:48:22 -08:00
build.bat	Initial bootstrap commit.	2018-11-19 16:48:22 -08:00
build.sh	Add iOS test pipeline and a sample app. (#5298)	2020-09-29 13:53:11 -07:00
CODEOWNERS	Update code owners for pytorch frontend team (#6329)	2021-02-02 11:09:10 -08:00
CONTRIBUTING.md	Add README for docs (#6626)	2021-03-12 15:14:40 -08:00
LICENSE	Remove year from license (#6658)	2021-02-12 00:25:56 -08:00
NuGet.config	Sync ORTModule branch with master and fix tests (#6526)	2021-02-02 08:59:56 -08:00
ort.wprp	Add Tracelogging for profiling (#1639)	2019-11-11 21:34:10 -08:00
packages.config	Update DirectML 1.4.1 to 1.4.2 for ORT 1.7 (#6780)	2021-02-23 10:52:10 -08:00
README.md	build ONNXRuntime into WebAssembly (#6478)	2021-04-06 16:18:10 -07:00
requirements-dev.txt	Sync ORTModule branch with master and fix tests (#6526)	2021-02-02 08:59:56 -08:00
requirements-doc.txt	Add auto doc gen for ORTModule API during CI build (#7046)	2021-03-22 10:20:33 -07:00
requirements-training.txt	Add missing Python dependencies for ORT training (#7104)	2021-03-23 18:43:19 -07:00
requirements.txt	Quantization calibration refactor (#6893)	2021-03-19 01:09:11 -07:00
setup.py	Liqun/ort package name2 (#7337)	2021-04-13 20:36:24 -07:00
ThirdPartyNotices.txt	Enable CoreML EP for minimal extended mode (#7266)	2021-04-08 17:45:22 -07:00
VERSION_NUMBER	Bumping up version to 1.7 (#6736)	2021-02-17 19:07:38 -08:00
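
The batched-GEMM change described in the latest commit (#7249) can be sketched in Python. This is a minimal illustration under stated assumptions, not the MLAS implementation: the helper names `partition`, `gemm_block`, and `batched_gemm`, the 2x2 block size, and the use of `ThreadPoolExecutor` are all hypothetical. It shows the two ideas the commit names: the task list depends only on matrix shapes, and the tasks for every batch entry are dispatched together in one parallel section. It does not model panel packing, and because of Python's GIL it illustrates the task structure rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(m, n, block_m=2, block_n=2):
    """Split an m x n output into 2D blocks (hypothetical helper).

    The blocks depend only on the matrix shape, never on how many
    worker threads are available.
    """
    return [
        (r0, min(r0 + block_m, m), c0, min(c0 + block_n, n))
        for r0 in range(0, m, block_m)
        for c0 in range(0, n, block_n)
    ]


def gemm_block(a, b, c, r0, r1, c0, c1):
    """Fill c[r0:r1][c0:c1] with the matching block of a @ b (integer math)."""
    k = len(b)
    for i in range(r0, r1):
        for j in range(c0, c1):
            c[i][j] = sum(a[i][p] * b[p][j] for p in range(k))


def batched_gemm(a_list, b_list, max_workers=4):
    """Compute a_list[i] @ b_list[i] for every batch entry i."""
    c_list = [
        [[0] * len(b[0]) for _ in range(len(a))]
        for a, b in zip(a_list, b_list)
    ]
    # One flat task list covering every block of every batch entry.
    tasks = [
        (i, block)
        for i, (a, b) in enumerate(zip(a_list, b_list))
        for block in partition(len(a), len(b[0]))
    ]

    def run(task):
        i, block = task
        gemm_block(a_list[i], b_list[i], c_list[i], *block)

    # A single parallel section for the whole batch, instead of starting
    # and joining one parallel section per GEMM call.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        list(pool.map(run, tasks))
    return c_list
```

For example, `batched_gemm([[[1, 2], [3, 4]]], [[[5, 6], [7, 8]]])` multiplies a single 2x2 pair and returns `[[[19, 22], [43, 50]]]`. Changing `max_workers` only changes how many threads pick up tasks, not how the work is divided, which is the property the commit describes.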

README.md

ONNX Runtime is a cross-platform inference and training machine-learning accelerator compatible with deep learning frameworks such as PyTorch and TensorFlow/Keras, as well as with classical machine learning libraries such as scikit-learn, among others.

ONNX Runtime uses the portable ONNX computation graph format, backed by execution providers optimized for specific operating systems, drivers, and hardware.
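
The execution-provider model can be illustrated with a simplified sketch. In ONNX Runtime, providers are registered in priority order (in the Python API, for example, as a list such as `['CUDAExecutionProvider', 'CPUExecutionProvider']`), and parts of the graph that no accelerator claims fall back to the CPU provider. The function `assign_providers` below is hypothetical and assumes a per-node model with made-up op sets; the real partitioner asks each provider which (sub)graphs it can handle, so treat this only as an illustration of the priority-and-fallback idea.

```python
def assign_providers(nodes, providers):
    """Map each node to the first provider, in priority order, that supports its op.

    Hypothetical, simplified model (not the ONNX Runtime partitioner):
    nodes:     list of (node_name, op_type) pairs
    providers: list of (provider_name, supported_op_types) pairs in priority
               order; a supported set of None marks a catch-all fallback.
    """
    assignment = {}
    for node_name, op_type in nodes:
        for provider_name, supported_ops in providers:
            if supported_ops is None or op_type in supported_ops:
                assignment[node_name] = provider_name
                break
        else:
            # No provider, not even a fallback, accepted this op type.
            raise ValueError(f"no provider supports {op_type!r} (node {node_name!r})")
    return assignment


# Hypothetical graph and provider capabilities, for illustration only.
nodes = [("conv1", "Conv"), ("custom1", "MyCustomOp"), ("relu1", "Relu")]
providers = [
    ("HypotheticalGpuEP", {"Conv", "Relu"}),
    ("CPUExecutionProvider", None),
]
print(assign_providers(nodes, providers))
```

Here `Conv` and `Relu` land on the higher-priority accelerator, while the unsupported `MyCustomOp` falls back to the CPU provider. Keeping a catch-all provider last is what lets a model run even when no accelerator covers every operator.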

Common use cases for ONNX Runtime:

Improve inference performance for a wide variety of ML models
Reduce time and cost of training large models
Train in Python but deploy into a C#/C++/Java app
Run with optimized performance on different hardware and operating systems
Support models created in several different frameworks

ONNX Runtime inference APIs have been stable and production-ready since the 1.0 release in October 2019 and can enable faster customer experiences and lower costs.

The ONNX Runtime training feature was introduced in preview in May 2020. It accelerates PyTorch training of transformer models on multi-node NVIDIA GPUs. Additional updates for this feature are coming soon.