ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
Chi Lo c964da7ea2
FasterTransformer model wrapper using custom op (#15013)
### Description
We are introducing the FasterTransformer (FT) model-level integration using
ORT [custom op runtime
wrapper](https://github.com/microsoft/onnxruntime/pull/13427).
In order to make the FT wrapper/integration work, two things need to be
done:

- New API `KernelInfoGetConstantInput_tensor`. (Done in this PR)
During custom op kernel initialization, the kernel needs the model
weights (saved as the node's constant inputs) to be ready for FT's weights
instantiation. That's why we need to add this new API, which makes kernel
info capable of getting constant inputs.

- Custom op and custom op kernel to wrap FT model. (Will provide in
onnxruntime extensions or inference examples)
During custom op kernel initialization, the kernel can fetch attributes from
kernel info to determine which kind of FT model instance to create. During
custom op kernel compute/inference, it can get the inputs/outputs from the
kernel context and then assign the input/output buffers for the model instance to run.
2023-03-20 09:05:30 -07:00
.config Update tsaoptions.json: update the email alias (#13448) 2022-10-26 15:56:16 -07:00
.devcontainer
.gdn
.github Fix API docs deploy so that a PR is not required (#15011) 2023-03-13 09:36:08 -07:00
.pipelines use python 3.9.7 in windowai packaging pipeline (#14766) 2023-02-23 09:48:42 +08:00
.vscode
cgmanifests Consume ONNX 1.13.1 in ONNX Runtime (#14812) 2023-03-02 14:57:35 -08:00
cmake FasterTransformer model wrapper using custom op (#15013) 2023-03-20 09:05:30 -07:00
csharp Add GetVersionSting API for C++, C# and Python (#14873) 2023-03-02 17:11:07 -08:00
dockerfiles fix TRT dockerfile documentation https://github.com/microsoft/onnxruntime/issues/14556 (#14600) 2023-03-01 07:02:42 -08:00
docs [CUDA] Add option to use DecoderMaskedMultiheadAttention in BeamSearch (#14990) 2023-03-15 17:16:32 -07:00
include/onnxruntime/core FasterTransformer model wrapper using custom op (#15013) 2023-03-20 09:05:30 -07:00
java Update Gradle version (#14862) 2023-03-08 12:22:06 -08:00
js Bump @sideway/formula from 3.0.0 to 3.0.1 in /js/react_native (#15028) 2023-03-16 10:17:38 -07:00
objectivec Objective-C lib: Added support for int64 and uint64. (#14405) 2023-02-24 23:25:16 -08:00
onnxruntime FasterTransformer model wrapper using custom op (#15013) 2023-03-20 09:05:30 -07:00
orttraining Fix training gpu ci related to pl upgrade (#15092) 2023-03-17 13:26:58 +08:00
package/rpm Bump ORT version number (#14226) 2023-01-26 12:33:47 -08:00
rust Add rust bindings (#12606) 2023-02-08 14:57:15 -08:00
samples
tools [ROCm] add rocm5.4.2 to python package pipeline (#15081) 2023-03-20 10:30:14 +08:00
winml remove device_id parameter out of ExecutionProvider::GetAllocator() (#14580) 2023-02-13 10:01:07 -08:00
.clang-format
.clang-tidy
.dockerignore
.flake8
.gitattributes
.gitignore Update Gradle version (#14862) 2023-03-08 12:22:06 -08:00
.gitmodules [wasm] upgrade emsdk from 3.1.19 to 3.1.32 (#14818) 2023-02-28 11:06:09 -08:00
build.amd64.1411.bat
build.bat
build.sh
CITATION.cff
CODEOWNERS Update CODEOWNERS file. 2023-03-07 17:56:37 -08:00
CONTRIBUTING.md Fix link to High Level Design (#11786) 2023-02-28 11:05:54 -08:00
lgtm.yml Fix lgtm C++ error (#13613) 2022-11-10 10:06:22 -08:00
LICENSE
NuGet.config
ort.wprp
ORT_icon_for_light_bg.png
packages.config [DML EP] Upgrade DML to 1.10.1 (#14433) 2023-01-25 21:07:10 -08:00
pyproject.toml Update pylint config to include valid short names (#13631) 2022-11-14 10:00:25 -08:00
README.md [Readme] Update table for build pipelines (#14618) 2023-02-08 09:44:20 -08:00
requirements-dev.txt
requirements-doc.txt
requirements-training.txt Remove protobuf pin from training requirements (#13695) 2022-11-22 12:27:18 -08:00
requirements.txt.in
SECURITY.md
setup.py enable pybind for qnn ep (#14897) 2023-03-03 07:26:53 -08:00
ThirdPartyNotices.txt Revert mimalloc from v2.0.9 to v2.0.3 (#14603) 2023-02-07 09:58:25 -08:00
VERSION_NUMBER Bump ORT version number (#14226) 2023-01-26 12:33:47 -08:00

ONNX Runtime is a cross-platform inference and training machine-learning accelerator.

ONNX Runtime inference can enable faster customer experiences and lower costs, supporting models from deep learning frameworks such as PyTorch and TensorFlow/Keras as well as classical machine learning libraries such as scikit-learn, LightGBM, XGBoost, etc. ONNX Runtime is compatible with different hardware, drivers, and operating systems, and provides optimal performance by leveraging hardware accelerators where applicable alongside graph optimizations and transforms. Learn more →

ONNX Runtime training can accelerate the model training time on multi-node NVIDIA GPUs for transformer models with a one-line addition for existing PyTorch training scripts. Learn more →

Get Started & Resources

Build Pipeline Status

Columns: System, Inference, Training.
Systems listed: Windows, Linux, Mac, Android, iOS, Web, Other.

Data/Telemetry

Windows distributions of this project may collect usage data and send it to Microsoft to help improve our products and services. See the privacy statement for more details.

Contributions and Feedback

We welcome contributions! Please see the contribution guidelines.

For feature requests or bug reports, please file a GitHub Issue.

For general discussion or questions, please use GitHub Discussions.

Code of Conduct

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

License

This project is licensed under the MIT License.