Fuse transpose into MatMul
Implement Pow and constant scalar simplification
Vectorize ReduceMean
Improve symbolic shape inference
Minor updates for better debugging in fused function name
* [Nuphar EP] performance improvements
1. Add new ops: Shape, Expand
2. Add support for steps in Slice
3. Simplify Gather
4. Always inline alias nodes
5. Transpose nodes with inner loop being symbolic fall back to CPU provider when vectorization is not possible
6. Add opt_inproj option to model_editor to move input-projection MatMuls from inside Scan to outside it
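
As an illustration of what "steps in Slice" means, here is a minimal 1-D sketch of the ONNX Slice start/end/step semantics as I read them. It is not the Nuphar implementation; `slice_1d` is a hypothetical name used only for this example.

```python
def slice_1d(data, start, end, step=1):
    """Slice a 1-D list with ONNX-like start/end/step semantics (illustrative)."""
    if step == 0:
        raise ValueError("step must be non-zero")
    n = len(data)
    # Negative start/end count back from the end of the axis.
    if start < 0:
        start += n
    if end < 0:
        end += n
    if step > 0:
        # Forward stepping: both bounds are clamped to [0, n].
        start = min(max(start, 0), n)
        end = min(max(end, 0), n)
    else:
        # Backward stepping: start in [0, n-1], end in [-1, n-1].
        start = min(max(start, 0), n - 1)
        end = min(max(end, -1), n - 1)
    return [data[i] for i in range(start, end, step)]
```

A very negative `end` combined with a negative step walks back to the first element, which is why the backward case clamps `end` to -1 rather than 0.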
* save status: add tiling layout; add avx512 skylake cpuid info
* unit tests and matmul integer model passed on skylake, need to verify model
* save commit before update master
* fix check
* address comments
* Added GatherElements to Nuphar
This change added GatherElements (op_ver 11) to the Nuphar provider.
* address CR feedback
* create a utility function for accessing index safely
* address more CR
* SafeIndex -> ClampIndex
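
A rough sketch of the idea behind a clamped index helper, written in Python for brevity. It assumes the helper wraps negative indices and then clamps into the valid range; the real ClampIndex signature and behavior may differ. `clamp_index` and `gather_elements_1d` are hypothetical names.

```python
def clamp_index(index, extent):
    """Map an index into [0, extent - 1] (assumed behavior, for illustration)."""
    if extent <= 0:
        raise ValueError("extent must be positive")
    # Wrap a negative index once, as ONNX allows indices in [-extent, extent - 1].
    if index < 0:
        index += extent
    # Clamp anything still out of range so the access can never go out of bounds.
    return min(max(index, 0), extent - 1)


def gather_elements_1d(data, indices):
    """1-D GatherElements: out[i] = data[indices[i]], using clamped access."""
    return [data[clamp_index(i, len(data))] for i in indices]
```

Clamping trades an out-of-bounds read for a well-defined but possibly unintended element, so callers that need strict validation would still check indices up front.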
* Added Scatter and ScatterElements to Nuphar
Implemented Scatter (op_ver 9 - 10) and ScatterElements (op_ver 11)
in Nuphar.
Because TVM's compute is output-oriented, our current implementation
uses extern calls for simplicity.
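
To show why scatter is awkward for an output-oriented compute: the loop below iterates over the updates (the input side) and writes wherever the indices point, rather than defining each output element as a function of its own coordinates. This is a 2-D reference sketch of ScatterElements semantics in plain Python, not the extern call used by the provider; `scatter_elements_2d` is a hypothetical name.

```python
import copy


def scatter_elements_2d(data, indices, updates, axis=0):
    """Reference ScatterElements for 2-D nested lists (illustrative only)."""
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1 for a 2-D input")
    out = copy.deepcopy(data)  # output starts as a copy of data
    for i, row in enumerate(indices):
        for j, idx in enumerate(row):
            if axis == 0:
                if idx < 0:
                    idx += len(out)  # negative index counts from the end
                out[idx][j] = updates[i][j]
            else:
                if idx < 0:
                    idx += len(out[i])
                out[i][idx] = updates[i][j]
    return out
```

Usage, with the values from the ONNX operator documentation example:

```python
data = [[0.0, 0.0, 0.0] for _ in range(3)]
indices = [[1, 0, 2], [0, 2, 1]]
updates = [[1.0, 1.1, 1.2], [2.0, 2.1, 2.2]]
result = scatter_elements_2d(data, indices, updates, axis=0)
```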
* fixed build issue after rebase
* remove dead code
* Address CR
* removed dead code
* use GetAttrOrDefault
* Address more CR feedback
* add GetStrides to codegen/common/utils.h
* added a unit test for Bool input data
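
For context on the GetStrides helper mentioned above, a minimal sketch of what a stride computation typically looks like, assuming row-major layout and strides measured in elements. The actual helper in codegen/common/utils.h may differ; `get_strides` and `flat_offset` are hypothetical names.

```python
def get_strides(shape):
    """Row-major element strides: strides[i] = product of shape[i+1:]."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides


def flat_offset(index, strides):
    """Flatten a multi-dimensional index into a linear offset."""
    return sum(i * s for i, s in zip(index, strides))
```

Strides like these let an extern call such as scatter translate a multi-dimensional coordinate into an offset into a flat buffer.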
Remove gsl submodule and replace with a local copy of gsl-lite
Refactor for onnxruntime::make_unique
gsl::span size and index are now size_t
Remove lambda auto argument type detection.
Remove constexpr from fail_fast in gsl due to Linux not being happy.
Comment out std::stream support because the MacOS std lib is broken.
Move make_unique into include/core/common so it is accessible for server builds.
Relax requirements for onnxruntime/test/providers/cpu/ml/write_scores_test.cc
due to x86 build.
Add ONNXRUNTIME_ROOT to Server Lib includes so gsl is recognized
* call MLAS's pooling function as an external call for Nuphar
Note that at the moment Nuphar provider doesn't handle the cases below:
- symbolic height/width dimensions
- Indices output of MaxPool
- non-default dilations
* unify the pool interface for mti and mti_x86
Enhance proto3 compatibility.
Replace has_*() methods with the corresponding enum handling so we can deal with
proto3 generated stream from proto2 code.
Add utility wrappers for remaining has_*() methods so we can
easily deal with them if/when we switch to proto3.
* Implement Nuphar execution provider
Nuphar execution provider is a TVM-based compilation provider. It has shown great speedups for RNN models using Scan.
This PR is mainly for a preview of the shared codegen library for other TVM-based providers.
* Fix submodules
* Fix TVM submodule
* Update Nuphar to latest and resolve conflicts
* Remove stale files caused by merge -X theirs
* Revert heap buffer change to not introduce onnxruntime_framework into onnxruntime_perf_test
* Fix bad merge
* Merge from Nuphar
* Fix warning treated as error, revert some unnecessary changes
* Revert some more test changes
* Some more test revert or comments to make review easier
New tests could be added later
* One more revert of unnecessary changes
* More change revert. Test could be added back later.
Description:
This change adds the common part of the TVM-based codegen library. It includes the following parts:
* Microsoft TVM Inventory (MTI): a set of TVM ops for neural networks, similar to TOPI
* Compiler pass for traversing the ONNX graph and generating TVM ops
* Compiler pass for traversing the generated graph and specifying the TVM schedule
* Compiler pass for handling weight layout
* Utils for debugging
Motivation and Context:
TVM is an open deep learning compiler stack for CPUs, GPUs and specialized accelerators. To leverage it in ONNX Runtime, we built an execution provider named Nuphar. Currently, Nuphar gets good performance on CPUs with AVX2 on quantized LSTM models.
This codegen library was part of the Nuphar execution provider. It is split out for sharing with other execution providers, as we'd like to reuse TVM on more devices.
Applies to all public headers and macros, plus many internal ones. There are still some internal things with OnnxRuntime in the name, but this fixes all public functions & macros.