* Fix torch cpp ext build when CPU wheel is installed but GPU card is present
Also includes a minor improvement to the ATen operator that allows both
"::op" and "aten::op" names for operators
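A minimal sketch of how both spellings could resolve to the same operator. The helper name `normalize_aten_op_name` is hypothetical and does not come from this change; the actual implementation is not shown in this log.

```python
def normalize_aten_op_name(name: str) -> str:
    """Hypothetical helper: treat '::op' as shorthand for 'aten::op'."""
    if name.startswith("::"):
        # An empty namespace is assumed to mean the default 'aten' namespace.
        return "aten" + name
    return name


short_form = normalize_aten_op_name("::add")
long_form = normalize_aten_op_name("aten::add")
```

With this normalization, either spelling can be used as the lookup key for the same operator.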
* Fix flake8 false positive
This includes a series of unit tests that exercise
the MatMul fusion. This is not an exhaustive list
of tests. The tests focus on patterns seen
in models, with additional tests to cover at least
one instance of each operator type that can be part
of the fusion.
Signed-off-by: George Nash <george.nash@intel.com>
* [UPDATE] update amd ci pipeline to rocm5.1.1
* [FIX] json format error
* [ERROR] disable unit tests
* [FIX] ucx error
* [FIX] cmake version
* [FIX] unit tests
* add so_folder option to TVM EP options. add TvmSoEP class and update TVM EP factory
* compilation from so_folder was implemented
* update TVMCompiler for default pipeline and compilation from shared lib
* filter excess so-file in so_folder
* clean Compile method and vm conditions
* implementation of TVMSoCompile on native side instead of python API
* cpplint fixes
* some fixes after review
* more cpplint fixes
* more fixes after review
* align TVMso EP with new API for compilation from #10632
* small fixes for cpplint
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
- Enable pyright and pylint (https://github.com/microsoft/pyright) in CI
- Enable pyright, pylint and bandit by default in VS code
Pylint has some good style checks. pyright is Microsoft's static type checker.
* Share thread pools between devices
* make tests reuse device
* Change cpu thread pool options for dml sessions to use 1 thread with no spinning
* fix test failure
* Update missing type constraints for dft
* Add comment and rename inference session parameter
* Fix missing default causing inconsistent test behavior
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
This patch uses vector intrinsics to optimize the MlasQLinearAddKernelHelper
function for POWER processors.
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
**Description**: Extract arg value from torch Value
**Motivation and Context**
The inputs to gelu are values of type `torch._C.Value`. Comparing such a value directly to a string caused the `if approximate == "none"` check to always fail, preventing the optimized `com.microsoft::Gelu` op from being used.
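The failure mode can be illustrated without PyTorch. In this sketch, `FakeValue` is a hypothetical stand-in for `torch._C.Value` and `extract_arg` is a hypothetical helper; neither name comes from the actual change.

```python
class FakeValue:
    """Hypothetical stand-in for torch._C.Value: wraps a constant argument."""

    def __init__(self, constant):
        self.constant = constant


def extract_arg(value):
    """Hypothetical helper: unwrap the constant if it is wrapped."""
    return value.constant if isinstance(value, FakeValue) else value


approximate = FakeValue("none")

# Comparing the wrapper object to a string never matches.
naive_check = approximate == "none"

# Extracting the underlying value first makes the comparison meaningful.
fixed_check = extract_arg(approximate) == "none"
```

Here `naive_check` is false even though the wrapped argument is `"none"`, which mirrors why the optimized path was never taken before the argument value was extracted.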
* initial implementation of nnapi depthtospace support
* modify depthtospace output tensor shape and enable test pass
* minor update
* minor update
* modify input output layout order and hack nnapi instance to use nchw flag for optest
* address pr comments
* add depthtospace to layout logic
* format length and revert UT log level
* add nchw and android feature level check in opsupportchecker
* minor fix
* update
* update
* fix
* minor update
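For reference, the output shape of DepthToSpace under both layouts mentioned above can be sketched as follows. The function name and `layout` parameter are hypothetical and are not taken from the NNAPI EP code. The shape rule is the one defined for the ONNX operator: channels shrink by `blocksize * blocksize` while height and width each grow by `blocksize`.

```python
def depth_to_space_shape(shape, blocksize, layout="NCHW"):
    """Hypothetical helper: compute the DepthToSpace output shape."""
    if layout == "NCHW":
        n, c, h, w = shape
    elif layout == "NHWC":
        n, h, w, c = shape
    else:
        raise ValueError(f"unsupported layout: {layout}")

    block_area = blocksize * blocksize
    if c % block_area != 0:
        raise ValueError("channel count must be divisible by blocksize^2")

    # Depth is redistributed into the spatial dimensions.
    c_out = c // block_area
    h_out = h * blocksize
    w_out = w * blocksize

    if layout == "NCHW":
        return (n, c_out, h_out, w_out)
    return (n, h_out, w_out, c_out)


nchw_shape = depth_to_space_shape((1, 8, 2, 3), 2)
nhwc_shape = depth_to_space_shape((1, 2, 3, 8), 2, layout="NHWC")
```

Both calls describe the same tensor, so the two results differ only in dimension order.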
* Add disentangled attention TRT plugin as contrib op
* update plugin name & remove null character
* update onnx-tensorrt submodule with my beta version
* use suggested plugin name & simpler shape propagation
* update onnx-tensorrt gitsubmodule to temporary fork
* update onnx-tensorrt to temporary commit
* redirect submodule back to latest 8.2-GA release of onnx-tensorrt repo
Co-authored-by: HHH-ComputeLab <haohangh@nvidia.com>