onnxruntime/onnxruntime
RajalakshmiSR 5d8c5409ab
POWER10: QGEMM optimization (#10642)
* POWER10: QGEMM optimization

This patch uses the POWER10 MMA feature for the QGEMM function.
The optimization covers both signed and unsigned cases. Tested with
gcc11 and clang-14; there are no new failures.

* Changes as per review comments

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2022-03-02 08:36:26 -08:00
contrib_ops replace std::numeric_limits<T> by cub::FpLimits<T> (#10703) 2022-02-28 23:11:51 -08:00
core POWER10: QGEMM optimization (#10642) 2022-03-02 08:36:26 -08:00
gsl Change TensorShape to typically not allocate heap memory (#9542) 2021-11-08 10:29:54 -08:00
python Add ability to save calibration augmented models through external data format when model size exceeds 2Gb. (#10695) 2022-03-02 08:35:30 -08:00
test Convert ConvActivationFusion transformer to a selector action transformer. (#10687) 2022-03-02 13:47:55 +10:00
tool/etw
wasm
.style.yapf
__init__.py Fix VLOG?_DEFAULT macros usability. (#10568) 2022-03-01 13:16:26 +10:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings Merged PR 6622174: merge latest onnxruntime into dmldev 2021-10-30 19:59:33 +00:00