onnxruntime/onnxruntime
Edward Chen 0a4d76d98b
MLAS AArch64 quantized int4 Gemm kernel (#18031)
- Implement an MLAS function for quantized 4-bit int Gemm (Gemm with float A and quantized 4-bit int B) for ARM NEON. This is an initial implementation. Only the M=1 path (where M is the number of rows of A and C) has had any optimization attempted so far. More optimization to come in future PRs.

- Connect MatMulNBits contrib op to MLAS function.
2023-11-15 09:31:54 -08:00
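
For readers unfamiliar with the operation named in the commit message, the following is a minimal reference sketch of a Gemm with float A and blockwise-quantized 4-bit B, restricted to the M=1 case. It is not the MLAS kernel and does not reflect its actual data layout: the nibble packing order, the fixed zero point of 8, the per-block scale scheme, and every function name here are assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

ZERO_POINT = 8  # assumed zero point for unsigned 4-bit values in 0..15


def pack_int4(values: Sequence[int]) -> bytes:
    """Pack 4-bit values two per byte, low nibble first (assumed layout)."""
    vals = list(values)
    if len(vals) % 2:
        vals.append(ZERO_POINT)  # pad odd-length input with a "zero" nibble
    return bytes((vals[i] & 0xF) | ((vals[i + 1] & 0xF) << 4)
                 for i in range(0, len(vals), 2))


def unpack_int4(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_int4; returns the first `count` 4-bit values."""
    out: List[int] = []
    for byte in packed:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out[:count]


def quantize_column(col: Sequence[float], block_len: int) -> Tuple[bytes, List[float]]:
    """Quantize one column of B block by block; returns (packed data, scales)."""
    quantized: List[int] = []
    scales: List[float] = []
    for start in range(0, len(col), block_len):
        block = col[start:start + block_len]
        max_abs = max(abs(v) for v in block)
        scale = max_abs / 7.0 if max_abs else 1.0  # map block range to about [-7, 7]
        scales.append(scale)
        quantized.extend(min(15, max(0, round(v / scale) + ZERO_POINT)) for v in block)
    return pack_int4(quantized), scales


def dequantize_column(packed: bytes, scales: Sequence[float], k: int, block_len: int) -> List[float]:
    """Recover float values: (q - zero_point) * scale of the enclosing block."""
    q = unpack_int4(packed, k)
    return [(q[i] - ZERO_POINT) * scales[i // block_len] for i in range(k)]


def gemv_float_a_int4_b(a: Sequence[float],
                        b_columns: Sequence[Tuple[bytes, List[float]]],
                        block_len: int) -> List[float]:
    """M=1 case: C[0][n] = sum over k of A[0][k] * dequantized B[k][n]."""
    k = len(a)
    return [sum(x * w for x, w in zip(a, dequantize_column(packed, scales, k, block_len)))
            for packed, scales in b_columns]


# Usage: one column of B with K=4 and a single block of length 4.
b_col = quantize_column([7.0, -7.0, 3.0, 0.0], block_len=4)
c_row = gemv_float_a_int4_b([1.0, 2.0, 3.0, 4.0], [b_col], block_len=4)
```

An optimized kernel would typically fuse the dequantization into the dot-product loop and use SIMD instructions rather than materializing a dequantized column, but the arithmetic it must reproduce is the same as in this sketch.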
contrib_ops MLAS AArch64 quantized int4 Gemm kernel (#18031) 2023-11-15 09:31:54 -08:00
core MLAS AArch64 quantized int4 Gemm kernel (#18031) 2023-11-15 09:31:54 -08:00
python SDXL demo: consistent opt shape and seed (#18445) 2023-11-14 20:24:32 -08:00
test MLAS AArch64 quantized int4 Gemm kernel (#18031) 2023-11-15 09:31:54 -08:00
tool/etw
wasm [js/web/training] Add CreateTrainingSession (#17891) 2023-10-26 09:22:10 -07:00
__init__.py Python API to check whether collective ops are available or not (#17730) 2023-09-29 14:11:05 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings