mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-29 23:06:41 +00:00
- Implement an MLAS function for quantized 4-bit int Gemm (Gemm with float A and quantized 4-bit int B) for ARM NEON. This is an initial implementation: only the M=1 path (M being the number of rows of A and C) has had any optimization attempted so far. More optimization will come in future PRs.
- Connect the MatMulNBits contrib op to the MLAS function.

| Name | | |
|---|---|---|
| contrib_ops | ||
| core | ||
| python | ||
| test | ||
| tool/etw | ||
| wasm | ||
| __init__.py | ||
| ReformatSource.ps1 | ||
| ReformatSourcePython.bat | ||
| VSCodeCoverage.runsettings | ||