onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Edward Chen 0a4d76d98b	MLAS AArch64 quantized int4 Gemm kernel (#18031)	2023-11-15 09:31:54 -08:00
..
onnxruntime/core	MLAS AArch64 quantized int4 Gemm kernel (#18031)	2023-11-15 09:31:54 -08:00

MLAS AArch64 quantized int4 Gemm kernel (#18031)

- Implement an MLAS function for quantized 4-bit int Gemm (Gemm with float A and quantized 4-bit int B) for ARM NEON. This is an initial implementation: only the M=1 path (M being the number of rows of A and C) has had any optimization attempted so far. More optimization will come in future PRs.

- Connect the MatMulNBits contrib op to the MLAS function.
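
The M=1 case described above (a single float row of A multiplied by a 4-bit quantized B) can be illustrated with a plain reference sketch. This is not the MLAS kernel and uses no NEON. The packing order (low nibble first), the fixed zero point of 8, the `max_abs / 7` scale rule, the per-column block layout, and all function names are assumptions made for illustration only.

```python
from typing import List, Tuple

ZERO_POINT = 8  # assumed fixed zero point: a stored nibble q represents (q - 8)


def pack_int4(values: List[int]) -> bytes:
    """Pack an even number of 4-bit values, two per byte, low nibble first (assumed layout)."""
    assert len(values) % 2 == 0
    return bytes((values[i] & 0xF) | ((values[i + 1] & 0xF) << 4)
                 for i in range(0, len(values), 2))


def unpack_int4(packed: bytes) -> List[int]:
    """Inverse of pack_int4: expand each byte into its low and high nibble."""
    out = []
    for byte in packed:
        out.append(byte & 0xF)
        out.append(byte >> 4)
    return out


def quantize_column(col: List[float], block_len: int) -> Tuple[bytes, List[float]]:
    """Quantize one column of B block by block; returns (packed nibbles, per-block scales)."""
    assert len(col) % block_len == 0 and block_len % 2 == 0
    nibbles, scales = [], []
    for start in range(0, len(col), block_len):
        block = col[start:start + block_len]
        max_abs = max(abs(x) for x in block)
        scale = max_abs / 7.0 if max_abs > 0.0 else 1.0  # assumed scale rule
        scales.append(scale)
        for x in block:
            q = max(-8, min(7, round(x / scale)))  # clamp to the signed 4-bit range
            nibbles.append(q + ZERO_POINT)
    return pack_int4(nibbles), scales


def gemv_float_a_int4_b(a_row: List[float],
                        b_cols: List[Tuple[bytes, List[float]]],
                        block_len: int) -> List[float]:
    """Compute C[0, n] = sum_k A[0, k] * dequant(B[k, n]) for M = 1."""
    result = []
    for packed, scales in b_cols:
        nibbles = unpack_int4(packed)
        acc = 0.0
        for blk, scale in enumerate(scales):
            start = blk * block_len
            partial = 0.0
            for k in range(start, start + block_len):
                partial += a_row[k] * (nibbles[k] - ZERO_POINT)
            acc += scale * partial  # apply the block scale once per block
        result.append(acc)
    return result
```

Calling `gemv_float_a_int4_b(a_row, [quantize_column(col, block_len) for col in b_columns], block_len)` yields one output value per column of B. Factoring the scale out of each block keeps the inner loop to one multiply-add per element, which is the kind of loop a vectorized kernel would target.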
