Commit graph

7 commits

Author SHA1 Message Date
Changming Sun
b72fe664c1
Refactor prepack buffer code (#16280)
### Description
1. Use IAllocatorUniquePtr instead of BufferUniquePtr. This ensures the
deleter always matches the allocator the buffer came from.
2. Replace some uses of std::unique_ptr with std::optional.
3. Bypass the Arena allocator when allocating the prepacked buffers for MLAS.
In this special case the arena provides no benefit. This is an internal
implementation change only; it does not affect the public interface.
2023-06-08 14:42:02 -07:00
Changming Sun
e63bb5acef
Fix a memory leak in QGemm (#15703)
### Description
The BufferUniquePtr objects in the old code did not know which
allocator the memory came from, so they could not free the
memory.
2023-04-26 18:48:00 -07:00
Cheng
3f66297499
Code cleanup (#12392)
* code cleanup

* misspelling fix
2022-08-01 14:12:35 +08:00
Nick Kreeger
93e1e1dfa1
Drop quant_util.h and move helper function into quantization.h (#8747)
2021-08-16 15:08:25 -05:00
Yufeng Li
ceeb1a65d6
Add quantization support for GEMM directly with QGemm (#8447)
QGemm takes quantized A, B and C, plus the quantization parameters of output Y; C and the quantization parameters of Y are optional. Whether the output is quantized or full precision depends on whether the quantization parameters of Y are provided: if they are, the output is requantized; otherwise it is full precision.
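The optional requantization of Y can be sketched in Python. This is a minimal sketch. It assumes QuantizeLinear-style semantics: divide by the output scale, round half to even, add the zero point, then saturate. It also assumes a uint8 output. `requantize_uint8` and its parameters are hypothetical names, not onnxruntime APIs, and the real kernel may differ in rounding details.

```python
def requantize_uint8(y, scale_y, zp_y):
    """Hypothetical helper: map a float matrix Y to uint8 given y_scale and y_zero_point.

    q = saturate(round(y / scale_y) + zp_y), saturated to [0, 255].
    """
    out = []
    for row in y:
        # Python's round() rounds half to even, like ONNX QuantizeLinear.
        out.append([min(255, max(0, round(v / scale_y) + zp_y)) for v in row])
    return out
```

For example, `requantize_uint8([[0.5, -1.0]], 0.5, 128)` yields `[[129, 126]]`. Values far outside the representable range are clamped to 0 or 255.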

Compared with QLinearMatMul and MatMulInteger, QGemm additionally supports the transpose, alpha and beta attributes.

The formula for quantized GEMM is:
Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32),
where C_int32 is C quantized as C_int32 = (beta * C) / (alpha * scale_a * scale_b).
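A plain-Python reference sketch of the formula helps check the arithmetic. It rests on several assumptions:

* Inputs are row-major nested lists.
* Any transposes have already been applied.
* `(A_int8 - zp_a) * (B_int8 - zp_b)` is a matrix product accumulated in integers.
* C_int32 is rounded to the nearest integer, with halves going to even.

`quantize_c` and `qgemm_float` are hypothetical helpers, not onnxruntime APIs.

```python
def quantize_c(c, alpha, beta, scale_a, scale_b):
    """Hypothetical helper: C_int32 = round((beta * C) / (alpha * scale_a * scale_b))."""
    denom = alpha * scale_a * scale_b
    return [[round(beta * v / denom) for v in row] for row in c]


def qgemm_float(a, zp_a, scale_a, b, zp_b, scale_b, c_int32=None, alpha=1.0):
    """Hypothetical helper: full-precision QGemm output (no Y quantization parameters).

    Y = alpha * scale_a * scale_b * ((A - zp_a) x (B - zp_b) + C_int32)
    """
    m, k, n = len(a), len(b), len(b[0])
    y = []
    for i in range(m):
        row = []
        for j in range(n):
            # Integer dot product of zero-point-adjusted values
            # (an optimized kernel would typically accumulate in int32).
            acc = sum((a[i][p] - zp_a) * (b[p][j] - zp_b) for p in range(k))
            if c_int32 is not None:
                acc += c_int32[i][j]
            # A single float rescale at the end.
            row.append(alpha * scale_a * scale_b * acc)
        y.append(row)
    return y
```

Substituting C_int32 back into Y gives alpha * scale_a * scale_b * C_int32 ≈ beta * C. Y therefore approximates the float Gemm alpha * (A_float × B_float) + beta * C, where A_float = scale_a * (A_int8 - zp_a) and B_float = scale_b * (B_int8 - zp_b).

As a worked example, take these inputs:

* A = [[130, 120], [128, 140]] with zp_a = 128 and scale_a = 0.05
* B = [[3, -2], [1, 4]] with zp_b = 0 and scale_b = 0.1
* C = [[0.5, -0.25], [0.0, 1.0]] with alpha = beta = 1

They give C_int32 = [[100, -50], [0, 200]] and Y ≈ [[0.49, -0.43], [0.06, 1.24]], matching the float computation.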
2021-07-27 21:21:49 -07:00
Oliver Rausch
972aee8308
Fix GCC build error in quantization tests (#8449)
2021-07-21 18:15:13 +02:00
Nick Kreeger
963d883de8
Create a common directory for quantization code and functionality. (#8320)
2021-07-14 22:56:58 -05:00