onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-28 03:20:58 +00:00

Author	SHA1	Message	Date
Changming Sun	b72fe664c1	Refactor prepack buffer code (#16280) Description: 1. Replace BufferUniquePtr with IAllocatorUniquePtr, which ensures the deleter is always the correct one. 2. Change some std::unique_ptr uses to std::optional. 3. Bypass the arena allocator when allocating the prepacked buffers for MLAS; in this special case the arena provides no benefit. This is purely an internal implementation change and does not affect the public interface.	2023-06-08 14:42:02 -07:00
Changming Sun	e63bb5acef	Fix a memory leak in QGemm (#15703) Description: The BufferUniquePtrs in the old code had no knowledge of the allocator the memory came from, so they could not free it.	2023-04-26 18:48:00 -07:00
Cheng	3f66297499	Code cleanup (#12392) * code cleanup * misspelling fix	2022-08-01 14:12:35 +08:00
Nick Kreeger	93e1e1dfa1	Drop quant_util.h and move helper function into quantization.h (#8747)	2021-08-16 15:08:25 -05:00
Yufeng Li	ceeb1a65d6	Add quantization support for GEMM directly with QGemm (#8447) QGemm takes quantized A, B, and C plus the quantization parameters of output Y; C and the quantization parameters of Y are optional. Whether the output is quantized or full precision depends on whether the quantization parameters of Y are provided: if they are, the output is requantized; otherwise, it is full precision. Compared with QLinearMatMul and MatMulInteger, QGemm supports the transpose, alpha, and beta attributes. The formula for quantized GEMM is: Y = alpha * scale_a * scale_b * ((A_int8 - zp_a) * (B_int8 - zp_b) + C_int32), where C_int32 is quantized as C_int32 = (beta * C) / (alpha * scale_a * scale_b).	2021-07-27 21:21:49 -07:00
Oliver Rausch	972aee8308	Fix GCC build error in quantization tests (#8449)	2021-07-21 18:15:13 +02:00
Nick Kreeger	963d883de8	Create a common directory for quantization code and functionality (#8320)	2021-07-14 22:56:58 -05:00

7 commits