onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Wang, Mengni fe463d4957 Support SmoothQuant for ORT static quantization (#16288)	2023-07-26 18:56:45 -07:00

### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context

For large language models (LLMs) with very large parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty offline from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can benefit the accuracy of INT8 LLMs.

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
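The "mathematically equivalent transformation" mentioned in the commit message can be illustrated with a small sketch. This is not the code added by the commit and it uses no ONNX Runtime or neural-compressor APIs. It is a pure-Python toy, with hypothetical helper names (`matmul`, `smooth`), that assumes the per-channel scaling rule from the SmoothQuant method, s_j = max|X_j|^alpha / max|W_j|^(1 - alpha), and nonzero channel maxima.

```python
# Toy illustration of SmoothQuant-style smoothing (hypothetical helpers,
# not the ORT / neural-compressor implementation).

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(activations, weights, alpha=0.5):
    """Return (X', W', s) where X' = X / s per input channel and W' = s * W per row.

    activations: rows are tokens, columns are input channels.
    weights: rows are input channels, columns are output channels.
    """
    scales = []
    for j in range(len(weights)):
        act_max = max(abs(row[j]) for row in activations)
        w_max = max(abs(v) for v in weights[j])
        # s_j = max|X_j|^alpha / max|W_j|^(1 - alpha); assumes nonzero maxima
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    smoothed_x = [[v / s for v, s in zip(row, scales)] for row in activations]
    smoothed_w = [[v * s for v in row] for row, s in zip(weights, scales)]
    return smoothed_x, smoothed_w, scales


# Input channel 1 of the activations carries an outlier (100.0).
X = [[1.0, 100.0], [-2.0, 50.0]]
W = [[0.5, -1.0], [0.25, 0.5]]

X_s, W_s, s = smooth(X, W)
Y = matmul(X, W)        # original layer output
Y_s = matmul(X_s, W_s)  # output after smoothing; equal up to rounding error
```

Dividing each activation channel by `s_j` and multiplying the matching weight row by `s_j` leaves the product unchanged while shrinking the activation outlier, which is what makes the activations easier to quantize. The value `alpha=0.5` is only an illustrative default that splits the difficulty evenly between activations and weights.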
android_custom_build	[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)	2023-07-21 12:53:41 -07:00
ci_build	Support SmoothQuant for ORT static quantization (#16288)	2023-07-26 18:56:45 -07:00
doc	Disable PERF* rules in ruff to allow better readability (#16834)	2023-07-25 15:38:22 -07:00
nuget	Disable PERF* rules in ruff to allow better readability (#16834)	2023-07-25 15:38:22 -07:00
perf_view	fix json format (#11046)	2022-03-30 16:15:33 -07:00
python	Disable PERF* rules in ruff to allow better readability (#16834)	2023-07-25 15:38:22 -07:00