onnxruntime/tools
Wang, Mengni fe463d4957
Support SmoothQuant for ORT static quantization (#16288)
### Description

Support SmoothQuant for ORT static quantization via Intel Neural
Compressor.

> Note: Use neural-compressor==2.2 to try the SmoothQuant feature.

### Motivation and Context
For large language models (LLMs) with very large parameter counts,
systematic outliers make activations difficult to quantize. As a
training-free post-training quantization (PTQ) solution, SmoothQuant
migrates this difficulty offline from activations to weights with a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can improve the accuracy of INT8 LLMs.
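
The "mathematically equivalent transformation" can be illustrated with a minimal sketch. It is not the code added in this PR or the Neural Compressor implementation; it assumes the commonly cited per-channel scale `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`, and the helper names `matmul` and `smooth`, the toy data, and the default `alpha=0.5` are all hypothetical choices made for illustration.

```python
# Illustrative sketch of the SmoothQuant idea in pure Python.
# Not the ORT or Neural Compressor implementation; all names are hypothetical.

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def smooth(x, w, alpha=0.5):
    """Return (x_smooth, w_smooth, scales) such that x_smooth @ w_smooth equals x @ w.

    x: activations, shape (tokens, in_channels)
    w: weights, shape (in_channels, out_channels)
    Assumes every input channel has a nonzero maximum in both x and w.
    """
    # Per-input-channel absolute maxima of activations and weights.
    act_max = [max(abs(v) for v in col) for col in zip(*x)]
    w_max = [max(abs(v) for v in row) for row in w]
    # Assumed scale formula: s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)
    scales = [a ** alpha / m ** (1 - alpha) for a, m in zip(act_max, w_max)]
    # Divide activation columns by s_j and multiply weight rows by s_j.
    x_smooth = [[v / s for v, s in zip(row, scales)] for row in x]
    w_smooth = [[v * s for v in row] for row, s in zip(w, scales)]
    return x_smooth, w_smooth, scales

# Toy data: input channel 1 of the activations carries a systematic outlier.
x = [[0.5, 80.0, -0.3], [-0.2, -64.0, 0.4]]
w = [[0.2, -0.1], [0.05, 0.1], [-0.3, 0.25]]

x_s, w_s, s = smooth(x, w)
y_ref = matmul(x, w)    # original layer output
y_s = matmul(x_s, w_s)  # output after smoothing, equal up to rounding
```

Because each activation column is divided by the same factor that multiplies the matching weight row, the product is unchanged, while the outlier channel's range shrinks and part of the quantization difficulty moves into the weights.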

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
2023-07-26 18:56:45 -07:00
android_custom_build [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) 2023-07-21 12:53:41 -07:00
ci_build Support SmoothQuant for ORT static quantization (#16288) 2023-07-26 18:56:45 -07:00
doc Disable PERF* rules in ruff to allow better readability (#16834) 2023-07-25 15:38:22 -07:00
nuget Disable PERF* rules in ruff to allow better readability (#16834) 2023-07-25 15:38:22 -07:00
perf_view fix json format (#11046) 2022-03-30 16:15:33 -07:00
python Disable PERF* rules in ruff to allow better readability (#16834) 2023-07-25 15:38:22 -07:00