Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-05-16 21:00:14 +00:00
### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant feature.

### Motivation and Context

For large language models (LLMs) with gigantic parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty from activations to weights offline with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Files changed in this directory:

- install_centos.sh
- install_deps.sh
- install_deps_aten.sh
- install_deps_eager.sh
- install_deps_lort.sh
- install_shared_deps.sh
- install_ubuntuos.sh
- requirements.txt
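The "mathematically equivalent transformation" mentioned in the PR description can be illustrated with a small NumPy sketch. This is not ORT or Neural Compressor code: the function name `smooth_scales` and the hyperparameter `alpha` (the migration strength) are illustrative, following the general SmoothQuant idea of dividing activations by a per-channel scale `s` and multiplying the corresponding weight rows by the same `s`, which leaves the matmul output unchanged while shrinking activation outliers.

```python
import numpy as np

def smooth_scales(X, W, alpha=0.5):
    # Illustrative per-input-channel smoothing scale:
    # s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha)
    act_max = np.abs(X).max(axis=0)                 # (in_channels,)
    w_max = np.maximum(np.abs(W).max(axis=1), 1e-8)  # (in_channels,), avoid /0
    s = act_max**alpha / w_max**(1 - alpha)
    return np.where(s > 0, s, 1.0)                  # guard degenerate channels

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                   # inject a systematic outlier channel
W = rng.normal(size=(8, 16))

s = smooth_scales(X, W)
Y_ref = X @ W
# Equivalence: (X · diag(1/s)) @ (diag(s) · W) == X @ W
Y_smooth = (X / s) @ (s[:, None] * W)
print(np.allclose(Y_ref, Y_smooth))
```

The smoothed activations `X / s` have much smaller outliers than `X`, so their INT8 quantization range is tighter; the scale factors are folded into `W` offline, before weight quantization, so no extra runtime cost is incurred.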