Mirror of https://github.com/saymrwulf/onnxruntime.git (synced 2026-05-15 20:50:42 +00:00)
### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context

For large language models (LLMs) with gigantic parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty offline from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
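The "mathematically equivalent transformation" at the core of SmoothQuant can be sketched in a few lines of NumPy. This is a minimal illustration of the idea, not the neural-compressor implementation; the per-channel smoothing factor follows the SmoothQuant formulation, s_j = max|X_j|^alpha / max|W_j|^(1-alpha), with the migration strength alpha (typically 0.5) chosen here for illustration:

```python
import numpy as np

def smooth(X, W, alpha=0.5):
    """Sketch of the SmoothQuant transform: scale activations down and
    weights up per input channel so that X @ W is unchanged, while the
    activation outliers (which are hard to quantize) shrink."""
    act_max = np.abs(X).max(axis=0)              # per-channel activation range
    wgt_max = np.abs(W).max(axis=1)              # per-channel weight range
    s = act_max**alpha / wgt_max**(1 - alpha)    # smoothing factors
    return X / s, W * s[:, None]                 # X' = X diag(s)^-1, W' = diag(s) W

rng = np.random.default_rng(0)
# Hypothetical activations with one outlier channel (column 1 scaled by 50).
X = rng.normal(size=(4, 8)) * np.array([1, 50, 1, 1, 1, 1, 1, 1])
W = rng.normal(size=(8, 3))

Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)               # equivalent output
assert np.abs(Xs).max() < np.abs(X).max()        # activation outlier reduced
```

The assertions show the two properties the PR description relies on: the transform is exact (no retraining needed, hence "training free"), and the activation range that the INT8 quantizer must cover gets smaller, at the cost of a wider weight range.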
Directory contents:

- manylinux/
- training/
- install-protobuf.sh
- install_ninja.sh
- install_openmpi.sh
- install_os_deps.sh
- install_protobuf.sh
- install_python_deps.sh
- install_rust.sh
- install_ubuntu.sh
- requirements.txt