Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-05-14 20:48:00 +00:00
### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context

For large language models (LLMs) with huge numbers of parameters, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant offline migrates this difficulty from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
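The "mathematically equivalent transformation" can be illustrated with a small sketch. It assumes the per-input-channel scaling rule from the SmoothQuant paper, `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`, with activations divided by `s` and weights multiplied by `s`, so that `(X / s) @ (s * W) == X @ W`. This is not the ORT or Intel Neural Compressor implementation: `smooth`, `matmul`, the `alpha=0.5` default, and the `eps` guard are hypothetical choices made for illustration, and plain Python is used to avoid dependencies.

```python
def matmul(a, b):
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def smooth(x, w, alpha=0.5, eps=1e-8):
    """Hypothetical SmoothQuant-style smoothing (illustrative, not ORT's API).

    x: activations, shape (tokens, in_channels)
    w: weights, shape (in_channels, out_channels)
    Returns (x_smoothed, w_smoothed, scales).
    """
    k = len(w)
    # Per input channel j: largest |activation| over all tokens.
    act_max = [max(max(abs(row[j]) for row in x), eps) for j in range(k)]
    # Per input channel j: largest |weight| over all output channels.
    w_max = [max(max(abs(v) for v in w[j]), eps) for j in range(k)]
    # alpha controls how much of the outlier range moves into the weights.
    scales = [(a ** alpha) / (m ** (1 - alpha)) for a, m in zip(act_max, w_max)]
    # Divide activations by s_j and multiply weight row j by s_j,
    # which leaves the product X @ W unchanged up to rounding.
    x_s = [[row[j] / scales[j] for j in range(k)] for row in x]
    w_s = [[v * scales[j] for v in w[j]] for j in range(k)]
    return x_s, w_s, scales
```

For example, with an outlier input channel `x = [[1.0, 40.0], [-2.0, 35.0]]` and `w = [[0.5, -0.2], [0.1, 0.3]]`, `smooth(x, w)` shrinks the largest activation magnitude from 40 to about 3.5 while `matmul(x_s, w_s)` matches `matmul(x, w)` up to floating-point rounding, which is what makes the activations easier to quantize to INT8.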
| Name |
|---|
| inference |
| scripts |
| Dockerfile.arm_yocto |
| Dockerfile.manylinux2014_aten_cpu |
| Dockerfile.manylinux2014_cpu |
| Dockerfile.manylinux2014_cuda11 |
| Dockerfile.manylinux2014_cuda11_6_tensorrt8_4 |
| Dockerfile.manylinux2014_cuda11_6_tensorrt8_5 |
| Dockerfile.manylinux2014_cuda11_8_tensorrt8_6 |
| Dockerfile.manylinux2014_eager_cpu |
| Dockerfile.manylinux2014_lort_cpu |
| Dockerfile.manylinux2014_rocm |
| Dockerfile.manylinux2014_training_cuda11_8 |
| Dockerfile.ubuntu_cuda11_6_tensorrt8_4 |
| Dockerfile.ubuntu_cuda11_8_tensorrt8_5 |
| Dockerfile.ubuntu_cuda11_8_tensorrt8_6 |
| Dockerfile.ubuntu_for_arm |
| Dockerfile.ubuntu_gpu_training |
| Dockerfile.ubuntu_openvino |
| Dockerfile.ubuntu_tensorrt |
| Dockerfile.ubuntu_tensorrt_bin |
| Dockerfile_manylinux2014_openvino_multipython |
| manylinux-entrypoint |
| manylinux.patch |
| migraphx-ci-pipeline-env.Dockerfile |