mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-16 01:33:39 +00:00
### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context

For large language models (LLMs) with very large parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty offline from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

| Name | | |
|---|---|---|
| github | ||
| __init__.py | ||
| amd_hipify.py | ||
| build.py | ||
| clean_docker_image_cache.py | ||
| compile_triton.py | ||
| coverage.py | ||
| gen_def.py | ||
| get_docker_image.py | ||
| logger.py | ||
| op_registration_utils.py | ||
| op_registration_validator.py | ||
| patch_manylinux.py | ||
| policheck_exclusions.xml | ||
| reduce_op_kernels.py | ||
| replace_urls_in_deps.py | ||
| requirements.txt | ||
| update_tsaoptions.py | ||
| upload_python_package_to_azure_storage.py | ||
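
The "mathematically equivalent transformation" mentioned in the commit message can be illustrated with a small NumPy sketch. This is not the code added by the commit and not the neural-compressor implementation; it assumes the per-channel scale from the SmoothQuant paper, `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`, and the names `smooth_scales`, `X`, `W`, and `alpha` are hypothetical, chosen only for illustration.

```python
import numpy as np


def smooth_scales(X, W, alpha=0.5, eps=1e-8):
    """Per-input-channel smoothing scales (illustrative, not ORT's code).

    X: activations, shape (tokens, in_features)
    W: weights, shape (in_features, out_features)
    """
    act_max = np.abs(X).max(axis=0)  # largest |activation| per input channel
    w_max = np.abs(W).max(axis=1)    # largest |weight| per input channel
    return np.maximum(act_max, eps) ** alpha / np.maximum(w_max, eps) ** (1 - alpha)


rng = np.random.default_rng(0)
X = rng.normal(size=(16, 8))
X[:, 3] *= 50.0                      # simulate one outlier activation channel
W = rng.normal(size=(8, 4))

s = smooth_scales(X, W)
X_smooth = X / s                     # activations become easier to quantize
W_smooth = W * s[:, None]            # the difficulty moves into the weights

# The product is unchanged, so the transformation is mathematically equivalent.
equivalent = np.allclose(X @ W, X_smooth @ W_smooth)

# The activation dynamic range shrinks after smoothing.
range_before = np.abs(X).max()
range_after = np.abs(X_smooth).max()
```

With `alpha = 0.5` the outlier is split evenly between activations and weights; values closer to 1 move more of the difficulty into the weights.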