mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-22 22:01:08 +00:00
### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context

For large language models (LLMs) with very large parameter counts, systematic outliers make activations difficult to quantize. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty offline from activations to weights with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
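
The "mathematically equivalent transformation" described in the commit message can be illustrated with a small NumPy sketch. This is not the Intel Neural Compressor or ONNX Runtime implementation. The function name `smooth`, the `eps` guard, and the toy shapes are assumptions made for illustration, and `alpha` stands for the migration strength as commonly described for SmoothQuant. The idea is to divide each activation channel by a scale and multiply the matching weight row by the same scale, so the matrix product is unchanged while the activation outliers shrink.

```python
import numpy as np


def smooth(X, W, alpha=0.5, eps=1e-8):
    """Hypothetical helper: rescale activations X (tokens x channels)
    and weights W (channels x outputs) per input channel."""
    act_max = np.maximum(np.abs(X).max(axis=0), eps)  # activation range per channel
    w_max = np.maximum(np.abs(W).max(axis=1), eps)    # weight range per channel
    s = act_max ** alpha / w_max ** (1.0 - alpha)     # per-channel smoothing scale
    # X / s broadcasts over columns; W * s[:, None] scales the matching rows.
    return X / s, W * s[:, None], s


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                 # simulate one outlier activation channel
W = rng.normal(size=(8, 5))

X_s, W_s, s = smooth(X, W)

# The product is preserved up to floating-point error.
print(np.allclose(X @ W, X_s @ W_s))
# Largest activation magnitude before and after smoothing.
print(np.abs(X).max(), np.abs(X_s).max())
```

With `alpha=0.5` the per-channel activation and weight ranges end up equal, which is why the difficulty is said to be shared between the two tensors before INT8 quantization is applied.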
11 lines
199 B
Text
numpy==1.21.6 ; python_version < '3.11'
numpy==1.24.2 ; python_version >= '3.11'
mypy
pytest
setuptools>=41.4.0
wheel
onnx==1.14.0
protobuf==3.20.2
sympy==1.10.1
flatbuffers
neural-compressor>=2.2.1