onnxruntime/tools/ci_build/github/linux/docker/scripts/manylinux/requirements.txt at d6ce43db5e4f5e2d6a09beeccc90c5fd22b4f009

saymrwulf/onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-22 22:01:08 +00:00

Wang, Mengni fe463d4957

Support SmoothQuant for ORT static quantization (#16288)

### Description

Support SmoothQuant for ORT static quantization via Intel Neural
Compressor.

> Note:
Please use neural-compressor==2.2 to try the SmoothQuant function.

### Motivation and Context
For large language models (LLMs) with very large parameter counts,
systematic outliers make activations difficult to quantize. As a
training-free post-training quantization (PTQ) method, SmoothQuant
migrates this difficulty offline from activations to weights through a
mathematically equivalent transformation. Integrating SmoothQuant into
ORT quantization can improve the accuracy of INT8 LLMs.
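
The "mathematically equivalent transformation" can be illustrated with a small sketch. It is not the Intel Neural Compressor or ORT implementation. It assumes the per-channel scale from the SmoothQuant paper, `s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)`, and uses plain Python lists so it runs without dependencies. The names `matmul`, `smooth`, and `alpha` are hypothetical, chosen for illustration only.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def smooth(x, w, alpha=0.5):
    """Rescale activations x and weights w per input channel.

    Returns (x_smooth, w_smooth, scales) such that
    x_smooth @ w_smooth equals x @ w up to floating-point error.
    Assumes every channel has a nonzero activation and weight maximum.
    """
    n_channels = len(w)  # rows of w are input channels, i.e. columns of x
    scales = []
    for j in range(n_channels):
        act_max = max(abs(row[j]) for row in x)
        w_max = max(abs(v) for v in w[j])
        scales.append(act_max ** alpha / w_max ** (1 - alpha))
    # Divide activation column j by s_j and multiply weight row j by s_j.
    x_smooth = [[row[j] / scales[j] for j in range(n_channels)] for row in x]
    w_smooth = [[v * scales[j] for v in w[j]] for j in range(n_channels)]
    return x_smooth, w_smooth, scales


# Toy activations with an outlier in channel 1 (made-up numbers).
x = [[0.5, 40.0, -0.2],
     [-0.3, -60.0, 0.1]]
w = [[0.2, -0.1],
     [0.05, 0.02],
     [-0.3, 0.4]]

x_s, w_s, s = smooth(x, w)
y_ref = matmul(x, w)
y_smooth = matmul(x_s, w_s)
```

Because each activation column is divided by the same factor that multiplies the matching weight row, the product is unchanged, while the outlier channel's activation range shrinks and the corresponding weights grow. In this sketch, `alpha` controls how much of the quantization difficulty moves to the weights.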

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>

2023-07-26 18:56:45 -07:00

11 lines

199 B

Text


 numpy==1.21.6 ; python_version < '3.11'
 numpy==1.24.2 ; python_version >= '3.11'
 mypy
 pytest
 setuptools>=41.4.0
 wheel
 onnx==1.14.0
 protobuf==3.20.2
 sympy==1.10.1
 flatbuffers
 neural-compressor>=2.2.1
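
The two `numpy` lines above rely on environment markers: the installer evaluates `python_version` against the running interpreter and keeps only the matching pin. A minimal sketch of that selection logic follows. The helper name `numpy_pin` is hypothetical, and it mirrors only these two markers rather than the full marker grammar.

```python
import sys


def numpy_pin(version_info=sys.version_info):
    """Return the numpy requirement that applies to the given Python version.

    Mirrors the two markers in this file:
      python_version < '3.11'   -> numpy==1.21.6
      python_version >= '3.11'  -> numpy==1.24.2
    """
    major_minor = tuple(version_info[:2])
    if major_minor < (3, 11):
        return "numpy==1.21.6"
    return "numpy==1.24.2"


# The pin that would be selected for the interpreter running this script.
selected = numpy_pin()
```

Comparing `(major, minor)` tuples avoids the string-comparison pitfall where `'3.9'` would sort after `'3.11'`.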