Mirror of https://github.com/saymrwulf/onnxruntime.git, synced 2026-05-16 21:00:14 +00:00
### Description

Support SmoothQuant for ORT static quantization via Intel Neural Compressor.

> Note: Please use neural-compressor==2.2 to try the SmoothQuant feature.

### Motivation and Context

For large language models (LLMs) with gigantic parameter counts, systematic outliers make quantization of activations difficult. As a training-free post-training quantization (PTQ) solution, SmoothQuant migrates this difficulty from activations to weights offline with a mathematically equivalent transformation. Integrating SmoothQuant into ORT quantization can improve the accuracy of INT8 LLMs.

---------

Signed-off-by: Mengni Wang <mengni.wang@intel.com>
Files changed in this directory:

- install_centos.sh
- install_deps.sh
- install_deps_aten.sh
- install_deps_eager.sh
- install_deps_lort.sh
- install_shared_deps.sh
- install_ubuntuos.sh
- requirements.txt
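The "mathematically equivalent transformation" mentioned in the PR description can be illustrated with a small NumPy sketch. This is not ORT or Neural Compressor code: the function name `smooth_scales` and the hyperparameter `alpha` (the migration strength) are illustrative, following the general SmoothQuant idea of dividing activations by a per-channel scale `s` and multiplying the corresponding weight rows by the same `s`, which leaves the matmul output unchanged while shrinking activation outliers.

```python
import numpy as np

def smooth_scales(X, W, alpha=0.5):
    # Illustrative per-input-channel smoothing scale:
    # s_j = max|X[:, j]|^alpha / max|W[j, :]|^(1 - alpha)
    act_max = np.abs(X).max(axis=0)                 # (in_channels,)
    w_max = np.maximum(np.abs(W).max(axis=1), 1e-8)  # (in_channels,), avoid /0
    s = act_max**alpha / w_max**(1 - alpha)
    return np.where(s > 0, s, 1.0)                  # guard degenerate channels

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                   # inject a systematic outlier channel
W = rng.normal(size=(8, 16))

s = smooth_scales(X, W)
Y_ref = X @ W
# Equivalence: (X · diag(1/s)) @ (diag(s) · W) == X @ W
Y_smooth = (X / s) @ (s[:, None] * W)
print(np.allclose(Y_ref, Y_smooth))
```

The smoothed activations `X / s` have much smaller outliers than `X`, so their INT8 quantization range is tighter; the scale factors are folded into `W` offline, before weight quantization, so no extra runtime cost is incurred.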