onnxruntime/onnxruntime/python/tools/quantization
anujj 23d48ea647
Add TensorRT-Model-Optimizer INT4 AWQ support in onnxruntime tools (#22390)
[TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer)
has an implementation of INT4 AWQ. This adds support in the onnxruntime
tools for quantizing models with TensorRT-Model-Optimizer.
2024-10-11 13:31:54 -07:00
CalTableFlatBuffers
execution_providers Add changes for strided calibration (#20949) 2024-06-21 08:23:23 -07:00
fusions [QNN Quantization] Ensure fused nodes have names (#19650) 2024-02-27 02:27:35 -08:00
operators [QNN EP] Add support for GatherElements (#15966) 2024-08-19 14:33:40 -07:00
__init__.py
base_quantizer.py Add overflow protection for quantization bias to reduce quantization precision loss (#21645) 2024-08-28 14:29:17 -07:00
calibrate.py Fix conversion of TensorData, TensorsData to json (#22166) 2024-10-06 19:13:03 -07:00
matmul_4bits_quantizer.py Add TensorRT-Model-Optimizer INT4 AWQ support in onnxruntime tools (#22390) 2024-10-11 13:31:54 -07:00
matmul_bnb4_quantizer.py Fix argparser in matmul_bnb4_quantizer (#19812) 2024-03-07 11:31:34 -08:00
onnx_model.py [QDQ Quant] Support mixed-precision integer quantization via overrides (#19925) 2024-03-23 11:05:08 -07:00
onnx_quantizer.py Fix missing argument when calling _get_quantize_input_nodes (#20245) 2024-04-25 00:46:48 +02:00
preprocess.py
qdq_loss_debug.py
qdq_quantizer.py [QNN Quant tool] Fix validation of per-channel overrides for models with external data (#21656) 2024-08-09 14:46:52 -07:00
quant_utils.py Fix conversion of TensorData, TensorsData to json (#22166) 2024-10-06 19:13:03 -07:00
quantize.py Added a tool to quantize Gather to GatherBlockQuantized (#21697) 2024-08-19 10:25:36 -07:00
README.md
registry.py [QNN EP] Add support for GatherElements (#15966) 2024-08-19 14:33:40 -07:00
shape_inference.py Update api backward compatibility (#20136) 2024-04-01 21:37:56 -07:00
tensor_quant_overrides.py [QNN Quant tool] Fix validation of per-channel overrides for models with external data (#21656) 2024-08-09 14:46:52 -07:00

Quantization Tool

This tool quantizes select ONNX models. Whether a model is supported depends on the operators it contains. Refer to https://onnxruntime.ai/docs/performance/quantization.html for usage details and to https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization for examples.
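As a rough illustration of what quantizing a tensor means, the sketch below maps floating-point values to 8-bit integers with a scale and a zero point, then maps them back. It is a minimal pure-Python sketch of the common affine scheme `real = scale * (quantized - zero_point)`, not the tool's own code: the function names `compute_params`, `quantize_values`, and `dequantize_values` are hypothetical, and the unsigned 8-bit range is an assumption made for the example.

```python
# Minimal sketch of affine (scale / zero-point) quantization.
# All names here are hypothetical and are not part of the onnxruntime API.

def compute_params(rmin, rmax, qmin=0, qmax=255):
    """Derive scale and zero point for the real range [rmin, rmax]."""
    # Widen the range to include 0.0 so that zero is exactly representable.
    rmin = min(rmin, 0.0)
    rmax = max(rmax, 0.0)
    scale = (rmax - rmin) / (qmax - qmin)
    if scale == 0.0:
        scale = 1.0  # degenerate all-zero range; any positive scale works
    zero_point = int(round(qmin - rmin / scale))
    zero_point = max(qmin, min(qmax, zero_point))
    return scale, zero_point


def quantize_values(values, scale, zero_point, qmin=0, qmax=255):
    """Map real values to integers, clamping to [qmin, qmax]."""
    return [
        max(qmin, min(qmax, int(round(v / scale)) + zero_point))
        for v in values
    ]


def dequantize_values(qvalues, scale, zero_point):
    """Map integers back to approximate real values."""
    return [scale * (q - zero_point) for q in qvalues]


values = [-1.0, 0.0, 1.0, 2.0]
scale, zero_point = compute_params(min(values), max(values))
quantized = quantize_values(values, scale, zero_point)
restored = dequantize_values(quantized, scale, zero_point)
print(quantized, restored)
```

The restored values differ from the originals by at most about half a quantization step (`scale / 2`), which is the precision given up in exchange for the smaller integer representation.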