[TensorRT-Model-Optimizer](https://github.com/NVIDIA/TensorRT-Model-Optimizer) has an implementation of INT4 AWQ. This adds support in the onnxruntime tools for quantizing models with TensorRT-Model-Optimizer.

| Name |
|---|
| CalTableFlatBuffers |
| execution_providers |
| fusions |
| operators |
| __init__.py |
| base_quantizer.py |
| calibrate.py |
| matmul_4bits_quantizer.py |
| matmul_bnb4_quantizer.py |
| onnx_model.py |
| onnx_quantizer.py |
| preprocess.py |
| qdq_loss_debug.py |
| qdq_quantizer.py |
| quant_utils.py |
| quantize.py |
| README.md |
| registry.py |
| shape_inference.py |
| tensor_quant_overrides.py |

Quantization Tool
This tool quantizes select ONNX models. Whether a model can be quantized depends on the operators it contains. See https://onnxruntime.ai/docs/performance/quantization.html for usage details and https://github.com/microsoft/onnxruntime-inference-examples/tree/main/quantization for examples.