Add overflow protection for quantization bias to reduce quantization precision loss (#21645)

### Description
<!-- Describe your changes. -->

When the bias scale is very small, the quantized bias can exceed the
range of `int32`; the subsequent cast then overflows, producing badly
wrong values. To avoid this, the quantized bias is now clipped to the
`int32` range before the conversion to `int32`, so out-of-range values
saturate instead of overflowing, reducing the quantization precision
loss.
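To illustrate the failure mode, here is a minimal NumPy sketch; the bias value and scale below are hypothetical, chosen only to force the overflow, and `float64` is used so the `int32` bounds are exactly representable:

```python
import numpy as np

# Hypothetical values: a tiny bias scale pushes the quantized bias
# far beyond what int32 can hold.
bias_data = np.asarray([0.5], dtype=np.float64)
bias_scale = 1e-11  # illustrative, unrealistically small scale

quantized = (bias_data / bias_scale).round()  # ~5e10, above 2**31 - 1
assert quantized[0] > np.iinfo(np.int32).max  # a direct int32 cast would overflow

# The fix: clip into the int32 range before casting, so the value
# saturates at int32 max instead of wrapping around.
quantized = np.clip(quantized, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
quantized_i32 = quantized.astype(np.int32)  # saturated to 2147483647
```

Saturation still loses information, but a bias pinned at the `int32` boundary is far closer to the intended value than an overflowed cast.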


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fixes https://github.com/microsoft/onnxruntime/issues/21000
Commit 7df8776322 by duanshengliu, 2024-08-29 05:29:17 +08:00, committed by GitHub (parent 3bfb5e4f62)


```diff
@@ -230,7 +230,9 @@ class BaseQuantizer:
         # TODO: This formula should be explained including why the scale is not estimated for the bias as well.
         bias_scale = input_scale * weight_scale * beta
-        quantized_data = (np.asarray(bias_data) / bias_scale).round().astype(np.int32)
+        quantized_data = (np.asarray(bias_data) / bias_scale).round()
+        quantized_data = np.clip(quantized_data, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
+        quantized_data = quantized_data.astype(np.int32)
         # update bias initializer
         bias_np_data = np.asarray(quantized_data, dtype=np.int32).reshape(bias_initializer.dims)
```