Add overflow protection for quantization bias to reduce quantization precision loss (#21645)

### Description
<!-- Describe your changes. -->

When the bias scale is very small, the quantized bias can exceed the
range of `int32`; the subsequent cast then overflows, producing badly
wrong values. To avoid this, the quantized bias is now clipped to the
`int32` range before the conversion to `int32`, so out-of-range values
saturate instead of overflowing, reducing the quantization precision
loss.
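To illustrate the failure mode, here is a minimal NumPy sketch; the bias value and scale below are hypothetical, chosen only to force the overflow, and `float64` is used so the `int32` bounds are exactly representable:

```python
import numpy as np

# Hypothetical values: a tiny bias scale pushes the quantized bias
# far beyond what int32 can hold.
bias_data = np.asarray([0.5], dtype=np.float64)
bias_scale = 1e-11  # illustrative, unrealistically small scale

quantized = (bias_data / bias_scale).round()  # ~5e10, above 2**31 - 1
assert quantized[0] > np.iinfo(np.int32).max  # a direct int32 cast would overflow

# The fix: clip into the int32 range before casting, so the value
# saturates at int32 max instead of wrapping around.
quantized = np.clip(quantized, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
quantized_i32 = quantized.astype(np.int32)  # saturated to 2147483647
```

Saturation still loses information, but a bias pinned at the `int32` boundary is far closer to the intended value than an overflowed cast.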


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fixes https://github.com/microsoft/onnxruntime/issues/21000
Commit 7df8776322 by duanshengliu, 2024-08-29 05:29:17 +08:00, committed by GitHub (parent 3bfb5e4f62)


```diff
@@ -230,7 +230,9 @@ class BaseQuantizer:
         # TODO: This formula should be explained including why the scale is not estimated for the bias as well.
         bias_scale = input_scale * weight_scale * beta
-        quantized_data = (np.asarray(bias_data) / bias_scale).round().astype(np.int32)
+        quantized_data = (np.asarray(bias_data) / bias_scale).round()
+        quantized_data = np.clip(quantized_data, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
+        quantized_data = quantized_data.astype(np.int32)
         # update bias initializer
         bias_np_data = np.asarray(quantized_data, dtype=np.int32).reshape(bias_initializer.dims)
```