mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-14 20:48:00 +00:00
Add overflow protection for quantization bias to reduce quantization precision loss (#21645)
### Description
When the scale of the bias is too small, the quantized bias may exceed the range of `int32`, leading to significant precision loss. Therefore, before converting the quantized bias to `int32`, it is clipped to the `int32` range to reduce quantization precision loss.

### Motivation and Context
Fixes https://github.com/microsoft/onnxruntime/issues/21000
This commit is contained in:
parent 3bfb5e4f62
commit 7df8776322
1 changed file with 3 additions and 1 deletion
```diff
@@ -230,7 +230,9 @@ class BaseQuantizer:
         # TODO: This formula should be explained including why the scale is not estimated for the bias as well.
         bias_scale = input_scale * weight_scale * beta
 
-        quantized_data = (np.asarray(bias_data) / bias_scale).round().astype(np.int32)
+        quantized_data = (np.asarray(bias_data) / bias_scale).round()
+        quantized_data = np.clip(quantized_data, np.iinfo(np.int32).min, np.iinfo(np.int32).max)
+        quantized_data = quantized_data.astype(np.int32)
 
         # update bias initializer
         bias_np_data = np.asarray(quantized_data, dtype=np.int32).reshape(bias_initializer.dims)
```
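A minimal standalone sketch of the overflow scenario this change guards against. The bias values and the tiny `bias_scale` below are hypothetical, chosen only to push the quantized magnitudes past the `int32` bounds; the clip-before-cast pattern mirrors the diff above.

```python
import numpy as np

# Hypothetical bias values and an artificially tiny scale: dividing by a
# very small bias_scale produces magnitudes far beyond the int32 range.
bias_data = np.array([0.5, -0.25], dtype=np.float32)
bias_scale = 1e-10

# Round in floating point first (as in the patched code), so no overflow yet.
quantized_data = (np.asarray(bias_data) / bias_scale).round()

# Clip to the int32 range before casting; a direct .astype(np.int32) on
# out-of-range floats would silently produce wrapped/undefined values.
info = np.iinfo(np.int32)
quantized_data = np.clip(quantized_data, info.min, info.max)
quantized_data = quantized_data.astype(np.int32)

print(quantized_data)  # saturates at the int32 bounds instead of wrapping
```

With clipping, out-of-range entries saturate at `np.iinfo(np.int32).min`/`.max`, which loses far less precision than the wraparound that an unchecked cast can produce.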