onnxruntime/include
Arne H Juul 493159b481
near-zero negative values must convert to 0 not NAN (#18473)
For the Float8 types with unsigned zero, we must clear the sign bit when rounding to zero; otherwise we end up with 0x80, which is the encoding for NaN.
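To see why 0x80 is a problem, here is a minimal Python sketch of how an FNUZ byte decodes. It assumes the bit layouts described in the ONNX float8 documentation (Float8E4M3FNUZ: 4 exponent bits, 3 mantissa bits, bias 8; Float8E5M2FNUZ: 5 exponent bits, 2 mantissa bits, bias 16). `decode_fnuz` is a hypothetical helper for illustration, not an onnxruntime API.

```python
import math

def decode_fnuz(byte: int, exp_bits: int, man_bits: int, bias: int) -> float:
    """Decode an 8-bit FNUZ float (no infinities, no negative zero)."""
    if byte == 0x80:
        # Only the sign bit set: in FNUZ formats this is NaN, not -0.0.
        return math.nan
    sign = -1.0 if byte & 0x80 else 1.0
    exponent = (byte >> man_bits) & ((1 << exp_bits) - 1)
    mantissa = byte & ((1 << man_bits) - 1)
    if exponent == 0:
        # Subnormal range; mantissa == 0 gives +0.0 (sign bit is clear here).
        return sign * mantissa * 2.0 ** (1 - bias - man_bits)
    # Normal range: implicit leading 1.
    return sign * (1 + mantissa / (1 << man_bits)) * 2.0 ** (exponent - bias)
```

With these layouts, `decode_fnuz(0x80, 4, 3, 8)` is NaN and `decode_fnuz(0x00, 4, 3, 8)` is `0.0`; no byte decodes to `-0.0`, so a converter that rounds a tiny negative value to "sign bit plus zero magnitude" produces NaN.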

### Description
Handle all zero and near-zero values the same way, rounding them to positive zero.
Note that I removed one `if` level but did not re-indent the code in this PR, to make the actual changes easier to see.
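The rule "round all zero and near-zero values to positive zero" can be sketched in Python for Float8E4M3FNUZ. This is an illustration, not the onnxruntime C++ code: `float_to_e4m3fnuz` is a hypothetical name, it finds the nearest code by brute force over the 128 non-negative magnitudes (ties to even), and saturating out-of-range values to ±240 is an assumption made for simplicity.

```python
import math

def _e4m3fnuz_magnitude(code: int) -> float:
    """Magnitude of an E4M3FNUZ code with the sign bit clear (0x00..0x7F)."""
    exponent, mantissa = code >> 3, code & 0x7
    if exponent == 0:
        return mantissa * 2.0 ** -10  # subnormals and +0
    return (1 + mantissa / 8) * 2.0 ** (exponent - 8)

# Increasing magnitudes for codes 0x00..0x7F; the last entry is 240.0.
_MAGNITUDES = [_e4m3fnuz_magnitude(c) for c in range(0x80)]

def float_to_e4m3fnuz(x: float) -> int:
    if math.isnan(x):
        return 0x80
    mag = min(abs(x), _MAGNITUDES[-1])  # assumption: saturate to the max finite value
    # Nearest magnitude code; on an exact tie, prefer the even code (even mantissa).
    best = min(range(0x80), key=lambda c: (abs(_MAGNITUDES[c] - mag), c & 1))
    if best == 0:
        # Zero and near-zero inputs of either sign become +0 (0x00).
        # Keeping the sign bit here would yield 0x80, i.e. NaN.
        return 0x00
    return best | (0x80 if x < 0 else 0x00)
```

For example, `float_to_e4m3fnuz(-1e-6)` and `float_to_e4m3fnuz(-0.0)` both return `0x00`, while `float_to_e4m3fnuz(-1.0)` keeps its sign and returns `0xC0`. The zero check comes after rounding, so it catches every input that rounds to zero, not just exact zeros.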

### Motivation and Context
For the two new 8-bit floating-point types Float8E4M3FNUZ and Float8E5M2FNUZ, converting a near-zero negative value would end up with only the sign bit set; this bit pattern is not negative zero but NaN.
2024-09-06 11:41:48 -07:00
onnxruntime/core near-zero negative values must convert to 0 not NAN (#18473) 2024-09-06 11:41:48 -07:00