onnxruntime/onnxruntime
aciddelgado 4e27841bdb
fix gqa cpu nan bug (#20521)
### Description
There was a bug in GQA on CPU where, in the token case with batch_size >
1 and past_present_share_buffer off, the output would occasionally
contain NaNs. This PR fixes that. It also updates the documentation and
fixes position ID generation for rotary embedding in CUDA in the prompt case.



### Motivation and Context
This PR fixes the GQA CPU bug, updates the documentation, and
makes seqlens_k irrelevant in the prompt case, which helps prevent
user error.
2024-05-07 15:19:26 -07:00
contrib_ops fix gqa cpu nan bug (#20521) 2024-05-07 15:19:26 -07:00
core fix gqa cpu nan bug (#20521) 2024-05-07 15:19:26 -07:00
python Add simplified layernorm fusion for Gemma (#20572) 2024-05-06 20:07:14 -07:00
test fix gqa cpu nan bug (#20521) 2024-05-07 15:19:26 -07:00
tool/etw
wasm [js/web] rewrite backend resolve to allow multiple EPs (#19735) 2024-03-15 11:47:45 -07:00
__init__.py Bump up version in main from 1.18.0 to 1.19.0 (#20489) 2024-04-29 20:21:41 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings