onnxruntime/tools
aciddelgado 4e27841bdb
fix gqa cpu nan bug (#20521)
### Description
There was a bug in GQA on CPU where, in the token case with batch_size >
1 and past_present_share_buffer off, the output would occasionally
contain NaNs. This PR fixes that. It also updates the documentation and
fixes position ID generation for rotary embedding in CUDA in the prompt case.



### Motivation and Context
This PR fixes the GQA CPU bug, updates the documentation, and makes
seqlens_k irrelevant in the prompt case, which helps prevent user error.
2024-05-07 15:19:26 -07:00

| Name | Last commit | Date |
| --- | --- | --- |
| android_custom_build | Update NDK version to 26.1.10909125 (#18493) | 2023-11-17 14:14:01 -08:00 |
| ci_build | fix gqa cpu nan bug (#20521) | 2024-05-07 15:19:26 -07:00 |
| doc | Bump ruff to 0.3.2 and black to 24 (#19878) | 2024-03-13 10:00:32 -07:00 |
| nuget | Qnn nuget update (#20527) | 2024-04-30 22:12:53 -07:00 |
| perf_view | fixed #16873 (#16932) | 2023-09-26 09:57:01 -07:00 |
| python | Bump ruff to 0.3.2 and black to 24 (#19878) | 2024-03-13 10:00:32 -07:00 |
| scripts | Fix a build issue: /MP was not enabled correctly (#19190) | 2024-01-29 12:45:38 -08:00 |