mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-16 21:00:14 +00:00
### Description There was a bug with gqa on cpu where on token case, with batch_size > 1, and with past_present_share_buffer off, the output would occasionally contain nans. this pr fixes that. it also updates documentation and fixes posid gen for rotary in cuda in prompt case. ### Motivation and Context this pr solves the GQA CPU bug as well as updates the documentation and makes seqlens_k irrelevant for prompt case, which is useful to prevent user error. |
||
|---|---|---|
| .. | ||
| android | ||
| apple | ||
| azure-pipelines | ||
| js | ||
| linux | ||
| pai | ||
| windows | ||
| Doxyfile_csharp.cfg | ||