mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-22 02:30:26 +00:00
### Description * Pass topk_scores to beam scorer in slow topk path. * Add an env variable `ORT_BEAM_SEARCH_USE_FAST_TOPK` to enable/disable fast topk. * Add a test case for slow topk path. ### Motivation and Context This bug was introduced in https://github.com/microsoft/onnxruntime/pull/16272 Beam search uses fast cuda kernel when number of beams <= 32. When beam size is larger than that threshold, we use another code path (slower cuda kernel) to get topk. In such `slow topk path`, topk_scores shall be passed to beam scorer but it is not. This bug will cause incorrect result when num_beams > 32. It was not found previously since such large beam size is rarely used. |
||
|---|---|---|
| .. | ||
| cpu | ||
| cuda | ||
| js | ||
| rocm | ||
| webgpu | ||