mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-13 01:09:22 +00:00
### Description

* Pass `topk_scores` to the beam scorer in the slow topk path.
* Add an environment variable `ORT_BEAM_SEARCH_USE_FAST_TOPK` to enable or disable fast topk.
* Add a test case for the slow topk path.

### Motivation and Context

This bug was introduced in https://github.com/microsoft/onnxruntime/pull/16272.

Beam search uses a fast CUDA kernel when the number of beams is <= 32. When the beam size exceeds that threshold, a different code path (a slower CUDA kernel) computes the topk. In this slow topk path, `topk_scores` should be passed to the beam scorer, but it was not. The bug causes incorrect results when num_beams > 32. It went unnoticed because such large beam sizes are rarely used.

| | | |
|---|---|---|
| contrib_ops | ||
| core | ||
| lora | ||
| python | ||
| test | ||
| tool/etw | ||
| wasm | ||
| __init__.py | ||
| ReformatSource.ps1 | ||
| ReformatSourcePython.bat | ||
| VSCodeCoverage.runsettings | ||
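
The path selection described in the commit message above can be sketched in a few lines of Python. This is only an illustration of the stated rule (fast kernel when the number of beams is <= 32, slow path otherwise), not the actual C++/CUDA implementation. The helper name `use_fast_topk` and the constant `FAST_TOPK_MAX_BEAMS` are hypothetical, and the assumption that the value `"0"` disables the fast path is a guess; the commit message only says the variable enables or disables fast topk.

```python
import os

# Threshold taken from the commit description: fast kernel when num_beams <= 32.
FAST_TOPK_MAX_BEAMS = 32


def use_fast_topk(num_beams, env=None):
    """Hypothetical helper: True when the fast topk path would be selected."""
    env = os.environ if env is None else env
    # Assumption: "0" disables the fast path; unset or any other value keeps it on.
    enabled = env.get("ORT_BEAM_SEARCH_USE_FAST_TOPK", "1") != "0"
    return enabled and num_beams <= FAST_TOPK_MAX_BEAMS


# Small beam sizes take the fast path, large ones fall back to the slow path.
small_beams_fast = use_fast_topk(4, env={})
large_beams_fast = use_fast_topk(64, env={})
# Forcing the slow path for a small beam size, e.g. to exercise it in a test.
forced_slow = use_fast_topk(4, env={"ORT_BEAM_SEARCH_USE_FAST_TOPK": "0"})
```

Under these assumptions, disabling the fast path through the environment variable makes it possible to test the slow topk path without running with more than 32 beams.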