onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-13 01:09:22 +00:00


Tianlei Wu 09e5724f3b [CUDA] Fix beam search of num_beams > 32 (#23599)	2025-02-06 16:50:31 -08:00

### Description

* Pass topk_scores to beam scorer in slow topk path.
* Add an env variable `ORT_BEAM_SEARCH_USE_FAST_TOPK` to enable/disable fast topk.
* Add a test case for slow topk path.

### Motivation and Context

This bug was introduced in https://github.com/microsoft/onnxruntime/pull/16272

Beam search uses fast cuda kernel when number of beams <= 32. When beam size is larger than that threshold, we use another code path (slower cuda kernel) to get topk. In such `slow topk path`, topk_scores shall be passed to beam scorer but it is not.

This bug will cause incorrect result when num_beams > 32. It was not found previously since such large beam size is rarely used.
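
The path selection described in the commit message can be sketched in plain Python. This is an illustration only, not the CUDA implementation: the helper names `use_fast_topk` and `slow_topk` are hypothetical, and the way `ORT_BEAM_SEARCH_USE_FAST_TOPK` is parsed here (a value of `0` disables the fast path, unset leaves it enabled) is an assumption, since the commit message only says the variable enables or disables fast topk. The threshold of 32 beams is taken from the commit message.

```python
import os

# Threshold stated in the commit message: the fast kernel handles num_beams <= 32.
FAST_TOPK_MAX_BEAMS = 32


def use_fast_topk(num_beams, environ=None):
    """Hypothetical dispatcher: True if the fast top-k path would be chosen."""
    environ = os.environ if environ is None else environ
    # Assumption: "0" disables the fast path; unset or any other value keeps it on.
    enabled = environ.get("ORT_BEAM_SEARCH_USE_FAST_TOPK", "1") != "0"
    return enabled and num_beams <= FAST_TOPK_MAX_BEAMS


def slow_topk(scores, k):
    """Reference top-k over a flat score list.

    Returns (topk_scores, topk_indices) in descending score order. Both values
    are returned because, per the commit message, the beam scorer needs the
    scores as well as the indices.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return [scores[i] for i in order], order
```

With these definitions, `use_fast_topk(33, {})` selects the slow path, and so does `use_fast_topk(4, {"ORT_BEAM_SEARCH_USE_FAST_TOPK": "0"})`, which mirrors how a test could force the slow path for a small beam size.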
..
contrib_ops	[CUDA] Fix beam search of num_beams > 32 (#23599)	2025-02-06 16:50:31 -08:00
core	OpenVINO EP Weights Sharing Feature (#23553)	2025-02-06 14:57:38 -08:00
lora
python	[TensorRT EP] support TensorRT 10.8-GA (#23592)	2025-02-06 10:05:57 -08:00
test	[CUDA] Fix beam search of num_beams > 32 (#23599)	2025-02-06 16:50:31 -08:00
tool/etw
wasm	[WebNN] Fixed WebNN Module undefined issue (#22795)	2024-11-11 21:31:24 -08:00
__init__.py	Use ruff as the formatter to replace black-isort (#23397)	2025-01-16 11:14:15 -08:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings