onnxruntime/tools
Ye Wang f35dd1407f
custom allreduce cuda kernel (#20703)
### Description

Conditionally route to a custom AllReduce kernel when the buffer size and the
number of GPUs meet certain requirements. Otherwise, keep using NCCL's
AllReduce.
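
The routing described above might be sketched as follows. This is only an illustration of the dispatch idea, not the code from the commit: `RoutingPolicy`, `choose_allreduce`, the size threshold, and the supported GPU counts are all hypothetical, since the commit message does not state the actual requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingPolicy:
    # Hypothetical limits: the commit message does not give the real values.
    max_custom_bytes: int = 8 * 1024 * 1024
    supported_world_sizes: tuple = (2, 4, 8)


def choose_allreduce(buffer_bytes: int, world_size: int,
                     policy: RoutingPolicy = RoutingPolicy()) -> str:
    """Return "custom" when both conditions hold, otherwise "nccl"."""
    size_ok = 0 < buffer_bytes <= policy.max_custom_bytes
    gpus_ok = world_size in policy.supported_world_sizes
    if size_ok and gpus_ok:
        return "custom"
    # Fallback path: keep using NCCL's AllReduce.
    return "nccl"


if __name__ == "__main__":
    print(choose_allreduce(1024, 8))              # small buffer, supported GPU count
    print(choose_allreduce(64 * 1024 * 1024, 8))  # buffer above the assumed limit
    print(choose_allreduce(1024, 3))              # GPU count outside the assumed set
```

Keeping the decision in one small function means the fallback to NCCL stays the default, and the custom path is taken only when every condition is met.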

### Motivation and Context

---------

Co-authored-by: Ye Wang <wangye@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>
Co-authored-by: Your Name <you@example.com>
2024-06-13 11:09:49 -07:00
..
android_custom_build Remove deprecated "mobile" packages (#20941) 2024-06-07 16:20:32 -05:00
ci_build custom allreduce cuda kernel (#20703) 2024-06-13 11:09:49 -07:00
doc Bump ruff to 0.3.2 and black to 24 (#19878) 2024-03-13 10:00:32 -07:00
nuget Qnn nuget update (#20527) 2024-04-30 22:12:53 -07:00
perf_view
python [tools] update pipeline list for run_CIs_for_external_pr.py (#20776) 2024-05-23 10:38:42 -07:00
scripts