onnxruntime/tools/ci_build/github/azure-pipelines
Ye Wang f35dd1407f
custom allreduce cuda kernel (#20703)
### Description

Conditionally route to a custom AllReduce kernel when the buffer size and
the number of GPUs meet certain requirements. Otherwise, keep using NCCL's
AllReduce.

### Motivation and Context

---------

Co-authored-by: Ye Wang <wangye@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>
Co-authored-by: Your Name <you@example.com>
2024-06-13 11:09:49 -07:00
..
nodejs/templates
nuget/templates Updating cudnn from 8 to 9 on exsiting cuda 12 docker image (#20925) 2024-06-11 09:37:16 -07:00
stages Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 (#21023) 2024-06-12 22:04:40 -07:00
templates Component Governance Fix round 6 (#21021) 2024-06-13 09:10:51 -07:00
triggers
android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml
android-x86_64-crosscompile-ci-pipeline.yml
bigmodels-ci-pipeline.yml
binary-size-checks-pipeline.yml
build-perf-test-binaries-pipeline.yml
c-api-noopenmp-packaging-pipelines.yml Update c-api-noopenmp-packaging-pipelines.yml: remove CUDA version parameter (#20955) 2024-06-07 11:19:59 -07:00
clean-build-docker-image-cache-pipeline.yml
cuda-packaging-pipeline.yml
linux-ci-pipeline.yml
linux-cpu-aten-pipeline.yml
linux-cpu-eager-pipeline.yml
linux-cpu-minimal-build-ci-pipeline.yml
linux-dnnl-ci-pipeline.yml
linux-gpu-ci-pipeline.yml Updating cudnn from 8 to 9 on exsiting cuda 12 docker image (#20925) 2024-06-11 09:37:16 -07:00
linux-gpu-tensorrt-ci-pipeline.yml Updating cudnn from 8 to 9 on exsiting cuda 12 docker image (#20925) 2024-06-11 09:37:16 -07:00
linux-gpu-tensorrt-daily-perf-pipeline.yml
linux-migraphx-ci-pipeline.yml
linux-openvino-ci-pipeline.yml
linux-qnn-ci-pipeline.yml
mac-ci-pipeline.yml
mac-coreml-ci-pipeline.yml
mac-ios-ci-pipeline.yml
mac-ios-packaging-pipeline.yml
mac-react-native-ci-pipeline.yml
npm-packaging-pipeline.yml
nuget-cuda-publishing-pipeline.yml
orttraining-linux-ci-pipeline.yml
orttraining-linux-gpu-ci-pipeline.yml
orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml custom allreduce cuda kernel (#20703) 2024-06-13 11:09:49 -07:00
orttraining-linux-nightly-ortmodule-test-pipeline.yml
orttraining-mac-ci-pipeline.yml
orttraining-pai-ci-pipeline.yml
orttraining-py-packaging-pipeline-cpu.yml
orttraining-py-packaging-pipeline-cuda.yml
orttraining-py-packaging-pipeline-cuda12.yml
orttraining-py-packaging-pipeline-rocm.yml
post-merge-jobs.yml Remove deprecated "mobile" packages (#20941) 2024-06-07 16:20:32 -05:00
publish-nuget.yml
py-cuda-package-test-pipeline.yml
py-cuda-packaging-pipeline.yml
py-cuda-publishing-pipeline.yml
py-package-build-pipeline.yml
py-package-test-pipeline.yml
py-packaging-pipeline.yml Publish debug symbols for Windows python packages (#20973) 2024-06-10 12:33:49 -07:00
qnn-ep-nuget-packaging-pipeline.yml
web-ci-pipeline.yml
win-ci-fuzz-testing.yml
win-ci-pipeline.yml
win-gpu-ci-pipeline.yml
win-gpu-reduce-op-ci-pipeline.yml Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 (#21023) 2024-06-12 22:04:40 -07:00
win-gpu-tensorrt-ci-pipeline.yml Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 (#21023) 2024-06-12 22:04:40 -07:00
win-qnn-arm64-ci-pipeline.yml
win-qnn-ci-pipeline.yml