onnxruntime/tools/ci_build
Ye Wang f35dd1407f
custom allreduce cuda kernel (#20703)
### Description

Conditionally route to a custom AllReduce kernel when the buffer size and
the number of GPUs meet certain requirements. Otherwise, keep using NCCL's
AllReduce.
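
The routing decision can be pictured with a minimal sketch. It is not the code from this change: the function name, the size limit, and the supported GPU counts below are hypothetical placeholders, since the commit message does not state the actual requirements.

```python
# Hypothetical limits, chosen only for illustration.
# The commit message does not state the real thresholds.
MAX_CUSTOM_BYTES = 8 * 1024 * 1024   # assumed upper bound on buffer size
SUPPORTED_GPU_COUNTS = (2, 4, 8)     # assumed GPU counts the custom kernel handles


def choose_allreduce(buffer_bytes: int, gpu_count: int) -> str:
    """Return "custom" when both assumed requirements hold, else "nccl"."""
    size_ok = 0 < buffer_bytes <= MAX_CUSTOM_BYTES
    gpus_ok = gpu_count in SUPPORTED_GPU_COUNTS
    if size_ok and gpus_ok:
        return "custom"
    # Anything outside the assumed limits falls back to NCCL's AllReduce.
    return "nccl"
```

Under these assumed limits, a small buffer on 8 GPUs would take the custom path, while an oversized buffer or an unsupported GPU count would stay on NCCL.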

### Motivation and Context

---------

Co-authored-by: Ye Wang <wangye@microsoft.com@h100vm-ort.kxelwkzfzxguje5bxvwxxs135a.gvxx.internal.cloudapp.net>
Co-authored-by: Your Name <you@example.com>
2024-06-13 11:09:49 -07:00
github custom allreduce cuda kernel (#20703) 2024-06-13 11:09:49 -07:00
__init__.py
amd_hipify.py [ROCm] Add SkipGroupNorm for ROCm EP (#19303) 2024-02-21 11:08:48 +08:00
build.py Add "-allow-unsupported-compiler" flags to Windows CUDA flags (#21004) 2024-06-12 14:23:00 -07:00
clean_docker_image_cache.py Bump ruff to 0.3.2 and black to 24 (#19878) 2024-03-13 10:00:32 -07:00
compile_triton.py
coverage.py
gen_def.py
get_docker_image.py Bump ruff to 0.3.2 and black to 24 (#19878) 2024-03-13 10:00:32 -07:00
logger.py
op_registration_utils.py Bump ruff to 0.3.2 and black to 24 (#19878) 2024-03-13 10:00:32 -07:00
op_registration_validator.py Bump ruff to 0.3.2 and black to 24 (#19878) 2024-03-13 10:00:32 -07:00
patch_manylinux.py
policheck_exclusions.xml
reduce_op_kernels.py
replace_urls_in_deps.py
requirements-transformers-test.txt test: refactor flash_attn tests to use parameterized (#20913) 2024-06-11 15:57:20 -07:00
set-trigger-rules.py Add VP test in Stable diffusion pipeline (#19300) 2024-01-29 09:33:58 -08:00
update_tsaoptions.py
upload_python_package_to_azure_storage.py