onnxruntime/tools/ci_build/github/azure-pipelines
Tang, Cheng 8f34c8c8ed
Introduce collective ops to ort inference build (#14399)
### Description
Introduce collective ops into the onnxruntime inference build, including:
1) AllReduce and AllGather schemas in the contrib ops, controlled by the USE_MPI
flag
2) AllReduce and AllGather kernels in the CUDA EP, controlled by the ORT_USE_NCCL
flag
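
For readers unfamiliar with the two collectives named above, the sketch below simulates their semantics in plain Python on a single process. It is only an illustration of what AllReduce (with a sum reduction, assumed here) and AllGather compute. It is not the kernel code added by this change, and the function names `all_reduce_sum` and `all_gather` are hypothetical.

```python
from typing import List


def all_reduce_sum(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate AllReduce(sum): every rank receives the element-wise sum.

    rank_buffers[i] is the input buffer held by rank i (hypothetical layout).
    """
    total = [sum(values) for values in zip(*rank_buffers)]
    # Each rank gets its own copy of the reduced result.
    return [list(total) for _ in rank_buffers]


def all_gather(rank_buffers: List[List[float]]) -> List[List[float]]:
    """Simulate AllGather: every rank receives all buffers, concatenated in rank order."""
    gathered = [x for buf in rank_buffers for x in buf]
    return [list(gathered) for _ in rank_buffers]


if __name__ == "__main__":
    inputs = [[1.0, 2.0], [3.0, 4.0]]  # two simulated ranks
    print(all_reduce_sum(inputs))
    print(all_gather(inputs))
```

In a real multi-GPU run each rank would hold only its own buffer and a communication library would perform the exchange; the simulation keeps all buffers in one list purely to show the resulting values.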


### Motivation and Context
Enable the collective ops in the onnxruntime inference build so that we can
run distributed inference across multiple GPUs.
The original ncclAllReduce ops in the training build require quite complex
configuration, which is not suitable for the inference case, and they are
already broken, so we introduce a new implementation.

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-02-07 13:47:48 -08:00
..
nodejs/templates Delete cpu-esrp-pipeline.yml (#13623) 2022-11-14 19:00:40 -08:00
nuget/templates Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
templates Revert mimalloc from v2.0.9 to v2.0.3 (#14603) 2023-02-07 09:58:25 -08:00
android-x86_64-crosscompile-ci-pipeline.yml Free OrtStatus in ASSERT_ORT_STATUS_OK, make run_android_emulator.py work with newer JDK version (#14369) 2023-01-20 09:27:47 -08:00
anybuild.yml Add "workspace: clean: all" to anybuild build yaml file 2021-10-06 22:49:37 -07:00
binary-size-checks-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
build-perf-test-binaries-pipeline.yml Remove old enable_linux_gpu_tests parameter from template invocation. (#13102) 2022-09-26 16:27:40 -07:00
c-api-noopenmp-packaging-pipelines.yml Update package pipelines to support TRT 8.5 (#13998) 2022-12-16 15:01:50 -08:00
clean-build-docker-image-cache-pipeline.yml Increase timeout for clean-build-docker-image-cache-pipeline. (#12776) 2022-08-29 15:30:35 -07:00
linux-ci-pipeline.yml Use today's cache only (#14120) 2023-01-04 17:48:52 +08:00
linux-cpu-aten-pipeline.yml Add Cache in Linux CPU Aten Pipeline (#14313) 2023-01-17 10:49:29 +08:00
linux-cpu-eager-pipeline.yml Enable ORT in TorchDynamo (#13259) 2022-11-01 11:19:29 -07:00
linux-cpu-minimal-build-ci-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
linux-dnnl-ci-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
linux-gpu-ci-pipeline.yml Add compilation cache for Linux GPU (#13995) 2022-12-16 16:38:12 +08:00
linux-gpu-tensorrt-ci-pipeline.yml [TensorRT EP] support TensorRT 8.5 (#13867) 2022-12-14 13:06:03 -08:00
linux-gpu-tensorrt-daily-perf-pipeline.yml Update package pipelines to support TRT 8.5 (#13998) 2022-12-16 15:01:50 -08:00
linux-gpu-tensorrt-packaging-pipeline.yml [TensorRT EP] support TensorRT 8.5 (#13867) 2022-12-14 13:06:03 -08:00
linux-migraphx-ci-pipeline.yml [ROCm] Update ROCm and MigraphX CI to ROCm5.4 (#14011) 2022-12-22 10:01:05 +08:00
linux-multi-gpu-ci-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
linux-multi-gpu-tensorrt-ci-pipeline.yml
linux-openvino-ci-pipeline.yml Openvino ep 2022.3 v4.3 (#14210) 2023-01-11 16:31:26 -08:00
linux-openvino-nightly-pipeline.yml
mac-ci-pipeline.yml make WITHCACHE as an option in MacOS workflow (#14188) 2023-01-10 10:54:19 +08:00
mac-coreml-ci-pipeline.yml Update macOS build agents to macOS 11 (#9562) 2021-10-27 10:00:04 -07:00
mac-ios-ci-pipeline.yml Add ability to register custom ops by specifying a function name (#14177) 2023-01-12 15:11:34 +10:00
mac-ios-packaging-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
mac-objc-static-analysis-ci-pipeline.yml Objective-C static analysis - use different llvm path to try to find clang-tidy. (#13280) 2022-10-12 10:16:26 -07:00
mac-react-native-ci-pipeline.yml Update CUDA version to 11.6 and refactor python packaging pipeline (#13002) 2022-09-23 00:29:27 -07:00
npm-packaging-pipeline.yml Update CUDA version to 11.6 and refactor python packaging pipeline (#13002) 2022-09-23 00:29:27 -07:00
orttraining-linux-ci-pipeline.yml increase the time limit as more unit tests added (#14327) 2023-01-18 15:51:21 -08:00
orttraining-linux-external-custom-ops.yml Update CUDA version to 11.6 and refactor python packaging pipeline (#13002) 2022-09-23 00:29:27 -07:00
orttraining-linux-gpu-amd-e2e-test-ci-pipeline.yml [ROCm] Fix azcopy issue on ROCm ci pipeline (#13365) 2022-10-20 12:08:57 +08:00
orttraining-linux-gpu-ci-pipeline.yml Update torch to 1.13.1 in CI and packaging pipelines for ort training (#14055) 2023-01-03 20:03:33 -08:00
orttraining-linux-gpu-distributed-e2e-test-pipeline.yml move training CI agent pools to 1ES hosted (#8775) 2021-08-18 18:36:19 -07:00
orttraining-linux-gpu-docker-release-pipeline.yml
orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml Introduce collective ops to ort inference build (#14399) 2023-02-07 13:47:48 -08:00
orttraining-linux-gpu-ortmodule-test-clear-cache-pipeline.yml move training CI agent pools to 1ES hosted (#8775) 2021-08-18 18:36:19 -07:00
orttraining-linux-gpu-training-apis.yml Refactor training build options (#13964) 2023-01-03 13:28:16 -08:00
orttraining-linux-nightly-ortmodule-test-pipeline.yml Fix onnxruntime-CI-nightly-ort-pipeline Failure (#14464) 2023-01-28 16:05:56 +08:00
orttraining-mac-ci-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
orttraining-pai-ci-pipeline.yml Enable ccache for HIP objects (#14465) 2023-01-28 22:34:24 +08:00
orttraining-py-packaging-pipeline-cpu.yml Add support for python 3.10 for onnxruntime-training cuda and cpu (#14100) 2023-02-02 11:32:41 -08:00
orttraining-py-packaging-pipeline-cuda116.yml Create dedicated build for training api (#14136) 2023-01-10 20:58:04 -08:00
orttraining-py-packaging-pipeline-rocm.yml [ROCm] Add ROCm5.4 to python package pipeline (#14012) 2022-12-22 10:01:40 +08:00
post-merge-jobs.yml Android package custom build script update (#14403) 2023-01-25 09:19:05 -08:00
py-package-build-pipeline.yml Fix OLive build pipeline (#13114) 2022-09-27 10:19:58 -07:00
py-package-test-pipeline.yml Update CUDA version to 11.6 and refactor python packaging pipeline (#13002) 2022-09-23 00:29:27 -07:00
py-packaging-pipeline.yml CloudEP (#13855) 2023-01-03 10:03:15 -08:00
python-checks-ci-pipeline.yml
sign_ov_ep_binaries.yml Move build machines with Nvidia M60 GPUs to Nvidia T4 (#13170) 2022-10-25 11:21:13 -07:00
snpe-ep-nuget-packaging-pipeline.yml Add yml file for Snpe EP build (#13494) 2022-10-28 19:47:50 -07:00
web-ci-pipeline.yml [js/wasm] Add WebAssembly static library build into web CI pipeline (#10959) 2022-03-21 15:49:49 -07:00
web-packaging-pipeline.yml [js] release pipeline for web and react native (#10656) 2022-03-01 21:38:33 -08:00
win-ci-fuzz-testing.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
win-ci-pipeline.yml Remove intermedia obj files once build finished (#14361) 2023-01-20 13:37:15 +08:00
win-eager-ci-pipeline.yml add ortmodule and eager mode test (#9888) 2021-12-02 19:49:18 -08:00
win-gpu-ci-pipeline.yml Enable cache for msbuild (#14085) 2023-01-06 11:19:57 +08:00
win-gpu-reduce-op-ci-pipeline.yml Improve dependency management (#13523) 2022-12-01 09:51:59 -08:00
win-gpu-tensorrt-ci-pipeline.yml [TensorRT EP] support TensorRT 8.5 (#13867) 2022-12-14 13:06:03 -08:00