onnxruntime/tools
Tang, Cheng 8f34c8c8ed
Introduce collective ops to ort inference build (#14399)
### Description
Introduce collective ops into the onnxruntime inference build, including:
1) AllReduce and AllGather schemas as contrib ops, controlled by the USE_MPI
flag
2) AllReduce and AllGather kernels in the CUDA EP, controlled by the
ORT_USE_NCCL flag
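
For reference, a build configuration that turns both paths on might look like the fragment below. This is a sketch only: the `--use_mpi` option and the `onnxruntime_USE_NCCL` CMake define are assumed here from the USE_MPI / ORT_USE_NCCL flags named above, so verify the exact names against `build.sh` / `cmake/CMakeLists.txt` before relying on them.

```shell
# Hypothetical build invocation (flag names assumed, not verified):
# enables CUDA, MPI (for the contrib-op schemas), and NCCL (for the CUDA
# EP kernels) in an inference build.
./build.sh --config Release --use_cuda --use_mpi \
    --cmake_extra_defines onnxruntime_USE_NCCL=ON
```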


### Motivation and Context
Enable collective ops in the onnxruntime inference build so that we can run
distributed inference across multiple GPUs.
The original ncclAllReduce op in the training build requires quite complex
configuration, which is not suitable for the inference case, and it is
already broken, so we introduce a new implementation.
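
For readers unfamiliar with the two collectives this PR adds, the semantics can be sketched in a few lines of plain Python. This is a toy, single-process simulation of what the ops compute, not the ORT/NCCL implementation: AllReduce leaves every rank holding the elementwise sum of all ranks' tensors, while AllGather leaves every rank holding the concatenation of all ranks' shards.

```python
# Toy single-process simulation of the collective semantics.
# "shards" is a list with one local tensor (a plain list) per rank.

def all_reduce(shards):
    """AllReduce(sum): every rank ends up with the elementwise sum."""
    summed = [sum(vals) for vals in zip(*shards)]
    return [list(summed) for _ in shards]

def all_gather(shards):
    """AllGather: every rank ends up with the concatenation of all shards."""
    gathered = [x for shard in shards for x in shard]
    return [list(gathered) for _ in shards]

# Two "ranks", each holding a local tensor.
ranks = [[1.0, 2.0], [3.0, 4.0]]
print(all_reduce(ranks))  # every rank sees [4.0, 6.0]
print(all_gather(ranks))  # every rank sees [1.0, 2.0, 3.0, 4.0]
```

In the real implementation each rank runs in its own process and the reduction happens over NCCL on the GPU; the point of the sketch is only the contract: after the op, all ranks observe the same result.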

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-02-07 13:47:48 -08:00