onnxruntime/tools
Tang, Cheng 8f34c8c8ed
Introduce collective ops to ort inference build (#14399)
### Description
Introduce collective ops into the onnxruntime inference build, including:
1) AllReduce and AllGather schemas as contrib ops, controlled by the USE_MPI
flag
2) AllReduce and AllGather kernels in the CUDA EP, controlled by the
ORT_USE_NCCL flag
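
For reference, a build configuration that turns both paths on might look like the fragment below. This is a sketch only: the `--use_mpi` option and the `onnxruntime_USE_NCCL` CMake define are assumed here from the USE_MPI / ORT_USE_NCCL flags named above, so verify the exact names against `build.sh` / `cmake/CMakeLists.txt` before relying on them.

```shell
# Hypothetical build invocation (flag names assumed, not verified):
# enables CUDA, MPI (for the contrib-op schemas), and NCCL (for the CUDA
# EP kernels) in an inference build.
./build.sh --config Release --use_cuda --use_mpi \
    --cmake_extra_defines onnxruntime_USE_NCCL=ON
```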


### Motivation and Context
Enable collective ops in the onnxruntime inference build so that we can run
distributed inference across multiple GPUs.
The original ncclAllReduce op in the training build requires quite complex
configuration, which is not suitable for the inference case, and it is
already broken, so we introduce a new implementation.
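
For readers unfamiliar with the two collectives this PR adds, the semantics can be sketched in a few lines of plain Python. This is a toy, single-process simulation of what the ops compute, not the ORT/NCCL implementation: AllReduce leaves every rank holding the elementwise sum of all ranks' tensors, while AllGather leaves every rank holding the concatenation of all ranks' shards.

```python
# Toy single-process simulation of the collective semantics.
# "shards" is a list with one local tensor (a plain list) per rank.

def all_reduce(shards):
    """AllReduce(sum): every rank ends up with the elementwise sum."""
    summed = [sum(vals) for vals in zip(*shards)]
    return [list(summed) for _ in shards]

def all_gather(shards):
    """AllGather: every rank ends up with the concatenation of all shards."""
    gathered = [x for shard in shards for x in shard]
    return [list(gathered) for _ in shards]

# Two "ranks", each holding a local tensor.
ranks = [[1.0, 2.0], [3.0, 4.0]]
print(all_reduce(ranks))  # every rank sees [4.0, 6.0]
print(all_gather(ranks))  # every rank sees [1.0, 2.0, 3.0, 4.0]
```

In the real implementation each rank runs in its own process and the reduction happens over NCCL on the GPU; the point of the sketch is only the contract: after the op, all ranks observe the same result.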

---------

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-02-07 13:47:48 -08:00