onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-03 03:58:54 +00:00

Tianlei Wu 8818a99c93 Set proper nvcc threads to avoid OOM (#17419)	2023-09-05 10:59:27 -07:00

### Description

There are 8 cu files under [flash attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention) and 4 cu files under [cutlass fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha) that need a lot of memory to compile. Previously, the default value was the same as parallel (the number of CPU cores).

Standard_NC4as_T4_v3 has 4 CPUs and 28 GB of memory, and we launched 16 nvcc threads in total (4 parallel jobs, with 4 nvcc threads per job). Each thread might take 4 GB on average (the peak is around 6 GB, but threads are not started at the same time). OOM happens because 16 threads might need close to 64 GB in the worst case. When the build machine has 64 GB or more memory, OOM is rare.

Here we set a proper nvcc --threads value based on available memory to avoid OOM.

### Motivation and Context

Fix `Python Packaging Pipeline (Training Cuda 11.8)`
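
The memory arithmetic in the description can be sketched as below. This is only an illustration, not the code from the commit: the function names, the 4-thread cap, and the exact rounding are assumptions, and the 4 GB per-thread figure is the average quoted in the description.

```python
def worst_case_memory_gb(parallel_jobs: int, threads_per_job: int,
                         memory_per_thread_gb: float = 4.0) -> float:
    """Estimate memory needed if every nvcc thread is busy at once (hypothetical helper)."""
    return parallel_jobs * threads_per_job * memory_per_thread_gb


def nvcc_threads_per_job(available_memory_gb: float, parallel_jobs: int,
                         memory_per_thread_gb: float = 4.0,
                         max_threads: int = 4) -> int:
    """Pick an nvcc --threads value that fits in available memory (hypothetical helper).

    Assumption: memory is split evenly across parallel jobs, and each
    job always gets at least one thread.
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    # Total number of threads the memory budget can hold.
    total_threads = int(available_memory_gb // memory_per_thread_gb)
    # Share them across jobs, then clamp to the range [1, max_threads].
    return max(1, min(max_threads, total_threads // parallel_jobs))


# The old default on a 4-CPU machine: 4 jobs x 4 threads x 4 GB each.
print(worst_case_memory_gb(4, 4))      # 64.0
# With 28 GB and 4 parallel jobs, only one thread per job fits.
print(nvcc_threads_per_job(28, 4))     # 1
# With 64 GB the full 4 threads per job fit.
print(nvcc_threads_per_job(64, 4))     # 4
```

Under these assumptions, the 28 GB machine drops from 16 nvcc threads to 4 in total, which keeps the estimated worst case at 16 GB.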
android_custom_build	Add lsb-release package to android custom build (#16944)	2023-08-01 11:27:29 -07:00
ci_build	Set proper nvcc threads to avoid OOM (#17419)	2023-09-05 10:59:27 -07:00
doc	Disable PERF* rules in ruff to allow better readability (#16834)	2023-07-25 15:38:22 -07:00
nuget	Build nuget pkg for ROCm (#16791)	2023-08-28 13:35:08 +08:00
perf_view
python	Various test infra updates from testing Azure ops with MAUI test app (#17262 )	2023-08-27 09:35:00 +10:00