onnxruntime/tools
Tianlei Wu 8818a99c93
Set proper nvcc threads to avoid OOM (#17419)
### Description

There are 8 cu files under [flash
attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention)
and 4 cu files under [cutlass
fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha)
that need a lot of memory to compile.

Previously, the default number of nvcc threads was the same as the
number of parallel jobs, which is the number of CPU cores.
Standard_NC4as_T4_v3 has 4 CPUs and 28 GB of memory, so we launched 16
nvcc threads in total (4 parallel jobs, and 4 nvcc threads per job).
Each thread might take 4 GB on average (the peak is around 6 GB, but
threads are not all started at the same time). OOM happens because 16
threads might need close to 64 GB in the worst case. When the build
machine has 64 GB or more memory, OOM is rare.
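
The arithmetic above can be checked with a short sketch. The function name `worst_case_memory_gb` is hypothetical and used only for illustration; the numbers are the ones quoted in this description.

```python
def worst_case_memory_gb(parallel_jobs: int, nvcc_threads_per_job: int,
                         gb_per_thread: float) -> float:
    """Estimate memory needed if every nvcc thread is active at once."""
    return parallel_jobs * nvcc_threads_per_job * gb_per_thread


# Standard_NC4as_T4_v3: 4 CPUs, 28 GB of memory.
machine_memory_gb = 28
needed_gb = worst_case_memory_gb(parallel_jobs=4, nvcc_threads_per_job=4,
                                 gb_per_thread=4.0)

# 4 jobs * 4 threads * 4 GB is more than twice what the machine has.
print(f"worst case: {needed_gb} GB, available: {machine_memory_gb} GB")
print("OOM risk" if needed_gb > machine_memory_gb else "fits in memory")
```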

This change sets the nvcc `--threads` value based on available memory
to avoid OOM.
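
One way such a memory-based limit might be computed is sketched below. This is not the code from the PR: the function name `estimate_nvcc_threads`, its parameters, and the 4 GB per-thread default (taken from the average quoted above) are assumptions for illustration.

```python
def estimate_nvcc_threads(available_memory_gb: float, parallel_jobs: int,
                          gb_per_thread: float = 4.0) -> int:
    """Pick an nvcc --threads value per job that should fit in memory.

    Assumption: each nvcc thread needs about `gb_per_thread` GB, and all
    `parallel_jobs` build jobs may run their threads at the same time.
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    if gb_per_thread <= 0:
        raise ValueError("gb_per_thread must be positive")

    # Total number of nvcc threads that fit in memory across all jobs.
    total_threads = int(available_memory_gb // gb_per_thread)
    # Split the budget evenly across the parallel jobs.
    threads_per_job = total_threads // parallel_jobs
    # Never go below 1, and never above the old default (same as parallel).
    return max(1, min(threads_per_job, parallel_jobs))


for memory_gb in (28, 64):
    threads = estimate_nvcc_threads(memory_gb, parallel_jobs=4)
    print(f"{memory_gb} GB, 4 jobs -> nvcc --threads {threads}")
```

Under these assumptions, the 28 GB machine gets 1 thread per job, while a 64 GB machine keeps the previous 4 threads per job.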

### Motivation and Context
Fix `Python Packaging Pipeline (Training Cuda 11.8)`
2023-09-05 10:59:27 -07:00
..
android_custom_build Add lsb-release package to android custom build (#16944) 2023-08-01 11:27:29 -07:00
ci_build Set proper nvcc threads to avoid OOM (#17419) 2023-09-05 10:59:27 -07:00
doc Disable PERF* rules in ruff to allow better readability (#16834) 2023-07-25 15:38:22 -07:00
nuget Build nuget pkg for ROCm (#16791) 2023-08-28 13:35:08 +08:00
perf_view
python Various test infra updates from testing Azure ops with MAUI test app (#17262) 2023-08-27 09:35:00 +10:00