onnxruntime/tools/ci_build
Tianlei Wu 8818a99c93
Set proper nvcc threads to avoid OOM (#17419)
### Description

There are 8 `.cu` files under [flash
attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention)
and 4 `.cu` files under [cutlass
fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha)
that need a lot of memory to compile.

Previously, the default number of nvcc threads was the same as `parallel`, i.e. the number of CPU cores.
Standard_NC4as_T4_v3 has 4 CPUs and 28 GB of memory, so we launched 16
nvcc threads in total (4 parallel jobs with 4 nvcc threads per job).
Each thread might take 4 GB on average (the peak is around 6 GB, but the threads
do not all start at the same time). OOM happens because 16 threads might need
close to 64 GB in the worst case. When the build machine has 64 GB of memory
or more, OOM is rare.
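The worst case above works out as follows, using the 4 GB per-thread average as the estimate. For comparison, the last term shows what one nvcc thread per job would need under the same estimate:

$$
\underbrace{4}_{\text{parallel jobs}} \times \underbrace{4}_{\text{nvcc threads per job}} \times 4\,\text{GB} = 64\,\text{GB} > 28\,\text{GB},
\qquad
4 \times 1 \times 4\,\text{GB} = 16\,\text{GB} < 28\,\text{GB}
$$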

This change sets nvcc `--threads` based on the available memory to avoid
OOM.
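One way to choose the thread count is to budget a fixed amount of memory per nvcc thread and divide the available memory among the parallel jobs. The following is a minimal sketch under stated assumptions. `choose_nvcc_threads` is a hypothetical helper, not the actual function in `build.py`. The 4 GB per-thread budget is taken from the average quoted above. Capping at the CPU count mirrors the previous default. Memory is passed in as an argument rather than queried from the OS, which keeps the sketch portable.

```python
import math


def choose_nvcc_threads(available_gb: float, parallel_jobs: int, cpu_count: int,
                        gb_per_thread: float = 4.0) -> int:
    """Hypothetical helper: pick nvcc --threads so that every nvcc thread,
    summed over all parallel build jobs, gets about gb_per_thread of memory."""
    if parallel_jobs < 1 or cpu_count < 1:
        raise ValueError("parallel_jobs and cpu_count must be positive")
    # Total number of nvcc threads the machine can afford at gb_per_thread each.
    affordable_total = available_gb / gb_per_thread
    # Split that budget across the parallel jobs, but keep at least 1 thread per job.
    per_job = max(1, math.floor(affordable_total / parallel_jobs))
    # Never exceed the old default of one thread per CPU core.
    return min(per_job, cpu_count)


# Standard_NC4as_T4_v3: 4 CPUs, 28 GB, 4 parallel jobs.
print(choose_nvcc_threads(28, parallel_jobs=4, cpu_count=4))  # 1
# A 4-CPU machine with 64 GB keeps the old default of 4.
print(choose_nvcc_threads(64, parallel_jobs=4, cpu_count=4))  # 4
```

On the 28 GB machine this drops to one thread per job (about 16 GB in total under the 4 GB estimate), while machines with 64 GB or more keep the previous behavior.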

### Motivation and Context
Fix `Python Packaging Pipeline (Training Cuda 11.8)`
2023-09-05 10:59:27 -07:00
| Name | Last commit | Date |
|---|---|---|
| `github` | Flash Attention v2 MHA (#17227) | 2023-08-31 13:52:21 -07:00 |
| `__init__.py` | | |
| `amd_hipify.py` | | |
| `build.py` | Set proper nvcc threads to avoid OOM (#17419) | 2023-09-05 10:59:27 -07:00 |
| `clean_docker_image_cache.py` | | |
| `compile_triton.py` | [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) | 2023-07-21 12:53:41 -07:00 |
| `coverage.py` | | |
| `gen_def.py` | Basic CSharp packaging support for ROCm EP (#15535) | 2023-05-16 07:27:38 +08:00 |
| `get_docker_image.py` | [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) | 2023-07-21 12:53:41 -07:00 |
| `logger.py` | | |
| `op_registration_utils.py` | [CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530) | 2023-07-07 18:21:06 +02:00 |
| `op_registration_validator.py` | [CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530) | 2023-07-07 18:21:06 +02:00 |
| `patch_manylinux.py` | [Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789) | 2023-07-21 12:53:41 -07:00 |
| `policheck_exclusions.xml` | | |
| `reduce_op_kernels.py` | Re-organize the transpose optimization and layout transformation files. (#16246) | 2023-07-07 08:24:47 +10:00 |
| `replace_urls_in_deps.py` | | |
| `requirements.txt` | Flash Attention v2 MHA (#17227) | 2023-08-31 13:52:21 -07:00 |
| `set-trigger-rules.py` | PR triggers generated by code (#17247) | 2023-08-30 05:57:03 +08:00 |
| `update_tsaoptions.py` | | |
| `upload_python_package_to_azure_storage.py` | | |