onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00


Tianlei Wu 8818a99c93 Set proper nvcc threads to avoid OOM (#17419)		2023-09-05 10:59:27 -07:00

### Description

There are 8 cu files under [flash attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention) and 4 cu files under [cutlass fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha) that need a lot of memory to compile. Previously, the default number of nvcc threads was the same as the parallel setting, that is, the number of CPU cores.

Standard_NC4as_T4_v3 has 4 CPUs and 28 GB memory, and we launched 16 nvcc threads in total (4 parallel jobs, and 4 nvcc threads per job). Each thread might take 4 GB on average (peak is around 6 GB, but threads are not started at the same time). OOM happens since 16 threads might need close to 64 GB in the worst case. When the build machine has 64 GB or more memory, OOM is rare.

Here we set a proper nvcc --threads value based on available memory to avoid OOM.

### Motivation and Context

Fix `Python Packaging Pipeline (Training Cuda 11.8)`.
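The memory arithmetic in the description can be sketched as follows. This is a minimal illustration, not the code in `build.py`: the function name `nvcc_threads_for_memory`, its parameters, and the cap of 4 threads per job are all hypothetical, and the 4 GB per-thread figure is the average quoted in the commit message.

```python
def nvcc_threads_for_memory(
    available_memory_gb: float,
    parallel_jobs: int,
    memory_per_thread_gb: float = 4.0,  # average per-thread usage quoted in the commit message
    max_threads_per_job: int = 4,       # hypothetical cap, chosen for illustration
) -> int:
    """Pick an nvcc --threads value so that all jobs together fit in memory.

    Hypothetical helper: it only mirrors the reasoning in the commit
    description and is not the implementation used by the build script.
    """
    if parallel_jobs < 1:
        raise ValueError("parallel_jobs must be at least 1")
    if memory_per_thread_gb <= 0:
        raise ValueError("memory_per_thread_gb must be positive")

    # Total number of nvcc threads the machine can hold at the assumed cost.
    total_threads = int(available_memory_gb // memory_per_thread_gb)
    # Split that budget evenly across the parallel compile jobs.
    threads_per_job = total_threads // parallel_jobs
    # Always allow one thread, and never exceed the cap.
    return max(1, min(max_threads_per_job, threads_per_job))


if __name__ == "__main__":
    # 4 parallel jobs on a 28 GB machine versus a 64 GB machine.
    print(nvcc_threads_for_memory(28, 4))
    print(nvcc_threads_for_memory(64, 4))
```

Under these assumptions, a 28 GB machine with 4 parallel jobs gets one nvcc thread per job (4 threads in total, roughly 16 GB), while a 64 GB machine can keep 4 threads per job, which matches the observation that OOM is rare at 64 GB or more.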
github	Flash Attention v2 MHA (#17227)	2023-08-31 13:52:21 -07:00
__init__.py
amd_hipify.py	Adopt linrtunner as the linting tool - take 2 (#15085)	2023-03-24 15:29:03 -07:00
build.py	Set proper nvcc threads to avoid OOM (#17419)	2023-09-05 10:59:27 -07:00
clean_docker_image_cache.py	Adopt linrtunner as the linting tool - take 2 (#15085)	2023-03-24 15:29:03 -07:00
compile_triton.py	[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)	2023-07-21 12:53:41 -07:00
coverage.py
gen_def.py	Basic CSharp packaging support for ROCm EP (#15535)	2023-05-16 07:27:38 +08:00
get_docker_image.py	[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)	2023-07-21 12:53:41 -07:00
logger.py
op_registration_utils.py	[CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530)	2023-07-07 18:21:06 +02:00
op_registration_validator.py	[CI] Removes type2 in process_registration and fix Windows GPU Reduced Ops CI Pipeline (#16530)	2023-07-07 18:21:06 +02:00
patch_manylinux.py	[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789)	2023-07-21 12:53:41 -07:00
policheck_exclusions.xml	Exculde hipify option from policheck (#13431)	2022-10-25 16:35:16 +08:00
reduce_op_kernels.py	Re-organize the transpose optimization and layout transformation files. (#16246)	2023-07-07 08:24:47 +10:00
replace_urls_in_deps.py	Adopt linrtunner as the linting tool - take 2 (#15085)	2023-03-24 15:29:03 -07:00
requirements.txt	Flash Attention v2 MHA (#17227)	2023-08-31 13:52:21 -07:00
set-trigger-rules.py	Pr trggiers generated by code (#17247)	2023-08-30 05:57:03 +08:00
update_tsaoptions.py	Adopt linrtunner as the linting tool - take 2 (#15085)	2023-03-24 15:29:03 -07:00
upload_python_package_to_azure_storage.py	Adopt linrtunner as the linting tool - take 2 (#15085)	2023-03-24 15:29:03 -07:00