onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00


Yi Zhang 14d7872ce9 Reuse T4 for Cuda12.2 training packaging pipeline. (#20244 )	2024-04-10 09:21:40 +08:00

### Description

The training CUDA 12.2 packaging pipeline (https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary) has run out of memory ever since PR #19910.

I tried other CPU agents, for example D64as_v5 (256 GB memory) and D32as_v4 (128 GB memory and 256 GB SSD temp storage), but they still run out of memory, as shown in the image below.

![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083)

The build does succeed on T4, even though T4 has only 4 vCPUs, 28 GB memory, and 180 GB temp storage, but it takes much longer.

### Motivation and Context

Restore the CUDA 12.2 training packaging pipeline first. The root cause needs more time to investigate.

### Other Clues

With CUDA 12.2 on T4, these 2 compilation steps take nearly 6 minutes, and the build runs out of memory on CPU machines. @ajindal1

```
cuda12.2 on T4
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o
```

With CUDA 11.8 on CPU, by contrast, they finish in about one minute:

```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

cuda11.8 on GPU
024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```
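The timing argument in this commit message rests on the gaps between consecutive "Building CUDA object" timestamps. The following is a minimal Python sketch, not part of any pipeline, that recomputes those gaps from the CUDA 12.2 (T4) and CUDA 11.8 (CPU) excerpts. The function names are hypothetical and the object paths are shortened to file names. It assumes each gap approximates one file's compile time, which only holds when the build compiles serially; with parallel jobs the gaps mix overlapping work.

```python
from datetime import datetime

# Timestamped lines from the excerpts above; object paths are shortened to
# the file name for readability (an editorial simplification).
T4_CUDA122_LOG = """\
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object flash_fwd_split_hdim64_bf16_sm80.cu.o
2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object group_query_attention_impl.cu.o
"""

CUDA118_CPU_LOG = """\
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object flash_fwd_split_hdim64_bf16_sm80.cu.o
"""


def parse_timestamp(token: str) -> datetime:
    """Parse a log timestamp such as 2024-03-14T05:39:08.7726865Z.

    strptime's %f accepts at most 6 fractional digits, so the trailing 'Z'
    and the 7th digit are dropped (sub-microsecond precision is irrelevant).
    """
    return datetime.strptime(token.rstrip("Z")[:26], "%Y-%m-%dT%H:%M:%S.%f")


def step_durations(log: str) -> list[tuple[str, float]]:
    """Return (object file, seconds until the next logged step) pairs.

    The last line has no successor, so it gets no duration.
    """
    entries = []
    for line in log.strip().splitlines():
        timestamp, _, message = line.partition(" ")
        target = message.rsplit(" ", 1)[-1]  # last token is the object file
        entries.append((parse_timestamp(timestamp), target))
    return [
        (target, (next_ts - ts).total_seconds())
        for (ts, target), (next_ts, _) in zip(entries, entries[1:])
    ]


if __name__ == "__main__":
    for label, log in [("cuda12.2 on T4", T4_CUDA122_LOG),
                       ("cuda11.8 on CPU", CUDA118_CPU_LOG)]:
        print(label)
        for target, seconds in step_durations(log):
            print(f"  {seconds:7.1f} s  {target}")
```

On these excerpts the T4 gaps come out to roughly 352.5 s, 66.6 s and 411.3 s, against about 78.6 s for the single CUDA 11.8 CPU gap, which is consistent with the "nearly 6 minutes" versus "about one minute" comparison above.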
nodejs/templates	Fix training and macos ci pipelines (#20034 )	2024-03-26 12:20:11 -07:00
nuget/templates	Fix training and macos ci pipelines (#20034 )	2024-03-26 12:20:11 -07:00
stages	enable lto in Python-CUDA-Packaging Pipeline (#20164 )	2024-04-01 15:42:28 +08:00
templates	Reuse T4 for Cuda12.2 training packaging pipeline. (#20244 )	2024-04-10 09:21:40 +08:00
triggers
android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 )	2024-02-16 16:59:43 -08:00
android-x86_64-crosscompile-ci-pipeline.yml	Change "onnxruntime-Linux-CPU-For-Android-CI" machine pool to "onnxruntime-Ubuntu2204-AMD-CPU" (#19698 )	2024-02-28 19:36:26 -08:00
bigmodels-ci-pipeline.yml	Remove --extra-index-url (#19885 )	2024-03-13 09:45:22 -07:00
binary-size-checks-pipeline.yml
build-perf-test-binaries-pipeline.yml
c-api-noopenmp-packaging-pipelines.yml	Split more windows GPU workflow into 2 stages, building and testing, to make them more stable (#20080 )	2024-03-28 12:55:44 +08:00
clean-build-docker-image-cache-pipeline.yml
cuda-packaging-pipeline.yml	Split more windows GPU workflow into 2 stages, building and testing, to make them more stable (#20080 )	2024-03-28 12:55:44 +08:00
linux-ci-pipeline.yml	Check whether required tests are executed. (#19884 )	2024-03-13 09:59:57 -07:00
linux-cpu-aten-pipeline.yml	Fix a build issue: /MP was not enabled correctly (#19190 )	2024-01-29 12:45:38 -08:00
linux-cpu-eager-pipeline.yml	Fix a build issue: /MP was not enabled correctly (#19190 )	2024-01-29 12:45:38 -08:00
linux-cpu-minimal-build-ci-pipeline.yml	Change "onnxruntime-Linux-CPU-For-Android-CI" machine pool to "onnxruntime-Ubuntu2204-AMD-CPU" (#19698 )	2024-02-28 19:36:26 -08:00
linux-dnnl-ci-pipeline.yml
linux-gpu-ci-pipeline.yml	Enable CUDA EP unit testing on Windows (#20039 )	2024-03-27 13:32:36 -07:00
linux-gpu-tensorrt-ci-pipeline.yml	Fix a build issue: /MP was not enabled correctly (#19190 )	2024-01-29 12:45:38 -08:00
linux-gpu-tensorrt-daily-perf-pipeline.yml	[EP Perf] Add concurrency test (#19804 )	2024-03-15 07:41:21 -07:00
linux-migraphx-ci-pipeline.yml	[ROCm] Remove MPI dependency and collectives to use NCCL (#19830 )	2024-03-19 17:35:18 -07:00
linux-multi-gpu-tensorrt-ci-pipeline.yml
linux-openvino-ci-pipeline.yml	Ort openvino npu 1.17 master (#19966 )	2024-03-21 18:44:00 -07:00
linux-qnn-ci-pipeline.yml	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 )	2024-02-16 16:59:43 -08:00
mac-ci-pipeline.yml
mac-coreml-ci-pipeline.yml	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
mac-ios-ci-pipeline.yml	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
mac-ios-packaging-pipeline.yml	Switch a portion of CI/packaging jobs to MacOS12 (#19908 )	2024-03-19 14:54:58 -07:00
mac-objc-static-analysis-ci-pipeline.yml	Fix training and macos ci pipelines (#20034 )	2024-03-26 12:20:11 -07:00
mac-react-native-ci-pipeline.yml	Change "onnxruntime-Linux-CPU-For-Android-CI" machine pool to "onnxruntime-Ubuntu2204-AMD-CPU" (#19698 )	2024-02-28 19:36:26 -08:00
npm-packaging-pipeline.yml
nuget-cuda-publishing-pipeline.yml
orttraining-linux-ci-pipeline.yml	Fix a build issue: /MP was not enabled correctly (#19190 )	2024-01-29 12:45:38 -08:00
orttraining-linux-gpu-ci-pipeline.yml
orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml
orttraining-linux-nightly-ortmodule-test-pipeline.yml
orttraining-mac-ci-pipeline.yml
orttraining-pai-ci-pipeline.yml
orttraining-py-packaging-pipeline-cpu.yml	[Fix] Error Python Packaging Pipeline (Training CPU) (#19992 )	2024-03-20 09:02:50 -07:00
orttraining-py-packaging-pipeline-cuda.yml	Reuse T4 for Cuda12.2 training packaging pipeline. (#20244 )	2024-04-10 09:21:40 +08:00
orttraining-py-packaging-pipeline-cuda12.yml	Reuse T4 for Cuda12.2 training packaging pipeline. (#20244 )	2024-04-10 09:21:40 +08:00
orttraining-py-packaging-pipeline-rocm.yml
post-merge-jobs.yml	Fix training and macos ci pipelines (#20034 )	2024-03-26 12:20:11 -07:00
publish-nuget.yml
py-cuda-package-test-pipeline.yml
py-cuda-packaging-pipeline.yml	Refactor Python CUDA packaging pipeline to fix random hangs in building (#19989 )	2024-03-22 09:16:00 +08:00
py-cuda-publishing-pipeline.yml
py-package-build-pipeline.yml
py-package-test-pipeline.yml	Fix training and macos ci pipelines (#20034 )	2024-03-26 12:20:11 -07:00
py-packaging-pipeline.yml	[QNN EP] Build x64 python wheel for QNN EP (#19499 )	2024-02-12 20:54:04 -08:00
qnn-ep-nuget-packaging-pipeline.yml	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 )	2024-02-16 16:59:43 -08:00
web-ci-pipeline.yml
win-ci-fuzz-testing.yml
win-ci-pipeline.yml	Install ONNX by building source code in Windows DML stage (#20079 )	2024-03-27 12:29:34 -07:00
win-gpu-ci-pipeline.yml	Enable CUDA EP unit testing on Windows (#20039 )	2024-03-27 13:32:36 -07:00
win-gpu-reduce-op-ci-pipeline.yml
win-gpu-tensorrt-ci-pipeline.yml	Fix a build issue: /MP was not enabled correctly (#19190 )	2024-01-29 12:45:38 -08:00
win-qnn-arm64-ci-pipeline.yml	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 )	2024-02-16 16:59:43 -08:00
win-qnn-ci-pipeline.yml	[QNN EP] Update default QNN SDK to 2.19.2.240210 (#19546 )	2024-02-16 16:59:43 -08:00