onnxruntime/tools/ci_build/github/azure-pipelines
Tianlei Wu b4afc6266f
[ROCm] Python 3.10 in ROCm CI, and ROCm 6.2.3 in MigraphX CI (#22527)
### Description
Upgrade Python from 3.9 to 3.10 in the ROCm and MigraphX docker files and CI
pipelines. Upgrade ROCm to 6.2.3 in most places except the ROCm CI pipeline;
see the comment below.

Improvements and upgrades to the ROCm/MigraphX docker files and pipelines:
* ROCm 6.0/6.1.3 => 6.2.3
* Python 3.9 => 3.10
* Ubuntu 20.04 => 22.04
* Also upgrade the ml_dtypes, numpy and scipy packages.
* Fix the message "ROCm version from ..." so it shows the correct file path in
CMakeLists.txt
* Exclude some NHWC tests since the ROCm EP lacks support for NHWC
convolution.

#### ROCm CI Pipeline:
ROCm 6.1.3 is kept in the pipeline for now.
- The pipeline failed after upgrading to ROCm 6.2.3: `HIPBLAS_STATUS_INVALID_VALUE ;
GPU=0 ; hostname=76123b390aed ;
file=/onnxruntime_src/onnxruntime/core/providers/rocm/rocm_execution_provider.cc
; line=170 ; expr=hipblasSetStream(hipblas_handle_, stream);`. This needs
further investigation.
- cupy issues:
(1) cupy currently supports numpy < 1.27 and might not work with numpy 2.x,
so numpy is pinned to 1.26.4 for now.
(2) cupy support of ROCm 6.2 is still in progress:
https://github.com/cupy/cupy/issues/8606.
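
The numpy < 1.27 constraint noted above could be checked at pipeline setup time with something like the following. This is a minimal sketch, not code from this PR: the helper name `numpy_ok_for_cupy` and its `upper_bound` parameter are hypothetical, and only the major and minor version components are compared.

```python
def numpy_ok_for_cupy(numpy_version: str, upper_bound=(1, 27)) -> bool:
    """Return True if numpy_version is below the (major, minor) upper bound.

    Hypothetical helper: the default bound mirrors the "numpy < 1.27"
    limit mentioned in the commit message, not an official cupy API.
    """
    # Keep only the major and minor components, e.g. "1.26.4" -> (1, 26).
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    return (major, minor) < upper_bound
```

In a real environment the argument would typically be `numpy.__version__`; the pinned version 1.26.4 passes the check, while any 2.x release does not.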

Note on miniconda: its bundled libstdc++.so.6 and libgcc_s.so.1 might
conflict with the system ones, so we created links that point to the
system copies.
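
The linking step might look roughly like this. It is only a sketch under assumptions: the function name `link_system_libs` is hypothetical, the two directory arguments stand in for the miniconda and system library directories (whose real locations are not given here), and the actual docker files may do this differently.

```python
import os
from pathlib import Path


def link_system_libs(conda_lib_dir, system_lib_dir,
                     names=("libstdc++.so.6", "libgcc_s.so.1")):
    """Replace conda's bundled libraries with symlinks to the system copies.

    Hypothetical helper; both directories are supplied by the caller.
    Returns the list of links that were created.
    """
    linked = []
    for name in names:
        system_lib = Path(system_lib_dir) / name
        conda_lib = Path(conda_lib_dir) / name
        if not system_lib.exists():
            continue  # no system copy to link to; leave conda's file alone
        if conda_lib.exists() or conda_lib.is_symlink():
            conda_lib.unlink()  # drop the bundled file (or a stale link)
        os.symlink(system_lib, conda_lib)
        linked.append(conda_lib)
    return linked
```

Skipping names that have no system copy keeps the environment usable if a library is missing, rather than leaving a dangling link.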

#### MigraphX CI pipeline

MigraphX CI does not use cupy, so we are able to use ROCm 6.2.3 and
numpy 2.x in the pipeline.

#### Other attempts

Other things that I've tried which might help in the future: 

Attempt to use a single docker file for both ROCm and Migraphx:
https://github.com/microsoft/onnxruntime/pull/22478

Upgrade to Ubuntu 24.04 and Python 3.12, and use venv like
[this](27903e7ff1/tools/ci_build/github/linux/docker/rocm-ci-pipeline-env.Dockerfile).

### Motivation and Context
In the 1.20 release, the ROCm nuget packaging pipeline will use ROCm 6.2:
https://github.com/microsoft/onnxruntime/pull/22461.
This change upgrades ROCm to 6.2.3 in the CI pipelines for consistency.
2024-10-25 11:47:16 -07:00
nodejs/templates Update pool to MacOS-13 (#17361) 2024-09-17 10:07:30 -07:00
nuget Migrate Nuget Windows AI Pipeline to Use 1ES Template (#22572) 2024-10-24 09:15:39 -07:00
stages Enable 1ES on Python CUDA Package Pipelines (#22560) 2024-10-24 09:51:00 -07:00
templates Fix Maven Sha256 Checksum Issue (#22600) 2024-10-25 08:13:02 -07:00
triggers
android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml Update QNN default version to 2.27 in CI pipeline (#22471) 2024-10-16 22:05:47 -07:00
android-x86_64-crosscompile-ci-pipeline.yml Update pool to MacOS-13 (#17361) 2024-09-17 10:07:30 -07:00
bigmodels-ci-pipeline.yml [CUDA] upgrade opencv in stable diffusion demo (#22470) 2024-10-21 23:20:49 -07:00
binary-size-checks-pipeline.yml Clean up some mobile package related files and their usages. (#21606) 2024-08-05 16:38:20 -07:00
build-perf-test-binaries-pipeline.yml Refactor cuda packaging pipeline (#22542) 2024-10-23 08:14:10 -07:00
c-api-noopenmp-packaging-pipelines.yml Update QNN default version to 2.27 in CI pipeline (#22471) 2024-10-16 22:05:47 -07:00
c-api-training-packaging-pipelines.yml Move on-device training packages publish step (#21539) 2024-07-29 09:59:46 -07:00
cuda-packaging-pipeline.yml [Running CI] Update TensorRT to 10.4 (#22049) 2024-09-26 11:10:52 -07:00
linux-ci-pipeline.yml Add python 3.13 support (#22380) 2024-10-14 18:07:54 -07:00
linux-cpu-minimal-build-ci-pipeline.yml Update training packaging pipeline's docker files (#20853) 2024-05-30 23:48:42 -07:00
linux-dnnl-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
linux-gpu-ci-pipeline.yml Update CMake (#22516) 2024-10-21 07:51:05 -07:00
linux-gpu-tensorrt-ci-pipeline.yml Update CMake (#22516) 2024-10-21 07:51:05 -07:00
linux-gpu-tensorrt-daily-perf-pipeline.yml [Running CI] Update TensorRT to 10.4 (#22049) 2024-09-26 11:10:52 -07:00
linux-migraphx-ci-pipeline.yml [ROCm] Python 3.10 in ROCm CI, and ROCm 6.2.3 in MigraphX CI (#22527) 2024-10-25 11:47:16 -07:00
linux-openvino-ci-pipeline.yml Memory Optimization for Compilation in OVEP (#21872) 2024-09-03 13:52:31 -07:00
linux-qnn-ci-pipeline.yml Update QNN default version to 2.27 in CI pipeline (#22471) 2024-10-16 22:05:47 -07:00
linux-rocm-ci-pipeline.yml [ROCm] Python 3.10 in ROCm CI, and ROCm 6.2.3 in MigraphX CI (#22527) 2024-10-25 11:47:16 -07:00
mac-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
mac-coreml-ci-pipeline.yml Update pool to MacOS-13 (#17361) 2024-09-17 10:07:30 -07:00
mac-ios-ci-pipeline.yml Specify iOS simulator runtime version (#22474) 2024-10-18 09:26:06 -07:00
mac-ios-packaging-pipeline.yml Update pool to MacOS-13 (#17361) 2024-09-17 10:07:30 -07:00
mac-react-native-ci-pipeline.yml Re-enable codesign for maven packages (#22308) 2024-10-04 14:30:17 -07:00
npm-packaging-pipeline.yml Re-enable codesign for maven packages (#22308) 2024-10-04 14:30:17 -07:00
nuget-cuda-publishing-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
nuget-windows-ai.yml Migrate Nuget Windows AI Pipeline to Use 1ES Template (#22572) 2024-10-24 09:15:39 -07:00
post-merge-jobs.yml update pipline python version from 3.8 to 3.12 (#22517) 2024-10-21 07:50:31 -07:00
publish-nuget.yml Move on-device training packages publish step (#21539) 2024-07-29 09:59:46 -07:00
py-cuda-alt-package-test-pipeline.yml Adding new Python package testing pipeline for Cuda Alt (#22584) 2024-10-24 19:24:53 -07:00
py-cuda-alt-packaging-pipeline.yml Enable 1ES on Python CUDA Package Pipelines (#22560) 2024-10-24 09:51:00 -07:00
py-cuda-package-test-pipeline.yml Adding new Python package testing pipeline for Cuda Alt (#22584) 2024-10-24 19:24:53 -07:00
py-cuda-packaging-pipeline.yml Refactor cuda packaging pipeline (#22542) 2024-10-23 08:14:10 -07:00
py-cuda-publishing-pipeline.yml Set CUDA12 as default in GPU packages (#21438) 2024-07-25 10:17:16 -07:00
py-dml-packaging-pipeline.yml Enable 1ES on Python CUDA Package Pipelines (#22560) 2024-10-24 09:51:00 -07:00
py-package-build-pipeline.yml
py-package-test-pipeline.yml Adding new Python package testing pipeline for Cuda Alt (#22584) 2024-10-24 19:24:53 -07:00
py-packaging-pipeline.yml Enable 1ES on Python CUDA Package Pipelines (#22560) 2024-10-24 09:51:00 -07:00
qnn-ep-nuget-packaging-pipeline.yml Update QNN default version to 2.27 in CI pipeline (#22471) 2024-10-16 22:05:47 -07:00
rocm-nuget-packaging-pipeline.yml update pipline python version from 3.8 to 3.12 (#22517) 2024-10-21 07:50:31 -07:00
rocm-publish-nuget-pipeline.yml New rocm nuget publish pipeline (#22418) 2024-10-13 08:30:06 +08:00
web-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
win-ci-fuzz-testing.yml Update Node.js version from 18.x to 20.x in CI pipelines (#22576) 2024-10-24 07:34:42 -07:00
win-ci-pipeline.yml Remove training pipelines from Win CPI CI as redundant (#22190) 2024-09-23 18:15:41 -07:00
win-gpu-cuda-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
win-gpu-dml-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
win-gpu-doc-gen-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
win-gpu-reduce-op-ci-pipeline.yml Move jobs in onnxruntime-Win2022-GPU-T4 machine pool to onnxruntime-Win2022-GPU-A10 (#21023) 2024-06-12 22:04:40 -07:00
win-gpu-tensorrt-ci-pipeline.yml [Running CI] Update TensorRT to 10.4 (#22049) 2024-09-26 11:10:52 -07:00
win-gpu-training-ci-pipeline.yml Add a reminder in set-trigger-rules script (#21929) 2024-08-30 12:18:10 -07:00
win-gpu-webgpu-ci-pipeline.yml Initial WebGPU EP checkin (#22318) 2024-10-08 16:10:46 -07:00
win-qnn-arm64-ci-pipeline.yml Update QNN default version to 2.27 in CI pipeline (#22471) 2024-10-16 22:05:47 -07:00
win-qnn-ci-pipeline.yml update pipline python version from 3.8 to 3.12 (#22517) 2024-10-21 07:50:31 -07:00