onnxruntime/tools/ci_build/github/azure-pipelines
Yi Zhang 54871a2773
Replace T4 to A10 in Linux GPU workflow (#19205)
### Description
1. Update the Linux GPU machine from T4 to A10 (sm=8.6).
2. Update the tolerance.

### Motivation and Context
1. Free up more T4 machines and test with a higher compute capability.
2. ORT enables TF32 in GEMM on A10/A100. TF32 causes precision loss,
which makes this test fail:
```
2024-01-19T13:27:18.8302842Z [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:25.8438641Z Expected equality of these values:
2024-01-19T13:27:25.8438841Z   COMPARE_RESULT::SUCCESS
2024-01-19T13:27:25.8439276Z     Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:25.8439464Z   ret.first
2024-01-19T13:27:25.8445514Z     Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ
2024-01-19T13:27:25.8446198Z 
2024-01-19T13:27:25.8555736Z [  FAILED  ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms)
2024-01-19T13:27:25.8556077Z [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312
2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:29.3175144Z Expected equality of these values:
2024-01-19T13:27:29.3175389Z   COMPARE_RESULT::SUCCESS
2024-01-19T13:27:29.3175812Z     Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:29.3176080Z   ret.first
2024-01-19T13:27:29.3176322Z     Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ

```
3. Some other tests, such as SSD, throw other exceptions, so skip them.
```
2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure
2024-01-22T09:07:51.5588512Z Expected equality of these values:
2024-01-22T09:07:51.5588870Z   COMPARE_RESULT::SUCCESS
2024-01-22T09:07:51.5589467Z     Which is: 4-byte object <00-00 00-00>
2024-01-22T09:07:51.5589953Z   ret.first
2024-01-22T09:07:51.5590462Z     Which is: 4-byte object <01-00 00-00>
2024-01-22T09:07:51.5590841Z expected 1, got 63
```
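
For readers checking the numbers in the first log, the following is a minimal sketch, not the code in `model_tests.cc`. It assumes the per-element tolerance has the form `atol + rtol * |expected|` with `atol = rtol = 1e-2`; these constants are inferred from the logged `tol` values and are not taken from the ORT source. The helper `truncate_to_tf32` is hypothetical and approximates TF32 by keeping only the 10 explicit mantissa bits of a float32 value (plain truncation, which may differ from hardware rounding).

```python
import struct


def tolerance(expected: float, atol: float = 1e-2, rtol: float = 1e-2) -> float:
    """Per-element tolerance. atol and rtol are assumptions inferred from the log."""
    return atol + rtol * abs(expected)


def within_tolerance(expected: float, actual: float) -> bool:
    """Return True when the absolute difference fits the assumed tolerance."""
    return abs(expected - actual) <= tolerance(expected)


def truncate_to_tf32(x: float) -> float:
    """Approximate TF32 by zeroing the low 13 of float32's 23 mantissa bits.

    Hypothetical helper for illustration only; truncation is a simplification.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFFE000))[0]


if __name__ == "__main__":
    # (expected, got) pairs copied from the two failing comparisons in the first log.
    for expected, actual in [(0.145984, 0.975133), (4.34958, 4.51324)]:
        print(
            f"expected={expected} tol={tolerance(expected):.6g} "
            f"pass={within_tolerance(expected, actual)}"
        )

    # A float32 value whose lowest set mantissa bit is below TF32's precision.
    x = 1.0 + 2.0 ** -11
    print(f"float32 input={x!r} tf32 approx={truncate_to_tf32(x)!r}")
```

Under these assumed constants the computed tolerances match the logged `tol=0.0114598` and `tol=0.0534958`, and both logged differences fall outside them.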
2024-01-23 10:49:24 -08:00
..
nodejs/templates
nuget/templates Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
stages Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187) 2024-01-17 17:20:42 -08:00
templates [QNN EP] Create Windows ARM64 nightly python package (#19128) 2024-01-22 18:14:41 -08:00
triggers
android-arm64-v8a-QNN-crosscompile-ci-pipeline.yml [QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129) 2024-01-18 14:59:23 -08:00
android-x86_64-crosscompile-ci-pipeline.yml
bigmodels-ci-pipeline.yml Add Big models pipeline (#19222) 2024-01-22 14:02:56 -08:00
binary-size-checks-pipeline.yml
build-perf-test-binaries-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
c-api-noopenmp-packaging-pipelines.yml Fix buildJava from Zip-Nuget-Java-Nodejs Packaging Pipeline (#19187) 2024-01-17 17:20:42 -08:00
clean-build-docker-image-cache-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
cuda-packaging-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
linux-ci-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
linux-cpu-aten-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
linux-cpu-eager-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
linux-cpu-minimal-build-ci-pipeline.yml Set NDK version in Linux CPU Minimal Build E2E CI Pipeline (#18810) 2023-12-14 08:08:41 -08:00
linux-dnnl-ci-pipeline.yml
linux-gpu-ci-pipeline.yml Replace T4 to A10 in Linux GPU workflow (#19205) 2024-01-23 10:49:24 -08:00
linux-gpu-tensorrt-ci-pipeline.yml
linux-gpu-tensorrt-daily-perf-pipeline.yml [EP Perf] Fix missing Azure cli & use onnx zoo model inside image (#18917) 2024-01-01 17:14:39 -08:00
linux-migraphx-ci-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
linux-multi-gpu-tensorrt-ci-pipeline.yml
linux-openvino-ci-pipeline.yml
linux-qnn-ci-pipeline.yml [QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129) 2024-01-18 14:59:23 -08:00
mac-ci-pipeline.yml
mac-coreml-ci-pipeline.yml
mac-ios-ci-pipeline.yml Enable Address Sanitizer in CI (#19073) 2024-01-12 07:24:40 -08:00
mac-ios-packaging-pipeline.yml Revert "iOS packaging pipeline stability" (#19135) 2024-01-16 09:18:35 -08:00
mac-objc-static-analysis-ci-pipeline.yml Update absl and gtest to fix an ARM64EC build error (#18735) 2023-12-07 15:55:17 -08:00
mac-react-native-ci-pipeline.yml
npm-packaging-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
nuget-cuda-publishing-pipeline.yml Update Nuget publishing jobs (#18851) 2023-12-19 16:54:46 -08:00
orttraining-linux-ci-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
orttraining-linux-gpu-ci-pipeline.yml
orttraining-linux-gpu-ortmodule-distributed-test-ci-pipeline.yml
orttraining-linux-nightly-ortmodule-test-pipeline.yml ORTModule memory improvement (#18924) 2024-01-16 08:57:37 +08:00
orttraining-mac-ci-pipeline.yml
orttraining-pai-ci-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
orttraining-py-packaging-pipeline-cpu.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
orttraining-py-packaging-pipeline-cuda.yml
orttraining-py-packaging-pipeline-cuda12.yml
orttraining-py-packaging-pipeline-rocm.yml [ROCm] Update CI/Packaging pipeline to ROCm6.0 (#18985) 2024-01-03 17:25:15 +08:00
post-merge-jobs.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
publish-nuget.yml Update Nuget publishing jobs (#18851) 2023-12-19 16:54:46 -08:00
py-cuda-package-test-pipeline.yml Adding new pipeline for python cuda testing (#18718) 2023-12-18 18:13:03 -08:00
py-cuda-packaging-pipeline.yml
py-cuda-publishing-pipeline.yml Adding a new pipeline for publishing to Python Cuda 12 packages. (#18712) 2023-12-11 14:17:46 -08:00
py-package-build-pipeline.yml
py-package-test-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
py-packaging-pipeline.yml [QNN EP] Create Windows ARM64 nightly python package (#19128) 2024-01-22 18:14:41 -08:00
qnn-ep-nuget-packaging-pipeline.yml [QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129) 2024-01-18 14:59:23 -08:00
web-ci-pipeline.yml Upgrade Ubuntu machine pool from 20.04 to 22.04 (#19117) 2024-01-16 17:25:18 -08:00
win-ci-fuzz-testing.yml Fix Fuzz Testing CI (#19228) 2024-01-22 15:44:57 -08:00
win-ci-pipeline.yml Disable ccache in Windows CPU CI pipeline (#19131) 2024-01-13 18:40:43 -08:00
win-gpu-ci-pipeline.yml Move Windows GPU training job to A10 (#19041) 2024-01-08 09:19:58 -08:00
win-gpu-reduce-op-ci-pipeline.yml Enable Address Sanitizer in CI (#19073) 2024-01-12 07:24:40 -08:00
win-gpu-tensorrt-ci-pipeline.yml Enable Address Sanitizer in CI (#19073) 2024-01-12 07:24:40 -08:00
win-qnn-arm64-ci-pipeline.yml [QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129) 2024-01-18 14:59:23 -08:00
win-qnn-ci-pipeline.yml [QNN EP] Update QNN pipelines to use QNN SDK 2.18 by default (#19129) 2024-01-18 14:59:23 -08:00