onnxruntime/tools/ci_build
Yi Zhang 54871a2773
Replace T4 to A10 in Linux GPU workflow (#19205)
### Description
1. Update the Linux GPU machine from T4 to A10 (sm=8.6).
2. Update the tolerance.

### Motivation and Context
1. Free up more T4 machines and test with a higher compute capability.
2. ORT enables TF32 in GEMM for A10/A100. TF32 causes precision loss
and makes these tests fail:
```
2024-01-19T13:27:18.8302842Z [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-19T13:27:25.8438153Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:25.8438641Z Expected equality of these values:
2024-01-19T13:27:25.8438841Z   COMPARE_RESULT::SUCCESS
2024-01-19T13:27:25.8439276Z     Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:25.8439464Z   ret.first
2024-01-19T13:27:25.8445514Z     Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:25.8445962Z expected 0.145984 (3e157cc1), got 0.975133 (3f79a24b), diff: 0.829149, tol=0.0114598 idx=375. 20 of 388 differ
2024-01-19T13:27:25.8446198Z 
2024-01-19T13:27:25.8555736Z [  FAILED  ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12, where GetParam() = "cuda_../models/zoo/opset12/SSD/ssd-12.onnx" (7025 ms)
2024-01-19T13:27:25.8556077Z [ RUN      ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_YOLOv312_yolov312
2024-01-19T13:27:29.3174318Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:347: Failure
2024-01-19T13:27:29.3175144Z Expected equality of these values:
2024-01-19T13:27:29.3175389Z   COMPARE_RESULT::SUCCESS
2024-01-19T13:27:29.3175812Z     Which is: 4-byte object <00-00 00-00>
2024-01-19T13:27:29.3176080Z   ret.first
2024-01-19T13:27:29.3176322Z     Which is: 4-byte object <01-00 00-00>
2024-01-19T13:27:29.3178431Z expected 4.34958 (408b2fb8), got 4.51324 (40906c80), diff: 0.16367, tol=0.0534958 idx=9929. 22 of 42588 differ

```
3. Some other tests, such as SSD, throw a different exception, so they are skipped:
```
2024-01-22T09:07:40.8446910Z [ RUN ] ModelTests/ModelTest.Run/cuda__models_zoo_opset12_SSD_ssd12
2024-01-22T09:07:51.5587571Z /onnxruntime_src/onnxruntime/test/providers/cpu/model_tests.cc:358: Failure
2024-01-22T09:07:51.5588512Z Expected equality of these values:
2024-01-22T09:07:51.5588870Z   COMPARE_RESULT::SUCCESS
2024-01-22T09:07:51.5589467Z     Which is: 4-byte object <00-00 00-00>
2024-01-22T09:07:51.5589953Z   ret.first
2024-01-22T09:07:51.5590462Z     Which is: 4-byte object <01-00 00-00>
2024-01-22T09:07:51.5590841Z expected 1, got 63
```
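
For readers who want to reproduce the numbers in the first log, here is a minimal Python sketch of the comparison. It assumes the check has the form `tol = abs_tol + rel_tol * |expected|` with both constants equal to `1e-2`. That form and those values are inferred only from the `tol=` figures printed above (for example `0.01 + 0.01 * 4.34958 = 0.0534958`), not taken from `model_tests.cc`. The helper names `bits_to_float` and `within_tolerance` are hypothetical.

```python
import struct


def bits_to_float(hex_bits: str) -> float:
    """Decode a float32 bit pattern as printed in the log, e.g. '408b2fb8'."""
    return struct.unpack(">f", bytes.fromhex(hex_bits))[0]


# Assumed constants: inferred from the logged tol values, not read from the test source.
ABS_TOL = 1e-2
REL_TOL = 1e-2


def within_tolerance(expected, actual, abs_tol=ABS_TOL, rel_tol=REL_TOL):
    """Return (ok, diff, tol) for a combined absolute + relative tolerance check."""
    tol = abs_tol + rel_tol * abs(expected)
    diff = abs(expected - actual)
    return diff <= tol, diff, tol


# Values from the YOLOv3 failure in the first log.
expected = bits_to_float("408b2fb8")
actual = bits_to_float("40906c80")
ok, diff, tol = within_tolerance(expected, actual)
print(f"expected {expected:.6g}, got {actual:.6g}, diff: {diff:.6g}, tol={tol:.6g}, ok={ok}")
```

Under these assumptions both logged failures exceed the tolerance by a wide margin.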
2024-01-23 10:49:24 -08:00
..
github Replace T4 to A10 in Linux GPU workflow (#19205) 2024-01-23 10:49:24 -08:00
__init__.py
amd_hipify.py undo hipify of __half to rocblas_half (#18573) 2023-11-24 18:04:23 +08:00
build.py [WebNN EP] Fixed build issue with disable_rtti (#19173) 2024-01-16 21:35:13 -08:00
clean_docker_image_cache.py
compile_triton.py
coverage.py
gen_def.py
get_docker_image.py
logger.py
op_registration_utils.py
op_registration_validator.py
patch_manylinux.py
policheck_exclusions.xml
reduce_op_kernels.py
replace_urls_in_deps.py Add a build validation for Linux ARM64 cross-compile (#18200) 2023-11-08 13:03:18 -08:00
requirements.txt Adding python3.12 support to ORT (#18814) 2024-01-11 08:34:28 -08:00
set-trigger-rules.py
update_tsaoptions.py
upload_python_package_to_azure_storage.py