onnxruntime/tools/ci_build
Tianlei Wu 8d99b1a8dc
reduce GQA test combinations (#22918)
### Description
* Reduce GQA test combinations to save about 35 minutes of test time in CI
pipelines.
* Show latency of transformers tests
* Use a fixed seed in the DMMHA test to avoid random failures.
* For test_flash_attn_rocm.py, change the test skipping condition from "has cuda
ep" to "not has rocm ep", so that it does not run in CPU builds.
* For test_flash_attn_cuda.py, move flash attention and memory efficient
attention tests to different classes, so that we can skip a test suite
instead of checking in each test.
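
The last three items can be illustrated with a minimal sketch. It is not the actual test code: `AVAILABLE_PROVIDERS`, `has_provider`, the class names, and the seed value are hypothetical stand-ins, and only the standard library is used so that the example runs without onnxruntime installed.

```python
import random
import unittest

# Hypothetical stand-in for the list of execution providers in this build.
# Real tests would presumably query onnxruntime for this; here a CPU-only
# build is simulated.
AVAILABLE_PROVIDERS = ["CPUExecutionProvider"]


def has_provider(name):
    """Return True if the named execution provider is in the simulated build."""
    return name in AVAILABLE_PROVIDERS


# Skip the whole suite when the ROCm EP is missing ("not has rocm ep"),
# instead of repeating the check inside every test method.
@unittest.skipUnless(has_provider("ROCMExecutionProvider"), "ROCm EP is not available")
class TestFlashAttentionDemo(unittest.TestCase):
    def test_placeholder(self):
        self.assertTrue(True)


class TestSeededInputsDemo(unittest.TestCase):
    def test_inputs_are_repeatable(self):
        # A fixed seed makes the generated inputs identical on every run,
        # which removes one source of intermittent failures.
        first = [random.Random(123).random() for _ in range(3)]
        second = [random.Random(123).random() for _ in range(3)]
        self.assertEqual(first, second)


def run_demo():
    """Run both demo suites and return the unittest result object."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    suite.addTests(loader.loadTestsFromTestCase(TestFlashAttentionDemo))
    suite.addTests(loader.loadTestsFromTestCase(TestSeededInputsDemo))
    result = unittest.TestResult()
    suite.run(result)
    return result


if __name__ == "__main__":
    outcome = run_demo()
    print(f"skipped={len(outcome.skipped)} failures={len(outcome.failures)}")
```

With the simulated CPU-only provider list, the decorated class is reported as skipped as a unit, while the seeded test still runs.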

### Motivation and Context
It takes too long to run GQA tests in CI pipelines since there are too
many combinations.

###### Linux GPU CI Pipeline
Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)
After:  150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)
Time Saved: **1424** seconds (0:23:44)

###### Windows GPU CUDA CI Pipeline
Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)
After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35) 
Time Saved: **330** seconds (0:05:30)

###### Linux CPU CI Pipeline
Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)
- 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 26.45s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)
- 0.97s  transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 2.41s  transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

Time Saved: **374** seconds (0:06:14).
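
The "Time Saved" figures can be re-derived from the pytest summary lines quoted above. The following is a small sketch under stated assumptions: the helper names `elapsed_seconds` and `time_saved` are hypothetical, and the parsing relies only on the `in <seconds>s` fragment visible in those summaries.

```python
import re
from datetime import timedelta


def elapsed_seconds(summary):
    """Extract the elapsed time in seconds from a pytest summary line."""
    match = re.search(r"\bin ([0-9.]+)s\b", summary)
    if match is None:
        raise ValueError(f"no elapsed time found in: {summary!r}")
    return float(match.group(1))


def time_saved(before, after):
    """Return the saving in whole seconds, rounded to the nearest second."""
    return round(elapsed_seconds(before) - elapsed_seconds(after))


# Summary lines copied from the three pipelines listed in this description.
PIPELINES = {
    "Linux GPU": (
        "5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)",
        "150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)",
    ),
    "Windows GPU CUDA": (
        "1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)",
        "116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35)",
    ),
    "Linux CPU": (
        "5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)",
        "116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)",
    ),
}

if __name__ == "__main__":
    total = 0
    for name, (before, after) in PIPELINES.items():
        saved = time_saved(before, after)
        total += saved
        print(f"{name}: {saved} seconds ({timedelta(seconds=saved)})")
    print(f"Total: {total} seconds ({timedelta(seconds=total)})")
```

Summing the three savings gives 2128 seconds (0:35:28), which matches the "about 35 minutes" stated in the description.
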
2024-11-21 12:26:46 -08:00
github bigmodel pipeline update cp38 to cp310 (#22793) 2024-11-21 07:25:01 -08:00
requirements Update attention fusion to support SDPA pattern (#22629) 2024-11-21 09:42:41 -08:00
__init__.py
amd_hipify.py fix issue when build with hipblasLt on rocm6.1 (#22553) 2024-10-28 13:57:08 +08:00
build.py reduce GQA test combinations (#22918) 2024-11-21 12:26:46 -08:00
compile_triton.py
coverage.py
gen_def.py Initial WebGPU EP checkin (#22318) 2024-10-08 16:10:46 -07:00
get_docker_image.py
logger.py
op_registration_utils.py
op_registration_validator.py
patch_manylinux.py
policheck_exclusions.xml
reduce_op_kernels.py
replace_urls_in_deps.py
set-trigger-rules.py Cleanup code (#22827) 2024-11-19 14:13:33 -08:00
update_tsaoptions.py
upload_python_package_to_azure_storage.py