onnxruntime/tools
Tianlei Wu 8d99b1a8dc
reduce GQA test combinations (#22918)
### Description
* Reduce GQA test combinations to save about 35 minutes test time in CI
pipelines.
* Show latency of transformers tests
* Use seed in DMMHA test to avoid random failure.
* For test_flash_attn_rocm.py, test skipping condition from "has cuda
ep" to "not has rocm ep", so that it does not run in cpu build.
* For test_flash_attn_cuda.py, move flash attention and memory efficient
attention tests to different classes, so that we can skip a test suite
instead of checking in each test.

### Motivation and Context
It takes too long to run GQA tests in CI pipelines since there are too
many combinations.

###### Linux GPU CI Pipeline
Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)
After:  150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)
Time Saved: **1424** seconds (0:23:44)

###### Windows GPU CUDA CI Pipeline
Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)
After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35) 
Time Saved: **330** seconds (0:05:30)

###### Linux CPU CI Pipeline
Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)
- 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 26.45s
transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)
- 0.97s  transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 2.41s
transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

Time Saved: **374** seconds (0:06:14).
2024-11-21 12:26:46 -08:00
..
android_custom_build Update all JDK version to 17 (#22786) 2024-11-12 11:42:18 -08:00
ci_build reduce GQA test combinations (#22918) 2024-11-21 12:26:46 -08:00
doc Update ruff and clang-format versions (#21479) 2024-07-24 11:50:11 -07:00
nuget Cleanup code (#22827) 2024-11-19 14:13:33 -08:00
perf_view fixed #16873 (#16932) 2023-09-26 09:57:01 -07:00
python Add Android QNN Browserstack test (#22434) 2024-11-10 16:10:29 -08:00
scripts Cleanup code (#22827) 2024-11-19 14:13:33 -08:00