mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-06-09 00:30:53 +00:00
Move Linux GPU CI pipeline to A10 machines which are more advanced. Retire onnxruntime-Linux-GPU-T4 machine pool. Disable run_lean_attention test because the new machines do not have enough shared memory. ``` skip loading trt attention kernel fmha_mhca_fp16_128_256_sm86_kernel because no enough shared memory [E:onnxruntime:, sequential_executor.cc:505 ExecuteKernel] Non-zero status code returned while running MultiHeadAttention node. Name:'MultiHeadAttention_0' Status Message: CUDA error cudaErrorInvalidValue:invalid argument ``` |
||
|---|---|---|
| .. | ||
| contrib_ops | ||
| core | ||
| lora | ||
| python | ||
| test | ||
| tool/etw | ||
| wasm | ||
| __init__.py | ||
| ReformatSource.ps1 | ||
| ReformatSourcePython.bat | ||
| VSCodeCoverage.runsettings | ||