onnxruntime/onnxruntime
Satya Kumar Jandhyala e8bf46a70e
[WebGPU EP] Support GroupQueryAttention (#22658)
### Description
Add support for the GroupQueryAttention operator to the native WebGPU EP.


### Motivation and Context
This is required for running inference with some LLMs.
2024-12-02 12:40:03 -08:00
contrib_ops [WebGPU EP] Support GroupQueryAttention (#22658) 2024-12-02 12:40:03 -08:00
core [TensorRT EP] Fix wrong input order when generating IndexedSubGraph (#22857) 2024-12-02 01:45:29 -08:00
lora Accomodate BE platforms. Make sure we always write flatbuffers LE (#22375) 2024-10-11 09:14:44 -07:00
python [CoreML] Create EP by AppendExecutionProvider (#22675) 2024-11-27 09:26:31 +08:00
test Quantize Bias for Conv/Gemm on Quantized Model (#22889) 2024-11-28 10:10:24 +08:00
tool/etw
wasm [WebNN] Fixed WebNN Module undefined issue (#22795) 2024-11-11 21:31:24 -08:00
__init__.py bumps up version in main from 1.20 -> 1.21 (#22482) 2024-10-17 12:32:35 -07:00
ReformatSource.ps1
ReformatSourcePython.bat
VSCodeCoverage.runsettings