onnxruntime/onnxruntime/core
Jiajia Qin 80d8931f1d
[webgpu] Use subgroup for matmulnbits (#23224)
### Description
This PR uses subgroups to implement matmulnbits when tile_m > 1 on
Intel devices.
With this PR, prefill for a 500-token prompt with phi3 drops from
8.5s to 3.5s on Intel Meteor Lake.
2025-01-13 08:20:42 -08:00
| Name | Last commit | Date |
| --- | --- | --- |
| common | Remove nsync (#20413) | 2024-10-21 15:32:14 -07:00 |
| dll | fix webgpu delay load test (#23157) | 2024-12-20 13:37:12 -08:00 |
| dlpack | | |
| eager | | |
| flatbuffers | | |
| framework | Update Linux docker images (#23244) | 2025-01-09 10:20:33 -08:00 |
| graph | Address CodeQL security issues on comparison of different types (#23276) | 2025-01-07 17:30:44 -08:00 |
| mickey | | |
| mlas | [ARM CPU] Add rotary embedding fp16 kernel (#23013) | 2024-12-06 13:25:48 -08:00 |
| optimizer | Add Optional Redundant Clip Node to NodeUnit (#22888) | 2025-01-09 10:25:32 +08:00 |
| platform | [CoreML] support coreml model cache (#23065) | 2024-12-31 09:29:41 +08:00 |
| providers | [webgpu] Use subgroup for matmulnbits (#23224) | 2025-01-13 08:20:42 -08:00 |
| quantization | | |
| session | [VitisAI] change all support tensor type from ir 9 to ir 10 (#23204) | 2025-01-02 06:45:21 -08:00 |
| util | Address CodeQL security issues on comparison of different types (#23276) | 2025-01-07 17:30:44 -08:00 |