onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-22 22:01:08 +00:00

Jiajia Qin 80d8931f1d [webgpu] Use subgroup for matmulnbits (#23224)	Description: This PR uses subgroups to implement matmulnbits when tile_m > 1 on Intel devices. With this PR, prefill for a 500-token prompt for phi3 drops from 8.5s to 3.5s on Intel Meteor Lake.	2025-01-13 08:20:42 -08:00
common	Remove nsync (#20413)	2024-10-21 15:32:14 -07:00
dll	fix webgpu delay load test (#23157)	2024-12-20 13:37:12 -08:00
dlpack
eager
flatbuffers
framework	Update Linux docker images (#23244)	2025-01-09 10:20:33 -08:00
graph	Address CodeQL security issues on comparison of different types (#23276)	2025-01-07 17:30:44 -08:00
mickey
mlas	[ARM CPU] Add rotary embedding fp16 kernel (#23013)	2024-12-06 13:25:48 -08:00
optimizer	Add Optional Redundant Clip Node to NodeUnit (#22888)	2025-01-09 10:25:32 +08:00
platform	[CoreML] support coreml model cache (#23065)	2024-12-31 09:29:41 +08:00
providers	[webgpu] Use subgroup for matmulnbits (#23224)	2025-01-13 08:20:42 -08:00
quantization
session	[VitisAI] change all support tensor type from ir 9 to ir 10 (#23204)	2025-01-02 06:45:21 -08:00
util	Address CodeQL security issues on comparison of different types (#23276)	2025-01-07 17:30:44 -08:00