onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-20 19:12:24 +00:00


Jiajia Qin 0409c639f7 [js/webgpu] Optimize MultiHeadAttention|Transpose (#22420) With this optimization, the 96 MultiHeadAttention|Transpose ops in phi3 disappear, and Phi3 goes from 107 tokens to 113 tokens on my dGPUs. The optimization skips the transpose op when one of the transposed dims is 1, since a reshape is enough.		2024-10-14 15:43:14 -07:00
onnxjs	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)	2024-08-14 16:51:22 -07:00
wasm	[js/webgpu] Optimize MultiHeadAttention|Transpose (#22420)	2024-10-14 15:43:14 -07:00
backend-onnxjs.ts	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)	2024-08-14 16:51:22 -07:00
backend-wasm.ts	[js/web] remove training release (#22103)	2024-09-16 10:56:22 -07:00
build-def.d.ts	[js/web] allow build target for non dynamic import (#20898)	2024-06-03 12:33:37 -07:00
index.ts	[js/web] remove training release (#22103)	2024-09-16 10:56:22 -07:00
version.ts	bumps up version in main from 1.19 -> 1.20 (#21588)	2024-08-05 15:46:04 -07:00
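
The latest commit message above says a transpose can be skipped when one of the transposed dims is 1, because a reshape is enough. The sketch below illustrates why, on plain row-major arrays. It is not the code from #22420: the helper names `transpose` and `isReshapeOnly` are hypothetical, and the `[0, 2, 1, 3]` permutation is an assumed example of an attention-style transpose.

```typescript
// Hypothetical helpers for illustration; not taken from the onnxruntime source.

// Reference transpose of a row-major tensor stored in a flat array.
function transpose(
  data: number[],
  shape: number[],
  perm: number[],
): { data: number[]; shape: number[] } {
  const rank = shape.length;
  const outShape = perm.map((p) => shape[p]);

  // Row-major strides of the input tensor.
  const inStrides = new Array<number>(rank).fill(1);
  for (let i = rank - 2; i >= 0; i--) {
    inStrides[i] = inStrides[i + 1] * shape[i + 1];
  }

  const out = new Array<number>(data.length);
  const idx = new Array<number>(rank).fill(0);
  for (let o = 0; o < data.length; o++) {
    // Decompose the flat output offset into per-axis output indices.
    let rem = o;
    for (let i = rank - 1; i >= 0; i--) {
      idx[i] = rem % outShape[i];
      rem = Math.floor(rem / outShape[i]);
    }
    // Output axis i reads from input axis perm[i].
    let inOffset = 0;
    for (let i = 0; i < rank; i++) {
      inOffset += idx[i] * inStrides[perm[i]];
    }
    out[o] = data[inOffset];
  }
  return { data: out, shape: outShape };
}

// A transpose leaves the flat data untouched when the axes of size > 1
// keep their relative order; only the shape metadata changes.
function isReshapeOnly(shape: number[], perm: number[]): boolean {
  const kept = perm.filter((p) => shape[p] !== 1);
  return kept.every((p, i) => i === 0 || kept[i - 1] < p);
}

const perm = [0, 2, 1, 3];
const shapeWithUnitDim = [2, 1, 3, 4]; // axis 1 has size 1
const values = Array.from({ length: 24 }, (_, i) => i);

const fast = isReshapeOnly(shapeWithUnitDim, perm);
const full = transpose(values, shapeWithUnitDim, perm);
console.log(fast, full.shape);
```

When `isReshapeOnly` returns true, the transposed buffer is element-for-element identical to the input, so only the shape needs to be rewritten. With a shape such as `[2, 2, 3, 4]` the same permutation reorders the data and the full transpose is still required.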