onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-24 19:43:35 +00:00

Jiajia Qin 891fba3b9c [js/webgpu] Optimize Gather op (#17625)

### Description

This PR optimizes the Gather op, saving ~6 ms in the Segment Anything model on ADL.

The problem with the original algorithm is that it uses a for loop over a whole block of data. However, the block size may be very large, such as `65536`. In a GPU shader, we should avoid large loops and instead use more threads to do the work in parallel.

Before:

```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 6886207 ns
```

After:

```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 11719 ns
```

2023-09-21 21:00:36 -07:00
..
ops	[js/webgpu] Optimize Gather op (#17625)	2023-09-21 21:00:36 -07:00
attribute-with-cache-key.ts
gpu-data-manager.ts	[js/webgpu] add kernel Not and Equal (#17306)	2023-08-27 19:50:17 -07:00
op-resolve-rules.ts	[WebGPU/JS] Added Pad operator support (#16928)	2023-09-14 13:14:11 -07:00
program-manager.ts	[js/web] revise TensorView (#17473)	2023-09-14 21:14:44 -07:00
types.ts	[js/web] revise TensorView (#17473)	2023-09-14 21:14:44 -07:00
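
The Gather change described in the commit message for #17625 can be illustrated with a small CPU-side sketch. This is not the shader code from the PR: the function names `gatherPerBlock` and `gatherPerElement` are hypothetical, the gather is assumed to be along axis 0 of a 2-D input, and negative indices are not handled. The first function mirrors the loop-per-block shape (one thread copies a whole block in a serial loop), while the second assigns one output element to each thread, which is the pattern that lets a GPU spread the work over many invocations.

```typescript
// Hypothetical CPU sketch of two ways to structure a Gather along axis 0.
// `data` is a row-major [rows, blockSize] tensor; `indices` selects rows.

// Loop-per-block: each "thread" (outer iteration) copies an entire block.
// With blockSize = 65536 this is a very long serial loop per thread.
function gatherPerBlock(data: Float32Array, blockSize: number, indices: number[]): Float32Array {
  const out = new Float32Array(indices.length * blockSize);
  for (let t = 0; t < indices.length; t++) {
    const src = indices[t] * blockSize;
    for (let j = 0; j < blockSize; j++) {
      out[t * blockSize + j] = data[src + j];
    }
  }
  return out;
}

// Thread-per-element: each "thread" (iteration over globalIdx) writes one
// output element, so no thread runs a loop over the block.
function gatherPerElement(data: Float32Array, blockSize: number, indices: number[]): Float32Array {
  const out = new Float32Array(indices.length * blockSize);
  for (let globalIdx = 0; globalIdx < out.length; globalIdx++) {
    const row = Math.floor(globalIdx / blockSize); // which index entry
    const col = globalIdx % blockSize;             // offset inside the block
    out[globalIdx] = data[indices[row] * blockSize + col];
  }
  return out;
}

// Same shapes as the profiling log: input [4, 65536], indices [1].
const rows = 4;
const blockSize = 65536;
const data = new Float32Array(rows * blockSize);
for (let i = 0; i < data.length; i++) {
  data[i] = i % 1000;
}
const perBlock = gatherPerBlock(data, blockSize, [1]);
const perElement = gatherPerElement(data, blockSize, [1]);
```

Both functions produce the same output; only the division of work differs. On a CPU the two run in similar time, so the sketch shows the indexing, not the speedup reported in the profiling output.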