onnxruntime/js/web/lib/wasm/jsep/webgpu
Jiajia Qin 891fba3b9c
[js/webgpu] Optimize Gather op (#17625)
### Description
This PR optimizes the Gather op, which saves ~6 ms in the Segment
Anything model on ADL.
The problem with the original algorithm is that each thread runs a for
loop over a whole block of data, and the block size can be very large,
e.g. `65536`. In a GPU shader we should avoid long loops and instead
spread the work across more threads so that it runs in parallel.
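
To illustrate the difference, here is a minimal CPU-side sketch in TypeScript. It is not the WGSL shader from this PR. All names (`GatherArgs`, `gatherBlockLoop`, `gatherPerElement`) are hypothetical, and the input is assumed to be flattened to `[outerSize, axisDim, blockSize]` with indices given as plain numbers. The first function models one thread per gathered block with a long inner loop; the second models one thread per output element, where each thread only does a little index arithmetic.

```typescript
// Hypothetical model of the two strategies; names and layout are assumptions.
interface GatherArgs {
  data: Float32Array;   // input flattened as [outerSize, axisDim, blockSize]
  indices: number[];    // gather indices along the axis (may be negative)
  outerSize: number;    // product of dims before the gather axis
  axisDim: number;      // size of the gather axis
  blockSize: number;    // product of dims after the gather axis
}

// ONNX Gather allows negative indices, counted from the end of the axis.
function normalizeIndex(idx: number, axisDim: number): number {
  return idx < 0 ? idx + axisDim : idx;
}

// Strategy A: one "thread" per (outer, index) pair, looping over a whole block.
function gatherBlockLoop(a: GatherArgs): Float32Array {
  const n = a.indices.length;
  const out = new Float32Array(a.outerSize * n * a.blockSize);
  for (let t = 0; t < a.outerSize * n; t++) {        // few threads
    const outer = Math.floor(t / n);
    const idx = normalizeIndex(a.indices[t % n], a.axisDim);
    const src = (outer * a.axisDim + idx) * a.blockSize;
    for (let i = 0; i < a.blockSize; i++) {          // long serial loop per thread
      out[t * a.blockSize + i] = a.data[src + i];
    }
  }
  return out;
}

// Strategy B: one "thread" per output element, no inner loop.
function gatherElement(a: GatherArgs, o: number): number {
  const n = a.indices.length;
  const inner = o % a.blockSize;                     // offset inside the block
  const pair = Math.floor(o / a.blockSize);          // which (outer, index) pair
  const outer = Math.floor(pair / n);
  const idx = normalizeIndex(a.indices[pair % n], a.axisDim);
  return a.data[(outer * a.axisDim + idx) * a.blockSize + inner];
}

function gatherPerElement(a: GatherArgs): Float32Array {
  const n = a.indices.length;
  const out = new Float32Array(a.outerSize * n * a.blockSize);
  // On a GPU each iteration would be an independent invocation.
  for (let o = 0; o < out.length; o++) out[o] = gatherElement(a, o);
  return out;
}
```

For the profiled case, input `[4,65536]` with a single index, strategy A would give one thread a 65536-iteration loop, while strategy B would dispatch 65536 independent invocations.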

Before:
```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 6886207 ns
```
After:
```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 11719 ns
```
2023-09-21 21:00:36 -07:00
..
| Name | Last commit | Date |
| --- | --- | --- |
| ops | [js/webgpu] Optimize Gather op (#17625) | 2023-09-21 21:00:36 -07:00 |
| attribute-with-cache-key.ts | | |
| gpu-data-manager.ts | [js/webgpu] add kernel Not and Equal (#17306) | 2023-08-27 19:50:17 -07:00 |
| op-resolve-rules.ts | [WebGPU/JS] Added Pad operator support (#16928) | 2023-09-14 13:14:11 -07:00 |
| program-manager.ts | [js/web] revise TensorView (#17473) | 2023-09-14 21:14:44 -07:00 |
| types.ts | [js/web] revise TensorView (#17473) | 2023-09-14 21:14:44 -07:00 |