onnxruntime/js/web/lib
Jiajia Qin 891fba3b9c
[js/webgpu] Optimize Gather op (#17625)
### Description
This PR optimizes the Gather op, cutting ~6 ms from the Segment
Anything model on ADL.
The problem with the original algorithm is that each invocation runs a
`for` loop over an entire block of data, and the block size can be very
large (e.g. `65536`). In a GPU shader we should avoid long loops and
instead spread the work across more threads so that it runs in parallel.
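The difference between the two work decompositions can be illustrated on the CPU. The following TypeScript is a minimal, hypothetical sketch, not the actual JSEP/WGSL shader code: `gatherLoopPerBlock` and `gatherPerElement` are illustrative names, each iteration of the outer loop stands in for one shader invocation, and `data` is assumed to be a flattened row-major `Float32Array`.

```typescript
// Hypothetical CPU sketch of two ways to split ONNX Gather work across GPU invocations.
// Assumption: `data` is a flattened row-major tensor with shape `dims`.

function product(dims: readonly number[]): number {
  return dims.reduce((acc, d) => acc * d, 1);
}

// Loop-per-block: one simulated invocation per (outer, index) pair,
// each copying a whole contiguous block with a serial loop.
function gatherLoopPerBlock(
  data: Float32Array, dims: readonly number[], indices: readonly number[], axis: number,
): Float32Array {
  const outer = product(dims.slice(0, axis));
  const axisDim = dims[axis];
  const block = product(dims.slice(axis + 1)); // e.g. 65536 for [4,65536] with axis 0
  const out = new Float32Array(outer * indices.length * block);
  for (let inv = 0; inv < outer * indices.length; inv++) {
    const o = Math.floor(inv / indices.length);
    let idx = indices[inv % indices.length];
    if (idx < 0) idx += axisDim; // ONNX Gather accepts negative indices
    const src = (o * axisDim + idx) * block;
    for (let k = 0; k < block; k++) { // `block` serial iterations in one invocation
      out[inv * block + k] = data[src + k];
    }
  }
  return out;
}

// Per-element: one simulated invocation per output element, no inner loop.
function gatherPerElement(
  data: Float32Array, dims: readonly number[], indices: readonly number[], axis: number,
): Float32Array {
  const outer = product(dims.slice(0, axis));
  const axisDim = dims[axis];
  const block = product(dims.slice(axis + 1));
  const out = new Float32Array(outer * indices.length * block);
  for (let g = 0; g < out.length; g++) { // g plays the role of the global invocation id
    const k = g % block;                  // offset inside the block
    const inv = Math.floor(g / block);    // which (outer, index) pair
    const o = Math.floor(inv / indices.length);
    let idx = indices[inv % indices.length];
    if (idx < 0) idx += axisDim;
    out[g] = data[(o * axisDim + idx) * block + k];
  }
  return out;
}

// Small analogue of the profiled case: data [4,3], one index, axis 0.
const sample = new Float32Array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);
console.log(Array.from(gatherPerElement(sample, [4, 3], [2], 0))); // [ 6, 7, 8 ]
```

Both functions produce the same output; only the amount of work per invocation differs. For the shapes in the profiling output (data `[4,65536]`, a single index, `axis = 0`), the loop-per-block scheme yields 1 invocation running 65536 serial iterations, while the per-element scheme yields 65536 invocations that each copy one value, which a GPU can execute in parallel.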

Before:
```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 6886207 ns
```
After:
```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 11719 ns
```

2023-09-21 21:00:36 -07:00
| Name | Last commit | Date |
|---|---|---|
| onnxjs | [js/api] introducing IO binding for tensor (#16452) | 2023-08-29 12:58:26 -07:00 |
| wasm | [js/webgpu] Optimize Gather op (#17625) | 2023-09-21 21:00:36 -07:00 |
| backend-onnxjs.ts | | |
| backend-wasm.ts | [js/webgpu] support proxy for webgpu (#15851) | 2023-05-15 16:23:13 -07:00 |
| build-def.d.ts | [js/web] WebGPU backend via JSEP (#14579) | 2023-04-24 15:21:18 -07:00 |
| index.ts | [js/api] introducing IO binding for tensor (#16452) | 2023-08-29 12:58:26 -07:00 |
| version.ts | Bump Up Version to 1.17.0 (#17587) | 2023-09-20 11:02:58 +08:00 |