onnxruntime/js/web/lib/wasm
Jiajia Qin 891fba3b9c
[js/webgpu] Optimize Gather op (#17625)
### Description
This PR optimizes the Gather op, saving about 6 ms in the Segment
Anything model on ADL.
The problem with the original algorithm is that it included a for loop
that processes a whole block of data. However, the block size can be very
large, for example `65536`. In a GPU shader, we should avoid large loops
and instead use more threads so the work is done in parallel.
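
The difference between the two work partitionings can be sketched on the CPU. This is only an illustrative model, not the WGSL shader from the PR: the function names `gatherPerBlock` and `gatherPerElement` are hypothetical, the gather is assumed to be along axis 0 of a 2-D input, and negative indices are not handled. Each iteration of the outer loop stands in for one GPU thread.

```typescript
// Hypothetical CPU model of two ways to split a Gather along axis 0:
// output[i, :] = data[indices[i], :], with rows of length blockSize.

/** One simulated thread per gathered block: each thread loops over blockSize elements. */
function gatherPerBlock(data: Float32Array, indices: number[], blockSize: number): Float32Array {
  const output = new Float32Array(indices.length * blockSize);
  for (let thread = 0; thread < indices.length; thread++) {
    const src = indices[thread] * blockSize;
    const dst = thread * blockSize;
    // This inner loop is the costly part when blockSize is large (e.g. 65536).
    for (let j = 0; j < blockSize; j++) {
      output[dst + j] = data[src + j];
    }
  }
  return output;
}

/** One simulated thread per output element: no inner loop, more threads. */
function gatherPerElement(data: Float32Array, indices: number[], blockSize: number): Float32Array {
  const output = new Float32Array(indices.length * blockSize);
  for (let thread = 0; thread < output.length; thread++) {
    const block = Math.floor(thread / blockSize); // which index this element belongs to
    const offset = thread % blockSize;            // position inside the block
    output[thread] = data[indices[block] * blockSize + offset];
  }
  return output;
}
```

Both functions produce the same output. With the shapes from the profiling logs (`[4,65536]` input, one index), the first form would use a single thread running 65536 iterations, while the second would use 65536 threads doing one copy each.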

Before:
```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 6886207 ns
```
After:
```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 11719 ns
```
2023-09-21 21:00:36 -07:00
..
binding [js/web] add sessionOptions.freeDimensionOverrides (#17488) 2023-09-13 09:17:34 -07:00
jsep [js/webgpu] Optimize Gather op (#17625) 2023-09-21 21:00:36 -07:00
proxy-worker [js/webgpu] support proxy for webgpu (#15851) 2023-05-15 16:23:13 -07:00
proxy-messages.ts [js/webgpu] support proxy for webgpu (#15851) 2023-05-15 16:23:13 -07:00
proxy-wrapper.ts [js/webgpu] support proxy for webgpu (#15851) 2023-05-15 16:23:13 -07:00
run-options.ts [js/web] enable ONNX Runtime Web error messages in JS (#16335) 2023-06-15 09:45:41 -07:00
session-handler.ts [js/web] ensure ORT initialization to run only once (#17529) 2023-09-12 23:52:08 -07:00
session-options.ts [js/web] add sessionOptions.freeDimensionOverrides (#17488) 2023-09-13 09:17:34 -07:00
wasm-common.ts [js/webgpu] make IndicesHelper implementation implicit (#17193) 2023-08-23 14:41:35 -07:00
wasm-core-impl.ts [js/webgpu] fix jsepOnRunEnd (#17300) 2023-08-26 00:30:28 -07:00
wasm-factory.ts [js/web] add target ort.webgpu.min.js (#15780) 2023-05-04 10:05:39 -07:00
wasm-utils.ts [js/web] enable ONNX Runtime Web error messages in JS (#16335) 2023-06-15 09:45:41 -07:00