mirror of
https://github.com/saymrwulf/onnxruntime.git
synced 2026-05-22 22:01:08 +00:00
### Description

This PR optimizes the Gather op, which improves it by ~6 ms in the Segment Anything model on ADL.

The problem with the original algorithm is that it uses a `for` loop to process a whole block of data, and the block size can be very large, e.g. `65536`. In a GPU shader, we should avoid long loops and instead use more threads to do the work in parallel.

Before:

```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 6886207 ns
```

After:

```
[profiling] kernel "41771992|[Gather] 41771992" input[0]: [4,65536] | float32, input[1]: [1] | int64, output[0]: [1,65536] | float32, execution time: 11719 ns
```
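The change in work distribution can be illustrated with a minimal CPU-side sketch in TypeScript. This is not the actual WGSL shader from the PR; the function names (`gatherPerBlock`, `gatherPerElement`, `normalizeIndex`) are hypothetical, and the sketch assumes a 2-D input gathered along axis 0. Each iteration of the outermost loop stands in for one GPU invocation.

```typescript
// Hypothetical model of the two dispatch strategies (assumed names, axis 0 only).

// Gather accepts negative indices, which count from the end of the axis.
function normalizeIndex(index: number, axisSize: number): number {
  return index < 0 ? index + axisSize : index;
}

// "Before" model: one invocation per gathered block.
// Each invocation runs a serial loop of blockSize iterations (e.g. 65536).
function gatherPerBlock(
  data: Float32Array, rows: number, blockSize: number, indices: number[],
): Float32Array {
  const output = new Float32Array(indices.length * blockSize);
  for (let block = 0; block < indices.length; block++) { // one "thread" per block
    const row = normalizeIndex(indices[block], rows);
    for (let i = 0; i < blockSize; i++) { // long loop inside a single thread
      output[block * blockSize + i] = data[row * blockSize + i];
    }
  }
  return output;
}

// "After" model: one invocation per output element.
// Each invocation derives its own block and offset, so no inner loop is needed.
function gatherPerElement(
  data: Float32Array, rows: number, blockSize: number, indices: number[],
): Float32Array {
  const output = new Float32Array(indices.length * blockSize);
  for (let globalIdx = 0; globalIdx < output.length; globalIdx++) { // one "thread" per element
    const block = Math.floor(globalIdx / blockSize);
    const offset = globalIdx % blockSize;
    const row = normalizeIndex(indices[block], rows);
    output[globalIdx] = data[row * blockSize + offset];
  }
  return output;
}
```

Both functions produce the same output; the difference is only in how the work is divided. For the profiled shapes (`[4,65536]` input, one index), the first model gives a single invocation 65536 serial copies, while the second yields 65536 independent single-element copies that a GPU can schedule in parallel.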
| Name |
|---|
| binding |
| jsep |
| proxy-worker |
| proxy-messages.ts |
| proxy-wrapper.ts |
| run-options.ts |
| session-handler.ts |
| session-options.ts |
| wasm-common.ts |
| wasm-core-impl.ts |
| wasm-factory.ts |
| wasm-utils.ts |