onnxruntime/js/web/lib/wasm/jsep/webgpu/ops/3rd-party
Jiajia Qin 25f427466e
[js/webgpu] Optimize ConvTranspose (Continue) (#23429)
BUG #23273

This PR makes the following optimizations:
1. When the number of output channels is one: 1) calculate the offset
before the input channel loop to reduce indices-to-offset calculations,
2) split `inputChannelsPerGroup` into `inputChannelsPerGroupInt` and
`inputChannelsRemainder` parts so that the `inputChannelsPerGroupInt`
part can always be accessed 4 elements at a time.
2. Use a precise initial value to avoid useless loop iterations. Thanks
to @jiangzhaoming for the suggestion.
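
The channel split in item 1 can be illustrated on the CPU side. This is a minimal sketch, not the shader code from this PR: `splitChannels` and `dotSplit` are hypothetical helpers, and it assumes `inputChannelsPerGroupInt` is the largest multiple of 4 not exceeding `inputChannelsPerGroup`, with the rest going to `inputChannelsRemainder`.

```typescript
// Hypothetical helper (not from the PR): split a channel count into a
// multiple-of-4 part and a remainder. The multiple-of-4 rounding is an assumption.
function splitChannels(inputChannelsPerGroup: number): {
  inputChannelsPerGroupInt: number;
  inputChannelsRemainder: number;
} {
  const inputChannelsPerGroupInt = Math.floor(inputChannelsPerGroup / 4) * 4;
  return {
    inputChannelsPerGroupInt,
    inputChannelsRemainder: inputChannelsPerGroup - inputChannelsPerGroupInt,
  };
}

// Hypothetical helper: accumulate x[i] * w[i] over `channels` elements.
// Offsets are computed by the caller once, before the channel loop.
function dotSplit(
  x: ArrayLike<number>,
  w: ArrayLike<number>,
  xOffset: number,
  wOffset: number,
  channels: number,
): number {
  const { inputChannelsPerGroupInt } = splitChannels(channels);
  let sum = 0;
  // Main part: always reads 4 elements per iteration (a vec4-style access).
  for (let c = 0; c < inputChannelsPerGroupInt; c += 4) {
    sum +=
      x[xOffset + c] * w[wOffset + c] +
      x[xOffset + c + 1] * w[wOffset + c + 1] +
      x[xOffset + c + 2] * w[wOffset + c + 2] +
      x[xOffset + c + 3] * w[wOffset + c + 3];
  }
  // Remainder part: at most 3 leftover channels, handled one at a time.
  for (let c = inputChannelsPerGroupInt; c < channels; c++) {
    sum += x[xOffset + c] * w[wOffset + c];
  }
  return sum;
}
```

The main loop never needs a bounds check inside its body, which is the point of separating out the remainder.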

With this PR, ConvTranspose drops from 8.4s to 3.7s on Intel Meteor Lake.
On NV RTX 2000 Ada, it drops from 2.7s to 1.6s.
2025-01-22 08:59:17 -08:00
..
activation_util.ts [js/webgpu] Fix conv2d with activation (#18388) 2023-11-10 12:54:35 -08:00
conv2d_mm_webgpu.ts [js/webgpu] fix Conv2DMatMul shader's out-of-bound read (#23085) 2024-12-12 11:33:53 -08:00
conv3d_naive_webgpu.ts [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) 2024-08-14 16:51:22 -07:00
conv_backprop_mm_webgpu.ts [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) 2024-08-14 16:51:22 -07:00
conv_backprop_webgpu.ts [js/webgpu] Optimize ConvTranspose (Continue) (#23429) 2025-01-22 08:59:17 -08:00
conv_util.ts [js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728) 2024-08-14 16:51:22 -07:00
matmul_packed_webgpu.ts WebGPU JSEP: Make shader code not depend on input broadcasting patterns (#22536) 2024-11-08 11:00:51 -08:00