onnxruntime/js/web/lib
Arthur Islamov ccf14e891e
[js/web] JSEP node assignment optimization (#17128)
### Description
Since WebGPU supports only float32 and int32, having Gather, Reshape,
Shape, Squeeze, and Unsqueeze ops with other data types creates additional
MemCpy ops and slows down overall execution, since every other op
consuming those tensor types will then run on the CPU.
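
The effect described above can be illustrated with a small self-contained sketch (not the actual ORT partitioning code; the node list and helper names are hypothetical). Each node in a linear chain is assigned to the JSEP (GPU) or CPU execution provider, and a MemCpy is counted at every boundary where the provider changes. Allowing data-movement-only ops (Gather, Reshape, Shape, Squeeze, Unsqueeze) onto JSEP regardless of dtype removes those boundaries:

```typescript
// Hypothetical model of EP assignment; illustrates the idea, not ORT internals.
type GraphNode = { op: string; dtype: string };

const GPU_DTYPES = new Set(["float32", "int32"]);
// Data-movement-only ops that the patch allows on JSEP for any dtype.
const DATA_MOVEMENT_OPS = new Set(["Gather", "Reshape", "Shape", "Squeeze", "Unsqueeze"]);

function assign(nodes: GraphNode[], optimized: boolean): string[] {
  return nodes.map((n) =>
    GPU_DTYPES.has(n.dtype) || (optimized && DATA_MOVEMENT_OPS.has(n.op))
      ? "JsExecutionProvider"
      : "CPUExecutionProvider"
  );
}

function countMemCpy(placement: string[]): number {
  // Every EP change along the chain forces a device-to-device copy.
  let copies = 0;
  for (let i = 1; i < placement.length; i++) {
    if (placement[i] !== placement[i - 1]) copies++;
  }
  return copies;
}

// Example chain: an int64 Shape/Gather/Unsqueeze subgraph between float32 ops.
const chain: GraphNode[] = [
  { op: "MatMul", dtype: "float32" },
  { op: "Shape", dtype: "int64" },
  { op: "Gather", dtype: "int64" },
  { op: "Unsqueeze", dtype: "int64" },
  { op: "Add", dtype: "float32" },
];

console.log(countMemCpy(assign(chain, false))); // → 2 (GPU→CPU→GPU boundaries)
console.log(countMemCpy(assign(chain, true)));  // → 0 (whole chain stays on JSEP)
```

With the old assignment, the int64 subgraph falls back to the CPU and two copies bracket it; with the optimized assignment the whole chain stays on the GPU and no copies are needed.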

Before this patch, the SD UNet had these numbers:
Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1141
Node(s) placed on [JsExecutionProvider]. Number of nodes: 4025
memcpy tokens: 2001

After patch:
Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1735
Node(s) placed on [JsExecutionProvider]. Number of nodes: 2243
memcpy tokens: 813

It also gives a more than 5x performance benefit: one UNet step drops
from 12 s to 2.2 s on an RTX 3090 Ti, so we are getting close to native
performance.

UPD: with the latest changes from the main branch and multi-threading, it
went down to 1.6 s. I will try re-exporting my model to ONNX with maximum
optimizations, such as using MultiHeadAttention to decrease the node
count. Maybe after that it can run in less than 1 s.
2023-08-15 18:58:05 -07:00
onnxjs Fix Resize op input check (#16594) 2023-08-09 15:42:30 -07:00
wasm [js/web] JSEP node assignment optimization (#17128) 2023-08-15 18:58:05 -07:00
backend-onnxjs.ts
backend-wasm.ts [js/webgpu] support proxy for webgpu (#15851) 2023-05-15 16:23:13 -07:00
build-def.d.ts [js/web] WebGPU backend via JSEP (#14579) 2023-04-24 15:21:18 -07:00
index.ts [js] add API that allows to get package version (#16207) 2023-06-09 16:18:53 -07:00
version.ts [js] add API that allows to get package version (#16207) 2023-06-09 16:18:53 -07:00