onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Author	SHA1	Message	Date
Xu Xing	c19617a24a	[js/webgpu] Add GatherND (#22847 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-12-04 09:57:32 -08:00
Xu Xing	ff57ac4f3d	[js/webgpu] Add scatterND (#22755 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-11-13 09:13:00 -08:00
xhcao	b5ee4ac760	[js/webgpu] support GridSample operator (#22652 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-11-08 11:02:36 -08:00
Prathik Rao	5cc7fb4a74	[JSEP] Upgrade to ONNX Opset 21 (#22595 ) ### JSEP Ops that need updating - [x] Cast - [x] ReduceMax - [x] ReduceMin - [x] Squeeze - [x] Unsqueeze - [x] Transpose - [x] AveragePool - [x] Flatten - [x] Pad - [x] If	2024-10-29 17:44:38 -07:00
Jiajia Qin	252222034f	[js/webgpu] Support Reshape/Shape 21+ on jsep (#21871 ) ### Description <!-- Describe your changes. --> #21618 With this PR, the cross device copying (`MemcpyToHost`) can totally be removed for model `wav2vec2`. And the overall time becomes 48ms from 604ms. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-08-27 09:02:39 -07:00
Satya Kumar Jandhyala	af18824f43	[JS/WebGPU] Add GatherBlockQuantized op support (#21734 ) ### Description Add GatherBlockQuantized operator to JSEP. ### Motivation and Context Gemma model requires this.	2024-08-26 14:46:04 -07:00
Guenther Schmuelling	d82f15d0e3	add Gelu opset-20 to webgpu (#21725 ) https://github.com/microsoft/onnxruntime/issues/21618	2024-08-14 09:45:05 -07:00
Satya Kumar Jandhyala	51b2044120	[JS/WebGPU] Add Dequantizelinear operator (#21642 ) ### Description Added DequantizeLinear operator for JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-08-09 14:44:19 -07:00
Guenther Schmuelling	9eb1c2a7a3	support for layernorm in webgpu pre opset-17 (#21121 ) handled the same way cpu does	2024-06-27 10:20:48 -07:00
Guenther Schmuelling	c749bd997a	webgpu quickgelu (#20939 )	2024-06-06 08:21:33 -07:00
Xu Xing	8c59cd4fce	[js/webgpu] Support GroupQueryAttention (#20237 ) TODOs: 1. Handle H * params.kvNumHeads greater than work group size limit. 2. Support BNSH kv cache.	2024-05-13 09:43:37 -07:00
Yulong Wang	50bd4571ac	[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277 ) ### Description Support operator `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for WebGPU backend.	2024-04-11 14:08:50 -07:00
MasayoshiTsutsui	6a9d8a9030	[js/webgpu] implement DepthToSpace operator in webgpu (#19948 ) ### Description This PR supports [DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace) operator in webgpu backend. ### Test We followed the steps described on [this page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce) to build, tested with the following commands, and confirmed that it passed the Model and Op tests that already existed. (Probably, these test cases were prepared in the past for WebGL backend) ``` ~/onnxruntime/js/web> % npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug ``` ##### NOTE I want to tell you that the main branch version failed 5 tests for the resize_upsample_sizes_nearest operator. Since I didn't touch this issue, those test cases still fail in my branch as well. Should I post an issue for this? ### Motivation and Context Though the DepthToSpace operator plays a crucial role in super-resolution domains, it was not supported in webgpu backend.	2024-04-10 12:13:46 -07:00
Jiajie Hu	23d3afd4fe	[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209 ) ### Description https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding ### Motivation and Context As per customer request, this helps Phi-2 and Gemma.	2024-04-08 09:11:26 -07:00
satyajandhyala	dfeda9019c	[JS/WebGPU] Add MatMulNBits (#19446 ) ### Description Add MatMulNBits to support MatMul using 4-bit quantized weights ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-17 09:19:17 -08:00
Yulong Wang	5ff27ef02a	[js/webgpu] support customop FastGelu (#19392 ) ### Description Support WebGPU custom operator FastGelu.	2024-02-06 09:07:31 -08:00
Jiajia Qin	2e0a388c36	[js/webgpu] Add HardSigmoid support (#19215 ) ### Description This op is required in mobilenetv3-small-100. With this PR, mobilenetv3-small-100 model becomes less than 10 ms from over 100 ms on ADL.	2024-01-22 15:53:26 -08:00
satyajandhyala	10c547516d	[JS/Web] Added CumSum operator to JSEP (#18637 ) ### Description Added CumSum operator ### Motivation and Context Reduce CPU <->GPU data movement.	2023-12-05 07:51:53 -08:00
Jiajia Qin	64dacc2892	[js/webgpu] Add BatchNormalization Op (#18468 ) ### Description This PR adds `BatchNormalization` with `float` support. Some Todos: 1. all inputs don't have same data type. For example, x/y is float16, but bias/scale is float32 or double. 2. training mode support. We see many models are using `BatchNormalization` ops. However, due to the missing in jsep, all of them run on cpu, which result very poor performance. With this PR's support, densenet-9 model becomes 20.29 ms from 250.69 ms.	2023-11-22 15:58:06 -08:00
Arthur Islamov	fac3e33da5	[js/web] JSEP Attention & MultiHeadAttention (#17742 ) ### Description This is a narrow implementation of Attention/MultiHeadAttention as it does not support: a. inputs 5-7 for MHA b. packed QKV/KV c. past/present d. attention mask But it works well for StableDiffusion and can be extended later. It reduces VRAM usage as it combines many ops into few I've updated demo here https://islamov.ai/stable-diffusion-webgpu/ it takes ~13sec for 1 image with 20 steps on RTX3090Ti and about 25s on M1 Pro VRAM usage is about 8gb if you don't use img2img Going to focus on SDXL now --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-11-17 12:23:52 -08:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description <!-- Describe your changes. --> Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a pre-build XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
satyajandhyala	a2e9ba72d5	[JS/Web]Added FusedConv. (#17766 ) ### Description Added FusedConv and FusedConvTranspose ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance	2023-11-01 15:34:51 -07:00
Xu Xing	992f3e4609	[js/webgpu] Support where (#17544 ) Supported type: float. int32_t, uint32_t, bool. Case where_broadcast.jsonc is not enabled due to https://github.com/microsoft/onnxruntime/issues/17405. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-03 14:28:21 -07:00
Arthur Islamov	d0519a7603	[js/web] BiasSplitGelu and BiasAdd kernels (#17161 ) ### Description Two contrib kernels that supposed to speed-up StableDiffusion according to this doc https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md However, there is no noticable effect in speed or memory consumption. So i guess the only way to make it faster is to implement MultiHeadAttention but i'm not capable of doing that right now. So i'll focus on existing PRs and finding the JSEP kernel that produces incorrect results. It should be one of the old ones (i suspect Conv or ConvTranspose), as SD was not generating images correctly on webgpu since i started working on it. I hoped someone else would fix that by the time i finish with kernels/optimizations 😅 --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-03 12:20:20 -07:00
xhcao	0d60604638	[JS/WebGPU] support Range operator (#17233 ) The patch also introduces the method which copies data from GPU to CPU synchronously. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-30 02:05:32 -07:00
Hariharan Seshadri	460f17fbb8	[JS/WebGPU] Support If on WebGPU (#17478 )	2023-09-19 12:20:18 -07:00
xhcao	198d468849	[WebGPU/JS] Added Pad operator support (#16928 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-14 13:14:11 -07:00
satyajandhyala	bf6d6961cc	[JS/Web] Added Einsum operator support. (#17401 ) ### Description Added Einsum operator support to JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-11 15:57:15 -07:00
xhcao	9017ea131b	[js/webgpu] support GreaterOrEqual and LessOrEqual operators (#17310 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-07 17:41:16 -07:00
Hariharan Seshadri	cbd97515cd	[JS/WebGPU] Support GatherElements kernel (#17243 ) ### Description As title ### Motivation and Context Improve WebGPU kernel coverage	2023-08-28 09:55:25 -07:00
Yulong Wang	bb1871332f	[js/webgpu] add kernel Not and Equal (#17306 ) ### Description This PR adds kernel implementation for operator "Not" and "Equal". Also removed download cache in gpu data manager. Why removing download cache The following test case failed. ("Or" is on CPU, "Greater" and "Equal" are on JSEP) ![image](https://github.com/microsoft/onnxruntime/assets/7679871/8d9798ad-2703-4fb9-907e-ff716c67d0b2) after debugging, I found that both "Equal" and "Greater" are using the same output GPU Data ID. This is because when ORT executes the graph, it first run "Equal", allowing its shader to write into GPU Data ID 2; then a Gpu2Cpu copy for it is issued (because currently "Or" is on CPU EP); at this point, ORT thinks GPU Data ID=2 is free to use; so it reuse it as output for "Greater". This means there is no allocation for output of "Greater" kernel, and both kernel writes to GPU Data ID=2. For gpu data manager, there will be 2 downloads from the same GPU buffer. Previously I think this is a waste of resource so I cached the data. But now it shoes that we need to perform 2 downloads because the GPU data is already different. The download data cache should be removed. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-27 19:50:17 -07:00
xhcao	5e8d94cec8	[js/webgpu] support Greater and Less operators (#17296 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-25 12:11:25 -07:00
Arthur Islamov	5842144d98	[js/web] JSEP Gemm for opset 13 (#16936 ) ### Description Added JSEP Gemm registration for opset 13. It was falling back to CPU provider as CPU has it for 13 --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-22 18:13:20 -07:00
Yulong Wang	6fc3fd9ece	[js/webgpu] support Cast operator (#16489 ) ### Description support `Cast` operator for webgpu backend. Cast operator for webgpu backend currently only supports f32, u32, i32 and bool.	2023-08-18 23:51:03 -07:00
Hariharan Seshadri	a476dbf430	[JS/WebGPU] Support Tile operator (#17123 ) ### Description As title ### Motivation and Context Improve WebGPU op coverage	2023-08-18 10:07:21 -07:00
satyajandhyala	7d1a5635a0	[JS/Web] Added SkipLayerNormalization operator. (#17102 ) ### Description Add SkipLayerNormalization operator to JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-18 09:59:03 -07:00
xhcao	24e0bd37b4	[JS/WebGPU] Support Log operator (#17045 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-14 18:04:12 -07:00
Guenther Schmuelling	9204cd7392	[js/webgpu] Add C++ registration for operator Tanh in JSEP (#17124 ) add webgpu/tanh Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-08-12 11:43:39 -07:00
satyajandhyala	e8a9d4f04d	[JS/Web] Fix Resize kMSInternalNHWCDomain (#17023 ) ### Description Fix some Resize failing tests. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-08-10 09:14:43 -07:00
Arthur Islamov	c3f04251c7	[js/web] JSEP LayerNormalization and InstanceNormalizations kernels (#16830 ) ### Description Added two kernels for Layer and Instance norm Also added maximum limits for `maxBufferSize` when requesting GPU device as by default it's limited to 256mb and it fails allocating 600mb buffer while running fp32 StableDiffusion weights. ### Motivation and Context These two are used in StableDiffusion and many other networks	2023-08-08 09:09:37 -07:00
Arthur Islamov	ea55700e1c	[js/web] JSEP Gather OP (#16855 ) ### Description Added Gather op that works with both i32 and i64 indices, assuming that values fall into i32 limit. The assumption is safe because it's not possible to allocate more than 2gb buffer for inputs. It treats all data from input tensor as u32, copying 1 or 2 elements for i64, u64 and double. --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-03 14:09:37 -07:00
Guenther Schmuelling	0df2e14038	js/webgpu: argmax,argmin,softmax support (#16882 ) argmax and argmin are similar to reduce. Eventually we need to add optimized flavors of the shader. softmax is optimized but only works on the last axis for now which should be the common use case. todo: enable more ut for argmax/argmin	2023-08-02 18:16:19 -07:00
satyajandhyala	d399648869	[JS/Web] Added Resize kMSInternalNHWCDomain domain registration. (#16946 ) ### Description Added Resize NHWC domain kernel registration. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-02 14:16:21 -07:00
satyajandhyala	77b2b618b2	[JS/WebGPU] Add Resize operator (#16680 ) ### Description Implemented Resize operator support in JSEP ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-31 09:35:06 -07:00
satyajandhyala	dd24d52737	[JS/Web] Added Gelu contrib operator support to JSEP (#16909 ) ### Description Added Gelu operator to JSEP ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-31 09:18:58 -07:00
satyajandhyala	e67547b978	[JS/WebGPU] Added Flatten operator support. (#16860 ) ### Description Added Flatten operator support to JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-27 12:50:45 -07:00
satyajandhyala	03ce0a5693	[Web/JS] Added Slice operator in JSEP. (#16811 ) ### Description Added Slice operator support to JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-25 14:19:20 -07:00
satyajandhyala	d41bbac7b9	[Web/JS] Added Expand operator support. (#16577 ) ### Description Added Expand operator support. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-11 09:38:16 -07:00
satyajandhyala	00e8f2a2a9	[Web/JS] Add ConvTranspose support (#16433 ) ### Description Add ConvTranspose support for WebGPU ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-08 11:10:50 -07:00
satyajandhyala	e55a20ece8	[Web/JS] Added Split operator support. (#16567 ) ### Description Added WeGPU/JSEP Split operator support. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-07-07 12:16:10 -07:00

1 2

55 commits