onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

Author	SHA1	Message	Date
Xu Xing	0d7cf301a1	[js/webgpu] Add activation Tanh (#21540 ) Bug:https://github.com/microsoft/onnxruntime/issues/21467 ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-07-29 11:05:34 -07:00
Xu Xing	5bc12bf209	[js/webgpu] Add activation for conv3d naive (#21466 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-07-29 08:47:41 -07:00
Xu Xing	92a8407b39	[js/webgpu] Remove unnecessary initialization of var (#21312 ) This var has been initialized to 0 in tint, so no need extra loop to do it again: ``` float tint_symbol_52[1][4] = (float[1][4])0; { for(int tint_symbol_53 = 0; (tint_symbol_53 < 1); tint_symbol_53 = (tint_symbol_53 + 1)) { { for(int tint_symbol_54 = 0; (tint_symbol_54 < 4); tint_symbol_54 = (tint_symbol_54 + 1)) { tint_symbol_52[min(uint(tint_symbol_53), 0u)][min(uint(tint_symbol_54), 3u)] = 0.0f; } } } } ``` ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-07-12 12:34:34 -07:00
Xu Xing	c3076721f3	[js/webgpu] Support conv3d naive (#20706 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-06-19 10:13:50 -07:00
Guenther Schmuelling	c749bd997a	webgpu quickgelu (#20939 )	2024-06-06 08:21:33 -07:00
Xu Xing	25ac65375c	[js/webgpu] Fix mha name (#20860 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-05-30 00:01:06 -07:00
Guenther Schmuelling	33a68d221f	add missing file for pr20791 (#20811 ) this file should have been in pr20791 to allow fp16 in the tile implementation	2024-05-24 09:59:13 -07:00
Satya Kumar Jandhyala	bab5037eab	Eliminate explicit Concat operations in Attention (#20556 ) ### Description Remove explicitly concatinating pastKey with Key and pastValue with Value. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-05-24 09:07:57 -07:00
Xu Xing	f1fef19b6e	[js/webgpu] Support shared memory for transpose 2d (#19267 ) For 1024x1024, without shared memoey, 18.7ms. With shared memory 13.2ms.	2024-05-22 08:15:44 -07:00
Xu Xing	8c59cd4fce	[js/webgpu] Support GroupQueryAttention (#20237 ) TODOs: 1. Handle H * params.kvNumHeads greater than work group size limit. 2. Support BNSH kv cache.	2024-05-13 09:43:37 -07:00
Guenther Schmuelling	55a6986d38	optimize skiplayernorm (#20551 ) SkipSimplifiedLayerNormalization used in phi3 comes down from 222usec to 14usec	2024-05-08 08:40:03 -07:00
Satya Kumar Jandhyala	99b0e19f11	[JS/WebGPU] MatMulNBits remove unnecessary condition (#20396 ) Distribute writing-to-output work over all threads in MatMulNBits. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-29 14:27:21 -07:00
Satya Kumar Jandhyala	736cbb3925	[JS/WebGU] Support fp16 in Attention by performing the computation in fp32. (#20486 ) ### Description Perform computation in fp32 and convert finally to fp16. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-27 08:30:26 -07:00
Satya Kumar Jandhyala	21b3cbc3af	[WIP][JS/WebGPU] Inputs Key and Value could be 4-dims. (#20470 ) ### Description The Key and Value inputs could be 4-dims ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-25 13:33:46 -07:00
Satya Kumar Jandhyala	ae78cdb5d7	[JS/WebGPU] MultiheadAttention bugfix (#20447 ) ### Description Fixed pastkey, key and pastvalue, value concatenation condition and fixed index error. Added new test cases. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-24 08:43:14 -07:00
Guenther Schmuelling	33d5ea39b3	[js/webgpu] fixes for fp16 attention (#20440 )	2024-04-24 08:01:28 -07:00
Satya Kumar Jandhyala	d42ac7f0c6	[JS/WebGPU] Multihead attention improvements (#20286 ) ### Description Enabled more usecases ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-23 12:39:49 -07:00
Guenther Schmuelling	b8e6684313	more conservitive gpu-buffer cache algo (#20312 ) tuned based on 80 models to keep performance impact minimal	2024-04-23 09:07:04 -07:00
Guenther Schmuelling	497a627a69	fix fp16 for skiplayernorm (#20381 )	2024-04-19 12:12:02 -07:00
Guenther Schmuelling	a8a77ddfdc	fix csum and enable ut (#20355 )	2024-04-17 15:01:06 -07:00
Satya Kumar Jandhyala	b33216be4c	[JS/WebGPU] Improve MatMulNBits perf (#19974 ) ### Description <!-- Describe your changes. --> Improve performance using shared memory ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-12 11:03:05 -07:00
Yulong Wang	50bd4571ac	[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277 ) ### Description Support operator `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for WebGPU backend.	2024-04-11 14:08:50 -07:00
MasayoshiTsutsui	6a9d8a9030	[js/webgpu] implement DepthToSpace operator in webgpu (#19948 ) ### Description This PR supports [DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace) operator in webgpu backend. ### Test We followed the steps described on [this page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce) to build, tested with the following commands, and confirmed that it passed the Model and Op tests that already existed. (Probably, these test cases were prepared in the past for WebGL backend) ``` ~/onnxruntime/js/web> % npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug ``` ##### NOTE I want to tell you that the main branch version failed 5 tests for the resize_upsample_sizes_nearest operator. Since I didn't touch this issue, those test cases still fail in my branch as well. Should I post an issue for this? ### Motivation and Context Though the DepthToSpace operator plays a crucial role in super-resolution domains, it was not supported in webgpu backend.	2024-04-10 12:13:46 -07:00
Jiajie Hu	23d3afd4fe	[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209 ) ### Description https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding ### Motivation and Context As per customer request, this helps Phi-2 and Gemma.	2024-04-08 09:11:26 -07:00
Guenther Schmuelling	c529e05e38	fix ConvTranspose 1D (#20194 )	2024-04-05 10:05:32 -07:00
Yulong Wang	fa1917b81b	[js/webgpu] add validation to workgroup size (#20110 ) ### Description add validation to workgroup size in `shaderHelper.mainStart()`.	2024-04-02 19:29:20 -07:00
Xu Xing	a2998e5d42	[js/webgpu] Use global id in attention and instance-norm (#20008 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-02 01:42:39 -07:00
Yulong Wang	473434c73f	[js/webgpu] perform uniform consistency check (#20019 ) ### Description This PR makes a change in WebGPU backend to validate program uniforms. It compares the uniform data that comes from the result of `getRunData()` callback from the program info, with the `ShaderHelper`'s maintained list of uniform variables. Fixes a few bugs that found by this check as well.	2024-03-26 17:14:43 -07:00
Satya Kumar Jandhyala	5b64d7c32b	[JS/WebGPU] Use non-matmul implementation for ConvTranspose in channel-first case. (#20022 ) ### Description Avoid using vec4 Matmul implementation for ConvTranspose with channel-last ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-23 11:19:14 -07:00
Guenther Schmuelling	c45cff60cf	[js/webgpu] fix maxpool / fp16 (#19981 )	2024-03-19 16:15:49 -07:00
Xu Xing	4c6a6a37f7	[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387 ) The added case will be NAN because of the un-initialized buffer.	2024-03-18 22:59:32 -07:00
Guenther Schmuelling	7e0d424934	accumulate in fp32 for Reduce* (#19868 )	2024-03-18 08:28:43 -07:00
Satya Kumar Jandhyala	ed250b88c3	[JS/WebGPU] Optimize MatMulNBits (#19852 ) ### Description Use vec<2> or vec<4>, operands in MatMulNBits ### Motivation and Context Improve performance	2024-03-13 10:33:14 -07:00
Yang Gu	53de2d8cb0	[js/webgpu] Enable GroupedConvVectorize path (#19791 ) Vectorize met 2 failed cases in a CI bot with NVIDIA GPU, but we couldn't repro with all the GPUs at hand, including NVIDIA GPUs. This PR introduces GPUAdapterInfo and enables this opt on non-NVIDIA GPUs to make the bots happy. No obivous perf gain can be seen if we enable vectorize on NVIDIA. However, it shows big perf improvement on Intel. On my Gen12 Intel GPU, mobilenetv2-12 perf was improved from 11.14ms to 7.1ms.	2024-03-12 22:25:07 -07:00
Satya Kumar Jandhyala	24b72d2613	[JS/WebGPU] Preserve zero size input tensor dims. (#19737 ) ### Description For Concat operation, the zero-size input tensor shape need to be preserved and, unlike non-zero tensors, the dims are not constrained to match other input tensors' dims. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-03-07 19:07:49 -08:00
Guenther Schmuelling	bb43a0f133	[js/webgpu] minor fixes to make tinyllama work (#19564 )	2024-02-23 15:45:30 -08:00
satyajandhyala	ae3d73c981	[JS/WebGPU] Fix Split and Where to handle corner cases. (#19613 ) ### Description <!-- Describe your changes. --> 1. Fix Where operator to handle Boolean input less than 4 bytes. 2. Fix JSEP test harness to use tensor names consistently. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-23 00:21:15 -08:00
Xu Xing	fe82fccf1a	[js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596 ) This is used in sam-h-decoder-f16. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-22 13:09:28 -08:00
Yulong Wang	3fe2c137ee	[js] small fix to workaround formatter (#19400 ) ### Description Rename shader variable names to snake_case naming and also to avoid formatter behaving inconsistently in win/linux.	2024-02-20 17:23:01 -08:00
Jiajie Hu	1b48054e1b	[js/webgpu] Create Split indices helpers by rank, not by shape (#19554 ) ### Description This is required to make shape uniforms really work. ### Motivation and Context The bug was unveiled in a model with multiple Split nodes. The later nodes would try to reuse a previous pipeline cache, while the old shapes were hardcoded as constants in cache.	2024-02-20 09:24:34 -08:00
satyajandhyala	dfeda9019c	[JS/WebGPU] Add MatMulNBits (#19446 ) ### Description Add MatMulNBits to support MatMul using 4-bit quantized weights ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-02-17 09:19:17 -08:00
Yulong Wang	5ff27ef02a	[js/webgpu] support customop FastGelu (#19392 ) ### Description Support WebGPU custom operator FastGelu.	2024-02-06 09:07:31 -08:00
Jiajia Qin	ccbe264a39	[js/webgpu] Add LeakyRelu activation for fusedConv (#19369 ) ### Description This PR 1) adds LeakyRelu activation for fusedConv; 2) makes `vec4<f16>` value work with `float32` uniforms attributes. For example: `clamp(value, vec4<f16>(uniforms.clip_min), vec4<f16>(uniforms.clip_max)` will throw compilation errors since `uniforms.clip_min` and `uniforms.clip_min` are `f32` not `f16`. So we need to change it to `clamp(value, vec4<f16>(f16(uniforms.clip_min)), vec4<f16>(f16(uniforms.clip_max))` And above problem was introduced when we make activation attributes as uniforms instead of constant. BTW, after adding LeakyRelu, `realesrgan-t256` model can pass.	2024-02-02 09:06:38 -08:00
Xu Xing	3a2ab1963a	[js/webgpu] Refactor createTensorShapeVariables (#18883 )	2024-02-01 17:59:00 -08:00
Xu Xing	d73131cf0f	[js/webgpu] Use DataType as uniform cpu type (#19281 ) This saves turning data type to string by tensorDataTypeEnumToString.	2024-01-30 21:05:08 -08:00
Jiajia Qin	85cef0af8c	[js/webgpu] Support capture and replay for jsep (#18989 ) ### Description This PR expands the graph capture capability to JS EP, which is similar to #16081. But for JS EP, we don't use the CUDA Graph, instead, we records all gpu commands and replay them, which removes most of the cpu overhead to avoid the the situation that gpu waiting for cpu. mobilenetv2-12 becomes 3.7ms from 6ms on NV 3090 and becomes 3.38ms from 4.58ms on Intel A770. All limitations are similar with CUDA EP: 1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. 2. Usage of graph capture is limited to models where-in all ops in the model can be partitioned to the JS EP or CPU EP and no memory copy between them. 3. Shapes of inputs/outputs cannot change across inference calls. 4. IObinding is required. The usage is like below: Method 1: specify outputs buffers explicitly. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer/outputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; const fetches = { 'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] }) }; let results = await session.run(feeds, fetches); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = = await session.run(feeds, fetches); // The 2ed run and after will directly call replay to execute the graph. ... ... session.release(); ``` Method 2: Don't specify outputs buffers explicitly. Internally, when graph capture is enabled, it will set all outputs location to 'gpu-buffer'. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; let results = await session.run(feeds); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = = await session.run(feeds); // The 2ed run and after will directly call replay to execute the graph. ... ... session.release();	2024-01-30 18:28:03 -08:00
Jiajia Qin	90883a366a	[js/webgpu] Add hardSigmoid activation for fusedConv (#19233 ) ### Description Add hardSigmoid activation for fusedConv. It will be used by mobilenetv3-small-100 model.	2024-01-30 16:28:53 -08:00
Xu Xing	624b4e2063	[js/webgpu] Remove enableShapesUniforms (#19279 )	2024-01-29 17:49:06 -08:00
Guenther Schmuelling	9e69606360	fix f16 for attention, enable slice and flatten for more types (#19262 )	2024-01-29 10:13:46 -08:00
Xu Xing	a3f0e2422b	[js/webgpu] Support f16 uniform (#19098 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-01-25 16:58:22 -08:00

1 2 3 4

196 commits