onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-22 22:01:08 +00:00

Author	SHA1	Message	Date
satyajandhyala	b291b20fa0	[JS/Web]Added uniforms support to Slice op. (#18422 ) ### Description Support uniforms in Slice op ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance	2023-11-16 09:44:13 -08:00
Yulong Wang	586f06f5a1	[js/web] set noUnusedParameters to true and fix a few bugs (#18404 ) ### Description - set tsconfig "noUnusedParameters" to `true` and fix a few bugs discovered by typescript. How unused parameters are fixed: - for most code (webgl), add underscore as prefix, which is the standard ignore pattern for typescript check. - remove unused parameter from function and modify corresponding function calls (jsep) - fix a bug in ArgMinMax: these 2 operators do not have more than one input, so `createArgMinMaxAttributesFromInputs()` is removed. - add proxy main.ts into typescript check and fix a bug in parameter passing - fixed `run()` function call and add typecheck fix (hack)	2023-11-15 09:16:29 -08:00
Xu Xing	829d802337	[js/webgpu] Support uniform for softmax (#18345 )	2023-11-09 11:19:23 -08:00
Scott McKay	4f2096be38	Update XNNPACK to latest version (#18038 ) ### Description <!-- Describe your changes. --> Update XNNPACK to latest version - adds fp16 kernels and various other improvements - requires pthreadpool update as well Most code updates in the XNNPACK EP are to adjust to the new XNNPACK API - 'setup' is split into 'reshape' and 'setup' - some ops use a workspace buffer - copied workspace allocation from XNNPACK unit test code - some suffixes changed Added wrapper for XNNPACK caches to base XNNPACK EP kernel - simplifies usage - XNNPACK split out the code and weights caches, but the code cache isn't currently usable via the public API - we could use the internal types if we think it's required for performance reasons. non-trivial though as we'd need to propagate ifdef values from the XNNPACK build up to the ORT build. - using XNNPACK internals would also mean we would not be able to support using a pre-build XNNPACK package - not an issue currently Fixed opset registration for internal NHWC domain - was not being tied to the ONNX version, so nodes inserted by layout transformation had the incorrect opset - a number of other places needed updating once this issue was fixed Remove support for NCHW Resize from XNNPACK EP so it's NHWC only - we only supported NCHW for fp32, - doing so adds complexity in multiple places (XNNPACK EP kernel implementation, layout transformation and transpose optimization) - unclear if that complexity provides any benefit. can add back if required by production scenario ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> We're looking at enabling fp16 support for CoreML and NNAPI. If we do that we need a good fallback story if the CPU EP will be used. The XNNPACK fp16 kernels will hopefully provide that. NOTE: This PR doesn't add fp16 support to the XNNPACK EP kernels. 
That can be done as required in separate EPs and should be relatively simple to do.	2023-11-03 09:04:28 -07:00
satyajandhyala	a2e9ba72d5	[JS/Web]Added FusedConv. (#17766 ) ### Description Added FusedConv and FusedConvTranspose ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Improve performance	2023-11-01 15:34:51 -07:00
Jiajia Qin	8a12b2cea6	[js/webgpu] Fix the transpose error when dims > 4D (#18027 ) ### Description <!-- Describe your changes. --> Currently, the uniform support has bugs when dims rank is larger than 4. See https://github.com/microsoft/onnxruntime/issues/17860 item 1. So this PR only enables shapes uniforms when shape rank is <= 4 for transpose. Otherwise, below compilation errors are thrown: ``` 1 error(s) generated while compiling the shader: :3:50 error: uniform storage requires that array elements are aligned to 16 bytes, but array element of type 'u32' has a stride of 4 bytes. Consider using a vector or struct as the element type instead. struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> }; ^^^^^^^^^^^^^ :3:7 note: see layout of struct: /* align(4) size(84) / struct Uniforms { / offset( 0) align(4) size( 4) / output_size : u32; / offset( 4) align(4) size(20) / a_shape : array<u32, 5>; / offset(24) align(4) size(20) / a_strides : array<u32, 5>; / offset(44) align(4) size(20) / output_shape : array<u32, 5>; / offset(64) align(4) size(20) / output_strides : array<u32, 5>; / */ }; struct Uniforms { output_size:u32, a_shape:array<u32, 5>, a_strides:array<u32, 5>, output_shape:array<u32, 5>, output_strides:array<u32, 5> }; ^^^^^^ :4:42 note: 'Uniforms' used in address space 'uniform' here @group(0) @binding(2) var<uniform> uniforms: Uniforms; ^^^^^^^^ ```	2023-10-23 11:02:19 -07:00
Yulong Wang	6ea493571e	[js/web] use esbuild to accelerate bundle build (#17745 ) ### Description Use esbuild to accelerate bundle build. This change uses esbuild to replace webpack for onnxruntime-web. Bundle build time reduced from ~20sec to ~0.6sec on my windows dev box. A few changes applied: - import nodejs modules using "node:" prefix - remove enum declaration inside namespace (EncoderUsage) - use "fs/promise" to replace the old promisify from "util" - separate ort-web and test-runner. Previously they were bundled together, now they are built into 2 files. - optimize karma runner launch time - remove unnecessary sourcemap preprocessor. sourcemaps are handled inside esbuild - remove unnecessary proxies (because ort-web and test-runner are separated now, the paths are correctly inferred) - remove file watcher for test data - optimize special handling as esbuild plugins: - polyfill dummy imports for node.js modules when targeting browser. - load as content string for ort-wasm-.worker.js - load as content string for ./proxy-worker/main.ts - a source patch to ort-wasm-threaded*.js (see details in comments in code) - updated debug configurations for sourcemap mapping to ensure a good out-of-box dev experience	2023-10-06 13:37:37 -07:00
Jiajia Qin	db3901ab97	[js/webgpu] Enable the NCHW ConvMatMul path (#17717 ) 1) Enable pointwise NCHW conv2d by MatMul. 2) Enable non-pointwise NCHW conv2d by convMatMul. 3) Fix bug when `sameSize` is true --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-05 00:26:01 -07:00
Xu Xing	992f3e4609	[js/webgpu] Support where (#17544 ) Supported type: float. int32_t, uint32_t, bool. Case where_broadcast.jsonc is not enabled due to https://github.com/microsoft/onnxruntime/issues/17405. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-03 14:28:21 -07:00
Arthur Islamov	d0519a7603	[js/web] BiasSplitGelu and BiasAdd kernels (#17161 ) ### Description Two contrib kernels that are supposed to speed up StableDiffusion according to this doc https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md However, there is no noticeable effect on speed or memory consumption. So I guess the only way to make it faster is to implement MultiHeadAttention, but I'm not capable of doing that right now. So I'll focus on existing PRs and on finding the JSEP kernel that produces incorrect results. It should be one of the old ones (I suspect Conv or ConvTranspose), as SD was not generating images correctly on webgpu since I started working on it. I hoped someone else would fix that by the time I finished with kernels/optimizations 😅 --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com> Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-10-03 12:20:20 -07:00
xhcao	0d60604638	[JS/WebGPU] support Range operator (#17233 ) The patch also introduces the method which copies data from GPU to CPU synchronously. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-30 02:05:32 -07:00
Yulong Wang	561aca97cf	[js/webgpu] support IO binding (#17480 ) <del> This PR is based on a few prerequisites PRs. They are listed as below: - #17465 - #17469 - #17470 - #17472 - #17473 - #17484 Please review the current change by only looking at commit e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later. </del> ### Description This PR introduces WebGPU IO binding. This new feature allows onnxruntime-web users to use tensors created from GPU as model input/output so that a model inferencing can be done without unnecessary data copy between CPU and GPU for model input/output. ### Examples An E2E demo/example is being worked on. Following is some simple demo with code snippet. Let's first check today how we do: ```js // STEP.1 - create an inference session: const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] }); // STEP.2 - create model input: (supposing myImageCpuData is a Float32Array) const feeds = { 'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3]) }; // STEP.3 - run model const myResults = await mySession.run(feeds); // STEP.4 - get output data const myData = myResults['output_image:0'].data; // Float32Array ``` #### for inputs (GPU tensor): Now, with IO binding, you can create a tensor from a GPU buffer, and feed it to the model: ```js // new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data) const feeds = { 'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] }) }; ``` ### for outputs (pre-allocated GPU tensor) you can also do that for output, if you know the output shape: ```js // new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object) const fetches = { 'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] }) }; // new STEP.3 - run model with 
pre-allocated output (fetches) const myResults = await mySession.run(feeds, fetches); ``` ### for outputs (specify location) if you do not know the output shape, you can specify the output location when creating the session: ```js // new STEP.1 - create an inference session with an option "preferredOutputLocation": const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'], preferredOutputLocation: "gpu-buffer" }); ``` if the model has multiple outputs, you can specify them separately: ```js // new STEP.1 - create an inference session with an option "preferredOutputLocation": const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'], preferredOutputLocation: { "output_image:0": "gpu-buffer" } }); ``` now you don't need to prepare the `fetches` object and onnxruntime-web will prepare output data on the location that is specified. #### read data when you get the output tensor, you can: ```js // get the gpu buffer object: const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer // get the CPU data asynchronously const cpuData = await myOutputTensor.getData(); // get the CPU data asynchronously and release the underlying GPU resources const cpuData = await myOutputTensor.getData(true); // dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called. myOutputTensor.dispose(); ``` #### resource management JavaScript has GC so you don't need to worry about managing JavaScript objects. But there are 2 types of resources that are not managed by GC: - GPU buffers that are used in tensors - Underlying ORT native resources To simplify, most of the unmanaged resources are handled inside ORT web. But there are a few resources that users need to manage: - All external GPU resources, including GPU buffers inside all tensors created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. Users should manage those GPU buffers themselves. 
- When a session is created with `preferredOutputLocation` == "gpu-buffer" specified in session options, and the corresponding output is not pre-allocated, users need to call the output tensor's `dispose()` or `getData(true)` to manually release the underlying GPU buffers. - ORT internal errors (including providing a pre-allocated output tensor with wrong type/dims) will invalidate the whole wasm memory and are not recoverable. An exception is thrown in this situation.	2023-09-29 11:24:42 -07:00
satyajandhyala	b4fbc25b1f	[JS/Web] Add ConvTranspose implementation using MatMul (#17573 ) ### Description Add ConvTranspose implementation using MatMul to increase perf. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-29 11:00:44 -07:00
Hariharan Seshadri	460f17fbb8	[JS/WebGPU] Support If on WebGPU (#17478 )	2023-09-19 12:20:18 -07:00
Jiajia Qin	41d2ff622c	[js/webgpu] Optimize InstanceNormalization (#17491 ) ### Description <!-- Describe your changes. --> In the previous implementation, a single thread runs two loops over H * W elements to calculate the `mean` and `squaredNorm` values, and the same thread also writes H * W output elements. As a result, it is very slow when H * W is large, which is usually the case in real models. For example, in the `candy-8` model, the shapes of [H, W] are [224,224], [112,112], [56,56] for the `InstanceNormalization` op. And on my ADL, `[1,224,224,32]` consumes 17 ms. See below: ``` [profiling] kernel "23848328\|[InstanceNormalization] 23848328" input[0]: [1,224,224,32] \| float32, input[1]: [32] \| float32, input[2]: [32] \| float32, output[0]: [1,224,224,32] \| float32, execution time: 17007914 ns ``` This PR uses workgroup memory to optimize the original algorithm. The advantage is that the 64 (workgroupSize) threads in one workgroup can calculate the `mean` and `squaredNorm` values in parallel. Meanwhile, each thread only writes `H * W / workgroupSize` outputs, which greatly reduces the per-thread overhead. With this optimization, `[1,224,224,32]` becomes 3 ms and the main overhead is the two extra `transpose` operations. The `createInstanceNormProgramInfo` only needs `0.64` ms. 
See below: ``` [profiling] kernel "23003600\|[InstanceNormalization] 23003600" input[0]: [1,224,224,32] \| float32, output[0]: [1,32,224,224] \| float32, execution time: 1543792 ns program-manager.ts:115 [profiling] kernel "23003600\|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] \| float32, input[1]: [32] \| float32, input[2]: [32] \| float32, output[0]: [1,32,224,224] \| float32, execution time: 642652 ns program-manager.ts:115 [profiling] kernel "23003600\|[InstanceNormalization] 23003600" input[0]: [1,32,224,224] \| float32, output[0]: [1,224,224,32] \| float32, execution time: 991608 ns ``` This PR currently only applies the new algorithm to NCHW format. For NHWC format, one way is to transpose the input so that it can use the new algorithm. But the disadvantage is that 2 extra transpose are added. @dakenf also gives another way to optimize NHWC. Details see [here](`d45a96616d/js/web/lib/wasm/jsep/webgpu/ops/instance-norm.ts`). I checked @dakenf's method. The perf is similar with transpose + optimized NCHW. But on different GPUs, one is a little better than another or vice versa. So I prefer this PR only does the NCHW part. @dakenf can submit his optimization on NHWC.	2023-09-14 17:03:18 -07:00
xhcao	198d468849	[WebGPU/JS] Added Pad operator support (#16928 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-14 13:14:11 -07:00
xhcao	ec94b07f0a	[JS/WebGPU] support Concat.int32 operator (#17003 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-13 00:05:00 -07:00
Yulong Wang	f923eec28b	[js/web] release session after use in npm test (#17470 ) ### Description release session after use in npm test. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: #17465 #17469 #17470 (this one)	2023-09-12 16:59:13 -07:00
satyajandhyala	bf6d6961cc	[JS/Web] Added Einsum operator support. (#17401 ) ### Description Added Einsum operator support to JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-11 15:57:15 -07:00
Yulong Wang	89da5a0108	[js/webgpu] exclude WebGPU reduce_log_sum_exp_* float64 test cases (#17472 ) ### Description as explained in the comments, tests "test_reduce_log_sum_exp_*" on opset17/opset18 are excluded because they use float64. They are passing now because they fallback to CPU. WebGPU does not support f64. This is one of the prerequisites for supporting IO binding for WebGPU buffer in onnxruntime-web. list of prerequisites PRs: https://github.com/microsoft/onnxruntime/pull/17465 https://github.com/microsoft/onnxruntime/pull/17469 https://github.com/microsoft/onnxruntime/pull/17470 https://github.com/microsoft/onnxruntime/pull/17472 (this one)	2023-09-08 17:03:04 -07:00
Jian Chen	8914fe687b	[js/webgpu] Include Support for neg.int32 (#17374 ) ### Description Include Support for neg.int32 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-06 12:00:16 -07:00
Yulong Wang	75710f0006	[js/webgpu] add matmul broadcast tests (#17335 ) ### Description Commit `fffefb1c22` (#16969) optimized matmul and also fixes broadcasting. So #17191 is no longer needed. However, the newly added operator test file from the PR by @dakenf is helpful so pick and add it to enhance the tests.	2023-09-05 20:41:46 -07:00
xhcao	026672e947	[js/webgpu] Support slice int32 (#16968 ) Co-authored-by: Xing Xu <xing.xu@intel.com>	2023-09-05 18:05:47 -07:00
Jian Chen	e60493525f	[js/webgpu] Adding support for abs with int32 type (#17359 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-31 08:13:54 -07:00
Yulong Wang	e5ca3f3dcb	[js/api] introducing IO binding for tensor (#16452 ) [//]: # (## Work In Progress. Feedbacks are welcome!) ### Description This PR adds a few properties, methods and factories to Tensor type to support IO-binding feature. This will allow user to create tensor from GPU/CPU bound data without a force transferring of data between CPU and GPU. This change is a way to resolve #15312 ### Change Summary 1. Add properties to `Tensor` type: a. `location`: indicating where the data is sitting. valid values are `cpu`, `cpu-pinned`, `texture`, `gpu-buffer`. b. `texture`: sit side to `data`, a readonly property of `WebGLTexture` type. available only when `location === 'texture'` c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer` type. available only when `location === 'gpu-buffer'` 2. Add methods to `Tensor` type (usually dealing with inference outputs): - async function `getData()` allows user to download data from GPU to CPU manually. - function `dispose()` allows user to release GPU resources manually. 3. Add factories for creating `Tensor` instances: a. `fromTexture()` to create a WebGL texture bound tensor data b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer ### Examples: create tensors from texture and pass to inference session as inputs ```js // when create session, specify we prefer 'image_output:0' to be stored on GPU as texture const session = await InferenceSession.create('./my_model.onnx', { executionProviders: [ 'webgl' ], preferredOutputLocation: { 'image_output:0': 'texture' } }); ... const myImageTexture = getTexture(); // user's function to get a texture const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format. const results = await session.run(myFeeds); const myOutputTexture = results['image_output:0'].texture; ```	2023-08-29 12:58:26 -07:00
Jiajia Qin	fffefb1c22	[js/webgpu] Optimize matmul (#16969 ) ### Description Changes in this PR: 1) use the optimized version `makeMatMulPacked[Vec4]Source` to support matmul. 2) enable the conv2dByMatMul path. 3) support broadcast 4) use IndicesHelper. MatMul with M = 512, K = 512, N = 512 becomes 2ms from 15ms when enabling profilingMode on my ADL.	2023-08-29 12:40:57 -07:00
Hariharan Seshadri	cbd97515cd	[JS/WebGPU] Support GatherElements kernel (#17243 ) ### Description As title ### Motivation and Context Improve WebGPU kernel coverage	2023-08-28 09:55:25 -07:00
Yulong Wang	bb1871332f	[js/webgpu] add kernel Not and Equal (#17306 ) ### Description This PR adds kernel implementation for operator "Not" and "Equal". Also removed download cache in gpu data manager. Why remove the download cache? The following test case failed. ("Or" is on CPU, "Greater" and "Equal" are on JSEP) ![image](https://github.com/microsoft/onnxruntime/assets/7679871/8d9798ad-2703-4fb9-907e-ff716c67d0b2) After debugging, I found that both "Equal" and "Greater" are using the same output GPU Data ID. This is because when ORT executes the graph, it first runs "Equal", allowing its shader to write into GPU Data ID 2; then a Gpu2Cpu copy for it is issued (because currently "Or" is on CPU EP); at this point, ORT thinks GPU Data ID=2 is free to use, so it reuses it as output for "Greater". This means there is no allocation for the output of the "Greater" kernel, and both kernels write to GPU Data ID=2. For the gpu data manager, there will be 2 downloads from the same GPU buffer. Previously I thought this was a waste of resources, so I cached the data. But now it shows that we need to perform 2 downloads because the GPU data is already different. The download data cache should be removed. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-27 19:50:17 -07:00
xhcao	5e8d94cec8	[js/webgpu] support Greater and Less operators (#17296 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-25 12:11:25 -07:00
satyajandhyala	da180b20fa	[JS/Web] Fix ConvTranspose shader code compilation errors. (#17232 ) ### Description Fix JSEP ConvTranspose shader code errors. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-25 06:25:54 -07:00
Yulong Wang	6fc3fd9ece	[js/webgpu] support Cast operator (#16489 ) ### Description support `Cast` operator for webgpu backend. Cast operator for webgpu backend currently only supports f32, u32, i32 and bool.	2023-08-18 23:51:03 -07:00
xhcao	dd3b2cefd6	[js/webgpu] Support int32 type for binary (#16901 ) ### Description Enable typed binary and support int32 type for binary. Co-authored-by: Xing Xu <xing.xu@intel.com> --------- Co-authored-by: Xing Xu <xing.xu@intel.com>	2023-08-18 12:19:01 -07:00
Hariharan Seshadri	a476dbf430	[JS/WebGPU] Support Tile operator (#17123 ) ### Description As title ### Motivation and Context Improve WebGPU op coverage	2023-08-18 10:07:21 -07:00
satyajandhyala	7d1a5635a0	[JS/Web] Added SkipLayerNormalization operator. (#17102 ) ### Description Add SkipLayerNormalization operator to JSEP. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-18 09:59:03 -07:00
Yulong Wang	cbee84ddfb	[js/web] allow optional input/output in operator test (#17184 ) ### Description allow optional input/output in operator test	2023-08-16 11:50:11 -07:00
Hariharan Seshadri	66df11769c	[JS/WebGPU] Expand operator fixes (#17137 )	2023-08-16 11:24:26 -07:00
satyajandhyala	89b682e3f3	[JS/Web] The bias input is optional, not required, for LayerNormalization operator (#17143 ) ### Description Fix a typo. LayerNormalization takes 2 or 3 inputs. The third input, bias, is optional. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-16 10:41:20 -07:00
xhcao	33ecde9af1	[js/webgpu] Fix reshape int32 test case (#17113 ) Co-authored-by: Xing Xu <xing.xu@intel.com> Co-authored-by: Xing Xu <xing.xu@intel.com>	2023-08-15 21:18:13 -07:00
Yulong Wang	35363dd9a5	[js/web] a few optimizations for test runner (#17174 ) ### Description 1. allows passing session options to operator test (eg. graph optimization level) 2. add a short flag '-x' for '--wasm-number-threads' as it is frequently used.	2023-08-15 21:00:23 -07:00
xhcao	24e0bd37b4	[JS/WebGPU] Support Log operator (#17045 ) ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-14 18:04:12 -07:00
Yulong Wang	e7adbb38f6	[js/webgpu] disable test case 'test_batchnorm_epsilon_training_mode' temporarily (#17129 ) ### Description test case 'test_batchnorm_epsilon_training_mode' on webgpu is failing. the issue need time to investigate so comment this off and re-enable it when the root cause is fixed.	2023-08-12 08:53:10 -07:00
satyajandhyala	e8a9d4f04d	[JS/Web] Fix Resize kMSInternalNHWCDomain (#17023 ) ### Description Fix some Resize failing tests. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-08-10 09:14:43 -07:00
Yulong Wang	56bced0581	[js/web] enable webgpu in browser unit test (#16310 ) ### Description enable webgpu in browser unit test. The CI pipeline uses Edge v113+ which enables WebGPU. === UPDATE on 08/07/2023: - add flags to Edge browser launch commandline so that Edge on CI agents can initialize WebGPU correctly. - ONLY enable webgpu on web release build. Other pipelines are using flag `-b=wasm,webgl,xnnpack` to specify the other 3 backends explicitly. - disable "Resize" related test failures. Once they are fixed the tests can be re-enabled. --------- Co-authored-by: Satya Jandhyala <satya.k.jandhyala@gmail.com>	2023-08-08 11:45:04 -07:00
Arthur Islamov	c3f04251c7	[js/web] JSEP LayerNormalization and InstanceNormalizations kernels (#16830 ) ### Description Added two kernels for Layer and Instance norm Also added maximum limits for `maxBufferSize` when requesting GPU device as by default it's limited to 256mb and it fails allocating 600mb buffer while running fp32 StableDiffusion weights. ### Motivation and Context These two are used in StableDiffusion and many other networks	2023-08-08 09:09:37 -07:00
satyajandhyala	7ad43d9564	[JS/Web] Fixed ArgMin and ArgMax and refactored (#17002 ) Fixed ArgMin and ArgMax and refactored using functionality from Reduce operator code. ### Description Removed code/functionality duplication and fixed some issue. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-04 12:59:36 -07:00
satyajandhyala	cc4b64f646	[JS/Web] Modify Reduce, Expand and Slice to pass op and node tests. (#16979 ) ### Description Make CacheHint mechanism, which is designed to avoid running the same test multiple times saving the result mapped against a key, working by adding input dims. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-08-03 15:48:47 -07:00
Yulong Wang	641c3a4a37	[js/web] update op test schema (#16921 ) ### Description update op test schema. This change fixes several problems for operator tests for web: - `opsets` -> `opset`: an operator uses exactly one opset instead of multiple - `condition` -> `platformCondition`: make it less confusing - `inputShapeDefinitions`: allows testing ORT behaviors when it gets no/partial/full shape info. Added a JSON schema file and also an example file	2023-08-03 14:20:20 -07:00
Arthur Islamov	ea55700e1c	[js/web] JSEP Gather OP (#16855 ) ### Description Added Gather op that works with both i32 and i64 indices, assuming that values fall into i32 limit. The assumption is safe because it's not possible to allocate more than 2gb buffer for inputs. It treats all data from input tensor as u32, copying 1 or 2 elements for i64, u64 and double. --------- Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>	2023-08-03 14:09:37 -07:00
Guenther Schmuelling	0df2e14038	js/webgpu: argmax,argmin,softmax support (#16882 ) argmax and argmin are similar to reduce. Eventually we need to add optimized flavors of the shader. softmax is optimized but only works on the last axis for now which should be the common use case. todo: enable more ut for argmax/argmin	2023-08-02 18:16:19 -07:00
Hariharan Seshadri	506ddb3d5d	[js/WebGPU] Support int32 Transpose in WebGPU (#16952 )	2023-08-02 16:27:24 -07:00
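
Note on commit 8a12b2cea6 (transpose with dims > 4D): the shader compiler error quoted in that message says that arrays in the uniform address space need a 16-byte element stride, which `array<u32, 5>` (stride 4) does not have. That PR avoids the problem by enabling shape uniforms only for rank <= 4. The sketch below shows one possible alternative, packing a higher-rank shape into `vec4<u32>` elements, whose size is 16 bytes. The helper names `vec4Count`, `uniformShapeType` and `packShape` are hypothetical and are not part of onnxruntime-web; the sketch assumes rank >= 1 and does not deal with the host-side offsets of other struct members.

```js
// Hypothetical helpers for illustration only; not the onnxruntime-web implementation.

// Number of vec4<u32> elements needed to hold `rank` u32 values.
function vec4Count(rank) {
  return Math.ceil(rank / 4);
}

// WGSL type that can hold a shape of the given rank inside a uniform struct.
// Ranks up to 4 fit in a scalar or a single vector; higher ranks use an array
// of vec4<u32>, whose 16-byte stride satisfies the uniform layout rule.
function uniformShapeType(rank) {
  if (rank <= 1) return 'u32';
  if (rank <= 4) return `vec${rank}<u32>`;
  return `array<vec4<u32>, ${vec4Count(rank)}>`;
}

// Host-side data for the type above: dims copied as-is for rank <= 4,
// otherwise zero-padded to a multiple of 4 values.
function packShape(dims) {
  if (dims.length <= 4) return Uint32Array.from(dims);
  const packed = new Uint32Array(vec4Count(dims.length) * 4);
  packed.set(dims);
  return packed;
}
```

With this layout a shader would read dimension `i` of a rank-5 shape as `shape[i / 4][i % 4]` instead of `shape[i]`, which is the main cost of the packing.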


120 commits
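
Note on commit 41d2ff622c (Optimize InstanceNormalization): the message describes replacing a single-thread loop over H * W elements with a workgroup of 64 threads that share the work through workgroup memory. The sketch below simulates that reduction pattern sequentially on the CPU, so it only illustrates the data flow and not the actual WGSL shader. The function name `instanceStats` is hypothetical, the choice to accumulate a sum and a sum of squares (and to report a variance) is an assumption rather than the PR's exact `squaredNorm` computation, and `workgroupSize` is assumed to be a power of two.

```js
// CPU simulation of a workgroup reduction; illustrative only, not the PR's shader.
function instanceStats(values, workgroupSize = 64) {
  // Stand-ins for workgroup-shared memory: one slot per thread.
  const partialSum = new Float64Array(workgroupSize);
  const partialSumSq = new Float64Array(workgroupSize);

  // Phase 1: thread t handles elements t, t + workgroupSize, t + 2 * workgroupSize, ...
  // so each thread touches about values.length / workgroupSize elements.
  for (let t = 0; t < workgroupSize; t++) {
    for (let i = t; i < values.length; i += workgroupSize) {
      partialSum[t] += values[i];
      partialSumSq[t] += values[i] * values[i];
    }
  }

  // Phase 2: tree reduction. On a GPU each step would be followed by a
  // workgroup barrier; here the steps simply run one after another.
  for (let stride = workgroupSize >> 1; stride > 0; stride >>= 1) {
    for (let t = 0; t < stride; t++) {
      partialSum[t] += partialSum[t + stride];
      partialSumSq[t] += partialSumSq[t + stride];
    }
  }

  const mean = partialSum[0] / values.length;
  const variance = partialSumSq[0] / values.length - mean * mean;
  return { mean, variance };
}
```

For example, `instanceStats([1, 2, 3, 4], 2)` splits the input into the strided groups `[1, 3]` and `[2, 4]` before combining the two partial results.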