onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-22 02:30:26 +00:00

Author	SHA1	Message	Date
Yang Gu	9e5153b688	[js/webgpu] Manage model download with a specific unittest option (#22214 ) Currently in debug mode, unit test will always download models to local file system, which is a bit annoying. This PR fixes this by adding a specific option to enable model download.	2024-09-30 18:27:43 -07:00
Yang Gu	c75f4a09b7	[js/webgpu] Remove the limitation on axis in softmax (#22231 ) In current implementation, axis in softmax has to be the last, which is an obvious limitation. This PR removes this limitation and will fix issues #20710 and #22176.	2024-09-30 18:27:11 -07:00
Yulong Wang	1bda91fc57	[js/webgpu] fix external buffer registration (#22254 ) ### Description Fixes a failure that occurs when GPU inputs are shuffled between iterations.	2024-09-28 10:36:40 -07:00
Enrico Galli	52a8c1cae8	[WebNN EP] Enable IO Bindings with MLTensor (#21301 ) ### Description Enables using the MLTensor to pass data between models. ### Motivation and Context Using MLTensor instead of ArrayBuffers reduces the number of copies between the CPU and devices as well as the renderer and GPU process in Chromium.	2024-09-27 17:24:21 -07:00
shiyi	1e3cd86d80	[WebNN EP] Support LSTM op (#20293 )	2024-09-27 14:23:08 -07:00
Scott McKay	3846f84218	Increase React Native E2E (#22230 ) ### Description Increase the detox setup timeout to 4 minutes. The iOS RN E2E tests are taking around 2 minutes to set up, causing flakiness. ### Motivation and Context Improve RN CI pass rate	2024-09-27 08:59:36 +10:00
Claude	3494f80e83	Check if HTMLCanvasElement exists (i.e. we are not running in a webworker) (#22153 ) This fixes #22152 ### Description Tensor.fromImage fails in a webworker context, because HTMLCanvasElement does not exist: > HTMLCanvasElement is not defined ### Motivation and Context This fixes #22152 --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-09-25 11:52:52 -07:00
Yulong Wang	df25006d1b	upgrade micromatch to v4.0.8 (#22174 ) ### Description Upgrade `micromatch` to v4.0.8 https://github.com/advisories/GHSA-952p-6rrq-rcjv	2024-09-23 14:39:32 -07:00
Jiajia Qin	80e9df826e	[js/webgpu] Optimize InstanceNormalization (#21995 ) ### Description InstanceNormalization computes `y = scale * (x - mean) / sqrt(variance + epsilon) + B`, where mean and variance are computed per instance per channel. Calculating mean and variance per channel is a reduction, which is NCHW-layout friendly because adjacent threads can access contiguous data in GPU memory. This PR optimizes both NHWC and NCHW InstanceNormalization. To calculate the mean and variance efficiently, the input needs to be NCHW instead of NHWC; shared memory is then used for the reduction that produces `channel_scale` and `channel_shift`. With this PR, `channel_scale` and `channel_shift` are computed the same way for NHWC and NCHW InstanceNormalization, and the overall performance of the two is now very close. The data below comes from SD Turbo profiling results. Before (InstanceNormalization overall time: 140.84 ms): `InstanceNormalization|InstanceNormComputeMean` 129.70 ms; `InstanceNormalization|InstanceNormalizationNHWC` 10.55 ms; `InstanceNormalization|InstanceNormComputeChannelScaleShift` 0.59 ms. After (InstanceNormalization overall time: 59.44 ms): `InstanceNormalization|InstanceNormComputeChannelScaleShift` 28.57 ms; `InstanceNormalization|TransposeShared` 20.19 ms; `InstanceNormalization|InstanceNormalizationNHWC` 10.68 ms.	2024-09-23 11:32:09 -07:00
Jian Chen	fa68ae2def	Update pool to MacOS-13 (#17361 ) ### Description See https://github.com/microsoft/onnxruntime-extensions/pull/476 and https://github.com/actions/runner-images/issues/7671 ### Current issue - [ ] For the default Xcode 15.2 that comes with MacOS-13, we need to update the boost container header boost/container_hash/hash.hpp version to pass the build - [x] For Xcode 14.2, the build passed but `Run React Native Detox Android e2e Test` failed. Possible flaky test, https://github.com/microsoft/onnxruntime/pull/21969 - [x] For Xcode 14.3.1, we encountered the following issue in `Build React Native Detox iOS e2e Tests` ``` ld: file not found: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/arc/libarclite_iphonesimulator.a clang: error: linker command failed with exit code 1 (use -v to see invocation) ``` Applying the following code at the end of both ios/Podfile fixed the issue ``` post_install do |installer| installer.generated_projects.each do |project| project.targets.each do |target| target.build_configurations.each do |config| config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0' end end end end ``` - [x] https://github.com/facebook/react-native/issues/32483 Applying changes to ios/Podfile ``` pre_install do |installer| # Custom pre-install script or commands puts "Running pre-install script..." # Recommended fix for https://github.com/facebook/react-native/issues/32483 # from https://github.com/facebook/react-native/issues/32483#issuecomment-966784501 system("sed -i '' 's/typedef uint8_t clockid_t;//' \"${SRCROOT}/Pods/RCT-Folly/folly/portability/Time.h\"") end ``` - [ ] Detox environment setup exceeded the timeout of 120000ms during the iOS e2e test ### dependent - [x] https://github.com/microsoft/onnxruntime/pull/21159 --------- Co-authored-by: Changming Sun <chasun@microsoft.com>	2024-09-17 10:07:30 -07:00
Wanming Lin	9786909ab5	[WebNN EP] Support QuantizeLinear and DequantizeLinear ops (#22097 )	2024-09-17 08:18:47 -07:00
Xu Xing	afd642a194	[js/webgpu] Replace array with string in transpose perm (#21930 ) Perf test data (100000 iterations): Array: 12.599999997764826ms, String: 1.6000000014901161ms. Perf test case: ``` const permFunctionBodyArray = (rank: number, input: string): string => { const reverseFunc: string[] = []; reverseFunc.push(`fn perm(i: int) -> int { var a: int;`); for (let i = 0; i < rank; ++i) { reverseFunc.push(input); } reverseFunc.push('return a;}'); return reverseFunc.join('\n'); }; const permFunctionBodyString = (rank: number, input: string): string => { let reverseFunc = `fn perm(i: int) -> int { var a: int;`; for (let i = 0; i < rank; ++i) { reverseFunc += input; } reverseFunc += 'return a;}'; return reverseFunc; }; const count = 100000; let start, end; console.time('array'); start = performance.now(); for (let i = 0; i < count; i++) { permFunctionBodyArray(3, 'input'); } end = performance.now(); console.timeEnd('array'); console.log("Array: " + (end - start)); console.time('string'); start = performance.now(); for (let i = 0; i < count; i++) { permFunctionBodyString(3, 'input'); } end = performance.now(); console.log("String: " + (end - start)); console.timeEnd('string'); ```	2024-09-16 23:17:46 -07:00
Yang Gu	2db6b734f5	[js/webgpu] Fix issue to run model demucs (#22074 ) This is to fix issue #22031 to run model demucs. For conv-transpose, outputPadding.length could be 1, while spatialRank is 2. The fix is to append enough 0s to outputPadding. For conv, the issue is similar. kernelShape.length sometimes could be 1, while inputs[1].dims.length is 4. The fix is also to append enough 0s to kernelShape.	2024-09-16 23:17:10 -07:00
Yulong Wang	291a5352b2	[js/web] remove training release (#22103 ) ### Description Remove training from onnxruntime-web Following up of #22082	2024-09-16 10:56:22 -07:00
Prathik Rao	d495e6cf1c	adds support for Uint8ClampedArray (#21985 ) Fixes https://github.com/microsoft/onnxruntime/issues/21753	2024-09-11 22:02:30 -07:00
Bin Miao	4d82404544	[WebNN EP] Support GRU operator (#20405 ) This PR adds support for the GRU operator to the WebNN EP. @Honry , @fdwr thanks!	2024-09-11 14:16:36 -07:00
dependabot[bot]	19954decaf	Bump body-parser from 1.20.2 to 1.20.3 in /js/web (#22044 )	2024-09-10 23:05:44 +00:00
Jiajia Qin	3580e01348	[js/webgpu] Optimize grouped conv (#21892 ) ### Description #21618 This PR optimizes grouped conv by 1) making memory access on the GPU more sequential and 2) reusing input data to reduce the number of global memory accesses. The `Conv|GroupedConv` op in [Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) drops from 1058 ms to 92 ms on iGPUs with 32 EU. For the whole model on my iGPUs with 32 EU, the wav2vec2 model drops from 1942 ms to 982 ms, and the squeezebert-uncased model drops from 431.77ms to 71.86ms.	2024-09-04 17:16:35 -07:00
Jiajia Qin	a80bfed5b4	[js/webgpu] Optimize transpose (#21964 ) ### Description Fix bugs in the previous implementation and route more cases to the optimized path. The following cases now take the optimized path: 1. 2d inputs or squeezed 2d inputs 2. channels last or channels first transpose. For example, for the channels-last transpose [1, 256, 512, 512] -> [1, 512, 512, 256], the transpose becomes [256, 512x512] -> [512x512, 256] ### Motivation and Context For the SD Turbo demo, the total transpose time drops from 122.09ms to 39.98ms, and its share of the total time drops from 11.05% to 3.89%. This PR also helps #21618: the total transpose time in that demo drops from 70.25 ms to 17.32 ms on my iGPUs.	2024-09-04 12:04:04 -07:00
Edward Chen	cbf3c50d75	Improve stability of Android ReactNative E2E test (#21969 ) - Remove redundant `OnnxruntimeModuleExampleE2ETest CheckOutputComponentExists` test - Attempt to close any Application Not Responding (ANR) dialog prior to running Android test - Add `--take-screenshots failing` option to detox test commands to save screenshots on failure	2024-09-04 08:41:07 -07:00
Guenther Schmuelling	4fece0430f	remove duplicate function definition (#21903 )	2024-08-28 16:18:56 -07:00
xhcao	3bfb5e4f62	[js/webgpu] support float16 for Clip (#21584 )	2024-08-28 13:19:20 -07:00
Jiajia Qin	252222034f	[js/webgpu] Support Reshape/Shape 21+ on jsep (#21871 ) ### Description #21618 With this PR, the cross-device copying (`MemcpyToHost`) can be removed entirely for the `wav2vec2` model, and the overall time drops from 604ms to 48ms.	2024-08-27 09:02:39 -07:00
Yulong Wang	99bc45dcbd	[js] add big data file to formatter ignore list (#21767 ) ### Description Add the big data file `web/test/data/ops/pad-big.jsonc` to formatter ignore list. This file slows down the formatter quite a lot at local.	2024-08-26 22:08:26 -07:00
Satya Kumar Jandhyala	af18824f43	[JS/WebGPU] Add GatherBlockQuantized op support (#21734 ) ### Description Add GatherBlockQuantized operator to JSEP. ### Motivation and Context Gemma model requires this.	2024-08-26 14:46:04 -07:00
Xu Xing	d9c57ac7db	[js/webgpu] Enable pad f16 uniform (#21691 ) --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-08-26 07:58:48 -07:00
Jiajia Qin	87165b92e9	[js/webgpu] optimize MatmulNBits (#21747 ) ### Description This optimization gives a 2x speedup for phi3 on the integrated Intel GPU. It mainly stores input A's data in local variables instead of loading it from global memory each time it is combined with B's data.	2024-08-23 16:36:00 -07:00
Jiajia Qin	27a6890529	[js/webgpu] Optimize conv1d by conv2d (#19388 ) ### Description Route conv1d through the conv2d path so that it benefits from conv2d's optimizations. The whisper-tiny-encoder model drops from 532.28 ms to 158.66 ms. Conv goes to Conv2DMatMul (8 ms) instead of GroupedConv (382 ms). Old profiling result, listed as kernel: time (ms), percentage (%): `Conv|GroupedConv`: 382.99, 71.95; MatMul: 126.16, 23.70; Softmax: 7.01, 1.32; Transpose: 4.59, 0.86; Add: 4.39, 0.82; Mul: 2.36, 0.44; Div: 1.44, 0.27; `ReduceMean|ReduceMeanShared`: 1.25, 0.23; Erf: 0.85, 0.16; Sub: 0.72, 0.14; Pow: 0.46, 0.09; Sqrt: 0.07, 0.01; Sum: 532.28. New profiling result with this PR, in the same format: MatMul: 127.07, 80.09; `Conv|Conv2DMatMul`: 8.00, 5.04; Softmax: 6.95, 4.38; Transpose: 4.65, 2.93; Add: 4.26, 2.68; Mul: 2.56, 1.61; Div: 1.51, 0.95; `ReduceMean|ReduceMeanShared`: 1.31, 0.83; Erf: 0.85, 0.54; Sub: 0.79, 0.50; Pow: 0.46, 0.29; `Conv|Transpose`: 0.26, 0.17; Sqrt: 0.00, 0.00; Sum: 158.66. --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-08-22 22:56:07 -07:00
Satya Kumar Jandhyala	1fb2e71ddc	[JS/WebGPU] Avoid producing presentKey/presentValue outputs if pastKey/pastValue … (#21782 ) Avoid producing presentKey/presentValue outputs if pastKey/pastValue don't exist.	2024-08-19 18:02:19 -07:00
Wanming Lin	7ae0b4ce64	[WebNN EP] Support Erf and Trilu for CPU backend (#21768 )	2024-08-19 07:56:16 -07:00
xhcao	417aa00406	[js/webgpu] fix conv1d error (#21585 )	2024-08-18 15:45:13 -07:00
Jiajia Qin	c4ade796d6	[js/webgpu] Fix attention shader recompilation issue (#21770 ) ### Description This PR fixes the `AttentionProbsSoftmax` recompilation issue when executing the phi3 model. The fix further improves phi3 performance.	2024-08-17 17:15:15 -07:00
Yang Gu	49fc168eed	[js/webgpu] Handle negative axis in op Split (#21771 ) This is to fix issue #21703, where the axis is a negative value in the model. According to the spec (https://onnx.ai/onnx/operators/onnx__Split.html), negative axis means counting dimensions from the back.	2024-08-17 16:41:23 -07:00
Tianlei Wu	d79e3c5791	Extend Attention Bias Broadcast Support (#21710 ) ### Description Previously, MultiHeadAttention supports relative position bias of shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention supports [1, N, S, T]. This will extend the support to allow [1, N, S, T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs. - [x] Rename the input of "relative position bias" to "attention bias" because it can also be used for other types of bias, like ALiBi (Attention with Linear Biases) or attention mask. - [x] Update unfused kernel to support broadcasting 2nd dimension of attention bias. - [x] Update efficient attention to support broadcasting 2nd dimension of attention bias. - [x] Update operators (MultiHeadAttention, DecoderMaskedMultiHeadAttention, Attention, PackedAttention, PackedMultiHeadAttention) to support broadcast attention bias on CUDA and CPU EPs. - [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that those EPs do not support broadcasting attention_bias for now). - [x] Add attention bias tests for MultiHeadAttention. - [x] Update operator documents - [x] Update benchmark script Other changes: * Fix some checks in multihead-attention.ts * Add helper functions to dump tensors given dimensions.	2024-08-16 15:40:04 -07:00
Yulong Wang	ef2ccc477b	[js/web] Add support for int4/uint4 tensor (#21720 ) ### Description Add support for int4/uint4 tensor.	2024-08-15 21:32:10 -07:00
Yulong Wang	d4d0bea1fb	[js] update docs for new code formatter (#21743 ) ### Description Update README.md for code formatter change (#21728)	2024-08-15 20:17:08 -07:00
Yang Gu	f8efc086ce	[js/webgpu] Support Chrome Canary in unit tests (#21750 ) Chrome Canary is helpful to test some new features. With this PR, we can enable Chrome Canary in unit tests with command like "npm test -- op abs.jsonc -b=webgpu -e=chromecanary".	2024-08-15 19:27:54 -07:00
Yulong Wang	abdc31de40	[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728 ) ### Description See `454996d496` for manual changes (excluded auto-generated formatting changes) ### Why The tooling around the old clang-format setup is out of date, which reduces development efficiency. - The NPM package `clang-format` is already in maintenance mode and has not been updated in 2 years. - The VSCode extension for clang-format has not been maintained for a while, and a recent Node.js security update stopped it from working at all on Windows. No one in the community seems interested in fixing those. Prettier was chosen because it is the most popular TS/JS formatter. ### How to merge It's easy to break the build: - Be careful of any new commits on main not included in this PR. - Be careful that after this PR is merged, other PRs that already passed CI can merge. So, make sure there are no new commits before merging this one, and invalidate JS PRs that already passed CI, forcing them to merge with the latest.	2024-08-14 16:51:22 -07:00
Guenther Schmuelling	d82f15d0e3	add Gelu opset-20 to webgpu (#21725 ) https://github.com/microsoft/onnxruntime/issues/21618	2024-08-14 09:45:05 -07:00
Xu Xing	7172aff1cf	[js/webgpu] Fix max pool shape end with 0 (#21698 ) Bug: https://github.com/microsoft/onnxruntime/issues/21386	2024-08-13 20:59:24 -07:00
Scott McKay	6af5394bd7	Replace usage of jcenter in React Native build.gradle files (#21714 ) ### Description Replace jcenter. It's deprecated and not responding. ### Motivation and Context Fix CIs	2024-08-13 11:10:51 -07:00
xhcao	9c6ee89fa7	[js/webgpu] fix two errors of attention operator (#21687 ) Fix two issues: (1) scale shall be fp32 instead of f16 (2) Softmax program does not handle the normalized dispatch group values, so if the sequence length is over 65535, the result is not correct for this program.	2024-08-13 09:42:34 -07:00
Satya Kumar Jandhyala	51b2044120	[JS/WebGPU] Add Dequantizelinear operator (#21642 ) ### Description Added DequantizeLinear operator for JSEP.	2024-08-09 14:44:19 -07:00
Yulong Wang	e6e4047a77	[js/web] update the build script for webgpu to enable model dump by default (#19707 ) ### Description Update the build script for webgpu to enable model dump by default. Now, when build_jsep.bat is used to build debug, model dump is enabled. Setting [`optimizedModelFilePath`](https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#optimizedModelFilePath) in the session options dumps the optimized model in the browser. ### Motivation and Context Helps to debug or rule out problems that may be related to the model optimizer.	2024-08-09 05:55:34 -07:00
Yulong Wang	5e66fcc703	[js/web] allow op test to use f16 type for inputs/outputs (#21664 ) ### Description allow op test to use f16 type for inputs/outputs. This PR introduces "@petamoriken/float16" as Float16Array polyfill but restricts it to be only used for test runner.	2024-08-08 09:56:37 -07:00
Prathik Rao	134f47743e	bumps up version in main from 1.19 -> 1.20 (#21588 ) Bump up version in main from 1.19.0 to 1.20.0 since the release branch has been cut.	2024-08-05 15:46:04 -07:00
Wanming Lin	8c641d7182	[WebNN EP] Support Dropout op (#21586 ) ### Description WebNN only supports test mode, so we don't care about other inputs or attributes about training mode, use WebNN's identity op to implement the Dropout op directly.	2024-08-02 16:25:04 -07:00
Wanming Lin	1d4b161145	[WebNN EP] Support ConvTranspose for TFLite backend (#21291 ) ### Description Chromium supports ConvTranspose for TFLite in https://chromium-review.googlesource.com/c/chromium/src/+/5635194 With constraint that only default dilations and groups are supported. --------- Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>	2024-07-30 17:46:08 -07:00
Yulong Wang	b03c9496aa	[js/web] allow load WebAssembly binary from buffer (#21534 ) ### Description This PR adds a new option `ort.env.wasm.wasmBinary`, which lets users supply a buffer containing the preloaded .wasm file content. This PR should resolve the problem from the latest discussion in #20876.	2024-07-29 13:39:38 -07:00
Xu Xing	0d7cf301a1	[js/webgpu] Add activation Tanh (#21540 ) Bug: https://github.com/microsoft/onnxruntime/issues/21467	2024-07-29 11:05:34 -07:00
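
Two of the fixes listed above (49fc168eed, negative axis in Split; 2db6b734f5, short outputPadding/kernelShape) describe small shape-handling rules. The sketch below illustrates those two rules in TypeScript. The helper names `normalizeAxis` and `padWithZeros` are hypothetical and chosen for illustration only; they are not taken from the onnxruntime source, and the actual implementation may differ.

```typescript
// Hypothetical helpers for illustration; not the functions used in onnxruntime.

/** Maps a possibly negative ONNX axis onto the range [0, rank). */
function normalizeAxis(axis: number, rank: number): number {
  if (!Number.isInteger(axis) || axis < -rank || axis >= rank) {
    throw new RangeError(`axis ${axis} is out of range for rank ${rank}`);
  }
  // A negative axis counts dimensions from the back, so -1 is the last one.
  return axis < 0 ? axis + rank : axis;
}

/** Returns a copy of `values` with trailing zeros appended up to `length`. */
function padWithZeros(values: readonly number[], length: number): number[] {
  const padded = values.slice();
  while (padded.length < length) {
    padded.push(0);
  }
  return padded;
}

// A rank-4 tensor with axis -1 refers to dimension 3.
console.log(normalizeAxis(-1, 4)); // 3
// An outputPadding of length 1 with spatial rank 2 gets one zero appended.
console.log(padWithZeros([1], 2)); // [ 1, 0 ]
```

Both helpers return new values rather than mutating their arguments, which keeps the original attribute arrays intact for any later use.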

713 commits