Commit graph

685 commits

Author SHA1 Message Date
Satya Kumar Jandhyala
1fb2e71ddc
[JS/WebGPU] Avoid producing presentKey/presentValue outputs if pastKey/pastValue … (#21782)
Avoid producing presentKey/presentValue outputs if pastKey/pastValue
don't exists.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-19 18:02:19 -07:00
Wanming Lin
7ae0b4ce64
[WebNN EP] Support Erf and Trilu for CPU backend (#21768) 2024-08-19 07:56:16 -07:00
xhcao
417aa00406
[js/webgpu] fix conv1d error (#21585)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-18 15:45:13 -07:00
Jiajia Qin
c4ade796d6
[js/webgpu] Fix attention shader recompilation issue (#21770)
### Description
<!-- Describe your changes. -->

This PR fixes the `AttentionProbsSoftmax` recompilation issue when
executing the phi3 model. With this fix, it will further improve the
phi3 performance.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-17 17:15:15 -07:00
Yang Gu
49fc168eed
[js/webgpu] Handle negative axis in op Split (#21771)
This is to fix issue #21703, where the axis is a negative value in the
model. According to the spec
(https://onnx.ai/onnx/operators/onnx__Split.html), negative axis means
counting dimensions from the back.
2024-08-17 16:41:23 -07:00
Tianlei Wu
d79e3c5791
Extend Attention Bias Broadcast Support (#21710)
### Description
Previously, MultiHeadAttention supports relative position bias of shape
[1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention
supports [1, N, S, T]. This will extend the support to allow [1, N, S,
T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs.

- [x] Rename the input of "relative position bias" to "attention bias"
because it can also be used for other types of bias, like ALiBi
(Attention with Linear Biases) or attention mask.
- [x] Update unfused kernel to support broadcasting 2nd dimension of
attention bias.
- [x] Update efficient attention to support broadcasting 2nd dimension
of attention bias.
- [x] Update operators (MultiHeadAttention,
DecoderMaskedMultiHeadAttention, Attention, PackedAttention,
PackedMultiHeadAttention) to support broadcast attention bias on CUDA
and CPU EPs.
- [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that
those EPs do not support broadcasting attention_bias for now).
- [x] Add attention bias tests for MultiHeadAttention.
- [x] Update operator documents
- [x] Update benchmark script

Other changes:
* Fix some checks in multihead-attention.ts
* Add helper functions to dump tensors given dimensions.
2024-08-16 15:40:04 -07:00
Yulong Wang
ef2ccc477b
[js/web] Add support for int4/uint4 tensor (#21720)
### Description
Add support for int4/uint4 tensor.
2024-08-15 21:32:10 -07:00
Yulong Wang
d4d0bea1fb
[js] update docs for new code formatter (#21743)
### Description

Update README.md for code formatter change (#21728)
2024-08-15 20:17:08 -07:00
Yang Gu
f8efc086ce
[js/webgpu] Support Chrome Canary in unit tests (#21750)
Chrome Canary is helpful to test some new features. With this PR, we can
enable Chrome Canary in unit tests with command like "npm test -- op
abs.jsonc -b=webgpu -e=chromecanary".
2024-08-15 19:27:54 -07:00
Yulong Wang
abdc31de40
[js] change default formatter for JavaScript/TypeScript from clang-format to Prettier (#21728)
### Description

See
454996d496
for manual changes (excluded auto-generated formatting changes)

### Why

Because the toolsets for old clang-format is out-of-date. This reduces
the development efficiency.

- The NPM package `clang-format` is already in maintenance mode. not
updated since 2 years ago.
- The VSCode extension for clang-format is not maintained for a while,
and a recent Node.js security update made it not working at all in
Windows.

No one in community seems interested in fixing those.

Choose Prettier as it is the most popular TS/JS formatter.

### How to merge

It's easy to break the build:
- Be careful of any new commits on main not included in this PR.
- Be careful that after this PR is merged, other PRs that already passed
CI can merge.

So, make sure there is no new commits before merging this one, and
invalidate js PRs that already passed CI, force them to merge to latest.
2024-08-14 16:51:22 -07:00
Guenther Schmuelling
d82f15d0e3
add Gelu opset-20 to webgpu (#21725)
https://github.com/microsoft/onnxruntime/issues/21618
2024-08-14 09:45:05 -07:00
Xu Xing
7172aff1cf
[js/webgpu] Fix max pool shape end with 0 (#21698)
Bug: https://github.com/microsoft/onnxruntime/issues/21386

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-13 20:59:24 -07:00
Scott McKay
6af5394bd7
Replace usage of jcenter in React Native build.gradle files (#21714)
### Description
<!-- Describe your changes. -->
Replace jcenter. It's deprecated and not responding.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix CIs
2024-08-13 11:10:51 -07:00
xhcao
9c6ee89fa7
[js/webgpu] fix two errors of attention operator (#21687)
Fix two issues:
(1) scale shall be fp32 instead of f16
(2) Softmax program does not handle the normalized dispatch group values, so if the sequence length is over 65535, the result is not correct for this program.
2024-08-13 09:42:34 -07:00
Satya Kumar Jandhyala
51b2044120
[JS/WebGPU] Add Dequantizelinear operator (#21642)
### Description
Added DequantizeLinear operator for JSEP.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-09 14:44:19 -07:00
Yulong Wang
e6e4047a77
[js/web] update the build script for webgpu to enable model dump by default (#19707)
### Description
update the build script for webgpu to enable model dump by default

Now if using build_jsep.bat to build debug, the model dump is enabled.
Using
[`optimizedModelFilePath`](https://onnxruntime.ai/docs/api/js/interfaces/InferenceSession.SessionOptions.html#optimizedModelFilePath)
in session option can dump the optimized model in browser

### Motivation and Context
Helps to debug/rule out problems may related to model optimizer.
2024-08-09 05:55:34 -07:00
Yulong Wang
5e66fcc703
[js/web] allow op test to use f16 type for inputs/outputs (#21664)
### Description
allow op test to use f16 type for inputs/outputs.

This PR introduces "@petamoriken/float16" as Float16Array polyfill but
restricts it to be only used for test runner.
2024-08-08 09:56:37 -07:00
Prathik Rao
134f47743e
bumps up version in main from 1.19 -> 1.20 (#21588)
Bump up version in main from 1.19.0 to 1.20.0 since the release branch
has been cut.
2024-08-05 15:46:04 -07:00
Wanming Lin
8c641d7182
[WebNN EP] Support Dropout op (#21586)
### Description
WebNN only supports test mode, so we don't care about other inputs or
attributes about training mode, use WebNN's identity op to implement the
Dropout op directly.
2024-08-02 16:25:04 -07:00
Wanming Lin
1d4b161145
[WebNN EP] Support ConvTranspose for TFLite backend (#21291)
### Description
Chromium supports ConvTranspose for TFLite in
https://chromium-review.googlesource.com/c/chromium/src/+/5635194

With constraint that only default dilations and groups are supported.

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2024-07-30 17:46:08 -07:00
Yulong Wang
b03c9496aa
[js/web] allow load WebAssembly binary from buffer (#21534)
### Description

This PR adds a new option `ort.env.wasm.wasmBinary`, which allows user
to set to a buffer containing preload .wasm file content.

This PR should resolve the problem from latest discussion in #20876.
2024-07-29 13:39:38 -07:00
Xu Xing
0d7cf301a1
[js/webgpu] Add activation Tanh (#21540)
Bug:https://github.com/microsoft/onnxruntime/issues/21467

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-07-29 11:05:34 -07:00
Xu Xing
5bc12bf209
[js/webgpu] Add activation for conv3d naive (#21466)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-07-29 08:47:41 -07:00
Yulong Wang
dbff0cd098
[js/node] enable float16 support for Node.js binding (#20581)
### Description
enable float16 support for Node.js binding.

data of float16 tensor uses `Uint16Array`.
2024-07-28 13:03:17 -07:00
Wanming Lin
b6b29309a5
[WebNN EP] Update argMax/argMin to adapt to latest spec (#21452)
WebNN spec recently changes the definition of argMax/argMin:
- Remove selectLastIndex option, let backends decide to select the last
index or not.
- Move axes option to axis input
2024-07-25 17:07:01 -07:00
Changming Sun
7af39c6955
Update nodejs's cmake file to fix a file copy issue (#21390)
This commit e5f18ba2c1 caused some nightly
pipelines to fail. This PR fixes it.
It is because recently I changed our Linux library's SONAME. At runtime
onnxruntime_binding depends on libonnxruntime.so.1 , instead of
libonnxruntime.so.1.19.0(with the full version number). Therefore we
need to keep the libonnxruntime.so.1 symlink.

The packaging tools/ci_build/github/js/pack-npm-packages.ps1 still needs
be updated. I will address it in another PR.
2024-07-23 11:03:55 -07:00
mindest
5b9369e93c
Fix typos according to reviewdog report. (#21335)
### Description
Fix typos based on reviewdog report but with some
exceptions/corrections.
2024-07-22 13:37:32 -07:00
Yulong Wang
01df8c787d
[js/web] fix vulnerable version of dependencies (#21412)
### Description
```
# npm audit report

socket.io  3.0.0 - 4.6.2
Severity: high
socket.io has an unhandled 'error' event - https://github.com/advisories/GHSA-25hc-qcg6-38wj
Depends on vulnerable versions of engine.io
fix available via `npm audit fix`
node_modules/socket.io

ws  8.0.0 - 8.17.0
Severity: high
ws affected by a DoS when handling a request with many HTTP headers - https://github.com/advisories/GHSA-3h5v-q93c-6h6q
fix available via `npm audit fix`
node_modules/ws
  engine.io  0.7.8 - 0.7.9 || 6.0.0 - 6.5.4
  Depends on vulnerable versions of ws
  node_modules/engine.io
  socket.io-adapter  2.5.2 - 2.5.4
  Depends on vulnerable versions of ws
  node_modules/socket.io-adapter

4 high severity vulnerabilities
```
2024-07-19 11:11:30 -07:00
Xu Xing
92a8407b39
[js/webgpu] Remove unnecessary initialization of var (#21312)
This var has been initialized to 0 in tint, so no need extra loop to do
it again:
```
  float tint_symbol_52[1][4] = (float[1][4])0;
  {
    for(int tint_symbol_53 = 0; (tint_symbol_53 < 1); tint_symbol_53 = (tint_symbol_53 + 1)) {
      {
        for(int tint_symbol_54 = 0; (tint_symbol_54 < 4); tint_symbol_54 = (tint_symbol_54 + 1)) {
          tint_symbol_52[min(uint(tint_symbol_53), 0u)][min(uint(tint_symbol_54), 3u)] = 0.0f;
        }
      }
    }
  }
```
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-07-12 12:34:34 -07:00
pengwa
88336ffa92
Fix typos - 1st Wave (#21278)
### Description

There are so many typos reported by the review dog, [Optional Lint]
actions (example:
https://github.com/microsoft/onnxruntime/actions/runs/9864564489/job/27239732367),
this PR is to fix some of them.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-07-11 13:35:08 +08:00
Pavan Goyal
1b82d835d8
[Fix] InterOpNumThreads Session Option for ONNX ReactNative Package (#21263)
### Description
This PR resolves a bug related to setting the **interOpNumThreads**
session option when creating an **ORTSession**. Currently, when the
**interOpNumThreads** option is passed from React Native, the native
module incorrectly sets **intraOpNumThreads** instead of
**interOpNumThreads**.


### Motivation and Context
Since this is a bug, users of the Onnx React Native package may believe
that they are setting **interOpNumThreads** correctly, So this change is
required. Refer to the code snippet below for details
<img width="634" alt="Screenshot 2024-07-05 at 9 28 58 PM"
src="https://github.com/microsoft/onnxruntime/assets/88655321/70a8f216-553a-4f4c-9481-e6871f0e37e6">
2024-07-10 07:00:18 -07:00
Enrico Galli
4c3c809bdb
[js/webnn] Enable user-supplied MLContext (#20600)
### Description
This PR enables the API added in #20816 as well as moving context
creation to JS.

### Motivation and Context
In order to enable I/O Binding with the upcoming
[MLBuffer](https://github.com/webmachinelearning/webnn/issues/542) API
in the WebNN specification, we need to share the same `MLContext` across
multiple sessions. This is because `MLBuffer`s are restricted to the
`MLContext` where they were created. This PR enables developers to use
the same `MLContext` across multiple sessions.
2024-07-08 10:19:39 -07:00
Wanming Lin
cd516a1677
[WebNN EP] Remove constraint for conv ops on CPU backend (#21237)
Currently WebNN TFLite backend allows the filter of
conv2d/convTranspose2d be an input. Remove the constraint and operate
necessary transpose/reshape operations for the filter input.
2024-07-08 10:14:43 -07:00
Guenther Schmuelling
9eb1c2a7a3
support for layernorm in webgpu pre opset-17 (#21121)
handled the same way cpu does
2024-06-27 10:20:48 -07:00
Wanming Lin
41ad83fb00
[WebNN EP] Support rest Reduction ops for TFLite backend (#21135)
- reduceLogSum, reduceLogSumExp and reduceSumSquare have been landed in
https://chromium-review.googlesource.com/c/chromium/src/+/5575815
- reduceL1 and reduceL2 have been landed in
https://chromium-review.googlesource.com/c/chromium/src/+/5606091
2024-06-25 18:30:55 -07:00
Wanming Lin
4743803944
[WebNN EP] Support more Normalization ops for TFLite backend (#21151)
Following Normalization ops have been supported in Chromium for TFLite
backend:
- batchNormalization:
https://chromium-review.googlesource.com/c/chromium/src/+/5532745
- layerNormalization:
https://chromium-review.googlesource.com/c/chromium/src/+/5573326
- instanceNormalization:
https://chromium-review.googlesource.com/c/chromium/src/+/5532750
2024-06-24 19:04:23 -07:00
Wanming Lin
3a917e49fb
[WebNN EP] Support 4 more ops for TFLite backend (#21134)
Recently WebNN TFLite backend supports gelu, expand, softsign,
reciprocal.
2024-06-24 09:52:12 -07:00
Wanming Lin
0c80cd2157
[WebNN EP] Update Prelu restriction for CPU backend (#20878) 2024-06-20 11:04:01 -07:00
Xu Xing
c3076721f3
[js/webgpu] Support conv3d naive (#20706)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-06-19 10:13:50 -07:00
Wanming Lin
40879a2623
[WebNN EP] Enable Cast op for WebNN CPU backend (#20864)
WebNN TFLite backend supports `cast` op but doesn't support casting to
`uint64` data type.
2024-06-19 01:51:19 -07:00
Wanming Lin
35c430a95a
[WebNN EP] Enable several ops for WebNN CPU backend (#20847)
WebNN CPU implementation has been migrated from XNNPack to TFLite which
supports more ops. Turn on partial `cpu` supported ops which just need
the change from `false` to `true` firstly.
2024-06-19 01:45:31 -07:00
Yulong Wang
5e81fa8aec
[js] fix vulnerability CVE-2024-4068: upgrade braces to 3.0.3 (#21078)
### Description

Upgrade `braces` to 3.0.3

[CVE-2024-4068](https://github.com/advisories/GHSA-grv7-fg5c-xmjg)

```
# npm audit report

braces  <3.0.3
Severity: high
Uncontrolled resource consumption in braces - https://github.com/advisories/GHSA-grv7-fg5c-xmjg
fix available via `npm audit fix`
node_modules/braces

1 high severity vulnerability
```
2024-06-18 16:02:08 -07:00
Yulong Wang
631a2c16be
[js/web] skip default locateFile() when dynamic import is disabled (#21073)
### Description
skip default `locateFile()` when dynamic import is disabled. This allows
the file to work with bundlers to load WebAssembly file correctly if
`env.wasm.wasmPaths` is not set.
2024-06-18 12:21:45 -07:00
Yang Gu
1473d66a00
[js/webgpu] Prefer adapter.info to adapter.requestAdapterInfo (#21065)
WebGPU is deprecating async adapter.requestAdapterInfo, and replacing it
with sync adapter.info.
Spec change: https://github.com/gpuweb/gpuweb/pull/4662
2024-06-18 12:02:38 -07:00
Jian Chen
4e18b0b7ce
Upgrade braces from 3.0.2 to 3.0.3 to fix the vulnerability (#21022) 2024-06-12 18:02:52 -07:00
Yulong Wang
dd805ff77d
[js/web] ESM: use the bundled target as default export (#20991)
### Description
ESM: use the bundled target as default export

In this change, the default import of the following entries:
```
import from 'onnxruntime-web';
import from 'onnxruntime-web/all';
import from 'onnxruntime-web/webgpu';
```
will use the "bundled" version, which has no dynamic import.

This change should only apply to ESM on web.
2024-06-11 11:14:55 -07:00
Wanming Lin
043ef5c95f
[WebNN EP] Support latest WebNN softmax op (#20827)
Latest WebNN softmax supports N-D input and axis parameter.
2024-06-11 08:27:14 -07:00
Edward Chen
981893c318
Remove deprecated "mobile" packages (#20941)
# Description

This PR removes the building of the ORT "mobile" packages and much of the associated infrastructure which is no longer needed.

Not removed yet - tools/ci_build/github/android/mobile_package.required_operators.config and the helper scripts that depend on it.

# Motivation and Context

The mobile packages were deprecated in 1.18. Users should use the full packages (Android - onnxruntime-android, iOS - onnxruntime-c/onnxruntime-objc) instead or do a custom build.
2024-06-07 16:20:32 -05:00
Wanming Lin
52874f628a
[WebNN EP] Remove some constraints for CPU backend (#20900)
Following constraints have been supported by WebNN TFLite backend:
- Concat: supports up to 4 inputs
- Matmul: supports broadcasting
- Resize: supports nearest mode
- Split: supports up to 4 outputs
2024-06-06 08:22:41 -07:00
Wanming Lin
da1f8f9274
[WebNN EP] TFLite backend only supports limit ranges for Clip (#20863) 2024-06-06 08:22:18 -07:00