Commit graph

440 commits

Author SHA1 Message Date
Yulong Wang
e771a763c3
[js/test] align web test runner flags with ort.env (#19790)
### Description
the `npm test` flags are difficult to memorize, because they are
different to the `ort.env` flags. This change makes those flags align
with ort JS API. eg. `--wasm-enable-proxy` became `--wasm.proxy`.

Old flags are marked as deprecated except `-x` (as a shortcut of
`--wasm.numThreads`)
2024-03-13 12:00:36 -07:00
Satya Kumar Jandhyala
ed250b88c3
[JS/WebGPU] Optimize MatMulNBits (#19852)
### Description
Use vec<2> or vec<4>, operands in MatMulNBits


### Motivation and Context
Improve performance
2024-03-13 10:33:14 -07:00
Yang Gu
53de2d8cb0
[js/webgpu] Enable GroupedConvVectorize path (#19791)
Vectorize met 2 failed cases in a CI bot with NVIDIA GPU, but we
couldn't repro with all the GPUs at hand, including NVIDIA GPUs. This PR
introduces GPUAdapterInfo and enables this opt on non-NVIDIA GPUs to
make the bots happy.
No obivous perf gain can be seen if we enable vectorize on NVIDIA.
However, it shows big perf improvement on Intel. On my Gen12 Intel GPU,
mobilenetv2-12 perf was improved from 11.14ms to 7.1ms.
2024-03-12 22:25:07 -07:00
Yulong Wang
4538d31a8b
[js/webgpu] expose a few properties in WebGPU API (#19857)
### Description
This change exposes a few properties in `ort.env.webgpu` to resolve
feature requirement mentioned in properties in
https://github.com/microsoft/onnxruntime/pull/14579#discussion_r1519612619.

- Add `powerPreference` and `forceFallbackAdapter` in `ort.env.webgpu`,
to allow users to set the value of the properties before the first
inference session is created.
- Add readonly property `adapter` in `ort.env.webgpu` to allow users to
get the adapter instance. Now users can access `ort.env.webgpu.device`
and `ort.env.webgpu.adapter`.

@xenova @beaufortfrancois
2024-03-12 19:50:51 -07:00
Satya Kumar Jandhyala
24b72d2613
[JS/WebGPU] Preserve zero size input tensor dims. (#19737)
### Description
For Concat operation, the zero-size input tensor shape need to be
preserved and, unlike non-zero tensors, the dims are not constrained to
match other input tensors' dims.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-07 19:07:49 -08:00
Yulong Wang
a788514027
[js/web] dump debug logs for karma for diagnose purpose (#19785)
### Description
dump debug logs for karma for diagnose purpose.

This is for debugging the CI issue of Chrome launch failure and
considered temporary.
2024-03-05 18:27:26 -08:00
Yulong Wang
f06164ef8b
[js/web] transfer input buffer back to caller thread (#19677)
### Description

When using proxy worker, input buffers should be transferred back to the
caller thread after `run()` call is done.

Fixes #19488
2024-03-01 14:50:06 -08:00
Yulong Wang
e30618d055
[js/webgpu] use Headless for webgpu test by default (#19702)
### Description
use Chromium Headless for webgpu test by default. Still use normal
Chromium with window when debug=true or perfMode=true.

Use the
[`--headless=new`](https://developer.chrome.com/docs/chromium/new-headless)
mode.



### Motivation and Context
try to use a more stable way to launch npm tests to avoid a "chrome not
found" issue in pipeline, which may potentially caused by windowed
application.
2024-02-28 16:05:08 -08:00
Yulong Wang
3cb81cdde2
[js/common] move 'env.wasm.trace' to 'env.trace' (#19617)
### Description

Try to move 'env.wasm.trace' to 'env.trace' to make it less confusing,
because it also works in webgpu. Marked 'env.wasm.trace' as deprecated.
2024-02-27 11:07:15 -08:00
Yulong Wang
0edb035808
[js/web] fix suite test list for zero sized tensor (#19638)
### Description

Fixes build break brought by #19614

Currently WebGL backend does not support zero sized tensor. This change
split test data into 2 parts, and only enable zero sized tensor tests
for WebGPU.
2024-02-24 10:09:07 -08:00
Guenther Schmuelling
bb43a0f133
[js/webgpu] minor fixes to make tinyllama work (#19564) 2024-02-23 15:45:30 -08:00
Yulong Wang
aec2389ad0
[js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614)
### Description
This PR allows zero-sized output.

To make the implementation simple, it does not support partial
zero-sized tensor. Which means, either all outputs are zero-sized, or an
error will be reported.

added 2 tests:
 - op test of `Add` with input T[2,0] T[2,1], and
 - test_split_zero_size_splits
2024-02-23 12:52:47 -08:00
satyajandhyala
ae3d73c981
[JS/WebGPU] Fix Split and Where to handle corner cases. (#19613)
### Description
<!-- Describe your changes. -->
1. Fix Where operator to handle Boolean input less than 4 bytes.
2. Fix JSEP test harness to use tensor names consistently.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-23 00:21:15 -08:00
Xu Xing
fe82fccf1a
[js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596)
This is used in sam-h-decoder-f16.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-22 13:09:28 -08:00
Matttttt
ebd220b073
Misspelling in README.md (#19433)
Fixed a misspelling.
2024-02-21 13:38:18 -08:00
Xu Xing
57d6819212
[js/web] Fix fused-conv is not included in npm test (#19581)
BUG: https://github.com/microsoft/onnxruntime/issues/18855

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-21 08:08:47 -08:00
Yulong Wang
58f4921686
[js] changes to allow Float16Array if any polyfill is available (#19305)
### Description

This change adds only necessary code to enable ort-web works with any
Float16Array polyfill. Unlike #19302, in this PR, ort-web does not
include any specific polyfill; instead, it's user's choice for how to
use a polyfill.

ORT-web uses Float16Array if it's available; otherwise, fallback to use
Uint16Array.

```js
// case 1: user does not use polyfill:
import * as ort from 'onnxruntime-web';

const myF16Data = new Uint16Array(...);  // need to use Uint16Array
const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
```

```js
// case 2: user use polyfill:
import * as ort from 'onnxruntime-web';
import {
  Float16Array, isFloat16Array, isTypedArray,
  getFloat16, setFloat16,
  f16round,
} from "@petamoriken/float16";
globalThis.Float16Array = Float16Array;  // ort-web will pick the global Float16Array

const myF16Data = new Float16Array(...);  // Use the polyfilled Float16Array type
const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
```
2024-02-21 00:31:06 -08:00
Yulong Wang
70567a4b3a
[js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358)
### Description
use ApiTensor insteadof onnxjs Tensor in TensorResultValidator. Make
test runner less depend on onnxjs classes.
2024-02-20 17:33:21 -08:00
Yulong Wang
3fe2c137ee
[js] small fix to workaround formatter (#19400)
### Description
Rename shader variable names to snake_case naming and also to avoid
formatter behaving inconsistently in win/linux.
2024-02-20 17:23:01 -08:00
Jiajie Hu
1b48054e1b
[js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
### Description
This is required to make shape uniforms really work.

### Motivation and Context
The bug was unveiled in a model with multiple Split nodes. The later
nodes would try to reuse a previous pipeline cache, while the old shapes
were hardcoded as constants in cache.
2024-02-20 09:24:34 -08:00
satyajandhyala
dfeda9019c
[JS/WebGPU] Add MatMulNBits (#19446)
### Description
Add MatMulNBits to support MatMul using 4-bit quantized weights



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-17 09:19:17 -08:00
Yulong Wang
06269a3952
[js/webgpu] allow uint8 tensors for webgpu (#19545)
### Description
allow uint8 tensors for webgpu
2024-02-16 18:28:27 -08:00
Yulong Wang
03be65e064
[js/web] fix types exports in package.json (#19458)
### Description

Since TypeScript v4.7, types need to specify inside "exports" field when
it is available. This PR appends types just before each "default" (which
is required by spec to be the last item).

Fixes #19403.
2024-02-08 15:56:48 -08:00
Yulong Wang
5ff27ef02a
[js/webgpu] support customop FastGelu (#19392)
### Description
Support WebGPU custom operator FastGelu.
2024-02-06 09:07:31 -08:00
Jiajia Qin
ccbe264a39
[js/webgpu] Add LeakyRelu activation for fusedConv (#19369)
### Description
This PR 1) adds LeakyRelu activation for fusedConv; 2) makes `vec4<f16>`
value work with `float32` uniforms attributes.

For example:
`clamp(value, vec4<f16>(uniforms.clip_min),
vec4<f16>(uniforms.clip_max)` will throw compilation errors since
`uniforms.clip_min` and `uniforms.clip_min` are `f32` not `f16`. So we
need to change it to `clamp(value, vec4<f16>(f16(uniforms.clip_min)),
vec4<f16>(f16(uniforms.clip_max))`

And above problem was introduced when we make activation attributes as
uniforms instead of constant.

BTW, after adding LeakyRelu, `realesrgan-t256` model can pass.
2024-02-02 09:06:38 -08:00
Yulong Wang
50806a7dd5
[js/web] support external data in npm test (#19377)
### Description
support external data in npm test.

This allows test runner to detect whether an external data is available
in the test folder, and if it is, load it as external data
automatically.

this feature does not parse every model to figure out whether the model
has external data. the following comments in code explained how to
determine whether should parse the model file.

```js
      // for performance consideration, we do not parse every model. when we think it's likely to have external
      // data, we will parse it. We think it's "likely" when one of the following conditions is met:
      // 1. any file in the same folder has the similar file name as the model file
      //    (e.g., model file is "model_abc.onnx", and there is a file "model_abc.pb" or "model_abc.onnx.data")
      // 2. the file size is larger than 1GB
```
2024-02-02 09:05:57 -08:00
Jiajia Qin
efc17e79de
[js/webgpu] Fix the undefined push error (#19366)
### Description
This PR fixes below errors when enable webgpu profiling: 
```
TypeError: Cannot read properties of undefined (reading 'push')
```
2024-02-02 02:04:06 -08:00
Xu Xing
3a2ab1963a
[js/webgpu] Refactor createTensorShapeVariables (#18883) 2024-02-01 17:59:00 -08:00
Yulong Wang
dd1f6ccc45
[js/webgpu] resolve codescan alert (#19343)
### Description
resolve codescan alert:
https://github.com/microsoft/onnxruntime/security/code-scanning/17687
2024-01-30 21:06:21 -08:00
Xu Xing
d73131cf0f
[js/webgpu] Use DataType as uniform cpu type (#19281)
This saves turning data type to string by tensorDataTypeEnumToString.
2024-01-30 21:05:08 -08:00
Jiajia Qin
85cef0af8c
[js/webgpu] Support capture and replay for jsep (#18989)
### Description
This PR expands the graph capture capability to JS EP, which is similar
to #16081. But for JS EP, we don't use the CUDA Graph, instead, we
records all gpu commands and replay them, which removes most of the cpu
overhead to avoid the the situation that gpu waiting for cpu.

mobilenetv2-12 becomes 3.7ms from 6ms on NV 3090 and becomes 3.38ms from
4.58ms on Intel A770.

All limitations are similar with CUDA EP:
1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
2. Usage of graph capture is limited to models where-in all ops in the
model can be partitioned to the JS EP or CPU EP and no memory copy
between them.
3. Shapes of inputs/outputs cannot change across inference calls.
4. IObinding is required.

The usage is like below:
Method 1: specify outputs buffers explicitly.
```
    const sessionOptions = {
        executionProviders: [
          {
            name: "webgpu",
          },
        ],
        enableGraphCapture: true,
      };
    const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions);
   
    // prepare the inputBuffer/outputBuffer
    ... ...

   const feeds = {
       'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims })
   };

   const fetches = {
       'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] })
   };

   let results = await session.run(feeds, fetches);  // The first run will begin to capture the graph.

   // update inputBuffer content
  ... ...
   results = = await session.run(feeds, fetches);  // The 2ed run and after will directly call replay to execute the graph.

  ... ...
   session.release();
```
Method 2: Don't specify outputs buffers explicitly. Internally, when
graph capture is enabled, it will set all outputs location to
'gpu-buffer'.
```
    const sessionOptions = {
        executionProviders: [
          {
            name: "webgpu",
          },
        ],
        enableGraphCapture: true,
      };
    const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions);

    // prepare the inputBuffer
    ... ...

   const feeds = {
       'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims })
   };

   let results = await session.run(feeds);  // The first run will begin to capture the graph.
   
   // update inputBuffer content
  ... ...
   results = = await session.run(feeds);  // The 2ed run and after will directly call replay to execute the graph.

  ... ...
   session.release();
2024-01-30 18:28:03 -08:00
Jiajia Qin
90883a366a
[js/webgpu] Add hardSigmoid activation for fusedConv (#19233)
### Description
Add hardSigmoid activation for fusedConv. It will be used by
mobilenetv3-small-100 model.
2024-01-30 16:28:53 -08:00
Xu Xing
624b4e2063
[js/webgpu] Remove enableShapesUniforms (#19279) 2024-01-29 17:49:06 -08:00
Guenther Schmuelling
9e69606360
fix f16 for attention, enable slice and flatten for more types (#19262) 2024-01-29 10:13:46 -08:00
Xu Xing
a3f0e2422b
[js/webgpu] Support f16 uniform (#19098)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-25 16:58:22 -08:00
Xu Xing
656ca66186
[js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753) 2024-01-25 15:37:05 -08:00
Jiajie Hu
5b06505073
[js/webgpu] Fix Tanh explosion (#19201)
### Description
```math
\tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}=
\left\{
\begin{array}{cc}
-\frac{1-e^{-2\cdot(-x)}}{1+e^{-2\cdot(-x)}}, & x<0 \\
0, & x=0 \\
\frac{1-e^{-2x}}{1+e^{-2x}}, & x>0
\end{array}
\right.
```

### Motivation and Context
On some platforms,
$$\tanh(1000)=\frac{e^{1000}-e^{-1000}}{e^{1000}+e^{-1000}}$$ would
produce NaN instead of 0.999... or 1 (imagine $e^{1000}=\infty$ and
$\frac{\infty}{\infty}$ explodes).
2024-01-25 08:25:35 -08:00
Wanming Lin
7252c6e747
[WebNN EP] Support WebNN async API with Asyncify (#19145) 2024-01-24 15:37:35 -08:00
Yang Gu
591f90c0b9
[js/webgpu] Fix issue of timestamp query (#19258)
When we enable webgpu profiling mode between session.create and
session.run, current implementation has a problem to create querySet
(and also queryResolveBuffer) if we share the commandEncoder with inputs
upload. This PR fixes this by moving the querySet creation to the place
we set queryType.
2024-01-24 14:49:37 -08:00
satyajandhyala
a33b5bd1fa
[JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788)
### Description
Added Uniforms to SkipLayerNorm



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve performance

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-01-25 01:12:21 +05:30
Xu Xing
61610ff986
[js/webgpu] Add FusedConv clip test case (#18900)
Bug: https://github.com/microsoft/onnxruntime/issues/18899
2024-01-23 08:25:05 -08:00
Jiajia Qin
d226e40856
[js/webgpu] set query type in onRunStart (#19202)
### Description
<!-- Describe your changes. -->
`env.webgpu.profiling` is a global flag. It may change before each
session.run. So the best place is to update it in `onRunStart` event.
After this, we can directly check `this.queryType`'s value. Without this
pr, we need to make sure that `getCommandEncoder()` is called before
checking `this.queryType`. Otherwise, it may happen that
`pendingKernels`'s length is not equal to `pendingDispatchNumber`'s
length. See the two ugly workarounds
[1)](e630dbf528 (diff-006fc84d3997f96a29b8033bd2075d6a0a9509211bd5812a6b934fc74fedfd9dR267-R268))
and
[2)](e630dbf528 (diff-618fe297fbe7a1da586380163b8fd2627311ccc217640a3c5cdc9c17a33472c1R73-R80))
if we don't introduce `onRunStart`. Or we need to call `setQueryType` in
each kernel run.
2024-01-22 16:08:55 -08:00
Jiajia Qin
2e0a388c36
[js/webgpu] Add HardSigmoid support (#19215)
### Description
This op is required in mobilenetv3-small-100. With this PR,
mobilenetv3-small-100 model becomes less than 10 ms from over 100 ms on
ADL.
2024-01-22 15:53:26 -08:00
Yulong Wang
d69b622ef4
[js/web] upgrade dependency packages version (#19193)
### Description
upgrade packages version.

```
# npm audit report

electron  23.0.0-alpha.1 - 23.3.13
Severity: moderate
ASAR Integrity bypass via filetype confusion in electron - https://github.com/advisories/GHSA-7m48-wc93-9g85
fix available via `npm audit fix --force`
Will install electron@28.1.4, which is a breaking change
node_modules/electron

get-func-name  <2.0.1
Severity: high
Chaijs/get-func-name vulnerable to ReDoS - https://github.com/advisories/GHSA-4q6p-r6v2-jvc5
fix available via `npm audit fix`
node_modules/get-func-name

semver  <=5.7.1 || 6.0.0 - 6.3.0 || 7.0.0 - 7.5.1
Severity: moderate
semver vulnerable to Regular Expression Denial of Service - https://github.com/advisories/GHSA-c2qf-rxjj-qqgw
semver vulnerable to Regular Expression Denial of Service - https://github.com/advisories/GHSA-c2qf-rxjj-qqgw
semver vulnerable to Regular Expression Denial of Service - https://github.com/advisories/GHSA-c2qf-rxjj-qqgw
fix available via `npm audit fix`
node_modules/cross-spawn/node_modules/semver
node_modules/global-agent/node_modules/semver
node_modules/semver
```
2024-01-18 13:45:42 -08:00
Yulong Wang
f87e69801f
[js/web] show warning when numThreads is set but threads is not supported (#19179)
### Description
show warning when numThreads is set but threads is not supported.
Resolves #19148, #18933

for web: when crossOriginIsolated is false.
for node: always disable.
2024-01-17 15:04:22 -08:00
Yulong Wang
146ebaf91e
[js/web] allow proxy to load model with 1GB <= size < 2GB (#19178)
### Description

allow proxy to load model with 1GB <= size < 2GB

resolves #19157.
2024-01-17 15:03:43 -08:00
Rachel Guo
bd9d8fb2a5
[ORT 1.17.0 release] Bump up version to 1.18.0 (#19170)
### Description
<!-- Describe your changes. -->

Bump up version to 1.18.0 since the release branch has been cut.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2024-01-17 11:18:32 -08:00
Guenther Schmuelling
9dee543bed
fix gemm beta for fp16 (#19153)
per onnx spec beta is always fp32 so we need to cast it
2024-01-15 18:40:38 -08:00
Yulong Wang
f917dde717
[web] remove xnnpack from web backends (#19116)
### Description
XNNPACK is already disabled in web assembly build. This change removes
the xnnpack backend registration in JS.
2024-01-13 23:04:02 -08:00
Yang Gu
e803f8eb0f
[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894)
We submit kernels in a batch (a fixed number 16 is used except for the
last batch) for better performance. However, timestamp query support is
at pass level so we disable the batch execution in profiling mode in
previous implementation. Actually we can have multiple passes in a batch
so that we don't have to disable batch execution, which is the first
enhancement of this PR.
Furthermore, WebGPU has an extension to support timestamp query inside
passes, which isn't supported by all the platforms (e.g., Windows
supports it, while macOS doesn't). This is expected to have lower cost
compared with multiple passes solution. So this PR also introduce this
support when available.
This PR also refactors some implementation related to kernelInfo, and
try to unify the related kernel names.
2024-01-13 00:23:17 -08:00