Commit graph

384 commits

Author SHA1 Message Date
xhcao
9c6ee89fa7
[js/webgpu] fix two errors of attention operator (#21687)
Fix two issues:
(1) scale shall be fp32 instead of f16
(2) Softmax program does not handle the normalized dispatch group values, so if the sequence length is over 65535, the result is not correct for this program.
2024-08-13 09:42:34 -07:00
Satya Kumar Jandhyala
51b2044120
[JS/WebGPU] Add Dequantizelinear operator (#21642)
### Description
Added DequantizeLinear operator for JSEP.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-08-09 14:44:19 -07:00
Prathik Rao
134f47743e
bumps up version in main from 1.19 -> 1.20 (#21588)
Bump up version in main from 1.19.0 to 1.20.0 since the release branch
has been cut.
2024-08-05 15:46:04 -07:00
Yulong Wang
b03c9496aa
[js/web] allow load WebAssembly binary from buffer (#21534)
### Description

This PR adds a new option `ort.env.wasm.wasmBinary`, which allows user
to set to a buffer containing preload .wasm file content.

This PR should resolve the problem from latest discussion in #20876.
2024-07-29 13:39:38 -07:00
Xu Xing
0d7cf301a1
[js/webgpu] Add activation Tanh (#21540)
Bug:https://github.com/microsoft/onnxruntime/issues/21467

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-07-29 11:05:34 -07:00
Xu Xing
5bc12bf209
[js/webgpu] Add activation for conv3d naive (#21466)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-07-29 08:47:41 -07:00
mindest
5b9369e93c
Fix typos according to reviewdog report. (#21335)
### Description
Fix typos based on reviewdog report but with some
exceptions/corrections.
2024-07-22 13:37:32 -07:00
Xu Xing
92a8407b39
[js/webgpu] Remove unnecessary initialization of var (#21312)
This var has been initialized to 0 in tint, so no need extra loop to do
it again:
```
  float tint_symbol_52[1][4] = (float[1][4])0;
  {
    for(int tint_symbol_53 = 0; (tint_symbol_53 < 1); tint_symbol_53 = (tint_symbol_53 + 1)) {
      {
        for(int tint_symbol_54 = 0; (tint_symbol_54 < 4); tint_symbol_54 = (tint_symbol_54 + 1)) {
          tint_symbol_52[min(uint(tint_symbol_53), 0u)][min(uint(tint_symbol_54), 3u)] = 0.0f;
        }
      }
    }
  }
```
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-07-12 12:34:34 -07:00
pengwa
88336ffa92
Fix typos - 1st Wave (#21278)
### Description

There are so many typos reported by the review dog, [Optional Lint]
actions (example:
https://github.com/microsoft/onnxruntime/actions/runs/9864564489/job/27239732367),
this PR is to fix some of them.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-07-11 13:35:08 +08:00
Enrico Galli
4c3c809bdb
[js/webnn] Enable user-supplied MLContext (#20600)
### Description
This PR enables the API added in #20816 as well as moving context
creation to JS.

### Motivation and Context
In order to enable I/O Binding with the upcoming
[MLBuffer](https://github.com/webmachinelearning/webnn/issues/542) API
in the WebNN specification, we need to share the same `MLContext` across
multiple sessions. This is because `MLBuffer`s are restricted to the
`MLContext` where they were created. This PR enables developers to use
the same `MLContext` across multiple sessions.
2024-07-08 10:19:39 -07:00
Xu Xing
c3076721f3
[js/webgpu] Support conv3d naive (#20706)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-06-19 10:13:50 -07:00
Yulong Wang
631a2c16be
[js/web] skip default locateFile() when dynamic import is disabled (#21073)
### Description
skip default `locateFile()` when dynamic import is disabled. This allows
the file to work with bundlers to load WebAssembly file correctly if
`env.wasm.wasmPaths` is not set.
2024-06-18 12:21:45 -07:00
Yang Gu
1473d66a00
[js/webgpu] Prefer adapter.info to adapter.requestAdapterInfo (#21065)
WebGPU is deprecating async adapter.requestAdapterInfo, and replacing it
with sync adapter.info.
Spec change: https://github.com/gpuweb/gpuweb/pull/4662
2024-06-18 12:02:38 -07:00
Guenther Schmuelling
c749bd997a
webgpu quickgelu (#20939) 2024-06-06 08:21:33 -07:00
Yulong Wang
ab9f153746
[js/web] allow build target for non dynamic import (#20898)
### Description
<!-- Describe your changes. -->

This PR allows to build ORT web to `ort{.all|.webgpu}.bundle.min.mjs`,
which does not have any dynamic import. This makes it possible to use
ort web via static import in service worker.

Fixes #20876
2024-06-03 12:33:37 -07:00
Yulong Wang
35697d2421
[js/webnn] update API of session options for WebNN (#20816)
### Description

This PR is an API-only change to address the requirements being
discussed in #20729.

There are multiple ways that users may create an ORT session by
specifying the session options differently.

All the code snippet below will use the variable `webnnOptions` as this:
```js
const myWebnnSession = await ort.InferenceSession.create('./model.onnx', {
   executionProviders: [
     webnnOptions
   ]
});
```

### The old way (backward-compatibility)

```js
// all-default, name only
const webnnOptions_0 = 'webnn';

// all-default, properties omitted
const webnnOptions_1 = { name: 'webnn' };

// partial
const webnnOptions_2 = {
  name: 'webnn',
  deviceType: 'cpu'
};

// full
const webnnOptions_3 = {
  name: 'webnn',
  deviceType: 'gpu',
  numThreads: 1,
  powerPreference: 'high-performance'
};
```

### The new way (specify with MLContext)

```js
// options to create MLcontext
const options = {
  deviceType: 'gpu',
  powerPreference: 'high-performance'
};

const myMlContext = await navigator.ml.createContext(options);

// options for session options
const webnnOptions = {
  name: 'webnn',
  context: myMlContext,
  ...options
};
```

This should throw (because no deviceType is specified):
```js
const myMlContext = await navigator.ml.createContext({ ... });
const webnnOptions = {
  name: 'webnn',
  context: myMlContext
};
```

### Interop with WebGPU
```js
// get WebGPU device
const adaptor = await navigator.gpu.requestAdapter({ ... });
const device = await adaptor.requestDevice({ ... });

// set WebGPU adaptor and device
ort.env.webgpu.adaptor = adaptor;
ort.env.webgpu.device = device;

const myMlContext = await navigator.ml.createContext(device);
const webnnOptions = {
  name: 'webnn',
  context: myMlContext,
  gpuDevice: device
};
```

This should throw (because cannot specify both gpu device and MLContext
option at the same time):
```js
const webnnOptions = {
  name: 'webnn',
  context: myMlContext,
  gpuDevice: device,
  deviceType: 'gpu'
};
```
2024-05-31 03:25:14 -07:00
Xu Xing
25ac65375c
[js/webgpu] Fix mha name (#20860)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-05-30 00:01:06 -07:00
Guenther Schmuelling
33a68d221f
add missing file for pr20791 (#20811)
this file should have been in pr20791 to allow fp16 in the tile
implementation
2024-05-24 09:59:13 -07:00
Satya Kumar Jandhyala
bab5037eab
Eliminate explicit Concat operations in Attention (#20556)
### Description
Remove explicitly concatinating pastKey with Key and pastValue with
Value.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-05-24 09:07:57 -07:00
Xu Xing
f1fef19b6e
[js/webgpu] Support shared memory for transpose 2d (#19267)
For 1024x1024, without shared memoey, 18.7ms. With shared memory 13.2ms.
2024-05-22 08:15:44 -07:00
Yulong Wang
036fcd93d4
[js/web] optimize module export and deployment (#20165)
### Description

This PR make numbers of optimizations to onnxruntime-web's module export
and deployment.

See each section below for more details.

#### Preview

>
[onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21)

> ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~

> ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~

> ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~

<details>
<summary><h4>Breaking changes</h4></summary>

There is no code change required, but there are a few differences
regarding **code import**, **flags**, **bundler config** and
**deployment steps**.

#### Importing:

Import table is changed. See following for details.

<details>
<summary><h5>Current import table:</h5></summary>

| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
  |------|-----|-----|-----|-----|-----|-----|
  | `ort` (default) | `onnxruntime-web` | ✔️ |  | ✔️ | ✔️ |  |
  | `ort.all` | `onnxruntime-web/experimental` | ✔️ | ✔️ | ✔️ | ✔️ |  |
  | `ort.node` | `onnxruntime-web` |  |  | ✔️ |  |  |
| `ort.training` | `onnxruntime-web/training` |  |  | ✔️ |
✔️<sup>\[1]</sup> | ✔️ |
  | `ort.wasm` | `onnxruntime-web/wasm` |  |  | ✔️ | ✔️ |  |
  | `ort.wasm-core` | `onnxruntime-web/wasm-core` |  |  | ✔️ |  |  |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ |  |  | ✔️<sup>\[2]</sup>
|  |
  | `ort.webgpu` | `onnxruntime-web/webgpu` |  | ✔️ | ✔️ | ✔️ |  |

* [1] didn't test. may not actually work.
* [2] not working. this is a mistake in build config.

</details>

<details>
<summary><h5>Proposed update:</h5></summary>

| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
  |------|-----|-----|-----|-----|-----|-----|
  | `ort` (default) | `onnxruntime-web` | ✔️ |  | ✔️ | ✔️ |  |
| `ort.all` |
~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` | ✔️ | ✔️ |
✔️ | ✔️ |  |
  | `ort.node` | `onnxruntime-web` |  |  | ✔️ |  |  |
  | `ort.training` | `onnxruntime-web/training` |  |  | ✔️ | ✔️ | ✔️ |
  | `ort.wasm` | `onnxruntime-web/wasm` |  |  | ✔️ | ✔️ |  |
| ~~`ort.wasm-core`~~ | ~~`onnxruntime-web/wasm-core`~~ | ~~~~ | ~~~~
| ~~✔️~~ | ~~~~ | ~~~~ |
  | `ort.webgl` | `onnxruntime-web/webgl` | ✔️ |  |  | ~~✔️~~  |  |
  | `ort.webgpu` | `onnxruntime-web/webgpu` |  | ✔️ | ✔️ | ✔️ |  |

</details>

#### Flags:

The following flags are deprecated:
- `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in
build.

The following flags changed their type:
- `env.wasm.wasmPaths`: When using this flag as a string ( for the URL
prefix ), nothing is changed. When using this flag as an object ( for
per-file path override ), the type changed:
  ```diff
  -  export interface Old_WasmFilePaths{
  -    'ort-wasm.wasm'?: string;
  -    'ort-wasm-threaded.wasm'?: string;
  -    'ort-wasm-simd.wasm'?: string;
  -    'ort-training-wasm-simd.wasm'?: string;
  -    'ort-wasm-simd-threaded.wasm'?: string;
  -  };
  +  export interface New_WasmFilePaths {
  +    /**
  +     * Specify the override path for the main .wasm file.
  +     *
  +     * This path should be an absolute path.
  +     *
  +     * If not modified, the filename of the .wasm file is:
  +     * - `ort-wasm-simd-threaded.wasm` for default build
+ * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and
WebNN)
  +     * - `ort-training-wasm-simd-threaded.wasm` for training build
  +     */
  +    wasm?: URL|string;
  +    /**
  +     * Specify the override path for the main .mjs file.
  +     *
  +     * This path should be an absolute path.
  +     *
  +     * If not modified, the filename of the .mjs file is:
  +     * - `ort-wasm-simd-threaded.mjs` for default build
+ * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and
WebNN)
  +     * - `ort-training-wasm-simd-threaded.mjs` for training build
  +     */
  +    mjs?: URL|string;
  +  }
  ```

#### Bundler compatibility:

Config changes are need for bundlers. See usage example in
/js/web/test/e2e/ for Webpack, parcel and rollup.

#### Deployment:

- if consuming from a CDN, there is no breaking change.
- if consuming from a local server, need to copy all `ort-*.wasm` and
`ort-*.mjs` files (totally 6 files) in the dist folder. (previously only
need to copy `ort-*.wasm` files.)

</details>
<details>
<summary><h4>Problems</h4></summary>

There are a few problems with the current module export and deployment:

- Script URL cannot be correctly inferred when imported as ESM.
- Workers are forcefully encoded using Blob URL, which makes
onnxruntime-web not working in CSP environment and Node.js, when using
proxy or multi-threading feature.
- Generated JS code (by Emscripten) is encoded using
`function.toString()`, which is unstable and error-prone.
- When running with a different Emscripten build, always need the build
step. Making it difficult to swap artifacts in deveopment/debug.
</details>
<details>
<summary><h4>Goals</h4></summary>

- Full ESM support
- Support variances of ways to import. Including:
- import from HTML's `<script>` tag (IIFE format, exporting to global
variable `ort`)
    ```html
<script
src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script>
    ```
  - import from source code inside `<script type="module">` tag (ESM)
    ```html
    <script type="module">
import * as ort from
"https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs";

      // using 'ort'
    </script>
    ```
- import in a CommonJS project (CJS format, resolve from package.json
"exports" field)
    ```js
    // myProject/main.js
    const ort = require('onnxruntime-web');
    ```
- import in an ESM project (ESM format, resolve from package.json
"exports" field)
    ```js
    // myProject/main.js (or main.mjs)
    import * as ort from 'onnxruntime-web';
    ```
- Support popular bundlers when importing onnxruntime-web into a CJS/ESM
project.
  - webpack (esm requires extra post-process step)
  - rollup
  - parcel (esm requires extra post-process step)
  - More bundlers **TBD**
- Multi-threading support for Node.js

NOTE: keeping single JavaScript file (the all-in-one bundle) is no
longer a goal. This is because technically there is a conflict with the
other requirements.
</details>

<details>
<summary><h4>Important Design Decisions</h4></summary>

- Drop support of single JavaScript output.
- The current onnxruntime-web distribution uses a single JavaScript file
to include all code. While there are a few benefits, it also creates
problems as mentioned above. Since ESM is being used more and more
widely, and browsers are making more restricted security checks and
requirement, the old Blob based solution is going to be replaced.
- To achieve the requirement, specifically, the CSP environment support,
we have to offer a non Blob based solution. Therefore, we have to
distribute multiple files and drop the single file solution.

- Do not run parser/postprocess on Emscripten generated JavaScript.
- Emscripten is evolving quickly so we should only depends on what's in
its documentation instead of a certain implementation details. (for
example, currently we patch on its code to deal with a special variable
`_scriptDir`)
  - Keep the generated files as-is also helps to:
    - reduce the size of ort.min.js
- make it easier to replace build artifacts when in development/debug

- Drop support for non-SIMD and non-MultiThread. This helps to reduce
the number of artifacts in distribution.
  - (fixed-sized) SIMD is supported in any mainstream JS environment.
- Multi-thread as WebAssembly feature is supported in any mainstream JS
environment. In some environment the feature is guarded with cross
origin policy, but it can still work if not trying to create any worker.

- Use ESM output for Emscripten generated JavaScript.
- There are 2 ways to dynamically import classic (umd) modules and
neither of them are recommended:
- dynamically creating a <script> tag. This changes the HTML structure
and have quite a lot of compatibility issue
- use `fetch()` and `eval()`. However `eval` is strongly suggested to be
avoid because there is a great perf hit.
- importing ESM is super easy - just use the `import()` call.
Considering ESM is widely supported in modern browsers and Node.js this
is the better option.

- Add Blob based solution as a fallback for cross-origin workers.
- There are still wide use case of importing onnxruntime-web from CDN.
In this usage, make it able create worker by using `fetch()`+`Blob` to
create a same-origin Blob URL.

</details>

<details>
<summary><h4>Distribution File Manifest</h4></summary>

The distribution folder contains the following files:

- WebAssembly artifacts. These files are the result of compiling the
ONNX Runtime C++ code to WebAssembly by Emscripten.

  | File Name | Build Flags |
  |------|-----|
| ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm |
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-training-wasm-simd-threaded.mjs <br/>
ort-training-wasm-simd-threaded.wasm | `--enable_training_apis` <br/>
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm
| `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep`
<br/> `--use_webnn` |

- onnxruntime-web JavaScript artifacts. These files are generated by
ESBuild as the entry point for onnxruntime-web.

  There are multiple build targets for different use cases:
  | Target Name | Path for "import" or "require" | Description |
  |------|-----|-----|
  | `ort` | `onnxruntime-web` | The default target. |
  | `ort.all` | `onnxruntime-web/all` | The target including webgl. |
  | `ort.node` | `onnxruntime-web` | The default target for Node.js. |
| `ort.training` | `onnxruntime-web/training` | The target including
training APIs |
| `ort.wasm` | `onnxruntime-web/wasm` | The target including only
WebAssembly (CPU) EP |
| `ort.webgl` | `onnxruntime-web/webgl` | The target including only
WebGL EP |


  For each target, there are multiple files generated:
  | File Name | Description |
  |------|-----|
| [target].js | The entry point for the target. IIFE and CommonJS
format. |
  | [target].mjs | The entry point for the target. ESM format. |
| [target].min.js <br/> [target].min.js.map | The entry point for the
target. Minimized with sourcemap. IIFE and CommonJS format. |
| [target].min.mjs <br/> [target].min.mjs.map | The entry point for the
target. Minimized with sourcemap. ESM format. |
| [target].proxy.mjs | (if appliable) The proxy ESM module for the
target. |
| [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map | (if
appliable) The proxy ESM module for the target. Minimized with
sourcemap. |

</details>

<details>
<summary><h4>Dynamic Import Explained</h4></summary>

- Local Served | No Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + import()--> [ort-wasm-simd-threaded.mjs]
                    |
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                    |
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
                                        |
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
- Local Served | Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + import()--> [ort.proxy.min.mjs]
                    |
                    + new Worker()--> [ort.proxy.min.mjs (worker)]
                                        |
+ import()--> [ort-wasm-simd-threaded.mjs]
                                                        |
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                                                        |
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
- Cross Origin | No Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + fetch('ort-wasm-simd-threaded.mjs')
        |
        + URL.createObjectURL(res.blob())
        |
        + import()--> [blob:... (ort-wasm-simd-threaded)]
                        |
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                        |
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
                                            |
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```

- Cross Origin | Proxy
  ```
  [Bundle or ort.min.js]
    |
    + fetch('ort.proxy.min.mjs')
        |
        + URL.createObjectURL(res.blob())
        |
        + import()--> [blob:... (ort.proxy)]
                        |
+ new Worker()--> [blob:... (ort.proxy) (worker)]
                                            |
+ fetch('ort-wasm-simd-threaded.mjs')
                                                |
+ URL.createObjectURL(res.blob())
                                                |
+ import()--> [blob:... (ort-wasm-simd-threaded)]
                                                                |
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                                                                |
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
</details>
2024-05-20 09:51:16 -07:00
Xu Xing
8c59cd4fce
[js/webgpu] Support GroupQueryAttention (#20237)
TODOs:
1. Handle H * params.kvNumHeads greater than work group size limit.
2. Support BNSH kv cache.
2024-05-13 09:43:37 -07:00
Guenther Schmuelling
55a6986d38
optimize skiplayernorm (#20551)
SkipSimplifiedLayerNormalization used in phi3 comes down from 222usec to
14usec
2024-05-08 08:40:03 -07:00
Yi-Hong Lyu
b2481e3602
Bump up version in main from 1.18.0 to 1.19.0 (#20489)
Bump up version in main from 1.18.0 to 1.19.0 since the release branch
has been cut.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-04-29 20:21:41 -07:00
Satya Kumar Jandhyala
99b0e19f11
[JS/WebGPU] MatMulNBits remove unnecessary condition (#20396)
Distribute writing-to-output work over all threads in MatMulNBits.
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-29 14:27:21 -07:00
Satya Kumar Jandhyala
736cbb3925
[JS/WebGU] Support fp16 in Attention by performing the computation in fp32. (#20486)
### Description
Perform computation in fp32 and convert finally to fp16.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-27 08:30:26 -07:00
Satya Kumar Jandhyala
21b3cbc3af
[WIP][JS/WebGPU] Inputs Key and Value could be 4-dims. (#20470)
### Description
The Key and Value inputs could be 4-dims


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-25 13:33:46 -07:00
Satya Kumar Jandhyala
ae78cdb5d7
[JS/WebGPU] MultiheadAttention bugfix (#20447)
### Description
Fixed pastkey, key and pastvalue, value concatenation condition and
fixed index error. Added new test cases.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-24 08:43:14 -07:00
Guenther Schmuelling
33d5ea39b3
[js/webgpu] fixes for fp16 attention (#20440) 2024-04-24 08:01:28 -07:00
Satya Kumar Jandhyala
d42ac7f0c6
[JS/WebGPU] Multihead attention improvements (#20286)
### Description
Enabled more usecases



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-23 12:39:49 -07:00
Guenther Schmuelling
b8e6684313
more conservitive gpu-buffer cache algo (#20312)
tuned based on 80 models to keep performance impact minimal
2024-04-23 09:07:04 -07:00
Guenther Schmuelling
497a627a69
fix fp16 for skiplayernorm (#20381) 2024-04-19 12:12:02 -07:00
Guenther Schmuelling
a8a77ddfdc
fix csum and enable ut (#20355) 2024-04-17 15:01:06 -07:00
Satya Kumar Jandhyala
b33216be4c
[JS/WebGPU] Improve MatMulNBits perf (#19974)
### Description
<!-- Describe your changes. -->
Improve performance using shared memory


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-12 11:03:05 -07:00
Yulong Wang
50bd4571ac
[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277)
### Description
Support operator `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for
WebGPU backend.
2024-04-11 14:08:50 -07:00
MasayoshiTsutsui
6a9d8a9030
[js/webgpu] implement DepthToSpace operator in webgpu (#19948)
### Description
This PR supports
[DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace)
operator in webgpu backend.


### Test
We followed the steps described on [this
page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce)
to build, tested with the following commands, and confirmed that it
passed the Model and Op tests that already existed. (Probably, these
test cases were prepared in the past for WebGL backend)
```
~/onnxruntime/js/web>
% npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug   
```
##### NOTE
I want to tell you that the main branch version failed 5 tests for the
resize_upsample_sizes_nearest operator.
Since I didn't touch this issue, those test cases still fail in my
branch as well.
Should I post an issue for this?


### Motivation and Context
Though the DepthToSpace operator plays a crucial role in
super-resolution domains, it was not supported in webgpu backend.
2024-04-10 12:13:46 -07:00
Jiajie Hu
23d3afd4fe
[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209)
### Description

https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding

### Motivation and Context
As per customer request, this helps Phi-2 and Gemma.
2024-04-08 09:11:26 -07:00
Guenther Schmuelling
c529e05e38
fix ConvTranspose 1D (#20194) 2024-04-05 10:05:32 -07:00
Yulong Wang
fa1917b81b
[js/webgpu] add validation to workgroup size (#20110)
### Description
add validation to workgroup size in `shaderHelper.mainStart()`.
2024-04-02 19:29:20 -07:00
Xu Xing
a2998e5d42
[js/webgpu] Use global id in attention and instance-norm (#20008)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-02 01:42:39 -07:00
Nanashi
ca465dc087
[js] Make error friendly when isOrtFormat is undefined (#19958)
### Description
Make error friendly when isOrtFormat is undefined
(`onnxruntime.InferenceSession.create` is called with ArrayBuffer or
Uint8Array).

### Motivation and Context
I was trying to run my onnx model in WebGL EP, but it gave me the error
"Cannot read properties of null (reading 'irVersion')".
I used debugger to find that actual error is `int64 is not supported`,
but the error was invisible for me.
So I made it to show both error when isOrtFormat is undefined.
<s>I haven't written unit test yet, so I'm making it draft. (I have no
idea about how do I test this though...)</s>
[d62d942](d62d9425ba)
2024-03-27 02:07:00 -07:00
Yulong Wang
473434c73f
[js/webgpu] perform uniform consistency check (#20019)
### Description

This PR makes a change in WebGPU backend to validate program uniforms.
It compares the uniform data that comes from the result of
`getRunData()` callback from the program info, with the `ShaderHelper`'s
maintained list of uniform variables.

Fixes a few bugs that found by this check as well.
2024-03-26 17:14:43 -07:00
Satya Kumar Jandhyala
5b64d7c32b
[JS/WebGPU] Use non-matmul implementation for ConvTranspose in channel-first case. (#20022)
### Description
Avoid using vec4 Matmul implementation for ConvTranspose with channel-last



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-23 11:19:14 -07:00
Guenther Schmuelling
c45cff60cf
[js/webgpu] fix maxpool / fp16 (#19981) 2024-03-19 16:15:49 -07:00
Yulong Wang
01c7aaf6aa
[js/webgpu] allow setting env.webgpu.adapter (#19940)
### Description
Allow user to set `env.webgpu.adapter` before creating the first
inference session.

Feature request:
https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753

@xenova
2024-03-19 12:55:00 -07:00
Xu Xing
4c6a6a37f7
[js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387)
The added case will be NAN because of the un-initialized buffer.
2024-03-18 22:59:32 -07:00
Guenther Schmuelling
7e0d424934
accumulate in fp32 for Reduce* (#19868) 2024-03-18 08:28:43 -07:00
Yulong Wang
79e50aeef3
[js/web] rewrite backend resolve to allow multiple EPs (#19735)
### Description

This PR rewrite the backend resolve logic to support specifying multiple
EPs.

#### Backend

The first version of ONNX Runtime Web actually carried some existing
code from [ONNX.js](https://github.com/microsoft/onnxjs), which includes
the "backend" concept. The original "backend" in ONNX.js is designed in
a way assuming there is only one backend from user's backend hint list
will be used. For example, in ONNX.js, if user specify a backend hint as
`['webgl', 'wasm']`, ONNX.js will first try to use WebGL backend - if it
loads successfully (the browser supports webgl), then "webgl" backend
will be used and "wasm" will be ignored; otherwise, "webgl" will be
ignored and try to load "wasm" backend.

In short: only one backend will be used when initializing a session.

#### Execution Provider

Execution Provider, or EP, in ONNX Runtime is a different concept. One
of the differences is that users are allow to specify multiple EPs, and
if one does not support a particular kernel, it can fallback to other
EP. This is a very common case when using a GPU EP in ONNX Runtime.

#### Current Status: Backend v.s. EP

Because of the history reasons mentioned above, the current status is
quite confusing. There are **real backend**s, which means it's different
implementation in code; and there are **backend hint**s, which are used
as string names for backend hint; and there are **EP**s of the ONNX
Runtime concepts.

currently there are only 2 **backend**s in our code base: The "onnxjs
backend", and the "wasm backend". The "onnxjs backend" currently only
powers backend hint "webgl", which go into the old onnx.js code path.
All other backend hints including "wasm", "cpu"(alias to wasm), "webgpu"
and "webnn" are all powered by "wasm backend".

And because ORT Web treat "backend" as an internal concept and want to
align with ONNX Runtime, so those names of backend hints are becoming EP
names.

The following table shows today's status:

| Execution Provider Name (public) / Backend Hint (internal) | Backend |
EP in ORT
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP
| "webgl" | OnnxjsBackend | \* technically not an EP
| "webgpu" | WasmBackend | JSEP
| "webnn" | WasmBackend | WebNN EP

#### Problem

While the API allows to specify multiple EPs, the backend resolving only
allows one backend. This causes issues when user specify multiple EP
names in session options, the backend resolve behavior and EP
registration behavior is inconsistent. Specifically, in this issue:
https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908:

EP list `['webgpu', 'wasm']` on a browser without WebGPU support
resolves to 'wasm' backend, but the full EP list is passed in session
options, so JSEP is still enabled, causing the runtime error.


#### Solution

Since we still need WebGL backend, we cannot totally remove the backend
register/resolve system. In this PR I made the following changes:
- initialize every backend from the EP list, instead of only do that for
the first successful one.
- for the first resolved backend, filter all EP using the exact same
backend. Remove all EPs not using this backend from session options
- for every explicitly specified EP, if it's removed, show a warning
message in console
2024-03-15 11:47:45 -07:00
Satya Kumar Jandhyala
ed250b88c3
[JS/WebGPU] Optimize MatMulNBits (#19852)
### Description
Use vec<2> or vec<4>, operands in MatMulNBits


### Motivation and Context
Improve performance
2024-03-13 10:33:14 -07:00
Yang Gu
53de2d8cb0
[js/webgpu] Enable GroupedConvVectorize path (#19791)
Vectorize met 2 failed cases in a CI bot with NVIDIA GPU, but we
couldn't repro with all the GPUs at hand, including NVIDIA GPUs. This PR
introduces GPUAdapterInfo and enables this opt on non-NVIDIA GPUs to
make the bots happy.
No obivous perf gain can be seen if we enable vectorize on NVIDIA.
However, it shows big perf improvement on Intel. On my Gen12 Intel GPU,
mobilenetv2-12 perf was improved from 11.14ms to 7.1ms.
2024-03-12 22:25:07 -07:00
Yulong Wang
4538d31a8b
[js/webgpu] expose a few properties in WebGPU API (#19857)
### Description
This change exposes a few properties in `ort.env.webgpu` to resolve
feature requirement mentioned in properties in
https://github.com/microsoft/onnxruntime/pull/14579#discussion_r1519612619.

- Add `powerPreference` and `forceFallbackAdapter` in `ort.env.webgpu`,
to allow users to set the value of the properties before the first
inference session is created.
- Add readonly property `adapter` in `ort.env.webgpu` to allow users to
get the adapter instance. Now users can access `ort.env.webgpu.device`
and `ort.env.webgpu.adapter`.

@xenova @beaufortfrancois
2024-03-12 19:50:51 -07:00
Satya Kumar Jandhyala
24b72d2613
[JS/WebGPU] Preserve zero size input tensor dims. (#19737)
### Description
For Concat operation, the zero-size input tensor shape need to be
preserved and, unlike non-zero tensors, the dims are not constrained to
match other input tensors' dims.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-07 19:07:49 -08:00
Yulong Wang
f06164ef8b
[js/web] transfer input buffer back to caller thread (#19677)
### Description

When using proxy worker, input buffers should be transferred back to the
caller thread after `run()` call is done.

Fixes #19488
2024-03-01 14:50:06 -08:00
Yulong Wang
3cb81cdde2
[js/common] move 'env.wasm.trace' to 'env.trace' (#19617)
### Description

Try to move 'env.wasm.trace' to 'env.trace' to make it less confusing,
because it also works in webgpu. Marked 'env.wasm.trace' as deprecated.
2024-02-27 11:07:15 -08:00
Guenther Schmuelling
bb43a0f133
[js/webgpu] minor fixes to make tinyllama work (#19564) 2024-02-23 15:45:30 -08:00
Yulong Wang
aec2389ad0
[js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614)
### Description
This PR allows zero-sized output.

To make the implementation simple, it does not support partial
zero-sized tensor. Which means, either all outputs are zero-sized, or an
error will be reported.

added 2 tests:
 - op test of `Add` with input T[2,0] T[2,1], and
 - test_split_zero_size_splits
2024-02-23 12:52:47 -08:00
satyajandhyala
ae3d73c981
[JS/WebGPU] Fix Split and Where to handle corner cases. (#19613)
### Description
<!-- Describe your changes. -->
1. Fix Where operator to handle Boolean input less than 4 bytes.
2. Fix JSEP test harness to use tensor names consistently.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-23 00:21:15 -08:00
Xu Xing
fe82fccf1a
[js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596)
This is used in sam-h-decoder-f16.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-22 13:09:28 -08:00
Yulong Wang
58f4921686
[js] changes to allow Float16Array if any polyfill is available (#19305)
### Description

This change adds only necessary code to enable ort-web works with any
Float16Array polyfill. Unlike #19302, in this PR, ort-web does not
include any specific polyfill; instead, it's user's choice for how to
use a polyfill.

ORT-web uses Float16Array if it's available; otherwise, fallback to use
Uint16Array.

```js
// case 1: user does not use polyfill:
import * as ort from 'onnxruntime-web';

const myF16Data = new Uint16Array(...);  // need to use Uint16Array
const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
```

```js
// case 2: user use polyfill:
import * as ort from 'onnxruntime-web';
import {
  Float16Array, isFloat16Array, isTypedArray,
  getFloat16, setFloat16,
  f16round,
} from "@petamoriken/float16";
globalThis.Float16Array = Float16Array;  // ort-web will pick the global Float16Array

const myF16Data = new Float16Array(...);  // Use the polyfilled Float16Array type
const myF16tensor = new ort.Tensor('float16', myF16Data, dims);
```
2024-02-21 00:31:06 -08:00
Yulong Wang
3fe2c137ee
[js] small fix to workaround formatter (#19400)
### Description
Rename shader variable names to snake_case naming and also to avoid
formatter behaving inconsistently in win/linux.
2024-02-20 17:23:01 -08:00
Jiajie Hu
1b48054e1b
[js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
### Description
This is required to make shape uniforms really work.

### Motivation and Context
The bug was unveiled in a model with multiple Split nodes. The later
nodes would try to reuse a previous pipeline cache, while the old shapes
were hardcoded as constants in cache.
2024-02-20 09:24:34 -08:00
satyajandhyala
dfeda9019c
[JS/WebGPU] Add MatMulNBits (#19446)
### Description
Add MatMulNBits to support MatMul using 4-bit quantized weights



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-02-17 09:19:17 -08:00
Yulong Wang
06269a3952
[js/webgpu] allow uint8 tensors for webgpu (#19545)
### Description
allow uint8 tensors for webgpu
2024-02-16 18:28:27 -08:00
Yulong Wang
5ff27ef02a
[js/webgpu] support customop FastGelu (#19392)
### Description
Support WebGPU custom operator FastGelu.
2024-02-06 09:07:31 -08:00
Jiajia Qin
ccbe264a39
[js/webgpu] Add LeakyRelu activation for fusedConv (#19369)
### Description
This PR 1) adds LeakyRelu activation for fusedConv; 2) makes `vec4<f16>`
value work with `float32` uniforms attributes.

For example:
`clamp(value, vec4<f16>(uniforms.clip_min),
vec4<f16>(uniforms.clip_max)` will throw compilation errors since
`uniforms.clip_min` and `uniforms.clip_min` are `f32` not `f16`. So we
need to change it to `clamp(value, vec4<f16>(f16(uniforms.clip_min)),
vec4<f16>(f16(uniforms.clip_max))`

And above problem was introduced when we make activation attributes as
uniforms instead of constant.

BTW, after adding LeakyRelu, `realesrgan-t256` model can pass.
2024-02-02 09:06:38 -08:00
Jiajia Qin
efc17e79de
[js/webgpu] Fix the undefined push error (#19366)
### Description
This PR fixes below errors when enable webgpu profiling: 
```
TypeError: Cannot read properties of undefined (reading 'push')
```
2024-02-02 02:04:06 -08:00
Xu Xing
3a2ab1963a
[js/webgpu] Refactor createTensorShapeVariables (#18883) 2024-02-01 17:59:00 -08:00
Yulong Wang
dd1f6ccc45
[js/webgpu] resolve codescan alert (#19343)
### Description
resolve codescan alert:
https://github.com/microsoft/onnxruntime/security/code-scanning/17687
2024-01-30 21:06:21 -08:00
Xu Xing
d73131cf0f
[js/webgpu] Use DataType as uniform cpu type (#19281)
This saves turning data type to string by tensorDataTypeEnumToString.
2024-01-30 21:05:08 -08:00
Jiajia Qin
85cef0af8c
[js/webgpu] Support capture and replay for jsep (#18989)
### Description
This PR expands the graph capture capability to JS EP, which is similar
to #16081. But for JS EP, we don't use the CUDA Graph, instead, we
records all gpu commands and replay them, which removes most of the cpu
overhead to avoid the the situation that gpu waiting for cpu.

mobilenetv2-12 becomes 3.7ms from 6ms on NV 3090 and becomes 3.38ms from
4.58ms on Intel A770.

All limitations are similar with CUDA EP:
1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
2. Usage of graph capture is limited to models where-in all ops in the
model can be partitioned to the JS EP or CPU EP and no memory copy
between them.
3. Shapes of inputs/outputs cannot change across inference calls.
4. IObinding is required.

The usage is like below:
Method 1: specify outputs buffers explicitly.
```
    const sessionOptions = {
        executionProviders: [
          {
            name: "webgpu",
          },
        ],
        enableGraphCapture: true,
      };
    const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions);
   
    // prepare the inputBuffer/outputBuffer
    ... ...

   const feeds = {
       'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims })
   };

   const fetches = {
       'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] })
   };

   let results = await session.run(feeds, fetches);  // The first run will begin to capture the graph.

   // update inputBuffer content
  ... ...
   results = = await session.run(feeds, fetches);  // The 2ed run and after will directly call replay to execute the graph.

  ... ...
   session.release();
```
Method 2: Don't specify outputs buffers explicitly. Internally, when
graph capture is enabled, it will set all outputs location to
'gpu-buffer'.
```
    const sessionOptions = {
        executionProviders: [
          {
            name: "webgpu",
          },
        ],
        enableGraphCapture: true,
      };
    const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions);

    // prepare the inputBuffer
    ... ...

   const feeds = {
       'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims })
   };

   let results = await session.run(feeds);  // The first run will begin to capture the graph.
   
   // update inputBuffer content
  ... ...
   results = = await session.run(feeds);  // The 2ed run and after will directly call replay to execute the graph.

  ... ...
   session.release();
2024-01-30 18:28:03 -08:00
Jiajia Qin
90883a366a
[js/webgpu] Add hardSigmoid activation for fusedConv (#19233)
### Description
Add hardSigmoid activation for fusedConv. It will be used by
mobilenetv3-small-100 model.
2024-01-30 16:28:53 -08:00
Xu Xing
624b4e2063
[js/webgpu] Remove enableShapesUniforms (#19279) 2024-01-29 17:49:06 -08:00
Guenther Schmuelling
9e69606360
fix f16 for attention, enable slice and flatten for more types (#19262) 2024-01-29 10:13:46 -08:00
Xu Xing
a3f0e2422b
[js/webgpu] Support f16 uniform (#19098)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-25 16:58:22 -08:00
Xu Xing
656ca66186
[js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753) 2024-01-25 15:37:05 -08:00
Jiajie Hu
5b06505073
[js/webgpu] Fix Tanh explosion (#19201)
### Description
```math
\tanh(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}=
\left\{
\begin{array}{cc}
-\frac{1-e^{-2\cdot(-x)}}{1+e^{-2\cdot(-x)}}, & x<0 \\
0, & x=0 \\
\frac{1-e^{-2x}}{1+e^{-2x}}, & x>0
\end{array}
\right.
```

### Motivation and Context
On some platforms,
$$\tanh(1000)=\frac{e^{1000}-e^{-1000}}{e^{1000}+e^{-1000}}$$ would
produce NaN instead of 0.999... or 1 (imagine $e^{1000}=\infty$ and
$\frac{\infty}{\infty}$ explodes).
2024-01-25 08:25:35 -08:00
Wanming Lin
7252c6e747
[WebNN EP] Support WebNN async API with Asyncify (#19145) 2024-01-24 15:37:35 -08:00
Yang Gu
591f90c0b9
[js/webgpu] Fix issue of timestamp query (#19258)
When we enable webgpu profiling mode between session.create and
session.run, current implementation has a problem to create querySet
(and also queryResolveBuffer) if we share the commandEncoder with inputs
upload. This PR fixes this by moving the querySet creation to the place
we set queryType.
2024-01-24 14:49:37 -08:00
satyajandhyala
a33b5bd1fa
[JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788)
### Description
Added Uniforms to SkipLayerNorm



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve performance

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-01-25 01:12:21 +05:30
Jiajia Qin
d226e40856
[js/webgpu] set query type in onRunStart (#19202)
### Description
<!-- Describe your changes. -->
`env.webgpu.profiling` is a global flag. It may change before each
session.run. So the best place is to update it in `onRunStart` event.
After this, we can directly check `this.queryType`'s value. Without this
pr, we need to make sure that `getCommandEncoder()` is called before
checking `this.queryType`. Otherwise, it may happen that
`pendingKernels`'s length is not equal to `pendingDispatchNumber`'s
length. See the two ugly workarounds
[1)](e630dbf528 (diff-006fc84d3997f96a29b8033bd2075d6a0a9509211bd5812a6b934fc74fedfd9dR267-R268))
and
[2)](e630dbf528 (diff-618fe297fbe7a1da586380163b8fd2627311ccc217640a3c5cdc9c17a33472c1R73-R80))
if we don't introduce `onRunStart`. Or we need to call `setQueryType` in
each kernel run.
2024-01-22 16:08:55 -08:00
Jiajia Qin
2e0a388c36
[js/webgpu] Add HardSigmoid support (#19215)
### Description
This op is required in mobilenetv3-small-100. With this PR,
mobilenetv3-small-100 model becomes less than 10 ms from over 100 ms on
ADL.
2024-01-22 15:53:26 -08:00
Yulong Wang
f87e69801f
[js/web] show warning when numThreads is set but threads is not supported (#19179)
### Description
show warning when numThreads is set but threads is not supported.
Resolves #19148, #18933

for web: when crossOriginIsolated is false.
for node: always disable.
2024-01-17 15:04:22 -08:00
Yulong Wang
146ebaf91e
[js/web] allow proxy to load model with 1GB <= size < 2GB (#19178)
### Description

allow proxy to load model with 1GB <= size < 2GB

resolves #19157.
2024-01-17 15:03:43 -08:00
Rachel Guo
bd9d8fb2a5
[ORT 1.17.0 release] Bump up version to 1.18.0 (#19170)
### Description
<!-- Describe your changes. -->

Bump up version to 1.18.0 since the release branch has been cut.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2024-01-17 11:18:32 -08:00
Guenther Schmuelling
9dee543bed
fix gemm beta for fp16 (#19153)
per onnx spec beta is always fp32 so we need to cast it
2024-01-15 18:40:38 -08:00
Yulong Wang
f917dde717
[web] remove xnnpack from web backends (#19116)
### Description
XNNPACK is already disabled in web assembly build. This change removes
the xnnpack backend registration in JS.
2024-01-13 23:04:02 -08:00
Yang Gu
e803f8eb0f
[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894)
We submit kernels in a batch (a fixed number 16 is used except for the
last batch) for better performance. However, timestamp query support is
at pass level so we disable the batch execution in profiling mode in
previous implementation. Actually we can have multiple passes in a batch
so that we don't have to disable batch execution, which is the first
enhancement of this PR.
Furthermore, WebGPU has an extension to support timestamp query inside
passes, which isn't supported by all the platforms (e.g., Windows
supports it, while macOS doesn't). This is expected to have lower cost
compared with multiple passes solution. So this PR also introduce this
support when available.
This PR also refactors some implementation related to kernelInfo, and
try to unify the related kernel names.
2024-01-13 00:23:17 -08:00
Yulong Wang
07cfc56538
[js] enable external data loading for ort-web (#19087)
### Description
enable external data loading for ort-web.

### Why
The ORT external data design is highly depending on the file system,
especially synchronous file I/O APIs. Those are not available in web
platforms. We need to have extra code to make external data working on
web.

### How
Considering there is no file system in web, an implementation for web to
support external data is to use pre-loaded data. Assume model file
a.onnx includes initializers that linked to ./b.bin, we require users to
pass a full data file list when creating the session. The user code will
be look like:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin', 

      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // { } if multiple external data file
  ]
});
```

Currently, this feature only works with JSEP build enabled.
2024-01-12 19:24:24 -08:00
Guenther Schmuelling
a756017e9f
[js/webgpu] more fixes for access above 2GB (#19065)
when jsep calls javascript with an index to HEAP8 or HEAP32 the index is
negative when the heap is above 2GB, even if we pass it as uint32_t it
remains negative. So in javascript use >>> 0 to make it unsigned.
2024-01-12 17:47:37 -08:00
Guenther Schmuelling
4a5f13b681
fix resize for fp16 (#19110)
resize for fp16 has 2 issues: scales are always f32 and roi can be f32
or f16.
scales:
this is fixed.

roi
this is fixed for the case where roi is not passed as optional input
with f16. To fix this it requires a much larger change and I did not
want to risk this short before a release. For all practical purpose
passing roi as input with f16 should be rare and we can fix it in the
near future.
2024-01-12 13:44:28 -08:00
Jiajie Hu
acba63c36a
[js/webgpu] Change A/sqrt(B) to A*inverseSqrt(B) in normalization ops (#19101)
### Description
Change `A / sqrt(B)` to `A * inverseSqrt(B)` in BatchNormalization,
InstanceNormalization, LayerNormalization and SkipLayerNormalization.

### Motivation and Context
For the same reason as the existence of the `inverseSqrt` built-in in
WebGPU spec.
2024-01-12 00:08:16 -08:00
Guenther Schmuelling
d0bac8216d
[js/webgpu] fix bcast in where (#19009) 2024-01-11 12:13:24 -08:00
Jiajia Qin
a89db01fce
[js/webgpu] disable GroupedConvVectorize path (#19090)
Disable createGroupedConvVectorizeProgramInfo path due to bots failures
on below two cases:
[webgpu]Conv - conv - vectorize group - B
[webgpu]Conv - conv - vectorize group - D
2024-01-11 08:13:14 -08:00
Jiajia Qin
fd6bab4250
[js/webgpu] Provide a vectorized algorithm for GroupedConv (#18884)
### Description
This PR provides a vectorized algorithm for NHWC GroupedConv to improve
performance.

The aggregate time of GroupedConv in mobilenetv2-12 becomes ~1ms from
~4ms on Intel Alder Lake machine. About 20% improvement for the whole
model.
2024-01-10 16:12:43 -08:00
Xu Xing
ed0f26d3d4
[js/webgpu] Revert parse norm attributes (#19074)
This resolves the below build errors:
```
lib/wasm/jsep/webgpu/op-resolve-rules.ts:19:23 - error TS2724: '"./ops/instance-norm"' has no exported member named 'parseInstanceNormAttributes'. Did you mean 'InstanceNormAttributes'?

19 import {instanceNorm, parseInstanceNormAttributes} from './ops/instance-norm';
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~

lib/wasm/jsep/webgpu/op-resolve-rules.ts:19:23 - error TS6133: 'parseInstanceNormAttributes' is declared but its value is never read.

19 import {instanceNorm, parseInstanceNormAttributes} from './ops/instance-norm';
                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~

lib/wasm/jsep/webgpu/op-resolve-rules.ts:20:20 - error TS2305: Module '"./ops/layer-norm"' has no exported member 'parseLayerNormAttributes'.

20 import {layerNorm, parseLayerNormAttributes} from './ops/layer-norm';
                      ~~~~~~~~~~~~~~~~~~~~~~~~

lib/wasm/jsep/webgpu/op-resolve-rules.ts:20:20 - error TS6133: 'parseLayerNormAttributes' is declared but its value is never read.

20 import {layerNorm, parseLayerNormAttributes} from './ops/layer-norm';
```
2024-01-09 20:58:50 -08:00
Xu Xing
76dfe5347c
[js/webgpu] Support uniforms for instance-norm (#18929)
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
2024-01-09 14:56:00 -08:00
zesongw
ad6dd0a597
[WebNN] Enable npm unit tests (#18486)
### Description
- Support more test cases for WebNN EP in suite-test-list.jsonc
- Add DISABLE_WEBNN flag in build.ts as preparing for WebNN EP release
- Add test option: '--webnn-device-type' in test-runner-args-cli.ts to
support running WebNN 'gpu' deviceType
- Use Chrome Stable as default browser for WebNN testing to unblock the
CI limitation.
2024-01-09 10:10:57 -08:00
Xu Xing
557ac74c05
[js/webgpu] Support gemm uniforms (#19056)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-09 09:57:06 -08:00
Xu Xing
42ba2aed54
[js/webgpu] Support pad uniforms (#19057)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-01-09 09:34:56 -08:00
Xu Xing
eb92681bfb
[js/webgpu] Support range uniforms (#19055) 2024-01-09 09:33:57 -08:00