2021-04-16 08:33:10 +00:00
|
|
|
// Copyright (c) Microsoft Corporation. All rights reserved.
|
|
|
|
|
// Licensed under the MIT License.
|
|
|
|
|
|
2023-06-12 19:05:11 +00:00
|
|
|
import {env as envImpl} from './env-impl.js';
|
2023-03-22 22:05:04 +00:00
|
|
|
|
2021-05-07 19:12:37 +00:00
|
|
|
export declare namespace Env {
|
[js/web] optimize module export and deployment (#20165)
### Description
This PR make numbers of optimizations to onnxruntime-web's module export
and deployment.
See each section below for more details.
#### Preview
>
[onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21)
> ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~
> ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~
> ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~
<details>
<summary><h4>Breaking changes</h4></summary>
There is no code change required, but there are a few differences
regarding **code import**, **flags**, **bundler config** and
**deployment steps**.
#### Importing:
Import table is changed. See following for details.
<details>
<summary><h5>Current import table:</h5></summary>
| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.all` | `onnxruntime-web/experimental` | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
| `ort.node` | `onnxruntime-web` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.training` | `onnxruntime-web/training` | ❌ | ❌ | ✔️ |
✔️<sup>\[1]</sup> | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` | ❌ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.wasm-core` | `onnxruntime-web/wasm-core` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ | ❌ | ❌ | ✔️<sup>\[2]</sup>
| ❌ |
| `ort.webgpu` | `onnxruntime-web/webgpu` | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
* [1] didn't test. may not actually work.
* [2] not working. this is a mistake in build config.
</details>
<details>
<summary><h5>Proposed update:</h5></summary>
| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.all` |
~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` | ✔️ | ✔️ |
✔️ | ✔️ | ❌ |
| `ort.node` | `onnxruntime-web` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.training` | `onnxruntime-web/training` | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` | ❌ | ❌ | ✔️ | ✔️ | ❌ |
| ~~`ort.wasm-core`~~ | ~~`onnxruntime-web/wasm-core`~~ | ~~❌~~ | ~~❌~~
| ~~✔️~~ | ~~❌~~ | ~~❌~~ |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ | ❌ | ❌ | ~~✔️~~ ❌ | ❌ |
| `ort.webgpu` | `onnxruntime-web/webgpu` | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
</details>
#### Flags:
The following flags are deprecated:
- `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in
build.
The following flags changed their type:
- `env.wasm.wasmPaths`: When using this flag as a string ( for the URL
prefix ), nothing is changed. When using this flag as an object ( for
per-file path override ), the type changed:
```diff
- export interface Old_WasmFilePaths{
- 'ort-wasm.wasm'?: string;
- 'ort-wasm-threaded.wasm'?: string;
- 'ort-wasm-simd.wasm'?: string;
- 'ort-training-wasm-simd.wasm'?: string;
- 'ort-wasm-simd-threaded.wasm'?: string;
- };
+ export interface New_WasmFilePaths {
+ /**
+ * Specify the override path for the main .wasm file.
+ *
+ * This path should be an absolute path.
+ *
+ * If not modified, the filename of the .wasm file is:
+ * - `ort-wasm-simd-threaded.wasm` for default build
+ * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and
WebNN)
+ * - `ort-training-wasm-simd-threaded.wasm` for training build
+ */
+ wasm?: URL|string;
+ /**
+ * Specify the override path for the main .mjs file.
+ *
+ * This path should be an absolute path.
+ *
+ * If not modified, the filename of the .mjs file is:
+ * - `ort-wasm-simd-threaded.mjs` for default build
+ * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and
WebNN)
+ * - `ort-training-wasm-simd-threaded.mjs` for training build
+ */
+ mjs?: URL|string;
+ }
```
#### Bundler compatibility:
Config changes are need for bundlers. See usage example in
/js/web/test/e2e/ for Webpack, parcel and rollup.
#### Deployment:
- if consuming from a CDN, there is no breaking change.
- if consuming from a local server, need to copy all `ort-*.wasm` and
`ort-*.mjs` files (totally 6 files) in the dist folder. (previously only
need to copy `ort-*.wasm` files.)
</details>
<details>
<summary><h4>Problems</h4></summary>
There are a few problems with the current module export and deployment:
- Script URL cannot be correctly inferred when imported as ESM.
- Workers are forcefully encoded using Blob URL, which makes
onnxruntime-web not working in CSP environment and Node.js, when using
proxy or multi-threading feature.
- Generated JS code (by Emscripten) is encoded using
`function.toString()`, which is unstable and error-prone.
- When running with a different Emscripten build, always need the build
step. Making it difficult to swap artifacts in deveopment/debug.
</details>
<details>
<summary><h4>Goals</h4></summary>
- Full ESM support
- Support variances of ways to import. Including:
- import from HTML's `<script>` tag (IIFE format, exporting to global
variable `ort`)
```html
<script
src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script>
```
- import from source code inside `<script type="module">` tag (ESM)
```html
<script type="module">
import * as ort from
"https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs";
// using 'ort'
</script>
```
- import in a CommonJS project (CJS format, resolve from package.json
"exports" field)
```js
// myProject/main.js
const ort = require('onnxruntime-web');
```
- import in an ESM project (ESM format, resolve from package.json
"exports" field)
```js
// myProject/main.js (or main.mjs)
import * as ort from 'onnxruntime-web';
```
- Support popular bundlers when importing onnxruntime-web into a CJS/ESM
project.
- webpack (esm requires extra post-process step)
- rollup
- parcel (esm requires extra post-process step)
- More bundlers **TBD**
- Multi-threading support for Node.js
NOTE: keeping single JavaScript file (the all-in-one bundle) is no
longer a goal. This is because technically there is a conflict with the
other requirements.
</details>
<details>
<summary><h4>Important Design Decisions</h4></summary>
- Drop support of single JavaScript output.
- The current onnxruntime-web distribution uses a single JavaScript file
to include all code. While there are a few benefits, it also creates
problems as mentioned above. Since ESM is being used more and more
widely, and browsers are making more restricted security checks and
requirement, the old Blob based solution is going to be replaced.
- To achieve the requirement, specifically, the CSP environment support,
we have to offer a non Blob based solution. Therefore, we have to
distribute multiple files and drop the single file solution.
- Do not run parser/postprocess on Emscripten generated JavaScript.
- Emscripten is evolving quickly so we should only depends on what's in
its documentation instead of a certain implementation details. (for
example, currently we patch on its code to deal with a special variable
`_scriptDir`)
- Keep the generated files as-is also helps to:
- reduce the size of ort.min.js
- make it easier to replace build artifacts when in development/debug
- Drop support for non-SIMD and non-MultiThread. This helps to reduce
the number of artifacts in distribution.
- (fixed-sized) SIMD is supported in any mainstream JS environment.
- Multi-thread as WebAssembly feature is supported in any mainstream JS
environment. In some environment the feature is guarded with cross
origin policy, but it can still work if not trying to create any worker.
- Use ESM output for Emscripten generated JavaScript.
- There are 2 ways to dynamically import classic (umd) modules and
neither of them are recommended:
- dynamically creating a <script> tag. This changes the HTML structure
and have quite a lot of compatibility issue
- use `fetch()` and `eval()`. However `eval` is strongly suggested to be
avoid because there is a great perf hit.
- importing ESM is super easy - just use the `import()` call.
Considering ESM is widely supported in modern browsers and Node.js this
is the better option.
- Add Blob based solution as a fallback for cross-origin workers.
- There are still wide use case of importing onnxruntime-web from CDN.
In this usage, make it able create worker by using `fetch()`+`Blob` to
create a same-origin Blob URL.
</details>
<details>
<summary><h4>Distribution File Manifest</h4></summary>
The distribution folder contains the following files:
- WebAssembly artifacts. These files are the result of compiling the
ONNX Runtime C++ code to WebAssembly by Emscripten.
| File Name | Build Flags |
|------|-----|
| ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm |
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-training-wasm-simd-threaded.mjs <br/>
ort-training-wasm-simd-threaded.wasm | `--enable_training_apis` <br/>
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm
| `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep`
<br/> `--use_webnn` |
- onnxruntime-web JavaScript artifacts. These files are generated by
ESBuild as the entry point for onnxruntime-web.
There are multiple build targets for different use cases:
| Target Name | Path for "import" or "require" | Description |
|------|-----|-----|
| `ort` | `onnxruntime-web` | The default target. |
| `ort.all` | `onnxruntime-web/all` | The target including webgl. |
| `ort.node` | `onnxruntime-web` | The default target for Node.js. |
| `ort.training` | `onnxruntime-web/training` | The target including
training APIs |
| `ort.wasm` | `onnxruntime-web/wasm` | The target including only
WebAssembly (CPU) EP |
| `ort.webgl` | `onnxruntime-web/webgl` | The target including only
WebGL EP |
For each target, there are multiple files generated:
| File Name | Description |
|------|-----|
| [target].js | The entry point for the target. IIFE and CommonJS
format. |
| [target].mjs | The entry point for the target. ESM format. |
| [target].min.js <br/> [target].min.js.map | The entry point for the
target. Minimized with sourcemap. IIFE and CommonJS format. |
| [target].min.mjs <br/> [target].min.mjs.map | The entry point for the
target. Minimized with sourcemap. ESM format. |
| [target].proxy.mjs | (if appliable) The proxy ESM module for the
target. |
| [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map | (if
appliable) The proxy ESM module for the target. Minimized with
sourcemap. |
</details>
<details>
<summary><h4>Dynamic Import Explained</h4></summary>
- Local Served | No Proxy:
```
[Bundle or ort.min.js]
|
+ import()--> [ort-wasm-simd-threaded.mjs]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Local Served | Proxy:
```
[Bundle or ort.min.js]
|
+ import()--> [ort.proxy.min.mjs]
|
+ new Worker()--> [ort.proxy.min.mjs (worker)]
|
+ import()--> [ort-wasm-simd-threaded.mjs]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Cross Origin | No Proxy:
```
[Bundle or ort.min.js]
|
+ fetch('ort-wasm-simd-threaded.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort-wasm-simd-threaded)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Cross Origin | Proxy
```
[Bundle or ort.min.js]
|
+ fetch('ort.proxy.min.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort.proxy)]
|
+ new Worker()--> [blob:... (ort.proxy) (worker)]
|
+ fetch('ort-wasm-simd-threaded.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort-wasm-simd-threaded)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
</details>
2024-05-20 16:51:16 +00:00
|
|
|
export type WasmPathPrefix = string;
|
|
|
|
|
export interface WasmFilePaths {
|
|
|
|
|
/**
|
|
|
|
|
* Specify the override path for the main .wasm file.
|
|
|
|
|
*
|
|
|
|
|
* This path should be an absolute path.
|
|
|
|
|
*
|
|
|
|
|
* If not modified, the filename of the .wasm file is:
|
|
|
|
|
* - `ort-wasm-simd-threaded.wasm` for default build
|
|
|
|
|
* - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and WebNN)
|
|
|
|
|
* - `ort-training-wasm-simd-threaded.wasm` for training build
|
|
|
|
|
*/
|
|
|
|
|
wasm?: URL|string;
|
|
|
|
|
/**
|
|
|
|
|
* Specify the override path for the main .mjs file.
|
|
|
|
|
*
|
|
|
|
|
* This path should be an absolute path.
|
|
|
|
|
*
|
|
|
|
|
* If not modified, the filename of the .mjs file is:
|
|
|
|
|
* - `ort-wasm-simd-threaded.mjs` for default build
|
|
|
|
|
* - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and WebNN)
|
|
|
|
|
* - `ort-training-wasm-simd-threaded.mjs` for training build
|
|
|
|
|
*/
|
|
|
|
|
mjs?: URL|string;
|
|
|
|
|
}
|
|
|
|
|
export type WasmPrefixOrFilePaths = WasmPathPrefix|WasmFilePaths;
|
2021-05-07 19:12:37 +00:00
|
|
|
export interface WebAssemblyFlags {
|
|
|
|
|
/**
|
|
|
|
|
* set or get number of thread(s). If omitted or set to 0, number of thread(s) will be determined by system. If set
|
|
|
|
|
* to 1, no worker thread will be spawned.
|
|
|
|
|
*
|
|
|
|
|
* This setting is available only when WebAssembly multithread feature is available in current context.
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
|
|
|
|
* @defaultValue `0`
|
2021-05-07 19:12:37 +00:00
|
|
|
*/
|
|
|
|
|
numThreads?: number;
|
|
|
|
|
|
2021-06-08 06:24:27 +00:00
|
|
|
/**
|
|
|
|
|
* set or get a boolean value indicating whether to enable SIMD. If set to false, SIMD will be forcely disabled.
|
|
|
|
|
*
|
|
|
|
|
* This setting is available only when WebAssembly SIMD feature is available in current context.
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
[js/web] optimize module export and deployment (#20165)
### Description
This PR make numbers of optimizations to onnxruntime-web's module export
and deployment.
See each section below for more details.
#### Preview
>
[onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21)
> ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~
> ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~
> ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~
<details>
<summary><h4>Breaking changes</h4></summary>
There is no code change required, but there are a few differences
regarding **code import**, **flags**, **bundler config** and
**deployment steps**.
#### Importing:
Import table is changed. See following for details.
<details>
<summary><h5>Current import table:</h5></summary>
| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.all` | `onnxruntime-web/experimental` | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
| `ort.node` | `onnxruntime-web` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.training` | `onnxruntime-web/training` | ❌ | ❌ | ✔️ |
✔️<sup>\[1]</sup> | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` | ❌ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.wasm-core` | `onnxruntime-web/wasm-core` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ | ❌ | ❌ | ✔️<sup>\[2]</sup>
| ❌ |
| `ort.webgpu` | `onnxruntime-web/webgpu` | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
* [1] didn't test. may not actually work.
* [2] not working. this is a mistake in build config.
</details>
<details>
<summary><h5>Proposed update:</h5></summary>
| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.all` |
~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` | ✔️ | ✔️ |
✔️ | ✔️ | ❌ |
| `ort.node` | `onnxruntime-web` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.training` | `onnxruntime-web/training` | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` | ❌ | ❌ | ✔️ | ✔️ | ❌ |
| ~~`ort.wasm-core`~~ | ~~`onnxruntime-web/wasm-core`~~ | ~~❌~~ | ~~❌~~
| ~~✔️~~ | ~~❌~~ | ~~❌~~ |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ | ❌ | ❌ | ~~✔️~~ ❌ | ❌ |
| `ort.webgpu` | `onnxruntime-web/webgpu` | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
</details>
#### Flags:
The following flags are deprecated:
- `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in
build.
The following flags changed their type:
- `env.wasm.wasmPaths`: When using this flag as a string ( for the URL
prefix ), nothing is changed. When using this flag as an object ( for
per-file path override ), the type changed:
```diff
- export interface Old_WasmFilePaths{
- 'ort-wasm.wasm'?: string;
- 'ort-wasm-threaded.wasm'?: string;
- 'ort-wasm-simd.wasm'?: string;
- 'ort-training-wasm-simd.wasm'?: string;
- 'ort-wasm-simd-threaded.wasm'?: string;
- };
+ export interface New_WasmFilePaths {
+ /**
+ * Specify the override path for the main .wasm file.
+ *
+ * This path should be an absolute path.
+ *
+ * If not modified, the filename of the .wasm file is:
+ * - `ort-wasm-simd-threaded.wasm` for default build
+ * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and
WebNN)
+ * - `ort-training-wasm-simd-threaded.wasm` for training build
+ */
+ wasm?: URL|string;
+ /**
+ * Specify the override path for the main .mjs file.
+ *
+ * This path should be an absolute path.
+ *
+ * If not modified, the filename of the .mjs file is:
+ * - `ort-wasm-simd-threaded.mjs` for default build
+ * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and
WebNN)
+ * - `ort-training-wasm-simd-threaded.mjs` for training build
+ */
+ mjs?: URL|string;
+ }
```
#### Bundler compatibility:
Config changes are need for bundlers. See usage example in
/js/web/test/e2e/ for Webpack, parcel and rollup.
#### Deployment:
- if consuming from a CDN, there is no breaking change.
- if consuming from a local server, need to copy all `ort-*.wasm` and
`ort-*.mjs` files (totally 6 files) in the dist folder. (previously only
need to copy `ort-*.wasm` files.)
</details>
<details>
<summary><h4>Problems</h4></summary>
There are a few problems with the current module export and deployment:
- Script URL cannot be correctly inferred when imported as ESM.
- Workers are forcefully encoded using Blob URL, which makes
onnxruntime-web not working in CSP environment and Node.js, when using
proxy or multi-threading feature.
- Generated JS code (by Emscripten) is encoded using
`function.toString()`, which is unstable and error-prone.
- When running with a different Emscripten build, always need the build
step. Making it difficult to swap artifacts in deveopment/debug.
</details>
<details>
<summary><h4>Goals</h4></summary>
- Full ESM support
- Support variances of ways to import. Including:
- import from HTML's `<script>` tag (IIFE format, exporting to global
variable `ort`)
```html
<script
src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script>
```
- import from source code inside `<script type="module">` tag (ESM)
```html
<script type="module">
import * as ort from
"https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs";
// using 'ort'
</script>
```
- import in a CommonJS project (CJS format, resolve from package.json
"exports" field)
```js
// myProject/main.js
const ort = require('onnxruntime-web');
```
- import in an ESM project (ESM format, resolve from package.json
"exports" field)
```js
// myProject/main.js (or main.mjs)
import * as ort from 'onnxruntime-web';
```
- Support popular bundlers when importing onnxruntime-web into a CJS/ESM
project.
- webpack (esm requires extra post-process step)
- rollup
- parcel (esm requires extra post-process step)
- More bundlers **TBD**
- Multi-threading support for Node.js
NOTE: keeping single JavaScript file (the all-in-one bundle) is no
longer a goal. This is because technically there is a conflict with the
other requirements.
</details>
<details>
<summary><h4>Important Design Decisions</h4></summary>
- Drop support of single JavaScript output.
- The current onnxruntime-web distribution uses a single JavaScript file
to include all code. While there are a few benefits, it also creates
problems as mentioned above. Since ESM is being used more and more
widely, and browsers are making more restricted security checks and
requirement, the old Blob based solution is going to be replaced.
- To achieve the requirement, specifically, the CSP environment support,
we have to offer a non Blob based solution. Therefore, we have to
distribute multiple files and drop the single file solution.
- Do not run parser/postprocess on Emscripten generated JavaScript.
- Emscripten is evolving quickly so we should only depends on what's in
its documentation instead of a certain implementation details. (for
example, currently we patch on its code to deal with a special variable
`_scriptDir`)
- Keep the generated files as-is also helps to:
- reduce the size of ort.min.js
- make it easier to replace build artifacts when in development/debug
- Drop support for non-SIMD and non-MultiThread. This helps to reduce
the number of artifacts in distribution.
- (fixed-sized) SIMD is supported in any mainstream JS environment.
- Multi-thread as WebAssembly feature is supported in any mainstream JS
environment. In some environment the feature is guarded with cross
origin policy, but it can still work if not trying to create any worker.
- Use ESM output for Emscripten generated JavaScript.
- There are 2 ways to dynamically import classic (umd) modules and
neither of them are recommended:
- dynamically creating a <script> tag. This changes the HTML structure
and have quite a lot of compatibility issue
- use `fetch()` and `eval()`. However `eval` is strongly suggested to be
avoid because there is a great perf hit.
- importing ESM is super easy - just use the `import()` call.
Considering ESM is widely supported in modern browsers and Node.js this
is the better option.
- Add Blob based solution as a fallback for cross-origin workers.
- There are still wide use case of importing onnxruntime-web from CDN.
In this usage, make it able create worker by using `fetch()`+`Blob` to
create a same-origin Blob URL.
</details>
<details>
<summary><h4>Distribution File Manifest</h4></summary>
The distribution folder contains the following files:
- WebAssembly artifacts. These files are the result of compiling the
ONNX Runtime C++ code to WebAssembly by Emscripten.
| File Name | Build Flags |
|------|-----|
| ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm |
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-training-wasm-simd-threaded.mjs <br/>
ort-training-wasm-simd-threaded.wasm | `--enable_training_apis` <br/>
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm
| `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep`
<br/> `--use_webnn` |
- onnxruntime-web JavaScript artifacts. These files are generated by
ESBuild as the entry point for onnxruntime-web.
There are multiple build targets for different use cases:
| Target Name | Path for "import" or "require" | Description |
|------|-----|-----|
| `ort` | `onnxruntime-web` | The default target. |
| `ort.all` | `onnxruntime-web/all` | The target including webgl. |
| `ort.node` | `onnxruntime-web` | The default target for Node.js. |
| `ort.training` | `onnxruntime-web/training` | The target including
training APIs |
| `ort.wasm` | `onnxruntime-web/wasm` | The target including only
WebAssembly (CPU) EP |
| `ort.webgl` | `onnxruntime-web/webgl` | The target including only
WebGL EP |
For each target, there are multiple files generated:
| File Name | Description |
|------|-----|
| [target].js | The entry point for the target. IIFE and CommonJS
format. |
| [target].mjs | The entry point for the target. ESM format. |
| [target].min.js <br/> [target].min.js.map | The entry point for the
target. Minimized with sourcemap. IIFE and CommonJS format. |
| [target].min.mjs <br/> [target].min.mjs.map | The entry point for the
target. Minimized with sourcemap. ESM format. |
| [target].proxy.mjs | (if appliable) The proxy ESM module for the
target. |
| [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map | (if
appliable) The proxy ESM module for the target. Minimized with
sourcemap. |
</details>
<details>
<summary><h4>Dynamic Import Explained</h4></summary>
- Local Served | No Proxy:
```
[Bundle or ort.min.js]
|
+ import()--> [ort-wasm-simd-threaded.mjs]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Local Served | Proxy:
```
[Bundle or ort.min.js]
|
+ import()--> [ort.proxy.min.mjs]
|
+ new Worker()--> [ort.proxy.min.mjs (worker)]
|
+ import()--> [ort-wasm-simd-threaded.mjs]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Cross Origin | No Proxy:
```
[Bundle or ort.min.js]
|
+ fetch('ort-wasm-simd-threaded.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort-wasm-simd-threaded)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Cross Origin | Proxy
```
[Bundle or ort.min.js]
|
+ fetch('ort.proxy.min.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort.proxy)]
|
+ new Worker()--> [blob:... (ort.proxy) (worker)]
|
+ fetch('ort-wasm-simd-threaded.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort-wasm-simd-threaded)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
</details>
2024-05-20 16:51:16 +00:00
|
|
|
* @deprecated This property is deprecated. Since SIMD is supported by all major JavaScript engines, non-SIMD
|
|
|
|
|
* build is no longer provided. This property will be removed in future release.
|
2021-09-21 00:54:46 +00:00
|
|
|
* @defaultValue `true`
|
2021-06-08 06:24:27 +00:00
|
|
|
*/
|
|
|
|
|
simd?: boolean;
|
|
|
|
|
|
2024-01-03 18:13:17 +00:00
|
|
|
/**
|
|
|
|
|
* set or get a boolean value indicating whether to enable trace.
|
|
|
|
|
*
|
2024-02-27 19:07:15 +00:00
|
|
|
* @deprecated Use `env.trace` instead. If `env.trace` is set, this property will be ignored.
|
2024-01-03 18:13:17 +00:00
|
|
|
* @defaultValue `false`
|
|
|
|
|
*/
|
|
|
|
|
trace?: boolean;
|
|
|
|
|
|
2021-05-07 19:12:37 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get a number specifying the timeout for initialization of WebAssembly backend, in milliseconds. A zero
|
2021-09-21 00:54:46 +00:00
|
|
|
* value indicates no timeout is set.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `0`
|
2021-05-07 19:12:37 +00:00
|
|
|
*/
|
|
|
|
|
initTimeout?: number;
|
2021-08-06 01:01:03 +00:00
|
|
|
|
|
|
|
|
/**
|
[js/web] optimize module export and deployment (#20165)
### Description
This PR make numbers of optimizations to onnxruntime-web's module export
and deployment.
See each section below for more details.
#### Preview
>
[onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21)
> ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~
> ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~
> ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~
<details>
<summary><h4>Breaking changes</h4></summary>
There is no code change required, but there are a few differences
regarding **code import**, **flags**, **bundler config** and
**deployment steps**.
#### Importing:
Import table is changed. See following for details.
<details>
<summary><h5>Current import table:</h5></summary>
| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.all` | `onnxruntime-web/experimental` | ✔️ | ✔️ | ✔️ | ✔️ | ❌ |
| `ort.node` | `onnxruntime-web` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.training` | `onnxruntime-web/training` | ❌ | ❌ | ✔️ |
✔️<sup>\[1]</sup> | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` | ❌ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.wasm-core` | `onnxruntime-web/wasm-core` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ | ❌ | ❌ | ✔️<sup>\[2]</sup>
| ❌ |
| `ort.webgpu` | `onnxruntime-web/webgpu` | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
* [1] didn't test. may not actually work.
* [2] not working. this is a mistake in build config.
</details>
<details>
<summary><h5>Proposed update:</h5></summary>
| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm |
Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ | ❌ | ✔️ | ✔️ | ❌ |
| `ort.all` |
~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` | ✔️ | ✔️ |
✔️ | ✔️ | ❌ |
| `ort.node` | `onnxruntime-web` | ❌ | ❌ | ✔️ | ❌ | ❌ |
| `ort.training` | `onnxruntime-web/training` | ❌ | ❌ | ✔️ | ✔️ | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` | ❌ | ❌ | ✔️ | ✔️ | ❌ |
| ~~`ort.wasm-core`~~ | ~~`onnxruntime-web/wasm-core`~~ | ~~❌~~ | ~~❌~~
| ~~✔️~~ | ~~❌~~ | ~~❌~~ |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ | ❌ | ❌ | ~~✔️~~ ❌ | ❌ |
| `ort.webgpu` | `onnxruntime-web/webgpu` | ❌ | ✔️ | ✔️ | ✔️ | ❌ |
</details>
#### Flags:
The following flags are deprecated:
- `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in
build.
The following flags changed their type:
- `env.wasm.wasmPaths`: When using this flag as a string ( for the URL
prefix ), nothing is changed. When using this flag as an object ( for
per-file path override ), the type changed:
```diff
- export interface Old_WasmFilePaths{
- 'ort-wasm.wasm'?: string;
- 'ort-wasm-threaded.wasm'?: string;
- 'ort-wasm-simd.wasm'?: string;
- 'ort-training-wasm-simd.wasm'?: string;
- 'ort-wasm-simd-threaded.wasm'?: string;
- };
+ export interface New_WasmFilePaths {
+ /**
+ * Specify the override path for the main .wasm file.
+ *
+ * This path should be an absolute path.
+ *
+ * If not modified, the filename of the .wasm file is:
+ * - `ort-wasm-simd-threaded.wasm` for default build
+ * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and
WebNN)
+ * - `ort-training-wasm-simd-threaded.wasm` for training build
+ */
+ wasm?: URL|string;
+ /**
+ * Specify the override path for the main .mjs file.
+ *
+ * This path should be an absolute path.
+ *
+ * If not modified, the filename of the .mjs file is:
+ * - `ort-wasm-simd-threaded.mjs` for default build
+ * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and
WebNN)
+ * - `ort-training-wasm-simd-threaded.mjs` for training build
+ */
+ mjs?: URL|string;
+ }
```
#### Bundler compatibility:
Config changes are need for bundlers. See usage example in
/js/web/test/e2e/ for Webpack, parcel and rollup.
#### Deployment:
- if consuming from a CDN, there is no breaking change.
- if consuming from a local server, need to copy all `ort-*.wasm` and
`ort-*.mjs` files (totally 6 files) in the dist folder. (previously only
need to copy `ort-*.wasm` files.)
</details>
<details>
<summary><h4>Problems</h4></summary>
There are a few problems with the current module export and deployment:
- Script URL cannot be correctly inferred when imported as ESM.
- Workers are forcefully encoded using Blob URL, which makes
onnxruntime-web not working in CSP environment and Node.js, when using
proxy or multi-threading feature.
- Generated JS code (by Emscripten) is encoded using
`function.toString()`, which is unstable and error-prone.
- When running with a different Emscripten build, always need the build
step. Making it difficult to swap artifacts in deveopment/debug.
</details>
<details>
<summary><h4>Goals</h4></summary>
- Full ESM support
- Support variances of ways to import. Including:
- import from HTML's `<script>` tag (IIFE format, exporting to global
variable `ort`)
```html
<script
src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script>
```
- import from source code inside `<script type="module">` tag (ESM)
```html
<script type="module">
import * as ort from
"https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs";
// using 'ort'
</script>
```
- import in a CommonJS project (CJS format, resolve from package.json
"exports" field)
```js
// myProject/main.js
const ort = require('onnxruntime-web');
```
- import in an ESM project (ESM format, resolve from package.json
"exports" field)
```js
// myProject/main.js (or main.mjs)
import * as ort from 'onnxruntime-web';
```
- Support popular bundlers when importing onnxruntime-web into a CJS/ESM
project.
- webpack (esm requires extra post-process step)
- rollup
- parcel (esm requires extra post-process step)
- More bundlers **TBD**
- Multi-threading support for Node.js
NOTE: keeping single JavaScript file (the all-in-one bundle) is no
longer a goal. This is because technically there is a conflict with the
other requirements.
</details>
<details>
<summary><h4>Important Design Decisions</h4></summary>
- Drop support of single JavaScript output.
- The current onnxruntime-web distribution uses a single JavaScript file
to include all code. While there are a few benefits, it also creates
problems as mentioned above. Since ESM is being used more and more
widely, and browsers are making more restricted security checks and
requirement, the old Blob based solution is going to be replaced.
- To achieve the requirement, specifically, the CSP environment support,
we have to offer a non Blob based solution. Therefore, we have to
distribute multiple files and drop the single file solution.
- Do not run parser/postprocess on Emscripten generated JavaScript.
- Emscripten is evolving quickly so we should only depends on what's in
its documentation instead of a certain implementation details. (for
example, currently we patch on its code to deal with a special variable
`_scriptDir`)
- Keep the generated files as-is also helps to:
- reduce the size of ort.min.js
- make it easier to replace build artifacts when in development/debug
- Drop support for non-SIMD and non-MultiThread. This helps to reduce
the number of artifacts in distribution.
- (fixed-sized) SIMD is supported in any mainstream JS environment.
- Multi-thread as WebAssembly feature is supported in any mainstream JS
environment. In some environment the feature is guarded with cross
origin policy, but it can still work if not trying to create any worker.
- Use ESM output for Emscripten generated JavaScript.
- There are 2 ways to dynamically import classic (umd) modules and
neither of them are recommended:
- dynamically creating a <script> tag. This changes the HTML structure
and have quite a lot of compatibility issue
- use `fetch()` and `eval()`. However `eval` is strongly suggested to be
avoid because there is a great perf hit.
- importing ESM is super easy - just use the `import()` call.
Considering ESM is widely supported in modern browsers and Node.js this
is the better option.
- Add Blob based solution as a fallback for cross-origin workers.
- There are still wide use case of importing onnxruntime-web from CDN.
In this usage, make it able create worker by using `fetch()`+`Blob` to
create a same-origin Blob URL.
</details>
<details>
<summary><h4>Distribution File Manifest</h4></summary>
The distribution folder contains the following files:
- WebAssembly artifacts. These files are the result of compiling the
ONNX Runtime C++ code to WebAssembly by Emscripten.
| File Name | Build Flags |
|------|-----|
| ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm |
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-training-wasm-simd-threaded.mjs <br/>
ort-training-wasm-simd-threaded.wasm | `--enable_training_apis` <br/>
`--enable_wasm_simd` <br/> `--enable_wasm_threads` |
| ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm
| `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep`
<br/> `--use_webnn` |
- onnxruntime-web JavaScript artifacts. These files are generated by
ESBuild as the entry point for onnxruntime-web.
There are multiple build targets for different use cases:
| Target Name | Path for "import" or "require" | Description |
|------|-----|-----|
| `ort` | `onnxruntime-web` | The default target. |
| `ort.all` | `onnxruntime-web/all` | The target including webgl. |
| `ort.node` | `onnxruntime-web` | The default target for Node.js. |
| `ort.training` | `onnxruntime-web/training` | The target including
training APIs |
| `ort.wasm` | `onnxruntime-web/wasm` | The target including only
WebAssembly (CPU) EP |
| `ort.webgl` | `onnxruntime-web/webgl` | The target including only
WebGL EP |
For each target, there are multiple files generated:
| File Name | Description |
|------|-----|
| [target].js | The entry point for the target. IIFE and CommonJS
format. |
| [target].mjs | The entry point for the target. ESM format. |
| [target].min.js <br/> [target].min.js.map | The entry point for the
target. Minimized with sourcemap. IIFE and CommonJS format. |
| [target].min.mjs <br/> [target].min.mjs.map | The entry point for the
target. Minimized with sourcemap. ESM format. |
| [target].proxy.mjs | (if appliable) The proxy ESM module for the
target. |
| [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map | (if
appliable) The proxy ESM module for the target. Minimized with
sourcemap. |
</details>
<details>
<summary><h4>Dynamic Import Explained</h4></summary>
- Local Served | No Proxy:
```
[Bundle or ort.min.js]
|
+ import()--> [ort-wasm-simd-threaded.mjs]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Local Served | Proxy:
```
[Bundle or ort.min.js]
|
+ import()--> [ort.proxy.min.mjs]
|
+ new Worker()--> [ort.proxy.min.mjs (worker)]
|
+ import()--> [ort-wasm-simd-threaded.mjs]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Cross Origin | No Proxy:
```
[Bundle or ort.min.js]
|
+ fetch('ort-wasm-simd-threaded.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort-wasm-simd-threaded)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
- Cross Origin | Proxy
```
[Bundle or ort.min.js]
|
+ fetch('ort.proxy.min.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort.proxy)]
|
+ new Worker()--> [blob:... (ort.proxy) (worker)]
|
+ fetch('ort-wasm-simd-threaded.mjs')
|
+ URL.createObjectURL(res.blob())
|
+ import()--> [blob:... (ort-wasm-simd-threaded)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
|
+ new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
|
+ WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
```
</details>
2024-05-20 16:51:16 +00:00
|
|
|
* Set a custom URL prefix to the .wasm/.mjs files, or an object of overrides for both .wasm/.mjs file. The override
|
|
|
|
|
* path should be an absolute path.
|
2021-08-06 01:01:03 +00:00
|
|
|
*/
|
|
|
|
|
wasmPaths?: WasmPrefixOrFilePaths;
|
2021-08-31 17:23:42 +00:00
|
|
|
|
2024-07-29 20:39:38 +00:00
|
|
|
/**
|
|
|
|
|
* Set a custom buffer which contains the WebAssembly binary. If this property is set, the `wasmPaths` property will
|
|
|
|
|
* be ignored.
|
|
|
|
|
*/
|
|
|
|
|
wasmBinary?: ArrayBufferLike|Uint8Array;
|
|
|
|
|
|
2021-08-31 17:23:42 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get a boolean value indicating whether to proxy the execution of main thread to a worker thread.
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
|
|
|
|
* @defaultValue `false`
|
2021-08-31 17:23:42 +00:00
|
|
|
*/
|
|
|
|
|
proxy?: boolean;
|
2021-05-07 19:12:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
export interface WebGLFlags {
|
|
|
|
|
/**
|
2021-09-21 00:54:46 +00:00
|
|
|
* Set or get the WebGL Context ID (webgl or webgl2).
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `'webgl2'`
|
2021-05-07 19:12:37 +00:00
|
|
|
*/
|
|
|
|
|
contextId?: 'webgl'|'webgl2';
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
/**
|
|
|
|
|
* Get the WebGL rendering context.
|
|
|
|
|
*/
|
|
|
|
|
readonly context: WebGLRenderingContext;
|
2021-05-07 19:12:37 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get the maximum batch size for matmul. 0 means to disable batching.
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
|
|
|
|
* @deprecated
|
2021-05-07 19:12:37 +00:00
|
|
|
*/
|
|
|
|
|
matmulMaxBatchSize?: number;
|
|
|
|
|
/**
|
2021-09-21 00:54:46 +00:00
|
|
|
* Set or get the texture cache mode.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `'full'`
|
2021-05-07 19:12:37 +00:00
|
|
|
*/
|
|
|
|
|
textureCacheMode?: 'initializerOnly'|'full';
|
|
|
|
|
/**
|
|
|
|
|
* Set or get the packed texture mode
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
|
|
|
|
* @defaultValue `false`
|
2021-05-07 19:12:37 +00:00
|
|
|
*/
|
|
|
|
|
pack?: boolean;
|
2021-09-10 05:17:42 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get whether enable async download.
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
|
|
|
|
* @defaultValue `false`
|
2021-09-10 05:17:42 +00:00
|
|
|
*/
|
|
|
|
|
async?: boolean;
|
2021-05-07 19:12:37 +00:00
|
|
|
}
|
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 22:21:18 +00:00
|
|
|
|
2023-12-07 22:10:28 +00:00
|
|
|
export interface WebGpuProfilingDataV1TensorMetadata {
|
|
|
|
|
dims: readonly number[];
|
|
|
|
|
dataType: string;
|
|
|
|
|
}
|
|
|
|
|
export interface WebGpuProfilingDataV1 {
|
|
|
|
|
version: 1;
|
|
|
|
|
inputsMetadata: readonly WebGpuProfilingDataV1TensorMetadata[];
|
|
|
|
|
outputsMetadata: readonly WebGpuProfilingDataV1TensorMetadata[];
|
|
|
|
|
kernelId: number;
|
|
|
|
|
kernelType: string;
|
|
|
|
|
kernelName: string;
|
2024-01-13 08:23:17 +00:00
|
|
|
programName: string;
|
2023-12-07 22:10:28 +00:00
|
|
|
startTime: number;
|
|
|
|
|
endTime: number;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
export type WebGpuProfilingData = WebGpuProfilingDataV1;
|
|
|
|
|
|
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 22:21:18 +00:00
|
|
|
export interface WebGpuFlags {
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get the profiling mode.
|
2023-12-07 22:10:28 +00:00
|
|
|
*
|
|
|
|
|
* @deprecated Use `env.webgpu.profiling.mode` instead. If `env.webgpu.profiling.mode` is set, this property will be
|
|
|
|
|
* ignored.
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
*/
|
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 22:21:18 +00:00
|
|
|
profilingMode?: 'off'|'default';
|
2023-12-07 22:10:28 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get the profiling configuration.
|
|
|
|
|
*/
|
|
|
|
|
profiling?: {
|
|
|
|
|
/**
|
|
|
|
|
* Set or get the profiling mode.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `'off'`
|
|
|
|
|
*/
|
|
|
|
|
mode?: 'off'|'default';
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
* Set or get a callback function when a profiling data is received. If not set, the profiling data will be
|
|
|
|
|
* printed to console.
|
|
|
|
|
*/
|
|
|
|
|
ondata?: (data: WebGpuProfilingData) => void;
|
|
|
|
|
};
|
2024-03-13 02:50:51 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get the power preference.
|
|
|
|
|
*
|
|
|
|
|
* Setting this property only has effect before the first WebGPU inference session is created. The value will be
|
|
|
|
|
* used as options for `navigator.gpu.requestAdapter()`.
|
|
|
|
|
*
|
|
|
|
|
* See {@link https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions} for more details.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `undefined`
|
|
|
|
|
*/
|
|
|
|
|
powerPreference?: 'low-power'|'high-performance';
|
|
|
|
|
/**
|
|
|
|
|
* Set or get the force fallback adapter flag.
|
|
|
|
|
*
|
|
|
|
|
* Setting this property only has effect before the first WebGPU inference session is created. The value will be
|
|
|
|
|
* used as options for `navigator.gpu.requestAdapter()`.
|
|
|
|
|
*
|
|
|
|
|
* See {@link https://gpuweb.github.io/gpuweb/#dictdef-gpurequestadapteroptions} for more details.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `undefined`
|
|
|
|
|
*/
|
|
|
|
|
forceFallbackAdapter?: boolean;
|
|
|
|
|
/**
|
2024-03-19 19:55:00 +00:00
|
|
|
* Set or get the adapter for WebGPU.
|
2024-03-13 02:50:51 +00:00
|
|
|
*
|
2024-03-19 19:55:00 +00:00
|
|
|
* Setting this property only has effect before the first WebGPU inference session is created. The value will be
|
|
|
|
|
* used as the GPU adapter for the underlying WebGPU backend to create GPU device.
|
|
|
|
|
*
|
|
|
|
|
* If this property is not set, it will be available to get after the first WebGPU inference session is created. The
|
|
|
|
|
* value will be the GPU adapter that created by the underlying WebGPU backend.
|
2024-03-13 02:50:51 +00:00
|
|
|
*
|
|
|
|
|
* When use with TypeScript, the type of this property is `GPUAdapter` defined in "@webgpu/types".
|
|
|
|
|
* Use `const adapter = env.webgpu.adapter as GPUAdapter;` in TypeScript to access this property with correct type.
|
|
|
|
|
*
|
[js/common] fix typedoc warnings (#19933)
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```
Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose 2 set of different options
for React-native and ORT nodejs binding. This should be fixed in future.
- Fix a few inconsistency of names between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
2024-03-16 02:01:50 +00:00
|
|
|
* see comments on {@link Tensor.GpuBufferType}
|
2024-03-13 02:50:51 +00:00
|
|
|
*/
|
2024-03-19 19:55:00 +00:00
|
|
|
adapter: unknown;
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
/**
|
|
|
|
|
* Get the device for WebGPU.
|
|
|
|
|
*
|
2024-03-13 02:50:51 +00:00
|
|
|
* This property is only available after the first WebGPU inference session is created.
|
|
|
|
|
*
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
* When use with TypeScript, the type of this property is `GPUDevice` defined in "@webgpu/types".
|
|
|
|
|
* Use `const device = env.webgpu.device as GPUDevice;` in TypeScript to access this property with correct type.
|
|
|
|
|
*
|
[js/common] fix typedoc warnings (#19933)
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```
Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose 2 set of different options
for React-native and ORT nodejs binding. This should be fixed in future.
- Fix a few inconsistency of names between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
2024-03-16 02:01:50 +00:00
|
|
|
* see comments on {@link Tensor.GpuBufferType} for more details about why not use types defined in "@webgpu/types".
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
*/
|
|
|
|
|
readonly device: unknown;
|
2023-09-30 09:05:32 +00:00
|
|
|
/**
|
|
|
|
|
* Set or get whether validate input content.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `false`
|
|
|
|
|
*/
|
|
|
|
|
validateInputContent?: boolean;
|
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 22:21:18 +00:00
|
|
|
}
|
2021-05-07 19:12:37 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
export interface Env {
|
2021-05-17 21:16:59 +00:00
|
|
|
/**
|
2021-09-21 00:54:46 +00:00
|
|
|
* set the severity level for logging.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `'warning'`
|
2021-05-17 21:16:59 +00:00
|
|
|
*/
|
|
|
|
|
logLevel?: 'verbose'|'info'|'warning'|'error'|'fatal';
|
2024-02-27 19:07:15 +00:00
|
|
|
|
2021-04-16 08:33:10 +00:00
|
|
|
/**
|
|
|
|
|
* Indicate whether run in debug mode.
|
2021-09-21 00:54:46 +00:00
|
|
|
*
|
|
|
|
|
* @defaultValue `false`
|
2021-04-16 08:33:10 +00:00
|
|
|
*/
|
|
|
|
|
debug?: boolean;
|
|
|
|
|
|
2024-02-27 19:07:15 +00:00
|
|
|
/**
|
|
|
|
|
* set or get a boolean value indicating whether to enable trace.
|
|
|
|
|
*
|
|
|
|
|
* @defaultValue `false`
|
|
|
|
|
*/
|
|
|
|
|
trace?: boolean;
|
|
|
|
|
|
2023-06-09 23:18:53 +00:00
|
|
|
/**
|
|
|
|
|
* Get version of the current package.
|
|
|
|
|
*/
|
|
|
|
|
readonly versions: {
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
readonly common: string;
|
|
|
|
|
readonly web?: string;
|
|
|
|
|
readonly node?: string;
|
2023-06-09 23:18:53 +00:00
|
|
|
// eslint-disable-next-line @typescript-eslint/naming-convention
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
readonly 'react-native'?: string;
|
2023-06-09 23:18:53 +00:00
|
|
|
};
|
|
|
|
|
|
2021-05-07 19:12:37 +00:00
|
|
|
/**
|
|
|
|
|
* Represent a set of flags for WebAssembly
|
|
|
|
|
*/
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
readonly wasm: Env.WebAssemblyFlags;
|
2021-05-07 19:12:37 +00:00
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
* Represent a set of flags for WebGL
|
|
|
|
|
*/
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
readonly webgl: Env.WebGLFlags;
|
2021-05-07 19:12:37 +00:00
|
|
|
|
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 22:21:18 +00:00
|
|
|
/**
|
|
|
|
|
* Represent a set of flags for WebGPU
|
|
|
|
|
*/
|
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`
2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a WebGL texture bound tensor data
b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
executionProviders: [ 'webgl' ],
preferredOutputLocation: { 'image_output:0': 'texture' }
});
...
const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 19:58:26 +00:00
|
|
|
readonly webgpu: Env.WebGpuFlags;
|
[js/web] WebGPU backend via JSEP (#14579)
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-04-24 22:21:18 +00:00
|
|
|
|
2021-04-16 08:33:10 +00:00
|
|
|
[name: string]: unknown;
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
/**
|
|
|
|
|
* Represent a set of flags as a global singleton.
|
|
|
|
|
*/
|
2023-05-15 23:23:13 +00:00
|
|
|
export const env: Env = envImpl;
|