onnxruntime/js/web/lib/onnxjs/session.ts
Yulong Wang 036fcd93d4
[js/web] optimize module export and deployment (#20165)
### Description

This PR makes a number of optimizations to onnxruntime-web's module
export and deployment.

See each section below for more details.

#### Preview

> [onnxruntime-web@1.19.0-esmtest.20240513-a16cd2bd21](https://www.npmjs.com/package/onnxruntime-web/v/1.19.0-esmtest.20240513-a16cd2bd21)

> ~~onnxruntime-web@1.19.0-esmtest.20240430-c7edbcc63d~~

> ~~onnxruntime-web@1.18.0-esmtest.20240428-624c681c83~~

> ~~onnxruntime-web@1.18.0-esmtest.20240411-1abb64e894~~

<details>
<summary><h4>Breaking changes</h4></summary>

There is no code change required, but there are a few differences
regarding **code import**, **flags**, **bundler config** and
**deployment steps**.

#### Importing:

The import table has changed. See the following for details.

<details>
<summary><h5>Current import table:</h5></summary>

| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm | Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ |  | ✔️ | ✔️ |  |
| `ort.all` | `onnxruntime-web/experimental` | ✔️ | ✔️ | ✔️ | ✔️ |  |
| `ort.node` | `onnxruntime-web` |  |  | ✔️ |  |  |
| `ort.training` | `onnxruntime-web/training` |  |  | ✔️ | ✔️<sup>\[1]</sup> | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` |  |  | ✔️ | ✔️ |  |
| `ort.wasm-core` | `onnxruntime-web/wasm-core` |  |  | ✔️ |  |  |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ |  |  | ✔️<sup>\[2]</sup> |  |
| `ort.webgpu` | `onnxruntime-web/webgpu` |  | ✔️ | ✔️ | ✔️ |  |

* [1] Not tested; may not actually work.
* [2] Not working; this is a mistake in the build config.

</details>

<details>
<summary><h5>Proposed update:</h5></summary>

| Target Name | Path for "import" or "require" | WebGL | JSEP | wasm | Proxy | Training |
|------|-----|-----|-----|-----|-----|-----|
| `ort` (default) | `onnxruntime-web` | ✔️ |  | ✔️ | ✔️ |  |
| `ort.all` | ~~`onnxruntime-web/experimental`~~<br/>`onnxruntime-web/all` | ✔️ | ✔️ | ✔️ | ✔️ |  |
| `ort.node` | `onnxruntime-web` |  |  | ✔️ |  |  |
| `ort.training` | `onnxruntime-web/training` |  |  | ✔️ | ✔️ | ✔️ |
| `ort.wasm` | `onnxruntime-web/wasm` |  |  | ✔️ | ✔️ |  |
| ~~`ort.wasm-core`~~ | ~~`onnxruntime-web/wasm-core`~~ |  |  | ~~✔️~~ |  |  |
| `ort.webgl` | `onnxruntime-web/webgl` | ✔️ |  |  | ~~✔️~~ |  |
| `ort.webgpu` | `onnxruntime-web/webgpu` |  | ✔️ | ✔️ | ✔️ |  |

</details>
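
For projects updating their imports, the path changes between the two tables above can be captured as a small lookup. The following is a minimal sketch and not part of onnxruntime-web: `migrateImportPath`, `RENAMED_PATHS` and `REMOVED_PATHS` are hypothetical names, and throwing for `onnxruntime-web/wasm-core` is an assumption, since the table only marks that target as removed without naming a replacement.

```typescript
// Hypothetical lookup (not part of onnxruntime-web): old specifier -> new specifier.
const RENAMED_PATHS: Record<string, string> = {
  'onnxruntime-web/experimental': 'onnxruntime-web/all',
};

// Targets that the proposed table strikes out entirely.
const REMOVED_PATHS: Record<string, boolean> = {
  'onnxruntime-web/wasm-core': true,
};

function hasOwn(table: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(table, key);
}

function migrateImportPath(specifier: string): string {
  if (hasOwn(REMOVED_PATHS, specifier)) {
    // Assumption: fail loudly, because the table names no replacement.
    throw new Error(`'${specifier}' is removed; pick another target from the import table`);
  }
  // Specifiers not listed as renamed (e.g. 'onnxruntime-web/webgpu') stay unchanged.
  return hasOwn(RENAMED_PATHS, specifier) ? RENAMED_PATHS[specifier] : specifier;
}

console.log(migrateImportPath('onnxruntime-web/experimental'));
```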

#### Flags:

The following flags are deprecated:
- `env.wasm.simd` (boolean): will be ignored. SIMD is always enabled in
the build.

The following flags changed their type:
- `env.wasm.wasmPaths`: When this flag is used as a string (the URL
prefix), nothing changes. When it is used as an object (per-file path
overrides), the type has changed:
  ```diff
  -  export interface Old_WasmFilePaths{
  -    'ort-wasm.wasm'?: string;
  -    'ort-wasm-threaded.wasm'?: string;
  -    'ort-wasm-simd.wasm'?: string;
  -    'ort-training-wasm-simd.wasm'?: string;
  -    'ort-wasm-simd-threaded.wasm'?: string;
  -  };
  +  export interface New_WasmFilePaths {
  +    /**
  +     * Specify the override path for the main .wasm file.
  +     *
  +     * This path should be an absolute path.
  +     *
  +     * If not modified, the filename of the .wasm file is:
  +     * - `ort-wasm-simd-threaded.wasm` for default build
  +     * - `ort-wasm-simd-threaded.jsep.wasm` for JSEP build (with WebGPU and WebNN)
  +     * - `ort-training-wasm-simd-threaded.wasm` for training build
  +     */
  +    wasm?: URL|string;
  +    /**
  +     * Specify the override path for the main .mjs file.
  +     *
  +     * This path should be an absolute path.
  +     *
  +     * If not modified, the filename of the .mjs file is:
  +     * - `ort-wasm-simd-threaded.mjs` for default build
  +     * - `ort-wasm-simd-threaded.jsep.mjs` for JSEP build (with WebGPU and WebNN)
  +     * - `ort-training-wasm-simd-threaded.mjs` for training build
  +     */
  +    mjs?: URL|string;
  +  }
  ```
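
To illustrate the two forms of `env.wasm.wasmPaths`, the sketch below mirrors the new object type locally and resolves the final `.wasm` and `.mjs` locations for the default build. `WasmFilePaths`, `WasmPathConfig` and `resolveWasmPaths` are hypothetical names used only for this illustration; the actual resolution logic inside onnxruntime-web may differ.

```typescript
// Local mirror of the new per-file override type, for illustration only.
interface WasmFilePaths {
  wasm?: URL|string;
  mjs?: URL|string;
}
type WasmPathConfig = string|WasmFilePaths|undefined;

// Default file names of the default build, as listed in the doc comments above.
const DEFAULT_WASM = 'ort-wasm-simd-threaded.wasm';
const DEFAULT_MJS = 'ort-wasm-simd-threaded.mjs';

// Hypothetical helper: the string form acts as a URL prefix,
// the object form overrides each file individually.
function resolveWasmPaths(config: WasmPathConfig): {wasm: string; mjs: string} {
  if (typeof config === 'string') {
    return {wasm: config + DEFAULT_WASM, mjs: config + DEFAULT_MJS};
  }
  const overrides: WasmFilePaths = config === undefined ? {} : config;
  return {
    wasm: overrides.wasm !== undefined ? String(overrides.wasm) : DEFAULT_WASM,
    mjs: overrides.mjs !== undefined ? String(overrides.mjs) : DEFAULT_MJS,
  };
}

// String form: a prefix shared by all files, same as before this change.
console.log(resolveWasmPaths('https://example.com/ort/'));
// Object form: an absolute override for the .wasm file only.
console.log(resolveWasmPaths({wasm: 'https://example.com/custom/ort.wasm'}));
```

In application code the same values would be assigned to `ort.env.wasm.wasmPaths` before the first session is created.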

#### Bundler compatibility:

Config changes are needed for bundlers. See the usage examples in
/js/web/test/e2e/ for webpack, Parcel and Rollup.

#### Deployment:

- When consuming from a CDN, there is no breaking change.
- When serving from a local server, copy all `ort-*.wasm` and
`ort-*.mjs` files (6 files in total) from the dist folder. (Previously
only the `ort-*.wasm` files needed to be copied.)
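
As a rough illustration of the local-server case, the sketch below picks the files to copy out of a dist folder listing. `selectRuntimeArtifacts` is a hypothetical helper, and the sample listing is assembled from file names mentioned in this description; a real deployment script would read the directory and copy the files with whatever tooling the project already uses.

```typescript
// Hypothetical helper: keep only the `ort-*.wasm` and `ort-*.mjs` artifacts.
function selectRuntimeArtifacts(distFiles: string[]): string[] {
  const pattern = /^ort-.*\.(wasm|mjs)$/;
  return distFiles.filter((name) => pattern.test(name));
}

// Sample listing (illustrative; a real dist folder contains more files).
const sampleDistListing = [
  'ort.min.js',
  'ort.min.mjs',
  'ort.proxy.min.mjs',
  'ort-wasm-simd-threaded.mjs',
  'ort-wasm-simd-threaded.wasm',
  'ort-wasm-simd-threaded.jsep.mjs',
  'ort-wasm-simd-threaded.jsep.wasm',
  'ort-training-wasm-simd-threaded.mjs',
  'ort-training-wasm-simd-threaded.wasm',
];

console.log(selectRuntimeArtifacts(sampleDistListing));
```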

</details>
<details>
<summary><h4>Problems</h4></summary>

There are a few problems with the current module export and deployment:

- The script URL cannot be correctly inferred when imported as ESM.
- Workers are forcefully encoded using Blob URL, which prevents
onnxruntime-web from working in CSP environments and Node.js when using
the proxy or multi-threading feature.
- Generated JS code (by Emscripten) is encoded using
`function.toString()`, which is unstable and error-prone.
- Running with a different Emscripten build always requires the build
step, which makes it difficult to swap artifacts during
development/debugging.
</details>
<details>
<summary><h4>Goals</h4></summary>

- Full ESM support
- Support a variety of ways to import, including:
  - import from HTML's `<script>` tag (IIFE format, exporting to global
    variable `ort`)
    ```html
    <script src="https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.js"></script>
    ```
  - import from source code inside `<script type="module">` tag (ESM)
    ```html
    <script type="module">
      import * as ort from "https://example.com/cdn-path-to-onnxruntime-web/dist/ort.min.mjs";

      // using 'ort'
    </script>
    ```
  - import in a CommonJS project (CJS format, resolve from package.json
    "exports" field)
    ```js
    // myProject/main.js
    const ort = require('onnxruntime-web');
    ```
  - import in an ESM project (ESM format, resolve from package.json
    "exports" field)
    ```js
    // myProject/main.js (or main.mjs)
    import * as ort from 'onnxruntime-web';
    ```
- Support popular bundlers when importing onnxruntime-web into a CJS/ESM
project.
  - webpack (esm requires extra post-process step)
  - rollup
  - parcel (esm requires extra post-process step)
  - More bundlers **TBD**
- Multi-threading support for Node.js

NOTE: Keeping a single JavaScript file (the all-in-one bundle) is no
longer a goal, because it technically conflicts with the other
requirements.
</details>

<details>
<summary><h4>Important Design Decisions</h4></summary>

- Drop support of single JavaScript output.
  - The current onnxruntime-web distribution uses a single JavaScript
    file to include all code. While there are a few benefits, it also
    creates the problems mentioned above. Since ESM is being used more
    and more widely, and browsers are imposing stricter security checks
    and requirements, the old Blob-based solution is going to be
    replaced.
  - To meet the requirements, specifically CSP environment support, we
    have to offer a non-Blob-based solution. Therefore, we have to
    distribute multiple files and drop the single-file solution.

- Do not run parser/postprocess on Emscripten generated JavaScript.
  - Emscripten is evolving quickly, so we should depend only on what is
    in its documentation rather than on particular implementation
    details. (For example, we currently patch its code to deal with a
    special variable `_scriptDir`.)
  - Keeping the generated files as-is also helps to:
    - reduce the size of ort.min.js
    - make it easier to replace build artifacts during
      development/debugging

- Drop support for non-SIMD and non-MultiThread. This helps to reduce
  the number of artifacts in distribution.
  - (fixed-sized) SIMD is supported in any mainstream JS environment.
  - Multi-threading as a WebAssembly feature is supported in any
    mainstream JS environment. In some environments the feature is
    guarded by cross-origin policy, but it can still work as long as no
    worker is created.

- Use ESM output for Emscripten generated JavaScript.
  - There are 2 ways to dynamically import classic (UMD) modules, and
    neither of them is recommended:
    - Dynamically creating a `<script>` tag. This changes the HTML
      structure and has quite a lot of compatibility issues.
    - Using `fetch()` and `eval()`. However, `eval` should be avoided
      because of the significant performance hit.
  - Importing ESM is easy: just use the `import()` call. Considering
    that ESM is widely supported in modern browsers and Node.js, this
    is the better option.

- Add Blob based solution as a fallback for cross-origin workers.
  - Importing onnxruntime-web from a CDN is still a widely used
    scenario. In this usage, a worker can be created by using
    `fetch()`+`Blob` to create a same-origin Blob URL.
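
The fallback described in the last item can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `needsBlobFallback` and `toSameOriginUrl` are hypothetical names, and error handling is reduced to a single status check.

```typescript
// Hypothetical helper: a script served from another origin cannot be used
// directly as a worker script, so it needs the fetch()+Blob detour.
function needsBlobFallback(scriptUrl: string, pageOrigin: string): boolean {
  return new URL(scriptUrl, pageOrigin).origin !== pageOrigin;
}

// Hypothetical helper: download the script and re-expose it as a blob: URL,
// which takes the origin of the page that created it.
async function toSameOriginUrl(scriptUrl: string): Promise<string> {
  const response = await fetch(scriptUrl);
  if (!response.ok) {
    throw new Error(`failed to fetch ${scriptUrl}: ${response.status}`);
  }
  const source = await response.blob();
  const blob = new Blob([source], {type: 'text/javascript'});
  return URL.createObjectURL(blob);
}

// A CDN-hosted script needs the fallback; a locally served one does not.
console.log(needsBlobFallback('https://cdn.example.com/ort.proxy.min.mjs', 'https://app.example.com'));
console.log(needsBlobFallback('/dist/ort.proxy.min.mjs', 'https://app.example.com'));
```

The returned URL could then be passed to `new Worker(url, {type: 'module'})` or `import(url)`, and released with `URL.revokeObjectURL()` once it is no longer needed.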

</details>

<details>
<summary><h4>Distribution File Manifest</h4></summary>

The distribution folder contains the following files:

- WebAssembly artifacts. These files are the result of compiling the
ONNX Runtime C++ code to WebAssembly by Emscripten.

  | File Name | Build Flags |
  |------|-----|
  | ort-wasm-simd-threaded.mjs <br/> ort-wasm-simd-threaded.wasm | `--enable_wasm_simd` <br/> `--enable_wasm_threads` |
  | ort-training-wasm-simd-threaded.mjs <br/> ort-training-wasm-simd-threaded.wasm | `--enable_training_apis` <br/> `--enable_wasm_simd` <br/> `--enable_wasm_threads` |
  | ort-wasm-simd-threaded.jsep.mjs <br/> ort-wasm-simd-threaded.jsep.wasm | `--enable_wasm_simd` <br/> `--enable_wasm_threads` <br/> `--use_jsep` <br/> `--use_webnn` |

- onnxruntime-web JavaScript artifacts. These files are generated by
ESBuild as the entry point for onnxruntime-web.

  There are multiple build targets for different use cases:
  | Target Name | Path for "import" or "require" | Description |
  |------|-----|-----|
  | `ort` | `onnxruntime-web` | The default target. |
  | `ort.all` | `onnxruntime-web/all` | The target including webgl. |
  | `ort.node` | `onnxruntime-web` | The default target for Node.js. |
  | `ort.training` | `onnxruntime-web/training` | The target including training APIs |
  | `ort.wasm` | `onnxruntime-web/wasm` | The target including only WebAssembly (CPU) EP |
  | `ort.webgl` | `onnxruntime-web/webgl` | The target including only WebGL EP |


  For each target, there are multiple files generated:
  | File Name | Description |
  |------|-----|
  | [target].js | The entry point for the target. IIFE and CommonJS format. |
  | [target].mjs | The entry point for the target. ESM format. |
  | [target].min.js <br/> [target].min.js.map | The entry point for the target. Minified with sourcemap. IIFE and CommonJS format. |
  | [target].min.mjs <br/> [target].min.mjs.map | The entry point for the target. Minified with sourcemap. ESM format. |
  | [target].proxy.mjs | (if applicable) The proxy ESM module for the target. |
  | [target].proxy.min.mjs <br/> [target].proxy.min.mjs.map | (if applicable) The proxy ESM module for the target. Minified with sourcemap. |
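
The naming scheme in the table above can be expressed as a small function. `entryFilesFor` is a hypothetical helper written for this description; whether a given target actually ships proxy files is left to the caller as a flag rather than guessed here.

```typescript
// Hypothetical helper: list the JavaScript artifacts generated for one target.
function entryFilesFor(target: string, hasProxy: boolean): string[] {
  const files = [
    `${target}.js`,           // IIFE and CommonJS
    `${target}.mjs`,          // ESM
    `${target}.min.js`,       // minified IIFE and CommonJS
    `${target}.min.js.map`,
    `${target}.min.mjs`,      // minified ESM
    `${target}.min.mjs.map`,
  ];
  if (hasProxy) {
    files.push(`${target}.proxy.mjs`, `${target}.proxy.min.mjs`, `${target}.proxy.min.mjs.map`);
  }
  return files;
}

console.log(entryFilesFor('ort.wasm', true));
```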

</details>

<details>
<summary><h4>Dynamic Import Explained</h4></summary>

- Local Served | No Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + import()--> [ort-wasm-simd-threaded.mjs]
                    |
                    + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                    |
                    + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
                                        |
                                        + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
- Local Served | Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + import()--> [ort.proxy.min.mjs]
                    |
                    + new Worker()--> [ort.proxy.min.mjs (worker)]
                                        |
                                        + import()--> [ort-wasm-simd-threaded.mjs]
                                                        |
                                                        + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                                                        |
                                                        + new Worker()--> [ort-wasm-simd-threaded.mjs (worker)]
                                                                            |
                                                                            + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
- Cross Origin | No Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + fetch('ort-wasm-simd-threaded.mjs')
        |
        + URL.createObjectURL(res.blob())
        |
        + import()--> [blob:... (ort-wasm-simd-threaded)]
                        |
                        + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                        |
                        + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
                                            |
                                            + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```

- Cross Origin | Proxy:
  ```
  [Bundle or ort.min.js]
    |
    + fetch('ort.proxy.min.mjs')
        |
        + URL.createObjectURL(res.blob())
        |
        + import()--> [blob:... (ort.proxy)]
                        |
                        + new Worker()--> [blob:... (ort.proxy) (worker)]
                                            |
                                            + fetch('ort-wasm-simd-threaded.mjs')
                                                |
                                                + URL.createObjectURL(res.blob())
                                                |
                                                + import()--> [blob:... (ort-wasm-simd-threaded)]
                                                                |
                                                                + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
                                                                |
                                                                + new Worker()--> [blob:... (ort-wasm-simd-threaded) (worker)]
                                                                                    |
                                                                                    + WebAssembly.instantiateStreaming()--> [ort-wasm-simd-threaded.wasm]
  ```
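
The four diagrams differ along two axes only: whether the scripts are served cross-origin and whether the proxy is enabled. One compact way to see this is to build the chain of main steps from those two flags. The sketch below is only a summary of the diagrams; `LoadOptions` and `describeLoadPath` are hypothetical names, and the multi-threading worker branch is omitted for brevity.

```typescript
interface LoadOptions {
  crossOrigin: boolean;  // scripts come from another origin (e.g. a CDN)
  proxy: boolean;        // the proxy worker is enabled
}

// Hypothetical helper: returns the main loading steps from the diagrams, in order.
function describeLoadPath(options: LoadOptions): string[] {
  // Cross-origin scripts are fetched and wrapped in a blob: URL before import().
  const load = (file: string): string[] => options.crossOrigin ?
      [`fetch('${file}')`, 'URL.createObjectURL(res.blob())', 'import(blob URL)'] :
      [`import('${file}')`];

  let steps: string[] = [];
  if (options.proxy) {
    steps = steps.concat(load('ort.proxy.min.mjs'), ['new Worker() for the proxy']);
  }
  return steps.concat(load('ort-wasm-simd-threaded.mjs'), ['WebAssembly.instantiateStreaming()']);
}

console.log(describeLoadPath({crossOrigin: false, proxy: false}));
console.log(describeLoadPath({crossOrigin: true, proxy: true}));
```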
</details>
2024-05-20 09:51:16 -07:00

256 lines
8.8 KiB
TypeScript

// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.
import {resolveBackend, SessionHandlerType} from './backend';
import {ExecutionPlan} from './execution-plan';
import {Graph} from './graph';
import {Profiler} from './instrument';
import {Model} from './model';
import {Operator} from './operators';
import {Tensor} from './tensor';

export declare namespace Session {
  export interface Config {
    backendHint?: string;
    profiler?: Profiler.Config;
  }

  export interface Context {
    profiler: Readonly<Profiler>;
    graphInputTypes?: Tensor.DataType[];
    graphInputDims?: Array<readonly number[]>;
  }
}

export class Session {
  constructor(config: Session.Config = {}) {
    this._initialized = false;
    this.backendHint = config.backendHint;
    this.profiler = Profiler.create(config.profiler);
    this.context = {profiler: this.profiler, graphInputTypes: [], graphInputDims: []};
  }

  get inputNames(): readonly string[] {
    return this._model.graph.getInputNames();
  }

  get outputNames(): readonly string[] {
    return this._model.graph.getOutputNames();
  }

  startProfiling() {
    this.profiler.start();
  }

  endProfiling() {
    this.profiler.stop();
  }

  async loadModel(uri: string): Promise<void>;
  async loadModel(buffer: ArrayBuffer, byteOffset?: number, length?: number): Promise<void>;
  async loadModel(buffer: Uint8Array): Promise<void>;
  async loadModel(arg: string|ArrayBuffer|Uint8Array, byteOffset?: number, length?: number): Promise<void> {
    await this.profiler.event('session', 'Session.loadModel', async () => {
      // resolve backend and session handler
      const backend = await resolveBackend(this.backendHint);
      this.sessionHandler = backend.createSessionHandler(this.context);

      this._model = new Model();
      if (typeof arg === 'string') {
        const isOrtFormat = arg.endsWith('.ort');
        if (typeof process !== 'undefined' && process.versions && process.versions.node) {
          // node
          const {readFile} = require('node:fs/promises');
          const buf = await readFile(arg);
          this.initialize(buf, isOrtFormat);
        } else {
          // browser
          const response = await fetch(arg);
          const buf = await response.arrayBuffer();
          this.initialize(new Uint8Array(buf), isOrtFormat);
        }
      } else if (!ArrayBuffer.isView(arg)) {
        // load model from ArrayBuffer
        const arr = new Uint8Array(arg, byteOffset || 0, length || arg.byteLength);
        this.initialize(arr);
      } else {
        // load model from Uint8array
        this.initialize(arg);
      }
    });
  }

  private initialize(modelProtoBlob: Uint8Array, isOrtFormat?: boolean): void {
    if (this._initialized) {
      throw new Error('already initialized');
    }

    this.profiler.event('session', 'Session.initialize', () => {
      // load graph
      const graphInitializer =
          this.sessionHandler.transformGraph ? this.sessionHandler as Graph.Initializer : undefined;
      this._model.load(modelProtoBlob, graphInitializer, isOrtFormat);

      // graph is completely initialized at this stage, let the interested handlers know
      if (this.sessionHandler.onGraphInitialized) {
        this.sessionHandler.onGraphInitialized(this._model.graph);
      }
      // initialize each operator in the graph
      this.initializeOps(this._model.graph);

      // instantiate an ExecutionPlan object to be used by the Session object
      this._executionPlan = new ExecutionPlan(this._model.graph, this._ops, this.profiler);
    });

    this._initialized = true;
  }

  async run(inputs: Map<string, Tensor>|Tensor[]): Promise<Map<string, Tensor>> {
    if (!this._initialized) {
      throw new Error('session not initialized yet');
    }

    return this.profiler.event('session', 'Session.run', async () => {
      const inputTensors = this.normalizeAndValidateInputs(inputs);

      const outputTensors = await this._executionPlan.execute(this.sessionHandler, inputTensors);

      return this.createOutput(outputTensors);
    });
  }

  private normalizeAndValidateInputs(inputs: Map<string, Tensor>|Tensor[]): Tensor[] {
    const modelInputNames = this._model.graph.getInputNames();

    // normalize inputs
    // inputs: Tensor[]
    if (Array.isArray(inputs)) {
      if (inputs.length !== modelInputNames.length) {
        throw new Error(`incorrect input array length: expected ${modelInputNames.length} but got ${inputs.length}`);
      }
    }
    // convert map to array
    // inputs: Map<string, Tensor>
    else {
      if (inputs.size !== modelInputNames.length) {
        throw new Error(`incorrect input map size: expected ${modelInputNames.length} but got ${inputs.size}`);
      }

      const sortedInputs = new Array<Tensor>(inputs.size);
      let sortedInputsIndex = 0;
      for (let i = 0; i < modelInputNames.length; ++i) {
        const tensor = inputs.get(modelInputNames[i]);
        if (!tensor) {
          // report the model input name that has no matching tensor in the map
          throw new Error(`missing input tensor for: '${modelInputNames[i]}'`);
        }
        sortedInputs[sortedInputsIndex++] = tensor;
      }

      inputs = sortedInputs;
    }

    // validate dims requirements
    // First session run - graph input data is not cached for the session
    if (!this.context.graphInputTypes || this.context.graphInputTypes.length === 0 || !this.context.graphInputDims ||
        this.context.graphInputDims.length === 0) {
      const modelInputIndices = this._model.graph.getInputIndices();
      const modelValues = this._model.graph.getValues();

      const graphInputDims = new Array<readonly number[]>(modelInputIndices.length);

      for (let i = 0; i < modelInputIndices.length; ++i) {
        const graphInput = modelValues[modelInputIndices[i]];
        graphInputDims[i] = graphInput.type!.shape.dims;

        // cached for second and subsequent runs.
        // Some parts of the framework work on the assumption that the graph, types and shapes are static
        this.context.graphInputTypes!.push(graphInput.type!.tensorType);
        this.context.graphInputDims!.push(inputs[i].dims);
      }

      this.validateInputTensorDims(graphInputDims, inputs, true);
    }
    // Second and subsequent session runs - graph input data is cached for the session
    else {
      this.validateInputTensorDims(this.context.graphInputDims, inputs, false);
    }

    // validate types requirement
    this.validateInputTensorTypes(this.context.graphInputTypes!, inputs);

    return inputs;
  }

  private validateInputTensorTypes(graphInputTypes: Tensor.DataType[], givenInputs: Tensor[]) {
    for (let i = 0; i < givenInputs.length; i++) {
      const expectedType = graphInputTypes[i];
      const actualType = givenInputs[i].type;
      if (expectedType !== actualType) {
        throw new Error(`input tensor[${i}] check failed: expected type '${expectedType}' but got ${actualType}`);
      }
    }
  }

  private validateInputTensorDims(
      graphInputDims: Array<readonly number[]>, givenInputs: Tensor[], noneDimSupported: boolean) {
    for (let i = 0; i < givenInputs.length; i++) {
      const expectedDims = graphInputDims[i];
      const actualDims = givenInputs[i].dims;
      if (!this.compareTensorDims(expectedDims, actualDims, noneDimSupported)) {
        throw new Error(`input tensor[${i}] check failed: expected shape '[${expectedDims.join(',')}]' but got [${
            actualDims.join(',')}]`);
      }
    }
  }

  private compareTensorDims(expectedDims: readonly number[], actualDims: readonly number[], noneDimSupported: boolean):
      boolean {
    if (expectedDims.length !== actualDims.length) {
      return false;
    }

    for (let i = 0; i < expectedDims.length; ++i) {
      if (expectedDims[i] !== actualDims[i] && (!noneDimSupported || expectedDims[i] !== 0)) {
        // data shape mis-match AND not a 'None' dimension.
        return false;
      }
    }

    return true;
  }

  private createOutput(outputTensors: Tensor[]): Map<string, Tensor> {
    const modelOutputNames = this._model.graph.getOutputNames();
    if (outputTensors.length !== modelOutputNames.length) {
      throw new Error('expected number of outputs do not match number of generated outputs');
    }

    const output = new Map<string, Tensor>();
    for (let i = 0; i < modelOutputNames.length; ++i) {
      output.set(modelOutputNames[i], outputTensors[i]);
    }

    return output;
  }

  private initializeOps(graph: Graph): void {
    const nodes = graph.getNodes();
    this._ops = new Array(nodes.length);

    for (let i = 0; i < nodes.length; i++) {
      this._ops[i] = this.sessionHandler.resolve(nodes[i], this._model.opsets, graph);
    }
  }

  private _model: Model;
  private _initialized: boolean;

  private _ops: Operator[];
  private _executionPlan: ExecutionPlan;

  private backendHint?: string;

  private sessionHandler: SessionHandlerType;
  private context: Session.Context;
  private profiler: Readonly<Profiler>;
}