Commit graph

26 commits

Author SHA1 Message Date
Yulong Wang
35697d2421
[js/webnn] update API of session options for WebNN (#20816)
### Description

This PR is an API-only change to address the requirements being
discussed in #20729.

There are multiple ways that users may create an ORT session by
specifying the session options differently.

All the code snippet below will use the variable `webnnOptions` as this:
```js
const myWebnnSession = await ort.InferenceSession.create('./model.onnx', {
   executionProviders: [
     webnnOptions
   ]
});
```

### The old way (backward-compatibility)

```js
// all-default, name only
const webnnOptions_0 = 'webnn';

// all-default, properties omitted
const webnnOptions_1 = { name: 'webnn' };

// partial
const webnnOptions_2 = {
  name: 'webnn',
  deviceType: 'cpu'
};

// full
const webnnOptions_3 = {
  name: 'webnn',
  deviceType: 'gpu',
  numThreads: 1,
  powerPreference: 'high-performance'
};
```

### The new way (specify with MLContext)

```js
// options to create MLcontext
const options = {
  deviceType: 'gpu',
  powerPreference: 'high-performance'
};

const myMlContext = await navigator.ml.createContext(options);

// options for session options
const webnnOptions = {
  name: 'webnn',
  context: myMlContext,
  ...options
};
```

This should throw (because no deviceType is specified):
```js
const myMlContext = await navigator.ml.createContext({ ... });
const webnnOptions = {
  name: 'webnn',
  context: myMlContext
};
```

### Interop with WebGPU
```js
// get WebGPU device
const adaptor = await navigator.gpu.requestAdapter({ ... });
const device = await adaptor.requestDevice({ ... });

// set WebGPU adaptor and device
ort.env.webgpu.adaptor = adaptor;
ort.env.webgpu.device = device;

const myMlContext = await navigator.ml.createContext(device);
const webnnOptions = {
  name: 'webnn',
  context: myMlContext,
  gpuDevice: device
};
```

This should throw (because cannot specify both gpu device and MLContext
option at the same time):
```js
const webnnOptions = {
  name: 'webnn',
  context: myMlContext,
  gpuDevice: device,
  deviceType: 'gpu'
};
```
2024-05-31 03:25:14 -07:00
Jon Campbell
768c79317c
Enable QNN HTP support for Node (#20576)
### Description
Add support for using Onnx Runtime with Node

### Motivation and Context
Onnx Runtime supports the QNN HTP, but does not support it for Node.js.
This adds baseline support for the Onnx Runtime to be used with Node.

Note it does not update the node packages that are distributed
officially. This simply patches the onnxruntime.dll to allow 'qnn' to be
used as an execution provider.

Testing was done using the existing onnxruntime-node package. The
`onnxruntime.dll` and `onnxruntime_binding.node` were swapped into
`node_modules\onnxruntime-node\bin\napi-v3\win32\arm64` with the newly
built version, then the various QNN dlls and .so files were placed next
to the onnxruntime.dll. Testing was performed on a variety of models and
applications, but the easiest test is to modify the [node quickstart
example](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/quick-start_onnxruntime-node).
2024-05-09 13:11:07 -07:00
Wanming Lin
fe1c3a45c1
[WebNN EP] Support NPU deviceType (#20278) 2024-04-15 18:43:46 -07:00
Yulong Wang
b29849a287
[js/common] fix typedoc warnings (#19933)
### Description
Fix a few warnings in typedoc (for generating JS API):
```
[warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used.
[warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation.
[warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation.
[warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation.
[warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation.
[warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter.
[warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device.
```

Changes highlighted:
- Merge `CoreMlExecutionProviderOption` and
`CoreMLExecutionProviderOption`. They expose 2 set of different options
for React-native and ORT nodejs binding. This should be fixed in future.
- Fix a few inconsistency of names between JSDoc and parameters
- Fix broken type links
- Exclude trace functions
2024-03-15 19:01:50 -07:00
Jiajia Qin
85cef0af8c
[js/webgpu] Support capture and replay for jsep (#18989)
### Description
This PR expands the graph capture capability to JS EP, which is similar
to #16081. But for JS EP, we don't use the CUDA Graph, instead, we
records all gpu commands and replay them, which removes most of the cpu
overhead to avoid the the situation that gpu waiting for cpu.

mobilenetv2-12 becomes 3.7ms from 6ms on NV 3090 and becomes 3.38ms from
4.58ms on Intel A770.

All limitations are similar with CUDA EP:
1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
2. Usage of graph capture is limited to models where-in all ops in the
model can be partitioned to the JS EP or CPU EP and no memory copy
between them.
3. Shapes of inputs/outputs cannot change across inference calls.
4. IObinding is required.

The usage is like below:
Method 1: specify outputs buffers explicitly.
```
    const sessionOptions = {
        executionProviders: [
          {
            name: "webgpu",
          },
        ],
        enableGraphCapture: true,
      };
    const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions);
   
    // prepare the inputBuffer/outputBuffer
    ... ...

   const feeds = {
       'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims })
   };

   const fetches = {
       'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] })
   };

   let results = await session.run(feeds, fetches);  // The first run will begin to capture the graph.

   // update inputBuffer content
  ... ...
   results = = await session.run(feeds, fetches);  // The 2ed run and after will directly call replay to execute the graph.

  ... ...
   session.release();
```
Method 2: Don't specify outputs buffers explicitly. Internally, when
graph capture is enabled, it will set all outputs location to
'gpu-buffer'.
```
    const sessionOptions = {
        executionProviders: [
          {
            name: "webgpu",
          },
        ],
        enableGraphCapture: true,
      };
    const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions);

    // prepare the inputBuffer
    ... ...

   const feeds = {
       'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims })
   };

   let results = await session.run(feeds);  // The first run will begin to capture the graph.
   
   // update inputBuffer content
  ... ...
   results = = await session.run(feeds);  // The 2ed run and after will directly call replay to execute the graph.

  ... ...
   session.release();
2024-01-30 18:28:03 -08:00
Yulong Wang
f917dde717
[web] remove xnnpack from web backends (#19116)
### Description
XNNPACK is already disabled in web assembly build. This change removes
the xnnpack backend registration in JS.
2024-01-13 23:04:02 -08:00
Yulong Wang
07cfc56538
[js] enable external data loading for ort-web (#19087)
### Description
enable external data loading for ort-web.

### Why
The ORT external data design is highly depending on the file system,
especially synchronous file I/O APIs. Those are not available in web
platforms. We need to have extra code to make external data working on
web.

### How
Considering there is no file system in web, an implementation for web to
support external data is to use pre-loaded data. Assume model file
a.onnx includes initializers that linked to ./b.bin, we require users to
pass a full data file list when creating the session. The user code will
be look like:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
  // session options
  externalData: [
    {
      // relative or absolute path/URL of the file,
      // or a pre-loaded Uint8Array containing the data of the external data file
      data: './path/data/b.bin', 

      // the relative path of the external data. Should match initializers' "location" value defined in the model file
      path: './b.bin'
    },
    // { } if multiple external data file
  ]
});
```

Currently, this feature only works with JSEP build enabled.
2024-01-12 19:24:24 -08:00
Wanming Lin
73ed34ac4b
[WebNN EP] Support numThreads option for WebNN CPU device (#18054) 2023-11-12 16:45:10 -08:00
Yulong Wang
451c02543a
[js/webgpu] allow specify preferredLayout (#17756)
### Description
Allow WebGPU backend to specify `preferredLayout`. Default is NHWC.

```js
const options = {executionProviders: [{name:'webgpu', preferredLayout: 'NCHW'}]};
sess1 = await ort.InferenceSession.create('./mobilenetv2-12.onnx', options);
```

### Motivation and Context
- implement @qjia7's requirement for an easier way to do performance
comparison between NCHW vs NHWC.
- It's possible that NCHW does better on some models and NHWC on others.
So offer user the capability to switch.
2023-10-02 21:25:12 -07:00
Yulong Wang
a2e75114cc
[js/web] add sessionOptions.freeDimensionOverrides (#17488)
### Description
Allows to specify fixed size for dynamic input of a model. resolves
#16707

Pending test
2023-09-13 09:17:34 -07:00
Yulong Wang
e5ca3f3dcb
[js/api] introducing IO binding for tensor (#16452)
[//]: # (## Work In Progress. Feedbacks are welcome!)

### Description
This PR adds a few properties, methods and factories to Tensor type to
support IO-binding feature. This will allow user to create tensor from
GPU/CPU bound data without a force transferring of data between CPU and
GPU.

This change is a way to resolve #15312

### Change Summary
1. Add properties to `Tensor` type:
a. `location`: indicating where the data is sitting. valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: sit side to `data`, a readonly property of `WebGLTexture`
type. available only when `location === 'texture'`
c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer`
type. available only when `location === 'gpu-buffer'`

2. Add methods to `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows user to download data from GPU to
CPU manually.
- function `dispose()` allows user to release GPU resources manually.

3. Add factories for creating `Tensor` instances:
    a. `fromTexture()` to create a WebGL texture bound tensor data
    b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data
    c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer

### Examples:

create tensors from texture and pass to inference session as inputs
```js
// when create session, specify we prefer 'image_output:0' to be stored on GPU as texture
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: [ 'webgl' ],
  preferredOutputLocation: { 'image_output:0': 'texture' }
});

...

const myImageTexture = getTexture(); // user's function to get a texture
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format.
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
2023-08-29 12:58:26 -07:00
Arthur Islamov
c262879214
Added DML and CUDA provider support in onnxruntime-node (#16050)
### Description
I've added changes to support CUDA and DML (only on Windows, on other
platforms it will throw an error)



### Motivation and Context
It fixes this feature request
https://github.com/microsoft/onnxruntime/issues/14127 which is tracked
here https://github.com/microsoft/onnxruntime/issues/14529

I was working on StableDiffusion implementation for node.js and it is
very slow on CPU, so GPU support is essential.

Here is a working demo with a patched and precompiled version
https://github.com/dakenf/stable-diffusion-nodejs

---------
2023-08-25 16:57:06 -07:00
Jhen-Jie Hong
685816bb0a
[js/rn] Add executionProviders support (#16233)
### Description
<!-- Describe your changes. -->

This PR adds support for `executionProviders` option for react-native
package, support:

- Android: cpu / xnnpack / nnapi
- iOS: cpu / xnnpack /  coreml

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

In my case I want to enable Core ML / NNAPI EP for react-native project.
2023-06-16 19:38:41 +10:00
Yulong Wang
e3e4926d00
[js/common] allow import onnxruntime-common as ESM and CJS (#15772)
### Description
allow import onnxruntime-common as ESM and CJS.
2023-06-12 12:05:11 -07:00
Yulong Wang
ba5f5e3198
[js] allow manually release inference session (#16169)
### Description
This change adds a new instance function (method) to type
`InferenceSession` to allow users to manually release an inference
session instance.

#16131 depends on this change to work correctly.
2023-05-31 00:31:38 -07:00
Wanming Lin
00b1e79e04
Support WebNN EP (#15698)
**Description**: 

This PR intends to enable WebNN EP in ONNX Runtime Web. It translates
the ONNX nodes by [WebNN
API](https://webmachinelearning.github.io/webnn/), which is implemented
in C++ and uses Emscripten [Embind
API](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#).
Temporarily using preferred layout **NHWC** for WebNN graph partitions
since the restriction in WebNN XNNPack backend implementation and the
ongoing
[discussion](https://github.com/webmachinelearning/webnn/issues/324) in
WebNN spec that whether WebNN should support both 'NHWC' and 'NCHW'
layouts. No WebNN native EP, only for Web.

**Motivation and Context**:
Allow ONNXRuntime Web developers to access WebNN API to benefit from
hardware acceleration.

**WebNN API Implementation Status in Chromium**:
- Tracked in Chromium issue:
[#1273291](https://bugs.chromium.org/p/chromium/issues/detail?id=1273291)
- **CPU device**: based on XNNPack backend, and had been available on
Chrome Canary M112 behind "#enable-experimental-web-platform-features"
flag for Windows and Linux platforms. Further implementation for more
ops is ongoing.
- **GPU device**: based on DML, implementation is ongoing.

**Open**:
- GitHub CI: WebNN currently is only available on Chrome Canary/Dev with
XNNPack backend for Linux and Windows. This is an open to reviewers to
help identify which GitHub CI should involved the WebNN EP and guide me
to enable it. Thanks!
2023-05-08 21:25:10 -07:00
Yulong Wang
a631ed77c0
[js/web] support flag 'optimizedModelFilePath' in session options (#14355)
### Description
* Support flag 'optimizedModelFilePath' in session options.

In Node.js, the model will be saved into filesystem just like its
behaviour on native platforms.

In browser, the new model is not saved to filesystem. the file path is
ignored. Instead, a new pop-up window will be launched in browser and
user can 'save' the file as onnx model.

* Add corresponding commandline args for the following session option
flags:
    - optimizedModelFilePath
    - graphOptimizationLevel
2023-02-24 15:50:15 -08:00
Yulong Wang
82786baed1
[js/web] add 'xnnpack' to EP list (#12723)
**Description**: This PR adds support for "XNNPACK EP" in ORTWeb and
changes the behavior of how ORTWeb deals with "backends", or "EPs" in
API.

**Background**: Term "backend" is introduced in ONNX.js to representing
a TypeScript type which implements a "backend" interface, which is a
similar but different concept to ORT's EP (execution provider). There
was 3 backends in ONNX.js: "cpu", "wasm" and "webgl".

When ORT Web is launched, the concept is derived to help users to
integrate smoothly. Technically, when "wasm" backend is used, users need
to also specify "EP" in the session options. Considering it may get
complicated and confused for users to figure out the difference between
"backend" and "EP", the JS API hide the "backend" concept and made a
mapping between names, backends and EPs:
"webgl" (Name) <==> "onnxjsBackend" (Backend)
"wasm" (Name) <==> "wasmBackend" (Backend) <==> "CPU" (EP)

**Details**:
The following changes are applied in this PR:
1. allow multi-registration for backends using the same name. This is
for use scenarios where both "onnxruntime-node" and "onnxruntime-web"
are consumed in a Node.js App ( so "cpu" will be registered twice in
this scenario. )
2. re-assign priority values to backends. I give 100 as base to "cpu"
for node and react_native, and 10 as base to "cpu" in web.
3. add "cpu", "xnnpack" as new names of backends.
4. update onnxruntime wasm exported functions to support EP
registration.
5. update implementations in ort web to handle execution providers in
session options.
6. add '--use_xnnpack' as default build flag for ort-web
2022-10-03 10:38:45 -07:00
Yulong Wang
c144acc534
Replace 'master' branch ref to 'main' in the code (#12547) 2022-08-22 10:48:12 -07:00
Yulong Wang
af21a04977
[js] upgrade async@3.2.3 /js/ (#11421)
* [js] upgrade async@3.2.3 /js/

* format code
2022-05-03 23:41:36 -07:00
Yulong Wang
49b329e266
[js/api] add typedoc and revise comments (#9077)
* [js/api] add typedoc and revise comments

* update document

* fix lint error

* use config file for typedoc
2021-09-20 17:54:46 -07:00
Sunghoon
da5f24bd2d
Support additional session options and run options in WebAssembly (#7712)
* add all session options and run options in C API except AddInitializer and AddFreeDimensionOverride

* remove unnecessary comment

* change extra session and run options to object notation

* resolve comments

* use an optional chaining for options

* resolve comments
2021-05-17 14:57:19 -07:00
Yulong Wang
bdefc6c4d8
[js/web] support multi-thread for wasm backend (#7601) 2021-05-07 12:12:37 -07:00
Yulong Wang
79dc7d3e50
[js/common] revise TSDoc of some interfaces (#7541) 2021-05-01 22:20:22 -07:00
Yulong Wang
4ebc9c3b5e
[JS] onnxruntime-web (#7394)
* add web

* add script and test

* fix lint

* add test/data/ops

* add test/data/node/ to gitignore

* modify scripts

* add onnxjs

* fix tests

* fix test-runner

* fix sourcemap

* fix onnxjs profiling

* update test list

* update README

* resolve comments

* set wasm as default backend

* rename package

* update copyright header

* do not use class "Buffer" in browser context

* revise readme
2021-04-27 00:04:25 -07:00
Yulong Wang
009f342caf
[JS] refactor Javascript/Typescript libraries in ONNX Runtime (#7308)
* working on re-organizing js code for ortweb

* remove dup files

* move folder

* fix common references

* fix common es5

* add webpack to common

* split interfact/impl

* use cjs for node

* add npmignore for common

* update sourcemap config for common

* update node

* adjust folder/path in CI and build

* update folder

* nit: readme

* add bundle for dev

* correct nodejs paths

* enable ORT_API_MANUAL_INIT

* set name for umd library

* correct name for commonjs export

* add priority into registerBackend()

* fix npm ci pwd

* update eslintrc

* revise code

* revert package-lock lockfileVersion 2->1

* update prebuild

* resolve comments

* update document

* revise eslint config

* update eslint for typescript rules

* revert changes by mistake in backend.ts

* add env

* resolve comments
2021-04-16 01:33:10 -07:00
Renamed from nodejs/lib/inference-session.ts (Browse further)