### Description
This PR is a preview of cherry-picks for ort-web to `rel-1.17.3` based
on `rel-1.17.2`.
<details>
<summary>Changes of ort-web to cherry-pick</summary>
The following commits are from main branch.
`o` stands for pick, and `x` stands for skip.
```
o 2e0a388c36 [js/webgpu] Add HardSigmoid support (#19215)
o d226e40856 [js/webgpu] set query type in onRunStart (#19202)
o 61610ff986 [js/webgpu] Add FusedConv clip test case (#18900)
o a33b5bd1fa [JS/WebGPU] Added Uniforms to SkipLayerNorm. (#18788)
o 591f90c0b9 [js/webgpu] Fix issue of timestamp query (#19258)
o 7252c6e747 [WebNN EP] Support WebNN async API with Asyncify (#19145)
o 5b06505073 [js/webgpu] Fix Tanh explosion (#19201)
o 656ca66186 [js/webgpu] Support uniforms for conv, conv transpose, conv grouped (#18753)
o a3f0e2422b [js/webgpu] Support f16 uniform (#19098)
o 9e69606360 fix f16 for attention, enable slice and flatten for more types (#19262)
o 624b4e2063 [js/webgpu] Remove enableShapesUniforms (#19279)
o 90883a366a [js/webgpu] Add hardSigmoid activation for fusedConv (#19233)
o 85cef0af8c [js/webgpu] Support capture and replay for jsep (#18989)
o d73131cf0f [js/webgpu] Use DataType as uniform cpu type (#19281)
o dd1f6ccc45 [js/webgpu] resolve codescan alert (#19343)
o 3a2ab1963a [js/webgpu] Refactor createTensorShapeVariables (#18883)
o efc17e79de [js/webgpu] Fix the undefined push error (#19366)
x 50806a7dd5 [js/web] support external data in npm test (#19377)
o ccbe264a39 [js/webgpu] Add LeakyRelu activation for fusedConv (#19369)
o 5ff27ef02a [js/webgpu] support customop FastGelu (#19392)
x 03be65e064 [js/web] fix types exports in package.json (#19458)
o 06269a3952 [js/webgpu] allow uint8 tensors for webgpu (#19545)
o dfeda9019c [JS/WebGPU] Add MatMulNBits (#19446)
o 1b48054e1b [js/webgpu] Create Split indices helpers by rank, not by shape (#19554)
o 3fe2c137ee [js] small fix to workaround formatter (#19400)
x 70567a4b3a [js/web] use ApiTensor insteadof onnxjs Tensor in TensorResultValidator (#19358)
o 6e04e36e3f [js/common] upgrade tsc in common from 4.9.5 to 5.2.2 (#19317)
o 58f4921686 [js] changes to allow Float16Array if any polyfill is available (#19305)
o 57d6819212 [js/web] Fix fused-conv is not included in npm test (#19581)
o ebd220b073 Misspelling in README.md (#19433)
o 38c3432393 Bump ip from 1.1.8 to 1.1.9 in /js/react_native (#19582)
o fe82fccf1a [js/webgpu] Fix Conv2DTransposeMatMul f16 compilation failure (#19596)
o 76a2a487a1 Bump ip from 1.1.8 to 1.1.9 in /js/react_native/e2e (#19583)
o 29b1106033 [node] Switch to setImmediate to avoid starving the Node.js event loop (#19610)
o ae3d73c981 [JS/WebGPU] Fix Split and Where to handle corner cases. (#19613)
o aec2389ad0 [js/webgpu] allows a ProgramInfo's RunData to use zero sized output (#19614)
o bb43a0f133 [js/webgpu] minor fixes to make tinyllama work (#19564)
o 0edb035808 [js/web] fix suite test list for zero sized tensor (#19638)
o 3cb81cdde2 [js/common] move 'env.wasm.trace' to 'env.trace' (#19617)
o e30618d055 [js/webgpu] use Headless for webgpu test by default (#19702)
o f06164ef8b [js/web] transfer input buffer back to caller thread (#19677)
x a788514027 [js/web] dump debug logs for karma for diagnose purpose (#19785)
o 24b72d2613 [JS/WebGPU] Preserve zero size input tensor dims. (#19737)
o 4538d31a8b [js/webgpu] expose a few properties in WebGPU API (#19857)
o 53de2d8cb0 [js/webgpu] Enable GroupedConvVectorize path (#19791)
o ed250b88c3 [JS/WebGPU] Optimize MatMulNBits (#19852)
x e771a763c3 [js/test] align web test runner flags with ort.env (#19790)
o 79e50aeef3 [js/web] rewrite backend resolve to allow multiple EPs (#19735)
o acb0df2280Fix#19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932)
o b29849a287 [js/common] fix typedoc warnings (#19933)
o afdab62f53 Bump follow-redirects from 1.15.4 to 1.15.6 in /js/web (#19949)
o 28ad6c3955 Bump follow-redirects from 1.15.4 to 1.15.6 in /js/node (#19951)
o 7e0d424934 accumulate in fp32 for Reduce* (#19868)
o 4c6a6a37f7 [js/webgpu] Fix NAN caused by un-initialized buffer in instance-norm (#19387)
o 01c7aaf6aa [js/webgpu] allow setting env.webgpu.adapter (#19940)
o c45cff60cf [js/webgpu] fix maxpool / fp16 (#19981)
```
</details>
<details>
<summary>Cherry-pick commandlines</summary>
```sh
git cherry-pick 2e0a388c36
git cherry-pick d226e40856
git cherry-pick 61610ff986
git cherry-pick a33b5bd1fa
git cherry-pick 591f90c0b9
git cherry-pick 7252c6e747
git cherry-pick 5b06505073
git cherry-pick 656ca66186
git cherry-pick a3f0e2422b
git cherry-pick 9e69606360
git cherry-pick 624b4e2063
git cherry-pick 90883a366a
git cherry-pick 85cef0af8c #<<<<< Note: conflicts
git cherry-pick d73131cf0f
git cherry-pick dd1f6ccc45
git cherry-pick 3a2ab1963a
git cherry-pick efc17e79de
git cherry-pick ccbe264a39
git cherry-pick 5ff27ef02a
git cherry-pick 06269a3952
git cherry-pick dfeda9019c
git cherry-pick 1b48054e1b
git cherry-pick 3fe2c137ee
git cherry-pick 6e04e36e3f
git cherry-pick 58f4921686
git cherry-pick 57d6819212
git cherry-pick ebd220b073
git cherry-pick 38c3432393
git cherry-pick fe82fccf1a
git cherry-pick 76a2a487a1
git cherry-pick 29b1106033
git cherry-pick ae3d73c981
git cherry-pick aec2389ad0
git cherry-pick bb43a0f133
git cherry-pick 0edb035808
git cherry-pick 3cb81cdde2
git cherry-pick e30618d055
git cherry-pick f06164ef8b
git cherry-pick 24b72d2613
git cherry-pick 4538d31a8b
git cherry-pick 53de2d8cb0
git cherry-pick ed250b88c3
git cherry-pick 79e50aeef3
git cherry-pick acb0df2280
git cherry-pick b29849a287
git cherry-pick afdab62f53
git cherry-pick 28ad6c3955
git cherry-pick 7e0d424934
git cherry-pick 4c6a6a37f7
git cherry-pick 01c7aaf6aa
git cherry-pick c45cff60cf
```
</details>
<details>
<summary>Cherry-pick conflicts</summary>
- 85cef0af8c#18989
this change is for enabling graph capture feature for JSEP, and it is
done after ROCM EP enabled graph capture feature. However, the ROCM EP
graph capture feature is not cherry-picked in rel-1.17.2.
</details>
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Jiajia Qin <jiajia.qin@intel.com>
Co-authored-by: Xu Xing <xing.xu@intel.com>
Co-authored-by: satyajandhyala <satya.k.jandhyala@gmail.com>
Co-authored-by: Yang Gu <yang.gu@intel.com>
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
Co-authored-by: Jiajie Hu <jiajie.hu@intel.com>
Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Matttttt <18152455+martholomew@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Segev Finer <segev208@gmail.com>
Co-authored-by: Belem Zhang <belem.zhang@intel.com>
### Description
enable external data loading for ort-web.
### Why
The ORT external data design is highly depending on the file system,
especially synchronous file I/O APIs. Those are not available in web
platforms. We need to have extra code to make external data working on
web.
### How
Considering there is no file system in web, an implementation for web to
support external data is to use pre-loaded data. Assume model file
a.onnx includes initializers that linked to ./b.bin, we require users to
pass a full data file list when creating the session. The user code will
be look like:
```js
const mySess = await ort.InferenceSession.create('./path/model/a.onnx', {
// session options
externalData: [
{
// relative or absolute path/URL of the file,
// or a pre-loaded Uint8Array containing the data of the external data file
data: './path/data/b.bin',
// the relative path of the external data. Should match initializers' "location" value defined in the model file
path: './b.bin'
},
// { } if multiple external data file
]
});
```
Currently, this feature only works with JSEP build enabled.
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190#18144
### Description
- set tsconfig "noUnusedParameters" to `true` and fix a few bugs
discovered by typescript.
how unused parameter is fixed:
- for most code (webgl), add underscore as prefix, which is the standard
ignore pattern for typescript check.
- remove unused parameter from function and modify corresponding
function calls (jsep)
- fix a bug in ArgMinMax: this 2 operators do not have more than one
input(s) so the `createArgMinMaxAttributesFromInputs()` is removed.
- add proxy main.ts into typescript check and fix a bug in parameter
passing
- fixed `run()` function call and add typecheck fix (hack)
### Description
* Adds TrainingSession.create() functionality following the web bindings
for training design doc
* Added 2 new training APIs to wasm/api.h:
* OrtTrainingGetInputOutputName
* OrtTrainingGetInputOutputCount
* Moved isOrtEnvInitialized boolean to the wasm-core-impl and added a
method that references it
### Motivation and Context
* Adding web bindings for training
#### Related work
* #16521 allowed for training artifacts to be built
* #17333 added interfaces for training
* #17474 allows for training package to be built + adds training backend
to web package **[MUST BE MERGED IN BEFORE THIS ONE]**
---------
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
### Description
Use esbuild to accelerate bundle build.
This change uses esbuild to replace webpack for onnxruntime-web. Bundle
build time reduced from ~20sec to ~0.6sec on my windows dev box.
A few changes applied:
- import nodejs modules using "node:" prefix
- remove enum declaration inside namespace (EncoderUsage)
- use "fs/promise" to replace the old promisify from "util"
- separate ort-web and test-runner. Previously they are bundled
together, now they are built into 2 files.
- optimize karma runner launch time
- remove unnecessary sourcemap preprocessor. sourcemaps are handled
inside esbuild
- remove unnecessary proxies (because ort-web and test-runner are
separated now, the path are correctly inferred)
- remove file watcher for test data
- optimize special handling as esbuild plugins:
- polyfill dummy imports for node.js modules when targetting browser.
- load as content string for ort-wasm-*.worker.js
- load as content string for ./proxy-worker/main.ts
- a source patch to ort-wasm*-threaded*.js (see details in comments in
code)
- updated debug configurations for sourcemap mapping to ensure
out-of-box good dev experience
### Description
This change introduced the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronized inferencing execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
- elementwise operators (22)
- binary operators (5)
- tensor: Shape, Reshape, Transpose, Gemm
- nn: Conv, {Global}Maxpool, {Global}AveragePool
Code need to be polished. still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNXRuntime
execution provider that specifically works on Web environment
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime inferences a model.
Why JSEP?
> JSEP is a hybrid mode EP that contains both C/C++ and
TypeScript/JavaScript implementation. There are 2 strong reasons why we
introduces JSEP:
> 1. the C/C++ part helps JSEP to leverage ONNX Runtime's capabilities
as much as possible including graph transformer, optimizers and also the
capabilities to fallback to CPU EP. TypeScript/JavaScript helps JSEP to
develop and debug much easier in the browser for the kernel
implementation.
> 2. the requirement of asynchronized execution from JavaScript API (eg.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronized context (see "async problem" section below). This is done
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API that available in browser. It's one of the
only 2 APIs that currently available to access the GPU from browser (the
other is WebGL).
> WebGPU is designed with more advanced and stronger features comparing
to WebGL and is potentially solution that offer the best GPU performance
for model inferencing that currently available.
What is the async problem and why we have the problem?
> The "async problem" is a problem that you cannot call an async
function in a synchronous context. Think about the following C++ code:
> ```c
> // C-style declarations (API)
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
> // how to implement?
> }
> ```
> The answer is, it's impossible to implement this function. Usually we
try to find a sync version API, or launch a thread to call the async
function and sync-wait on the main thread. Unfortunately, in browser
environment, neither is possible.
>
> WebGPU does not offer any synchronized API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer for copy data from GPU to CPU,
and `OrtRun()` is a synchronized function, this cannot be done in normal
way.
What is Emscripten? How is the Asyncify feature resolved the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generates the WebAssembly artifacts which runs on
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronized context. In short, it
generates code to unwind and rewind call stack to emulate async
execution. With this feature, we are able to call the async function
inside `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP is doing pretty much same thing to just another EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
Module.jsepBackend = backend;
Module.jsepAlloc = alloc;
Module.jsepFree = free;
Module.jsepCopy = copy;
Module.jsepCopyAsync = copyAsync;
Module.jsepCreateKernel = createKernel;
Module.jsepReleaseKernel = releaseKernel;
Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all language barrier level
functions that requires by JSEP to achieve implementing kernels and data
transfers using JavaScript inside ONNX Runtime:
- `jsepBackend`: assign the singleton object to webassembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronized copy ( GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronized copy ( GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object
that maintained in JS to match lifecycle of Kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above allows to tie as little as possible connections
and dependencies between C/C++ and TypeScript/JavaScript.
**Resource Management**
Lifecycle of tensor data and kernels are managed by ORT(C/C++) but the
implementation are left to JavaScript. JavaScript code are responsible
to implement the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensot_data_id => GPUBuffer). GPU pipeline is managed as singleton.
Shaders are managed using a singletonmap (shader_key => gpu_program),
while shader_key is generated by cache_key (OP specific, including
attributes) and input shapes.
**about data transfer**
`js::DataTransfer::CopyTensor` implemented to call either synchronized
or asynchronized copy callback, depending on the destination is GPU or
not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function
to be called in the synchronized context.
**run kernel in JS**
Kernel class constructor calls once `jsepCreateKernel()` with an
optional per-kernel specific serialization to pass attributes into
JavaScript.
`Compute()` are implemented in a way that a metadata serialization is
performed in a base class and JavaScript code can access the data using
the Emscripten specific builtin macro `EM_ASM_*`.
**disabled features**
memory pattern is force disabled, because the WebGPU data is not
presented by a general memory model (a buffer can be represented by
offset + size).
concurrent run support is disabled. WebGPU is stateful and it also has
async function call. To support concurrent run will significantly
increase the complexity and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in method
`GetPreferredLayout()`. This will let the graph transformers to
preprocess the graph into a channels last form so that a more optimized
WebGPU shader can be used.
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration which need to work together with the corresponding
JavaScript code. There are unit tests that run onnx models from
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>