### Description
Optimize the ESLint config to:
- set `parserOptions.project` to `true`, which lets @typescript-eslint/parser
find the tsconfig.json nearest to each source file. This avoids parsing
extra files, which may help to:
  - reduce the possibility of running out of memory (OOM) or overflowing
  the stack with "npm run lint"
  - speed up processing
- enforce the rule "no-underscore-dangle" with a list of exceptions.
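The two settings above can be sketched as a typed config object. This is an illustration only: the project may keep its config in `.eslintrc.js`, and the names in the `allow` list are hypothetical placeholders, not the real exception list.

```typescript
// Sketch of the two ESLint settings described above, as a typed object.
// The `allow` entries are hypothetical placeholders, not the real list.
export interface EslintConfigFragment {
  parser: string;
  parserOptions: { project: boolean | string | string[] };
  rules: Record<string, unknown>;
}

export const eslintConfig: EslintConfigFragment = {
  parser: '@typescript-eslint/parser',
  parserOptions: {
    // `true`: for each source file, use the tsconfig.json nearest to it
    // instead of a fixed tsconfig path that may pull in extra files
    project: true,
  },
  rules: {
    // report identifiers with leading/trailing underscores, except these
    'no-underscore-dangle': ['error', { allow: ['_exampleField', '__exampleGlobal'] }],
  },
};
```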
### Description
- set tsconfig "noUnusedParameters" to `true` and fix a few bugs
discovered by TypeScript.
How unused parameters are fixed:
- for most code (webgl), add an underscore prefix, which is the standard
ignore pattern for the TypeScript check.
- remove the unused parameter from the function and modify the
corresponding function calls (jsep).
- fix a bug in ArgMinMax: these 2 operators never have more than one
input, so `createArgMinMaxAttributesFromInputs()` is removed.
- add proxy main.ts to the TypeScript check and fix a bug in parameter
passing.
- fix the `run()` function call and add a typecheck fix (hack).
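With `noUnusedParameters` enabled, TypeScript reports parameters that are never read but exempts names that start with an underscore. A minimal sketch of both fixes; the callback type and functions are hypothetical, not taken from the webgl or jsep code:

```typescript
// Compile with "noUnusedParameters": true in tsconfig.json.
// Hypothetical callback type: its shape is fixed by the caller, so an
// implementation that ignores the first argument cannot simply drop it.
export type KernelKeyFactory = (inputCount: number, attributes: Readonly<Record<string, number>>) => string;

// `_inputCount` is intentionally unused; the underscore prefix keeps the
// compiler from reporting it.
export const softmaxKey: KernelKeyFactory = (_inputCount, attributes) => `softmax:axis=${attributes.axis}`;

// Where no caller constrains the signature, the unused parameter is removed
// instead and every call site is updated accordingly.
export function elementCount(shape: readonly number[]): number {
  return shape.reduce((product, dim) => product * dim, 1);
}
```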
### Description
* Based on the design document, and following InferenceSession's `run`
implementation, implemented `TrainingSession.runTrainStep`
### Motivation and Context
* Adding web bindings for training
#### Related work
* #16521 allowed for training artifacts to be built
* #17333 added interfaces for training
* #17474 allowed for training package to be built + added training
backend to web package
* #17891 implementation for createTrainingSession on the TypeScript side
**[SHOULD BE MERGED IN BEFORE THIS PR]**
---------
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
Co-authored-by: Ashwini Khade <askhade@microsoft.com>
### Description
Use esbuild to accelerate bundle build.
This change replaces webpack with esbuild for onnxruntime-web. Bundle
build time is reduced from ~20 sec to ~0.6 sec on my Windows dev box.
A few changes applied:
- import Node.js modules using the "node:" prefix
- remove the enum declaration inside a namespace (EncoderUsage)
- use "fs/promises" to replace the old `promisify` from "util"
- separate ort-web and test-runner. Previously they were bundled
together; now they are built into 2 files.
- optimize karma runner launch time
- remove the unnecessary sourcemap preprocessor. Sourcemaps are handled
inside esbuild
- remove unnecessary proxies (because ort-web and test-runner are
separated now, the paths are correctly inferred)
- remove the file watcher for test data
- reimplement special handling as esbuild plugins:
  - polyfill dummy imports for Node.js modules when targeting browsers
  - load ort-wasm-*.worker.js as a content string
  - load ./proxy-worker/main.ts as a content string
  - apply a source patch to ort-wasm*-threaded*.js (see details in comments
  in code)
- update debug configurations for sourcemap mapping to ensure a good
out-of-box dev experience
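The "polyfill dummy imports" plugin could look roughly like the sketch below. It mirrors esbuild's `onResolve`/`onLoad` plugin hooks, but declares only the small subset it needs so it compiles without esbuild installed; the plugin name and the namespace string are hypothetical. It relies on the "node:" prefix introduced above to recognize Node.js imports.

```typescript
// Structural subset of esbuild's plugin build object used by this sketch,
// declared locally so the example compiles without the esbuild package.
interface OnResolveArgs { path: string }
interface OnResolveResult { path: string; namespace: string }
interface OnLoadResult { contents: string; loader: 'js' }
export interface PluginBuildSubset {
  onResolve(options: { filter: RegExp }, callback: (args: OnResolveArgs) => OnResolveResult | undefined): void;
  onLoad(options: { filter: RegExp; namespace: string }, callback: (args: OnResolveArgs) => OnLoadResult | undefined): void;
}

// Hypothetical namespace for the virtual dummy modules.
const DUMMY_NAMESPACE = 'dummy-node-builtin';

export const polyfillNodeBuiltins = {
  name: 'polyfill-node-builtins', // hypothetical plugin name
  setup(build: PluginBuildSubset): void {
    // Redirect every "node:*" import into the virtual namespace instead of
    // resolving it on disk.
    build.onResolve({ filter: /^node:/ }, (args) => ({ path: args.path, namespace: DUMMY_NAMESPACE }));
    // Each virtual module is an empty CommonJS module, so named imports from
    // it evaluate to `undefined` in the browser bundle.
    build.onLoad({ filter: /.*/, namespace: DUMMY_NAMESPACE }, () => ({
      contents: 'module.exports = {};',
      loader: 'js',
    }));
  },
};
```

In a real build script, the object would be passed through the `plugins` option of `esbuild.build()` only for browser targets.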
### Description
Following the design document:
* Added `createTrainingSessionHandler` to the Backend interface
* All existing Backend implementations throw an error for the new method
`createTrainingSessionHandler`
* Created the TrainingSession namespace, interface, and
TrainingSessionFactory interface
* Created the TrainingSessionImpl class implementation
As methods are implemented, the TrainingSession interface will be extended
or modified.
### Motivation and Context
Adding the public-facing interfaces to the onnxruntime-common package is
one of the first steps to support ORT training for web bindings.
---------
Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com>
[//]: # (## Work In Progress. Feedbacks are welcome!)
### Description
This PR adds a few properties, methods and factories to the Tensor type to
support the IO-binding feature. This allows users to create a tensor from
GPU/CPU-bound data without forcing a data transfer between CPU and
GPU.
This change is a way to resolve #15312
### Change Summary
1. Add properties to the `Tensor` type:
a. `location`: indicates where the data resides. Valid values are
`cpu`, `cpu-pinned`, `texture`, `gpu-buffer`.
b. `texture`: alongside `data`, a readonly property of `WebGLTexture`
type, available only when `location === 'texture'`
c. `gpuBuffer`: alongside `data`, a readonly property of `GPUBuffer`
type, available only when `location === 'gpu-buffer'`
2. Add methods to the `Tensor` type (usually dealing with inference
outputs):
- async function `getData()` allows users to download data from GPU to
CPU manually.
- function `dispose()` allows users to release GPU resources manually.
3. Add factories for creating `Tensor` instances:
a. `fromTexture()` to create a tensor bound to WebGL texture data
b. `fromGpuBuffer()` to create a tensor bound to WebGPU buffer data
c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer
### Examples:
Create tensors from a texture and pass them to an inference session as inputs:
```js
import { InferenceSession, Tensor } from 'onnxruntime-web';

// when creating the session, specify that we prefer 'image_output:0' to be stored on GPU as a texture
const session = await InferenceSession.create('./my_model.onnx', {
  executionProviders: [ 'webgl' ],
  preferredOutputLocation: { 'image_output:0': 'texture' }
});
// ...
const myImageTexture = getTexture(); // user's function to get a texture
// shape [1, 224, 224, 4], RGBA format
const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) };
const results = await session.run(myFeeds);
const myOutputTexture = results['image_output:0'].texture;
```
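The lazy-download and manual-release behavior of `getData()` and `dispose()` can be modeled roughly as follows. This is a simplified, hypothetical class for illustration, not the `Tensor` implementation in onnxruntime-common; the `download` and `release` callbacks stand in for the WebGL/WebGPU read-back and resource-release logic.

```typescript
// The four locations listed in the change summary.
export type DataLocation = 'cpu' | 'cpu-pinned' | 'texture' | 'gpu-buffer';

// Hypothetical, simplified model of a tensor whose data may live off the CPU.
export class LocatedTensor {
  readonly location: DataLocation;
  private readonly download: () => Promise<Float32Array>; // GPU -> CPU read-back
  private readonly release: () => void; // frees the GPU-side resource
  private cpuData: Float32Array | undefined;
  private disposed = false;

  constructor(location: DataLocation, download: () => Promise<Float32Array>, release: () => void, cpuData?: Float32Array) {
    this.location = location;
    this.download = download;
    this.release = release;
    this.cpuData = cpuData;
  }

  // Synchronous access works only once the data is on the CPU.
  get data(): Float32Array {
    if (this.cpuData === undefined) {
      throw new Error(`data is on '${this.location}'; call getData() first`);
    }
    return this.cpuData;
  }

  // Downloads the data on first use and caches it afterwards.
  async getData(): Promise<Float32Array> {
    if (this.disposed) {
      throw new Error('tensor is disposed');
    }
    if (this.cpuData === undefined) {
      this.cpuData = await this.download();
    }
    return this.cpuData;
  }

  // Releases the GPU-side resource; calling it again is a no-op.
  dispose(): void {
    if (!this.disposed) {
      this.disposed = true;
      this.release();
    }
  }
}
```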
### Description
onnxjs contains a `Resize` op input check that has been outdated since opset
9. Currently `Resize` supports up to 4 inputs. This PR loosens the input
check.
### Motivation and Context
Fixes #15636
ArgMax and ArgMin are similar to reduce operators. Eventually we need to add
optimized flavors of the shader.
Softmax is optimized but only works on the last axis for now, which
should be the common use case.
TODO: enable more unit tests for ArgMax/ArgMin.
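For reference, what the two kinds of kernels compute along the last axis of a row-major tensor can be sketched on the CPU as below. This is a plain reference implementation for illustration, not the WebGL shaders.

```typescript
// Softmax over the last axis; each row of `lastDim` values sums to 1.
export function softmaxLastAxis(x: Float32Array, lastDim: number): Float32Array {
  const y = new Float32Array(x.length);
  for (let row = 0; row < x.length; row += lastDim) {
    // subtract the row maximum for numerical stability
    let max = -Infinity;
    for (let i = row; i < row + lastDim; i++) {
      max = Math.max(max, x[i]);
    }
    let sum = 0;
    for (let i = row; i < row + lastDim; i++) {
      y[i] = Math.exp(x[i] - max);
      sum += y[i];
    }
    for (let i = row; i < row + lastDim; i++) {
      y[i] /= sum;
    }
  }
  return y;
}

// ArgMax is a reduction that keeps the index instead of the value; ties
// resolve to the first occurrence (ONNX default select_last_index = 0).
export function argMaxLastAxis(x: Float32Array, lastDim: number): number[] {
  const result: number[] = [];
  for (let row = 0; row < x.length; row += lastDim) {
    let best = 0;
    for (let i = 1; i < lastDim; i++) {
      if (x[row + i] > x[row + best]) {
        best = i;
      }
    }
    result.push(best);
  }
  return result;
}
```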
### Description
This change upgrades a lot of dependencies. There are 2 motivations for
this change:
- fix the security issue reported by dependabot (protobufjs Prototype
Pollution vulnerability -
https://github.com/advisories/GHSA-h755-8qp9-cq85)
- resolve the requirement of using ONNX IR_VERSION 9 (#16638)
This requires:
- upgrading protobufjs to v7.2.4
- upgrading the library 'onnx-proto' to consume the latest ONNX release (v1.14.0).
Problems:
- protobufjs v7.2.4 depends on long.js v5, which does not work well with
TypeScript (commonjs).
- onnx-proto depends on a new release of long.js that contains this fix.
- long.js is in maintenance mode, and it takes longer than expected to get
new changes in.
Solutions:
- use a patch script in `preprepare` to copy type declarations to make
long.js work with TypeScript (commonjs)
- generate onnx protobuf JS/TS files and put them under the
js/web/lib/onnxjs/ort-schema/protobuf folder
- remove 'onnx-proto' from dependencies.
- apply fixes to the generated onnx.d.ts
### Description
Modify the creation of the WebGL context.
Previous behavior:
STEP.1 - create a canvas (document.createElement); if it fails, go to step.2,
else step.3
STEP.2 - create an OffscreenCanvas; if it fails, abort
STEP.3 - use the canvas created in step.1 or 2 to create a WebGL context.
If successful, return the context; else abort
New behavior:
STEP.1 - create an OffscreenCanvas; if it fails, go to step.3
STEP.2 - use it to create a WebGL context. If successful, return the context
STEP.3 - create a canvas (document.createElement). If it fails, abort
STEP.4 - use it to create a WebGL context. If successful, return the context;
else abort
Motivation:
We found that in some environments, normalCanvas.getContext() returns null but
offscreenCanvas.getContext() returns the context object. When
OffscreenCanvas is available, it is a good idea to always prefer it.
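The new order can be sketched as a loop over two canvas factories. This is a hypothetical helper, not the onnxjs source. It assumes that a failure in step 2 falls through to step 3, and it uses a minimal structural `CanvasLike` type so it compiles without DOM typings. In a browser, `HTMLCanvasElement` and `OffscreenCanvas` should both fit that type.

```typescript
// Minimal structural type so the sketch compiles without DOM typings.
export interface CanvasLike {
  getContext(contextId: 'webgl2' | 'webgl'): unknown;
}

// Returns undefined instead of throwing when a canvas cannot be created.
function tryCreate(factory: () => CanvasLike | undefined): CanvasLike | undefined {
  try {
    return factory();
  } catch {
    return undefined;
  }
}

// Prefer OffscreenCanvas (steps 1-2), fall back to a regular canvas
// (steps 3-4), and abort only if neither yields a context.
export function createWebGLContextFrom(
  createOffscreenCanvas: () => CanvasLike | undefined,
  createCanvas: () => CanvasLike | undefined,
  contextId: 'webgl2' | 'webgl' = 'webgl2',
): unknown {
  for (const factory of [createOffscreenCanvas, createCanvas]) {
    const canvas = tryCreate(factory);
    const context = canvas?.getContext(contextId);
    if (context) {
      return context;
    }
  }
  throw new Error('WebGL context is not available');
}
```

In a browser, the factories would be along the lines of `() => (typeof OffscreenCanvas !== 'undefined' ? new OffscreenCanvas(1, 1) : undefined)` and `() => document.createElement('canvas')`.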
### Description
We used to use `typeof fetch === 'undefined'` as the condition to detect whether
the environment is Node.js. This worked before Node.js v18. However,
Node.js v18 introduced a global `fetch` function, so this check no longer
works.
This PR changes the condition to check whether `process`,
`process.versions` and `process.versions.node` exist.
Checking whether `process` exists is not enough, because in some
configurations webpack may polyfill Node.js's `process`.
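A minimal sketch of the new check. The function name is hypothetical. The `proc` parameter defaults to the global `process` and exists only so the check can be exercised with stand-in objects, such as a bundler polyfill that defines `process` without a Node.js version.

```typescript
// Shape of the parts of `process` that the check reads.
export interface ProcessLike {
  versions?: { node?: string };
}

// True only if `process`, `process.versions` and `process.versions.node`
// all exist; a polyfilled `process` without `versions.node` is rejected.
export function isNodeEnvironment(
  proc: ProcessLike | null | undefined = (globalThis as unknown as { process?: ProcessLike }).process,
): boolean {
  return typeof proc?.versions?.node === 'string';
}
```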
### Description
The PR implements FloatE4M3FN, FloatE5M2, FloatE4M3FNUZ, FloatE5M2FNUZ
as described in PR https://github.com/onnx/onnx/pull/4805. It uses the CUDA
API to cast float/half to float8 if CUDA>=11.8, and a custom implementation
if CUDA<11.8.
* It implements Cast, QuantizeLinear, DequantizeLinear for all types on
CPU, and only for types FloatE4M3FN, FloatE5M2 on CUDA.
* It extends the supported types for control flow operators and Shape,
Reshape, Identity, If, Loop, Scan.
* It implements Equal(19).
* Cast, QuantizeLinear, DequantizeLinear operators now support a
parameter `saturate`, only valid for float 8 types. It is true by
default. In that case, any out-of-range value is converted into the
maximum float 8 value (with the same sign). If false, it is converted to
infinity (or NaN for types that cannot represent infinity).
* QuantizeLinear, DequantizeLinear now support multiple scales on CUDA
(and ROCm by extension): scale = 1D tensor with one scale per channel.
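Per-channel quantization applies `y = saturate(round(x / scale[c]) + zero_point[c])`, where `c` is the channel of each element and rounding is to the nearest value with ties to even. The CPU reference sketch below covers only uint8 output with the channel on the last axis; both are simplifying assumptions, and the sketch covers neither the float 8 types nor the CUDA kernels.

```typescript
// Round to nearest, ties to even (Math.round alone would send 2.5 to 3).
export function roundHalfToEven(v: number): number {
  if (Math.abs(v % 1) === 0.5) {
    return 2 * Math.round(v / 2);
  }
  return Math.round(v);
}

// QuantizeLinear with one scale and zero point per channel.
// Assumptions: uint8 output, channel = innermost (last) axis.
export function quantizeLinearPerChannel(
  x: Float32Array,
  scales: readonly number[],
  zeroPoints: readonly number[],
): Uint8Array {
  const channels = scales.length;
  if (zeroPoints.length !== channels || x.length % channels !== 0) {
    throw new Error('shape mismatch');
  }
  const y = new Uint8Array(x.length);
  for (let i = 0; i < x.length; i++) {
    const c = i % channels; // channel index of element i
    const q = roundHalfToEven(x[i] / scales[c]) + zeroPoints[c];
    y[i] = Math.min(255, Math.max(0, q)); // saturate to the uint8 range
  }
  return y;
}
```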
### Motivation and Context
Supports latest onnx version.
Fixes
[AB#15395](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/15395)
---------
Co-authored-by: Xavier Dupre <xadupre@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
Co-authored-by: Randy Shuai <rashuai@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
### Description
This change introduces the following new components into ONNX Runtime
Web:
- JavaScript Execution Provider (JSEP)
- Asynchronous inference execution powered by Emscripten's Asyncify
- WebGPU backend implemented in TypeScript
- initial implementation of kernels:
  - elementwise operators (22)
  - binary operators (5)
  - tensor: Shape, Reshape, Transpose, Gemm
  - nn: Conv, {Global}Maxpool, {Global}AveragePool
The code needs to be polished; still working on it.
## Q&A
What is JSEP?
> JSEP, aka JavaScript Execution Provider, is a new ONNX Runtime
execution provider that specifically works in web environments
(browsers). JSEP allows JavaScript code to kick in from various places
when ONNX Runtime runs inference on a model.
Why JSEP?
> JSEP is a hybrid-mode EP that contains both C/C++ and
TypeScript/JavaScript implementations. There are 2 strong reasons why we
introduce JSEP:
> 1. The C/C++ part helps JSEP leverage ONNX Runtime's capabilities
as much as possible, including graph transformers, optimizers and the
ability to fall back to the CPU EP. The TypeScript/JavaScript part makes
kernel implementations much easier to develop and debug in the browser.
> 2. The requirement of asynchronous execution from JavaScript APIs (e.g.
`buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a
synchronous context (see the "async problem" section below). This is solved
by using Emscripten's Asyncify.
What is WebGPU?
> WebGPU is the new GPU API available in browsers. It's one of the
only 2 APIs currently available to access the GPU from a browser (the
other is WebGL).
> WebGPU is designed with more advanced and powerful features compared
to WebGL, and is potentially the solution that offers the best GPU
performance for model inferencing currently available.
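What is the async problem, and why do we have it?
> The "async problem" is that you cannot call an async
function from a synchronous context. Think about the following C code:
> ```c
> #include <stddef.h>
>
> // placeholder types, for illustration only
> typedef void *PVOID;
> typedef struct DATA DATA;
> typedef struct FILE_HANDLE *FILEHANDLE;
>
> // C-style declarations (API): the result is delivered via a callback
> typedef void (*ON_COMPLETE)(PVOID state, DATA *data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation: must return the data synchronously
> DATA *my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement? (placeholder return)
>   return NULL;
> }
> ```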
> The answer is that it's impossible to implement this function. Usually we
try to find a synchronous version of the API, or launch a thread to call the
async function and sync-wait on the main thread. Unfortunately, in a browser
environment, neither is possible.
>
> WebGPU does not offer any synchronous API for data downloading (GPU
to CPU). This is the only operation that MUST be async. As `OrtRun()`
will eventually call into DataTransfer to copy data from GPU to CPU,
and `OrtRun()` is a synchronous function, this cannot be done in the normal
way.
What is Emscripten? How does the Asyncify feature resolve the problem?
> Emscripten is the C/C++ compiler for WebAssembly. It's what we use to
compile ORT and generate the WebAssembly artifacts that run in
browsers.
>
> Asyncify is a [compiler
feature](https://emscripten.org/docs/porting/asyncify.html) that allows
calling async functions from a synchronous context. In short, it
generates code to unwind and rewind the call stack to emulate async
execution. With this feature, we are able to call the async function
inside the `OrtRun()` call.
## Design Overview
**Inter-op**
JSEP works much like any other EP. It exposes an
interface for inter-op with JavaScript, which is defined in
onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all the language-barrier-level
functions that JSEP requires to implement kernels and data
transfers in JavaScript inside ONNX Runtime:
- `jsepBackend`: assigns the singleton backend object to the WebAssembly module
- `jsepAlloc` and `jsepFree`: implementation of data transfer's Alloc()
and Free()
- `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU)
- `jsepCopyAsync`: asynchronous copy (GPU to CPU)
- `jsepCreateKernel` and `jsepReleaseKernel`: create and release a corresponding
object maintained in JS to match the lifecycle of the kernel in ORT
- `jsepRun`: OpKernel::Compute() should call into this
The abstraction above keeps the connections and dependencies between C/C++
and TypeScript/JavaScript as few as possible.
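To illustrate how numeric data ids let both sides share buffers through these few callbacks, here is a toy sketch in which "GPU memory" is just a map of `ArrayBuffer`s. The class and its methods are hypothetical counterparts of `jsepAlloc`, `jsepFree`, `jsepCopy` and `jsepCopyAsync`; their real signatures are not shown here, and no WebGPU objects are involved.

```typescript
// Toy stand-in for GPU memory: every allocation is an ArrayBuffer stored in
// a map keyed by a numeric data id (hypothetical simplification).
export class ToyGpuDataManager {
  private readonly buffers = new Map<number, ArrayBuffer>();
  private nextId = 1;

  // Counterpart of jsepAlloc: returns the id that the C/C++ side keeps.
  alloc(size: number): number {
    const id = this.nextId++;
    this.buffers.set(id, new ArrayBuffer(size));
    return id;
  }

  // Counterpart of jsepFree.
  free(id: number): void {
    if (!this.buffers.delete(id)) {
      throw new Error(`unknown data id ${id}`);
    }
  }

  // Counterpart of jsepCopy, CPU to GPU direction (synchronous).
  upload(id: number, data: Uint8Array): void {
    new Uint8Array(this.lookup(id)).set(data);
  }

  // Counterpart of jsepCopy, GPU to GPU direction (synchronous).
  copy(srcId: number, dstId: number, size: number): void {
    new Uint8Array(this.lookup(dstId), 0, size).set(new Uint8Array(this.lookup(srcId), 0, size));
  }

  // Counterpart of jsepCopyAsync (GPU to CPU). Async because a real WebGPU
  // read-back has to wait for buffer.mapAsync().
  async download(id: number): Promise<Uint8Array> {
    return new Uint8Array(this.lookup(id).slice(0));
  }

  private lookup(id: number): ArrayBuffer {
    const buffer = this.buffers.get(id);
    if (buffer === undefined) {
      throw new Error(`unknown data id ${id}`);
    }
    return buffer;
  }
}
```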
**Resource Management**
The lifecycle of tensor data and kernels is managed by ORT (C/C++), but the
implementation is left to JavaScript. JavaScript code is responsible for
implementing the callbacks correctly.
For WebGPU, the GPU data is managed by JavaScript using a singleton map
(tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton.
Shaders are managed using a singleton map (shader_key => gpu_program),
where shader_key is generated from cache_key (op-specific, including
attributes) and input shapes.
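The shader cache described above can be sketched as a map keyed by the op's cache key plus its input shapes. The key format, `makeShaderKey` and `ProgramCache` are hypothetical; the `compile` callback stands in for real WebGPU pipeline creation.

```typescript
// Stand-in for a compiled WebGPU program (hypothetical).
export interface CompiledProgram {
  readonly key: string;
}

// shader_key = op-specific cache_key (already encoding the attributes)
// plus the input shapes, so one op with new shapes gets a new program.
export function makeShaderKey(cacheKey: string, inputShapes: ReadonlyArray<readonly number[]>): string {
  return `${cacheKey}:${inputShapes.map((shape) => `[${shape.join(',')}]`).join(';')}`;
}

// Compiles a program on the first request for a key and reuses it afterwards.
export class ProgramCache {
  private readonly programs = new Map<string, CompiledProgram>();

  getOrCompile(
    cacheKey: string,
    inputShapes: ReadonlyArray<readonly number[]>,
    compile: (shaderKey: string) => CompiledProgram,
  ): CompiledProgram {
    const shaderKey = makeShaderKey(cacheKey, inputShapes);
    let program = this.programs.get(shaderKey);
    if (program === undefined) {
      program = compile(shaderKey);
      this.programs.set(shaderKey, program);
    }
    return program;
  }
}
```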
**about data transfer**
`js::DataTransfer::CopyTensor` is implemented to call either the synchronous
or the asynchronous copy callback, depending on whether the destination is
GPU or not. Emscripten's macro `EM_ASYNC_JS` is used to wrap the async
function so that it can be called in the synchronous context.
**run kernel in JS**
The kernel class constructor calls `jsepCreateKernel()` once, with an
optional per-kernel serialization to pass attributes into
JavaScript.
`Compute()` is implemented so that metadata serialization is
performed in a base class, and JavaScript code can access the data using
the Emscripten-specific builtin macro `EM_ASM_*`.
**disabled features**
Memory pattern is force-disabled, because WebGPU data is not
represented by a general memory model (in which a buffer can be represented
by offset + size).
Concurrent run support is disabled. WebGPU is stateful and also has
async function calls. Supporting concurrent runs would significantly
increase the complexity, and we don't get any real benefit from it.
**prefer channels last**
JSEP prefers channels last and returns `DataLayout::NHWC` in the method
`GetPreferredLayout()`. This lets the graph transformers
preprocess the graph into a channels-last form so that a more optimized
WebGPU shader can be used.
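The index mapping behind the channels-last preference can be sketched with a plain NCHW to NHWC reorder. In ORT the conversion happens at the graph level (typically by inserting Transpose nodes) rather than in a loop like this; the function is a hypothetical illustration of the layout change only.

```typescript
// Reorders a dense NCHW buffer into NHWC: element (n, c, h, w) moves from
// offset ((n*C + c)*H + h)*W + w to offset ((n*H + h)*W + w)*C + c.
export function nchwToNhwc(src: Float32Array, n: number, c: number, h: number, w: number): Float32Array {
  if (src.length !== n * c * h * w) {
    throw new Error('size mismatch');
  }
  const dst = new Float32Array(src.length);
  for (let ni = 0; ni < n; ni++) {
    for (let ci = 0; ci < c; ci++) {
      for (let hi = 0; hi < h; hi++) {
        for (let wi = 0; wi < w; wi++) {
          const from = ((ni * c + ci) * h + hi) * w + wi;
          const to = ((ni * h + hi) * w + wi) * c + ci;
          dst[to] = src[from];
        }
      }
    }
  }
  return dst;
}
```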
**Testing code**
It's impossible to test JSEP directly because JSEP itself does not
contain any kernel implementation. However, it has the kernel
registration, which needs to work together with the corresponding
JavaScript code. There are unit tests that run ONNX models from the
JavaScript API.
---------
Co-authored-by: Scott McKay <skottmckay@gmail.com>
### Description
This PR includes the following changes:
- upgrade js dependencies
- enable STRICT mode for the WebAssembly build
- corresponding fix for the cmake-js upgrade
- corresponding fix for the linter upgrade
- upgrade the default TypeScript compile options:
  - `moduleResolution`: from `node` to `node16`
  - `target`: from `es2017` to `es2020`
- fix ESM module import in CommonJS source files
## change explanation
### changes to onnxruntime_webassembly.cmake
`-s WASM=1` and `-s LLD_REPORT_UNDEFINED` are enabled by default in the
latest version and are deprecated.
### changes to onnxruntime_node.cmake
The npm package `cmake-js` changed the way it finds the file `node.lib`.
Previously it downloaded this file from the Node.js public release channel;
now it generates it from a definition file.
The Node.js release channel does not contain a windows/arm64 version, so
previously cmake-js would fail to download `node.lib` for that platform.
This is why we added special handling to download the unofficial binary
for the build. Now this is no longer needed, so we removed it from the
cmake file.
### changes to tsconfig.json
`node16` module resolution supports async import, and the `es2020` target
supports top-level await.
### Description
While browsing the sources, I found several typos here and there.
I collected them into a single PR and fixed them.
Namely, these typos are: operater, tranform, neccessary, trainig.
After the fix, none of them can be found anymore:
$ git grep "operater"
$ git grep "tranform"
$ git grep "neccessary"
$ git grep "trainig"
$
### Motivation and Context
Since some of the typos are in example notebooks and markdown files,
users can see them.
**Description**:
1. add pytorch_half_pixel interpolation mode in resize-packed.ts
Changes: add the following case in the createPackedResizeProgramInfo
function:
```ts
case 'pytorch_half_pixel':
  getSourceFracIndex = `
      vec4 getSourceFracIndex(ivec4 coords) {
        vec4 fcoords = vec4(coords);
        return vec4(
          ${outputWidth}.0 > 1.0 ? (fcoords.x + 0.5) / scaleWHWH.x - 0.5 : 0.0,
          ${outputHeight}.0 > 1.0 ? (fcoords.y + 0.5) / scaleWHWH.y - 0.5 : 0.0,
          ${outputWidth}.0 > 1.0 ? (fcoords.z + 0.5) / scaleWHWH.z - 0.5 : 0.0,
          ${outputHeight}.0 > 1.0 ? (fcoords.w + 0.5) / scaleWHWH.w - 0.5 : 0.0
        );
      }
  `;
  break;
```
2. Fix the "unrecognized input '' for node: Resize_$num" error when inputs
like [input_tensor, None, scale_factor] (roiInput not given) are fed
into the resize layer.
Changes: update the input handling logic in upsample.ts and the node
scanning logic in graph.ts.
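Per axis, the shader case above maps an output coordinate back to a fractional input coordinate. A scalar sketch of that mapping, following the ONNX Resize definitions of `half_pixel` and `pytorch_half_pixel` (function names are hypothetical):

```typescript
// half_pixel: x_original = (x_resized + 0.5) / scale - 0.5
export function halfPixel(xResized: number, scale: number): number {
  return (xResized + 0.5) / scale - 0.5;
}

// pytorch_half_pixel is half_pixel, except that an output axis of length 1
// always maps to input coordinate 0 (the `> 1.0 ? ... : 0.0` in the shader).
export function pytorchHalfPixel(xResized: number, lengthResized: number, scale: number): number {
  return lengthResized > 1 ? halfPixel(xResized, scale) : 0;
}
```

The two modes differ only when the resized axis has length 1, where `half_pixel` can yield a nonzero coordinate.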
**Motivation and Context**
Before this fix, the WebGL backend could not be used when the neural
network contained PyTorch resize layers. This fix adds
'pytorch_half_pixel' interpolation mode support and makes it possible to
use the WebGL backend for more kinds of computer vision networks.
This commit solves:
#10430
Co-authored-by: neo <neo@icode-lab.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
* add p50 in test
* Support FusedConv in WebGL
* resolve comments
* add a comment for longToNumber change
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
* add p50 in test
* support opset-13 of softmax
* update operators.md
* resolve comments
* fix lint and format
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
* Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast, clip
* merge master and update operators.md
* resolve comment. revise pool and cast kernel implementation.
* skip fusion when clip min and max are not initializers
* [js/web] support string tensor for wasm backend
* disable v9/test_cast_STRING_to_FLOAT: test data is wrong
* add non-string check
* Update session-handler.ts
* Update session-handler.ts
* fixed bugs in packed mode and enable pack mode tests in ci
* removed unnecessary space
* pr comments
* pr comments
* disable an average pool test
* try disabling another avg pool
* disable more avg pool tests
* disable maxpool tests
* migrated changes to support running super resolution model using ortweb
* reverted benchmarking tool related changes which will be in a separate pr
* added kernel tests to op and node tests
* minor change to the order of variables
* added one more unit test for packed matmul