onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

Author	SHA1	Message	Date
Jon Campbell	768c79317c	Enable QNN HTP support for Node (#20576 ) ### Description Add support for using Onnx Runtime with Node ### Motivation and Context Onnx Runtime supports the QNN HTP, but does not support it for Node.js. This adds baseline support for the Onnx Runtime to be used with Node. Note it does not update the node packages that are distributed officially. This simply patches the onnxruntime.dll to allow 'qnn' to be used as an execution provider. Testing was done using the existing onnxruntime-node package. The `onnxruntime.dll` and `onnxruntime_binding.node` were swapped into `node_modules\onnxruntime-node\bin\napi-v3\win32\arm64` with the newly built version, then the various QNN dlls and .so files were placed next to the onnxruntime.dll. Testing was performed on a variety of models and applications, but the easiest test is to modify the [node quickstart example](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/js/quick-start_onnxruntime-node).	2024-05-09 13:11:07 -07:00
Yi-Hong Lyu	b2481e3602	Bump up version in main from 1.18.0 to 1.19.0 (#20489 ) Bump up version in main from 1.18.0 to 1.19.0 since the release branch has been cut. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-04-29 20:21:41 -07:00
Wanming Lin	fe1c3a45c1	[WebNN EP] Support NPU deviceType (#20278 )	2024-04-15 18:43:46 -07:00
Yulong Wang	01c7aaf6aa	[js/webgpu] allow setting env.webgpu.adapter (#19940 ) ### Description Allow user to set `env.webgpu.adapter` before creating the first inference session. Feature request: https://github.com/microsoft/onnxruntime/pull/19857#issuecomment-1999984753 @xenova	2024-03-19 12:55:00 -07:00
Yulong Wang	b29849a287	[js/common] fix typedoc warnings (#19933 ) ### Description Fix a few warnings in typedoc (for generating JS API): ``` [warning] The signature TrainingSession.loadParametersBuffer has an @param with name "buffer", which was not used. [warning] NonTensorType, defined in ./lib/onnx-value.ts, is referenced by OnnxValue but not included in the documentation. [warning] TensorFactory, defined in ./lib/tensor-factory.ts, is referenced by Tensor but not included in the documentation. [warning] ExternalDataFileType, defined in ./lib/onnx-model.ts, is referenced by InferenceSession.SessionOptions.externalData but not included in the documentation. [warning] TensorToDataUrlOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toDataURL.toDataURL.options but not included in the documentation. [warning] TensorToImageDataOptions, defined in ./lib/tensor-conversion.ts, is referenced by Tensor.toImageData.toImageData.options but not included in the documentation. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.adapter. [warning] Failed to resolve link to "GpuBufferType" in comment for Env.WebGpuFlags.device. ``` Changes highlighted: - Merge `CoreMlExecutionProviderOption` and `CoreMLExecutionProviderOption`. They expose 2 set of different options for React-native and ORT nodejs binding. This should be fixed in future. - Fix a few inconsistency of names between JSDoc and parameters - Fix broken type links - Exclude trace functions	2024-03-15 19:01:50 -07:00
Belem Zhang	acb0df2280	Fix #19931 broken Get Started link of "ONNX Runtime JavaScript API" page (#19932 ) ### Description Fix #19931 broken Get Started link HTTP 404 for "Get Started" link in "ONNX Runtime JavaScript API" page Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2024-03-15 19:00:30 -07:00
Yulong Wang	79e50aeef3	[js/web] rewrite backend resolve to allow multiple EPs (#19735 ) ### Description This PR rewrites the backend resolve logic to support specifying multiple EPs. #### Backend The first version of ONNX Runtime Web carried over some existing code from [ONNX.js](https://github.com/microsoft/onnxjs), including the "backend" concept. The original "backend" in ONNX.js was designed on the assumption that only one backend from the user's backend hint list would be used. For example, in ONNX.js, if the user specifies a backend hint list of `['webgl', 'wasm']`, ONNX.js first tries the WebGL backend: if it loads successfully (the browser supports WebGL), the "webgl" backend is used and "wasm" is ignored; otherwise, "webgl" is ignored and ONNX.js tries to load the "wasm" backend. In short: only one backend is used when initializing a session. #### Execution Provider Execution Provider, or EP, in ONNX Runtime is a different concept. One of the differences is that users are allowed to specify multiple EPs, and if one does not support a particular kernel, execution can fall back to another EP. This is a very common case when using a GPU EP in ONNX Runtime. #### Current Status: Backend vs. EP For the historical reasons mentioned above, the current status is quite confusing. There are real backends, which are distinct implementations in code; there are backend hints, which are the string names used to select a backend; and there are EPs, the ONNX Runtime concept. Currently there are only 2 backends in our code base: the "onnxjs backend" and the "wasm backend". The "onnxjs backend" currently only powers the backend hint "webgl", which goes into the old onnx.js code path. All other backend hints, including "wasm", "cpu" (alias of "wasm"), "webgpu" and "webnn", are powered by the "wasm backend". And because ORT Web treats "backend" as an internal concept and wants to align with ONNX Runtime, those backend hint names are becoming EP names.
The following table shows today's status:
| Execution Provider Name (public) / Backend Hint (internal) | Backend | EP in ORT |
| -------- | ------- | ------- |
| "wasm"/"cpu" | WasmBackend | CPU EP |
| "webgl" | OnnxjsBackend | \* technically not an EP |
| "webgpu" | WasmBackend | JSEP |
| "webnn" | WasmBackend | WebNN EP |
#### Problem While the API allows specifying multiple EPs, backend resolution only allows one backend. This causes issues when the user specifies multiple EP names in session options: the backend resolve behavior and the EP registration behavior are inconsistent. Specifically, in this issue: https://github.com/microsoft/onnxruntime/issues/15796#issuecomment-1925363908: the EP list `['webgpu', 'wasm']` on a browser without WebGPU support resolves to the 'wasm' backend, but the full EP list is passed in session options, so JSEP is still enabled, causing the runtime error. #### Solution Since we still need the WebGL backend, we cannot totally remove the backend register/resolve system. In this PR I made the following changes: - initialize every backend from the EP list, instead of only doing that for the first successful one. - for the first resolved backend, filter all EPs using the exact same backend. Remove all EPs not using this backend from session options - for every explicitly specified EP, if it's removed, show a warning message in console	2024-03-15 11:47:45 -07:00
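The three changes listed in the commit above can be sketched in plain JavaScript. This is only an illustration under stated assumptions, not the code from the PR: `EP_TO_BACKEND` restates the table from the commit message, while `resolveBackendAndFilterEps` and the `isAvailable` callback (a stand-in for backend initialization) are hypothetical names.

```javascript
// Hypothetical mapping from EP name to backend, restating the table in the commit message.
const EP_TO_BACKEND = {
  wasm: 'WasmBackend',
  cpu: 'WasmBackend',
  webgl: 'OnnxjsBackend',
  webgpu: 'WasmBackend',
  webnn: 'WasmBackend',
};

// `isAvailable(epName)` is a hypothetical stand-in for "backend init succeeded for this EP".
function resolveBackendAndFilterEps(epList, isAvailable) {
  // Step 1: try every EP in the list, not only the first one that succeeds.
  const usable = epList.filter((ep) => ep in EP_TO_BACKEND && isAvailable(ep));
  if (usable.length === 0) {
    throw new Error(`no available backend found for: ${epList.join(', ')}`);
  }
  // Step 2: the first usable EP decides the backend; keep only EPs on that backend.
  const backend = EP_TO_BACKEND[usable[0]];
  const eps = usable.filter((ep) => EP_TO_BACKEND[ep] === backend);
  // Step 3: record a warning for every explicitly requested EP that was removed.
  const warnings = epList
    .filter((ep) => !eps.includes(ep))
    .map((ep) => `removing requested execution provider "${ep}"`);
  return { backend, eps, warnings };
}

// The scenario from the linked issue: WebGPU is requested but unavailable.
const resolved = resolveBackendAndFilterEps(['webgpu', 'wasm'], (ep) => ep !== 'webgpu');
console.log(resolved.backend, resolved.eps, resolved.warnings);
```

In this sketch the `['webgpu', 'wasm']` list on a browser without WebGPU ends up with only `'wasm'` in the session's EP list, so the unavailable EP is no longer passed on.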
Yulong Wang	4538d31a8b	[js/webgpu] expose a few properties in WebGPU API (#19857 ) ### Description This change exposes a few properties in `ort.env.webgpu` to resolve the feature request mentioned in https://github.com/microsoft/onnxruntime/pull/14579#discussion_r1519612619. - Add `powerPreference` and `forceFallbackAdapter` in `ort.env.webgpu`, to allow users to set the value of the properties before the first inference session is created. - Add readonly property `adapter` in `ort.env.webgpu` to allow users to get the adapter instance. Now users can access `ort.env.webgpu.device` and `ort.env.webgpu.adapter`. @xenova @beaufortfrancois	2024-03-12 19:50:51 -07:00
Yulong Wang	3cb81cdde2	[js/common] move 'env.wasm.trace' to 'env.trace' (#19617 ) ### Description Try to move 'env.wasm.trace' to 'env.trace' to make it less confusing, because it also works in webgpu. Marked 'env.wasm.trace' as deprecated.	2024-02-27 11:07:15 -08:00
Yulong Wang	58f4921686	[js] changes to allow Float16Array if any polyfill is available (#19305 ) ### Description This change adds only the code necessary for ort-web to work with any Float16Array polyfill. Unlike #19302, in this PR ort-web does not include any specific polyfill; instead, it is the user's choice how to use a polyfill. ORT-web uses Float16Array if it is available; otherwise, it falls back to Uint16Array. ```js // case 1: user does not use a polyfill: import * as ort from 'onnxruntime-web'; const myF16Data = new Uint16Array(...); // need to use Uint16Array const myF16tensor = new ort.Tensor('float16', myF16Data, dims); ``` ```js // case 2: user uses a polyfill: import * as ort from 'onnxruntime-web'; import { Float16Array, isFloat16Array, isTypedArray, getFloat16, setFloat16, f16round, } from "@petamoriken/float16"; globalThis.Float16Array = Float16Array; // ort-web will pick the global Float16Array const myF16Data = new Float16Array(...); // Use the polyfilled Float16Array type const myF16tensor = new ort.Tensor('float16', myF16Data, dims); ```	2024-02-21 00:31:06 -08:00
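The fallback rule described in the commit above ("use Float16Array if it is available; otherwise Uint16Array") could be sketched as below. The function name `float16StorageConstructor` and the `scope` parameter are hypothetical and exist only to make the check testable; this is not ort-web's own code.

```javascript
// Hypothetical helper: pick the typed-array constructor used to hold float16 data.
// `scope` defaults to globalThis, where a polyfill would be installed.
function float16StorageConstructor(scope = globalThis) {
  return typeof scope.Float16Array === 'function' ? scope.Float16Array : Uint16Array;
}

// Without a Float16Array on the scope, raw 16-bit storage is used.
const withoutPolyfill = float16StorageConstructor({});

// With a (fake, for illustration only) Float16Array on the scope, that type is picked.
class FakeFloat16Array extends Uint16Array {}
const withPolyfill = float16StorageConstructor({ Float16Array: FakeFloat16Array });

console.log(withoutPolyfill.name, withPolyfill.name);
```

Passing the scope explicitly keeps the example independent of whether the running JavaScript engine already ships a native `Float16Array`.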
Yulong Wang	06269a3952	[js/webgpu] allow uint8 tensors for webgpu (#19545 ) ### Description allow uint8 tensors for webgpu	2024-02-16 18:28:27 -08:00
Jiajia Qin	85cef0af8c	[js/webgpu] Support capture and replay for jsep (#18989 ) ### Description This PR expands the graph capture capability to JS EP, which is similar to #16081. But for JS EP, we don't use CUDA Graph; instead, we record all GPU commands and replay them, which removes most of the CPU overhead and avoids the situation where the GPU waits for the CPU. mobilenetv2-12 goes from 6ms to 3.7ms on NV 3090 and from 4.58ms to 3.38ms on Intel A770. All limitations are similar to those of the CUDA EP: 1. Models with control-flow ops (i.e. If, Loop and Scan ops) are not supported. 2. Usage of graph capture is limited to models wherein all ops in the model can be partitioned to the JS EP or CPU EP with no memory copy between them. 3. Shapes of inputs/outputs cannot change across inference calls. 4. IO binding is required. The usage is as follows: Method 1: specify output buffers explicitly. ``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer/outputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; const fetches = { 'output': ort.Tensor.fromGpuBuffer(outputBuffer, { dataType: 'float32', dims: [1, 1000] }) }; let results = await session.run(feeds, fetches); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = await session.run(feeds, fetches); // The 2nd run and after will directly call replay to execute the graph. ... ... session.release(); ``` Method 2: Don't specify output buffers explicitly. Internally, when graph capture is enabled, it will set all output locations to 'gpu-buffer'. 
``` const sessionOptions = { executionProviders: [ { name: "webgpu", }, ], enableGraphCapture: true, }; const session = await ort.InferenceSession.create('./models/mobilenetv2-12.onnx', sessionOptions); // prepare the inputBuffer ... ... const feeds = { 'input': ort.Tensor.fromGpuBuffer(inputBuffer, { dataType: 'float32', dims }) }; let results = await session.run(feeds); // The first run will begin to capture the graph. // update inputBuffer content ... ... results = await session.run(feeds); // The 2nd run and after will directly call replay to execute the graph. ... ... session.release(); ```	2024-01-30 18:28:03 -08:00
Rachel Guo	bd9d8fb2a5	[ORT 1.17.0 release] Bump up version to 1.18.0 (#19170 ) ### Description <!-- Describe your changes. --> Bump up version to 1.18.0 since the release branch has been cut. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2024-01-17 11:18:32 -08:00
Yulong Wang	f917dde717	[web] remove xnnpack from web backends (#19116 ) ### Description XNNPACK is already disabled in web assembly build. This change removes the xnnpack backend registration in JS.	2024-01-13 23:04:02 -08:00
Yang Gu	e803f8eb0f	[js/webgpu] Refactor timestamp-query and introduce timestamp-query-inside-passes (#18894 ) We submit kernels in batches (a fixed batch size of 16 is used, except for the last batch) for better performance. However, timestamp query support is at the pass level, so the previous implementation disabled batch execution in profiling mode. In fact, we can have multiple passes in a batch so that we don't have to disable batch execution, which is the first enhancement of this PR. Furthermore, WebGPU has an extension to support timestamp queries inside passes, which isn't supported on all platforms (e.g., Windows supports it, while macOS doesn't). This is expected to have a lower cost than the multiple-passes solution, so this PR also introduces this support when available. This PR also refactors some implementation related to kernelInfo and tries to unify the related kernel names.	2024-01-13 00:23:17 -08:00
Yulong Wang	07cfc56538	[js] enable external data loading for ort-web (#19087 ) ### Description enable external data loading for ort-web. ### Why The ORT external data design depends heavily on the file system, especially synchronous file I/O APIs. Those are not available on web platforms. We need extra code to make external data work on the web. ### How Considering there is no file system on the web, an implementation for web to support external data is to use pre-loaded data. Assuming model file a.onnx includes initializers that link to ./b.bin, we require users to pass a full data file list when creating the session. The user code will look like: ```js const mySess = await ort.InferenceSession.create('./path/model/a.onnx', { // session options externalData: [ { // relative or absolute path/URL of the file, // or a pre-loaded Uint8Array containing the data of the external data file data: './path/data/b.bin', // the relative path of the external data. Should match initializers' "location" value defined in the model file path: './b.bin' }, // { } if multiple external data file ] }); ``` Currently, this feature only works with JSEP build enabled.	2024-01-12 19:24:24 -08:00
Jiangzhuo	a503561d0c	[js] using OffscreenCanvas when DOM is not available (#19033 ) ### Description Use OffscreenCanvas when the DOM API is not available. ### Motivation and Context In some environments, like a service worker or web worker, the DOM API is not available, so we can use the OffscreenCanvas API to replace `document.createElement('canvas')`. Most of the APIs of OffscreenCanvas and HTMLCanvasElement are the same, except that `toDataURL` is missing. This fixes issue #19032	2024-01-12 13:54:05 -08:00
Yang Gu	c5f3952b68	[js/webgpu] Introduce trace support (#18928 ) This is to leverage console.timeStamp to add a single marker to browsers' (only Chromium and Firefox support it) performance tool. With this support, we can dump both CPU and GPU timestamps, and use post-processing tool to clearly understand the calibrated timeline. A demo tool can be found at https://github.com/webatintel/ort-test, and more detailed info can be found at https://docs.google.com/document/d/1TuVxjE8jnELBXdhI4QGFgMnUqQn6Q53QA9y4a_dH688/edit.	2024-01-03 10:13:17 -08:00
Yulong Wang	9a61388f0a	[js/web] revise backend registration (#18715 ) ### Description This PR revises the backend registration. The following describes the expected behavior after this change: (bolded are changed behavior) - (ort.min.js - built without webgpu support) - loading: do not register 'webgpu' backend - creating session without EP list: use default EP list ['webnn', 'cpu', 'wasm'] - creating session with ['webgpu'] as EP list: should fail with backend not available - (ort.webgpu.min.js - built with webgpu support) - loading: always register 'webgpu' backend ( previous behavior: only register 'webgpu' backend when `navigator.gpu` is available) - creating session without EP list: use default EP list ['webgpu', 'webnn', 'cpu', 'wasm'] - when WebGPU is available (win): use WebGPU backend - when WebGPU is unavailable (android): should fail backend init, and try to use next backend in the list, 'webnn' (previous behavior: does not fail backend init, but fail in JSEP init, which was too late to switch to next backend) - creating session with ['webgpu'] as EP list - when WebGPU is available (win): use WebGPU backend - when WebGPU is unavailable (android): **should fail backend init, and because no more EP listed, fail.** related PRs: #18190 #18144	2023-12-20 14:45:55 -08:00
Caroline Zhu	eb03032925	[js/web/training] lazyResetGrad implementation (#18711 ) ### Description * implemented lazyResetGrad function ### Motivation and Context * we are in the process of adding language bindings to enable training on web * lazyresetgrad ensures that the gradients are calculated correctly after the first runTrainStep call --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-12-11 17:36:54 -08:00
Yulong Wang	efbef5f611	[js/webgpu] allow to specify callback for profiling data (#18732 ) ### Description This PR is a replacement of #17820. It allows specifying a callback for profiling data. Previous: ```js ort.env.webgpu.profilingMode = 'default'; // enable profiling // profiling data will output to console. ``` Now: ```js ort.env.webgpu.profiling = { mode: 'default', // enable profiling ondata: (data) => { // .. process the profiling data } }; // for each kernel, "ondata" will be called once. only output to console if ondata is not specified. ```	2023-12-07 14:10:28 -08:00
Caroline Zhu	c02a386145	[js/web/training] Implemented runEvalStep & runOptimizerStep (#18259 ) ### Description * implemented runEvalStep and runOptimizerStep * added hasEvalModel and hasOptimizerModel boolean fields in TrainingSession representation * added evalInputNames and evalOutputNames fields to TrainingSessionHandler & TrainingSession * removed the inputNamesEncoded and outputNamesEncoded fields from TrainingSessionHandler -- since none of the training methods require the input names and output names as parameters, there's no need to store them. ### Motivation and Context * part of the work for implementing web bindings for training * previous PR: #18250 --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-12-04 13:37:14 -08:00
Caroline Zhu	dd355e39a0	[js/web/training] Added parameters methods (#18250 ) ### Description * Implemented: `getParametersSize`, `getContiguousParameters` (equivalent to copyParametersToBuffer), and `loadParametersBuffer` (equivalent to copyParametersFromBuffer) * as part of these changes, getParametersSize was added to the TrainingSession interface so that users know what size buffer to create for loadParametersBuffer * The parameters methods in the interface were modified to take in a Float32Array instead ### Motivation and Context * part of the work for implementing web bindings for training * enables federated learning in the web * previous PR: #18006 --------- Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-11-27 10:30:13 -08:00
Wanming Lin	73ed34ac4b	[WebNN EP] Support numThreads option for WebNN CPU device (#18054 )	2023-11-12 16:45:10 -08:00
Caroline Zhu	e3b043ba17	[js/web/training] runTrainStep implementation (#18006 ) ### Description * based on design document & following InferenceSession's run implementation, implemented TrainingSession.runTrainStep ### Motivation and Context * Adding web bindings for training #### Related work * #16521 allowed for training artifacts to be built * #17333 added interfaces for training * #17474 allowed for training package to be built + added training backend to web package * #17891 implementation for createTrainingSession on the TypeScript side [SHOULD BE MERGED IN BEFORE THIS PR] --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-11-02 08:32:50 -07:00
Caroline Zhu	64de71c5e2	[js/web/training] Add CreateTrainingSession (#17891 ) ### Description * Adds TrainingSession.create() functionality following the web bindings for training design doc * Added 2 new training APIs to wasm/api.h: * OrtTrainingGetInputOutputName * OrtTrainingGetInputOutputCount * Moved isOrtEnvInitialized boolean to the wasm-core-impl and added a method that references it ### Motivation and Context * Adding web bindings for training #### Related work * #16521 allowed for training artifacts to be built * #17333 added interfaces for training * #17474 allows for training package to be built + adds training backend to web package [MUST BE MERGED IN BEFORE THIS ONE] --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com> Co-authored-by: Ashwini Khade <askhade@microsoft.com>	2023-10-26 09:22:10 -07:00
Yulong Wang	451c02543a	[js/webgpu] allow specify preferredLayout (#17756 ) ### Description Allow WebGPU backend to specify `preferredLayout`. Default is NHWC. ```js const options = {executionProviders: [{name:'webgpu', preferredLayout: 'NCHW'}]}; sess1 = await ort.InferenceSession.create('./mobilenetv2-12.onnx', options); ``` ### Motivation and Context - implement @qjia7's requirement for an easier way to do performance comparison between NCHW vs NHWC. - It's possible that NCHW does better on some models and NHWC on others. So offer user the capability to switch.	2023-10-02 21:25:12 -07:00
xhcao	0d60604638	[JS/WebGPU] support Range operator (#17233 ) The patch also introduces the method which copies data from GPU to CPU synchronously. ### Description <!-- Describe your changes. --> ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-09-30 02:05:32 -07:00
Caroline Zhu	6a5f469d44	Add training interfaces to js/common (#17333 ) ### Description Following the design document: * Added CreateTrainingSessionHandler to the Backend interface * All existing Backend implementations throw an error for the new method createTrainingSessionHandler * Created TrainingSession namespace, interface, and TrainingSessionFactory interface * Created TrainingSessionImpl class implementation As methods are implemented, the TrainingSession interface will be added to or modified. ### Motivation and Context Adding the public-facing interfaces to the onnxruntime-common package is one of the first steps to support ORT training for web bindings. --------- Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com>	2023-09-29 19:05:10 -07:00
Vincent Wang	e6301eee6a	Bump Up Version to 1.17.0 (#17587 ) Bump up version to 1.17.0 as the 1.16.0 release branch had been branched out.	2023-09-20 11:02:58 +08:00
Yulong Wang	a2e75114cc	[js/web] add sessionOptions.freeDimensionOverrides (#17488 ) ### Description Allows to specify fixed size for dynamic input of a model. resolves #16707 Pending test	2023-09-13 09:17:34 -07:00
Yulong Wang	4d753b74a5	[js/common] prepare work for supporting webgpu IO binding implementation (#17465 ) ### Description This PR contains a few changes in /js/common/ to support a coming PR for a full implementation of webgpu IO binding. - allows pass-through if value is already a Tensor instance in return value of `handler.run()` called by `InferenceSession.run()` (inference-session-impl.ts). Specifically, onnxruntime-node and onnxruntime-react-native uses native bindings to generate a Tensor-like object so we need to create a real Tensor instance here; for onnxruntime-web the return value is already a Tensor instance. - adds new types for GPU buffer supported types: `'float32'\|'int32'` -> `'float32'\|'float16'\|'int32'\|'int64'\|'uint32'\|'bool'` - exposes types `GpuBufferDataTypes` together with `CpuPinnedDataTypes` and `TextureDataTypes` as exported	2023-09-08 13:49:24 -07:00
Yulong Wang	d88406a31b	[js/common] use Map instead of object for backends (#17352 ) ### Description resolved https://github.com/microsoft/onnxruntime/security/code-scanning/1140	2023-09-05 23:14:46 -07:00
Yulong Wang	2cb75420ac	[js/common] clean up JSDoc (#17408 ) ### Description clean up JSDoc for onnxruntime-common: - replace "@internal" with "@ignore", as JSDoc does not use "@internal". Using "@ignore" keeps the content out of the generated doc.	2023-09-05 20:40:23 -07:00
Yulong Wang	e5ca3f3dcb	[js/api] introducing IO binding for tensor (#16452 ) [//]: # (## Work In Progress. Feedbacks are welcome!) ### Description This PR adds a few properties, methods and factories to Tensor type to support IO-binding feature. This will allow user to create tensor from GPU/CPU bound data without a force transferring of data between CPU and GPU. This change is a way to resolve #15312 ### Change Summary 1. Add properties to `Tensor` type: a. `location`: indicating where the data is sitting. valid values are `cpu`, `cpu-pinned`, `texture`, `gpu-buffer`. b. `texture`: sit side to `data`, a readonly property of `WebGLTexture` type. available only when `location === 'texture'` c. `gpuBuffer`: sit side to `data`, a readonly property of `GPUBuffer` type. available only when `location === 'gpu-buffer'` 2. Add methods to `Tensor` type (usually dealing with inference outputs): - async function `getData()` allows user to download data from GPU to CPU manually. - function `dispose()` allows user to release GPU resources manually. 3. Add factories for creating `Tensor` instances: a. `fromTexture()` to create a WebGL texture bound tensor data b. `fromGpuBuffer()` to create a WebGPUBuffer bound tensor data c. `fromPinnedBuffer()` to create a tensor using a CPU pinned buffer ### Examples: create tensors from texture and pass to inference session as inputs ```js // when create session, specify we prefer 'image_output:0' to be stored on GPU as texture const session = await InferenceSession.create('./my_model.onnx', { executionProviders: [ 'webgl' ], preferredOutputLocation: { 'image_output:0': 'texture' } }); ... const myImageTexture = getTexture(); // user's function to get a texture const myFeeds = { input0: Tensor.fromTexture(myImageTexture, { width: 224, height: 224 }) }; // shape [1, 224, 224, 4], RGBA format. const results = await session.run(myFeeds); const myOutputTexture = results['image_output:0'].texture; ```	2023-08-29 12:58:26 -07:00
Arthur Islamov	c262879214	Added DML and CUDA provider support in onnxruntime-node (#16050 ) ### Description I've added changes to support CUDA and DML (only on Windows, on other platforms it will throw an error) ### Motivation and Context It fixes this feature request https://github.com/microsoft/onnxruntime/issues/14127 which is tracked here https://github.com/microsoft/onnxruntime/issues/14529 I was working on StableDiffusion implementation for node.js and it is very slow on CPU, so GPU support is essential. Here is a working demo with a patched and precompiled version https://github.com/dakenf/stable-diffusion-nodejs ---------	2023-08-25 16:57:06 -07:00
Artyom Stepanishchev	ba23e5b234	[JS/Common] Fix malformed result of Tensor.fromImage(ImageBitmap) (#16919 ) ### Description Set `canvas` dimensions to the `ImageBitmap` dimensions, thus fixing a malformed Tensor creation. ### Motivation and Context According to the [HTMLCanvasElement.drawImage() spec](https://html.spec.whatwg.org/multipage/canvas.html#drawing-images): > When the destination rectangle is outside the destination image (the output bitmap), the pixels that land outside the output bitmap are discarded, as if the destination was an infinite canvas whose rendering was clipped to the dimensions of the output bitmap. meaning that `ImageBitmap` pixels exceeding the canvas dimensions will be discarded. Since no canvas dimensions are set for `Tensor.fromImage(ImageBitmap)` if-case, the default 300x150px canvas dimensions are used leading to the creation of malformed Tensors where all the exceeding pixels are discarded and equal to `0, 0, 0, 0` during the subsequent `pixels2DContext.getImageData()` call.	2023-07-31 18:18:06 -07:00
Yulong Wang	ecca11340a	[js/common] allow creating (u)int64 tensors in 2 ways (#16541 ) ### Description allow creating (u)int64 tensors from either a number array or a bigint array. before: ```js // TypeScript thinks this is good, but it does not actually work // runtime error: Uncaught TypeError: Cannot convert 1 to a BigInt const myTensor1 = new Tensor('int64', [1, 2, 3, 4], [2, 2]); // runtime good, but TypeScript thinks myTensor2 is a string tensor const myTensor2 = new Tensor('int64', [1n, 2n, 3n, 4n], [2, 2]); ``` after: ```js // both work at runtime and TypeScript populates the correct types const myTensor1 = new Tensor('int64', [1, 2, 3, 4], [2, 2]); const myTensor2 = new Tensor('int64', [1n, 2n, 3n, 4n], [2, 2]); ```	2023-07-11 21:07:36 -07:00
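One way to accept both a number array and a bigint array for int64 data, as the commit above describes, is to normalize every element with `BigInt()` before filling a `BigInt64Array`. The helper name `toInt64Data` is hypothetical, and this is a sketch of the idea rather than the onnxruntime-common implementation.

```javascript
// Hypothetical helper: build int64 storage from either number[] or bigint[].
// BigInt(1) and BigInt(1n) both yield 1n, so a single code path covers both inputs.
// Note: BigInt() throws a RangeError for non-integer numbers such as 1.5.
function toInt64Data(values) {
  return BigInt64Array.from(values, (v) => BigInt(v));
}

const fromNumbers = toInt64Data([1, 2, 3, 4]);
const fromBigInts = toInt64Data([1n, 2n, 3n, 4n]);
console.log(fromNumbers, fromBigInts);
```

Passing plain numbers straight into a `BigInt64Array` is what produces the "Cannot convert 1 to a BigInt" error quoted in the commit message, which is why the explicit conversion step is needed.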
Jhen-Jie Hong	685816bb0a	[js/rn] Add executionProviders support (#16233 ) ### Description <!-- Describe your changes. --> This PR adds support for `executionProviders` option for react-native package, support: - Android: cpu / xnnpack / nnapi - iOS: cpu / xnnpack / coreml ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> In my case I want to enable Core ML / NNAPI EP for react-native project.	2023-06-16 19:38:41 +10:00
Yulong Wang	e3e4926d00	[js/common] allow import onnxruntime-common as ESM and CJS (#15772 ) ### Description allow import onnxruntime-common as ESM and CJS.	2023-06-12 12:05:11 -07:00
Yulong Wang	59f42cccb8	[js/common] refactor tensor type in onnxruntime-common (#15843 ) ### Description <!-- Describe your changes. --> refactor tensor type in onnxruntime-common. ### Motivation and Context The major motivation is that I am doing a local change to address the API part of #15312. And I am doing a refactoring of onnxruntime-common anyway (#15772). The `tensor.ts` and `tensor-impl.ts` are too large, so I split contents into multiple files to make the type declarations clearer. The original target of this change is API only (i.e. do not refactor any implementation). However, there are a few type/implementation inconsistencies, so I also made minimal changes to fix them. ### Changes - extract `TensorUtils` for non-template interfaces - extract `TensorFactory` for all overloads of `Tensor.fromImage()` - refactor the options type used for `Tensor.fromImage()` - fix JSDoc comments to make option descriptions consistent with actual type declarations - fix an inconsistency between `options.format` and `options.bitmapFormat`; change all `bitmapFormat` to `format` - extract `ConversionUtils` for `tensor.toDataURL()` and `tensor.toImageData()` - put implementations into multiple files from `tensor-impl.ts` - fix a bug that caused a unit test to fail; add comments for a future fix.	2023-06-09 16:19:29 -07:00
Yulong Wang	f274bbb0c8	[js] add API that allows to get package version (#16207 ) ### Description Add an API for users to get version of current package. example usage: ```js import { env } from 'onnxruntime-node'; console.log(env.versions.node); // output "1.16.0" ``` ```js import { env } from 'onnxruntime-web'; console.log(env.versions.web); // output "1.16.0" console.log(env.versions.common); // output "1.16.0" console.log(env.versions.node); // output "undefined" ``` #16156	2023-06-09 16:18:53 -07:00
Wanming Lin	a8c2f24ae0	[WebNN EP] Merge support for segment anything into main branch (#16208 ) We implemented a number of new ops and data types to support running the segment anything model on the Chromium WebNN DML backend (POC) in a forked branch https://github.com/honry/onnxruntime/tree/stable-diffusion In this PR, we migrate the changes in the forked branch to the main branch, including: - 22 new ops - New tensor data types: bool, int32, uint32, uint64, int64, float16 (As JavaScript hasn't shipped Float16Array, we use Uint16Array as a workaround) - Handle empty input tensors and duplicated outputs - Fixed some nits	2023-06-07 09:56:37 -07:00
Yulong Wang	ba5f5e3198	[js] allow manually release inference session (#16169 ) ### Description This change adds a new instance function (method) to type `InferenceSession` to allow users to manually release an inference session instance. #16131 depends on this change to work correctly.	2023-05-31 00:31:38 -07:00
Yulong Wang	204111a79e	[js/webgpu] support proxy for webgpu (#15851 ) ### Description [js/webgpu] support proxy for webgpu. fixes #15832	2023-05-15 16:23:13 -07:00
Wanming Lin	00b1e79e04	Support WebNN EP (#15698 ) Description: This PR intends to enable the WebNN EP in ONNX Runtime Web. It translates the ONNX nodes through the [WebNN API](https://webmachinelearning.github.io/webnn/); the EP is implemented in C++ and uses the Emscripten [Embind API](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#). NHWC is temporarily used as the preferred layout for WebNN graph partitions because of a restriction in the WebNN XNNPack backend implementation and the ongoing [discussion](https://github.com/webmachinelearning/webnn/issues/324) in the WebNN spec about whether WebNN should support both 'NHWC' and 'NCHW' layouts. There is no WebNN native EP; this is only for Web. Motivation and Context: Allow ONNX Runtime Web developers to access the WebNN API to benefit from hardware acceleration. WebNN API Implementation Status in Chromium: - Tracked in Chromium issue: [#1273291](https://bugs.chromium.org/p/chromium/issues/detail?id=1273291) - CPU device: based on the XNNPack backend, and has been available on Chrome Canary M112 behind the "#enable-experimental-web-platform-features" flag for Windows and Linux platforms. Further implementation for more ops is ongoing. - GPU device: based on DML, implementation is ongoing. Open: - GitHub CI: WebNN is currently only available on Chrome Canary/Dev with the XNNPack backend for Linux and Windows. This is an open question for reviewers: please help identify which GitHub CI should involve the WebNN EP and guide me to enable it. Thanks!	2023-05-08 21:25:10 -07:00
shalvamist	c10a6a9d17	Tensor <--> image - Adding per channel compute for Norm mean & Bias (#14705 ) ### Description Enabled the use of per channel Bias and Mean normalization when converting an image <--> tensor. Added a few bug fixes and updates to the relevant E2E tests. --------- Co-authored-by: shalvamist <shalva.mist@microsoft.com>	2023-05-01 09:37:50 -07:00
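The per-channel normalization mentioned in the commit above can be illustrated with a minimal sketch. The helper name `rgbaToPlanarFloat32` is hypothetical and not part of onnxruntime, and the formula `(pixel + bias[c]) / mean[c]` is an assumption made for illustration; the commit message itself does not spell out the arithmetic.

```js
// Hypothetical helper (not an onnxruntime API): converts interleaved RGBA bytes
// into a planar RGB Float32Array, applying per-channel normalization values.
// Assumption: each output value is computed as (pixel + bias[c]) / mean[c].
function rgbaToPlanarFloat32(pixels, width, height, mean, bias) {
  const channels = 3; // the alpha channel is dropped in this sketch
  const planeSize = width * height;
  const out = new Float32Array(channels * planeSize);
  for (let i = 0; i < planeSize; i++) {
    for (let c = 0; c < channels; c++) {
      // Source is interleaved (stride 4); destination is one plane per channel.
      out[c * planeSize + i] = (pixels[i * 4 + c] + bias[c]) / mean[c];
    }
  }
  return out;
}

// Two RGBA pixels, normalized with a different mean and bias for each channel.
const pixels = new Uint8ClampedArray([255, 128, 0, 255, 0, 64, 255, 255]);
const tensorData = rgbaToPlanarFloat32(pixels, 2, 1, [255, 128, 64], [0, 0, 1]);
```

Passing arrays instead of a single number for the mean and bias is what lets each colour channel be scaled and shifted independently.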
Yulong Wang	14cc02c65c	[js/web] WebGPU backend via JSEP (#14579 ) ### Description This change introduces the following new components into ONNX Runtime Web: - JavaScript Execution Provider (JSEP) - Asynchronous inference execution powered by Emscripten's Asyncify - WebGPU backend implemented in TypeScript - initial implementation of kernels: - elementwise operators (22) - binary operators (5) - tensor: Shape, Reshape, Transpose, Gemm - nn: Conv, {Global}Maxpool, {Global}AveragePool The code needs to be polished; still working on it. ## Q&A What is JSEP? > JSEP, aka JavaScript Execution Provider, is a new ONNX Runtime execution provider that works specifically in the Web environment (browsers). JSEP allows JavaScript code to kick in at various places when ONNX Runtime runs inference on a model. Why JSEP? > JSEP is a hybrid-mode EP that contains both C/C++ and TypeScript/JavaScript implementations. There are 2 strong reasons why we introduce JSEP: > 1. the C/C++ part helps JSEP leverage ONNX Runtime's capabilities as much as possible, including graph transformers, optimizers and the capability to fall back to the CPU EP. TypeScript/JavaScript makes the kernel implementation much easier to develop and debug in the browser. > 2. the requirement of asynchronous execution from JavaScript APIs (e.g. `buffer.mapAsync()`) makes it impossible to run `OrtRun()` in a synchronous context (see the "async problem" section below). This is worked around by using Emscripten's Asyncify. What is WebGPU? > WebGPU is the new GPU API that is available in browsers. It is one of only 2 APIs currently available to access the GPU from the browser (the other is WebGL). > WebGPU is designed with more advanced and stronger features compared to WebGL and is potentially the solution that offers the best GPU performance for model inferencing currently available. What is the async problem and why do we have it?
> The "async problem" is that you cannot call an async function from a synchronous context. Consider the following C-style code:
> ```c
> // C-style declarations (API)
> typedef void (ON_COMPLETE)(PVOID state, DATA data);
> void read_data_from_file(FILEHANDLE file, ON_COMPLETE on_complete);
>
> // implementation
> DATA * my_impl_read_data_from_file_sync(FILEHANDLE file) {
>   // how to implement?
> }
> ```
> The answer is that it is impossible to implement this function. Usually we try to find a sync version of the API, or launch a thread to call the async function and sync-wait on the main thread. Unfortunately, in the browser environment, neither is possible. > > WebGPU does not offer any synchronous API for data downloading (GPU to CPU). This is the only operation that MUST be async. As `OrtRun()` will eventually call into DataTransfer to copy data from GPU to CPU, and `OrtRun()` is a synchronous function, this cannot be done in the normal way. What is Emscripten? How does the Asyncify feature resolve the problem? > Emscripten is the C/C++ compiler for WebAssembly. It is what we use to compile ORT and generate the WebAssembly artifacts that run in browsers. > > Asyncify is a [compiler feature](https://emscripten.org/docs/porting/asyncify.html) that allows calling async functions from a synchronous context. In short, it generates code to unwind and rewind the call stack to emulate async execution. With this feature, we are able to call the async function inside the `OrtRun()` call. ## Design Overview Inter-op: JSEP does pretty much the same thing as any other EP.
It exposes an interface for inter-op with JavaScript, which is defined in onnxruntime/wasm/js_internal_api.js:
```js
// init JSEP
Module["jsepInit"] = function (backend, alloc, free, copy, copyAsync, createKernel, releaseKernel, run) {
  Module.jsepBackend = backend;
  Module.jsepAlloc = alloc;
  Module.jsepFree = free;
  Module.jsepCopy = copy;
  Module.jsepCopyAsync = copyAsync;
  Module.jsepCreateKernel = createKernel;
  Module.jsepReleaseKernel = releaseKernel;
  Module.jsepRun = run;
};
```
This simple JavaScript snippet defines all the language-barrier-level functions that are required by JSEP to implement kernels and data transfers using JavaScript inside ONNX Runtime: - `jsepBackend`: assigns the singleton object to the WebAssembly module - `jsepAlloc` and `jsepFree`: implementation of the data transfer's Alloc() and Free() - `jsepCopy`: synchronous copy (GPU to GPU, CPU to GPU) - `jsepCopyAsync`: asynchronous copy (GPU to CPU) - `jsepCreateKernel` and `jsepReleaseKernel`: a corresponding object maintained in JS to match the lifecycle of the Kernel in ORT - `jsepRun`: OpKernel::Compute() should call into this The abstraction above keeps the connections and dependencies between C/C++ and TypeScript/JavaScript to a minimum. Resource Management: the lifecycle of tensor data and kernels is managed by ORT (C/C++), but the implementation is left to JavaScript. JavaScript code is responsible for implementing the callbacks correctly. For WebGPU, the GPU data is managed by JavaScript using a singleton map (tensor_data_id => GPUBuffer). The GPU pipeline is managed as a singleton. Shaders are managed using a singleton map (shader_key => gpu_program), where shader_key is generated from cache_key (OP specific, including attributes) and input shapes. About data transfer: `js::DataTransfer::CopyTensor` is implemented to call either the synchronous or the asynchronous copy callback, depending on whether the destination is GPU or not.
Emscripten's macro `EM_ASYNC_JS` is used to wrap the async function so that it can be called from the synchronous context. Run kernel in JS: the kernel class constructor calls `jsepCreateKernel()` once, with an optional per-kernel serialization to pass attributes into JavaScript. `Compute()` is implemented so that a metadata serialization is performed in a base class and JavaScript code can access the data using the Emscripten-specific builtin macro `EM_ASM_`. Disabled features: memory pattern is forcibly disabled, because WebGPU data is not represented by a general memory model (in which a buffer can be represented by offset + size). Concurrent run support is disabled. WebGPU is stateful and also has async function calls; supporting concurrent run would significantly increase the complexity without any real benefit. Prefer channels last: JSEP prefers channels last and returns `DataLayout::NHWC` from the method `GetPreferredLayout()`. This lets the graph transformers preprocess the graph into a channels-last form so that a more optimized WebGPU shader can be used. Testing code: it is impossible to test JSEP directly because JSEP itself does not contain any kernel implementation. However, it has the kernel registration, which needs to work together with the corresponding JavaScript code. There are unit tests that run ONNX models from the JavaScript API. --------- Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-04-24 15:21:18 -07:00
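The shader map described under resource management in the commit above (shader_key => gpu_program, with shader_key derived from cache_key and input shapes) might look roughly like the following sketch. This is not the actual onnxruntime-web code: `programCache`, `makeShaderKey`, `getOrCreateProgram` and the key format are hypothetical names and choices, and a plain object stands in for a compiled GPU program.

```js
// Hypothetical sketch, not the onnxruntime-web implementation.
// Singleton map: shader_key => program (a plain object stands in for a GPU program).
const programCache = new Map();

// Build a shader key from an operator-specific cache key (which would include
// the attributes) and the shapes of all inputs. The key format is an assumption.
function makeShaderKey(cacheKey, inputShapes) {
  return `${cacheKey}|${inputShapes.map((dims) => dims.join('x')).join(';')}`;
}

// Return the cached program for this key, building it only on the first request.
function getOrCreateProgram(cacheKey, inputShapes, build) {
  const key = makeShaderKey(cacheKey, inputShapes);
  let program = programCache.get(key);
  if (program === undefined) {
    program = build();
    programCache.set(key, program);
  }
  return program;
}

// Usage: the same operator and shapes reuse one program; a new shape builds another.
let builds = 0;
const build = () => ({ id: ++builds });
const a = getOrCreateProgram('Conv;pads=1', [[1, 3, 224, 224], [8, 3, 3, 3]], build);
const b = getOrCreateProgram('Conv;pads=1', [[1, 3, 224, 224], [8, 3, 3, 3]], build);
const c = getOrCreateProgram('Conv;pads=1', [[1, 3, 112, 112], [8, 3, 3, 3]], build);
```

Including the input shapes in the key means a shader specialized for one shape is never reused for another, at the cost of building one program per distinct shape combination.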
Yulong Wang	f972d21e81	[js] upgrade dependencies and enable strict mode (#14930 ) ### Description This PR includes the following changes: - upgrade js dependencies - enable STRICT mode for the WebAssembly build - corresponding fix for the cmake-js upgrade - corresponding fix for the linter upgrade - upgrade the default TypeScript compile options: - `moduleResolution`: from `node` to `node16` - `target`: from `es2017` to `es2020` - fix ESM module import in CommonJS source file ## change explanation ### changes to onnxruntime_webassembly.cmake `-s WASM=1` and `-s LLD_REPORT_UNDEFINED` are enabled by default in the latest version and are deprecated. ### changes to onnxruntime_node.cmake The npm package `cmake-js` changed the way it finds the file `node.lib`. Previously it downloaded this file from the Node.js public release channel; now it generates it from a definition file. The Node.js release channel does not contain a windows/arm64 version, so cmake-js previously failed to download `node.lib` for that platform. This is why we added special handling to download the unofficial binary for the build. That is no longer needed, so we removed it from the cmake file. ### changes to tsconfig.json `node16` module resolution supports async import and `es2020` as target supports top level await.	2023-03-22 15:05:04 -08:00
Yulong Wang	3c4efd2e77	[js/common] allows polyfill for bigint (#14921 ) ### Description This change delays the execution of checking whether bigint is available in the context. This allows polyfill for `BigInt64Array`/`BigUint64Array` (if there is any)	2023-03-08 15:29:04 -08:00
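The idea of delaying the availability check, as described in the commit above, can be sketched as follows. The function names are hypothetical and this is not the onnxruntime-common source; it only illustrates running the check on first use rather than at module load time, so that a polyfill installed after import is still detected.

```js
// Hypothetical sketch of a deferred feature check (names are illustrative).
let checked = false;
let available = false;

// The check runs on the first call instead of at module load time, so a
// polyfill installed after this module is imported is still picked up.
function isBigInt64ArrayAvailable() {
  if (!checked) {
    checked = true;
    available =
      typeof BigInt64Array !== 'undefined' && typeof BigUint64Array !== 'undefined';
  }
  return available;
}

// Create int64 tensor data only after confirming support in the environment.
function createInt64Data(values) {
  if (!isBigInt64ArrayAvailable()) {
    throw new TypeError('BigInt64Array is not available in this environment');
  }
  return BigInt64Array.from(values);
}

const data = createInt64Data([1n, 2n, 3n]);
```

A check performed eagerly at import time would cache a negative result before any polyfill had a chance to run.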


70 commits