2021-04-27 07:04:25 +00:00
|
|
|
// Copyright (c) Microsoft Corporation. All rights reserved.
|
|
|
|
|
// Licensed under the MIT License.
|
|
|
|
|
|
2023-10-06 20:37:37 +00:00
|
|
|
import {readFile} from 'node:fs/promises';
|
2024-01-03 18:13:17 +00:00
|
|
|
import {InferenceSession, InferenceSessionHandler, SessionHandler, Tensor, TRACE_FUNC_BEGIN, TRACE_FUNC_END} from 'onnxruntime-common';
|
2021-05-17 21:57:19 +00:00
|
|
|
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
import {SerializableInternalBuffer, TensorMetadata} from './proxy-messages';
|
|
|
|
|
import {copyFromExternalBuffer, createSession, endProfiling, releaseSession, run} from './proxy-wrapper';
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
import {isGpuBufferSupportedType} from './wasm-common';
|
2021-04-27 07:04:25 +00:00
|
|
|
|
2023-11-02 15:32:50 +00:00
|
|
|
export const encodeTensorMetadata = (tensor: Tensor, getName: () => string): TensorMetadata => {
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
switch (tensor.location) {
|
|
|
|
|
case 'cpu':
|
|
|
|
|
return [tensor.type, tensor.dims, tensor.data, 'cpu'];
|
|
|
|
|
case 'gpu-buffer':
|
|
|
|
|
return [tensor.type, tensor.dims, {gpuBuffer: tensor.gpuBuffer}, 'gpu-buffer'];
|
|
|
|
|
default:
|
|
|
|
|
throw new Error(`invalid data location: ${tensor.location} for ${getName()}`);
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
|
2023-11-02 15:32:50 +00:00
|
|
|
export const decodeTensorMetadata = (tensor: TensorMetadata): Tensor => {
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
switch (tensor[3]) {
|
|
|
|
|
case 'cpu':
|
|
|
|
|
return new Tensor(tensor[0], tensor[2], tensor[1]);
|
|
|
|
|
case 'gpu-buffer': {
|
|
|
|
|
const dataType = tensor[0];
|
|
|
|
|
if (!isGpuBufferSupportedType(dataType)) {
|
|
|
|
|
throw new Error(`not supported data type: ${dataType} for deserializing GPU tensor`);
|
|
|
|
|
}
|
|
|
|
|
const {gpuBuffer, download, dispose} = tensor[2];
|
|
|
|
|
return Tensor.fromGpuBuffer(gpuBuffer, {dataType, dims: tensor[1], download, dispose});
|
|
|
|
|
}
|
|
|
|
|
default:
|
|
|
|
|
throw new Error(`invalid data location: ${tensor[3]}`);
|
|
|
|
|
}
|
|
|
|
|
};
|
|
|
|
|
|
2023-09-30 02:05:10 +00:00
|
|
|
export class OnnxruntimeWebAssemblySessionHandler implements InferenceSessionHandler {
|
2021-08-31 17:23:42 +00:00
|
|
|
private sessionId: number;
|
2021-04-27 07:04:25 +00:00
|
|
|
|
|
|
|
|
inputNames: string[];
|
|
|
|
|
outputNames: string[];
|
|
|
|
|
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
async fetchModelAndCopyToWasmMemory(path: string): Promise<SerializableInternalBuffer> {
|
2022-11-14 20:18:02 +00:00
|
|
|
// fetch model from url and move to wasm heap. The arraybufffer that held the http
|
|
|
|
|
// response is freed once we return
|
|
|
|
|
const response = await fetch(path);
|
2023-08-18 16:59:03 +00:00
|
|
|
if (response.status !== 200) {
|
|
|
|
|
throw new Error(`failed to load model: ${path}`);
|
|
|
|
|
}
|
2022-11-14 20:18:02 +00:00
|
|
|
const arrayBuffer = await response.arrayBuffer();
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
return copyFromExternalBuffer(new Uint8Array(arrayBuffer));
|
2022-11-14 20:18:02 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
async loadModel(pathOrBuffer: string|Uint8Array, options?: InferenceSession.SessionOptions): Promise<void> {
|
2024-01-03 18:13:17 +00:00
|
|
|
TRACE_FUNC_BEGIN();
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
let model: Parameters<typeof createSession>[0];
|
2021-04-27 07:04:25 +00:00
|
|
|
|
2022-11-14 20:18:02 +00:00
|
|
|
if (typeof pathOrBuffer === 'string') {
|
2023-06-20 07:20:58 +00:00
|
|
|
if (typeof process !== 'undefined' && process.versions && process.versions.node) {
|
2022-11-14 20:18:02 +00:00
|
|
|
// node
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
model = await readFile(pathOrBuffer);
|
2022-11-14 20:18:02 +00:00
|
|
|
} else {
|
|
|
|
|
// browser
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
// fetch model and copy to wasm heap.
|
|
|
|
|
model = await this.fetchModelAndCopyToWasmMemory(pathOrBuffer);
|
2022-11-14 20:18:02 +00:00
|
|
|
}
|
|
|
|
|
} else {
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
model = pathOrBuffer;
|
2022-11-14 20:18:02 +00:00
|
|
|
}
|
[js/web] revise backend registration (#18715)
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change:
(**bolded are changed behavior**)
- (ort.min.js - built without webgpu support)
- loading: do not register 'webgpu' backend
- creating session without EP list: use default EP list ['webnn', 'cpu',
'wasm']
- creating session with ['webgpu'] as EP list: should fail with backend
not available
- (ort.webgpu.min.js - built with webgpu support)
- loading: **always register 'webgpu' backend**
( previous behavior: only register 'webgpu' backend when `navigator.gpu`
is available)
- creating session without EP list: use default EP list ['webgpu',
'webnn', 'cpu', 'wasm']
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init,**
and try to use next backend in the list, 'webnn'
(previous behavior: does not fail backend init, but fail in JSEP init,
which was too late to switch to next backend)
- creating session with ['webgpu'] as EP list
- when WebGPU is available (win): use WebGPU backend
- when WebGPU is unavailable (android): **should fail backend init, and
because no more EP listed, fail.
related PRs: #18190 #18144
2023-12-20 22:45:55 +00:00
|
|
|
|
|
|
|
|
[this.sessionId, this.inputNames, this.outputNames] = await createSession(model, options);
|
2024-01-03 18:13:17 +00:00
|
|
|
TRACE_FUNC_END();
|
2021-04-27 07:04:25 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
async dispose(): Promise<void> {
|
2021-08-31 17:23:42 +00:00
|
|
|
return releaseSession(this.sessionId);
|
2021-04-27 07:04:25 +00:00
|
|
|
}
|
|
|
|
|
|
2021-05-13 17:50:04 +00:00
|
|
|
async run(feeds: SessionHandler.FeedsType, fetches: SessionHandler.FetchesType, options: InferenceSession.RunOptions):
|
|
|
|
|
Promise<SessionHandler.ReturnType> {
|
2024-01-03 18:13:17 +00:00
|
|
|
TRACE_FUNC_BEGIN();
|
2021-04-27 07:04:25 +00:00
|
|
|
const inputArray: Tensor[] = [];
|
|
|
|
|
const inputIndices: number[] = [];
|
|
|
|
|
Object.entries(feeds).forEach(kvp => {
|
|
|
|
|
const name = kvp[0];
|
|
|
|
|
const tensor = kvp[1];
|
|
|
|
|
const index = this.inputNames.indexOf(name);
|
|
|
|
|
if (index === -1) {
|
|
|
|
|
throw new Error(`invalid input '${name}'`);
|
|
|
|
|
}
|
|
|
|
|
inputArray.push(tensor);
|
|
|
|
|
inputIndices.push(index);
|
|
|
|
|
});
|
|
|
|
|
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
const outputArray: Array<Tensor|null> = [];
|
2021-04-27 07:04:25 +00:00
|
|
|
const outputIndices: number[] = [];
|
|
|
|
|
Object.entries(fetches).forEach(kvp => {
|
|
|
|
|
const name = kvp[0];
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
const tensor = kvp[1];
|
2021-04-27 07:04:25 +00:00
|
|
|
const index = this.outputNames.indexOf(name);
|
|
|
|
|
if (index === -1) {
|
|
|
|
|
throw new Error(`invalid output '${name}'`);
|
|
|
|
|
}
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
outputArray.push(tensor);
|
2021-04-27 07:04:25 +00:00
|
|
|
outputIndices.push(index);
|
|
|
|
|
});
|
|
|
|
|
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
const inputs =
|
|
|
|
|
inputArray.map((t, i) => encodeTensorMetadata(t, () => `input "${this.inputNames[inputIndices[i]]}"`));
|
|
|
|
|
const outputs = outputArray.map(
|
|
|
|
|
(t, i) => t ? encodeTensorMetadata(t, () => `output "${this.outputNames[outputIndices[i]]}"`) : null);
|
|
|
|
|
|
|
|
|
|
const results = await run(this.sessionId, inputIndices, inputs, outputIndices, outputs, options);
|
2021-05-13 17:50:04 +00:00
|
|
|
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
const resultMap: SessionHandler.ReturnType = {};
|
|
|
|
|
for (let i = 0; i < results.length; i++) {
|
|
|
|
|
resultMap[this.outputNames[outputIndices[i]]] = outputArray[i] ?? decodeTensorMetadata(results[i]);
|
2021-04-27 07:04:25 +00:00
|
|
|
}
|
2024-01-03 18:13:17 +00:00
|
|
|
TRACE_FUNC_END();
|
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484
Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.
</del>
### Description
This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.
### Examples
An E2E demo/example is being worked on.
Following is some simple demo with code snippet.
Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });
// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};
// STEP.3 - run model
const myResults = await mySession.run(feeds);
// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array
```
#### for inputs (GPU tensor):
Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```
### for outputs (pre-allocated GPU tensor)
you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};
// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```
### for outputs (specify location)
if you do not know the output shape, you can specify the output location
when creating the session:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: "gpu-buffer"
});
```
if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
executionProviders: ['webgpu'],
preferredOutputLocation: {
"output_image:0": "gpu-buffer"
}
});
```
now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.
#### read data
when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer
// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();
// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);
// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```
#### resource management
JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources
To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 18:24:42 +00:00
|
|
|
return resultMap;
|
2021-04-27 07:04:25 +00:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
startProfiling(): void {
|
|
|
|
|
// TODO: implement profiling
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
endProfiling(): void {
|
2021-09-08 00:18:08 +00:00
|
|
|
void endProfiling(this.sessionId);
|
2021-04-27 07:04:25 +00:00
|
|
|
}
|
|
|
|
|
}
|