Commit graph

9700 commits

Author SHA1 Message Date
Arthur Islamov
d0519a7603
[js/web] BiasSplitGelu and BiasAdd kernels (#17161)
### Description
Two contrib kernels that supposed to speed-up StableDiffusion according
to this doc
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/models/stable_diffusion/README.md

However, there is no noticable effect in speed or memory consumption. So
i guess the only way to make it faster is to implement
MultiHeadAttention but i'm not capable of doing that right now. So i'll
focus on existing PRs and finding the JSEP kernel that produces
incorrect results. It should be one of the old ones (i suspect Conv or
ConvTranspose), as SD was not generating images correctly on webgpu
since i started working on it. I hoped someone else would fix that by
the time i finish with kernels/optimizations 😅

---------

Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-10-03 12:20:20 -07:00
Kaz Nishimura
d11e053412
Add option to specify the EP to use, enabling DML EP and others (#17490)
### Description
Add DML EP to the acceptable provider list in the optimizer.

### Motivation and Context
With DML EP, graph optimization was not performed in onnxruntime.
2023-10-02 23:53:09 -07:00
Yulong Wang
451c02543a
[js/webgpu] allow specify preferredLayout (#17756)
### Description
Allow WebGPU backend to specify `preferredLayout`. Default is NHWC.

```js
const options = {executionProviders: [{name:'webgpu', preferredLayout: 'NCHW'}]};
sess1 = await ort.InferenceSession.create('./mobilenetv2-12.onnx', options);
```

### Motivation and Context
- implement @qjia7's requirement for an easier way to do performance
comparison between NCHW vs NHWC.
- It's possible that NCHW does better on some models and NHWC on others.
So offer user the capability to switch.
2023-10-02 21:25:12 -07:00
zesongw
f158f394d6
[WebNN EP] Support Softmax since version 13 (#17714)
### Description
<!-- Describe your changes. -->
WebNN only supports 2-D input tensor along axis 1. For now, we use
Reshape and Transpose wraparound to get the compatible input.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Enable more models to run on WebNN.
2023-10-02 13:01:04 -07:00
Scott McKay
ac4e726046
Add bytes model loading test to react native e2e (#17749)
### Description
<!-- Describe your changes. -->
Update E2E test to also check InferenceSession.create with bytes. 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Add tests to validate #17739
2023-10-02 12:25:28 +10:00
Ella Charlaix
63acaf47d2
Fix onnx quantizer activation and weight type attribute (#17651)
In
[`quantize_subgraph`](https://github.com/microsoft/onnxruntime/blob/v1.16.0/onnxruntime/python/tools/quantization/onnx_quantizer.py#L188-L189)
`self.weight_qType` and `self.activation_qType` are
[integers](https://github.com/microsoft/onnxruntime/blob/v1.16.0/onnxruntime/python/tools/quantization/onnx_quantizer.py#L115-L116)
while `ONNXQuantizer` expects `QuantType`
2023-09-30 18:06:34 -07:00
xhcao
0d60604638
[JS/WebGPU] support Range operator (#17233)
The patch also introduces the method which copies
data from GPU to CPU synchronously.

### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-30 02:05:32 -07:00
Arthur Islamov
a941dd583e
[js/web] FP16 Conv, ConvTranspose and MatMul (#17514)
### Description
Another three ops for fp16

---------

Co-authored-by: Guenther Schmuelling <guschmue@microsoft.com>
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2023-09-30 00:00:23 -07:00
Changming Sun
9aad78721c
Update debug_alloc.cc: filter out one more memory leak from absl (#17746) 2023-09-29 20:40:09 -07:00
Pranav Sharma
668c70ee11
Add support for specifying a custom logging function per session. (#17727)
### Description
Add support for specifying a custom logging function per session.
Bindings for other languages will be added after this PR is merged.

### Motivation and Context
Users want a way to override the logging provided by the environment.
2023-09-29 19:46:55 -07:00
Caroline Zhu
6a5f469d44
Add training interfaces to js/common (#17333)
### Description
Following the design document:
* Added CreateTrainingSessionHandler to the Backend interface
* All existing Backend implementations throw an error for the new method
createTrainingSessionHandler
* Created TrainingSession namespace, interface, and
TrainingSessionFactory interface
* Created TrainingSessionImpl class implementation 

As methods are implemented, the TrainingSession interface will be added
to or modified.

### Motivation and Context
Adding the public-facing interfaces to the onnxruntime-common package is
one of the first steps to support ORT training for web bindings.

---------

Co-authored-by: Caroline Zhu <carolinezhu@microsoft.com>
2023-09-29 19:05:10 -07:00
Rachel Guo
e106b1eb8f
Fix react native load from Uint8Array buffer bug (#17739)
### Description
<!-- Describe your changes. -->

Use `.buffer` of Uint8Array to get ArrayBuffer.

TODO: Add E2E React Native test case to cover JS level testing to avoid
future breakage.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

#17732

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-09-29 18:03:28 -07:00
shaahji
5a623dca01
Python API to check whether collective ops are available or not (#17730)
Python API to check whether collective ops are available or not

### Description
<!-- Describe your changes. -->

Adding an API to check whether collective ops are available or not.
Since there is no independent MPI enabled build, this flag can be used
on Python front for branching. Specifically, to conditionally enable
tests.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Flag to be used in Python to check whether onnxruntime supports
collective ops or not. Handy for conditionally enabling/disabling tests
and for other branching decisions.
2023-09-29 14:11:05 -07:00
Changming Sun
14d349e290
Enable backtrace in unit tests (#17655)
### Description
Google test can be built either with absl/re2 or not. This PR enables
the build option so that google test framework can print out a nice
stacktrace when something went wrong. It helps locate test errors in CI
build pipelines.

Also, Google test will remove the build option and make it always ON. So
sooner or later we must make this change.
2023-09-29 12:32:56 -07:00
Yulong Wang
561aca97cf
[js/webgpu] support IO binding (#17480)
<del>
**This PR is based on a few prerequisites PRs. They are listed as
below:**
- #17465
- #17469
- #17470
- #17472
- #17473
- #17484

Please review the current change by only looking at commit
e2e6623e673ec6de55a5c1f8edcbd3a46b535a89 and later.


</del>

### Description

This PR introduces WebGPU IO binding. This new feature allows
onnxruntime-web users to use tensors created from GPU as model
input/output so that a model inferencing can be done without unnecessary
data copy between CPU and GPU for model input/output.

### Examples

An E2E demo/example is being worked on.

Following is some simple demo with code snippet.

Let's first check today how we do:
```js
// STEP.1 - create an inference session:
const mySession = await ort.InferenceSession.create('./my_model.onnx', { executionProviders: ['webgpu'] });

// STEP.2 - create model input: (supposing myImageCpuData is a Float32Array)
const feeds = {
  'input_image:0': new ort.Tensor('float32', myImageCpuData, [1, 224, 224, 3])
};

// STEP.3 - run model
const myResults = await mySession.run(feeds);

// STEP.4 - get output data
const myData = myResults['output_image:0'].data; // Float32Array

```

#### for inputs (GPU tensor):

Now, with IO binding, you can create a tensor from a GPU buffer, and
feed it to the model:
```js
// new STEP.2.A - create model input from a GPU buffer: (supposing myInputGpuBuffer is a `GPUBuffer` object with input data)
const feeds = {
  'input_image:0': ort.Tensor.fromGpuBuffer(myInputGpuBuffer, { dataType: 'float32', dims: [1, 224, 224, 3] })
};
```

### for outputs (pre-allocated GPU tensor)

you can also do that for output, **if you know the output shape**:
```js
// new STEP.2.B - create model output from a GPU buffer: (supposing myOutputGpuBuffer is a pre-allocated `GPUBuffer` object)
const fetches = {
  'output_image:0': ort.Tensor.fromGpuBuffer(myOutputGpuBuffer, { dataType: 'float32', dims: [1, 512, 512, 3] })
};

// new STEP.3 - run model with pre-allocated output (fetches)
const myResults = await mySession.run(feeds, fetches);
```

### for outputs (specify location)

if you do not know the output shape, you can specify the output location
when creating the session:

```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
    executionProviders: ['webgpu'],
    preferredOutputLocation: "gpu-buffer"
});
```

if the model has multiple outputs, you can specify them seperately:
```js
// new STEP.1 - create an inference session with an option "preferredOutputLocation":
const mySession = await ort.InferenceSession.create('./my_model.onnx', {
    executionProviders: ['webgpu'],
    preferredOutputLocation: {
         "output_image:0": "gpu-buffer"
    }
});
```

now you don't need to prepare the `fetches` object and onnxruntime-web
will prepare output data on the location that specified.

#### read data

when you get the output tensor, you can:
```js
// get the gpu buffer object:
const gpuBuffer = myOutputTensor.gpuBuffer; // GPUBuffer

// get the CPU data asynchronizely
const cpuData = await myOutputTensor.getData();

// get the CPU data asynchronizely and release the underlying GPU resources
const cpuData = await myOutputTensor.getData(true);

// dispose the tensor (release the underlying GPU resources). This tensor object will be invalid after dispose() is called.
myOutputTensor.dispose();
```

#### resource management

JavaScript has GC so you don't need to worry about managing JavaScript
objects. But there are 2 types of resources that are not managed by GC:
- GPU buffer that used in tensors
- Underlying ORT native resources

To simplify, most of the unmanaged resources and handled inside ORT web.
But there are a few resources that need users to manage:
- All external GPU resources, including GPU buffers inside all tensors
created by `Tensor.fromGpuBuffer()`, will not be managed by ORT. User
should manage those GPU buffers themselves.
- When a session is created with `preferredOutputLocation` ==
"gpu-buffer" specified in session options, and the corresponding output
is not pre-allocated, user need to call the output tensor's `dispose()`
or `getData(true)` to manually release the underlying GPU buffers.
- ORT internal errors (including providing a pre-allocated output tensor
with wrong type/dims) will invalidate the whole wasm memory and is not
recoverable. An exception is thrown in this situation.
2023-09-29 11:24:42 -07:00
satyajandhyala
b4fbc25b1f
[JS/Web] Add ConvTranspose implementation using MatMul (#17573)
### Description
Add ConvTranspose implementation using MatMul to increase perf.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-29 11:00:44 -07:00
Changming Sun
caf98128c1
Update linux-wasm-ci.yml: remove the ln command (#17735)
### Description
/usr/local/bin can only be modified by root.  This command seems unnecessary
2023-09-28 21:43:29 -07:00
Scott McKay
9cb60c5b86
Resize and EP specific transpose optimization updates (#17664)
### Description
<!-- Describe your changes. -->
- Treat Resize as layout sensitive by default
- whilst the ONNX spec does not specify a layout, EPs tend to implement
only one
- add second usage in L2 of TransposeOptimizer to plugin the ability to
push a Transpose through a Resize assigned to the CPU EP
- Allow EP specific logic for changes the ops considered to be layout
sensitive to be plugged in
  - expected usage is for #17200 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Finish simplifying/clarifying transpose optimization and layout
transformation that was proposed in #15552. This PR along with #17618
should complete the changes.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-09-29 08:11:36 +10:00
Tianlei Wu
20f96fd096
Fix Attention Runtime Error for CLIP model (#17729)
### Description
The condition check is not correct
```
if (is_unidirectional_ && enable_fused_causal_attention_) {  // GPT
}
else { // BERT
}
```

Change it to 
```
if (is_unidirectional_) {  // GPT
}
else { // BERT
}
```

Another walkaround is to enable fused causal attention by adding an
environment variable `ORT_ENABLE_FUSED_CAUSAL_ATTENTION=1` before
running stable diffusion.

### Motivation and Context

Without the fix, optimized CLIP model of stable diffusion will encounter
error in running Attention node:

2023-09-24 16:15:31.206037898 [E:onnxruntime:,
sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned
while running Attention node. Name:'Attention_0' Status Message:
/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/mha_runner.cu:207
bool
onnxruntime::contrib::cuda::FusedMHARunnerFP16v2::mhaImpl::is_flash_attention(int)
const interface->mHasCausalMask == false was false.

Note that the bug has been there for a long time. It is just surfaced
since we recently added a fusion for CLIP, which will trigger the error.

We will add a comprehensive test for causal attention later to avoid
such corner cases.
2023-09-28 14:32:08 -07:00
Jian Chen
fc9a69dcae
Update VecAddMoveOnlyFunctor and VecAddWithIsSupportedMethod with Default constructor (#17705)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-28 09:30:42 -07:00
Yi Zhang
9136748462
Fix: Fail to skip disabledmodel in winml (#17728)
### Description
Move appending source name behind the ModifyNameIfDisabledTest

### Motivation and Context
In winml,  disabled test name doesn't include the model source name.
WinML job will be broken in the new image.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1151451&view=logs&s=4eef7ad1-5202-529d-b414-e2b14d056c05

### Verified

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1151691&view=logs&s=4eef7ad1-5202-529d-b414-e2b14d056c05
2023-09-28 13:46:44 +08:00
Jambay Kinley
1f4a3529dd
Bugfix: Add initializer to model in AttentionMask directly (#17719) 2023-09-27 14:53:37 -07:00
MistEO
8621bdcd44
Remove unnecessary #incldue <stacktrace> (#17716) 2023-09-27 14:23:26 -07:00
MistEO
870b0bc305
Fix typo of cmake (#17715)
This caused a cmake configuration error.
2023-09-27 11:48:46 -07:00
Adam Pocock
522cc968e8
[java] Filling out the javadoc for the float8 types (#17694) 2023-09-27 10:52:11 -07:00
Mustafa Ateş Uzun
13b0f8a6ce
fix: supported typo (#17216) 2023-09-27 10:45:27 -07:00
Changming Sun
276e8733bd
Update onnx python package and setuptools (#17709)
### Description
A follow-up for #17125
2023-09-27 07:54:48 -07:00
Vincent Wang
e6aa0fa174
Add Gelu Related Ops to Triton Codegen (#17713)
Add Gelu/QuickGelu/GeluGrad/QuickGeluGrad support to Triton Codegen so
that it can be fused with some other connected supported Ops. For
example, in llama2, it can be fused with Mul so we will have extra 1-2%
perf gain.
2023-09-27 19:57:39 +08:00
Scott McKay
a99c965d05
Make transpose optimizer able to look past DQ node for const initializer (#17618)
### Description
<!-- Describe your changes. -->
Add ability for transpose optimizer to look past a DQ node if it has a
constant initializer as input. This allows UnsqueezeInput/TransposeInput
to modify the initializer in-place in the same way it would for a
non-QDQ format model.

Shared initializers are also handled, and any additional
Squeeze/Transpose added to the other usages of the initializer should
cancel out when we push the same Transpose though them.

The in-place modification means we don't need to run QDQ fixup and
constant folding after layout transformation. This means we do not need
to enable those optimizers in a minimal build to get an optimal model
post-layout transformation.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Ensure layout transformation produces optimal model in full and minimal
builds.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-09-27 21:17:16 +10:00
Scott McKay
33295ed883
Handle string initializers in constant folding (#17422)
### Description
<!-- Describe your changes. -->
* Allow either an allocator or a MemBuffer to be used when creating an
OrtValue from an TensorProto
* `Tensor<std::string>` requires an allocator to allocate/free the
string values
* Forcing the buffer to be allocated outside of the Tensor doesn't seem
to provide any benefit in this usage as the Tensor class disables copy
and assignment (so we wouldn't create 2 copies of the buffer via the
Tensor class that externally managing the would buffer avoid)
* New approach means we don't need to manage the buffers in the
optimizer Info class as the Tensor dtor will do that
* Update naming - MLValue was replaced by OrtValue a long time ago

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
#17392
2023-09-27 21:15:58 +10:00
trajep
bcc6205161
🐛 Bugfix win del file err (#17697)
### Description
<!-- Describe your changes. -->

Fix for this issue which raise the error of FileNotAccessd in windows
when the context of TemporaryDirectory finished.
https://github.com/microsoft/onnxruntime/issues/17627

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
https://github.com/microsoft/onnxruntime/issues/17627
2023-09-26 15:32:04 -07:00
liqun Fu
2be4dc6d04
ONNX 1.15 integration (#17125)
### Description
this is for ORT 1.17.0 - make ORT to use ONNX release 1.15.0 branch. Eventually will update to the release tag once ONNX 1.15.0 is released


### Motivation and Context
Prepare for ORT 1.17.0 release. People can start work on new and updated ONNX ops in ORT.
---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-09-26 14:44:48 -07:00
RandySheriffH
37dcefb5b7
Patch lite custom op API (#17605)
A few enhancements:
- Support compute returning status;
- Support variadic;

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-09-26 14:02:18 -07:00
Nicolò Lucchesi
4ab0e17fe8
[Technical docs] Fixed a couple of old links in FAQ.md (#17415)
### Description
Updated a couple of old links in the technical documentation that where
pointing to files present prior to the migration to
https://onnxruntime.ai/docs.
2023-09-26 13:38:24 -07:00
zesongw
93f22aa189
[WebNN EP] Fix bug in Softmax (#17665)
### Description
<!-- Describe your changes. -->
For now, WebNN Softmax only support 2D (or implicitly coerce to 2D)
inputs and the last axis.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fallback some cases to pass the CI.
2023-09-26 12:51:29 -07:00
Brian Lambert
614af3742c
Add prepacked weights container to subgraphs (#17671)
### Description
Adds prepacked weights container to model subgraphs.



### Motivation and Context
Allows for initializer sharing when the initializers are located in
subgraphs. I encountered this bug when attempting to share weights
between T5 BeamSearch models where the shareable initializers are
located in the encoder and decoder subgraphs and it failed to reduce
memory usage.
2023-09-26 12:01:41 -07:00
Jian Chen
0141e27ca1
Enabling c++ 20 in MacOS build (#16187)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-26 11:27:02 -07:00
Vadym Stupakov
b8e348145c
fixed #16873 (#16932) 2023-09-26 09:57:01 -07:00
Kaz Nishimura
f43acf2d33
Close the JSON object in settings.json (#17583)
### Description

This patch adds a closing curly bracket at the end of `settings.json`.

### Motivation and Context

`settings.json` is just not closed. It was accidentally removed at
4e6ea730d6
2023-09-26 09:51:13 -07:00
RandySheriffH
1c245e6775
Stop throwing exception on python binding when multiple EP available (#17659)
Stop throwing the exception when the provider list is empty but there
are multiple available EPs.
Other language bindings throw no exception at all, this change will
align them up.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-09-26 09:46:30 -07:00
Chi Lo
7572e6055c
[TensorRT EP] Back out the PerThreadContext (#17690)
Current TRT EP's PerthreadContext allows more than one IExecutionContext
instance to be created by one engine instance.
But, it's possible to hit an error that caused by TRT API
context.setBindingDimensions() in our TRT EP code
[here](https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2Fmicrosoft%2Fonnxruntime%2Fblob%2Fmain%2Fonnxruntime%2Fcore%2Fproviders%2Ftensorrt%2Ftensorrt_execution_provider.cc%23L2775&data=05%7C01%7CChi.Lo%40microsoft.com%7Cd8b23c3a4c0b4dcce9b408dbbd9309de%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638312211465211140%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=5EZoAoXgWFSuz%2BIRMH%2FXZaO%2BfKNP%2FZDZYEZg3W%2Ff30w%3D&reserved=0)
under the case of the input shape changes ( meaning engine being
rebuilt) with multithreading.
From the
[doc](https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fdocs.nvidia.com%2Fdeeplearning%2Ftensorrt%2Fapi%2Fc_api%2Fclassnvinfer1_1_1_i_execution_context.html%23ada050e88320bcc40987b0acadc2ef962&data=05%7C01%7CChi.Lo%40microsoft.com%7Cd8b23c3a4c0b4dcce9b408dbbd9309de%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638312211465211140%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=%2BmVZU5iLD97B3YBPdHZP7jOQ2dGoleI3R0mSMVgopG4%3D&reserved=0)
and the
[discussion](https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgithub.com%2FNVIDIA%2FTensorRT%2Fissues%2F846&data=05%7C01%7CChi.Lo%40microsoft.com%7Cd8b23c3a4c0b4dcce9b408dbbd9309de%7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C638312211465211140%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=c8v%2FK2UkQ%2FNbf8w1sHNDGsB2kxw4sSmkyQ2QuCs8Fs8%3D&reserved=0),
it seems we should have different OptimizationProfile for different
IExecutionContext which our current TRT EP doesn’t support regardless of
using PerThreadContext implementation.
Back out the PerThreadContext until we completely solve this issue.
2023-09-26 09:28:17 -07:00
Adam Pocock
aed43f429a
[java] Enable output pinning in OrtSession and OrtTrainingSession (#16835) 2023-09-26 01:49:13 -07:00
Baiju Meswani
ccb73fd827
[On-Device Training] Expose Parameters through the Training API (#17364) 2023-09-25 20:03:24 -07:00
aimilefth
95e8dfaea5
Update quant_utils.py/write_calibration_table (#17314) 2023-09-25 15:56:03 -07:00
Changming Sun
a942bbf489
Update nodejs to 18.x (#17657)
1. Upgrade nodejs from 16.x to 18.x for Windows pipelines
2. Avoid using Azure DevOps "NodeTool" on Linux. The tool installs
nodejs from internet or local disk cache. But we already moved all Linux
tests to docker. So we do not need the installer anymore.
3. Remove some other unused code.
2023-09-25 14:12:11 -07:00
Yulong Wang
b2b1408608
[js/web] update browser launch cmd flags (#17658)
### Description
update Chromium browser launch command line flags

Canary already using dxc so no need to specify
'--enable-dawn-features=use_dxc' for canary.
2023-09-25 12:24:46 -07:00
Yulong Wang
f50fa46fe0
[JSEP] allow DataTransfer to deal with zero sized input (#17661)
### Description
allow DataTransfer to deal with zero sized input.

This is a standalone fix for zero-sized tensor handling for JSEP
DataTransfer. There are other components in JSEP not supporting
zero-sized tensors need to be fixed.
2023-09-25 12:21:20 -07:00
Yulong Wang
fcfc2391b8
[JSEP] allow JsCustomAllocator to deal with zero sized input (#17660)
### Description
allow JsCustomAllocator to deal with zero sized input.

This is a standalone fix for zero-sized tensor handling for
JsCustomAllocator. There are other components in JSEP not supporting
zero-sized tensors need to be fixed.
2023-09-25 12:20:56 -07:00
Xavier Dupré
905faea3b2
Fix static quantization for QDQ and Percentile distribution (#17649)
### Description
One quantization case was not covered by the current list of unit tests.
This PR adds a unit test to cover that case with the fix. It fixes the
issue #17619.



### Motivation and Context
2023-09-25 10:11:58 -07:00
Yulong Wang
df15a3a335
[js/web] configure 5GB memory space for webpack build (#17684)
### Description
ort-web build step - webpack consumes the amount of memory on the edge
of Node.js(V8)'s default max-old-space-size, so increase the default
memory size to 5GB to avoid this issue.
2023-09-25 09:22:00 -07:00