8995 commits

Author SHA1 Message Date
cao lei
dd72192cf4
ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833)
### Description
This PR is to refactor ExecutionProvider API for memory management,
which is to move allocators from EP level to SessionState level and
indexed by OrtDevice



### Motivation and Context
This PR is to refactor ExecutionProvider API for memory management,
which is to move allocators from EP level to SessionState level and
indexed by OrtDevice. With this change, EPs no longer carry the burden of
maintaining allocators, which makes things easier for EP developers.

---------

Co-authored-by: Lei Cao <leca@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:44:45 -07:00
jingyanwangms
5dcaf70501
Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375)
### Description
torch.optim Adam zero_grad() signature is
zero_grad(set_to_none=True)

https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad

We set this flag in initialization, similar to deepspeed:
https://deepspeed.readthedocs.io/en/latest/optimizers.html#deepspeed.ops.adam.FusedAdam

Adding this flag to have signature parity with pytorch Adam

### Motivation and Context
Easier model integration

Co-authored-by: Jingyan Wang <jingywa@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:27:41 -07:00
PeixuanZuo
470d6c1cce
[ROCm] Delete unused file to fix Component Governance Alert (#16407)
Delete unused file to fix Component Governance Alert
2023-06-19 11:28:32 -07:00
guyang3532
341484e67c
Embedding sparsity optimization (#16141)
### Description
Optimize compute graph by eliminating padding in embedding.


### Motivation and Context
The computation for padding in the nodes after embedding is unnecessary and
wastes computation resources.
This PR adds a PaddingElimination optimizer that checks for and
eliminates the padding after embedding automatically by modifying the
graph.

### Implementation:
1. Find and check embedding node in graph.
2. Iterate over the subgraph after the embedding node and record all the
input nodes and output nodes of this subgraph.
3. Insert 'Reshape + ShrunkenGather' to flatten each input node shape
from [batch_size, seqlen, ...] to [valid_token_without_padding, ...],
and insert 'GatherGrad + Reshape' to unflatten each output node shape
from [valid_token_without_padding, ...] to [batch_size, seqlen, ...]
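
The flatten/unflatten idea in step 3 can be sketched in plain Python. This is only an illustration of the data movement, not the ORT graph transformation: the function names are hypothetical, nested lists stand in for tensors, and the padding mask is assumed to be given as 0/1 flags per token.

```python
def flatten_valid_tokens(batch, padding_mask):
    """Map [batch_size, seqlen, ...] to [valid_token_without_padding, ...].

    Returns the kept rows plus their flat indices, which are needed later
    to scatter results back (hypothetical helper, not an ORT API).
    """
    flat_rows = [token for sequence in batch for token in sequence]
    flat_mask = [flag for sequence in padding_mask for flag in sequence]
    valid_indices = [i for i, flag in enumerate(flat_mask) if flag]
    return [flat_rows[i] for i in valid_indices], valid_indices


def unflatten_tokens(valid_rows, valid_indices, batch_size, seqlen, pad_value):
    """Map [valid_token_without_padding, ...] back to [batch_size, seqlen, ...]."""
    flat = [pad_value] * (batch_size * seqlen)
    for row, index in zip(valid_rows, valid_indices):
        flat[index] = row
    return [flat[b * seqlen:(b + 1) * seqlen] for b in range(batch_size)]


# Two sequences of length 3 with one hidden feature; 0 in the mask marks padding.
batch = [[[1.0], [2.0], [0.0]], [[3.0], [0.0], [0.0]]]
mask = [[1, 1, 0], [1, 0, 0]]

valid, indices = flatten_valid_tokens(batch, mask)
restored = unflatten_tokens(valid, indices, batch_size=2, seqlen=3, pad_value=[0.0])
```

Only three of the six token rows reach the nodes between the flatten and unflatten steps, which is where the saved computation comes from.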

---------

Co-authored-by: mindest <linminuser@gmail.com>
2023-06-19 20:34:53 +08:00
PeixuanZuo
1418d8728c
[ROCm] Fix CI Pipeline (#16409)
1. add `set -ex` before commands.
2. update ccache.
2023-06-19 15:22:13 +08:00
Yi Zhang
8b9eab093b
keep symlinks in maven package (#16376)
### Description
1. Keep symlink in the package.
2. keep the artifact package format

### Motivation and Context
2023-06-19 09:41:39 +08:00
Dipanjan Sengupta
35fa6af428
Fix for the build break in AMX feature on Mac OS. (#16390)
### Description
Fixing the build break issue in Apple pipeline due to AMX flag removal.
2023-06-16 21:00:41 -07:00
Scott McKay
8fdfd20191
Separate out operator vs model testing. (#16228)
### Description
Split up OpTester to separate out operator vs model testing. This led to
a lot of other cleanups/refactoring.

- create BaseTester class and derived OpTester/ModelTester classes to
limit APIs to what is applicable for each test type
  - e.g. adding an attribute isn't relevant to a model test
- cleanup structure
- don't expose member variables either directly or via public methods
returning them
  - split out checkers so they can be easily re-used
- refactor so there's one public Check method for comparing two OrtValue
instances containing any data type
  - refactor the GradientOpTester usage
- it required a lot of OpTester internals to be exposed and no other
tests needed this
- it also returned Status through various parts which prevented the
usage of the google test macros which provide better output. change to
return void and use the macros.
- fix some other minor issues
  - update some cmake files so all the source files are included
  - remove some low value helpers (FetchTensor and GetShapeVector)
- remove some outdated code to allow unreleased opset versions from when
onnx opset 15 wasn't released
  - move files from test/util/include/test to test/util/include
- doesn't seem to be any reason for the additional subdirectory given
they're not files used to test the code in test/util
    - files were moved with no changes
    
### Motivation and Context
Cleanup test infrastructure.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-06-17 12:58:57 +10:00
saurabh
a6ce7b339f
Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi package in OVEP and fix OVEP windows io buffer sample (#16147)
### Description
This PR enables execution of subgraphs in OVEP. Currently, when OVEP
developers install the onnxruntime-openvino package on Windows from
PyPI, they also have to download the OpenVINO Windows binaries
and run the setupvars.bat script, which sets the environment PATH to
locate the OpenVINO DLLs. This PR also fixes issues in the OVEP Windows IO
buffer sample.



### Motivation and Context
Fix: We want to make the user experience easy for OVEP Python developers
on the Windows platform.
This fix introduces a function add_openvino_libs_to_path at the
location tools/python/util/add_openvino_win_libs.py.
OVEP Python users can call this function in their application code, and
it takes care of adding the OpenVINO DLLs to the path from the
installed OpenVINO PyPI package (openvino).
This change also makes sure that the add_openvino_libs_to_path() function is
added to the onnxruntime Python package
only when it is built for the OpenVINO Execution Provider for ONNXRuntime
and not for default ORT Python package builds.

New user experience for Python OVEP developers on windows platform:
step 1: pip install onnxruntime-openvino
step 2: pip install openvino
step 3: <Add these 2 lines in the application code>
import onnxruntime.tools.add_openvino_win_libs as utils
utils.add_openvino_libs_to_path()

---------

Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2023-06-16 19:47:09 -07:00
kunal-vaishnavi
3f7f90aed0
Stabilize Whisper export with beam search (#16297)
### Description
This PR stabilizes the Whisper export with beam search by adding the
following:
- Remove unused ONNX models and extra folders generated during the
export process
- Specify the Whisper with beam search model's IR version for E2E
integration
- Parity check for Whisper with beam search model between PyTorch and
ORT
- Remove previously exported Whisper with beam search model before
saving newly exported model


### Motivation and Context
- Removing the unused ONNX models and extra folders frees up disk space
after exporting and makes it easier to copy and move the output folder
to other environments.
- Specifying the IR version fixes an issue with generating the ONNX E2E
model
- Adding a parity check helps detect runtime issues during the export
process
- Removing the previously exported Whisper with beam search model
prevents the data file size from doubling when the newly exported model
is saved with the same filename
2023-06-16 18:56:52 -07:00
dependabot[bot]
dd660c054e
Bump transformers from 4.24.0 to 4.30.0 in /tools/ci_build (#16331) 2023-06-16 13:08:46 -07:00
zesongw
d813d991b1
[WebNN EP] Support Squeeze Op (#16361)
### Description
Adds support for the Squeeze Op to WebNN EP.
It takes parameters similar to Unsqueeze, so the two are merged.


### Motivation and Context
Enable more models to run on WebNN EP.

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2023-06-16 11:18:58 -07:00
Chi Lo
fbf08c4b4d
Fix minor TRT EP provider option issue (#16107)
Several TRT EP provider options are not included when calling
OrtApis::GetTensorRTProviderOptionsAsString().
This issue doesn't affect TRT EP itself, but a user who calls the above API to get
all the provider options will find some of them missing from
the string.
2023-06-16 10:07:40 -07:00
Silvio Traversaro
4915191e63
Fix build of Python wheel on Windows with single-config generator (#16337)
### Description

Before this PR, the CMake code assumed that a multiple-config CMake
generator was used on Windows and a single-config CMake generator on
non-Windows platforms. After this PR, this
information is obtained from the
[`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html)
global CMake property.



### Motivation and Context

I discovered this problem when building with Ninja generator on Windows,
but I guess this should fix problems also on non-Windows platforms when
using a multiple-config generator (such as Xcode on macOS or "Ninja
Multi-Config" on all platforms).

See
https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html
for more info.
2023-06-16 09:17:49 -07:00
Jhen-Jie Hong
685816bb0a
[js/rn] Add executionProviders support (#16233)
### Description

This PR adds support for the `executionProviders` option to the react-native
package. Supported providers:

- Android: cpu / xnnpack / nnapi
- iOS: cpu / xnnpack /  coreml

### Motivation and Context

In my case I want to enable Core ML / NNAPI EP for react-native project.
2023-06-16 19:38:41 +10:00
Jhen-Jie Hong
ea1a5cf920
[js/rn] Implement blob exchange by JSI instead of use base64 (#16094)
### Description

- Create `OnnxruntimeJSIHelper` native module to provide two JSI
functions
- `jsiOnnxruntimeStoreArrayBuffer`: Store buffer in Blob Manager &
return blob object (iOS: RCTBlobManager, Android: BlobModule)
  - `jsiOnnxruntimeResolveArrayBuffer`: Use blob object to get buffer
- Part of the implementation is based on
[react-native-blob-jsi-helper](https://github.com/mrousavy/react-native-blob-jsi-helper)
- Replace base64 encode/decode
  - `loadModelFromBlob`: Rename from `loadModelFromBase64EncodedBuffer`
  - `run`: Use blob object to replace input.data & results[].data

For [this
context](https://github.com/microsoft/onnxruntime/issues/16031#issuecomment-1556527812),
it saves a lot of time and avoids blocking the JS thread when decoding the
return type: 3700ms -> 5~20ms for that case (the resolve function only takes
0.x ms).

### Motivation and Context

It’s related to #16031, but not a full implementation of the migration to
JSI.

It just uses JSI through BlobManager to replace the slow part (base64
encode / decode).

Rewriting it entirely in JSI could be complicated, for example type conversion
and threading. This PR might be considered a minor change.

/cc @skottmckay
2023-06-16 19:37:02 +10:00
cloudhan
9110e5b9bd
[ROCm] Add attention kv cache for decoding (#16076) 2023-06-16 14:17:56 +08:00
Tianlei Wu
96471491d7
Fix test failure in debug CUDA build (#16370)
Fix assertion failure in onnxruntime_test_all in debug build with CUDA,
which is caused by a test case added in
https://github.com/microsoft/onnxruntime/pull/16075.

Remove an assumption that bias exists in MultiHeadAttention.
2023-06-15 23:16:16 -07:00
Tianlei Wu
1866a9d818
Use the lowest float for causal mask (#16369)
Always set causal mask to the lowest float. 

Note that since huggingface transformers v4.21, gpt2 uses lowest half
for FP16, and lowest float for FP32:

66fd3a8d62/src/transformers/models/gpt2/modeling_gpt2.py (L199)
Assume that most fp16 ONNX models are converted from fp32 models. We
decided to use lowest float32 for both half and float model for
consistency.

The mask_filter_value only applies to a raw attention mask (2D, 3D or 4D).
For a 1D mask, a masked item is 0.0 after softmax, so the effective mask
filter value is the lowest float.
* For BERT models, when users use a 1D mask (required by FMHA),
mask_filter_value is not applicable.
* For BERT or GPT-2, when a fused kernel is used, mask_filter_value has no impact
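
A small numeric sketch of why the lowest float works as a mask filter value: after softmax, a position filled with it gets probability exactly 0.0. The function below is illustrative only and assumes masked scores are replaced by the filter value; the constant is the lowest finite float32 written out in Python's double precision.

```python
import math

# Lowest finite float32 (-FLT_MAX), exactly representable as a Python float.
LOWEST_FLOAT32 = -(2.0 - 2.0 ** -23) * 2.0 ** 127


def masked_softmax(scores, keep, mask_filter_value=LOWEST_FLOAT32):
    """Softmax over `scores` where positions with keep=False are masked out.

    Assumption for this sketch: a masked score is replaced by the filter value.
    """
    masked = [s if k else mask_filter_value for s, k in zip(scores, keep)]
    peak = max(masked)  # subtract the max for numerical stability
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([2.0, 1.0, 3.0], [True, True, False])
# The masked third position underflows to exactly 0.0 and the
# remaining probability mass is shared by the first two positions.
```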

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/12843
https://github.com/microsoft/onnxruntime/issues/14363
2023-06-15 21:32:29 -07:00
PeixuanZuo
bcdb81c563
[Whisper] add a fusion option to split input bias from MHA/DMHA (#16049)
Whisper model contains five different types of attention. The q, k, v biases
were fused into the Attention/MHA/DMHA ops:

encoderdecoderinit subgraph
- Attention: encoder attention
- Attention: decoder self attention + present k, v
- MultiHeadAttention: decoder cross attention + present k and v. q and v
have bias.

decoder subgraph
- DecoderMultiHeadAttention: decoder cross attention + past k, v. q has
bias
- DecoderMultiHeadAttention: decoder self attention + past/present k, v.
q, k, v have bias.

For ROCm EP, MHA/DMHA doesn't support additional bias. This PR adds a
fusion option `disable_multi_head_attention_bias` to split the q, k, v bias
from MHA/DMHA.
2023-06-16 10:29:48 +08:00
Jeff Bloomfield
6949cfaf94
Fix MS domain QuantizeLinear and DequantizeLinear type registrations … (#16298)
This fixes the type lists used to register DML kernels for Microsoft
domain QuantizeLinear and DequantizeLinear. These previously did not
include FP16 and incorrectly used the same type list for both operators.

The new type lists match those of the ONNX opset 19 operators, which aren't
implemented yet in the DML EP.
2023-06-15 18:21:56 -07:00
Changming Sun
188d5f5398
Fix Linux Multi GPU build pipeline (#16368)
### Description
The build pipeline runs on Azure NV12 machines that will be deprecated
soon because the SKU is too old. So this PR will move the pipeline to a
Windows machine with two A10 GPUs.
2023-06-15 16:24:46 -07:00
Changming Sun
5754cd7d1d
Add fp16 support to CPU EP gemm op (#15506) 2023-06-15 14:38:17 -07:00
Skand Hurkat
67093b204d
Clean up aarch64 quantized GEMM dispatch (#16120)
### Description

 - Add a new field to `MLAS_PLATFORM` for S8S8 GEMM dispatch.

- Set this field to either dot product instructions or NEON MLA in
platform.cpp.

 - Clean up dispatch selector in qgemm.h.

### Motivation and Context

This allows future extensibility, as other functions may use other
ARM64
extensions for quantized matrix multiplication.

---------

Co-authored-by: Skand Hurkat <skhurkat@microsoft.com>
2023-06-15 14:24:40 -07:00
Guenther Schmuelling
5c0d5768e7
make package.json more robust (#16366)
"default" should be last element for exports.
This fixes "Module not found: Error: Default condition should be last
one" when importing the onnxruntime-web package in some conditions.
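
The ordering rule can be checked mechanically. Below is a minimal sketch that walks an `exports` map and reports any condition object where `"default"` is present but not last; the helper names and the sample file paths are hypothetical, and it relies on `json.loads` preserving key order.

```python
import json


def default_is_last(conditions):
    """True if "default" is absent or is the final key of this condition map."""
    keys = list(conditions)
    return "default" not in keys or keys[-1] == "default"


def find_misordered_exports(exports, path="exports"):
    """Return the paths of all nested condition maps that violate the rule."""
    bad = []
    if isinstance(exports, dict):
        if not default_is_last(exports):
            bad.append(path)
        for key, value in exports.items():
            bad.extend(find_misordered_exports(value, f"{path}[{key!r}]"))
    return bad


# Hypothetical package.json fragments: the first puts "default" first.
broken = json.loads('{"exports": {".": {"default": "./a.js", "import": "./a.mjs"}}}')
fixed = json.loads('{"exports": {".": {"import": "./a.mjs", "default": "./a.js"}}}')

broken_paths = find_misordered_exports(broken["exports"])
fixed_paths = find_misordered_exports(fixed["exports"])
```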
2023-06-15 14:17:37 -07:00
Hariharan Seshadri
63f5573354
Relax node placement check for CUDA Graph usage (#16358) 2023-06-15 14:03:08 -07:00
Dipanjan Sengupta
681a0d084d
Removing AMX build flag (#16086)
### Description
1. Replacing AMX intrinsics with machine code macro instructions in
QGEMM kernel.
2. Removing AMX build flags for GCC in cmake file.



### Motivation and Context
The additional AMX flag in cmake adds an extra layer of dependency on
the GCC version to use the feature. These changes should allow the usage of
the AMX feature with just the CPU ID check.
2023-06-15 11:22:59 -07:00
Rachel Guo
65434dce57
Bump decode-uri-component from 0.2.0 to 0.2.2 in /js/react_native/e2e (#16329)
### Description

As title.

Similar to this PR: https://github.com/microsoft/onnxruntime/pull/13846


### Motivation and Context

To resolve component governance alert.

https://aiinfra.visualstudio.com/Lotus/_componentGovernance/97926/alert/8087084?typeId=16589570

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-06-15 10:30:48 -07:00
Yulong Wang
4f7900b553
[js/web] enable ONNX Runtime Web error messages in JS (#16335)
### Description

Enable passing error messages from C++ to JavaScript so that when an ORT
Web API call fails it generates more verbose errors.
2023-06-15 09:45:41 -07:00
Yi Zhang
3e99e43a1d
extend Final AAR testing timeout limit (#16340)
### Description



### Motivation and Context
improve nuget pipeline stability
2023-06-15 17:27:45 +08:00
pengwa
735a32fee1
Introduce memory observer for ORTModule (#16213)
### Introduce memory observer for ORTModule

To analyze memory usage for ORTModule training, we need to collect the
per-iteration memory footprint in different stages (pre-forward,
post-forward, pre-backward, and post-backward).

Currently we only collect the data using torch.cuda APIs. As a next step,
we could collect the detailed stashed activation list and its
percentage within the ORT backend, which is beyond the scope of this PR.

Sample output:
```
[0/8] step 0 memory (MiB) | phase: pre_forward | allocated: 1866 | max allocated: 1866 | cached: 1874 | max cached: 1874 | inactive: 8 | max inactive: 8
[0/8] step 0 memory (MiB) | phase: post_forward | allocated: 23277 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 193 | max inactive: 405
[0/8] step 0 memory (MiB) | phase: pre_backward | allocated: 23277 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 193 | max inactive: 405
[0/8] step 0 memory (MiB) | phase: post_backward | allocated: 2932 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 6158 | max inactive: 6158
  0%|█                                                                                                                                                                                                            | 1/200 [00:26<1:26:18, 26.02s/it]
[0/8] step 1 memory (MiB) | phase: pre_forward | allocated: 2356 | max allocated: 26215 | cached: 26406 | max cached: 26406 | inactive: 2454 | max inactive: 6165
[0/8] step 1 memory (MiB) | phase: post_forward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165
[0/8] step 1 memory (MiB) | phase: pre_backward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165
[0/8] step 1 memory (MiB) | phase: post_backward | allocated: 3422 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 5284 | max inactive: 6165
  1%|██                                                                                                                                                                                                             | 2/200 [00:26<36:47, 11.15s/it]
[0/8] step 2 memory (MiB) | phase: pre_forward | allocated: 2356 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2454 | max inactive: 6165
[0/8] step 2 memory (MiB) | phase: post_forward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165
[0/8] step 2 memory (MiB) | phase: pre_backward | allocated: 23767 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 2639 | max inactive: 6165
[0/8] step 2 memory (MiB) | phase: post_backward | allocated: 3422 | max allocated: 26705 | cached: 29342 | max cached: 29342 | inactive: 5284 | max inactive: 6165
```
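
For post-processing, lines in this format can be turned into records with a few lines of Python. This is a sketch written against the sample output only; reading the `[0/8]` prefix as rank and world size is an assumption, and the function name is hypothetical.

```python
import re

# Matches e.g. "[0/8] step 1 memory (MiB) | phase: pre_forward | allocated: 2356 | ..."
LINE_RE = re.compile(r"^\[(\d+)/(\d+)\] step (\d+) memory \(MiB\) \| (.*)$")


def parse_memory_line(line):
    """Parse one memory line into a dict, or return None for other lines."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # e.g. progress-bar lines
    rank, world_size, step, rest = match.groups()
    # Assumption: the "[a/b]" prefix means rank a out of b ranks.
    record = {"rank": int(rank), "world_size": int(world_size), "step": int(step)}
    for field in rest.split(" | "):
        key, _, value = field.partition(": ")
        record[key] = int(value) if value.isdigit() else value
    return record


sample = ("[0/8] step 0 memory (MiB) | phase: post_forward | allocated: 23277 | "
          "max allocated: 26215 | cached: 26406 | max cached: 26406 | "
          "inactive: 193 | max inactive: 405")
record = parse_memory_line(sample)
```

Comparing `allocated` between the `post_forward` and `post_backward` records of one step gives a rough view of how much memory the stashed activations hold.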
2023-06-15 15:45:36 +08:00
pengwa
574e17ade4
Fix Reshape check (#16349)
### Fix Reshape check

The check covers 3D->2D reshapes that merge the first dims.

There is a bug in the following case.

```mermaid
stateDiagram
    [768,12,64] --> Reshape
    (-1,768) --> Reshape
   Reshape --> [768,768]
```
   


The Reshape passes the upstream Reshape check, but it should not.
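
The failing case can be reproduced with shape arithmetic alone. The sketch below is one way to state the condition the description implies, not the actual ORT check: a 3D->2D reshape only counts as "merging the first dims" when the last input dim is preserved. Function names are hypothetical, and ONNX's special `0` entry in the target shape is ignored.

```python
from math import prod


def resolve_shape(input_shape, target):
    """Resolve a single -1 in `target` the way Reshape infers it."""
    total = prod(input_shape)
    known = prod(d for d in target if d != -1)
    return [total // known if d == -1 else d for d in target]


def merges_leading_dims(input_shape, target):
    """True only for a 3D->2D reshape that keeps the last dim and merges the first two."""
    output_shape = resolve_shape(input_shape, target)
    if len(input_shape) != 3 or len(output_shape) != 2:
        return False
    return output_shape == [input_shape[0] * input_shape[1], input_shape[2]]


# The case from the diagram: the output is [768, 768], but merging the
# first dims of [768, 12, 64] would give [9216, 64], so this must be rejected.
bug_case = merges_leading_dims([768, 12, 64], [-1, 768])
# A genuine leading-dim merge: [8, 128, 768] -> [1024, 768].
good_case = merges_leading_dims([8, 128, 768], [-1, 768])
```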

### Motivation and Context
2023-06-15 13:50:53 +08:00
PeixuanZuo
097346be9d
[ROCm] Add clean step for ROCm CI pipeline (#16336)
1. Add clean step for ROCm CI pipeline
2. Fix the "device or resource busy" error by making the umount dataset
step an `always()` step.
2023-06-15 13:44:12 +08:00
Baiju Meswani
5eec24837f
Fix for AMD GPU pipeline (#16357) 2023-06-14 20:36:16 -07:00
Wanming Lin
73dad4452b
[WebNN EP] Support Shape op (#16282)
Since the WebNN API doesn't support a shape op, the WebNN EP calculates
the ONNX Shape node output and passes the values to a WebNN constant +
slice as a workaround.
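
The idea can be sketched as follows, assuming the input dimensions are known statically and following the ONNX Shape operator's optional `start`/`end` attributes. The function name is hypothetical, and a Python list stands in for the WebNN constant.

```python
def shape_as_constant_slice(input_dims, start=0, end=None):
    """Emulate ONNX Shape as a precomputed constant followed by a slice."""
    rank = len(input_dims)
    constant = list(input_dims)  # stand-in for the constant operand
    if end is None:
        end = rank
    # Negative values count from the end; then clamp into [0, rank].
    if start < 0:
        start += rank
    if end < 0:
        end += rank
    start = max(0, min(rank, start))
    end = max(0, min(rank, end))
    return constant[start:end]  # stand-in for the slice


full_shape = shape_as_constant_slice([2, 3, 4, 5])
middle_dims = shape_as_constant_slice([2, 3, 4, 5], start=1, end=-1)
```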
2023-06-14 20:31:01 -07:00
Changming Sun
dbc7a195b1
Update win-ci-pipeline.yml: enable xnnpack tests (#16244)
1. Enable xnnpack test
2. Change TSA database name from onnxruntime_master to onnxruntime_main.
This is a leftover of renaming the "master" branch to "main"
3. Add two static analysis jobs for WinML and DML
4. Rename the machine pool "aiinfra-dml-winbuild" to
"onnxruntime-Win2019-GPU-dml-A10", so that the internal and public ADO
instances use the same machine pool name.
5. Move Windows GPU CI build pipeline from "onnxruntime-Win2022-GPU-T4"
to "onnxruntime-Win2022-GPU-A10" machine pool, because we do not have
enough T4 GPUs.
2023-06-14 19:12:42 -07:00
Tianlei Wu
9be133231f
Fix cuda graph capture (#15005)
Fix two issues related to cuda graph capture:
https://github.com/microsoft/onnxruntime/issues/14942 and
https://github.com/microsoft/onnxruntime/issues/15002

Issue 1: Previously, graph capture started at the second run. However,
memory pattern optimization allocates memory from the second run,
and cudaMalloc is not allowed during graph capture. In this PR,
graph capture starts after 2 runs to avoid the issue.

Issue 2: https://github.com/microsoft/onnxruntime/pull/13495 introduced
multiple stream support. But stream cleanup calls
cudaStreamSynchronize, which is not allowed during cuda graph capture. In this
PR, we move stream cleanup after cuda graph capture.

Update the squeeze net test model with dynamic axis so that we can test
with larger batch size. Add a test that could reproduce the bug (when
changing min runs from 2 back to 1).
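
The scheduling change in Issue 1 amounts to a run counter that delays capture. The toy model below only illustrates that ordering; the class and method names are hypothetical and nothing here touches CUDA.

```python
class CaptureGate:
    """Decide per run whether to execute normally, capture, or replay."""

    def __init__(self, min_regular_runs=2):
        self.min_regular_runs = min_regular_runs
        self.regular_runs = 0
        self.captured = False

    def next_action(self):
        if self.captured:
            return "replay"
        if self.regular_runs < self.min_regular_runs:
            # Regular runs give allocations a chance to settle before capture.
            self.regular_runs += 1
            return "regular"
        self.captured = True
        return "capture"


gate = CaptureGate(min_regular_runs=2)
actions = [gate.next_action() for _ in range(4)]

# With a single regular run, capture would begin on the second run,
# which is the old behaviour described in Issue 1.
old_gate = CaptureGate(min_regular_runs=1)
old_actions = [old_gate.next_action() for _ in range(3)]
```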
2023-06-14 18:10:20 -07:00
Baiju Meswani
8a3de16d14
Temporary fix to make the training pipeline green (#16353) 2023-06-14 13:11:35 -07:00
Baiju Meswani
ed2482667b
Fix training pipeline (#16342) 2023-06-13 15:06:38 -07:00
zesongw
c5176ed122
[WebNN EP] Add several new unary Ops (Ceil, Exp, Identity, Reciprocal, Tan) (#16302)
### Description
 - Add new Ops: Ceil, Exp, Identity, Reciprocal, Tan.
 - Set MinSupportedOpSet for unary Ops.



### Motivation and Context
Support more Ops for other models.
The legacy optimization attribute "consumed_inputs" is not supported in
WebNN EP.
2023-06-13 08:14:55 -07:00
Edward Chen
4f23577cb5
[React Native] Publish E2E test logs on build failure too. (#16327)
### Description

Publish E2E test logs on build failure too.

### Motivation and Context

Get more information about intermittent test failures.
2023-06-12 17:56:46 -07:00
Yulong Wang
e3e4926d00
[js/common] allow import onnxruntime-common as ESM and CJS (#15772)
### Description
allow import onnxruntime-common as ESM and CJS.
2023-06-12 12:05:11 -07:00
Sheil Kumar
0df9e42960
User/sheilk/register div nonzero (#16309)
[DML EP] NonZero supported datatypes has incorrect number of template
datatypes

2 should be 1
2023-06-12 10:11:59 -07:00
satyajandhyala
889f80082f
[js/web] Added Reduce operators support (#16122)
### Description
Added support for ReduceL1, ReduceL2, ReduceMean, ReduceMin, ReduceMax,
ReduceSum, ReduceLogSum, ReduceLogSumExp, ReduceProd and
ReduceSquareSum.



### Motivation and Context

---------

Co-authored-by: Satya Jandhyala <sajandhy@microsoft.com>
Co-authored-by: guschmue <guschmue@microsoft.com>
2023-06-12 07:46:27 -07:00
pengwa
40bcc0441b
Enhance StatisticsSubscriber (#16098)
### Enhance StatisticsSubscriber

There are a few improvements for `StatisticsSubscriber`:

- Reduce peak memory impact for very large tensors (which consumed
too much GPU memory and caused the original recipe run to fail with
OOM) by splitting the statistics into two phases (split into buckets, then
merge results across buckets).
- Allow dumping intermediate tensors. Originally only nn.Module forward()'s
return values were dumped, but sometimes we want to inspect a
specific intermediate tensor inside the forward() function; this is now
supported.
- Add documents for collecting dumps on multiple ranks
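
The two-phase approach in the first bullet can be illustrated with plain Python. This sketch assumes the statistics of interest are min, max, mean and standard deviation, which is a guess for illustration; the function names are hypothetical. Each bucket is reduced to a handful of numbers, so only one bucket's intermediates are needed at a time.

```python
import math


def bucket_stats(values):
    """Phase 1: reduce one bucket to a few mergeable numbers."""
    return {
        "count": len(values),
        "total": sum(values),
        "sq_total": sum(v * v for v in values),
        "min": min(values),
        "max": max(values),
    }


def merge_stats(partials):
    """Phase 2: combine per-bucket results into whole-tensor statistics."""
    count = sum(p["count"] for p in partials)
    total = sum(p["total"] for p in partials)
    sq_total = sum(p["sq_total"] for p in partials)
    mean = total / count
    variance = max(sq_total / count - mean * mean, 0.0)  # population variance
    return {
        "min": min(p["min"] for p in partials),
        "max": max(p["max"] for p in partials),
        "mean": mean,
        "std": math.sqrt(variance),
    }


def two_phase_stats(values, bucket_size):
    buckets = [values[i:i + bucket_size] for i in range(0, len(values), bucket_size)]
    return merge_stats([bucket_stats(bucket) for bucket in buckets])


stats = two_phase_stats([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], bucket_size=4)
```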

Docs link on this branch for better view:
https://github.com/microsoft/onnxruntime/blob/pengwa/conv_tool_v2/docs/ORTModule_Convergence_Notes.md

---------

Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>
2023-06-12 18:32:08 +08:00
JiCheng
eed02a3f78
Xnnpack QDQ test (#16281)
### Description
A few QDQ tests failed on XNNPACK EP.

The likely reason is that the range of input_data doesn't fit the scale and
zero_point.
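
To see what "doesn't fit" means, here is a small sketch of standard uint8 linear quantization. The scale and zero point are made-up values, not the ones from the failing tests. Inputs outside the representable range saturate, so the round trip no longer recovers them.

```python
def representable_range(scale, zero_point, qmin=0, qmax=255):
    """Real-valued interval that survives quantization without saturating."""
    return (qmin - zero_point) * scale, (qmax - zero_point) * scale


def quantize(x, scale, zero_point, qmin=0, qmax=255):
    q = round(x / scale) + zero_point  # round half to even, then shift
    return max(qmin, min(qmax, q))     # saturate to the integer range


def dequantize(q, scale, zero_point):
    return (q - zero_point) * scale


scale, zero_point = 0.5, 128           # hypothetical parameters
low, high = representable_range(scale, zero_point)

inside = dequantize(quantize(10.0, scale, zero_point), scale, zero_point)
outside = dequantize(quantize(100.0, scale, zero_point), scale, zero_point)
# `inside` round-trips to 10.0, while 100.0 is clamped to the upper bound `high`.
```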


### Motivation and Context
2023-06-12 14:00:42 +08:00
zhangsibo1129
97751ad516
[CANN] Fix registration of Identity operator (#16210)
### Description
This
[PR](e726151b5c (diff-6957596681c25d78e7f3f56485f307fb7e66369309523240209a62c8fa21646b))
introduces a missing registration of Identity operator for version
greater than 14.

### Motivation and Context
It broke the CANN CI. I added the registration of identity operator.
2023-06-10 17:23:21 -07:00
JiCheng
5ab51694ab
gather OP with scalar indice in NNAPI EP (#16279)
### Description

NNAPI doesn't support a scalar indices input for Gather.
This PR works around it.


### Motivation and Context
2023-06-10 09:32:07 +08:00
Yulong Wang
59f42cccb8
[js/common] refactor tensor type in onnxruntime-common (#15843)
### Description

refactor tensor type in onnxruntime-common.

### Motivation and Context
The major motivation is that I am making a local change to address the
API part of #15312, and I am refactoring onnxruntime-common
anyway (#15772).

The `tensor.ts` and `tensor-impl.ts` are too large, so I split contents
into multiple files to make the type declarations clearer.

The original target of this change was the API only (i.e. do not refactor
any implementation). However, there are a few type/implementation
inconsistencies, so I also made minimal changes to fix them.

### Changes
- extract `TensorUtils` for non-template interfaces
- extract `TensorFactory` for all overloads of `Tensor.fromImage()`
- refactor the options type used for `Tensor.fromImage()`
- fix JSDoc comments to make option descriptions consistent with actual
type declarations
- fix an inconsistency for `options.format` and `options.bitmapFormat`;
change all `bitmapFormat` to `format`
- extract `ConversionUtils` for `tensor.toDataURL()` and
`tensor.toImageData()`
- put implementations into multiple files from `tensor-impl.ts`
- fix a bug that causes a unit test to fail; add comments for a future fix.
2023-06-09 16:19:29 -07:00
Yulong Wang
f274bbb0c8
[js] add API that allows to get package version (#16207)
### Description

Add an API for users to get the version of the current package. Example usage:

```js
import { env } from 'onnxruntime-node';

console.log(env.versions.node);  // output "1.16.0"
```

```js
import { env } from 'onnxruntime-web';

console.log(env.versions.web);  // output "1.16.0"
console.log(env.versions.common);  // output "1.16.0"
console.log(env.versions.node);  // output "undefined"
```

#16156
2023-06-09 16:18:53 -07:00