9426 commits

Author SHA1 Message Date
Edward Chen
d6cd41cfc1
[CoreML EP] Add Shape, Gather, and Slice ops (#17153)
Add CoreML EP shape related ops:
- Shape
- Gather
- Slice

Add support for int64/int32 inputs in CoreML EP.
2023-08-18 22:34:34 -07:00
Edward Chen
2b4cc24d5c
[CoreML EP] Limit input shapes to at most rank 5 (#17086)
When considering nodes for the CoreML EP, limit input shapes to at most rank 5.
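The rank check described above can be sketched in a few lines. This is an illustration in Python only: the helper name `inputs_within_rank_limit` and the tuple-based shape representation are hypothetical, and the real check lives in the CoreML EP's C++ code.

```python
MAX_SUPPORTED_RANK = 5  # limit stated in the commit message


def inputs_within_rank_limit(input_shapes, max_rank=MAX_SUPPORTED_RANK):
    """Return True if every input shape has rank <= max_rank.

    Hypothetical helper: each shape is a sequence of dimensions,
    so its rank is simply its length.
    """
    return all(len(shape) <= max_rank for shape in input_shapes)


# A rank-4 image tensor passes; a rank-6 tensor would be rejected.
print(inputs_within_rank_limit([(1, 3, 224, 224)]))
print(inputs_within_rank_limit([(1, 2, 3, 4, 5, 6)]))
```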
2023-08-18 20:33:40 -07:00
Yulong Wang
3426954525
disable browser stack tests (#17224)
### Description
disable browser stack tests
2023-08-18 17:14:12 -07:00
Changming Sun
3cec88bd12
FIX: memory leak checker is incompatible with std::stacktrace (#17209)
### Description
When I worked on PR #17173, I didn't notice that
onnxruntime\core\platform\windows\debug_alloc.cc also needs to call
dbghelp functions like SymInitialize. So, if we use vc runtime's
stacktrace functionality, vc runtime will initialize/uninitialize the
dbghelp library independently and vc runtime's stacktrace helper DLLs
get unloaded before our memory leak checker starts to work. Then, when we
call SymSetOptions, it crashes.

More details:
In VC runtime the C++23 stacktrace functions are implemented on top of
dbgeng.dll. In C:\Program Files\Microsoft Visual
Studio\2022\Enterprise\VC\Tools\MSVC\14.37.32822\crt\src\stl\stacktrace.cpp,
you can see it has:
```
                dbgeng = LoadLibraryExW(L"dbgeng.dll", nullptr, LOAD_LIBRARY_SEARCH_SYSTEM32);
```
The dbgeng.dll is a wrapper around dbghelp.dll. It calls SymInitialize
and SymCleanup. dbgeng.dll gets unloaded before our memory leak check
starts to run. In theory we should be able to call SymInitialize again
if the previous user who called SymInitialize has also called
SymCleanup. However, users can use
SymRegisterCallback/SymRegisterCallback64/SymRegisterCallbackW64 to
register callback functions to dbghelp.dll. These callback functions
need to be alive when SymSetOptions (and some other dbghelp APIs) get
called.
2023-08-18 17:10:33 -07:00
Changming Sun
6db72165eb
Fix python packaging test pipeline (#17204)
### Description
1. Fix the python packaging test pipeline. There was an error in
tools/ci_build/github/linux/run_python_tests.sh: it installed a
released version of the onnxruntime python package from pypi.org to run the
tests, when it should have picked the one from the current build.
2. Refactor the pipeline to allow choosing the cmake build type from the web
UI when manually triggering a build. For now this feature is Linux only,
because I don't want to change too much when we are about to cut a
release branch. After that I will expand it to all platforms. This
feature is useful for debugging pipeline issues. We may also consider
having a nightly pipeline that runs all tests in Debug mode, which may catch
extra bugs because in debug mode we can enforce range checks.

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=342674&view=results

### Motivation and Context
Currently the pipeline has a crash error. 

AB#18580
2023-08-18 14:51:26 -07:00
xhcao
dd3b2cefd6
[js/webgpu] Support int32 type for binary (#16901)
### Description
Enable typed binary and support int32 type for binary.

Co-authored-by: Xing Xu <xing.xu@intel.com>
2023-08-18 12:19:01 -07:00
Adam Louly
c0b6c6c94b
Add SGDOptimizer in the on-device training offline tooling (onnxblock) (#17085)
### Description
Adding SGDOptimizer to on device training onnxblock
2023-08-18 10:50:39 -07:00
Changming Sun
ee09a5ff35
Add DISABLE_CUSPARSE_DEPRECATED flag to CUDA build (#17207)
This is to suppress a warning and make Windows CUDA 12.2 build work.
2023-08-18 10:25:49 -07:00
Hariharan Seshadri
a476dbf430
[JS/WebGPU] Support Tile operator (#17123)
### Description
As title

### Motivation and Context
Improve WebGPU op coverage
2023-08-18 10:07:21 -07:00
satyajandhyala
7d1a5635a0
[JS/Web] Added SkipLayerNormalization operator. (#17102)
### Description
Add SkipLayerNormalization operator to JSEP.
2023-08-18 09:59:03 -07:00
RandySheriffH
9266cf1772
Skip setting the name when AzureEP enabled. (#17208)
Skip setting the name when AzureEP enabled.

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-18 09:53:36 -07:00
Ashwini Khade
68a670c7f8
Move some tests from CUDA only to CPU (#17189)
### Description
Minor PR to move some CUDA only on-device training tests to CPU as well.
This is to make sure we have good coverage for CPU too.
2023-08-18 09:44:57 -07:00
Tianlei Wu
d65aa5400c
clean up transformers scripts (#17179)
(1) Remove class BertOptimizationOptions, which was deprecated a long
time ago
(2) Move sys path settings to `__init__.py`, and update imports
(3) Fix bert_perf_test to run properly
(4) Fix an ONNX path in a Whisper test case
(5) Fix a few typos
(6) Update comments in bert_perf_test regarding graph inputs
2023-08-17 23:14:49 -07:00
Jack
78b35652a3
fix issue with obtaining the decoder layer number when converting the T5 model. (#17185)
### Description
fix issue with obtaining the decoder layer number when converting the T5
model.

### Motivation and Context
fix issue: https://github.com/microsoft/onnxruntime/issues/17072

Test with
[byt5-small](https://huggingface.co/google/byt5-small/tree/main) model,
which has 12 encoder layers and 4 decoder layers.
Here is the log.

![image](https://github.com/microsoft/onnxruntime/assets/3481539/ff1b69c5-f485-4301-a333-9ee2a984df07)
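To illustrate why the encoder and decoder depths must be read separately, here is a minimal sketch. It assumes the Hugging Face T5 config convention, where `num_layers` is the encoder depth and `num_decoder_layers` falls back to `num_layers` when unset. The helper `get_layer_counts` is hypothetical and is not the code changed in this commit.

```python
def get_layer_counts(config):
    """Return (encoder_layers, decoder_layers) from a T5-style config dict.

    Assumption: `num_layers` is the encoder depth, and
    `num_decoder_layers` defaults to `num_layers` when it is missing.
    """
    encoder_layers = config["num_layers"]
    decoder_layers = config.get("num_decoder_layers")
    if decoder_layers is None:
        decoder_layers = encoder_layers
    return encoder_layers, decoder_layers


# Values matching the byt5-small description in this commit message.
byt5_small = {"num_layers": 12, "num_decoder_layers": 4}
print(get_layer_counts(byt5_small))  # (12, 4)
```

Using the encoder depth for both would give 12 decoder layers here instead of 4, which is the kind of mismatch the fix addresses.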
2023-08-17 23:14:22 -07:00
Adrian Lizarraga
6ee4be724b
Update LICENSE name in NuGet packaging pipelines (#17183)
### Description
Updates NuGet packaging pipelines to use the correct license name.

### Motivation and Context
The license name changed. See https://github.com/microsoft/onnxruntime/pull/17170
The QNN_Windows_Nuget and Zip-Nuget-* pipelines will not run without this update.
2023-08-17 22:22:19 -07:00
Dmitri Smirnov
5c54b64a63
Create NodeArgs for all Constant nodes and initializers for functions being inlined (#17089)
### Description
When functions are inlined and constant nodes are being converted to
initializers, we need to create NodeArg for them.
Similar for inlined function subgraph, but we choose to give priority to
non-constant nodes and then fill the gaps with constant and
initializers.

### Motivation and Context
This addresses issue
https://github.com/microsoft/onnxruntime/issues/16813 for
`eca_halonext26ts_mod.onnx` model where it fails to remove unused
initializer because `NodeArg` was not created for it.
2023-08-17 14:22:28 -07:00
Changming Sun
0cccbcc47b
Move DML build job's Prefast task to a CPU machine pool (#17192)
### Description
Move DML build job's Prefast task to a CPU machine pool which has larger
memory. The current one runs out of memory in every run.

### Motivation and Context
To fix the broken python packaging pipeline.
2023-08-17 13:16:29 -07:00
Jian Chen
e0022d061f
Set web-ci-pipeline.yml only triggered when related fields are updated (#17148)
The pipeline is triggered only when one of the following paths is updated:
- 'js/web'
- 'js/node'
- 'onnxruntime/core/providers/js'
2023-08-17 12:55:35 -07:00
BoarQing
df124c9313
[VITISAI] 1. Fix reading .dat and .onnx on Linux 2. Fix issue of compiling graph twice (#17108)
### Description
1. Fix reading .dat and .onnx on Linux 2. Fix issue of compiling graph
twice


### Motivation and Context
1. Previously we had not tested large models on Linux. When a model is
separated into .dat and .onnx files, it failed to load.
2. Check whether the provider pointer already exists. If it does, do not
create it again.
2023-08-17 12:30:03 -07:00
Chi Lo
2fb148dd88
Temporarily enforce "Debug build" TRT EP with trt oss parser on Windows (#17059)
This PR handles two changes:

1. There is an issue when running a "Debug build" TRT EP with the "Release
build" TRT builtin parser on Windows. Enforce use of the oss parser for Debug
builds.
Note: args.config in build.py is an array, for example ["Debug",
"Release"...]. The code would be much messier if we made the change there.
2. Update to use the latest commit of the oss parser.

Please see https://github.com/microsoft/onnxruntime/issues/16273
2023-08-17 12:17:25 -07:00
Pranav Sharma
59a2801136
Fix NuGet pkging pipeline (#17195)
### Description
Fix NuGet pkging pipeline

### Motivation and Context
Fix NuGet pkging pipeline
2023-08-17 11:23:34 -07:00
cloudhan
049adb9f31
[ROCm] Remove redundant ep field in softmax (#17048) 2023-08-17 11:53:30 +08:00
Changming Sun
5249b7ab7c
Re-implement stacktrace (#17173)
### Description
Re-implement stacktrace. The new implementation doesn't directly use
the Windows API, and hence avoids problems with
initializing/uninitializing the dbghelp library.
2023-08-16 16:07:49 -07:00
Dmitri Smirnov
f45eef399e
Fix visualization issues with Attribute/Tensor protos (#17188)
### Description
Protobuf Natvis
2023-08-16 13:56:51 -07:00
RandySheriffH
3dd2c1b4d7
EP context for custom op (#16454)
Implement infrastructures to allow EP resources surfaced to custom ops.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-16 13:03:40 -07:00
Maximilian Müller
7b9d1f18c7
NVTX windows include and link fixes (#16831)
### Description

On Windows, the headers are not duplicated into the normal CUDA include
directory. On Linux they are:
```
(base) maximilianm@maximilianm-dt-linux:~$ ls /usr/local/cuda/include/nvtx3 | grep nvTool
nvToolsExt.h
nvToolsExtCuda.h
nvToolsExtCudaRt.h
nvToolsExtOpenCL.h
nvToolsExtSync.h
(base) maximilianm@maximilianm-dt-linux:~$ ls /usr/local/cuda/include | grep nvTool
nvToolsExt.h
nvToolsExtCuda.h
nvToolsExtCudaRt.h
nvToolsExtOpenCL.h
nvToolsExtSync.h
```
Is the preference to go via those added defines, or should the include just be
changed to `nvtx3/`?

Also, no library linking is needed on Windows, and the library is
not even present.
2023-08-16 11:53:58 -07:00
Yulong Wang
cbee84ddfb
[js/web] allow optional input/output in operator test (#17184)
### Description
allow optional input/output in operator test
2023-08-16 11:50:11 -07:00
Adrian Lizarraga
96b1ff610b
Add CI and PR validation triggers to QNN Windows x64 Pipeline yaml (#17178)
### Description
Adds continuous integration and pull-request validation triggers
directly to the yaml file for the Windows x64 QNN CI Pipeline.


### Motivation and Context
There have been various unit test failures that break the
QNN_Windows_Nuget pipeline, which builds QNN EP for Windows x64. This PR
ensures that QNN EP is built and tested on a Windows x64 image for every
pull request.
2023-08-16 11:44:54 -07:00
Hariharan Seshadri
66df11769c
[JS/WebGPU] Expand operator fixes (#17137) 2023-08-16 11:24:26 -07:00
Tianlei Wu
99349e58d7
dump tensor statistics (#15761)
Dump statistics of input and/or output tensors of each node. It could
help to find out why a model outputs NaN.

To use this tool, just add `--cmake_extra_defines
onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1` when building the onnxruntime package.
Then set some environment variables before running a model with
onnxruntime:

```
export ORT_DEBUG_NODE_IO_DUMP_INPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_OUTPUT_DATA=1
export ORT_DEBUG_NODE_IO_DUMP_STATISTICS_DATA=1
```

Then statistics data will be appended after the dumping of input and
output tensors.

One possible cause of an FP16 or mixed precision model outputting NaN: some
number exceeds the limit of FP16 (the max FP16 value is 65504). When an
FP32 model has a value > 65504 in a node output, it will become INF when
the node is converted to FP16. In this case, you need to keep the related nodes
in FP32 to avoid the issue. You can dump tensor statistics of the FP32 model
to find such candidate nodes.
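
As a rough sketch of that last step, the following Python snippet flags nodes whose FP32 output range would overflow in FP16. The `node_stats` mapping, the node names, and the helper `find_fp16_overflow_candidates` are all hypothetical. The snippet assumes you have already extracted a (min, max) pair per node from the dumped statistics, and it does not parse the actual dump format.

```python
FP16_MAX = 65504.0  # largest finite FP16 value, as noted above


def find_fp16_overflow_candidates(node_stats, limit=FP16_MAX):
    """Return names of nodes whose FP32 output range exceeds the FP16 limit.

    `node_stats` is a hypothetical mapping of node name -> (min, max),
    e.g. collected by hand from the dumped statistics.
    """
    return [
        name
        for name, (lo, hi) in node_stats.items()
        if max(abs(lo), abs(hi)) > limit
    ]


# Made-up statistics for illustration only.
stats = {
    "MatMul_12": (-3.5, 120.0),
    "Add_40": (-70000.0, 12.0),
    "Softmax_7": (0.0, 1.0),
}
print(find_fp16_overflow_candidates(stats))  # ['Add_40']
```

Nodes returned by such a check are the ones to consider keeping in FP32.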
2023-08-16 10:53:48 -07:00
satyajandhyala
89b682e3f3
[JS/Web] The bias input is optional, not required, for LayerNormalization operator (#17143)
### Description
Fix a typo. LayerNormalization takes 2 or 3 inputs. The third input,
bias, is optional.
2023-08-16 10:41:20 -07:00
Preetha Veeramalai
2ae930333b
Add checks for session options and fix gsubgraph fallback exceptions (#17095)
### Description
Bug fix for OVEP graph provider options and fallback


### Motivation and Context
Bug-fix logic is added to handle the fallback to the CPU EP.
Corner-case assertions are added for ProviderOptions in OpenVINO.

---------

Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com>
Co-authored-by: Saurabh Kale <saurabh1.kale@intel.com>
2023-08-16 10:06:25 -07:00
Yulong Wang
133af1385c
[js/webgpu] update shader cache key to include input tensor datatype (#17176)
### Description
update shader cache key to include input tensor datatype.

and make the key a little bit easier to read
2023-08-16 09:14:19 -07:00
Jian Chen
8998b6811d
Fix NPM Packaging Pipeline (#17182)
2023-08-15 22:56:38 -07:00
PeixuanZuo
ebcd9b5cae
Fix deprecated optimum interface (#17112)
The `latest_model_name` argument to create an {self.__class__.__name__}
is deprecated since optimum 1.6.0. Replace it with `model_name`
2023-08-16 12:33:36 +08:00
Tianlei Wu
6b29837ed2
Move attention test data to file (#17158)
(1) Move attention test data from code to file to avoid a prefast crash
(which blocks the python packaging pipeline)
(2) Enable some test cases that were previously disabled on Windows
(3) Fix an assertion error in
`MultiHeadAttentionTest.CrossAttention_WithPastPassedInDirectly_NoMask`
This test case is for Whisper cross attention. When memory efficient
attention was enabled, the format was converted to BNSH, which triggers an
assertion error since memory efficient attention asserts BSNH format.
Temporarily disable memory efficient attention for this case. I also
disabled the test since Whisper does not use it anymore, and ROCm fails
in the test.
2023-08-15 21:31:57 -07:00
xhcao
33ecde9af1
[js/webgpu] Fix reshape int32 test case (#17113)
Co-authored-by: Xing Xu <xing.xu@intel.com>
2023-08-15 21:18:13 -07:00
Guenther Schmuelling
8289e8b6ef
[js/webgpu] fix a few shader errors (#17171)
Fix for segment anything decoder, reduceMax with rank1 and concat.
2023-08-15 21:14:20 -07:00
Yulong Wang
35363dd9a5
[js/web] a few optimizations for test runner (#17174)
### Description
1. allows passing session options to operator test (eg. graph
optimization level)
2. add a short flag '-x' for '--wasm-number-threads' as it is frequently
used.
2023-08-15 21:00:23 -07:00
Justin Chu
2575b9aaa1
Improve comments in winml/ (#17163)
Follow up of #17144. Manually fixed indentation in block comments and
replaced all tabs with spaces.
2023-08-15 23:30:56 -04:00
dependabot[bot]
178e5991ac
Bump protobufjs from 6.11.3 to 6.11.4 in /js/node (#17177) 2023-08-16 02:00:38 +00:00
Arthur Islamov
ccf14e891e
[js/web] JSEP node assignment optimization (#17128)
### Description
Since WebGPU supports only float32 and int32, having Gather, Reshape,
Shape, Squeeze and Unsqueeze ops with other data types creates additional
MemCpy ops and slows down overall execution, as all other ops with
other tensor types will be done on CPU.

Before this patch SD Unet had these numbers:
Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1141
Node(s) placed on [JsExecutionProvider]. Number of nodes: 4025
memcpy tokens: 2001

After patch:
Node(s) placed on [CPUExecutionProvider]. Number of nodes: 1735
Node(s) placed on [JsExecutionProvider]. Number of nodes: 2243
memcpy tokens: 813

It also gives more than 5X performance benefit. From 12sec for one Unet
step to 2.2sec on RTX 3090 Ti, so we are almost getting to native
performance.

UPD: with the latest changes from the main branch and multi-threading it went
down to 1.6sec. Will try re-exporting my model to onnx with maximum
optimizations, like using MultiHeadAttention to decrease node count.
Maybe after implementing that it can go below 1 sec.
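
The figures quoted in this message can be checked with a few lines of arithmetic. The helper names `speedup` and `reduction_pct` are hypothetical; the input numbers are the ones reported above.

```python
def speedup(before_s, after_s):
    """Ratio of the old step time to the new step time."""
    return before_s / after_s


def reduction_pct(before, after):
    """Percentage decrease from `before` to `after`."""
    return 100.0 * (before - after) / before


# Time per Unet step: 12 s before, 2.2 s after, 1.6 s with later changes.
print(f"{speedup(12.0, 2.2):.2f}x")  # 5.45x
print(f"{speedup(12.0, 1.6):.2f}x")  # 7.50x

# memcpy count: 2001 before the patch, 813 after.
print(f"{reduction_pct(2001, 813):.1f}%")  # 59.4%
```

The first ratio is consistent with the "more than 5X" claim.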
2023-08-15 18:58:05 -07:00
shaahji
3cdf42548f
Issue #17098: Shape inferencing fails during quantization for large models (#17100) 2023-08-15 18:38:14 -07:00
Wanming Lin
789bac1dc8
[WebNN EP] Support BatchNormalization op (#17071)
Adds support for BatchNormalization via WebNN meanVarianceNormalization.
2023-08-15 17:52:09 -07:00
Pranav Sharma
c0f8197157
Add README to Nuget and fix license file name (#17170)
### Description
Add README to Nuget and fix license file name

### Motivation and Context
Fixes https://github.com/microsoft/onnxruntime/issues/17055
2023-08-15 16:04:34 -07:00
RandySheriffH
39dfcd5d84
Allow RunAsync with global TP (#17157)
Allow RunAsync called with a global thread pool.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-08-15 14:29:10 -07:00
Adam Louly
c647e3e8ab
Run nightly pipeline tests from the commit id. (#17162)
### Description

The onnxruntime-CI-nightly-ort-pipeline encounters occasional failures
due to synchronization discrepancies between the ACPT nightly image and
the repository. We are addressing this by executing tests using the
commit ID associated with the ort build within the ACPT image.

---------

Co-authored-by: Adam Louly <adamlouly@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-08-15 12:07:38 -07:00
dependabot[bot]
f086bd7bff
Bump github/issue-labeler from 2.5 to 3.2 (#16639) 2023-08-15 18:00:19 +00:00
Changming Sun
8e203efc69
Cleanup cmake file (#17154)
### Description
1. Clean up cmake files. Remove some unused code
2. Remove the "Semmle" task from
tools/ci_build/github/azure-pipelines/templates/win-ci.yml. Semmle is
deprecated and replaced by CodeQL.
2023-08-15 10:51:33 -07:00
Changming Sun
2a22325005
Explicitly set JDK version when building ORT java package (#17147)
### Description
Explicitly set JDK version when building ORT java package. This is to fix an internal build error.
2023-08-15 10:36:05 -07:00