Commit graph

7713 commits

Author SHA1 Message Date
PeixuanZuo
da2bd3ad4d
[ROCm] Build ROCm CI with Release config and enable kernel explorer test (#13687)
### Description
1. Build ROCm CI with Release config to save time.
2. Use 32 threads to build; the new CI machine has 256 threads.
3. Enable the ROCm kernel explorer test.


### Motivation and Context

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-21 10:04:10 +08:00
dependabot[bot]
8472876155
Bump socket.io-parser from 4.0.4 to 4.0.5 in /js/web (#13608)
Bumps [socket.io-parser](https://github.com/socketio/socket.io-parser)
from 4.0.4 to 4.0.5.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/socket.io-parser/releases">socket.io-parser's
releases</a>.</em></p>
<blockquote>
<h2>4.0.5</h2>
<h3>Bug Fixes</h3>
<ul>
<li>check the format of the index of each attachment (<a
href="b559f050ee">b559f05</a>)</li>
</ul>
<h4>Links</h4>
<ul>
<li>Diff: <a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5">https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/socketio/socket.io-parser/blob/main/CHANGELOG.md">socket.io-parser's
changelog</a>.</em></p>
<blockquote>
<h2><a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5">4.0.5</a>
(2022-06-27)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>check the format of the index of each attachment (<a
href="b559f050ee">b559f05</a>)</li>
</ul>
<h1><a
href="https://github.com/socketio/socket.io-parser/compare/4.1.2...4.2.0">4.2.0</a>
(2022-04-17)</h1>
<h3>Features</h3>
<ul>
<li>allow the usage of custom replacer and reviver (<a
href="https://github-redirect.dependabot.com/socketio/socket.io-parser/issues/112">#112</a>)
(<a
href="b08bc1a93e">b08bc1a</a>)</li>
</ul>
<h2><a
href="https://github.com/socketio/socket.io-parser/compare/4.1.1...4.1.2">4.1.2</a>
(2022-02-17)</h2>
<h3>Bug Fixes</h3>
<ul>
<li>allow objects with a null prototype in binary packets (<a
href="https://github-redirect.dependabot.com/socketio/socket.io-parser/issues/114">#114</a>)
(<a
href="7f6b262ac8">7f6b262</a>)</li>
</ul>
<h2><a
href="https://github.com/socketio/socket.io-parser/compare/4.1.0...4.1.1">4.1.1</a>
(2021-10-14)</h2>
<h1><a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.1.0">4.1.0</a>
(2021-10-11)</h1>
<h3>Features</h3>
<ul>
<li>provide an ESM build with and without debug (<a
href="388c616a92">388c616</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f3329eb5a4"><code>f3329eb</code></a>
chore(release): 4.0.5</li>
<li><a
href="b559f050ee"><code>b559f05</code></a>
fix: check the format of the index of each attachment</li>
<li>See full diff in <a
href="https://github.com/socketio/socket.io-parser/compare/4.0.4...4.0.5">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-19 12:55:21 -08:00
Nat Kershaw (MSFT)
43a7b520e4
Convert label config to one line regexes (#13702) 2022-11-19 11:38:29 -08:00
Yulong Wang
2d732e9729
[js] [deps] upgrade minimatch@3.1.2 (#13703)
### Description
Upgrade to minimatch@3.1.2.



### Motivation and Context
```
# npm audit report

minimatch  <3.0.5
Severity: high
minimatch ReDoS vulnerability - https://github.com/advisories/GHSA-f8q6-p94x-37v3
```
2022-11-18 22:27:57 -08:00
Hariharan Seshadri
c7329e004d
Improve fp16 performance of GPT-2's logits MatMul while using BeamSearch (#13686) 2022-11-18 18:50:19 -08:00
dependabot[bot]
c358d64b0e
Bump loader-utils from 2.0.0 to 2.0.4 in /js/web (#13666)
Bumps [loader-utils](https://github.com/webpack/loader-utils) from 2.0.0
to 2.0.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/webpack/loader-utils/releases">loader-utils's
releases</a>.</em></p>
<blockquote>
<h2>v2.0.4</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.3...v2.0.4">2.0.4</a>
(2022-11-11)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>ReDoS problem (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/225">#225</a>)
(<a
href="ac09944dfa">ac09944</a>)</li>
</ul>
<h2>v2.0.3</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.3">2.0.3</a>
(2022-10-20)</h3>
<h3>Bug Fixes</h3>
<ul>
<li><strong>security:</strong> prototype pollution exploit (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/217">#217</a>)
(<a
href="a93cf6f470">a93cf6f</a>)</li>
</ul>
<h2>v2.0.2</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.2">2.0.2</a>
(2021-11-04)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>base64 generation and unicode characters (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/197">#197</a>)
(<a
href="8c2d24ee40">8c2d24e</a>)</li>
</ul>
<h2>v2.0.1</h2>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.0...v2.0.1">2.0.1</a>
(2021-10-29)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>md4 support on Node.js v17 (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/193">#193</a>)
(<a
href="1069f61284">1069f61</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/webpack/loader-utils/blob/v2.0.4/CHANGELOG.md">loader-utils's
changelog</a>.</em></p>
<blockquote>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.3...v2.0.4">2.0.4</a>
(2022-11-11)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>ReDoS problem (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/225">#225</a>)
(<a
href="ac09944dfa">ac09944</a>)</li>
</ul>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.3">2.0.3</a>
(2022-10-20)</h3>
<h3>Bug Fixes</h3>
<ul>
<li><strong>security:</strong> prototype pollution exploit (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/217">#217</a>)
(<a
href="a93cf6f470">a93cf6f</a>)</li>
</ul>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.1...v2.0.2">2.0.2</a>
(2021-11-04)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>base64 generation and unicode characters (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/197">#197</a>)
(<a
href="8c2d24ee40">8c2d24e</a>)</li>
</ul>
<h3><a
href="https://github.com/webpack/loader-utils/compare/v2.0.0...v2.0.1">2.0.1</a>
(2021-10-29)</h3>
<h3>Bug Fixes</h3>
<ul>
<li>md4 support on Node.js v17 (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/193">#193</a>)
(<a
href="1069f61284">1069f61</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="6688b50281"><code>6688b50</code></a>
chore(release): 2.0.4</li>
<li><a
href="ac09944dfa"><code>ac09944</code></a>
fix: ReDoS problem (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/225">#225</a>)</li>
<li><a
href="7162619fb9"><code>7162619</code></a>
chore(release): 2.0.3</li>
<li><a
href="a93cf6f470"><code>a93cf6f</code></a>
fix(security): prototype polution exploit (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/217">#217</a>)</li>
<li><a
href="90c7c4be17"><code>90c7c4b</code></a>
chore(release): 2.0.2</li>
<li><a
href="8c2d24ee40"><code>8c2d24e</code></a>
fix: base64 generation and unicode characters (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/197">#197</a>)</li>
<li><a
href="5fb5562084"><code>5fb5562</code></a>
chore(release): 2.0.1</li>
<li><a
href="1069f61284"><code>1069f61</code></a>
fix: md4 support on Node.js v17 (<a
href="https://github-redirect.dependabot.com/webpack/loader-utils/issues/193">#193</a>)</li>
<li>See full diff in <a
href="https://github.com/webpack/loader-utils/compare/v2.0.0...v2.0.4">compare
view</a></li>
</ul>
</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2022-11-18 18:01:25 -08:00
Edward Chen
4901987d1d
Remove SafeInt dependency from Objective-C API. (#13698) 2022-11-18 17:06:12 -08:00
Changming Sun
3e9e5e9d6d
Patch Protobuf and ONNX's cmake files and enforce BinSkim check (#13694)
Patch Protobuf and ONNX's cmake files and enforce BinSkim check.

This PR overlaps with #13523. I would prefer to get this one merged
first so that we can finish the BinSkim work, and I tried to make this
PR as small as possible.
2022-11-18 10:09:47 -08:00
Wei-Sheng Chin
6160ba0692
Fix aten::_to_copy in DORT (#13682)
`aten::_to_copy` is not exportable to ONNX, so in DORT it is replaced in
`_replace_to_copy_with_to`. This replacement logic became incorrect with the latest PyTorch
commit, and this PR fixes it.

Basically, we examine more keyword attributes passed to
`aten::_to_copy`, and if they lead to a type casting operator (i.e.,
one mapped to ONNX's Cast), we replace that `aten::_to_copy` with
`aten::to`. Unsupported attributes are removed (with a low risk of
breaking the FX graph's assumptions).
2022-11-18 09:31:18 -08:00
Vincent Wang
07812a2fa6
Fix UT Failure on AMD for ORTModule's Conv Test (#13688)
Currently the provider option conv_algo_search is CUDA-only, so remove
the check for the ROCm EP.
2022-11-18 17:52:22 +08:00
Changming Sun
7a57976d1a
Make natvis files work better (#13665)
### Description
After this change, the GSL.natvis and wil.natvis files are
added to every onnxruntime_xxx project.

Like this:

![image](https://user-images.githubusercontent.com/856316/202081013-314145a8-7a0f-4f45-bf85-f9ed0e247c63.png)

This is because in onnxruntime_common.cmake we have:

```cmake
if (MSVC)
  set(ABSEIL_NATVIS_FILE "abseil-cpp.natvis")
  target_sources(
      onnxruntime_common
      INTERFACE $<BUILD_INTERFACE:${PROJECT_SOURCE_DIR}/external/${ABSEIL_NATVIS_FILE}>)
endif()
```
It sets a property, INTERFACE_SOURCES, on the target
"onnxruntime_common".

Then if anyone else uses:
```
target_link_libraries(mytarget PRIVATE onnxruntime_common)
```
The natvis file will be added to `mytarget`.

However, in this project we don't use such things for the targets that
are static libraries. For example, onnxruntime_graph is a static
library.

Instead, we use the `onnxruntime_add_include_to_target` function to
explicitly control what we want to propagate. The function was written
before we started to have natvis files, so it doesn't pass a source
file from one static library to another. Now we need that, probably
only on Windows.

### Motivation and Context

Add natvis files to every project.
2022-11-17 19:13:40 -08:00
Ye Wang
38a74af45d
Support position_ids broadcasting in EmbedLayerNorm (#13677)
### Description


### Motivation and Context

fix https://github.com/microsoft/onnxruntime/issues/13508
2022-11-17 17:56:27 -08:00
Adrian Lizarraga
abfdb63e31
Update protobuf-java to version 3.21.7 (#13630)
### Description
Update protobuf-java to version 3.21.7. This change only impacts tests.

### Motivation and Context
The current version is affected by CVE-2022-3509.
2022-11-17 15:04:42 -08:00
pengwa
d5721b3464
Fix wrong import path in docs (#13680)
### Fix wrong import path in docs
2022-11-17 18:15:02 +08:00
PeixuanZuo
a50877ac99
[ROCm] Add ROCm5.3.2 to python package pipeline (#13664)
### Description

Add ROCm 5.3.2 to the Python package pipeline.

We build the rocm/dev-centos-7:x.x.x stage ourselves to avoid depending
on AMD's release.

### Motivation and Context

Co-authored-by: peixuanzuo <peixuanzuo@linmif39a000004.zvflicr54joexhdgnhvmxrxygg.phxx.internal.cloudapp.net>
2022-11-17 16:10:49 +08:00
Sunny Shukla
de77c60e6e
[oneDNN ep] SLN performance improvement for bias (#13620)
### Description
SkipLayerNorm performance improvement when bias is present as input

### Motivation and Context
- For the SkipLayerNorm op, adding the bias tensor as a post-op on the add
primitive that sums the input and skip tensors causes a drastic performance
degradation.

- Hence the post-op is removed and two add primitives are used
in series instead: the first adds input and skip, and the second adds bias
to that result.

- This change has shown a significant performance gain for the
SkipLayerNorm operator.
2022-11-16 21:25:00 -08:00
cloudhan
b731cf397d
Make static analysis happy (#13655)
Suppress some static analysis warnings by changing the code.
2022-11-17 09:07:20 +08:00
Yi Zhang
116079749e
Fix Mac CI in Packaging pipeline (#13671)
### Description
The default Python on Mac was upgraded to 3.11, but 3.11 is not
supported yet, so use Python 3.8 instead.

### Motivation and Context
Fix MacOS CI in Zip-Nuget-Java-Nodejs Packaging Pipeline


### Test Run

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=249020&view=logs&j=ded01483-6627-58ac-64dc-d4a232827e5d
2022-11-17 08:12:30 +08:00
Jian Chen
8442d9df2c
Cjian/c4244 round 6 (#13663)
### Description
Fix round 6 



### Motivation and Context
2022-11-16 16:26:11 -05:00
Rachel Guo
2efd2878ab
[rn] Add uint8 typedArray support for react native android (#13622)
### Description

- Add missing uint8 typedArray case
- Add createInputTensor_uint8 unit test in TensorHelperTest.java file


### Motivation and Context

An error was detected in the inferencesession.run() call when running a React Native app
with a uint8array ORT input tensor. Add the missing support to fix it.
2022-11-16 12:37:47 -08:00
shalvamist
359091f64a
XNNPACK - GEMM & MATMUL integration (#13126)
### Description
Added support for XNNPACK GEMM & MATMUL ops.

### Motivation and Context
Measured a ~5% performance improvement on MobileBERT using the XNNPACK Gemm operation.

Co-authored-by: shalvamist <shalva.mist@microsoft.com>
2022-11-16 09:47:35 -08:00
Dwayne Robinson
55fb790d88
DML EP allow squeeze-13 axes to be empty (#13635)
### Description
**Description**: [ONNX
Squeeze-13](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Squeeze)
treats empty `axes` as if all axes had been given. This works for
[earlier Squeeze
versions](https://github.com/microsoft/onnxruntime/pull/12649), but
Squeeze-13 takes axes as a dynamic input tensor, which means its
existence needs to be checked before accessing it.
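
The shape rule described above can be sketched in a few lines of Python. This is only an illustration of the semantics stated in this description (absent or empty `axes` removes every dimension of size 1); `squeeze_shape` is a hypothetical helper and is not the DML EP code.

```python
from typing import Optional, Sequence, Tuple


def squeeze_shape(shape: Sequence[int],
                  axes: Optional[Sequence[int]] = None) -> Tuple[int, ...]:
    """Compute the output shape of a Squeeze-like op (hypothetical helper)."""
    rank = len(shape)
    # Check that axes exists (and is non-empty) before reading it.
    # Assumption from the description above: empty axes behaves like
    # "squeeze every dimension of size 1".
    if not axes:
        return tuple(d for d in shape if d != 1)
    normalized = {a + rank if a < 0 else a for a in axes}
    for a in normalized:
        if not 0 <= a < rank:
            raise ValueError(f"axis {a} is out of range for rank {rank}")
        if shape[a] != 1:
            raise ValueError(f"cannot squeeze axis {a} with size {shape[a]}")
    return tuple(d for i, d in enumerate(shape) if i not in normalized)
```

For example, `squeeze_shape((1, 3, 1, 2), [])` and `squeeze_shape((1, 3, 1, 2))` take the same branch, which is the consistency this change is after.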

### Motivation and Context
- *Why is this change required? What problem does it solve?* Fixes a
customer model. Makes ORT DML EP consistent with spec.
2022-11-15 11:03:21 -08:00
Jian Chen
3201a1f841
Cjian/c4244 round 5 (#13645)
### Description
Round 5 of the fixes. There are 192 to go.



### Motivation and Context
2022-11-15 13:48:21 -05:00
Abhishek Udupa
9c6c219949
Enable shape-sensitive analysis in ProfileExplorer for GPU kernels (#13647)
### Description
Improve the profile explorer by enabling shape sensitivity for GPU
kernels.



### Motivation and Context
Due to problems with the ROCM profiler, it was previously challenging to
retrieve the shapes corresponding to a GPU kernel event. [PR
13549](https://github.com/microsoft/onnxruntime/pull/13549) addresses
these problems, so it's now possible to retrieve shapes from the ORT
ROCM/CUDA profilers. This PR leverages [PR
13549](https://github.com/microsoft/onnxruntime/pull/13549) to enable
shape-sensitive GPU kernel ranking.

Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>
2022-11-15 10:05:40 -08:00
Yulong Wang
4cd8b4269a
ignore dirty state of submodule XNNPACK (#13648)
### Description
ignore dirty state of submodule XNNPACK



### Motivation and Context
The ONNX Runtime WebAssembly build applies a patch to XNNPACK, so the
submodule is considered to be in a 'dirty' state. We want to ignore this when
checking the workspace using `git status`.
2022-11-15 00:38:46 -08:00
cloudhan
9e649d1ac4
Allow CUDA EP enable or disable TunableOp via session options and environment variable (#13601)
This ports #13116 from ROCm EP to CUDA EP
2022-11-15 14:43:54 +08:00
JiCheng
2490cf84c9
[QLinearSoftmax]remove input_shape check in Ctor (#13489)
### Description
In some cases, we can't get the node's shape for pre-processing.


### Motivation and Context
2022-11-15 12:02:17 +08:00
Changming Sun
ad31ac466b
Delete cpu-esrp-pipeline.yml (#13623)
The content has been moved to "Zip-Nuget-Java-Nodejs Packaging
Pipeline".
2022-11-14 19:00:40 -08:00
Jeff Bloomfield
b1169635cc
Ensure graph resolve occurs after free dimension is overridden (#13634)
### Description
This ensures that the graph is re-resolved after a free dimension shape
is overridden according to session options.

### Motivation and Context
This ensures that shape inference occurs, which is necessary to apply
the optimization and ensure the session is compatible with bound
shapes. This bug seems to have affected only a small fraction of models.
2022-11-14 18:39:29 -08:00
Guenther Schmuelling
6f6560a7b9
fix to reduce peak memory usage in ort-web (#13323)
fix to reduce peak memory usage in ort-web
2022-11-14 12:18:02 -08:00
Justin Chu
197191e58c
Update pylint config to include valid short names (#13631)
### Description
Update pylint config to include valid short names
Also disabled `too-many-arguments` and `too-many-locals`


### Motivation and Context
Refine config to reduce lint noise
2022-11-14 10:00:25 -08:00
Jian Chen
f0ff2c5de9
Cjian/c4244 round 4 (#13632)
### Description
Round 4. There are 436 more to go.



### Motivation and Context
2022-11-14 12:20:26 -05:00
cloudhan
369a822409
Share TunableOp between CUDA and ROCM EP (#13560)
Make TunableOp support CUDA kernel authoring and add the corresponding support to kernel explorer.
2022-11-11 13:56:44 +08:00
Edward Chen
78147c2d95
Pin react-native version in js/react_native/android/build.gradle. (#13619)
Fix React Native CI build.
Recently the build started picking up a more recent version of React Native that was published to Maven Central.
More details here: https://github.com/facebook/react-native/issues/35210
2022-11-10 15:32:09 -08:00
Abhishek Udupa
9954454c65
Make the ROCM profiler thread-safe, session-aware and preserve logical ordering between CPU and GPU events (#13549)
### Description
The existing ROCM profiler has a few shortcomings, which this PR fixes.

### Motivation and Context
The existing ROCM profiler:
1. Is not thread-safe
2. Is not session-aware: i.e., if multiple inference sessions enable
profiling, then events (esp GPU events) get mixed up between the
sessions
3. Has some issues with respect to coding standards.

This PR addresses all of the above by cleanly re-implementing parts of
the ROCM profiler as required.

Attached are 4 profile outputs from a multi-session run of the
StableDiffusion model, as well as a quick-and-dirty script that checks
the profile outputs for the invariants claimed.


[sd_profile_outputs.tar.gz](https://github.com/microsoft/onnxruntime/files/9924608/sd_profile_outputs.tar.gz)


[check_profile_output_wellformedness.zip](https://github.com/microsoft/onnxruntime/files/9924614/check_profile_output_wellformedness.zip)

Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>
2022-11-10 10:25:41 -08:00
Yi Zhang
240a7ecf86
Fix lgtm C++ error (#13613)
### Description



### Motivation and Context
Recently, every change in C/C++ code has failed with the exception below:
```
[2022-11-10 04:36:05] [build-stderr] CMake Error at CMakeLists.txt:5 (cmake_minimum_required):
[2022-11-10 04:36:05] [build-stderr]   CMake 3.24 or higher is required.  You are running version 3.23.1
```

https://lgtm.com/projects/g/microsoft/onnxruntime/logs/rev/pr-9c39e0fe82768b017af09118af7344a9703317a5/lang:cpp/stage:Build%20merge_d70f6e7a151e1fea8003b81a4e6d6aa6a80a788d

### Verification
The test commit in my branch passed.
Once the PR is merged, the master build check should pass too.
<img width="767" alt="image"
src="https://user-images.githubusercontent.com/16190118/201086512-25ea69e7-6fe5-4939-b557-b3468428d363.png">
2022-11-10 10:06:22 -08:00
Wei-Sheng Chin
cd85a6333a
Add Missing Test File (#13607)
I built a new test infra for CUDA EP in #13016 but forgot to add the
test to onnxruntime_test_all. Here is the missing file. Now the
`TestAll` function is actually called in CI.
2022-11-10 09:56:19 -08:00
Patrice Vignola
31cb3cb254
[DML EP] Revert DML's cpu fallback logic (#13605)
### Description
Revert DML's CPU fallback logic from
https://github.com/microsoft/onnxruntime/pull/13442.

### Motivation and Context
Although the logic works well in many models that have good DML
coverage, it makes perf worse in some models where many operators are
missing DML coverage (e.g. int64). Overall, the right fix seems to be to
implement the operator on DML instead, even though it almost always falls
back to the CPU, just for the sake of having a registration.
2022-11-10 00:56:23 -08:00
JiCheng
a89015b940
[XNNPACK] wraps xnnpack alloc with cpu_allocator (#13349)
### Description



### Motivation and Context
2022-11-10 15:41:06 +08:00
Jian Chen
d286822464
Fix round 4 (#13609)
### Description
Fix round 4. Still have about 632 to go.



### Motivation and Context
2022-11-10 00:18:51 -05:00
Vincent Wang
2bda3fd341
Gather to Slice Fusion (#13599)
This PR optimizes execution of the code below from Huggingface's
XLNet model.
```
x = torch.index_select(x, 3, torch.arange(klen, device=x.device, dtype=torch.long))
```

The code is exported to Range->Gather, which can be fused into a
Slice op. The Slice kernel is much faster than Gather, especially in the
backward run. The main reason is that for Gather the indices can contain
duplicates, so the backward pass needs a sum, whereas a Slice node cannot
hit that case.
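
The reasoning above can be illustrated with a one-dimensional, pure-Python sketch. It is not the ORT kernel code; the function names are hypothetical and only show why Gather's backward pass must accumulate while Slice's backward pass is a plain copy.

```python
from typing import List


def gather(x: List[float], indices: List[int]) -> List[float]:
    """Forward Gather: pick x[i] for every index."""
    return [x[i] for i in indices]


def gather_backward(grad_out: List[float], indices: List[int],
                    input_len: int) -> List[float]:
    """Backward Gather: indices may repeat, so gradients are summed."""
    grad_x = [0.0] * input_len
    for g, i in zip(grad_out, indices):
        grad_x[i] += g
    return grad_x


def slice_backward(grad_out: List[float], start: int, end: int,
                   input_len: int) -> List[float]:
    """Backward Slice: each input position is read at most once, so copy."""
    grad_x = [0.0] * input_len
    grad_x[start:end] = grad_out
    return grad_x


x = [10.0, 20.0, 30.0, 40.0, 50.0]
klen = 3
range_indices = list(range(klen))  # what Range produces for arange(klen)

# With Range-generated indices, Gather and Slice agree in both directions.
same_forward = gather(x, range_indices) == x[0:klen]
same_backward = (gather_backward([1.0, 1.0, 1.0], range_indices, len(x))
                 == slice_backward([1.0, 1.0, 1.0], 0, klen, len(x)))
```

With duplicated indices such as `[1, 1]`, `gather_backward` has to add two gradients into the same slot, which is the extra work the fusion avoids.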

Use Huggingface's XLNet model for profiling.
- Before the fuse
forward, ~753us

![image](https://user-images.githubusercontent.com/11661208/200758439-63f2f9b5-9610-4df8-98c8-a1ad4dc62f4e.png)
backward, ~46101us

![image](https://user-images.githubusercontent.com/11661208/200758530-fe16a8ec-ea8f-4b79-b3ac-386b72ba1670.png)

- After the fuse
forward, ~627us

![image](https://user-images.githubusercontent.com/11661208/200758654-ab9a6068-c45d-40f4-9c71-3862a56732f8.png)
backward, ~677us

![image](https://user-images.githubusercontent.com/11661208/200758833-aab1b8e1-1b5d-4e55-88cf-03c2a1d9d42b.png)
2022-11-10 13:03:30 +08:00
Jian Chen
0511443782
Cjian/c4244 round 3 (#13583)
### Description



### Motivation and Context
2022-11-09 15:42:18 -05:00
Changming Sun
86968d1351
Merge win-gpu-ci.yml and win-cpu-ci.yml (#13597) 2022-11-09 11:32:39 -08:00
Dmitri Smirnov
bbedf2c4c5
Improve cache locality and perf of DeepGru on CPU (#13582)
### Description
Introduce Gemm weights pre-pack.

### Motivation and Context
A 1-P customer requested a performance improvement for DeepGru, which
consumes the bulk of the CPU time in their model. This provides measurable
performance improvements.

Customer model numbers.

- gru (yuslepukhin/deep_gru_opt): mean = 356 us; 1ms = 99.8 prctile; 99th prctile = 665 ms
- main (where yuslepukhin/deep_gru_opt branched off main): mean = 375 us; 1ms = 99.8 prctile; 99th prctile = 695 ms
- 1.13.1: mean = 391 us; 1ms = 99.6 prctile; 99th prctile = 744 ms
2022-11-09 09:59:38 -08:00
Baiju Meswani
e0361e6256
Change protobuf pin in training requirements (#13596) 2022-11-09 09:37:41 -08:00
Jeff Daily
d5d6924688
rocblas alt impl during backward pass only (#13352)
On AMD Instinct MI200 GPUs, the FP16 and BF16 V_DOT2 and MFMA matrix
instructions flush input and output denormal values to zero. When
training using FP16 precision, some models may fail to converge with
FP16 denorms flushed to zero. The affected instructions are only used by
rocBLAS (GEMM) and MIOpen (convolution) kernels; all other onnxruntime
operations will not encounter this behavior. All other supported AMD
GPUs will not encounter this behavior.

rocBLAS and MIOpen provide alternate implementations for affected FP16
operations. Alternate implementations for BF16 operations are not
provided; BF16 numbers have a larger dynamic range than FP16 numbers and
are less likely to encounter denormal values. For the FP16 alternate
implementations, FP16 input values are cast to an intermediate BF16
value and then cast back to FP16 output after the accumulate FP32
operations. In this way, the input and output types are unchanged.
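
To make the denormal issue concrete, here is a small Python sketch that is independent of rocBLAS and MIOpen. It emulates flush-to-zero for FP16 and shows that the same small value is still a normal number in BF16. The helper names and the sample value are hypothetical; the two constants are the minimum normal values of IEEE 754 binary16 and of bfloat16.

```python
import struct

FP16_MIN_NORMAL = 2.0 ** -14   # smallest positive normal binary16 value
BF16_MIN_NORMAL = 2.0 ** -126  # bfloat16 has the same 8-bit exponent as float32


def to_fp16(x: float) -> float:
    """Round x to the nearest binary16 value, keeping denormals."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def flush_fp16_denormal(x: float) -> float:
    """Emulate an instruction that flushes FP16 denormals to zero."""
    y = to_fp16(x)
    return 0.0 if 0.0 < abs(y) < FP16_MIN_NORMAL else y


small_grad = 2.0 ** -20  # illustrative tiny gradient, a denormal in FP16

kept = to_fp16(small_grad)                  # representable as an FP16 denormal
flushed = flush_fp16_denormal(small_grad)   # lost when denormals are flushed
normal_in_bf16 = small_grad >= BF16_MIN_NORMAL
```

Here `kept` equals `small_grad` while `flushed` is `0.0`, and `normal_in_bf16` is `True`, which matches the statement that BF16 is less likely to run into denormal values.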

Denormal values more frequently occur in the backward pass of training
during gradient calculation. Therefore, it is necessary to track when
the backward pass of training is executing. For the ROCm EP only, the
`__backwardpass` attribute is added to all Nodes after the YieldOp is
detected. This takes place in a level1 graph optimization pass. The
attribute is forwarded to any newly created FusedMatMul Nodes. In
addition, the scope-based helper class `BackwardPassGuard` is provided
to toggle state for rocblas. This behavior of using the alternate
implementations during the backward pass is made automatic with this PR.
This default behavior can be overridden using environment variables,
ROCBLAS_INTERNAL_FP16_ALT_IMPL and
MIOPEN_DEBUG_CONVOLUTION_ATTRIB_FP16_ALT_IMPL. The behavior of these
environment variables is as follows:

|              | forward   | backward  |
|--------------|-----------|-----------|
| Env unset    | original  | alternate |
| Env set to 1 | alternate | alternate |
| Env set to 0 | original  | original  |
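
The table can be read as a small decision function. The following Python sketch encodes only that table; `select_fp16_impl` and its parameters are hypothetical names and are not part of onnxruntime, rocBLAS, or MIOpen.

```python
from typing import Optional


def select_fp16_impl(env_value: Optional[str], is_backward: bool) -> str:
    """Return "original" or "alternate" following the table above.

    env_value is the raw string of the environment variable (for example
    ROCBLAS_INTERNAL_FP16_ALT_IMPL), or None when the variable is unset.
    """
    if env_value == "1":
        return "alternate"   # forced on for forward and backward
    if env_value == "0":
        return "original"    # forced off for forward and backward
    # Unset: alternate implementation in the backward pass only.
    # Assumption: any other value is treated like unset; the table
    # does not cover that case.
    return "alternate" if is_backward else "original"
```

In practice the value would come from something like `os.environ.get("ROCBLAS_INTERNAL_FP16_ALT_IMPL")`, which returns `None` when the variable is unset.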

See also:


https://pytorch.org/docs/stable/notes/numerical_accuracy.html#reduced-precision-fp16-and-bf16-gemms-and-convolutions-on-amd-instinct-mi200-devices
2022-11-10 00:47:06 +08:00
Jian Chen
d10d66cc84
Cjian/c4244 round 1a (#13483)
### Description
Redo the round using gsl::narrow and SafeInt



### Motivation and Context
2022-11-08 23:58:05 -05:00
Patrice Vignola
3482180ec2
DML EP add a registration for Shape and Size (#13442)
### Description
Add a DML registration for Shape to avoid copying back to the CPU just
to get the shape of a GPU tensor.



### Motivation and Context
When using free dimensions, many Transformers models extensively use the
`Shape` operator. This causes hundreds of GPU->CPU copies that should be
completely avoidable. Note that this change also uses the same
heuristics as other providers (e.g. CUDA) to force some tensors on the
CPU in certain situations.

Co-authored-by: Patrice Vignola <pavignol@microsoft.com>
2022-11-08 19:29:37 -08:00
Yi Zhang
a9a9c34d98
Fix WinML Test Case: create LearningModelBinding for every testcase (#13587)
### Description
Fix #13509

### Motivation and Context
The exception was caused by incorrect fetches, which came from the
binding of the previous test case.

efcbdac58e/onnxruntime/core/session/onnxruntime_c_api.cc (L809-L815)
2022-11-09 11:20:48 +08:00
Adrian Lizarraga
281f199754
[EP-Perf-Dashboard] Reduce script excessive output (#13562)
### Description
Properly cleans up all temporary resources created while running
benchmarks.

Details:
- Dump all temporary artifacts (TRT engines, TRT profiles, inference
profiles, fp16 models) into a temp directory in `/tmp/`. Each model/EP
combination has its own temp directory that is deleted after validation
and benchmarking.
- Allow running both validation and benchmarking in one invocation of
the benchmark.py script. This is necessary to allow the benchmarking
step to reuse artifacts (e.g., TRT engines) created during validation.
Before this PR, we ran validation on all model/EP combinations before
running benchmarks on all combinations again. This required us to keep
all temporary artifacts for all model/EP combinations throughout the
entire run (expensive).
- Create individual functions for validation and benchmarking (splitting up the
large function that did it all).
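
The cleanup pattern in the list above can be sketched as follows. This is a minimal illustration, not the actual benchmark.py code: `run_validation_and_benchmark` and the `validate` and `benchmark` callables are hypothetical, and `tempfile.TemporaryDirectory` uses the platform default temp location, which is assumed here to be `/tmp/`.

```python
import os
import tempfile
from typing import Any, Callable, Iterable, List, Tuple


def run_validation_and_benchmark(
    combos: Iterable[Tuple[str, str]],
    validate: Callable[[str, str, str], None],
    benchmark: Callable[[str, str, str], Any],
) -> List[Any]:
    """Validate then benchmark each (model, EP) pair in its own temp dir."""
    results = []
    for model, ep in combos:
        # One directory per model/EP combination. Artifacts written during
        # validation (e.g. engines) stay available for the benchmark step.
        with tempfile.TemporaryDirectory(prefix=f"{model}_{ep}_") as tmp_dir:
            validate(model, ep, tmp_dir)
            results.append(benchmark(model, ep, tmp_dir))
        # Leaving the `with` block deletes the directory and its contents,
        # so disk usage is bounded by a single combination at a time.
    return results
```

Running both steps inside the same `with` block is what lets the benchmark reuse validation artifacts without keeping every combination's files for the whole run.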

### Motivation and Context
The EP Perf pipeline failed to run because the script generated too much
output and the VM ran out of disk space.
2022-11-08 16:17:29 -08:00