Commit graph

8044 commits

Author SHA1 Message Date
Nat Kershaw (MSFT)
abaed6f474
Add link to Python API examples (#14345) 2023-01-21 16:23:16 -08:00
Tianlei Wu
a95fcb4345
UNet fusion and fp16 conversion for stable diffusion (#14248)
Add script to fuse nodes to optimized operators in stable diffusion 1.5
models, and a script to convert fp32 models to fp16 models. Tested with
stable diffusion 1.5.

Note that the optimized model needs onnxruntime-gpu v1.14 (release candidate
will be available soon).

Note: We will update the script to work with latest diffusers and stable
diffusion v2 and v2.1 models.
2023-01-21 10:16:44 -08:00
Nat Kershaw (MSFT)
e57c312f9d
Pin sphinx to avoid broken link (#14383) 2023-01-21 09:50:56 -08:00
Yi Zhang
cf3661ff6d
Revert "Allow PostAnalysis@2 task to continue on error for Windows_Pa… (#14375)
…ckaging_CPU_x86_default (#14332)"

This reverts commit a491f33f54.

### Description


### Motivation and Context
It looks like an ADO issue.
It has now recovered,
so this can be re-enabled.
2023-01-21 09:32:39 +08:00
Nat Kershaw (MSFT)
0d40119624
Fix broken link (#14368)
Fixes #11661
2023-01-20 15:55:03 -08:00
Ye Wang
de7a868d5f
Update quantization_defs.cc (#14380)
2023-01-20 15:03:50 -08:00
Hariharan Seshadri
2d8ee5251c
Misc transformer fixes - 3 (#14320) 2023-01-20 13:57:57 -08:00
kunal-vaishnavi
72821a6113
Add PyTorch 2.0 to ORT transformer benchmarking (#14300)
### Description
This PR adds PyTorch 2.0 as an option when running the ORT transformer
benchmarking script.


### Motivation and Context
PyTorch released [PyTorch
2.0](https://pytorch.org/get-started/pytorch-2.0/) in the nightly
binaries and a stable release of PyTorch 2.0 is expected in March 2023.
2023-01-20 12:50:53 -08:00
Tianlei Wu
414b012f42
Add memory efficient attention from CUTLASS (#14343)
### Description
Add memory efficient attention from CUTLASS.

TODO (in next pull request): 
(1) Need performance tests on different GPUs, then add a sequence length
threshold (only activate it for long sequence length).
(2) Merge changes from https://github.com/NVIDIA/cutlass/pull/773 when
it is in cutlass master.
2023-01-20 12:33:01 -08:00
Zhang Lei
e64f357ad4
Fix some problems found by PREfast checking. (#14342)
Fix: BUG 8989, BUG 9014
2023-01-20 11:04:52 -08:00
Edward Chen
3b382ea7e1
Free OrtStatus in ASSERT_ORT_STATUS_OK, make run_android_emulator.py work with newer JDK version (#14369)
- Free OrtStatus in ASSERT_ORT_STATUS_OK in model_tests.cc
- Make run_android_emulator.py work with newer JDK version
2023-01-20 09:27:47 -08:00
cao lei
22fdc31667
remove unnecessary waitOnEPStep when current node and the consumer node are in the same stream (#14173)
### Description
Remove the unnecessary WaitOnEPStep if the current operator node and its
consumer are in the same stream while there are notifications filed in
the current node



### Motivation and Context
In the current code, the WaitOnEPStep is always launched as long as
a notification is filed in the input node, regardless of whether the
current node and the input node are in the same stream, which is
unnecessary.
This PR removes the WaitOnEPStep in that case.
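
The condition described above can be sketched as follows. This is a minimal illustration, not the ORT implementation: `NodeInfo`, `stream_id`, `has_notification`, and `NeedsWaitStep` are hypothetical names, and the sketch assumes that work queued on a single stream already executes in order.

```cpp
#include <cstddef>

// Hypothetical, simplified view of a node in the execution plan.
struct NodeInfo {
  std::size_t stream_id;  // stream the node is assigned to (assumed field)
  bool has_notification;  // whether the node files a notification after it runs
};

// Returns true only when the consumer must wait on the producer's notification.
// Deciding on the notification alone would also wait in the same-stream case.
inline bool NeedsWaitStep(const NodeInfo& producer, const NodeInfo& consumer) {
  if (!producer.has_notification) return false;
  // Same stream: the work is already ordered, so an explicit wait is redundant.
  return producer.stream_id != consumer.stream_id;
}
```

With this check, a producer and consumer on the same stream skip the wait even when a notification exists, while a cross-stream pair still waits.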

Co-authored-by: Lei Cao <leca@microsoft.com>
2023-01-20 07:35:15 -08:00
Kyushick Lee
cd24f0794a
Extend ort_backend.py for another ep (#14349)
### Description

This PR extends OrtBackend to allow configuring an EP by name, and
falls back to the existing mechanism, which infers the EP from tensor
affinity, if nothing is provided.

### Motivation and Context

Currently OrtBackend needs `get_ort_device()` with the device tag
inferred from torch.Tensor, but the ort device is not yet supported for
dort. The change allows running dort with a supported EP by configuring
dort with a desired EP and letting dort (the ort InferenceSession) take
CPU-affined PyTorch Tensors as inputs and then inject data transfer nodes
internally.
2023-01-20 07:30:00 -08:00
Yi Zhang
3d6cea14f4
Remove intermediate obj files once build finished (#14361)
### Description
Remove intermediate obj files and re-enable the cache.

### Motivation and Context
Recently, the training_debug_x64 pipeline often failed because it ran out of
disk space.
Deleting the obj files frees nearly 8 GB of space,
so the compilation cache can be re-enabled.
2023-01-20 13:37:15 +08:00
Ye Wang
668586e8f8
Support muP in Attention (#14348)
Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-01-19 20:36:55 -08:00
Tianlei Wu
1dd07d147d
fix windows build error (#14362)
### Description
Fix https://github.com/microsoft/onnxruntime/issues/14359

test\greedy_search_top_one.cc(21,44): warning C4244: '=': conversion from 'int32_t' to '_Ty', possible loss of data [C:\Users\11000978\onnxruntime\build\Windows\Debug\onnxruntime_providers_cuda.vcxproj]



2023-01-19 18:20:46 -08:00
Wei-Sheng Chin
432a9912a3
Fix LORT CI failure due to PyTorch change (#14367)
As title. The fuser in LORT doesn't like "scalar". With a recent PyTorch
change, a scalar is introduced somewhere it was not there before. For now, a
simple fix is to check that all inputs are tensors, or some specially
allowed cases, before sending ops to ORT.
2023-01-19 16:02:40 -08:00
RandySheriffH
36ba3d8d21
Exclude a multi-stream case from reduced ops build (#14351)
Exclude a multi-stream case from reduced ops build to unblock
[pipeline](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=120&_a=summary).

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-19 14:39:25 -08:00
liqun Fu
5d6a049141
support ScatterND(18) and ScatterElement(18) (#14224) 2023-01-19 13:54:20 -08:00
Ye Wang
d2c3d8eb38
Add Bert/GPT2 fusion change for new attribute mask_filter_value in ORT optimizer (#14333)
### Description
These changes correspond to specifying mask_filter_value as an attention
attribute. However, the ORT optimizer cannot fuse
SkipLayerNorm/Attention/EmbedLayerNorm with the most recent
transformers, so this PR may only address the issue for some older
versions of ONNX models (e.g., the one used in the unit test).

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-01-19 12:52:09 -08:00
Edward Chen
ae0e090c7b
Fix post merge jobs pipeline build issues (#14346)
- Fix debug node inputs outputs nullptr dereference with ONNX optional types.
- Fix model test memory leak.
- Convert jobs to stages in post-merge-jobs.yml to allow a subset of builds to be enabled when running manually.
- Fix buffer overrun in CumSum op exposed by Mimalloc build.
2023-01-19 11:16:42 -08:00
Ashwini Khade
ea7bbd667d
fix headers for training apis (#14350)
### Description
Minor refactor PR for fixing header placement for training apis
2023-01-19 10:26:53 -08:00
Yi Zhang
b51415b0ea
disable cache for training_x64_debug (#14358)
### Description
disable cache to save disk space for training_x64_debug


### Motivation and Context
To mitigate the lack of disk space in training_x64_debug as a first step.
2023-01-19 15:08:34 +08:00
Chi Lo
80d61989e9
Unit test modification for TensorRT EP (#14339)
Two modifications:

- After [TRT 8.5](https://github.com/microsoft/onnxruntime/pull/13867)
was merged, we could manually set a timeout and make the TRT EP run only a
small portion of the unit tests
(`onnxruntime_SKIP_AND_PERFORM_FILTERED_TENSORRT_TESTS=ON`), because the
additional TRT kernel overhead introduced by TRT 8.5 increases test time
considerably. This PR modifies the checking condition so that the
TensorRT CIs (which can enable the builder placeholder) still run most of the
unit tests.
- Exclude the TRT EP from the [Resize Opset
18](https://github.com/microsoft/onnxruntime/pull/13890) unit tests,
since TensorRT 8.5 supports operators only up to Opset 17.
2023-01-18 21:30:19 -08:00
Adrian Lizarraga
a491f33f54
Allow PostAnalysis@2 task to continue on error for Windows_Packaging_CPU_x86_default (#14332)
### Description
Allows the PostAnalysis@2 task for windows CI jobs to continue even if
an error is encountered.


### Motivation and Context
This is a temporary workaround that enables the
`Windows_Packaging_CPU_x86_default` job within the Zip-Nuget-Java-NodeJS
packaging pipeline to finish. A recent update to dotnet 6 has broken the
PostAnalysis task for this job.

This task was originally added by
https://github.com/microsoft/onnxruntime/pull/13694
2023-01-18 19:54:48 -08:00
Edward Chen
20e164786e
[objc] Fix parameter name in documentation. (#14330)
Fix mismatch between documented and actual parameter name.
2023-01-18 16:54:59 -08:00
Rui Ren
904e63633a
increase the time limit as more unit tests added (#14327)
### Description
Pipeline failed because we added more unit tests, reference:
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=863643&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=305229be-e8ba-5189-ca61-fcb77d866478

Now we have: [2430 tests](
https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=863619&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=4efd38bc-b0da-5f98-81a8-ea2885f78448&l=43853)
Previously we had: [2422
tests](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=859543&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=4efd38bc-b0da-5f98-81a8-ea2885f78448&l=43640)

- Timeout error as we have 2 hour threshold
```
jobs:
- job: Linux_Build
  timeoutInMinutes: 120
  variables:
    skipComponentGovernanceDetection: true
```

### Motivation and Context

- Increase the timeoutInMinutes to `150`
2023-01-18 15:51:21 -08:00
Rui Ren
c4e693c4b7
update gsl-lite license (#14318)
### Description
- Update gsl-lite license with MS GSL's License




### Motivation and Context
- Work Item:
https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/10175
- Release ORT 1.14.0
2023-01-18 15:49:13 -08:00
dependabot[bot]
3c695f78fe
Bump electron from 15.5.5 to 18.3.7 in /js/web (#13617)
Bumps [electron](https://github.com/electron/electron) from 15.5.5 to
18.3.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/electron/electron/releases">electron's
releases</a>.</em></p>
<blockquote>
<h2>electron v18.3.7</h2>
<h1>Release Notes for v18.3.7</h1>
<h2>Fixes</h2>
<ul>
<li>Fixed WCO not responding to touch events on windows. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35177">#35177</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35176">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/35174">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed <code>webContents.getUserAgent()</code> incorrectly returning
an empty string unless previously set. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35130">#35130</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35151">17</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/35132">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/35131">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue in which calling setBounds() after e.preventDefault
in a 'will-move' or 'will-resize' event wouldn't change the window's
shape until the mouse button was released. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35082">#35082</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35083">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/35084">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed context menu not showing all items on macOS when dock is not
hidden. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35198">#35198</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35199">19</a>)<!--
raw HTML omitted --></li>
<li>None. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35171">#35171</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35172">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/35173">20</a>)<!--
raw HTML omitted --></li>
</ul>
<h2>Other Changes</h2>
<ul>
<li>Fixed page size always being restricted to 4k on Linux arm64. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35184">#35184</a></li>
<li>Security: backported fix for CVE-2022-2478. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35099">#35099</a></li>
<li>Security: backported fix for chromium:1334864. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35097">#35097</a></li>
</ul>
<h2>electron v18.3.6</h2>
<h1>Release Notes for v18.3.6</h1>
<h2>Fixes</h2>
<ul>
<li>Fixed a crash when calling <code>BrowserWindow.setEnabled()</code>.
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34973">#34973</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34971">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34972">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed a potential crash when changing window settings after
initializing WCO with an invalid <code>titleBarStyle</code>. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34873">#34873</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35031">17</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34874">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34875">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed alwaysOnTop BrowserWindow option for X11 Linux. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34911">#34911</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34912">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34913">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue where BrowserWindows on macOS were incorrectly marked
as resizable. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34907">#34907</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34906">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34433">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue where Windows Control Overlay buttons did not respect
maximizable/minimizable/closable states of a BrowserWindow. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34720">#34720</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34733">17</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34722">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34721">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue where calling
<code>BrowserWindow.setRepresentedFilename</code> on macOS with
<code>titlebarStyle: 'hiddenInset'</code> or <code>titlebarStyle:
'hidden'</code> inadvertently moves the traffic light location. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34847">#34847</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34848">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34849">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue where some <code>BrowserWindow</code>s opened from
new links wouldn't properly load URLs. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34910">#34910</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34189">19</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue where the minimize button with WCO enabled would
incorrectly be highlighted in some cases. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34838">#34838</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34837">17</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34839">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34840">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed an issue with background colors being improperly applied to
<code>BrowserView</code>s on Windows. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/33478">#33478</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/33546">16</a>)<!--
raw HTML omitted --></li>
<li>Fixed empty app_id when running under wayland. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34877">#34877</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34878">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34879">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed missing Sec-CH-UA headers and empty navigator.userAgentData.
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34758">#34758</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34760">17</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34757">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/34524">20</a>)<!--
raw HTML omitted --></li>
<li>Fixed symbol generation on 32-bit Windows release builds. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35096">#35096</a>
<!-- raw HTML omitted -->(Also in <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35090">19</a>,
<a
href="https://github-redirect.dependabot.com/electron/electron/pull/35091">20</a>)<!--
raw HTML omitted --></li>
<li>Prevent brief display of &quot;Ozone X11&quot; in window title on
Linux. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34943">#34943</a></li>
</ul>
<h2>Other Changes</h2>
<ul>
<li>Backported fix for CVE-2022-2294. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34882">#34882</a></li>
<li>Security: backported fix for 1287804. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35102">#35102</a></li>
<li>Security: backported fix for 1333333. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34689">#34689</a></li>
<li>Security: backported fix for 1335054. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34687">#34687</a></li>
<li>Security: backported fix for 1335458. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34685">#34685</a></li>
<li>Security: backported fix for 1336014. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35004">#35004</a></li>
<li>Security: backported fix for 1339844. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35002">#35002</a></li>
<li>Security: backported fix for 1340335. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/35000">#35000</a></li>
<li>Security: backported fix for 1340654. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34998">#34998</a></li>
<li>Security: backported fix for CVE-2022-2162. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34714">#34714</a></li>
<li>Security: backported fix for CVE-2022-2295. <a
href="https://github-redirect.dependabot.com/electron/electron/pull/34881">#34881</a></li>
</ul>
<h2>electron v18.3.5</h2>
<h1>Release Notes for v18.3.5</h1>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="dee6e01e9e"><code>dee6e01</code></a>
Bump v18.3.7</li>
<li><a
href="483e39cc74"><code>483e39c</code></a>
chore: cherry-pick 97193a64b431 from chromium (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35184">#35184</a>)</li>
<li><a
href="cd7490d233"><code>cd7490d</code></a>
fix: consider dock space when showing menu (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35198">#35198</a>)</li>
<li><a
href="b990bd6c97"><code>b990bd6</code></a>
fix: allow setsize to be called within a move or resize for
preventDefault (#...</li>
<li><a
href="56a0b45ef2"><code>56a0b45</code></a>
fix: modify file extension generation on Windows (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35171">#35171</a>)</li>
<li><a
href="5871f81bb9"><code>5871f81</code></a>
fix: touch events not recognized by WCO on windows (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35117">#35117</a>)
(<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35177">#35177</a>)</li>
<li><a
href="511f27506f"><code>511f275</code></a>
ci: turn off windows on arm test result comments (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35167">#35167</a>)</li>
<li><a
href="8189ee64b9"><code>8189ee6</code></a>
chore: add electron deps to //src gitignore (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35148">#35148</a>)</li>
<li><a
href="cc52f07023"><code>cc52f07</code></a>
ci: switch to GHA for WOA (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35127">#35127</a>)</li>
<li><a
href="890adefb95"><code>890adef</code></a>
docs: new main -&gt; renderers messageChannel example (<a
href="https://github-redirect.dependabot.com/electron/electron/issues/35133">#35133</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/electron/electron/compare/v15.5.5...v18.3.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=electron&package-manager=npm_and_yarn&previous-version=15.5.5&new-version=18.3.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
Dependabot will merge this PR once CI passes on it, as requested by
@fs-eire.

[//]: # (dependabot-automerge-end)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-01-18 14:58:09 -08:00
Adrian Lizarraga
de17d53c50
Custom Op runtime wrapper (#13427)
### Description

Adds the below C APIs to support custom ops that wrap an entire model to
be inferenced with an external runtime. The current SNPE EP is an
example of an EP that could be ported to use a custom op wrapper. Ex:
The custom op stores the serialized SNPE DLC binary as a string
attribute. The SNPE model is built when the kernel is created. The model
is inferenced with SNPE APIs on call to the kernel's compute method.

#### C APIs
| API | Description | Why |
| --- | --- | --- |
| `KernelInfo_GetInputCount` | Gets number of inputs from `OrtKernelInfo`. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetOutputCount` | Gets number of outputs from `OrtKernelInfo`. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetInputName` | Gets an input's name. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetOutputName` | Gets an output's name. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetInputTypeInfo` | Gets the type/shape information for an input. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfo_GetOutputTypeInfo` | Gets the type/shape information for an output. | Query I/O characteristics during kernel creation<sup>1</sup> |
| `KernelInfoGetAttribute_tensor` | Get a OrtValue tensor stored as an attribute in the graph node | Extract serialized models, weights, etc. |
| `GetSessionConfigEntry` | Get a session configuration value | Need to be able to get session-time configurations from within custom op |
| `HasSessionConfigEntry` | Check if session configuration entry exists. | Need to be able to get session-time configurations from within custom op |

#### Why so many KernelInfo APIs?<sup>1</sup>
Similar APIs currently exist for `OrtKernelContext`, but not
`OrtKernelInfo`. Note that `OrtKernelContext` is passed to the custom op
on call to its kernel's compute() function. However, `OrtKernelInfo` is
available on kernel creation, which occurs when the session is created.
Having these APIs available from `OrtKernelInfo` allows an operator to
trade off compute time against session-creation time, and vice versa.
Operators that must build expensive state may prefer to do it at
session-creation time instead of at compute time.
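
To make the trade-off concrete, here is a minimal sketch that uses stand-in types rather than the ORT API. `IoInfo`, `ExternalModel`, and `WrapperKernel` are hypothetical names: the constructor plays the role of kernel creation (where `OrtKernelInfo` is available), and `Compute()` plays the role of the kernel's compute function.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for the I/O details a kernel could query at creation time.
struct IoInfo {
  std::vector<std::string> input_names;
  std::vector<std::string> output_names;
};

// Hypothetical stand-in for an external runtime model that is expensive to build.
struct ExternalModel {
  std::size_t num_inputs;
  std::size_t num_outputs;
};

class WrapperKernel {
 public:
  // Runs once, when the session is created: the build cost is paid here.
  explicit WrapperKernel(const IoInfo& info)
      : model_{info.input_names.size(), info.output_names.size()} {
    ++build_count_;
  }

  // Runs on every inference call: reuses the prebuilt model and never rebuilds it.
  std::size_t Compute() const { return model_.num_inputs + model_.num_outputs; }

  int build_count() const { return build_count_; }

 private:
  ExternalModel model_;
  int build_count_ = 0;
};
```

Building the model in the constructor makes session creation slower but keeps each `Compute()` call cheap; a kernel that could only see I/O details at compute time would have to build on the first call instead.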

SNPE is an example of an EP that needs to be able to query `KernelInfo`
for the name, type, and shape of inputs and outputs in order to build
the model from the serialized DLC data. This is an expensive operation.
Other providers (e.g., OpenVINO) are able to query i/o info from the
serialized model, so they do not strictly need these APIs. However, the
APIs can still be used to validate the expected I/O characteristics.

Additionally, several of our CPU contrib ops currently use the same
internal version of these KernelInfo APIs (Ex:
[qlinear_softmax](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/contrib_ops/cpu/quantization/qlinear_softmax.cc#L71)).
If custom ops are also meant to be a test bed for future ops, then all
custom ops (not just runtime wrappers) would benefit from the addition
of these public KernelInfo APIs (IMO).

#### Example of usage in a custom OP
From
`onnxruntime/test/testdata/custom_op_openvino_wrapper_library/openvino_wrapper.h`

```c++
struct CustomOpOpenVINO : Ort::CustomOpBase<CustomOpOpenVINO, KernelOpenVINO> {
  explicit CustomOpOpenVINO(Ort::ConstSessionOptions session_options);

  CustomOpOpenVINO(const CustomOpOpenVINO&) = delete;
  CustomOpOpenVINO& operator=(const CustomOpOpenVINO&) = delete;

  void* CreateKernel(const OrtApi& api, const OrtKernelInfo* info) const;

  constexpr const char* GetName() const noexcept {
    return "OpenVINO_Wrapper";
  }

  constexpr const char* GetExecutionProviderType() const noexcept {
    return "CPUExecutionProvider";
  }

  // IMPORTANT: In order to wrap a generic runtime-specific model, the custom operator
  // must have a non-homogeneous variadic input and output.

  constexpr size_t GetInputTypeCount() const noexcept {
    return 1;
  }

  constexpr size_t GetOutputTypeCount() const noexcept {
    return 1;
  }

  constexpr ONNXTensorElementDataType GetInputType(size_t /* index */) const noexcept {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED;
  }

  constexpr ONNXTensorElementDataType GetOutputType(size_t /* index */) const noexcept {
    return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED;
  }

  constexpr OrtCustomOpInputOutputCharacteristic GetInputCharacteristic(size_t /* index */) const noexcept {
    return INPUT_OUTPUT_VARIADIC;
  }

  constexpr OrtCustomOpInputOutputCharacteristic GetOutputCharacteristic(size_t /* index */) const noexcept {
    return INPUT_OUTPUT_VARIADIC;
  }

  constexpr bool GetVariadicInputHomogeneity() const noexcept {
    return false;  // heterogeneous
  }

  constexpr bool GetVariadicOutputHomogeneity() const noexcept {
    return false;  // heterogeneous
  }

  std::vector<std::string> GetSessionConfigKeys() const { return {"device_type"}; }

 private:
  std::unordered_map<std::string, std::string> session_configs_;
};
```

#### How to create a session:
```c++
Ort::Env env;
Ort::SessionOptions session_opts;
Ort::CustomOpConfigs custom_op_configs;

// Create local session config entries for the custom op.
custom_op_configs.AddConfig("OpenVINO_Wrapper", "device_type", "CPU");

// Register custom op library and pass in the custom op configs (optional).
session_opts.RegisterCustomOpsLibrary(lib_name, custom_op_configs);

Ort::Session session(env, model_path.data(), session_opts);
```
### Motivation and Context
Allows creation of simple "wrapper" EPs outside of the main ORT code
base.
2023-01-18 09:09:32 -08:00
Tang, Cheng
734ae398ee
Fix a security warning in cuda gemm int8 kernel (#14335)
### Description
fix a security warning in GemmInt8 cuda kernel

### Motivation and Context
it is for issue:
https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/11158/

Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-01-18 09:00:44 -08:00
Yi Zhang
8236808e89
add opset18 node test (#14236)
### Description
add opset18 into download data list

### Motivation and Context
Ref:
https://github.com/onnx/onnx/releases/tag/v1.13.0
2023-01-19 00:56:57 +08:00
Scott McKay
dab900dfa0
Fix type mismatch when ORT_ENABLE_STREAM is off (#14324)
### Description
PartitionIntoStreams was incorrectly using std::string instead of
PathString for the config file argument when ORT_ENABLE_STREAM was not
defined.
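The mismatch can be illustrated with a minimal sketch. It assumes `PathString` is a platform-dependent string alias (wide characters on Windows, narrow characters elsewhere); the alias below is re-declared for illustration only, and `PathChar`, `ToPathString` and `HasConfigFile` are hypothetical names, not ORT functions.

```c++
#include <cassert>
#include <string>

// Illustrative re-definition (assumption, not the ORT header):
// wide characters on Windows, narrow characters elsewhere.
#ifdef _WIN32
using PathChar = wchar_t;
#else
using PathChar = char;
#endif
using PathString = std::basic_string<PathChar>;

// Hypothetical helper: widens an ASCII-only std::string into a PathString.
PathString ToPathString(const std::string& ascii) {
  for (unsigned char c : ascii) {
    assert(c < 0x80 && "this sketch handles ASCII only");
  }
  return PathString(ascii.begin(), ascii.end());
}

// Hypothetical consumer that takes a PathString, as the fixed signature does.
// Passing a std::string here would fail to compile on Windows.
bool HasConfigFile(const PathString& config_file) {
  return !config_file.empty();
}
```

On non-Windows platforms both types are the same, which is why a mismatch of this kind only shows up as a Windows build error.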

Also incorporate changes from #14291 to fix build and test issues.

### Motivation and Context
Fix build error on Windows due to mismatched type.
2023-01-18 13:45:00 +10:00
Dwayne Robinson
f6d0598b4d
DML EP return clearer error message when users attempt to use software adapter (#14273)
### Description
The DML EP provider factory verifies the adapter id is a real GPU (not
some software emulation like WARP which would be quite slow or basic
display driver which lacks D3D compute ability), but the automated tests
sometimes erratically get run on a variety of ADO cloud machines that
lack a GPU or are in a bad state such that Windows fell back to software
emulation. In such cases, you end up reaching the `!IsSoftwareAdapter`
check in the provider factory ([line
132](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/dml/dml_provider_factory.cc#L132))
and seeing in the pipeline logs E_INVALIDARG. Let's return a more
immediately enlightening error code like
ERROR_GRAPHICS_INVALID_DISPLAY_ADAPTER rather than just E_INVALIDARG.

### Motivation and Context
- *Why is this change required? What problem does it solve* Pipeline
noise.
- *If it fixes an open issue, please link to the issue here.* NA.
2023-01-17 18:03:02 -08:00
Tianlei Wu
477cad3051
[CUDA] Add trt cross attention kernels (#14328)
Add TRT cross attention kernels for stable diffusion optimization.
2023-01-17 17:55:45 -08:00
Zhang Lei
a8df6c35f8
Support flash attention on 2d attention mask for gpt2 left padding. (#14215) 2023-01-17 16:45:29 -08:00
Adrian Lizarraga
30b9f5dde1
Clean up TensorRT deprecations, warnings, unbounded string copy (#14148)
### Description
- Updates deprecated use of `nvinfer1::___::destroy()` by using a
`std::unique_ptr<>` instead of our own smart pointer that calls
`destroy`. See [TensorRT deprecation
list](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/deprecated.html#:~:text=Deprecated%20prior%20to%20TensorRT%208.0%20and%20will%20be,noexcept%20Use%20addMatrixMultiply%20instead.%20Deprecated%20in%20TensorRT%208.4.)
and search for `destroy`.
- Fixes warnings regarding uninitialized member variables.
- Fixes bugs in TensorRT model ID generation:
  - Potential segfault when model path only has a root component.
  - Unbounded string copy for non-Windows builds.
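The first item can be sketched without the TensorRT headers. This is a minimal illustration, assuming the objects can be released through their destructor so that a plain `std::unique_ptr<>` is enough; `FakeTrtObject` and `LiveObjectsAfterScope` are hypothetical stand-ins, not TensorRT or ORT types.

```c++
#include <cassert>
#include <memory>

// Hypothetical stand-in for an object that previously needed destroy().
// It tracks how many instances are alive through an external counter.
struct FakeTrtObject {
  explicit FakeTrtObject(int* live_count) : live_count_(live_count) { ++*live_count_; }
  ~FakeTrtObject() { --*live_count_; }
  int* live_count_;
};

// Creates an object owned by std::unique_ptr and returns the number of
// live objects after the owning scope ends.
int LiveObjectsAfterScope() {
  int live = 0;
  {
    std::unique_ptr<FakeTrtObject> obj = std::make_unique<FakeTrtObject>(&live);
    assert(obj != nullptr);
    assert(live == 1);
  }  // unique_ptr runs the destructor here; no custom deleter is involved.
  return live;
}
```

`LiveObjectsAfterScope()` returns 0, showing that ownership is released without a hand-written smart pointer.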


### Motivation and Context
Clean up
2023-01-17 15:55:56 -08:00
Guenther Schmuelling
60290393f3
enable ort-extensions in wasm release builds (#14239)
Enable ort-extensions in wasm release builds: SentencePiece, GPT2, BERT
and WordPiece tokenizers for now.

wasm size will grow from 8.4MB to 8.9MB.
2023-01-17 12:39:13 -08:00
Ye Wang
2db57a53a3
Add mask_filter in Attention related ops' attribute (#14274)
### Description


### Motivation and Context

https://github.com/microsoft/onnxruntime/issues/12843

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-01-17 12:28:11 -08:00
zhijiang
caa5900508
delete unused local typedef VK to fix pipeline error (#14322)
Fix the error:
"/onnxruntime_src/onnxruntime/core/providers/cuda/test/greedy_search_top_one.cc:34:9:
error: typedef ‘using VK = struct std::pair<float, int>’ locally defined
but not used [-Werror=unused-local-typedefs]
34 | using VK = std::pair<float, int32_t>"
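As a hedged illustration of this class of warning, the sketch below shows a function where a local alias would be unused. `TopOne` is a hypothetical function written for this example, not the code in greedy_search_top_one.cc.

```c++
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical example: returns the best score and its index.
std::pair<float, int32_t> TopOne(const std::vector<float>& scores) {
  assert(!scores.empty());
  // A declaration such as
  //   using VK = std::pair<float, int32_t>;
  // placed here and never referenced is what GCC reports under
  // -Wunused-local-typedefs; with -Werror it becomes a build error.
  // Deleting the unused alias, as this sketch does, avoids the warning.
  std::pair<float, int32_t> best{scores[0], 0};
  for (int32_t i = 1; i < static_cast<int32_t>(scores.size()); ++i) {
    if (scores[i] > best.first) {
      best = {scores[i], i};
    }
  }
  return best;
}
```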
2023-01-17 12:27:00 -08:00
Adrian Lizarraga
19b4d9d41e
Fix murmurhash3 inclusion in TensorRT shared library (#14221)
### Description
Updates TensorRT and CANN EPs to use murmurhash3 from core/framework via
provider bridge.



### Motivation and Context
A failure in a packaging pipeline required us to temporarily duplicate
murmurhash3 code for the TensorRT EP. This PR removes the duplicate
code. This is what is happening:

The original version of this code conditionally included a murmurhash
function for TensorRT only (not cuda) in the provider bridge. The
packaging pipeline selectively [copies binaries from two separate
builds](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/linux/extract_and_bundle_gpu_package.sh)
(a cuda-only build and a tensorrt build) into a single libs directory.
These are the files within the resulting libs directory:
- onnxruntime.so (copied from tensorrt build, implements murmurhash in
provider bridge host)
- onnxruntime_providers_shared.so (copied from tensorrt build)
- onnxruntime_providers_tensorrt.so (copied from tensorrt build)
- onnxruntime_providers_cuda.so (copied from **cuda-only build**,
expects a provider host w/o murmurhash)

The [squeezenet
example](https://github.com/microsoft/onnxruntime-inference-examples/tree/main/c_cxx/squeezenet)
crashed when onnxruntime_providers_cuda.so was loaded because the cuda
ep tried to call functions from a `ProviderHost` object that did not
match what was actually implemented by onnxruntime.so.

I've confirmed that we _can_ prevent the crash by modifying the pipeline
to use the onnxruntime_providers_cuda.so file from the tensorrt build
(instead of the file from the cuda-only build). However, I don't think
that is necessarily correct. Instead, I think we should try to make sure
that the provider bridge exposes the same interface to any EP libraries
that can potentially coexist in the same application (like cuda and
tensorrt). Failing that, there's probably something we can do to
generate a better error message when an EP detects that the Provider
Host implements an unexpected interface.

Note that the above applies to the Windows build in the packaging
pipeline as well. I used the onnxruntime branch
[adrianl/test-trt-cuda-bridge-packaging-pipeline](https://github.com/microsoft/onnxruntime/tree/adrianl/test-trt-cuda-bridge-packaging-pipeline)
along with the onnxruntime-inference-examples branch
[adrianl/squeezenet_ld_debug](https://github.com/microsoft/onnxruntime-inference-examples/tree/adrianl/squeezenet_ld_debug)
to test that copying the onnxruntime_providers_cuda.so file from the
tensorrt build gets rid of the crash.
2023-01-17 11:18:49 -08:00
Yufeng Li
c99cd06b10
fix transformer model unit tests (#14319)
For the following failures, the convert_to_onnx folder must be specified
for the import when running from source:
FAILED
test_gpt2_to_onnx.py::TestGpt2ConvertToOnnx::test_auto_mixed_precision
FAILED test_gpt2_to_onnx.py::TestGpt2ConvertToOnnx::test_stage1 -
TypeError: ...
FAILED test_gpt2_to_onnx.py::TestGpt2ConvertToOnnx::test_stage2 -
TypeError: ...

For the failure below, SkipLayerNormal is fused:
FAILED
test_optimizer.py::TestModelOptimization::test_huggingface_openaigpt_fusion
2023-01-17 10:34:56 -08:00
Yi Zhang
909d7f4be5
Skip some tests to pass orttraining-gpu with TRT8.5 (#14314)
### Description
Skip 3 tests.


### Motivation and Context
These 3 tests failed in orttraining-linux-gpu-ci with TRT8.5 image.
2023-01-17 14:41:55 +08:00
kailums
db69079312
fix roctracer missing nccl operation bug (#14277)
### Description
This change fixes a bug where NCCL operations could not be traced when
running ORT with NCCL collective operations on AMD.


### Motivation and Context
NCCL operations were missing from roctracer because roctracer uses a
whitelist of APIs that can be traced, and NCCL uses the
hipExtLaunchKernel API, which is not included in the whitelist. This fix
adds hipExtLaunchKernel to the whitelist so that NCCL operations can be
traced.
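The whitelist behavior described above can be sketched as a simple set lookup. This is a minimal illustration under stated assumptions: `ApiAllowlist`, `Allow` and `IsTraced` are hypothetical names and do not reflect how roctracer or ORT store the list.

```c++
#include <cassert>
#include <string>
#include <unordered_set>

// Hypothetical allowlist of API names that a tracer records.
class ApiAllowlist {
 public:
  // Adds an API name to the set of traced APIs.
  void Allow(const std::string& api_name) {
    assert(!api_name.empty());
    allowed_.insert(api_name);
  }

  // Returns true only for APIs that were explicitly allowed.
  bool IsTraced(const std::string& api_name) const {
    return allowed_.count(api_name) > 0;
  }

 private:
  std::unordered_set<std::string> allowed_;
};
```

With only `hipLaunchKernel` allowed, `IsTraced("hipExtLaunchKernel")` is false, so kernels launched through that API are silently dropped from the trace until the name is added.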
2023-01-17 11:14:05 +08:00
Jian Chen
d95249f516
Removing Double QDQ from Graphs (#14024)
### Description
When two QDQ pairs are back to back, we want to delete one Q node and
one DQ node.
Example:
Q->DQ->Q->DQ  =====> Q->DQ
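The rewrite can be sketched on a linear chain of op names. This is a simplified illustration, not the graph transformer from this PR: `CollapseDoubleQdq` is a hypothetical helper, it assumes the inner DQ and Q are the nodes dropped, and it ignores the scale and zero-point checks a real transformer would need.

```c++
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collapses back-to-back Q->DQ pairs in a linear
// chain of op types by dropping the inner DQ and Q (an assumption).
std::vector<std::string> CollapseDoubleQdq(const std::vector<std::string>& ops) {
  std::vector<std::string> out;
  for (const auto& op : ops) {
    out.push_back(op);
    const std::size_t n = out.size();
    if (n >= 4 && out[n - 4] == "Q" && out[n - 3] == "DQ" &&
        out[n - 2] == "Q" && out[n - 1] == "DQ") {
      // Erase the inner DQ and Q, leaving a single Q->DQ pair.
      out.erase(out.begin() + (n - 3), out.begin() + (n - 1));
      assert(out.size() == n - 2);
    }
  }
  return out;
}
```

Because the check runs after every appended op, longer runs such as Q->DQ->Q->DQ->Q->DQ also reduce to a single Q->DQ.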



### Motivation and Context
2023-01-16 19:06:57 -08:00
Yi Zhang
fb801d58b1
Add Cache in Linux CPU Aten Pipeline (#14313)
### Description
Add compilation cache in Linux CPU Aten Pipeline.
The pipeline could be completed in 6 minutes at best.

### Motivation and Context
1. Accelerate the pipeline.
2. It's the shortest pipeline that uses a docker image. I'll use it to try
moving the storage of the Linux docker image from ACR to the ADO pipeline cache.
2023-01-17 10:49:29 +08:00
xkszltl
3a9f30df46
Compatibility patch for nlohmann/json < 3.9.0. (#12394)
This is required on CentOS 7 if using distro-provided json-devel 3.6.1.

Regression introduced in:
- https://github.com/microsoft/onnxruntime/pull/11775

Related upstream commit:
- 74520d8bb0

Fixed https://github.com/microsoft/onnxruntime/issues/12393
2023-01-17 10:59:20 +10:00
Scott McKay
7f374f4012
Fix build error on Windows if Python debug libraries are installed (#14308)
### Description
If a user installs the debug libraries from Python on Windows the ORT
python project file attempts to use the debug python lib, which
conflicts with a pragma in pyconfig.h that wants the release lib (due to
pybind11 undefining _DEBUG).

Explicitly use the release lib instead of Python::Module so the build
doesn't break.

### Motivation and Context
Fix obtuse build break.
2023-01-17 09:48:26 +10:00
stevenlix
49cfb56cc3
Fix subgraph index issue in TRT (#14305)
The subgraph index in the TRT engine name keeps increasing when multiple
sessions are created for the same model, so the TRT engine is not reused
and a new engine is created each time. The cause is that
trt_model_id_generator_ is defined globally.
This PR makes the following changes and improvements:
1. Define the subgraph index as a local variable so it is not shared
across sessions.
2. Decouple the subgraph index from the hash id generator.
3. Call the hash id generator once at the beginning of GetCapability,
since the hash id is shared between TRT subgraphs and there is no need to
call it for each subgraph.
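The three changes above can be sketched as follows. This is a minimal illustration, not the TRT EP code: `HashModel`, `MakeEngineNames` and the engine name format are hypothetical, and `std::hash` stands in for the real hash id generator.

```c++
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the hash id generator.
std::size_t HashModel(const std::string& model_path) {
  return std::hash<std::string>{}(model_path);
}

// Builds one engine name per subgraph for a single session.
std::vector<std::string> MakeEngineNames(const std::string& model_path, int num_subgraphs) {
  assert(num_subgraphs >= 0);
  // The hash is computed once and shared by all subgraphs.
  const std::size_t model_hash = HashModel(model_path);
  // The index is a local variable, so every session starts again at 0.
  int subgraph_index = 0;
  std::vector<std::string> names;
  for (int i = 0; i < num_subgraphs; ++i) {
    names.push_back("engine_" + std::to_string(model_hash) + "_" +
                    std::to_string(subgraph_index++));
  }
  return names;
}
```

Calling `MakeEngineNames` twice with the same model yields identical names, which is the property that lets a second session find and reuse a cached engine. With a global counter, the second call would produce different names.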

fix https://github.com/microsoft/onnxruntime/issues/14269
2023-01-16 14:40:41 -08:00
Yi Zhang
6d60dc24fe
install shared deps script (#14234)
### Description
Add a new install_shared_deps.sh

### Motivation and Context
Azcopy, Ninja, Node.js and CCache are all needed, but they are copied
everywhere.
2023-01-16 18:27:29 +08:00