Commit graph

9575 commits

Author SHA1 Message Date
Arthur Islamov
65249f42e4
[js/web] FP16 Gemm, Softmax & Transpose (#17494)
### Description
First three OPs to support fp16. Will add more once this gets merged
since others depend on changes in js_data_types
2023-09-11 21:09:37 -07:00
Adrian Lizarraga
f20e475e67
[QNN EP] Update QNN SDK to version 2.14.1 (#17467)
### Description
Updates the version of QNN SDK used by CI Pipelines. Enables some tests
fixed by 2.14.1, but still need to look into Resize in a separate PR.

### Motivation and Context
Test latest version of QNN SDK.
2023-09-11 21:07:50 -07:00
Patrice Vignola
8ad9ab1b9a
Fix CPU constant folding not reverting the node to its previous EP (#17399)
A recent change was made in
5a83a67f32
to make `ep_type` a reference instead of having it be a copy, presumably
to avoid assigning strings (so `auto& ep_type =
node->GetExecutionProviderType()` instead of `auto ep_type =
node->GetExecutionProviderType()`). The problem with this change is that
calling `node->SetExecutionProviderType(kCpuExecutionProvider)` will
change the value of the reference itself, which means that it's
impossible to revert the node to its previous EP.

This change fixes this bug and adds an optimization over the previous
approach by only assigning a string when we know that we are dealing
with a non-CPU node.
2023-09-11 17:38:37 -07:00
satyajandhyala
bf6d6961cc
[JS/Web] Added Einsum operator support. (#17401)
### Description
Added Einsum operator support to JSEP.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-11 15:57:15 -07:00
Yulong Wang
850baced33
[web] a few updates to web pipeline (#17485)
### Description

Update the Web CI pipelines:

- remove parameter 'WebTemplate': Since we start to support webgpu, the
linux-web-ci.yml is no longer working and it is already out-of-date.
remove this file and parameter so that we always use win-web-ci.yml

- change flag `RunWebGpuTests` into 2 flags, for release and debug.
Currently for CI we only run webgpu tests on release build. But we want
to have the capability to run webgpu tests on debug build as well.


After this PR is merged, next step is to enable both Debug and Release
webgpu tests in PostMerge pipeline.
2023-09-11 11:43:42 -07:00
Kaz Nishimura
24f0893d3c
Enable optimize_by_onnxruntime call for float32 unet model (#17483)
This makes it possible to call `optimize_by_onnxruntime` for float32 unet if `--use_external_data_format` is also used.

### Motivation and Context
When using `optimize_pipeline.py` without `--float16`, `optimize_by_onnxruntime` was not called for unet.
2023-09-10 16:50:50 -07:00
Chi Lo
b827ab0efc
[TRT EP] Fix build error for building oss onnx-tensorrt parser (#17468)
If building ORT TRT with `--use_tensorrt_oss_parse` (meaning ORT wil
include [oss onnx-tensorrt
parser](https://github.com/onnx/onnx-tensorrt/blob/main/CMakeLists.txt#L82)
and build it from source) ,the cmake CUDA_INCLUDE_DIR variable is
needed.

if not, you will encounter following [ build
error](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1133937&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=39e3f98f-7fe5-578c-20bd-5ae5a4590bda):

CMake Error: The following variables are used in this project, but they
are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the
CMake files:
    /build/Release/_deps/onnx_tensorrt-src/CUDA_INCLUDE_DIR

Note: Not quite sure why in the past when CI still tested with oss
parser won't hit this issue. probably the CUDA_INCLUDE_DIR was defined
somewhere back then.
2023-09-08 20:34:57 -07:00
Yulong Wang
89da5a0108
[js/webgpu] exclude WebGPU reduce_log_sum_exp_* float64 test cases (#17472)
### Description

as explained in the comments, tests "test_reduce_log_sum_exp_*" on
opset17/opset18 are excluded because they use float64.

They are passing now because they fallback to CPU. WebGPU does not
support f64.


This is one of the prerequisites for supporting IO binding for WebGPU
buffer in onnxruntime-web.

list of prerequisites PRs:
https://github.com/microsoft/onnxruntime/pull/17465
https://github.com/microsoft/onnxruntime/pull/17469
https://github.com/microsoft/onnxruntime/pull/17470
https://github.com/microsoft/onnxruntime/pull/17472 (this one)
2023-09-08 17:03:04 -07:00
Yulong Wang
550293d9ad
OrtMemoryInfo: support new name "WebGPU_Buffer" (#17469)
### Description
Add new name "WebGPU_Buffer" to OrtMemoryInfo.

This is one of the prerequisites for supporting IO binding for WebGPU
buffer in onnxruntime-web.

list of prerequisites PRs:
#17465
#17469 (this one)
2023-09-08 16:37:35 -07:00
Caroline Zhu
dcc93909b4
Add training WASM generation to Web CI pipeline (#17319)
### Description
[Successful pipeline
run](https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1123141&view=results)

Added flag to build the training artifacts & updated the
pull-wasm-artifacts script to pull the training artifacts as well.

Bundled into this PR are minor formatting fixes + naming fixes.

### Motivation and Context
[This PR](https://github.com/microsoft/onnxruntime/pull/16521) extended
the WASM API wrapper to build training WASM artifacts as well.
The ORT training WASM artifacts are required to support ORT training web
bindings.
2023-09-08 15:49:47 -07:00
Tianlei Wu
29a818caa0
Attention fusion for stable diffusion clip model (#17445)
Add attention fusion for stable diffusion clip model to improve performance of SD or SDXL
2023-09-08 14:17:14 -07:00
Yulong Wang
4d753b74a5
[js/common] prepare work for supporting webgpu IO binding implementation (#17465)
### Description
This PR contains a few changes in /js/common/ to support a coming PR for
a full implementation of webgpu IO binding.

- allows pass-through if value is already a Tensor instance in return
value of `handler.run()` called by `InferenceSession.run()`
(inference-session-impl.ts). Specifically, onnxruntime-node and
onnxruntime-react-native uses native bindings to generate a Tensor-like
object so we need to create a real Tensor instance here; for
onnxruntime-web the return value is already a Tensor instance.

- adds new types for GPU buffer supported types: `'float32'|'int32'` ->
`'float32'|'float16'|'int32'|'int64'|'uint32'|'bool'`

- exposes types `GpuBufferDataTypes` together with `CpuPinnedDataTypes`
and `TextureDataTypes` as exported
2023-09-08 13:49:24 -07:00
Changming Sun
bc84f52633
Update C/C++ dependencies: abseil, date, nsync, googletest, wil, mp11, cpuinfo and safeint (#15470)
### Description
Update C/C++ dependencies abseil, date, nsync, googletest, wil, mp11,
cpuinfo and safeint to newer versions per request of @
mayeut. He created the following PRs to update the deps:
https://github.com/microsoft/onnxruntime/pull/15432
https://github.com/microsoft/onnxruntime/pull/15434
https://github.com/microsoft/onnxruntime/pull/15435
https://github.com/microsoft/onnxruntime/pull/15436
https://github.com/microsoft/onnxruntime/pull/15437

However, our build system needs to fetch the dependencies from an
internal mirror that only Microsoft employees have write access to. So I
closed his PRs and created this one.

This PR also updates abseil to a newer version. This is to prepare for
upgrading re2.
2023-09-08 13:35:04 -07:00
Changming Sun
f51a765e64
Avoid calling patchelf (#17365)
### Description
Resolve #9754
2023-09-08 12:25:16 -07:00
Ashwini Khade
c5dbd5c919
Updates to training pipelines (#17292) 2023-09-08 11:57:12 -07:00
Yi Zhang
ae74a517b6
Run Nuget_Test_Linux_GPU in container (#17452)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

### Verification

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=351542&view=results
2023-09-08 13:41:20 +08:00
Tianlei Wu
7bc6dcecf7
fix embed layer norm fusion with embedding sum output (#17460)
The embedding sum could be graph output (when exporting with output
hidden state enabled). Previously, we only check whether there are
multiple children node to decide whether to output embedding sum in
fused node. This fix will check if the sum is graph output, we will
retain the name.
2023-09-07 22:01:26 -07:00
xhcao
9017ea131b
[js/webgpu] support GreaterOrEqual and LessOrEqual operators (#17310)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-07 17:41:16 -07:00
dependabot[bot]
eaef485461
Bump electron from 23.1.2 to 23.3.13 in /js/web (#17436)
Bumps [electron](https://github.com/electron/electron) from 23.1.2 to
23.3.13.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/electron/electron/releases">electron's
releases</a>.</em></p>
<blockquote>
<h2>electron v23.3.13</h2>
<h1>Release Notes for v23.3.13</h1>
<h2>End of Support for 23.x.y</h2>
<p>Electron 23.x.y has reached end-of-support as per the project's <a
href="https://www.electronjs.org/docs/latest/tutorial/electron-timelines#version-support-policy">support
policy</a>. Developers and applications are encouraged to upgrade to a
newer version of Electron.</p>
<h2>electron v23.3.12</h2>
<h1>Release Notes for v23.3.12</h1>
<h2>Other Changes</h2>
<ul>
<li>Fixed a crash while screen sharing on Wayland with PipeWire. <a
href="https://redirect.github.com/electron/electron/pull/39274">#39274</a></li>
<li>Security: backported fix for CVE-2023-3732.
<ul>
<li>Security: backported fix for CVE-2023-3728.</li>
<li>Security: backported fix for CVE-2023-3730. <a
href="https://redirect.github.com/electron/electron/pull/39268">#39268</a></li>
</ul>
</li>
</ul>
<h2>electron v23.3.11</h2>
<h1>Release Notes for v23.3.11</h1>
<h2>Fixes</h2>
<ul>
<li>Fixed a crash when listing desktop capture sources on Wayland with
PipeWire. <a
href="https://redirect.github.com/electron/electron/pull/39116">#39116</a>
<!-- raw HTML omitted -->(Also in <a
href="https://redirect.github.com/electron/electron/pull/39050">24</a>,
<a
href="https://redirect.github.com/electron/electron/pull/39051">25</a>,
<a
href="https://redirect.github.com/electron/electron/pull/39049">26</a>)<!--
raw HTML omitted --></li>
</ul>
<h2>electron v23.3.10</h2>
<h1>Release Notes for v23.3.10</h1>
<h2>Other Changes</h2>
<ul>
<li>Security: backported fix for CVE-2023-3422.
<ul>
<li>Security: backported fix for CVE-2023-3421.</li>
<li>Security: backported fix for CVE-2023-3420.</li>
<li>Security: backported fix for 1454860. <a
href="https://redirect.github.com/electron/electron/pull/38948">#38948</a></li>
</ul>
</li>
</ul>
<h2>electron v23.3.9</h2>
<h1>Release Notes for v23.3.9</h1>
<h2>Fixes</h2>
<ul>
<li>Fixed <code>preload</code> script may not run in some child windows
opened by <code>window.open</code>. <a
href="https://redirect.github.com/electron/electron/pull/38933">#38933</a>
<!-- raw HTML omitted -->(Also in <a
href="https://redirect.github.com/electron/electron/pull/38932">24</a>,
<a
href="https://redirect.github.com/electron/electron/pull/38931">25</a>,
<a
href="https://redirect.github.com/electron/electron/pull/38930">26</a>)<!--
raw HTML omitted --></li>
<li>Fixed minimize button to be visible when all buttons reenabled. <a
href="https://redirect.github.com/electron/electron/pull/38880">#38880</a>
<!-- raw HTML omitted -->(Also in <a
href="https://redirect.github.com/electron/electron/pull/38881">24</a>,
<a
href="https://redirect.github.com/electron/electron/pull/38879">25</a>)<!--
raw HTML omitted --></li>
</ul>
<h2>electron v23.3.8</h2>
<h1>Release Notes for v23.3.8</h1>
<h2>Other Changes</h2>
<ul>
<li>Security: backported fix for CVE-2023-3215.
<ul>
<li>Security: backported fix for CVE-2023-3216.</li>
<li>Security: backported fix for 1450536. <a
href="https://redirect.github.com/electron/electron/pull/38788">#38788</a></li>
</ul>
</li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="4b782e259b"><code>4b782e2</code></a>
fix: avoid package.json check on built-in modules (<a
href="https://redirect.github.com/electron/electron/issues/39426">#39426</a>)</li>
<li><a
href="b2047d710c"><code>b2047d7</code></a>
ci: fix hang when validating AppVeyor artifacts (<a
href="https://redirect.github.com/electron/electron/issues/39401">#39401</a>)</li>
<li><a
href="10b2baea43"><code>10b2bae</code></a>
docs: clean up removed systemPreferences methods (<a
href="https://redirect.github.com/electron/electron/issues/39349">#39349</a>)</li>
<li><a
href="454990a201"><code>454990a</code></a>
chore: cherry-pick 4 changes from Release-0-M115 (<a
href="https://redirect.github.com/electron/electron/issues/39268">#39268</a>)</li>
<li><a
href="10b49ffa12"><code>10b49ff</code></a>
chore: cherry-pick 2 changes from webrtc (<a
href="https://redirect.github.com/electron/electron/issues/39274">#39274</a>)</li>
<li><a
href="dc0fc78fac"><code>dc0fc78</code></a>
fix: do not resolve electron entrypoints on disk (<a
href="https://redirect.github.com/electron/electron/issues/39249">#39249</a>)</li>
<li><a
href="1aafc2ae38"><code>1aafc2a</code></a>
ci: fail appveyor build if artifacts are missing (<a
href="https://redirect.github.com/electron/electron/issues/39219">#39219</a>)</li>
<li><a
href="595e25a270"><code>595e25a</code></a>
fix: use StartUpdating method for PipeWire capturer (<a
href="https://redirect.github.com/electron/electron/issues/39116">#39116</a>)</li>
<li><a
href="7fe5925c94"><code>7fe5925</code></a>
build: disable unneeded depot_tools update on Windows CI (<a
href="https://redirect.github.com/electron/electron/issues/39016">#39016</a>)</li>
<li><a
href="c4b0ff4994"><code>c4b0ff4</code></a>
chore: cherry-pick 4 changes from Release-3-M114 (<a
href="https://redirect.github.com/electron/electron/issues/38948">#38948</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/electron/electron/compare/v23.1.2...v23.3.13">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=electron&package-manager=npm_and_yarn&previous-version=23.1.2&new-version=23.3.13)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-09-07 17:39:49 -07:00
Dmitri Smirnov
21c202bb5d
Eliminate hashmap copies during function inlining (#17439)
### Description
Eliminate unnecessary HashMap copies. This saves 22% of CPU usage on a
reference Dynamo exported model.

### Motivation and Context
Our function inlining is currently slow.

Before:


![image](https://github.com/microsoft/onnxruntime/assets/11303988/fd38a857-8c12-42ef-9de2-3485123a9fe7)

After


![image](https://github.com/microsoft/onnxruntime/assets/11303988/ea65813d-26cb-41dc-ba55-6a609b169767)
2023-09-07 14:08:38 -07:00
Xavier Dupré
024f1dd72b
Fix float 8 rounding on CPU (#16940)
### Description
Fix float 8 rounding issues discovered in issue #16938 (only CPU
provider).
2023-09-07 20:48:25 +02:00
Yi Zhang
0a3eb60b01
Fix Bug: Step failed but not exited with error (#17442)
### Description
Add "set -ex" in the script.


### Motivation and Context
Build failed but it still passed.

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1132003&view=logs&j=7536d2cd-87d4-54fe-4891-bfbbf2741d83&t=39e3f98f-7fe5-578c-20bd-5ae5a4590bda
2023-09-07 14:33:31 +08:00
Changming Sun
b38fb0da06
Revert the yaml file changes in "Nodejs_Packaging_CPU" build job (#17441)
### Description
The yaml file changes made in #16050 do not really work. Currently the
pipeline is failing with error:
```
Error: Not found SourceFolder: C:\a\_work\5\b\RelWithDebInfo\RelWithDebInfo\nuget-artifacts\onnxruntime-win-x64\lib
```

So, I will revert the yaml changes first to bring the pipeline back.
Some people are waiting for our nightly packages.

Test run:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=351104&view=results

### Motivation and Context
2023-09-06 20:20:55 -07:00
Adrian Lizarraga
1e4bfa1da2
[QNN EP] Add more op unit tests (#17424)
### Description
Adds more units and enables HTP support for several ops:
- Exp
- Floor (enable qdq node unit)
- Min (enable qdq node unit)
- Max (enable qdq node unit)
- Neg (enable qdq node unit)
- Not
- Pow
- PRelu (enable qdq node unit)
- Relu **(Does not work!)**
- Sigmoid
- Sqrt
- Tanh
- LogSoftmax (enable qdq node unit)
- Concat
- GlobalAveragePool

Still missing (9):
- Reshape
- Flatten
- Squeeze
- Unsqueeze
- Gemm
- Clip
- Split
- Topk
- Tile

### Motivation and Context
Increase test coverage and op support
2023-09-06 18:36:09 -07:00
Yi Zhang
ede339f304
Move dotnet build and test into docker in Linux CPU CI (#17417)
### Description
install dotnet 6.0 in the docker image.
move C# build and test into docker.

### Motivation and Context

### Note
The Unit tests and Symbolic shape infer's migration will be in another
PR.
2023-09-07 09:28:16 +08:00
Changming Sun
7862a521b3
Update cmake's hash in android custom build (#17435)
### Description
Update cmake's hash in android custom build. It was forgotten in last PR.
2023-09-06 15:26:46 -07:00
Tianlei Wu
e8b8d0d13b
Fix weight tensors in transformers optimizer not saved to external data (#17427)
Some initializers are added without raw=True flag. That causes those
tensors cannot be saved to external data. If those tensors exceed 2GB
in total, optimized model cannot be saved due to protobuf limit.

This change will save attention weights and bias in raw data.

Note: it is optional to use raw data for shape tensor since they are
tiny.

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/17212
https://github.com/microsoft/onnxruntime/issues/15349
2023-09-06 13:06:19 -07:00
BoarQing
2629cb8606
[VitisAI] graph_save only saves proto of the graph instead of entire model (#17368)
### Description
<!-- Describe your changes. -->
graph_save only saves proto of the graph instead of entire model.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
We would like to export a part of a model as a new model for unit test.
Therefore, we have to change the API to support such need.
2023-09-06 12:40:48 -07:00
Jian Chen
8914fe687b
[js/webgpu] Include Support for neg.int32 (#17374)
### Description
Include Support for neg.int32



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-09-06 12:00:16 -07:00
Edward Chen
a3a1237270
Disable xcpretty filtering of xcodebuild output in iOS packaging pipeline. (#17429) 2023-09-06 09:04:17 -07:00
Yulong Wang
fa868ca9cd
[js/node] release sessions after use in npm test (#17353)
### Description
resolve sessions after use in NPM test.
2023-09-05 23:42:32 -07:00
Yulong Wang
d88406a31b
[js/common] use Map instead of object for backends (#17352)
### Description
resolved
https://github.com/microsoft/onnxruntime/security/code-scanning/1140
2023-09-05 23:14:46 -07:00
Yulong Wang
75710f0006
[js/webgpu] add matmul broadcast tests (#17335)
### Description

Commit fffefb1c22 (#16969) optimized
matmul and also fixes broadcasting. So #17191 is no longer needed.
However, the newly added operator test file from the PR by @dakenf is
helpful so pick and add it to enhance the tests.
2023-09-05 20:41:46 -07:00
Yulong Wang
110a2d0b73
[build][wasm] add js_internal_api.js to link dependency (#17407)
### Description
add js_internal_api.js to link dependency. Now changes to
js_internal_api.js will correctly trigger re-link of ort-wasm.wasm
2023-09-05 20:40:40 -07:00
Yulong Wang
2cb75420ac
[js/common] clean up JSDoc (#17408)
### Description
clean up JSDoc for onnxruntime-common:

- replace "@internal" to "@ignore" as JSDoc do not use "@internal".
Using "@ignore" will let the content not show on the generated doc.
2023-09-05 20:40:23 -07:00
Vincent Wang
deda5db231
[ORTModule] Add Manual Seed to Fix UT Failure (#17411)
Add manual seed to fix ORTModule UT failure.
2023-09-06 11:24:55 +08:00
James Baker
16eba537a8
rust bindings: Do not unnecessarily re-run build.rs (#17018)
### Description

Remove unnecessary cargo:rerun-if-changed declaration.

### Motivation and Context

'cargo:rerun-if-changed' declarations tell Cargo when to re-run the
build script. The intention is that if the build script depends on other
files, then Cargo knows to re-run if those files change. It stores the
output and checks it before each build. The intention is that one emits
the declarations for _inputs_ of the build.

This rerun-if-changed declaration is a declaration on the _output_ of
the build, and stores the absolute path of the output. This is not a
useful declaration because the output path is unique to the build script
- there is no way for anything else to change it.

However, this does generate unnecessary rebuilds in some cases, for
example if the dependent repository is moved in the filesystem. This
causes me some issues when using https://crane.dev, as due to some
implementation details, if a crate being moved triggers a rebuild, by
default the build is broken.

To summarise:
- declaration is redundant
- causes issues in niche cases.
2023-09-05 19:42:06 -07:00
Changming Sun
c6b0d185b4
Update cmake to 3.27 and upgrade Linux CUDA docker files from CentOS7 to UBI8 (#16856)
### Description
1. Update docker files and their build instructions.
ARM64 and x86_64 can use the same docker file.

2. Upgrade Linux CUDA pipeline's base docker image from CentOS7 to UBI8
AB#18990
2023-09-05 18:12:10 -07:00
xhcao
026672e947
[js/webgpu] Support slice int32 (#16968)
Co-authored-by: Xing Xu <xing.xu@intel.com>
2023-09-05 18:05:47 -07:00
Scott McKay
e1a9f2ed6d
Fix insufficient space error in Android CI (#17423)
### Description
<!-- Describe your changes. -->
Remove onnxruntime_test_all from emulator once tests have finished as
it's 1.2GB and takes up too much space given the 2GB maximum partition
size for the emulator.

Side issue is the java build isn't able to strip the binaries in the
java apk which causes that to be 800MB (exceeding the 2GB max). That may
require an Android/Gradle fix as I don't think we can hardcode an NDK
version into our build files.
https://issuetracker.google.com/issues/237187538?pli=1


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Fix Android CI build failures for
2023-09-06 10:12:05 +10:00
petermcaughan
fa28359beb
Reduce GPU memory for Whisper models converted to ONNX (#17378)
### Description
This PR changes the Whisper export scripts to further optimize the
process of removing duplicate initializers from two subgraphs.

The current Greedy approach is quicker by a large factor, but results in
some duplicate initializers not being caught and removed. This not only
results in a slightly larger Whisper model, but also a model that uses
more GPU memory.

The approach in this PR uses data hashes and caches to keep a quick
export but no longer rely on a greedy approach.

---------

Co-authored-by: Peter McAughan <petermca@microsoft.com>
2023-09-05 16:24:20 -07:00
Dmitri Smirnov
dbcc60bed5
Introduce output type/shape validation (#17301)
### Description
Validate outputs type and shapes. Make sure sparse initializers are
taken into account.

### Motivation and Context
ORT currently does not validate output types or shapes. Further, neither
inputs or outputs take into account sparse initializers that are
converted from dense.

It is currently possible to pre-allocate a wrong type/shape buffer for
output.

Cc: @Craigacp
2023-09-05 15:25:12 -07:00
Tianlei Wu
8818a99c93
Set proper nvcc threads to avoid OOM (#17419)
### Description

There are 8 cu files under [flash
attention](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/flash_attention)
and 4 cu files under [cutlass
fmha](https://github.com/microsoft/onnxruntime/tree/main/onnxruntime/contrib_ops/cuda/bert/cutlass_fmha)
need a lot of memory to compile.

Previously, the default value is same as parallel - number of CPU cores.
Standard_NC4as_T4_v3 has 4 CPUs and 28 GB memory, and we launched 16
nvcc threads in total (4 parallel jobs, and 4 nvcc threads per job).
Each thread might take 4 GB on average (peak is around 6GB, but threads
are not started at same time). OOM happens since 16 threads might need
close to 64 GB in worst case. When build machine has 64GB or larger
memory, OOM is rare.

Here we set a proper nvcc --threads based on available memory to avoid
OOM.

### Motivation and Context
Fix `Python Packaging Pipeline (Training Cuda 11.8)`
2023-09-05 10:59:27 -07:00
Lennart Hannink
e3bb2a0cdd
Fix git working dir for ORT_BUILD_INFO (fixes #17197) (#17198)
### Description
Git commands producing `git-commid-id` and `git-branch` are always run
in `CMAKE_CURRENT_SOURCE_DIR` (i.e. `onnxruntime/cmake`)


### Motivation and Context
Please refer to corresponding issue
[#17197](https://github.com/microsoft/onnxruntime/issues/17197).
2023-09-05 09:20:49 -07:00
cloudhan
6ea3908db4
Add ck's streamk and splitk gemm impl (#17280) 2023-09-04 11:49:07 +08:00
Jiajia Qin
5e747071be
[js/webgpu] Fix bug in conv2dByMatMul path (#17369)
### Description
<!-- Describe your changes. -->
For the conv2dByMatMul path, the simulated matmul output shape is the
reshape of the original conv2d. So we should pass this information to
`createMatmulProgramInfo` so that it can process it correctly.
2023-09-02 00:16:28 -07:00
Tianlei Wu
e745575187
fix assert error in attention fusion script (#17375)
Add a check of num_heads and hidden_size to avoid assert error (https://github.com/microsoft/onnxruntime/issues/17254)
2023-09-01 08:18:50 -07:00
Tianlei Wu
e23f16adbf
output all parameters in the bert_perf_test tool (#17379)
Currently, there are some parameters missing in output file. This output
all parameters.

Example output:

Latency(ms) | Latency_P50 | Latency_P75 | Latency_P90 | Latency_P95 |
Latency_P99 | Throughput(QPS) | model | graph_optimization_level |
intra_op_num_threads | batch_size | sequence_length | test_cases |
test_times | use_gpu | use_io_binding | average_sequence_length |
random_sequence_length
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
| -- | -- | --
10.91 | 11.16 | 11.3 | 11.7 | 11.78 | 11.84 | 91.66 | model.onnx |
ENABLE_ALL | 4 | 1 | 512 | 1 | 10 | TRUE | TRUE | 64 | FALSE
2023-09-01 08:17:58 -07:00
Baiju Meswani
8b98ecad70
Change RuntimeError to ImportError (#17380)
The `onnxruntime-validation` for ORTModule checks for `ImportError`:


44101e8771/onnxruntime/python/onnxruntime_validation.py (L73-L75)

If any other kind of error is raised, it does not silently fail and will
raise an exception. This causes a problem when ortmodule is explicitly
not made available on win/mac packages since we currently raise a
RuntimeError.

Resolves issue:
https://github.com/microsoft/onnxruntime-training-examples/issues/161
2023-09-01 09:56:40 +08:00
Rachel Guo
16cfcd0590
Fix NNAPI optional input handling checks and unblock Android CI pipeline test failures (#17358)
### Description
<!-- Describe your changes. -->

- Fix missing optional input checks originally coming from a github
issue for no shape on Resize Op.
- Exclude Antialias support for Opset 18 + Resize for NNAPI
- Unblock Android CI pipeline tests failure. 

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Bug fixes.

Issue:
https://github.com/microsoft/onnxruntime/issues/17035

thanks @skottmckay for pointing out the cause.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
2023-08-31 16:40:22 -07:00