Commit graph

8417 commits

Author SHA1 Message Date
Scott McKay
eb8f6c7c52
Transpose optimizer enhancements (#15117)
### Description
<!-- Describe your changes. -->
- Add debug infrastructure to dump out model at various stages of
transpose optimization.
- Handle more scenarios where Transpose -> Reshape can be merged.
- Run L1 optimizers after layout transform to constant fold initializers
that had their layout changed.
- Use cost check for Concat post layout transform as pushing a Transpose
through it can potentially add Transpose nodes to multiple other inputs.
- Update internal testing EP to support test where you want it to take
all nodes, use NHWC layout, and to use dummy static kernels instead of
compiling so the ops in the graph post-initialization can be counted.
- Misc cleanup in InferenceSession to not unnecessarily pass args to
TransposeGraph for class members.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Address perf issue seen with model where a Transpose gets blocked by a
Reshape that could have been treated as a Transpose.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-03-28 08:28:17 +10:00
Jian Chen
792d411135
Update python 3.11 and remove 3.7 for Linux (#15214)
### Description
Update python 3.11 and remove 3.7



### Motivation and Context
Update python 3.11 and remove 3.7

---------

Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>
2023-03-27 14:46:30 -07:00
Edward Chen
ea40dc3ad6
Update build.py to disallow running as root user by default. (#15164)
Try to address intermittent permissions issues that show up in non-transient CI environments.
2023-03-27 14:46:04 -07:00
Nat Kershaw (MSFT)
3064fa7611
Fix C API docs error (#15216) 2023-03-27 14:34:18 -07:00
Jian Chen
527e006124
Update mlas (#15228)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-03-27 14:18:48 -07:00
Bengt Gustafsson
063ee8d504
Fixed some warnings that were treated as errors when compiling with D… (#15157)
…ML in Win32/MSVC.

### Description
Use onnxruntime::narrow to silence some warnings that are turned into
errors when compiling the DML provider in Win32.

Also one case of warning turned to error for mixing int loop variable
type with a vector size() as upper bound.


### Motivation and Context
Solves
[https://github.com/microsoft/onnxruntime/issues/14595](https://github.com/microsoft/onnxruntime/issues/14595)

Co-authored-by: bengt.gustafsson <bengt.gustafsson@contextvision.se>
2023-03-27 14:17:28 -07:00
Changming Sun
63cc1bb26a
Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144)
### Description
1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper
2. Enable CCache for orttraining pipeline

### Motivation and Context
Azure AMD CPU machines are generally much cheaper than Intel CPU
machines. However, they don't have local disks.
2023-03-27 14:10:08 -07:00
Patrice Vignola
67a6022c03
[DML EP] Add GroupNorm (#15189)
Comparison between the different normalization operations:
![](https://user-images.githubusercontent.com/1041752/106491728-73d40680-64b7-11eb-8769-3f758996e959.png)
2023-03-27 12:52:53 -07:00
Tianlei Wu
2e56620611
Add file and line info in CudaCall and RocmCall macros (#15148)
This PR add file and line information so that it is easy to trouble shoot the issue of cuda error.
Update Rocm call as well for hipify.
2023-03-27 11:04:19 -07:00
Changming Sun
ffcfb1ec98
Remove protobuf submodule (#15190)
### Description
Remove protobuf submodule as a follow-up of #13523

"Android CI Pipeline" and "Zip-Nuget-Java-Nodejs Packaging Pipeline"
need to be tested.


### Motivation and Context
It is related to
[AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753)

Fixed
[AB#14027](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14027)
2023-03-27 10:35:49 -07:00
Justin Chu
e754edaecf
Run rustfmt in CI (#15217)
I considered running clippy as well but ort takes too long to build
2023-03-27 08:12:59 -07:00
Patrice Vignola
b10aaf4e9c
Fix error message when running NhwcConv with a bad weight channel count (#15221) 2023-03-27 00:15:19 -07:00
Yi Zhang
d182d34f1d
pause caching docker image in pipeline cache in Linux Aten Pipeline (#15227)
### Description
Pause caching the docker images in pipeline cache in Linux Aten
Pipeline.

### Motivation and Context
We need to work out a better way to reduce the storage.
2023-03-27 11:06:53 +08:00
Adrian Lizarraga
d24b630fc3
[QNN EP] Support reduce ops with axes as initializer input (#15126)
### Description
- Adds support for newer opset of Reduction operators (ReduceSum,
ReduceMax, ReduceMin, ReduceMean, ReduceProd) with axes as an
initializer input.
- Adds tests for HTP and CPU backends.

### Motivation and Context
Newer opset versions changed the `axes` attribute into an optional
input. This PR adds support for these newer reduction operators as long
as the axes input is defined as an initializer. The goal is to enable more models on QNN.
2023-03-26 16:39:22 -07:00
cloudhan
d3565779c3
Allow bert_perf_test.py to load/save tuning results (#15096) 2023-03-26 18:03:08 +08:00
Chris Austen
93e6902790
resolve undefined symbol: rocblas_create_handle (#15204)
Update migraphx section of onnxruntime_providers.cmake to add the rocblas library
2023-03-26 18:01:58 +08:00
Jian Chen
750747d8c9
Cjian/multi stage packaging pipeline (#14993) 2023-03-24 23:39:15 -07:00
Hector Li
5a2e43bdd5
[QNN EP] Improve Slice to support opset 9 (#15186)
### Description
Improve Slice to support Onnx opset9 which has starts, ends & axes in
node attributes.

### Motivation and Context
To unblock some models.
2023-03-24 16:07:06 -07:00
Justin Chu
d834ec895a
Adopt linrtunner as the linting tool - take 2 (#15085)
### Description

`lintrunner` is a linter runner successfully used by pytorch, onnx and
onnx-script. It provides a uniform experience running linters locally
and in CI. It supports all major dev systems: Windows, Linux and MacOs.
The checks are enforced by the `Python format` workflow.

This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors
in Python code. `lintrunner` now runs all required python lints
including `ruff`(replacing `flake8`), `black` and `isort`. Future lints
like `clang-format` can be added.

Most errors are auto-fixed by `ruff` and the fixes should be considered
robust.

Lints that are more complicated to fix are applied `# noqa` for now and
should be fixed in follow up PRs.

### Notable changes

1. This PR **removed some suboptimal patterns**:

	- `not xxx in` -> `xxx not in` membership checks
	- bare excepts (`except:` -> `except Exception`)
	- unused imports
	
	The follow up PR will remove:
	
	- `import *`
	- mutable values as default in function definitions (`def func(a=[])`)
	- more unused imports
	- unused local variables

2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than
flake8 and is more robust. We are using it successfully in onnx and
onnx-script. It also supports auto-fixing many flake8 errors.

3. Removed the legacy flake8 ci flow and updated docs.

4. The added workflow supports SARIF code scanning reports on github,
example snapshot:
	

![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png)

5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Unified linting experience in CI and local.

Replacing https://github.com/microsoft/onnxruntime/pull/14306

---------

Signed-off-by: Justin Chu <justinchu@microsoft.com>
2023-03-24 15:29:03 -07:00
Dmitri Smirnov
2de15c5d50
Re-work OrtApi struct to satisfy C++20 compilers (#15183)
### Description
<!-- Describe your changes. -->
Remove `deletion` of copy functions from `OrtApi` as its initialization
no longer compiles in C++20.
Introduce a non-copyable member to implicitly delete copy ctor.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Inspired by https://github.com/microsoft/onnxruntime/pull/14901

Solution credits: @RyanUnderhill 

Cc: @georgthegreat
2023-03-24 13:52:17 -07:00
Justin Stoecker
dc87691000
Enable DML graph fusion independently of graph optimization level (#15172)
### Description
Apply the DML graph fusion transformer optimization independently of ORT
graph optimization level.

### Motivation and Context
The DML graph fusion transformer is not a graph optimizer in the normal
sense: it isn't optimizing the ONNX graph structure, but rather fusing
nodes into what will later become a single IDMLCompiledOperator (using
IDMLDevice1::CompileGraph). This transformer can't be done ahead of time
(hence why it's disabled if saving an optimized model), but it's also
gated by the ORT graph optimization level; this makes it impossible to
preoptimize ONNX models ("offline mode") and then later disable graph
optimizations for better startup performance ("online mode") while
benefiting from DML graph fusion.
2023-03-24 13:50:17 -07:00
PeixuanZuo
56bccac35d
[ROCm] update bert-L convergence reference file to fix CI (#15200)
The change of layernorm lead to the change of bert-L convergence result.
2023-03-24 21:43:44 +08:00
PeixuanZuo
7eb6dbe7d8
[ROCm] Add compute type for Skiplayernorm to fix ROCm CI (#15192)
- Add compute type for Skiplayernorm to fix ROCm CI and get more
accurate results.

SkipLayerNorm:
type T: input, skip, bias
type U: epsilon, compute result
type V: output, beta, gamma

- refactor the usage of aligned_vector, reduce the usage of
`reinterpret_cast`.
2023-03-24 19:31:14 +08:00
Patrice Vignola
3a4c895765
[DML EP] Add support for SkipLayerNorm's fourth output (#15160) 2023-03-23 23:47:21 -07:00
Ye Wang
0402f930f2
exclude decoder files in hipify.cmake (#15188) 2023-03-23 22:40:06 -07:00
Yi Zhang
5c5c345abc
Add smoking tests for all CPU Packages. (#15153)
### Description
So far, 2 packages are not supported.
1. Mac silicon, because there isn't Mac silicon agent in Azure.
2. Linux ARM64, because there isn't microsoft-hosted Linux ARM64 agent
in ADO and UsePythonVersion isn't supported in self-hosted Linux ARM
pool.

Test Run:

https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=291132&view=logs&j=3a60a0ba-1640-5a1c-2d51-19af647b2d6b
2023-03-24 12:30:05 +08:00
Yi Zhang
338e6672dd
use build.sourceversion in cache image key (#15019)
### Description
Use build.sourceversion in docker image cache key.



### Motivation and Context
We used filpath as the cache key in #14496.
In most cases, the docker base image tag is latest.
So, the hash of the files couldn't be aware of the change of base image.
As the result, the docker image restored, but the image will still be
rebuilt .
The maintenance cost would be huge if we pin image hash in docker file.
For example,
https://quay.io/repository/pypa/manylinux2014_x86_64?tab=tags&tag=latest,
it's updated almost every week.
So far, the build.sourceversion is the right way to keep cache is
updated and valid.
2023-03-24 10:01:22 +08:00
Scott McKay
ea245c94e7
Add constant folding for simple QDQ Node Units (#15138)
### Description
<!-- Describe your changes. -->
Currently we bail on constant folding if QDQ is enabled and we hit a DQ
node. However, if we have a simple DQ -> X -> Q node unit where the DQ
and X do not produce graph outputs, their output only has one consumer,
and X is deterministic, we can constant fold all three nodes.

Add support for this simple scenario primarily to constant fold a QDQ
model that has had initializers updated by layout transformation, which
results in patterns like `initializer -> DQ -> Transpose -> Q` or
`initializer- > DQ -> Unsqueeze -> Q -> DQ -> Transpose -> Q` if the
initializer is broadcast.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Improve end result of layout transformation on a QDQ model.
2023-03-24 08:46:07 +10:00
dependabot[bot]
0200995058
Bump webpack from 5.75.0 to 5.76.0 in /js (#15159)
Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to
5.76.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/webpack/webpack/releases">webpack's
releases</a>.</em></p>
<blockquote>
<h2>v5.76.0</h2>
<h2>Bugfixes</h2>
<ul>
<li>Avoid cross-realm object access by <a
href="https://github.com/Jack-Works"><code>@​Jack-Works</code></a> in <a
href="https://redirect.github.com/webpack/webpack/pull/16500">webpack/webpack#16500</a></li>
<li>Improve hash performance via conditional initialization by <a
href="https://github.com/lvivski"><code>@​lvivski</code></a> in <a
href="https://redirect.github.com/webpack/webpack/pull/16491">webpack/webpack#16491</a></li>
<li>Serialize <code>generatedCode</code> info to fix bug in asset module
cache restoration by <a
href="https://github.com/ryanwilsonperkin"><code>@​ryanwilsonperkin</code></a>
in <a
href="https://redirect.github.com/webpack/webpack/pull/16703">webpack/webpack#16703</a></li>
<li>Improve performance of <code>hashRegExp</code> lookup by <a
href="https://github.com/ryanwilsonperkin"><code>@​ryanwilsonperkin</code></a>
in <a
href="https://redirect.github.com/webpack/webpack/pull/16759">webpack/webpack#16759</a></li>
</ul>
<h2>Features</h2>
<ul>
<li>add <code>target</code> to <code>LoaderContext</code> type by <a
href="https://github.com/askoufis"><code>@​askoufis</code></a> in <a
href="https://redirect.github.com/webpack/webpack/pull/16781">webpack/webpack#16781</a></li>
</ul>
<h2>Security</h2>
<ul>
<li><a
href="https://github.com/advisories/GHSA-3rfm-jhwj-7488">CVE-2022-37603</a>
fixed by <a
href="https://github.com/akhilgkrishnan"><code>@​akhilgkrishnan</code></a>
in <a
href="https://redirect.github.com/webpack/webpack/pull/16446">webpack/webpack#16446</a></li>
</ul>
<h2>Repo Changes</h2>
<ul>
<li>Fix HTML5 logo in README by <a
href="https://github.com/jakebailey"><code>@​jakebailey</code></a> in <a
href="https://redirect.github.com/webpack/webpack/pull/16614">webpack/webpack#16614</a></li>
<li>Replace TypeScript logo in README by <a
href="https://github.com/jakebailey"><code>@​jakebailey</code></a> in <a
href="https://redirect.github.com/webpack/webpack/pull/16613">webpack/webpack#16613</a></li>
<li>Update actions/cache dependencies by <a
href="https://github.com/piwysocki"><code>@​piwysocki</code></a> in <a
href="https://redirect.github.com/webpack/webpack/pull/16493">webpack/webpack#16493</a></li>
</ul>
<h2>New Contributors</h2>
<ul>
<li><a
href="https://github.com/Jack-Works"><code>@​Jack-Works</code></a> made
their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16500">webpack/webpack#16500</a></li>
<li><a href="https://github.com/lvivski"><code>@​lvivski</code></a> made
their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16491">webpack/webpack#16491</a></li>
<li><a
href="https://github.com/jakebailey"><code>@​jakebailey</code></a> made
their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16614">webpack/webpack#16614</a></li>
<li><a
href="https://github.com/akhilgkrishnan"><code>@​akhilgkrishnan</code></a>
made their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16446">webpack/webpack#16446</a></li>
<li><a
href="https://github.com/ryanwilsonperkin"><code>@​ryanwilsonperkin</code></a>
made their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16703">webpack/webpack#16703</a></li>
<li><a href="https://github.com/piwysocki"><code>@​piwysocki</code></a>
made their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16493">webpack/webpack#16493</a></li>
<li><a href="https://github.com/askoufis"><code>@​askoufis</code></a>
made their first contribution in <a
href="https://redirect.github.com/webpack/webpack/pull/16781">webpack/webpack#16781</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/webpack/webpack/compare/v5.75.0...v5.76.0">https://github.com/webpack/webpack/compare/v5.75.0...v5.76.0</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="97b1718720"><code>97b1718</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16781">#16781</a>
from askoufis/loader-context-target-type</li>
<li><a
href="b84efe6224"><code>b84efe6</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16759">#16759</a>
from ryanwilsonperkin/real-content-hash-regex-perf</li>
<li><a
href="c98e9e0014"><code>c98e9e0</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16493">#16493</a>
from piwysocki/patch-1</li>
<li><a
href="5f34acfbc0"><code>5f34acf</code></a>
feat: Add <code>target</code> to <code>LoaderContext</code> type</li>
<li><a
href="b7fc4d876d"><code>b7fc4d8</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16703">#16703</a>
from ryanwilsonperkin/ryanwilsonperkin/fix-16160</li>
<li><a
href="63ea82da4d"><code>63ea82d</code></a>
Merge branch 'webpack:main' into patch-1</li>
<li><a
href="4ba225225b"><code>4ba2252</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16446">#16446</a>
from akhilgkrishnan/patch-1</li>
<li><a
href="1acd6350be"><code>1acd635</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16613">#16613</a>
from jakebailey/ts-logo</li>
<li><a
href="302eb37fe1"><code>302eb37</code></a>
Merge pull request <a
href="https://redirect.github.com/webpack/webpack/issues/16614">#16614</a>
from jakebailey/html5-logo</li>
<li><a
href="cfdb1dfe59"><code>cfdb1df</code></a>
Improve performance of hashRegExp lookup</li>
<li>Additional commits viewable in <a
href="https://github.com/webpack/webpack/compare/v5.75.0...v5.76.0">compare
view</a></li>
</ul>
</details>
<details>
<summary>Maintainer changes</summary>
<p>This version was pushed to npm by <a
href="https://www.npmjs.com/~evilebottnawi">evilebottnawi</a>, a new
releaser for webpack since your current version.</p>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=webpack&package-manager=npm_and_yarn&previous-version=5.75.0&new-version=5.76.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-03-23 15:17:52 -07:00
Nat Kershaw (MSFT)
28f64066de
Auto deploy API docs (#15088) 2023-03-23 15:08:49 -07:00
Ye Wang
44ba23e0f5
Rename DecoderMaskedMHA to DecoderMaskedSelfAttn (#15166)
### Description
<!-- Describe your changes. -->

As synced offline, rename this op and will create another op for mha
that supports both self and cross attention.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-03-23 12:31:38 -07:00
Ye Wang
2ee822d483
Extend memory efficient attention coverage in Attention/MHA cuda op (#15064)
### Description
<!-- Describe your changes. -->

1. upgrade cutlass to 3.0 that containing attn_bias support.
2. extend Attention/MHA to use memory efficient attention when
rel_pos_bias with [1, num_head, s, s*] and 1d mask with [2 * batch_size
+ 1] are present.

new mask format introduction:
MASK_1D_KEY_SEQ_LEN_START,  
[3 * batch_size + 2] with [key_len[0], ..., key_len[batch_size - 1],
query_start[0], ..., query_start[batch_size - 1], query_end[batch_size -
1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size -
1]]

e.g
2D mask with [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to this
1D mask is [3, 5, 0, 6, 12, 0, 6, 12]


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

It potentially benefits tnlrv6 and t5(encoder)

---------

Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com>
Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-03-23 11:05:17 -07:00
Hariharan Seshadri
7033346605 Support mask_filter_value attribute in DecoderMaskedMultiheadAttention (#15158) 2023-03-23 11:00:09 -07:00
Zhang Lei
910fc09de2
Using standard layernorm cuda kernel for skiplayernorm. (#15076)
* Current SkipLayernorm did not using stable algo and cause correctness
issue.
  * Enrich existing layernorm kernel to accept bias and residual.
* Tune standard layernorm threads.y according to elements and device
property.
  * Remove existing skiplayernorm cuda implementation.
2023-03-23 10:04:22 -07:00
Tianlei Wu
88a66a289b
Fix prune_graph and gpt attention fusion scripts (#15147)
Fix two issues: (1) GPT attention fusion: get_parent could return None when the input is
initializer, add a check (2) ONNX node could have optional inputs and outputs. During
prune_graph, we shall exclude empty inputs/outputs. Here we exclude ""
from output_name_to_node and input_name_to_nodes.

Add an option allow_remove_graph_inputs in prune_graph
2023-03-23 09:45:16 -07:00
Faith Xu
b82f94ac2e
Update labeler.yml for web (#15142)
### Description
Adds a few addl tags for web

---------

Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com>
2023-03-23 09:39:04 -07:00
Boris Fomitchev
559a21c7c3
Fixing CUDA12 build (#15135)
Removing flags for CUDA architectures not supported in CUDA12 SDK
Adding build flags for Hopper architecture, supported in CUDA12 SDK.
2023-03-23 09:36:51 -07:00
G. Ramalingam
efa1262614
Handle unused function inputs (#15130)
### Description

Fix issue relating to unused inputs of model-local functions. ORT
creates a schema for all such functions. The creation of this schema
does not handle unused function-inputs. The schema-creation relies on
the use of the function-inputs to infer type-constraints for the input,
and the code ends up creating an erroneous input-descriptor when there
is no use of the function-input.

The fix is to create an input with the given name, with a
type-constraint that allows all types.

### Motivation and Context
Fix https://github.com/microsoft/onnxruntime/issues/15046

Fix https://github.com/microsoft/onnx-script/issues/524

---------

Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
2023-03-23 08:12:46 -07:00
Justin Chu
896ab94780
Remove root in run_python_dockerbuild.sh (#15169)
Running docker in root causes the pipeline to be stateful and
subsequently fail
2023-03-23 06:32:36 -07:00
pengwa
1d32285536
Statistics tool for ORTModule convergence parity (#15020)
### Statistics tool for ORTModule convergence parity

As ORTModule get more and more validated, it is pretty fast to
intergrade PyTorch based model with ORT.

The same time, we need make sure once there is convergence issue, we
don't spend months of time to investigate. As part of this efforts, this
PR is introducing a tool to dump activation statistics without much
involvement from users. The dumping results contains only some statistic
numbers plus sampled data, which is not big, compared with dumping all
the tensors, it is much faster and space efficient.

For us to use it, two single lines are needed before wrapping ORTModule.
For baseline run, need also apply the same trick.

```
+	from onnxruntime.training.utils.hooks import SubscriberManager, StatisticsSubscriber
+	SubscriberManager.subscribe(model, [StatisticsSubscriber("pt_out", override_output_dir=True)])
```

Once you run the steps, following command can be used to merge result
into per-step-summary respectively for ORT and baseline runs.
 
```bash
python -m onnxruntime.training.utils.hooks.merge_activation_summary --pt_dir pt_out --ort_dir ort_out --output_dir /tmp/output
```

Docs is added here as part of this PR [convergence investigation
notes](https://github.com/microsoft/onnxruntime/blob/pengwa/conv_tool/docs/ORTModule_Convergence_Notes.md)

Based on the generated merged files, we can compare them with tools. 


![image](https://user-images.githubusercontent.com/10530022/224653929-4e4480bd-bb02-4bbe-bd44-2672bdf91a87.png)

### Design and Implementation

This PR introduced a common mechanism registering custom logic for
nn.Module's post forward hooks. And statistics for activation
(StatisticsSubscriber) is one of the implementations. If there is other
needs, we can define another XXSubscriber to do the customized things.
2023-03-23 20:34:24 +08:00
cloudhan
039ca10822
Move offline_tuning.py, so that the utility will be package with whl distribution (#15124)
Just file move.
2023-03-23 15:24:41 +08:00
Chi Lo
11dac121b2
Fix test_custom_op_get_const_input inference test on Android CI (#15132)
Fix test_custom_op_get_const_input for inference test on Android CI and
copy custom op libraries to the emulator.
2023-03-22 23:02:42 -07:00
Yufeng Li
dccbe9d492
exclude packed_attention* from rocm (#15161)
exclude Contrib op PackedAttention from ROCM EP
2023-03-23 13:58:57 +08:00
Rachel Guo
fc3a2a3771
[CoreML EP] Add Pad op support (#14946)
### Description
<!-- Describe your changes. -->

As title.

- Only support constant mode Pad for CoreML EP for now.
- Enable Pad tests for CoreML with inputs as initializer types.

CoreML Spec for reference:

https://apple.github.io/coremltools/mlmodel/Format/NeuralNetwork.html#paddinglayerparams


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fill operator gaps for ClipChamp models.

---------

Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-03-22 21:54:55 -07:00
pengwa
7bec80d92a
Fix reference count for autograd.Function (#15121)
### Fix reference count for autograd

When PythonOp kernel initialized, `AddPointerScalarArgs` creates
`const_args_` which put all non-tensor references (including
ProcessGroup, string, or other user types) in it.

In kernel's destructor, all ref cnt got decreased for `const_args_`. 


```
void PythonOpBase::Clear() {
  for (auto ptr : const_args_) {
    auto obj = reinterpret_cast<PyObject*>(ptr);
    Py_DECREF(obj);
  }
}
```

It means, we did not increase cnt, but just decrease cnt. Running the
unit, segmentation fault will be thrown. The simple fix is to remove the
Py_DECREF for those pointer-type constant inputs triggered by kernel
destructor.

NONTENSOR_OBJECT_POINTER_STORE is the place we increase the reference
during export, then the reference will remain until the python program
terminates.


Additionally tunings:
1. Move some logs into verbose instead of warning in case of flooding
training logs.
2. Move pointer type ref holding from python side
(NONTENSOR_OBJECT_POINTER_STORE) to
orttraining/orttraining/core/framework/torch/custom_function_register.h.
Then we use a consistent approach to manage all PythonOp related python
object/methonds ref count increasing and decreasing.
2023-03-23 12:51:50 +08:00
Yulong Wang
f972d21e81
[js] upgrade dependencies and enable strict mode (#14930)
### Description
This PR includes the following changes:
- upgrade js dependencies
- enable STRICT mode for web assembly build.
- corresponding fix for cmake-js upgrade
- corresponsing fix for linter upgrade
- upgrade default typescript compile option of:
    - `moduleResolution`: from `node` to `node16`
    - `target`: from `es2017` to `es2020`
- fix ESM module import in commonJS source file

## change explanation

### changes to onnxruntime_webassembly.cmake
`-s WASM=1` and `-s LLD_REPORT_UNDEFINED` in latest version is
by-default and deprecated.

### changes to onnxruntime_node.cmake
The npm package `cmake-js` updated its way to find file `node.lib`.
previously it downloads this file from Node.js public release channel,
and now it generates it from a definition file.

The node.js release channel does not contain a windows/arm64 version, so
previously cmake-js will fail to download `node.lib` for that platform.
this is why we made special handling to download the unofficial binary
to build. now this is no longer needed so we removed that from the cmake
file.

### changes to tsconfig.json
`node16` module resolution supports async import and `es2020` as target
supports top level await.
2023-03-22 15:05:04 -07:00
cloudhan
71b67ec1e2
Refactor ke register to be decentralized (#15036)
So that we can remove all unnecessay header files
2023-03-22 14:49:26 +08:00
Baiju Meswani
0086f7590d
LSTM and LSTM gradient implementation for training (#15034) 2023-03-21 21:44:08 -07:00
JiCheng
126e7bf15f
[AMX] add assembler check (#15055)
### Description
<!-- Describe your changes. -->

AMX isn't supportted until assembler 2.40 even though the GCC frontend
supports it.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2023-03-22 07:57:22 +08:00
Tianlei Wu
3e2d453b64
Supports model > 2GB in fp16 conversion with onnx shape inference (#15067)
(1) Allow model to be path, and use infer_shapes_path to fix
https://github.com/microsoft/onnxruntime/issues/15063
(2) Add some logging for float data truncation
(3) Add RandomUniformLike to default op_block_list
(4) Some minor changes to use f string.
2023-03-21 15:08:28 -07:00