onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-13 18:08:13 +00:00

Author	SHA1	Message	Date
FFFrog	ecb89ed752	[CANN] Multi-stream execution support for CANN EP. (#14058 ) ### Description Multi-stream execution support for CANN EP. ### Motivation and Context CANN EP is currently unavailable due to the introduction of a new mechanism for multi-stream execution [#13495](https://github.com/microsoft/onnxruntime/pull/13495), the deletion of the Fence-based synchronization mechanism, and the failure to update the relevant logic of CANN EP synchronously. This PR is to fix it.	2023-03-29 11:57:22 -07:00
Adrian Lizarraga	febc69e1b2	[QNN EP] Support Cast in HTP backend (#15234 ) ### Description Adds support for the Cast operator to the QNN HTP backend. ### Motivation and Context Enable more models to run on QNN HTP backend.	2023-03-29 11:01:34 -07:00
PeixuanZuo	a6279d4cfb	[ROCm] update Stable Diffusion benchmark to support ROCm EP (#15094 ) Update Stable Diffusion benchmark to support ROCm EP	2023-03-29 15:19:52 +08:00
Jian Chen	85948d6bc6	Cjian/windows update python3.11 (#15243 ) ### Description windows update python3.11 --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-28 22:15:47 -07:00
Ryan Hill	659118f939	Prefast warning fixes (#15175 ) ### Description transpose.cc: Arithmetic overflow: Using operator '-' on a 4 byte value and then casting the result to a 8 byte value. Cast the value to the wider type before calling operator '-' to avoid overflow (io.2). cuda_provider_factory.cc: The type 'struct onnxruntime::ProviderInfo_CUDA_Impl' with a virtual function needs either public virtual or protected non-virtual destructor (c.35).	2023-03-28 21:36:03 -07:00
Tianlei Wu	f752bb9973	Update stable diffusion benchmark results: A100 and PyTorch 2.0 (#15195 ) Update stable diffusion benchmark results with A100 results and PyTorch 2.0 number.	2023-03-28 19:47:22 -07:00
Justin Chu	710d095124	Refactor the constant `_ONE` in `orttraining_test_ortmodule_api.py` (#15128 ) Follow up of https://github.com/microsoft/onnxruntime/pull/15097#discussion_r1142399537	2023-03-28 08:59:51 -07:00
Chen Fu	41ddcd30a1	Fp16 NHWC Max and Average Pooling (#15181 ) ### Description Max and average pooling operators for fp16, NHWC ### Motivation and Context Continue on the steps for fp16 inference support	2023-03-28 08:22:55 -07:00
PeixuanZuo	021e46179a	[ROCm] refactor GroupNorm to set vecterize number as template parameter (#15198 ) Refactor GroupNorm to set the vectorize number as a template parameter.	2023-03-28 16:09:56 +08:00
Justin Chu	938e2136c6	Enable pylint and numpy rules (#15218 ) ### Description Enable pylint and numpy rules ### Motivation and Context Modernize numpy usage and enable more quality checks	2023-03-27 20:37:53 -07:00
PeixuanZuo	62b2947ac1	[ROCm] remove python3.7 from python packaging pipeline (#15230 ) remove python3.7 from python packaging pipeline. https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=289720&view=results	2023-03-28 10:37:04 +08:00
Changming Sun	462c6043b5	Remove Win8 support (#15219 ) ### Description Remove Win8 support since it is EOL. See https://learn.microsoft.com/en-us/lifecycle/announcements/windows-8-1-end-support-january-2023 ### Motivation and Context Simplify code.	2023-03-27 18:51:49 -07:00
Scott McKay	eb8f6c7c52	Transpose optimizer enhancements (#15117 ) ### Description - Add debug infrastructure to dump out model at various stages of transpose optimization. - Handle more scenarios where Transpose -> Reshape can be merged. - Run L1 optimizers after layout transform to constant fold initializers that had their layout changed. - Use cost check for Concat post layout transform as pushing a Transpose through it can potentially add Transpose nodes to multiple other inputs. - Update internal testing EP to support test where you want it to take all nodes, use NHWC layout, and to use dummy static kernels instead of compiling so the ops in the graph post-initialization can be counted. - Misc cleanup in InferenceSession to not unnecessarily pass args to TransposeGraph for class members. ### Motivation and Context Address perf issue seen with model where a Transpose gets blocked by a Reshape that could have been treated as a Transpose. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-03-28 08:28:17 +10:00
Jian Chen	792d411135	Update python 3.11 and remove 3.7 for Linux (#15214 ) ### Description Update python 3.11 and remove 3.7 ### Motivation and Context Update python 3.11 and remove 3.7 --------- Co-authored-by: Ubuntu <chasun@chasunlinux.lw3b1xzoyrkuzm34swpscft0ff.dx.internal.cloudapp.net>	2023-03-27 14:46:30 -07:00
Edward Chen	ea40dc3ad6	Update build.py to disallow running as root user by default. (#15164 ) Try to address intermittent permissions issues that show up in non-transient CI environments.	2023-03-27 14:46:04 -07:00
Nat Kershaw (MSFT)	3064fa7611	Fix C API docs error (#15216 )	2023-03-27 14:34:18 -07:00
Jian Chen	527e006124	Update mlas (#15228 )	2023-03-27 14:18:48 -07:00
Bengt Gustafsson	063ee8d504	Fixed some warnings that were treated as errors when compiling with D… (#15157 ) …ML in Win32/MSVC. ### Description Use onnxruntime::narrow to silence some warnings that are turned into errors when compiling the DML provider in Win32. Also one case of warning turned to error for mixing int loop variable type with a vector size() as upper bound. ### Motivation and Context Solves [https://github.com/microsoft/onnxruntime/issues/14595](https://github.com/microsoft/onnxruntime/issues/14595) Co-authored-by: bengt.gustafsson <bengt.gustafsson@contextvision.se>	2023-03-27 14:17:28 -07:00
Changming Sun	63cc1bb26a	Move Linux CPU pipelines to an AMD CPU pool which is cheaper (#15144 ) ### Description 1. Move Linux CPU pipelines to an AMD CPU pool which is cheaper 2. Enable CCache for orttraining pipeline ### Motivation and Context Azure AMD CPU machines are generally much cheaper than Intel CPU machines. However, they don't have local disks.	2023-03-27 14:10:08 -07:00
Patrice Vignola	67a6022c03	[DML EP] Add GroupNorm (#15189 ) Comparison between the different normalization operations: ![](https://user-images.githubusercontent.com/1041752/106491728-73d40680-64b7-11eb-8769-3f758996e959.png)	2023-03-27 12:52:53 -07:00
Tianlei Wu	2e56620611	Add file and line info in CudaCall and RocmCall macros (#15148 ) This PR adds file and line information so that CUDA errors are easier to troubleshoot. It updates the ROCm call as well, for hipify.	2023-03-27 11:04:19 -07:00
Changming Sun	ffcfb1ec98	Remove protobuf submodule (#15190 ) ### Description Remove protobuf submodule as a follow-up of #13523 "Android CI Pipeline" and "Zip-Nuget-Java-Nodejs Packaging Pipeline" need to be tested. ### Motivation and Context It is related to [AB#11753](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/11753) Fixed [AB#14027](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/14027)	2023-03-27 10:35:49 -07:00
Justin Chu	e754edaecf	Run rustfmt in CI (#15217 ) I considered running clippy as well but ort takes too long to build	2023-03-27 08:12:59 -07:00
Patrice Vignola	b10aaf4e9c	Fix error message when running NhwcConv with a bad weight channel count (#15221 )	2023-03-27 00:15:19 -07:00
Yi Zhang	d182d34f1d	pause caching docker image in pipeline cache in Linux Aten Pipeline (#15227 ) ### Description Pause caching the docker images in pipeline cache in Linux Aten Pipeline. ### Motivation and Context We need to work out a better way to reduce the storage.	2023-03-27 11:06:53 +08:00
Adrian Lizarraga	d24b630fc3	[QNN EP] Support reduce ops with axes as initializer input (#15126 ) ### Description - Adds support for newer opset of Reduction operators (ReduceSum, ReduceMax, ReduceMin, ReduceMean, ReduceProd) with axes as an initializer input. - Adds tests for HTP and CPU backends. ### Motivation and Context Newer opset versions changed the `axes` attribute into an optional input. This PR adds support for these newer reduction operators as long as the axes input is defined as an initializer. The goal is to enable more models on QNN.	2023-03-26 16:39:22 -07:00
cloudhan	d3565779c3	Allow bert_perf_test.py to load/save tuning results (#15096 )	2023-03-26 18:03:08 +08:00
Chris Austen	93e6902790	resolve undefined symbol: rocblas_create_handle (#15204 ) Update migraphx section of onnxruntime_providers.cmake to add the rocblas library	2023-03-26 18:01:58 +08:00
Jian Chen	750747d8c9	Cjian/multi stage packaging pipeline (#14993 )	2023-03-24 23:39:15 -07:00
Hector Li	5a2e43bdd5	[QNN EP] Improve Slice to support opset 9 (#15186 ) ### Description Improve Slice to support Onnx opset9 which has starts, ends & axes in node attributes. ### Motivation and Context To unblock some models.	2023-03-24 16:07:06 -07:00
Justin Chu	d834ec895a	Adopt linrtunner as the linting tool - take 2 (#15085 ) ### Description `lintrunner` is a linter runner successfully used by pytorch, onnx and onnx-script. It provides a uniform experience running linters locally and in CI. It supports all major dev systems: Windows, Linux and macOS. The checks are enforced by the `Python format` workflow. This PR adopts `lintrunner` to onnxruntime and fixed ~2000 flake8 errors in Python code. `lintrunner` now runs all required python lints including `ruff`(replacing `flake8`), `black` and `isort`. Future lints like `clang-format` can be added. Most errors are auto-fixed by `ruff` and the fixes should be considered robust. Lints that are more complicated to fix are applied `# noqa` for now and should be fixed in follow up PRs. ### Notable changes 1. This PR removed some suboptimal patterns: - `not xxx in` -> `xxx not in` membership checks - bare excepts (`except:` -> `except Exception`) - unused imports The follow up PR will remove: - `import *` - mutable values as default in function definitions (`def func(a=[])`) - more unused imports - unused local variables 2. Use `ruff` to replace `flake8`. `ruff` is much (40x) faster than flake8 and is more robust. We are using it successfully in onnx and onnx-script. It also supports auto-fixing many flake8 errors. 3. Removed the legacy flake8 ci flow and updated docs. 4. The added workflow supports SARIF code scanning reports on github, example snapshot: ![image](https://user-images.githubusercontent.com/11205048/212598953-d60ce8a9-f242-4fa8-8674-8696b704604a.png) 5. Removed `onnxruntime-python-checks-ci-pipeline` as redundant ### Motivation and Context Unified linting experience in CI and local. Replacing https://github.com/microsoft/onnxruntime/pull/14306 --------- Signed-off-by: Justin Chu <justinchu@microsoft.com>	2023-03-24 15:29:03 -07:00
Dmitri Smirnov	2de15c5d50	Re-work OrtApi struct to satisfy C++20 compilers (#15183 ) ### Description Remove `deletion` of copy functions from `OrtApi` as its initialization no longer compiles in C++20. Introduce a non-copyable member to implicitly delete copy ctor. ### Motivation and Context Inspired by https://github.com/microsoft/onnxruntime/pull/14901 Solution credits: @RyanUnderhill Cc: @georgthegreat	2023-03-24 13:52:17 -07:00
Justin Stoecker	dc87691000	Enable DML graph fusion independently of graph optimization level (#15172 ) ### Description Apply the DML graph fusion transformer optimization independently of ORT graph optimization level. ### Motivation and Context The DML graph fusion transformer is not a graph optimizer in the normal sense: it isn't optimizing the ONNX graph structure, but rather fusing nodes into what will later become a single IDMLCompiledOperator (using IDMLDevice1::CompileGraph). This transformer can't be done ahead of time (hence why it's disabled if saving an optimized model), but it's also gated by the ORT graph optimization level; this makes it impossible to preoptimize ONNX models ("offline mode") and then later disable graph optimizations for better startup performance ("online mode") while benefiting from DML graph fusion.	2023-03-24 13:50:17 -07:00
PeixuanZuo	56bccac35d	[ROCm] update bert-L convergence reference file to fix CI (#15200 ) The change to layernorm led to a change in the bert-L convergence result.	2023-03-24 21:43:44 +08:00
PeixuanZuo	7eb6dbe7d8	[ROCm] Add compute type for Skiplayernorm to fix ROCm CI (#15192 ) - Add compute type for Skiplayernorm to fix ROCm CI and get more accurate results. SkipLayerNorm: type T: input, skip, bias type U: epsilon, compute result type V: output, beta, gamma - refactor the usage of aligned_vector, reduce the usage of `reinterpret_cast`.	2023-03-24 19:31:14 +08:00
Patrice Vignola	3a4c895765	[DML EP] Add support for SkipLayerNorm's fourth output (#15160 )	2023-03-23 23:47:21 -07:00
Ye Wang	0402f930f2	exclude decoder files in hipify.cmake (#15188 )	2023-03-23 22:40:06 -07:00
Yi Zhang	5c5c345abc	Add smoking tests for all CPU Packages. (#15153 ) ### Description So far, 2 packages are not supported. 1. Mac silicon, because there isn't Mac silicon agent in Azure. 2. Linux ARM64, because there isn't microsoft-hosted Linux ARM64 agent in ADO and UsePythonVersion isn't supported in self-hosted Linux ARM pool. Test Run: https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=291132&view=logs&j=3a60a0ba-1640-5a1c-2d51-19af647b2d6b	2023-03-24 12:30:05 +08:00
Yi Zhang	338e6672dd	use build.sourceversion in cache image key (#15019 ) ### Description Use build.sourceversion in docker image cache key. ### Motivation and Context We used the file path as the cache key in #14496. In most cases, the docker base image tag is latest, so the hash of the files does not reflect changes to the base image. As a result, the docker image is restored from the cache but is still rebuilt. The maintenance cost would be huge if we pinned the image hash in the docker file. For example, https://quay.io/repository/pypa/manylinux2014_x86_64?tab=tags&tag=latest is updated almost every week. So far, build.sourceversion is the right way to keep the cache updated and valid.	2023-03-24 10:01:22 +08:00
Scott McKay	ea245c94e7	Add constant folding for simple QDQ Node Units (#15138 ) ### Description Currently we bail on constant folding if QDQ is enabled and we hit a DQ node. However, if we have a simple DQ -> X -> Q node unit where the DQ and X do not produce graph outputs, their output only has one consumer, and X is deterministic, we can constant fold all three nodes. Add support for this simple scenario primarily to constant fold a QDQ model that has had initializers updated by layout transformation, which results in patterns like `initializer -> DQ -> Transpose -> Q` or `initializer -> DQ -> Unsqueeze -> Q -> DQ -> Transpose -> Q` if the initializer is broadcast. ### Motivation and Context Improve end result of layout transformation on a QDQ model.	2023-03-24 08:46:07 +10:00
dependabot[bot]	0200995058	Bump webpack from 5.75.0 to 5.76.0 in /js (#15159 ) Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.76.0. Release notes for v5.76.0 (sourced from webpack's releases): Bugfixes: avoid cross-realm object access (webpack/webpack#16500); improve hash performance via conditional initialization (webpack/webpack#16491); serialize `generatedCode` info to fix a bug in asset module cache restoration (webpack/webpack#16703); improve performance of `hashRegExp` lookup (webpack/webpack#16759). Features: add `target` to `LoaderContext` type (webpack/webpack#16781). Security: CVE-2022-37603 fixed (webpack/webpack#16446). Repo changes: fix HTML5 logo in README (webpack/webpack#16614); replace TypeScript logo in README (webpack/webpack#16613); update actions/cache dependencies (webpack/webpack#16493). Full changelog: https://github.com/webpack/webpack/compare/v5.75.0...v5.76.0 Maintainer changes: this version was pushed to npm by evilebottnawi, a new releaser for webpack since the previous version. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-03-23 15:17:52 -07:00
Nat Kershaw (MSFT)	28f64066de	Auto deploy API docs (#15088 )	2023-03-23 15:08:49 -07:00
Ye Wang	44ba23e0f5	Rename DecoderMaskedMHA to DecoderMaskedSelfAttn (#15166 ) ### Description As synced offline, rename this op and will create another op for mha that supports both self and cross attention. --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>	2023-03-23 12:31:38 -07:00
Ye Wang	2ee822d483	Extend memory efficient attention coverage in Attention/MHA cuda op (#15064 ) ### Description 1. Upgrade cutlass to 3.0, which contains attn_bias support. 2. Extend Attention/MHA to use memory efficient attention when rel_pos_bias with [1, num_head, s, s] and 1d mask with [2 batch_size + 1] are present. New mask format introduced: MASK_1D_KEY_SEQ_LEN_START, [3 * batch_size + 2] with [key_len[0], ..., key_len[batch_size - 1], query_start[0], ..., query_start[batch_size - 1], query_end[batch_size - 1], key_start[0], ..., key_start[batch_size - 1], key_end[batch_size - 1]]. For example, the 2D mask [[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]] converts to the 1D mask [3, 5, 0, 6, 12, 0, 6, 12]. ### Motivation and Context It potentially benefits tnlrv6 and t5(encoder) --------- Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net> Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com> Co-authored-by: Kunal Vaishnavi <kvaishnavi@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2023-03-23 11:05:17 -07:00
Hariharan Seshadri	7033346605	Support mask_filter_value attribute in DecoderMaskedMultiheadAttention (#15158 )	2023-03-23 11:00:09 -07:00
Zhang Lei	910fc09de2	Using standard layernorm cuda kernel for skiplayernorm. (#15076 ) * The current SkipLayerNorm did not use a stable algorithm, which caused correctness issues. * Enrich the existing layernorm kernel to accept bias and residual. * Tune standard layernorm threads.y according to element count and device properties. * Remove the existing skiplayernorm cuda implementation.	2023-03-23 10:04:22 -07:00
Tianlei Wu	88a66a289b	Fix prune_graph and gpt attention fusion scripts (#15147 ) Fix two issues: (1) GPT attention fusion: get_parent could return None when the input is initializer, add a check (2) ONNX node could have optional inputs and outputs. During prune_graph, we shall exclude empty inputs/outputs. Here we exclude "" from output_name_to_node and input_name_to_nodes. Add an option allow_remove_graph_inputs in prune_graph	2023-03-23 09:45:16 -07:00
Faith Xu	b82f94ac2e	Update labeler.yml for web (#15142 ) ### Description Adds a few additional tags for web --------- Co-authored-by: Nat Kershaw (MSFT) <nakersha@microsoft.com>	2023-03-23 09:39:04 -07:00
Boris Fomitchev	559a21c7c3	Fixing CUDA12 build (#15135 ) Removing flags for CUDA architectures not supported in CUDA12 SDK Adding build flags for Hopper architecture, supported in CUDA12 SDK.	2023-03-23 09:36:51 -07:00
G. Ramalingam	efa1262614	Handle unused function inputs (#15130 ) ### Description Fix issue relating to unused inputs of model-local functions. ORT creates a schema for all such functions. The creation of this schema does not handle unused function-inputs. The schema-creation relies on the use of the function-inputs to infer type-constraints for the input, and the code ends up creating an erroneous input-descriptor when there is no use of the function-input. The fix is to create an input with the given name, with a type-constraint that allows all types. ### Motivation and Context Fix https://github.com/microsoft/onnxruntime/issues/15046 Fix https://github.com/microsoft/onnx-script/issues/524 --------- Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Co-authored-by: Scott McKay <skottmckay@gmail.com>	2023-03-23 08:12:46 -07:00
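
The `MASK_1D_KEY_SEQ_LEN_START` layout described in commit 2ee822d483 (#15064) can be illustrated with a small sketch. This is not code from the PR: the helper name `to_mask_1d_key_seq_len_start` is hypothetical, and it assumes right-padded masks, equal query and key sequence lengths, and start offsets that are multiples of the padded sequence length, which is what the example in the commit message suggests.

```python
from typing import List


def to_mask_1d_key_seq_len_start(mask_2d: List[List[int]]) -> List[int]:
    """Hypothetical helper: pack a right-padded 2D mask into the 1D layout
    [key_len..., query_start..., query_end, key_start..., key_end]."""
    batch_size = len(mask_2d)
    seq_len = len(mask_2d[0])

    # key_len[i]: number of unmasked (1) positions in row i.
    key_len = [sum(row) for row in mask_2d]

    # Assumption: each row occupies a full padded slot, so row i starts at
    # offset i * seq_len and the last row ends at batch_size * seq_len.
    starts = [i * seq_len for i in range(batch_size)]
    end = batch_size * seq_len

    # Assumption: query and key share the same offsets (self-attention case).
    return key_len + starts + [end] + starts + [end]


# The 2D mask from the commit message.
packed = to_mask_1d_key_seq_len_start([[1, 1, 1, 0, 0, 0], [1, 1, 1, 1, 1, 0]])
print(packed)  # [3, 5, 0, 6, 12, 0, 6, 12]
```

The result has `3 * batch_size + 2` entries and matches the 1D mask quoted in the commit message for that input.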

8429 commits