onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-14 20:48:00 +00:00

Author	SHA1	Message	Date
Tianlei Wu	0bb4ea6797	Update BiasGelu fusion and related ops (#23518 ) ### Description (1) Update BiasGelu fusion to support onnx Gelu-20 Since onnx Gelu-20 supports float/double/bf16/fp16, here we update related ops to support these data types in CUDA and ROCm execution providers: (2) Add double support for Gelu/FastGelu op in CUDA/ROCm execution provider (3) Add BFloat16 support for Gelu ops in CUDA execution provider (4) Add unit tests (5) Update operator documents ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/23491	2025-01-30 22:53:59 -08:00
Alexis Tsogias	e20b529a32	Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs (#23090 ) * [CPU EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8, (u)int16, uint32 and uint64. * [CPU EP] Implement Neg unary operation for int16 * [CUDA EP] Implement Add/Sub/Mul/Div element wise operations for (u)int8 and (u)int16 ### Motivation and Context This solves https://github.com/microsoft/onnxruntime/issues/23051	2025-01-20 16:46:45 -08:00
Justin Chu	ad312d9677	Enable comprehension simplification in ruff rules (#23414 ) Enable comprehension simplification rules (C4) for ruff and apply autofix.	2025-01-17 08:43:06 -08:00
Ti-Tai Wang	a08211febb	Register opset 22 (#23344 ) ### Description Follow-up to #21897. To be compatible with ONNX 1.17.0, registering opset 22 is required because of the [updated operators (bfloat16)](https://github.com/onnx/onnx/releases/tag/v1.17.0) ### Motivation and Context Fix #23162 Fix #23161 Fix #23164 (Xnnpack) ### Remaining issue #23163 (QNN) See [the file](https://github.com/microsoft/onnxruntime/pull/23344/files#diff-04f5d6db0a6873f7299ed06ff1ec45a49e69f0865cb32f4397cd56db0cd0a784) ### Result of `find_optimizer_opset_version_updates_required.py (cpu only)` ``` [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_add_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.IsInf. Latest:20 Optimizer support ends at 10. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.HardSigmoid. Latest:22 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_add_act_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. 
File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.AveragePool. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.BatchNormalization. Latest:15 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Upsample. Latest:10 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Resize. Latest:19 Optimizer support ends at 13. 
File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.GlobalMaxPool. Latest:22 Optimizer support ends at 1. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.GlobalAveragePool. Latest:22 Optimizer support ends at 1. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc [WARNING] - Newer opset found for kOnnxDomain.Shape. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pre_shape_node_elimination.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_bn_fusion.cc [ERROR] - Call/Declaration is split over multiple lines. Please check manually.File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/label_encoder_fusion.cc Line:49 [ERROR] - Failed to find version information for "ai.onnx.ml".LabelEncoder. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/label_encoder_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.HardSigmoid. Latest:22 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_activation_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Dropout. Latest:22 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/dropout_elimination.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/gemm_transpose_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/gemm_transpose_fusion.cc [ERROR] - Symbolic name of 'ignorable_nodes[index].first' found for op. Please check manually. 
File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/matmul_bn_fusion.cc [ERROR] - Symbolic name of 'dest.first' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/matmul_bn_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.AveragePool. Latest:22 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Pad. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Dropout. Latest:22 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/bias_dropout_fusion.cc [ERROR] - Failed to find version information for kMSDomain.BitmaskDropout. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/bias_dropout_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Clip. Latest:13 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/relu_clip_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/fast_gelu_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/fast_gelu_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Reshape. 
Latest:21 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/reshape_fusion.cc [ERROR] - Failed to find version information for kMSDomain.ConcatTraining. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/reshape_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Where. Latest:16 Optimizer support ends at 9. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/not_where_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Where. Latest:16 Optimizer support ends at 9. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/not_where_fusion.cc [WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_mul_fusion.cc [ERROR] - Symbolic name of 'QOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Symbolic name of 'QOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Symbolic name of 'DQOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Symbolic name of 'DQOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc [ERROR] - Call/Declaration is split over multiple lines. Please check manually.File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc Line:170 [WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_propagation.cc [ERROR] - Symbolic name of 'current_node.OpType(' found for op. Please check manually. 
File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/compute_optimizer/upstream_transformer_base.cc [WARNING] - Newer opset found for kOnnxDomain.Reshape. Latest:21 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/compute_optimizer/upstream_reshape.cc [WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/attention_fusion_helper.h ```	2025-01-16 11:26:34 -08:00
Justin Chu	c7c8757a1c	Use ruff as the formatter to replace black-isort (#23397 ) Use ruff as the code formatter in place of black and isort since it is much faster, and as projects like PyTorch and ONNX have adopted ruff format as well. This PR includes only auto-fixed formatting changes.	2025-01-16 11:14:15 -08:00
Yi-Hong Lyu	e51bcfb541	Implement DepthToSpace uint8_t and Enable DropQDQNodesRules (#23352 ) ### Description - Implemented the DepthToSpace uint8_t kernel. - Enabled DropQDQNodesRules for DepthToSpace. - Added unit tests for the DepthToSpace uint8_t kernel. ### Motivation and Context This commit aims to enhance the performance of the Image Super-Resolution INT8 Model (RFDN). Specifically, it improves the Inference Per Second (IPS) by 25%, providing a significant boost in efficiency and speed.	2025-01-15 19:24:50 -08:00
dependabot[bot]	1461a16e71	Bump ruff from 0.5.4 to 0.9.1 (#23328 ) Bumps [ruff](https://github.com/astral-sh/ruff) from 0.5.4 to 0.9.1. Upstream commits are viewable in the [compare view](https://github.com/astral-sh/ruff/compare/0.5.4...0.9.1). --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com> Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>	2025-01-15 11:11:17 -08:00
mingyue	4aca8f33df	[Bug Fix] Missing CustomOp SchemaRegister when generating EPContext ONNX model (#23091 ) ### Description Enhancements to EPContext Operations: 1. Introduced support for the bfloat16 data type in EPContext operations. 2. Bug Fix: Missing Custom OP Schema Registration when generating EPContext ONNX model --------- Co-authored-by: mingyue <mingyue@xilinx.com> Co-authored-by: Hector Li <hecli@microsoft.com>	2024-12-19 16:47:13 -08:00
Tianlei Wu	5afab787db	Update python version metadata (remove 3.7, 3.8, 3.9; add 3.13). (#23067 ) ### Description * Update python version metadata to be in sync with latest python packages (onnxruntime, onnxruntime-gpu and onnxruntime-qnn). * Update black format target-version to 3.10, and use lintrunner to format all files. * Update the lintrunner installation command line to be consistent. * Include `requirements-lintrunner.txt` in `requirements-dev.txt` to avoid duplicated settings. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/22993 Python support by numpy: https://numpy.org/neps/nep-0029-deprecation_policy.html#drop-schedule ``` On Apr 05, 2024 drop support for Python 3.9 On Apr 04, 2025 drop support for Python 3.10 ```	2024-12-17 10:59:20 -08:00
Hector Li	401d16c671	Enable QNN HTP spill fill buffer setting to save RAM usage. (#22853 ) ### Description Enable QNN HTP spill fill buffer setting to save RAM usage. This feature is available after QNN 2.28. The QNN context binary needs to be regenerated. https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_backend.html#qnn-htp-backend-api Requirements: 1. Regenerate the ONNX model with the QNN context binary by setting the EP option enable_htp_spill_fill_buffer = 1. 2. Works for a model with multiple context binaries. The 2 ONNX models with context binaries need to be merged manually into 1 ONNX model. 3. Requires a Linux platform if the context binary is generated offline, since the QnnSystem lib is not available for the Windows x86_64 platform. Nothing extra is needed when running model inference. The generated EPContext node will have a max_size attribute with the maximum spill fill buffer size for the context binary. --------- Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2024-12-06 11:36:52 -08:00
Yulong Wang	7b0fa407eb	fix requirements.txt path (#22946 ) ### Description #22380 removed the file `tools/ci_build/github/linux/docker/inference/x86_64/python/cpu/scripts/requirements.txt`, but it is still used in `dockerfiles/Dockerfile.cuda`. This change updates the requirements.txt file path. Fixes #22945.	2024-12-04 13:08:29 -08:00
Xavier Dupré	a2ba3cb547	Implementation of TreeEnsemble ai.onnx.ml==5 (#22333 ) ### Description Merges PR #21851, #21222. Implements TreeEnsemble from ai.onnx.ml==5 (CPU). --------- Co-authored-by: Bilyana Indzheva <bilyana2002@gmail.com> Co-authored-by: Bilyana Indzheva <36890669+bili2002@users.noreply.github.com> Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>	2024-11-22 19:48:23 +01:00
Changming Sun	13346fdf18	Cleanup code (#22827 ) ### Description 1. Delete TVM EP because it is no longer maintained. 2. Delete ortmodule-related Docker files and scripts.	2024-11-19 14:13:33 -08:00
dtang317	12dfe2859c	Register groupnorm for opset 21 (#22830 ) ### Description This PR registers GroupNormalization for opset 21 ### Motivation and Context	2024-11-14 10:06:30 -08:00
dtang317	9836ef1c89	register Identity and QLinearMatmul for opset21 (#22804 ) ### Description This PR registers the following opset 21 operators: - Identity-21 - QLinearMatMul-21 ### Motivation and Context	2024-11-12 09:36:19 -08:00
Tianlei Wu	72186bbb71	[CUDA] Build nhwc ops by default (#22648 ) ### Description * Build CUDA NHWC ops by default. * Deprecate `--enable_cuda_nhwc_ops` in build.py and add a `--disable_cuda_nhwc_ops` option. Note that this requires cuDNN 9.x. If you build with cuDNN 8, NHWC ops will be disabled automatically. ### Motivation and Context In general, NHWC is faster than NCHW for convolution on Nvidia GPUs with Tensor Cores, and this could improve performance for vision models. This is the first step toward preferring NHWC for CUDA in the 1.21 release. The next step is to run tests on popular vision models. If it helps on most models and devices, set `prefer_nhwc=1` as the default CUDA provider option.	2024-11-06 09:54:55 -08:00
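The commit above mentions the `prefer_nhwc` CUDA provider option. A minimal sketch of how one might opt in from the Python API is shown below, assuming an onnxruntime-gpu build that includes the NHWC ops. The helper names `cuda_providers` and `create_session` and the model path are hypothetical and introduced only for illustration.

```python
from typing import Any, List


def cuda_providers(prefer_nhwc: bool = True) -> List[Any]:
    """Build a providers list that asks the CUDA EP to prefer NHWC layout.

    Hypothetical helper: only the option name `prefer_nhwc` comes from the
    commit message; the CPU provider is listed as a fallback.
    """
    options = {"prefer_nhwc": "1" if prefer_nhwc else "0"}
    return [("CUDAExecutionProvider", options), "CPUExecutionProvider"]


def create_session(model_path: str, prefer_nhwc: bool = True):
    """Create an inference session with the providers above.

    onnxruntime is imported lazily so the helper above can be used
    (and tested) on machines without the package installed.
    """
    import onnxruntime as ort

    return ort.InferenceSession(model_path, providers=cuda_providers(prefer_nhwc))


if __name__ == "__main__":
    # Only prints the configuration; "model.onnx" would be a placeholder path.
    print(cuda_providers())
```

Whether NHWC actually helps depends on the model and device, which is why the commit describes it as something still to be evaluated before becoming the default.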
Tianlei Wu	ba22d7879a	[CUDA/ROCm] Conditionally support ArgMax and ArgMin for opset 12 and above (#22713 ) ### Description Based on https://github.com/microsoft/onnxruntime/pull/9700, extended to ArgMin as well. This pull request introduces several enhancements and fixes related to the `ArgMax` and `ArgMin` operators in the CUDA execution provider. The changes ensure proper handling of these operators across different versions and improve kernel registration and fallback mechanisms. Key changes include: #### Enhancements to `ArgMax` and `ArgMin` Operators: * Added new kernel class registrations for `ArgMax` and `ArgMin` for different data types and versions in `onnxruntime/core/providers/cuda/cuda_execution_provider.cc`. * Introduced `ArgMaxOrArgMinNeedFallbackToCPU` function to handle fallback to CPU when the `select_last_index` attribute is set to 1, as CUDA does not support this attribute. #### Macro and Kernel Registration Improvements: * Replaced `REGISTER_KERNEL_UNTIL_VERSIONED_TYPED` with `REGISTER_KERNEL_VERSIONED_RANGE_TYPED` and `REGISTER_KERNEL_VERSIONED_SINCE_TYPED` macros for better version handling. * Updated kernel registration for `ArgMax` and `ArgMin` to use the new macros, ensuring proper version handling and support for different data types. #### Safety Checks: * Added safety checks in the `ArgMax` and `ArgMin` classes to ensure `select_last_index` is not set to 1, as it is not supported on CUDA. #### Testing Enhancements: * Added new tests for `ArgMax` and `ArgMin` operators to verify behavior when `select_last_index` is set to 0, ensuring compatibility with both CPU and CUDA execution providers. ### Motivation and Context Improve CUDA kernel coverage for stable diffusion model and hence improve its performance on CUDA	2024-11-06 09:54:32 -08:00
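To illustrate why `select_last_index` matters for the fallback described in the commit above, here is a small pure-Python sketch. `arg_max` shows how the attribute changes the result when the maximum is tied, and `needs_cpu_fallback` is a hypothetical stand-in for the kind of check the commit describes; it is not the actual `ArgMaxOrArgMinNeedFallbackToCPU` implementation.

```python
from typing import Dict, Sequence


def arg_max(values: Sequence[float], select_last_index: bool = False) -> int:
    """Return the index of the maximum, resolving ties per select_last_index."""
    best = max(values)
    indices = [i for i, v in enumerate(values) if v == best]
    return indices[-1] if select_last_index else indices[0]


def needs_cpu_fallback(op_type: str, attributes: Dict[str, int]) -> bool:
    """Hypothetical check: route ArgMax/ArgMin to CPU if select_last_index == 1."""
    if op_type not in ("ArgMax", "ArgMin"):
        return False
    # The attribute is treated as 0 when absent.
    return attributes.get("select_last_index", 0) == 1


if __name__ == "__main__":
    data = [1.0, 3.0, 2.0, 3.0]  # the maximum 3.0 appears at indices 1 and 3
    print(arg_max(data))                          # first occurrence
    print(arg_max(data, select_last_index=True))  # last occurrence
    print(needs_cpu_fallback("ArgMax", {"select_last_index": 1}))
```

The two modes only differ when the extreme value is repeated, so a kernel that always returns the first occurrence is correct exactly when `select_last_index` is 0.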
Tianlei Wu	120cb5a804	[Doc] Add I/O binding example using onnx data type in python API summary (#22695 ) ### Description Add I/O binding example using onnx data type in python API summary. The API is available since 1.20 release. ### Motivation and Context Follow up of https://github.com/microsoft/onnxruntime/pull/22306 to add some documentation.	2024-11-02 12:51:37 -07:00
dtang317	5b4e2a636b	DML EP Register Opset 21 (#22547 ) ### Description This PR registers the following opset 21 operators: - Size-21 - CastLike-21 - ConstantOfShape-21 - Flatten-21 - Pad-21 - Transpose-21 ### Motivation and Context	2024-10-25 09:21:19 -07:00
Hector Li	fc2be09386	Enable QLinearMatMul for opset21 (#22488 ) ### Description Enable QLinearMatMul for opset21	2024-10-22 14:33:36 -07:00
Akshay Sonawane	e5c2e50849	bumps up version in main from 1.20 -> 1.21 (#22482 ) Bump up version in main from 1.20.0 to 1.21.0 since the release branch has been cut.	2024-10-17 12:32:35 -07:00
mindest	1fa219d7d5	DecoderMaskedMultiHeadAttention CPU kernel. (#22292 ) ### Description DecoderMaskedMultiHeadAttention CPU kernel.	2024-10-12 13:43:00 -07:00
mindest	3c80aa9fee	Add CPU kernels for DynamicTimeWarping and UnfoldTensor. (#22033 ) ### Description Add CPU kernels for DynamicTimeWarping and UnfoldTensor.	2024-10-11 09:44:18 -07:00
kunal-vaishnavi	50bda44a70	Fix equation in MatMulNBits op spec (#22253 ) ### Description This PR fixes an equation in the MatMulNBits op spec. The old formula is stated as ``` [CeilDiv((N * n_blocks_per_col + 1) * bits, 8)] ``` but it should be stated as ``` [N * CeilDiv(n_blocks_per_col * bits, 8)] ``` or as ``` [N * FloorDiv((n_blocks_per_col + 1) * bits, 8)] ``` ### Motivation and Context For models such as ChatGLM where the column size is odd, the division math can be off. For example: ![image_360](https://github.com/user-attachments/assets/a5035bec-4dad-46af-9cb1-24a881eb70a0) With the old equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv((107 + 1) * 4, 8) = 4,096 * CeilDiv(108 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points = 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv((32 + 1) * 4, 8) = 13,696 * CeilDiv(33 * 4, 8) = 13,696 * 17 = 232,832 ``` With the new equation, the projections are calculated as follows. ``` # Down projection B = 4,096 x 107 x 64 zero_points = 221,184 N = 4,096 n_blocks_per_col = 107 4,096 * CeilDiv(107 * 4, 8) = 4,096 * 54 = 221,184 # Up projection B = 13,696 x 32 x 64 zero_points= 219,136 N = 13,696 n_blocks_per_col = 32 13,696 * CeilDiv(32 * 4, 8) = 13,696 * 16 = 219,136 ```	2024-10-01 09:31:56 -07:00
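The arithmetic in the MatMulNBits commit above can be reproduced with a few lines of Python. This is a sketch that follows the worked examples in the commit message (where the old calculation adds 1 to `n_blocks_per_col` before the ceiling division); the function names are illustrative and do not come from the op spec.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return -(-a // b)


def zero_points_size_old(n, n_blocks_per_col, bits):
    # Old calculation, as used in the commit's worked example.
    return n * ceil_div((n_blocks_per_col + 1) * bits, 8)


def zero_points_size_new(n, n_blocks_per_col, bits):
    # Corrected calculation: N * CeilDiv(n_blocks_per_col * bits, 8).
    return n * ceil_div(n_blocks_per_col * bits, 8)


# Down projection: odd block count, both formulas agree.
print(zero_points_size_old(4096, 107, 4), zero_points_size_new(4096, 107, 4))
# Up projection: even block count, the old formula over-counts.
print(zero_points_size_old(13696, 32, 4), zero_points_size_new(13696, 32, 4))
```

With 4-bit values, two zero points fit in one byte, so the extra `+ 1` only changes the result when `n_blocks_per_col` is even. That is why the down projection (107 blocks) matches under both formulas and the up projection (32 blocks) does not.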
Patrice Vignola	20be51525b	Support if node with sequence outputs (#22234 ) `If` nodes can have sequence outputs. Those nodes are mapped to the DML EP to be able to keep the outputs on the GPU, but they actually execute on the CPU by selecting either the `then` subgraph or the `else` subgraph.	2024-09-27 12:40:01 -07:00
amarin16	eb2506d77a	Add MLFloat16 support for LayerNormalization, SkipLayerNormalization (#22063 ) Add `MLFloat16` support for: - `LayerNormalization` - `SimplifiedLayerNormalization` - `SkipLayerNormalization` - `SkipSimplifiedLayerNormalization` There are existing `LayerNormTest` unit tests that cover the `MLFloat16` functionality for `LayerNormalization` once `MLFloat16` is registered (for example [`LayerNormTest.LayerNorm_Scale_Float16Input`](`91c916f9c6/onnxruntime/test/contrib_ops/layer_norm_op_test.cc (L112)`)). Similarly, there are unit tests such as [`SkipLayerNormTest.SkipLayerNormBatch1_Float16`](`91c916f9c6/onnxruntime/test/contrib_ops/skiplayernorm_op_test.cc (L255)`) that cover MLFloat16 inputs for `SkipLayerNormalization`.	2024-09-24 15:06:27 -07:00
Ye Wang	6cc06ad069	GQA MLFloat16 cpu (#22102 )	2024-09-24 09:51:59 -07:00
Tianlei Wu	0806879ad4	Update lintrunner requirements (#22185 ) ### Description * Add lintrunner to requirements-lintrunner.txt * Lock lintrunner and lintrunner-adapter version * Update documentation ### Motivation and Context The document is not up to date.	2024-09-23 18:27:16 -07:00
Christian Bourjau	1a84f53c35	Make argmin/argmax support identical data types and add int64 support (#21641 )	2024-09-23 13:02:29 -07:00
liqun Fu	a89bddd5c2	Matmul_nbits kernel for mlas sqnbits to support Fp16 inputs (#21807 )	2024-09-13 14:55:08 -07:00
aciddelgado	7e2c722459	Add Continuous Decoding support in GQA (#21523 ) ### Description This PR will add support for Continuous Decoding for batch_size = 1 input. From now on, GQA can take arbitrary length input using seqlens_k as total_sequence_length - 1 and the sequence length of qkv as new_sequence_length. This change will not affect the default behavior of GQA ### Motivation and Context Prior to this change it was impossible to support sequence_length > 1 inputs when past context was given. This use case is essential to making continuous decoding work, which is one of our current efforts in ORT-GenAI.	2024-09-13 13:21:11 -07:00
aciddelgado	509cb54d6f	softcap gqa (#21683 ) ### Description Implement softcap for gqa. ### Motivation and Context Fixes certain models like Gemma-2 which need softcap to work so they don't output nan's.	2024-08-30 19:11:04 -07:00
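The softcap commit above does not spell out the formula. A common definition, and the one assumed here, is the tanh-based soft cap used by Gemma-2 style models, `softcap * tanh(score / softcap)`, applied to attention scores before softmax. The sketch below illustrates that assumed formula only; it does not reproduce the GQA kernel, and `apply_softcap` is a hypothetical name.

```python
import math


def apply_softcap(score, softcap):
    """Squash an attention score smoothly into (-softcap, softcap).

    Assumption: softcap > 0 and the cap is softcap * tanh(score / softcap).
    """
    if softcap <= 0:
        raise ValueError("softcap must be positive")
    return softcap * math.tanh(score / softcap)


for raw in (0.0, 10.0, 1e6):
    print(raw, apply_softcap(raw, 50.0))
```

Because `tanh` is bounded, arbitrarily large raw scores stay within the cap, which is consistent with the commit's motivation of avoiding NaN outputs.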
Jing Fang	5dee95fa10	[CUDA] Support CUDA EP blocked quantization in Q/DQ ops. (#21846 ) ### Description 1. Added CUDA EP support for blocked quantization in QuantizeLinear and DequantizeLinear ops. 2. Currently CUDA EP blocked quantization only supports int4/uint4 quantized types and float32/float16 unquantized types. 3. Added CUDA EP support in QDQ selector/action transformer. CUDA EP is only added to the DQ + MatMul -> MatMulNBits rule. EP support for the other rules is unchanged. ### Motivation and Context ONNX opset 21 introduced blocked quantization for Q/DQ ops. ORT originally supported blocked quantization only on the CPU EP.	2024-08-30 18:28:00 -07:00
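To make "blocked quantization" concrete: instead of one scale and zero point per tensor or per axis, each contiguous block of `block_size` elements along the quantization axis gets its own pair. The following is a minimal 1-D sketch in plain Python, assuming the usual `(q - zero_point) * scale` dequantization rule. The function name and argument layout are illustrative and do not match the CUDA kernel.

```python
def dequantize_blocked(quantized, scales, zero_points, block_size):
    """Dequantize a 1-D list where every block_size elements share a scale/zero point."""
    n_blocks = -(-len(quantized) // block_size)  # ceiling division
    if len(scales) != n_blocks or len(zero_points) != n_blocks:
        raise ValueError("need exactly one scale and zero point per block")
    result = []
    for index, q in enumerate(quantized):
        block = index // block_size  # which block this element belongs to
        result.append((q - zero_points[block]) * scales[block])
    return result


# Two blocks of two elements each, with different scales and zero points.
print(dequantize_blocked([0, 1, 2, 3], [0.5, 2.0], [0, 2], block_size=2))
```

Smaller blocks track local value ranges more closely at the cost of storing more scales, which is the trade-off behind int4/uint4 weight formats.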
Ye Wang	1d059b8702	Phi3 MoE cuda kernel (#21819 )	2024-08-27 09:21:30 -07:00
Tianlei Wu	6e57576988	Support Smooth Softmax in GroupQueryAttention (#21867 ) ### Description Softmax (formula 1) is defined as follows: ```math y_{i} = \frac{exp(x_{i})}{\sum_{j} exp(x_{j})} ``` After applying softmax, each element is in the range $(0, 1)$ and the elements add up to 1, so they can be interpreted as probabilities. However, in language models, softmax has two issues: * When all elements are -inf (for example, a whole row is masked when a query token is padding), the result is undefined, since exp(-inf)=0 and the formula above divides by zero. * Why should we normalize so that every query word is treated as equally important (each row sums to 1)? Smooth Softmax (formula 2) is a modified version that introduces a smoothing term in the denominator: ```math s_{i} = \frac{exp(x_{i})}{1+ \sum_{j} exp(x_{j})} ``` This formula addresses both issues: * It handles the special case where all elements are -inf: the result $s_{i}$ is 0 for every element. * The sum of all elements $\sum_{i}{s_{i}} = \frac{\sum_{i}{exp(x_{i})}}{1+ \sum_{i} exp(x_{i})}$ is in the range $[0, 1)$, so the model can be trained to assign different importance to different query words. Since the exponential is prone to overflow or underflow, formula 3 can be used to get a stable result: ```math s_{i} = \frac{exp(x_{i} + c)}{exp(c)+ \sum_{j} exp(x_{j} +c)} ``` In theory, c can be any value. In practice, the constant c should be chosen so that $exp(c)$ and $exp(x_{i} +c)$ do not overflow (or underflow) at the same time. A reasonable choice is formula 4: ```math c=-\max_{i} \{ x_i \} ``` or, with the added constraint c <= 0, formula 5: ```math c=-\max(0, \max_{i} \{ x_i \}) ``` Formula 5 ensures that $s_{i}$ falls back to formula 2 when all elements are negative. For the CPU provider, smooth softmax is implemented in MLAS. The CPU implementation uses formula 5. 
@wangyems implemented the smooth softmax in flash attention for CUDA, which requires Ampere or newer GPU. The implementation of smooth softmax in flash attention uses formula 4. --------- Co-authored-by: Ye Wang	2024-08-26 23:13:15 -07:00
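The formulas in the Smooth Softmax commit above translate directly into a few lines of Python. The sketch below implements formula 3 with the constant from formula 5 for a single row; it is an illustration of the math, not the MLAS or flash attention code, and `smooth_softmax` is a name chosen here. It assumes the inputs are finite or -inf (a +inf input is not handled).

```python
import math


def smooth_softmax(row):
    """Smooth softmax of one row: exp(x_i + c) / (exp(c) + sum_j exp(x_j + c))."""
    if not row:
        return []
    # Formula 5: c = -max(0, max_i x_i), so c <= 0 and no exponent is positive.
    c = -max(0.0, max(row))
    shifted = [math.exp(x + c) for x in row]
    denominator = math.exp(c) + sum(shifted)
    return [value / denominator for value in shifted]


fully_masked = smooth_softmax([-math.inf, -math.inf])
print(fully_masked)                  # every element is 0, no division by zero
print(smooth_softmax([0.0, 0.0]))    # the row sums to less than 1
print(smooth_softmax([1000.0, 1000.0]))  # large inputs do not overflow
```

In the fully masked case the `exp(c)` term keeps the denominator at 1, which is exactly what the plain softmax of formula 1 lacks.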
Patrice Vignola	de6ebcbb54	[DML] Add int4 QDQ (#21592 )	2024-08-20 23:44:58 -07:00
Yi Zhang	9f7e19cedd	[Fix] Make python API doc generation in Microsoft-hosted Agent (#21766 ) ### Motivation and Context 1. The Python API doc needs to be merged from a fork, but the 1ES self-hosted pool works for only one GitHub repo. 2. ubuntu-latest installs numpy 2.0 or above by default, and the current Python API doc generation doesn't support it, so numpy is pinned to < 2.0.0.	2024-08-20 23:32:38 +08:00
Tianlei Wu	d79e3c5791	Extend Attention Bias Broadcast Support (#21710 ) ### Description Previously, MultiHeadAttention supports relative position bias of shape [1, N, S, T] or [B, N, S, T], and DecoderMaskedMultiHeadAttention supports [1, N, S, T]. This will extend the support to allow [1, N, S, T], [B, N, S, T], [B, 1, S, T] and [1, 1, S, T] for CUDA and CPU EPs. - [x] Rename the input of "relative position bias" to "attention bias" because it can also be used for other types of bias, like ALiBi (Attention with Linear Biases) or attention mask. - [x] Update unfused kernel to support broadcasting 2nd dimension of attention bias. - [x] Update efficient attention to support broadcasting 2nd dimension of attention bias. - [x] Update operators (MultiHeadAttention, DecoderMaskedMultiHeadAttention, Attention, PackedAttention, PackedMultiHeadAttention) to support broadcast attention bias on CUDA and CPU EPs. - [x] Update ROCm, DML and WebGPU naming to be consistent. (Note that those EPs do not support broadcasting attention_bias for now). - [x] Add attention bias tests for MultiHeadAttention. - [x] Update operator documents - [x] Update benchmark script Other changes: * Fix some checks in multihead-attention.ts * Add helper functions to dump tensors given dimensions.	2024-08-16 15:40:04 -07:00
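The four attention bias shapes listed in the commit above all follow one rule: the first two dimensions may each be either the full size or 1 (broadcast), while the last two must match exactly. Below is a small sketch of that rule as a shape check. The function name is hypothetical, and reading B, N, S, T as batch size, number of heads, sequence length and total sequence length is an assumption, since the commit message does not define them.

```python
def attention_bias_shape_is_supported(shape, batch_size, num_heads, seq_len, total_seq_len):
    """Return True if `shape` is [B or 1, N or 1, S, T] for the given sizes."""
    if len(shape) != 4:
        return False
    b, n, s, t = shape
    return (
        b in (1, batch_size)      # broadcast over batch allowed
        and n in (1, num_heads)   # broadcast over heads allowed
        and s == seq_len
        and t == total_seq_len
    )


B, N, S, T = 2, 8, 16, 32
for candidate in ([1, N, S, T], [B, N, S, T], [B, 1, S, T], [1, 1, S, T], [B, N, 1, T]):
    print(candidate, attention_bias_shape_is_supported(candidate, B, N, S, T))
```

Broadcasting the head dimension lets a single mask-like bias be shared by every head instead of being materialized N times.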
Yi Zhang	b92908e197	[Fix] Python API doc generation (#21717 ) ### Motivation and Context Make Python API doc generation workflow work. ### Verification Run https://github.com/microsoft/onnxruntime/actions/runs/10364762858	2024-08-14 08:48:29 +08:00
Jing Fang	f30581ed2c	[CPU EP] Add block quantized Gather contrib op (#21630 ) ### Description Add a gather that supports block-quantized input data. ### Motivation and Context To support Web inference scenario with quantized vocabulary embeddings.	2024-08-09 12:15:11 -07:00
Edward Chen	a5ce65d87a	Clean up some mobile package related files and their usages. (#21606 ) The mobile packages have been removed.	2024-08-05 16:38:20 -07:00
Prathik Rao	134f47743e	bumps up version in main from 1.19 -> 1.20 (#21588 ) Bump up version in main from 1.19.0 to 1.20.0 since the release branch has been cut.	2024-08-05 15:46:04 -07:00
Atanas Dimitrov	d0a6f57d74	Add reduce kernels for bigger types (#21490 )	2024-08-01 12:21:16 -07:00
Yi-Hong Lyu	530a2d7b41	Enable FP16 Clip and Handle Bias in FP16 Depthwise Conv (#21493 ) - Improved accuracy for face-detection, image-classification, and object-detection in the GeekBench ML benchmark on ARM64. - Fixed issue https://github.com/microsoft/onnxruntime/issues/18992	2024-07-30 03:49:14 -07:00
aamajumder	166809425e	[DML EP] Register ReduceMin-20 (#20477 ) ### Description This PR registers the ReduceMin-20 operator to the DML EP.	2024-07-25 17:06:30 -07:00
Preetha Veeramalai	ca47f0fdd3	OVEP - PR 1.19 (#21443 ) ### Description Add OVEP features for 1.19 The PR has, - Added support for EpCtx with ORT Session options for optimized performance. - Added bug fixes - Support for OV 2024.3 --------- Co-authored-by: ubuntu <ubuntu@ubuntu-mtlp-118727.iind.intel.com> Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com> Co-authored-by: Maheshkar <ankit.maheshkar@intel.com>	2024-07-24 23:45:31 -07:00
Sheil Kumar	dd010edb37	Update DirectML from 1.14.1 to 1.15.0 (#21323 ) Update DirectML from 1.14.1 to 1.15.0 --------- Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>	2024-07-22 16:59:03 -07:00
Prathik Rao	11ad299451	Adds ATen fallback for scaled_dot_product_attention (#21107 ) ### Description Introduces an ATen fallback for `torch.nn.functional.scaled_dot_product_attention`. This operator was introduced in torch 2.0 and, since then, has had many updates including the implementation of memory efficient attention for V100 machines. The current torchscript exporter exports a subgraph for attention which does not provide the same memory savings that PyTorch's memory efficient attention kernel provides. Allowing fallback to PyTorch ATen op for attention helps mitigate memory spike issues for models leveraging memory efficient attention. ### Motivation and Context Memory issues arose when integrating ONNX Runtime Training with AML Stable Diffusion. --------- Co-authored-by: root <prathikrao@microsoft.com>	2024-07-22 16:37:04 -07:00
mindest	5b9369e93c	Fix typos according to reviewdog report. (#21335 ) ### Description Fix typos based on reviewdog report but with some exceptions/corrections.	2024-07-22 13:37:32 -07:00
Tianlei Wu	7d9b12a2e3	[CPU] SparseAttention op (#21110 ) Add SparseAttention cpu implementation. - [x] Refactoring GQAAttentionBase - [x] Add SparseAttention implementation - [x] Add test cases This is unfused version. Flash attention version will be added later.	2024-07-03 21:51:57 -07:00

751 commits