onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-27 03:11:28 +00:00

Author	SHA1	Message	Date
Maximilian Müller	d8d8349a1b	fix: add missing nullptr of SessionOptions V2 (#16794 ) /builds/devtechproviz/dl/ort-builder/onnxruntime/onnxruntime/python/onnxruntime_pybind_state.cc:388:14: error: missing initializer for member 'OrtTensorRTProviderOptionsV2::trt_cuda_graph_enable' [-Werror=missing-field-initializers] 388 | 0}; |	2023-07-24 15:17:11 -07:00
Wanming Lin	5d17bcd776	[WebNN EP] Support Greater and Less ops (#16782 )	2023-07-24 09:08:53 -07:00
PeixuanZuo	8ede2f139e	[ROCm] Optimize ROCm CI pipeline 2 (#16691 ) - Set `KERNEL_EXPLORER_TEST_USE_CUPY=1` to replace numpy with cupy in the kernel explorer tests. With `KERNEL_EXPLORER_TEST_USE_CUPY=0`, the CPU utilization is: ![image](https://github.com/microsoft/onnxruntime/assets/94887879/91724b78-0b4e-4cbd-ad88-83cad9976472) With `KERNEL_EXPLORER_TEST_USE_CUPY=1`, the CPU utilization is: ![image](https://github.com/microsoft/onnxruntime/assets/94887879/58239911-667c-4d5f-bb78-deca60d0266f) - Use `Bash@3`. - Update shell script.	2023-07-24 13:57:48 +08:00
Chi Lo	21ef14476b	Bug fix for nested control flow ops for TRT EP (#16343 ) The current TRT EP supports models with nested control flow ops (multiple levels of subgraphs). But it fails when a subgraph uses an outer scope value that is defined several levels up, in the top-level graph; in this case, the outer scope value is an input of the top-level graph. The outer scope values are not properly handled during TRT EP's subgraph reconstruction stage, so `graph.resolve()` fails. ORT gets capability from EPs bottom-up, meaning the innermost subgraph is handled first. TRT EP reconstructs each subgraph level by level, and the following modifications fix the outer scope values issue: - `SetGraphOuterScopeValuesAndInputs()` and `SetAllGraphInputs()` are added to handle outer scope values and add those values as graph inputs if needed in order to make `graph.resolve()` happy. - Change to use `GetNodeArgIncludingParentGraphs` so that when creating the fused TRT node for some subgraphs in `Graph::CreateFusedSubGraphNode()`, it can get the NodeArgs for outer scope values from the top-level graph. This PR fixes https://github.com/microsoft/onnxruntime/issues/16217	2023-07-23 16:16:17 -07:00
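The lookup described in the commit above, resolving a value that is defined several graph levels up, can be sketched in a few lines. This is only an illustration of the idea behind `GetNodeArgIncludingParentGraphs`; the `Graph` class, its fields, and the method name below are hypothetical and are not ONNX Runtime's API.

```python
from typing import Dict, Optional


class Graph:  # hypothetical minimal graph, not ORT's Graph class
    def __init__(self, values: Dict[str, str], parent: Optional["Graph"] = None) -> None:
        self.values = values  # names defined at this graph level
        self.parent = parent  # enclosing graph, None for the top level

    def find_including_parents(self, name: str) -> Optional[str]:
        """Search this graph, then each enclosing graph, for name."""
        graph: Optional[Graph] = self
        while graph is not None:
            if name in graph.values:
                return graph.values[name]
            graph = graph.parent
        return None


# Three nesting levels: the innermost body reads a top-level graph input.
top = Graph({"X": "top-level input"})
if_body = Graph({"cond": "If condition"}, parent=top)
loop_body = Graph({"i": "loop counter"}, parent=if_body)
```

Calling `loop_body.find_including_parents("X")` walks two levels up before it finds the value, which is the situation the commit describes as previously unhandled.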
pengwa	40277b7f37	Fix orttraining-linux-gpu-ci-pipeline - LargeSizeTensorUInt64Index tests (#16820 ) ### Disable large index tests due to limited GPU mem The following two tests have recently been failing because the GPU runs out of memory; it is unclear what other programs are using the GPU at the same time. Disable them for now to unblock the required CI. ``` 1: [ FAILED ] 2 tests, listed below: 1: [ FAILED ] CrossEntropyTest.SoftmaxCrossEntropyLossInternal_LargeSizeTensorUInt64Index 1: [ FAILED ] CrossEntropyTest.SoftmaxCrossEntropyLossInternalGrad_LargeSizeTensorUInt64Index 2023-07-23T02:15:39.7559251Z 1: [ RUN ] CrossEntropyTest.SoftmaxCrossEntropyLossInternal_LargeSizeTensorUInt64Index 2023-07-23T02:16:53.0904576Z 1: 2023-07-23 02:16:53.089586592 [E:onnxruntime:SoftmaxCrossEntropyLossInternal, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running SoftmaxCrossEntropyLossInternal node. Name:'node1' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4294973440 2023-07-23T02:16:53.0905775Z 1: 2023-07-23T02:16:53.0906087Z 1: /onnxruntime_src/onnxruntime/test/providers/base_tester.cc:323: Failure 2023-07-23T02:16:53.0906698Z 1: Expected equality of these values: 2023-07-23T02:16:53.0907086Z 1: expect_result 2023-07-23T02:16:53.0907564Z 1: Which is: 4-byte object <00-00 00-00> 2023-07-23T02:16:53.0973055Z 1: ExpectResult::kExpectFailure 2023-07-23T02:16:53.0973984Z 1: Which is: 4-byte object <01-00 00-00> 2023-07-23T02:16:53.0975375Z 1: Run failed but expected success: Non-zero status code returned while running SoftmaxCrossEntropyLossInternal node. 
Name:'node1' Status Message: /onnxruntime_src/onnxruntime/core/framework/bfc_arena.cc:376 void* onnxruntime::BFCArena::AllocateRawInternal(size_t, bool, onnxruntime::Stream*, bool, onnxruntime::WaitNotificationFn) Failed to allocate memory for requested buffer of size 4294973440 2023-07-23T02:16:53.0976198Z 1: 2023-07-23T02:16:53.0976483Z 1: Google Test trace: 2023-07-23T02:16:53.0976818Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910 2023-07-23T02:16:53.0977229Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910 2023-07-23T02:16:53.0977639Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 2345 2023-07-23T02:16:53.0978035Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 5678 2023-07-23T02:16:53.0978441Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1234 2023-07-23T02:16:53.1303810Z 1: /onnxruntime_src/orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc:443: Failure 2023-07-23T02:16:53.1304644Z 1: Expected equality of these values: 2023-07-23T02:16:53.1304974Z 1: ret.first 2023-07-23T02:16:53.1305685Z 1: Which is: 4-byte object <04-00 00-00> 2023-07-23T02:16:53.1306030Z 1: COMPARE_RESULT::SUCCESS 2023-07-23T02:16:53.1306414Z 1: Which is: 4-byte object <00-00 00-00> 2023-07-23T02:16:53.1306754Z 1: Unsupported compare with CompareOrtValueNumerals. 
2023-07-23T02:16:53.1307487Z 1: Google Test trace: 2023-07-23T02:16:53.1307848Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910 2023-07-23T02:16:53.1308252Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910 2023-07-23T02:16:53.1308652Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 2345 2023-07-23T02:16:53.1309068Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 5678 2023-07-23T02:16:53.1309460Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1234 2023-07-23T02:16:53.1309889Z 1: /onnxruntime_src/orttraining/orttraining/test/training_ops/cuda/cross_entropy_test.cc:443: Failure 2023-07-23T02:16:53.1310239Z 1: Expected equality of these values: 2023-07-23T02:16:53.1310527Z 1: ret.first 2023-07-23T02:16:53.1310893Z 1: Which is: 4-byte object <04-00 00-00> 2023-07-23T02:16:53.1311208Z 1: COMPARE_RESULT::SUCCESS 2023-07-23T02:16:53.1311600Z 1: Which is: 4-byte object <00-00 00-00> 2023-07-23T02:16:53.1311921Z 1: Unsupported compare with CompareOrtValueNumerals. 
2023-07-23T02:16:53.1312229Z 1: Google Test trace: 2023-07-23T02:16:53.1312556Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910 2023-07-23T02:16:53.1312951Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 8910 2023-07-23T02:16:53.1313362Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 2345 2023-07-23T02:16:53.1313749Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 5678 2023-07-23T02:16:53.1314156Z 1: /onnxruntime_src/onnxruntime/test/common/random_generator.h:49: ORT test random seed: 1234 2023-07-23T02:16:53.4476437Z 1: [ FAILED ] CrossEntropyTest.SoftmaxCrossEntropyLossInternal_LargeSizeTensorUInt64Index (73692 ms) ```	2023-07-23 15:02:09 +08:00
Yi Zhang	3252ff2cb7	Change DML GPU pool in Windows GPU workflow use Visual Studio 2022 (#16784 ) ### Description 1. use the pool with VS2022 2. upgrade System.Memory to 4.5.5 ### Motivation and Context Solve the build error while using VS2022: `[Failure] Msbuild failed when processing the file 'D:\a\_work\1\s\csharp\src\Microsoft.ML.OnnxRuntime\Microsoft.ML.OnnxRuntime.csproj' with message: Method not found: 'System.ReadOnlySpan`1<Char> Microsoft.IO.Path.GetFileName(System.ReadOnlySpan`1<Char>)'` Ref: https://stackoverflow.com/questions/73399777/azure-build-failing-due-to-method-not-found-system-readonlyspan1char-micros	2023-07-23 10:07:21 +08:00
dependabot[bot]	b92f02ad48	Bump word-wrap from 1.2.3 to 1.2.4 in /js/react_native (#16755 ) Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.4. Release notes, sourced from [word-wrap's releases](https://github.com/jonschlinkert/word-wrap/releases). 1.2.4, What's Changed: - Remove default indent by @mohd-akram in jonschlinkert/word-wrap#24 - 🔒fix: CVE 2023 26115 (2) by @OlafConijn in jonschlinkert/word-wrap#41 - 🔒 fix: CVE-2023-26115 by @aashutoshrathi in jonschlinkert/word-wrap#33 - chore: publish workflow by @OlafConijn in jonschlinkert/word-wrap#42 New Contributors: - @mohd-akram made their first contribution in jonschlinkert/word-wrap#24 - @OlafConijn made their first contribution in jonschlinkert/word-wrap#41 - @aashutoshrathi made their first contribution in jonschlinkert/word-wrap#33 Full Changelog: https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4 Commits: - `f64b188` run verb to generate README - `03ea082` Merge pull request #42 from jonschlinkert/chore/publish-workflow - `420dce9` Merge pull request #41 from jonschlinkert/fix/CVE-2023-26115-2 - `bfa694e` Update .github/workflows/publish.yml - `ace0b3c` chore: bump version to 1.2.4 - `6fd7275` chore: add publish workflow - `30d6daf` chore: fix test - `655929c` chore: remove package-lock - `49e08bb` chore: added an additional testcase - `9f62693` fix: cve 2023-26115 - Additional commits viewable in [compare view](https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4) Dependabot will merge this PR once CI passes on it, as requested by @fs-eire. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-07-22 13:36:49 -07:00
dependabot[bot]	dafe11839e	Bump word-wrap from 1.2.3 to 1.2.4 in /js (#16754 ) Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.4. Release notes, sourced from [word-wrap's releases](https://github.com/jonschlinkert/word-wrap/releases). 1.2.4, What's Changed: - Remove default indent by @mohd-akram in jonschlinkert/word-wrap#24 - 🔒fix: CVE 2023 26115 (2) by @OlafConijn in jonschlinkert/word-wrap#41 - 🔒 fix: CVE-2023-26115 by @aashutoshrathi in jonschlinkert/word-wrap#33 - chore: publish workflow by @OlafConijn in jonschlinkert/word-wrap#42 New Contributors: - @mohd-akram made their first contribution in jonschlinkert/word-wrap#24 - @OlafConijn made their first contribution in jonschlinkert/word-wrap#41 - @aashutoshrathi made their first contribution in jonschlinkert/word-wrap#33 Full Changelog: https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4 Commits: - `f64b188` run verb to generate README - `03ea082` Merge pull request #42 from jonschlinkert/chore/publish-workflow - `420dce9` Merge pull request #41 from jonschlinkert/fix/CVE-2023-26115-2 - `bfa694e` Update .github/workflows/publish.yml - `ace0b3c` chore: bump version to 1.2.4 - `6fd7275` chore: add publish workflow - `30d6daf` chore: fix test - `655929c` chore: remove package-lock - `49e08bb` chore: added an additional testcase - `9f62693` fix: cve 2023-26115 - Additional commits viewable in [compare view](https://github.com/jonschlinkert/word-wrap/compare/1.2.3...1.2.4) Dependabot will merge this PR once it's up-to-date and CI passes on it, as requested by @fs-eire. Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-07-22 13:36:38 -07:00
Ted Themistokleous	488544b79a	[MIGraphX EP] Fix CopyTensorAsync and add guards for stream sync CopyTensors (#16787 ) Add compile guards to gate functionality based on MIGRAPHX_STREAM_SYNC for the following: - Remove excess hipStreamSynchronize to nullstream on CopyTensor calls - Add a proper call for stream-synchronized CopyTensorAsync in the DeviceToHost case. Without this change, subsequent CopyTensorAsync() calls will fail for cards that don't use pinned memory, causing hipMemcpy() calls to occur before certain kernel operations occur. ![image](https://github.com/microsoft/onnxruntime/assets/107195283/4915c18a-fb2d-40c9-a50e-a7c6613c324b) becomes ![image](https://github.com/microsoft/onnxruntime/assets/107195283/f661acf4-e2af-4c9a-b26a-30fca339cf1d) --------- Co-authored-by: Ted Themistokleous <tthemist@amd.com>	2023-07-22 09:48:36 +08:00
Arthur Islamov	210d29b40e	Allow --build_wasm on a mac system (#16761 ) ### Description The changes allow downloading a prebuilt protoc compiler when building the WebAssembly version on mac systems. Otherwise the build tries to build a js/wasm version of protoc and throws an error while executing it: "protoc.js permission denied" ### Motivation and Context I need to switch between my main working computer and a PC to make changes to the WebAssembly build. I would like not to do that anymore.	2023-07-21 14:21:37 -07:00
Jiajia Qin	193415a162	[js/webgpu] reuse buffer for GpuDataManager (#16746 ) ### Description Allocating a new GPUBuffer in every session.run is not efficient. Allocation should only happen in the first run; in the following runs, we should try to reuse those buffers. ### Motivation and Context - This PR is for performance. mobilenetv2 drops from 12.9 ms to 9.58 ms.	2023-07-21 13:13:01 -07:00
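The reuse strategy described above (allocate on the first run, hand the same buffers back on later runs) can be illustrated with a small size-keyed pool. This is a sketch under stated assumptions: `BufferPool`, `acquire`, `release`, and `run` are hypothetical names, and plain `bytearray` objects stand in for GPU buffers; it is not the `GpuDataManager` implementation.

```python
from collections import defaultdict
from typing import DefaultDict, List


class BufferPool:  # hypothetical name; a stand-in for the idea only
    """Hands out bytearrays by size and keeps released ones for reuse."""

    def __init__(self) -> None:
        self._free: DefaultDict[int, List[bytearray]] = defaultdict(list)
        self.allocations = 0  # counts real allocations, for illustration

    def acquire(self, size: int) -> bytearray:
        free_list = self._free[size]
        if free_list:
            return free_list.pop()  # reuse a buffer released by an earlier run
        self.allocations += 1
        return bytearray(size)

    def release(self, buffer: bytearray) -> None:
        self._free[len(buffer)].append(buffer)


def run(pool: BufferPool) -> None:
    """Simulates one session run that needs two temporary buffers."""
    buffers = [pool.acquire(1024), pool.acquire(4096)]
    for buffer in buffers:
        pool.release(buffer)
```

Running `run(pool)` repeatedly on the same pool allocates only during the first call, since every later request finds a released buffer of the matching size.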
Justin Chu	d79515041c	[Better Engineering] Bump ruff to 0.0.278 and fix new lint errors (#16789 ) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * __->__ #16789 Bump ruff to 0.0.278 and fix new lint errors. I added noqa to all existing RUF012 errors (the rule requires mutable class variables to be annotated with `ClassVar`), as well as to all PERF issues. Signed-off-by: Justin Chu <justinchu@microsoft.com>	2023-07-21 12:53:41 -07:00
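For readers unfamiliar with RUF012: the rule flags mutable class-level attributes that lack a `ClassVar` annotation. The sketch below shows both ways of satisfying it, annotating the attribute or suppressing the rule with `noqa` as this commit does for existing code. The class and attribute names are hypothetical and do not come from the PR.

```python
from typing import ClassVar, Dict, List


class OpRegistry:  # hypothetical class, for illustration only
    # Shared by every instance; ClassVar makes that intent explicit and is
    # the annotation RUF012 asks for on mutable class attributes.
    supported_ops: ClassVar[List[str]] = ["Add", "Mul"]

    # Alternative: leave the attribute as-is and suppress the rule.
    legacy_table = {}  # noqa: RUF012

    def __init__(self) -> None:
        # Per-instance mutable state belongs in __init__, not on the class.
        self.overrides: Dict[str, str] = {}
```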
Justin Chu	d3295f4329	[Better Engineering] Fix N802 lint errors in tests (#16788 ) Stack from [ghstack](https://github.com/ezyang/ghstack) (oldest at bottom): * #16789 * __->__ #16788 This change fixes the N802 lint errors by renaming the test case to use snake case.	2023-07-21 09:17:34 -07:00
Adam Pocock	a8e776b78b	[java] Adds support for fp16 and bf16 tensors (#16703 ) ### Description The Java API currently only supports fp16 output tensors which it automatically casts to floats on the way out. This PR adds support for creating fp16 and bf16 tensors (from `java.nio.Buffer` objects or as the output of models, creation from Java short arrays is not supported), along with efficient methods for casting `FloatBuffer` into `ShortBuffer` filled with fp16 or bf16 values and vice versa. The fp16 conversions use a trick to pull in the efficient conversion methods added to Java 20, falling back to ports of the MLAS methods otherwise. The Java 20 methods can be special cased by the C2 JIT compiler to emit the single instruction on x86 and ARM which converts fp32<->fp16, or the vectorized versions thereof, so they should be quite a bit faster than the MLAS ported one. ### Motivation and Context fp16 and bf16 are increasingly popular formats and we've had several requests for this functionality. Fixes #7003. cc @yuslepukhin @cassiebreviu --------- Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-07-21 21:14:41 +10:00
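To make the bf16 half of the conversion concrete: bfloat16 keeps the upper 16 bits of an IEEE 754 float32, so converting is a matter of rounding away the lower 16 bits. The following Python sketch shows that bit-level idea with round-to-nearest-even; it is not the Java code from this PR or a port of the MLAS routines, and the function names are made up for illustration.

```python
import struct


def float_to_bf16_bits(value: float) -> int:
    """Round a float to bfloat16 and return its 16-bit pattern."""
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    if (bits & 0x7FFFFFFF) > 0x7F800000:
        # NaN: keep it a NaN by forcing a mantissa bit after truncation.
        return (bits >> 16) | 0x0040
    # Round to nearest, ties to even, on the 16 discarded bits.
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    return ((bits + rounding_bias) >> 16) & 0xFFFF


def bf16_bits_to_float(bits: int) -> float:
    """Widen a bfloat16 bit pattern to a float by zero-filling the low bits."""
    return struct.unpack("<f", struct.pack("<I", (bits & 0xFFFF) << 16))[0]
```

Storing the 16-bit patterns as integers mirrors the commit's use of `ShortBuffer` to carry fp16 or bf16 values.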
Dmitri Smirnov	1e18efade5	[C#] Add ML Sequences and Maps Create and Process APIs (#16648 ) ### Description 1) Added Sequence and Map convenience APIs to create input Sequences and Maps and also visit the outputs. 2) Address an OrtValue design issue that arises when the values are created on top of managed memory and the OrtValues are used for sequence and map creation. We should retain the original managed instances that keep the memory pinned. We opt to keep track of those and dispose of them within an instance of OrtValue that represents a Map or a Sequence. 3) Set `LangVersion` to default per [MS Versioning Docs.](https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/configure-language-version) ### Motivation and Context 1) When writing code examples, use of the Map and Sequence APIs proved to be cumbersome. 2) It is a bug that we should address, as the managed memory can be moved by the GC, leading to intermittent crashes. 3) Make use of the most features of C#.	2023-07-21 12:58:29 +08:00
Hector Li	4d569f6586	[QNN EP] Op support: LayerNorm, Asin, Sign (#16740 ) ### Description Add op support for LayerNorm, Asin, Sign. Enable QDQ node unit support for Sin Op --------- Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>	2023-07-20 20:57:48 -07:00
Xavier Dupré	b508c7236f	Replace call to deprecated torch.norm (#16758 ) ### Description torch.norm is deprecated as mentioned in issue #16751. This PR replaces the call to torch.norm by the options suggested by torch documentation.	2023-07-20 19:52:19 -07:00
kunal-vaishnavi	b7176f9826	Fix bug with saving model optimized by inference session (#16716 ) ### Description A [previous PR](https://github.com/microsoft/onnxruntime/pull/16531) added a temporary directory to save the model optimizations after loading a model into an `InferenceSession`. Many models that have an external data file, however, require the data file to be in the same directory as the ONNX model file. Because the model is saved in a temporary directory and the data is saved in another directory, loading the model from the temporary directory raises a `FileNotFoundError`. This PR fixes this error by saving the external data file in the same directory as the optimized model. ### Motivation and Context This PR fixes a bug with using a temporary directory while running the optimizer for models that have an external data file.	2023-07-20 18:44:28 -07:00
Edward Chen	0f9883f804	Fix Mac M1 build (#16763 ) - Add ifndef `__APPLE__` to skip lines which cause EXC_BAD_INSTRUCTION error. - Fix floatToHalf/doubleToHalf conversion issue and add tests.	2023-07-20 18:24:57 -07:00
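As a reference point for what a float-to-half conversion should produce, Python's standard `struct` module can encode IEEE 754 binary16 directly through the `e` format code. The helper names below are hypothetical; this is a way to check expected bit patterns, not the `floatToHalf`/`doubleToHalf` code fixed in this commit.

```python
import struct


def float_to_half_bits(value: float) -> int:
    """Return the IEEE 754 binary16 bit pattern of value as an int."""
    # 'e' is the struct format code for a 16-bit float; 'H' reads the same
    # two bytes back as an unsigned integer.
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits: int) -> float:
    """Decode a binary16 bit pattern back into a Python float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]
```

For example, `float_to_half_bits(1.0)` gives `0x3C00` (sign 0, biased exponent 15, zero mantissa), and values too large for binary16 raise `OverflowError` rather than silently becoming infinity.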
Baiju Meswani	538d2412ef	Objective-C Add Support to Create and Query String ORTValues (#16764 ) This pull request contains a few changes: 1. Adds support for string ort values. 2. Fixes the training minimal build (that was broken with #16601) by putting custom op registration behind #ifdefs 3. Fixes the iOS pod package generation (that was again broken with #16601) by explicitly providing paths to be copied during pod creation.	2023-07-20 17:39:29 -07:00
Adrian Lizarraga	a8c263f92c	[QNN EP] Update QNN SDK to 2.12 (#16750 ) ### Description - Updates the default QNN SDK to 2.12 for CI pipelines - Adds a disabled InstanceNormalization test for regression on QNN SDK 2.12 - Cleans up logs for unsupported ops. ### Motivation and Context Test with the latest QNN SDK.	2023-07-20 16:22:14 -07:00
Wanming Lin	eaea34f8e2	[WebNN EP] Support PRelu op (#16756 )	2023-07-20 10:39:30 -07:00
Jeff Daily	bb136f86c8	[ROCm][MIGraphX] for googletest dep, set OVERRIDE_FIND_PACKAGE (#16715 ) Otherwise, an unsupported version of gtest/gmock will be found at /opt/conda/include for ROCm builds. Though this issue was initially found for ROCm builds, the issue is generic. onnxruntime requires a specific version of googletest and should not rely on locating googletest using find_package. The ROCm error was: ``` In file included from /opt/conda/include/gmock/gmock-spec-builders.h:75, from /opt/conda/include/gmock/gmock-generated-function-mockers.h:47, from /opt/conda/include/gmock/gmock-function-mocker.h:39, from /opt/conda/include/gmock/gmock.h:61, from /stage/onnxruntime/onnxruntime/test/util/test_utils.cc:17: /opt/conda/include/gmock/gmock-matchers.h: In instantiation of ‘bool testing::internal::PointwiseMatcher<TupleMatcher, RhsContainer>::Impl<LhsContainer>:: MatchAndExplain(LhsContainer, testing::MatchResultListener*) const [with LhsContainer = const gsl::span<const float>&; TupleMatcher = testing::internal:: FloatingEq2Matcher<float>; RhsContainer = gsl::span<const float>]’: /opt/conda/include/gmock/gmock-matchers.h:2303:10: required from here /opt/conda/include/gmock/gmock-matchers.h:2312:48: error: no type named ‘const_iterator’ in ‘testing::internal::PointwiseMatcher<testing::internal:: FloatingEq2Matcher<float>, gsl::span<const float> >::Impl<const gsl::span<const float>&>::LhsStlContainer’ {aka ‘class gsl::span<const float>’} ```	2023-07-21 00:57:38 +08:00
zesongw	0e40049eb2	[WebNN EP] Add support for Op Pad. (#16732 ) ### Description Support Op Pad for WebNN EP. It aims to support three modes (constant, reflect and edge). For now, only constant can be tested with Chrome Canary. ### Motivation and Context Support more models like SD1.5-VAE-encode.	2023-07-20 07:57:48 -07:00
Xavier Dupré	2bc9fbb621	Fix url in the code documentation (graph optimizations) (#16770 ) ### Description Fix a wrong url in the documentation as mentioned in issue #16678. ### Motivation and Context Better documentation.	2023-07-20 07:02:22 -07:00
Yi Zhang	c314d7724f	Update dml gpu pool to onnxruntime-Win2022-GPU-dml-A10 (#16765 ) ### Description onnxruntime-Win2022-GPU-dml-A10 is using VS2022. ### Motivation and Context 1. Upgrade VS2019 to VS2022 to fix prefast issue.	2023-07-20 16:52:13 +08:00
Scott McKay	8b866060f2	Comment out ORT-Nightly feed in test app NuGet.config (#16762 ) ### Description Comment out ORT-Nightly feed in NuGet.config to see if that makes the Secure Supply Chain Analysis CI step happy. Add info to readme on manually adding feed and using it.	2023-07-20 14:30:29 +10:00
Edward Chen	fc1f463ff1	[ios] Enable training package in packaging pipeline (#16683 ) Build iOS training package in packaging pipeline. Refactor iOS packaging pipeline to build different package variants in parallel.	2023-07-19 19:55:00 -07:00
Yi-Hong Lyu	6e895fe70a	Parallelize Max (#16745 ) It gives up to 7.5% improvement in LLaMA 7B case.	2023-07-19 15:23:48 -07:00
saurabh	24566058b3	ovep dockerfile and wheel docs changes (#16482 ) ### Description This PR includes changes to the documentation in the _readmeOV.rst_ file and changes to the dockerfile that enable building ORT with the latest OpenVINO 2023.0.0. ### Motivation and Context Modified the dockerfile to incorporate the latest version of OpenVINO (2023.0.0) for building ONNX Runtime. The changes in the PR aim to improve the overall user experience by providing accurate and up-to-date documentation while leveraging the latest OpenVINO 2023.0.0.	2023-07-19 09:01:09 -07:00
Wanming Lin	dcb0f2cdde	[WebNN EP] Only support opset >= 7 (#16730 ) Set WebNN EP minimum supported opset to 7 as ONNX Runtime currently only guarantees support for models stamped with opset 7 or above.	2023-07-18 17:59:19 -07:00
Yulong Wang	7dcb805ab8	[js/web] upgrade onnx-proto version (#16722 ) ### Description This change upgrades a lot of dependencies. There are 2 motivations of doing this change: - fix the security issue reported by dependabot (protobufjs Prototype Pollution vulnerability - https://github.com/advisories/GHSA-h755-8qp9-cq85) - resolve the requirement of using ONNX IR_VERSION 9 (#16638) This requires: - upgrade protobufjs to v7.2.4 - upgrade library 'onnx-proto' to consume latest ONNX release (v1.14.0). Problems: - protobufjs v7.2.4 depends on long.js v5, which does not work well with typescript (commonjs). - onnx-proto depends on this fix with a new release of long.js - long.js is in maintenance and it takes longer than expected to put in new changes Solutions: - use a patch script in `preprepare` to copy type declarations to make long.js work with typescript (commonjs) - generate onnx protobuf JS/TS files and put them under js/web/lib/onnxjs/ort-schema/protobuf folder - remove 'onnx-proto' from dependency. - apply fixes to generated onnx.d.ts	2023-07-18 16:36:39 -07:00
Adrian Lizarraga	f8e3aacb47	[QNN EP] Fix support of ArgMin and ArgMax on all QNN backends (#16731 ) ### Description - Fixes support for ArgMin/ArgMax on the QNN CPU and HTP backends. - Adds Q/DQ node unit selection logic. - Handles casting int64 output to uint32 when necessary. - Adds unit tests for ArgMax/ArgMin. ### Motivation and Context QNN EP did not actually support ArgMin/ArgMax. Unit tests revealed that the existing translation was not sufficient to support these ops.	2023-07-18 14:50:24 -07:00
Tianlei Wu	95f9628776	Fix transformers optimizations for GPT-NeoX (#16743 ) ### Description Fix some issues found in GPT-NeoX graph fusion: (1) GPT-NeoX uses float16 weights. The step that runs onnxruntime with opt_level==1 uses the CPU provider. Since most operators do not have fp16 support in the CPU EP, extra Cast nodes are added to upcast to fp32. (2) An Add node is shared by two LayerNormalization children, and SkipLayerNormalization fusion might produce an invalid graph. (3) Reshape fusion might be missed since some parts only check for an initializer but not a Constant. This PR adds a check for whether the model uses FP16, outputs a warning when use_gpu is not True, and uses the GPU provider for graph optimization when use_gpu=True.	2023-07-18 10:29:08 -07:00
Dmitri Smirnov	e752cbe7f2	Work on eliminating Internal Compiler Error (#16741 ) ### Description Replace the offending bitwise `operator \|` with if() logic for ARM.	2023-07-18 10:17:52 -07:00
Wei-Sheng Chin	b71ebf91a5	[DORT] Reduce global configs to make enabling dynamic shape easier (#16720 ) There are several global configs used by DORT. ```py DEFAULT_ONNX_EXPORTER_OPTIONS = torch.onnx._internal.exporter.ResolvedExportOptions( torch.onnx._internal.exporter.ExportOptions() ) # TODO(wechi): This line must generate result identical to the call of # _create_onnx_supports_op_overload_table(...) inside # create_onnx_friendly_decomposition_table(...) in # torch/onnx/_internal/fx/decomposition_table.py. _SUPPORT_DICT = torch.onnx._internal.fx.decomposition_table._create_onnx_supports_op_overload_table( DEFAULT_ONNX_EXPORTER_OPTIONS.onnx_registry ) # type: ignore _EXTRA_SUPPORT_DICT: Dict[str, Any] = { "getattr": None, "_operator.getitem": None, } DORT_DECOMPOSITION_TABLE = DEFAULT_ONNX_EXPORTER_OPTIONS.decomposition_table ``` We can see that all but `_EXTRA_SUPPORT_DICT` are deduced from the ONNX exporter's options. As there are many ways to configure the ONNX exporter's options, we decided to move these variables to `OrtBackend`'s `__init__` so that the construction of `OrtBackend` becomes more flexible (especially for enabling dynamic shape or not).	2023-07-18 09:06:58 -07:00
PeixuanZuo	9b549c646c	[ROCm] fix kernel explorer GemmSoftmaxGemm test (#16735 ) GemmSoftmaxGemmTunable occasionally fails with a large numerical error. The root cause is that CK's Strided Batched Gemm has a larger error under a specific initialization distribution `(multinormal_distribution)`. The Generic (Gemm1 + Softmax + Gemm2) implementation is one instance of GemmSoftmaxGemmTunable. Gemm1 and Gemm2 in the Generic implementation are TunableOps when tuning is enabled. In some cases GemmSoftmaxGemmTunable selects the Generic implementation while Gemm1 or Gemm2 selects the CK implementation, so the result of GemmSoftmaxGemmTunable is affected by CK. - Loosen the tolerance. - Add `GemmSoftmaxGemmPermuteGenericNestedTunable` to test the Generic implementation with tuning enabled.	2023-07-18 16:47:39 +08:00
zhangsibo1129	9ba5cdbaa4	[CANN EP] Fix Float16 support for CANN EP (#16733 ) ### Description Replace the constructor function `MLFloat16()` with the public member function `FromBits()` in the file `onnxruntime/core/providers/cann/cann_common.cc` ### Motivation and Context PR [#16506](https://github.com/microsoft/onnxruntime/pull/16506) changed the public constructor function `MLFloat16(uint16_t x)` to private, and added a public function `MLFloat16::FromBits(uint16_t x)` in the file `include/onnxruntime/core/framework/float16.h`, which broke the CANN CI. This PR aligns the CANN behavior with the modified class `MLFloat16`.	2023-07-17 23:24:51 -07:00
cloudhan	0cab7e1a37	[ROCm] Generalize FastGeLU (#16623 ) Allow the whole pipeline to be parameterized with unary elementwise functor.	2023-07-18 11:23:12 +08:00
Scott McKay	ad90352a68	Add MAUI test app that can be used to test model loading and performance (#16658 ) ### Description MAUI test app with tooling to add model and generated or provided input test data. The app will load the model and validate the output. It can also run a specified number of iterations to provide basic performance information. ### Motivation and Context Primarily to make it easier to test an arbitrary model on iOS. A MAUI app allows testing on all platforms. --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-07-18 08:21:18 +10:00
cloudhan	a45b834722	Fix warning about uninitialized member (#16736 ) #16506 causes almost every translation unit on Linux to complain: ``` [1175/1235] Building CXX object CMakeFiles/onnxruntime_test_all.dir/home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc.o In file included from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:18, from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/data_types.h:17, from /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/tensor.h:17, from /home/guangyunhan/onnxruntime/onnxruntime/test/common/tensor_op_test_utils.h:16, from /home/guangyunhan/onnxruntime/onnxruntime/test/providers/compare_provider_test_utils.h:7, from /home/guangyunhan/onnxruntime/orttraining/orttraining/test/training_ops/cuda/softmax_test.cc:4: /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h: In instantiation of ‘static constexpr uint16_t onnxruntime_float16::Float16Impl<Derived>::ToUint16Impl(float) [with Derived = onnxruntime::MLFloat16; uint16_t = short unsigned int]’: /home/guangyunhan/onnxruntime/include/onnxruntime/core/framework/float16.h:42:66: required from here /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:241:7: note: ‘union onnxruntime_float16::detail::float32_bits’ has no user-provided default constructor 241 \| union float32_bits { \| ^~~~~~~~~~~~ /home/guangyunhan/onnxruntime/include/onnxruntime/core/session/onnxruntime_float16.h:242:16: note: and the implicitly-defined constructor does not initialize ‘unsigned int onnxruntime_float16::detail::float32_bits::u’ 242 \| unsigned int u; \| ^ ``` This PR shuts the compiler up.	2023-07-17 11:33:54 -07:00
Edward Chen	df8843c4a7	Upgrade old Python version in packaging pipeline (#16667 ) - Upgrade from Python 3.6 to 3.8 in packaging pipeline. - Raise build.py minimum required Python version.	2023-07-17 08:24:47 -07:00
Dmitri Smirnov	b8c40b7813	Fix parameter naming that fails Doc generation. (#16717 ) ### Description Rename `FromBits` param name to match the docs. ### Motivation and Context Fix API Doc generation.	2023-07-16 22:02:05 -07:00
RandySheriffH	e1ca8ee6d4	RunAsync C/CXX API (#16613 ) Implement RunAsync API - the session will run in a thread of the intra-op thread pool. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-07-16 16:51:40 -07:00
Ryan Hill	2cf31a20cf	Cuda: Decoder Masked Multihead Attention Q values get corrupted when using cross attention (#16721 ) ### Description Some code was accidentally moved into the `if(!params.is_cross_attention)` block; it must stay outside to work in both cases. ### Motivation and Context This causes invalid results. We detected this as a performance bug, as it caused the EOS early exit to never happen, and the runs would always take max_length to complete, which was slow.	2023-07-15 00:41:06 -07:00
Wanming Lin	2b7a94e65b	[WebNN EP] Make some types clearer (#16705 ) It's a follow-up to address comments in https://github.com/microsoft/onnxruntime/pull/16671#discussion_r1261761828 and https://github.com/microsoft/onnxruntime/pull/16671#discussion_r1261763873	2023-07-14 17:39:36 -07:00
Ryan Hill	2ae041f390	atomicAdd returns previous value, not current value. (#16690 ) ### Description Mistake in beam scorer processing, atomicAdd result should be compared with '1' vs '0' as it returns the original value, not the latest value. This error just results in slow perf, nothing fails. ### Motivation and Context Fixes #16642	2023-07-14 15:46:57 -07:00
Wei-Sheng Chin	44fd98ebfe	[DORT] Enable aten::full by implementing extra logic to select EP (#16699 ) DORT only selects devices from input arguments (type: torch.Tensor). However, it errors out when a graph doesn't have any inputs (e.g., a single aten::full graph). This PR addresses this problem by changing the EP selection as follows: - First, inspect graph inputs. If there are some valid devices, use them plus a default one (`OrtBackend.ep: str`). - Otherwise, inspect graph outputs carried by `torch.fx.GraphModule` and use all valid devices plus the default `OrtBackend.ep`. - When both (1) and (2) fail, it uses the default EP specified by `OrtBackend.ep`.	2023-07-14 15:42:25 -07:00
Edward Chen	f236768d5c	[ios] Enable `--use_extensions` with custom built iOS pod (#16711 ) - Fix link errors by including the needed onnxruntime-extensions libraries in the static framework. - Add Objective-C API to register custom ops from embedded onnxruntime-extensions. Caveat: Not all onnxruntime-extensions build options are working yet. E.g., building with the onnxruntime-extensions OpenCV dependency does not work.	2023-07-14 15:37:16 -07:00
G. Ramalingam	4faee2e44c	Fix issue in constant-propagation inside function subgraph (#16330 ) ### Description The SequenceMap function-op has a graph-attribute. ORT's constant-folding optimization may identify constant-expressions inside the subgraph and promote them to constants, stored as initializers in the main graph. When it does this, the optimization updates the subgraph to remove the corresponding nodes. When we expand a SequenceMap node by inlining its function-expansion, we need to use this updated subgraph. However, the existing code uses the original graph-attribute (GraphProto), instead of regenerating it from the modified subgraph. This results in producing a graph with duplicate definitions for the constant-folded variable, resulting in an error during graph-resolve. This PR fixes this issue (just a single line fix), and adds a test-case to cover this scenario. --------- Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>	2023-07-14 14:44:59 -07:00
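
The three-step EP selection described in commit 44fd98ebfe (#16699) can be pictured with a small pure-Python sketch. This is not DORT's code: `select_eps`, `_eps_for`, and the `_DEVICE_TO_EP` mapping are hypothetical names made up for illustration, and devices are modelled as plain strings rather than `torch.Tensor` devices. It only shows the fallback order the commit message lists: inputs first, then outputs, then the default EP alone.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical mapping from a device-type string to an execution provider name.
# DORT's real mapping lives inside OrtBackend and is not reproduced here.
_DEVICE_TO_EP = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
}


def _eps_for(devices: Iterable[Optional[str]]) -> Tuple[str, ...]:
    """Map device strings to provider names, skipping unknown devices and duplicates."""
    eps = []
    for device in devices:
        ep = _DEVICE_TO_EP.get(device)
        if ep is not None and ep not in eps:
            eps.append(ep)
    return tuple(eps)


def select_eps(
    input_devices: Iterable[Optional[str]],
    output_devices: Iterable[Optional[str]],
    default_ep: str,
) -> Tuple[str, ...]:
    """Pick providers from inputs, else from outputs, else fall back to the default."""
    for devices in (input_devices, output_devices):
        eps = _eps_for(devices)
        if eps:
            # Valid devices found: use them plus the default provider.
            if default_ep not in eps:
                eps += (default_ep,)
            return eps
    # Neither inputs nor outputs gave a usable device.
    return (default_ep,)
```

For a graph with no inputs (such as the single `aten::full` case from the commit message), `select_eps([], ["cuda"], "CPUExecutionProvider")` falls through to the output devices instead of failing.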

9204 commits