onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-24 02:47:54 +00:00

Author	SHA1	Message	Date
Tianlei Wu	3aba736ee2	Refactoring of Stable Diffusion scripts (#17138 ) Reduce duplicated code in two stable diffusion pipelines (CUDA and TensorRT). Move the common code to models.py	2023-08-15 09:36:31 -07:00
Matthieu Darbois	5e971bc51a	Rework WIL dependency retrieval/usage (#17130 ) ### Description 1. `onnxruntime_fetchcontent_makeavailable` works around unconditional install commands, so it can be used instead of `FetchContent_Populate`. 2. This dependency is Windows-specific, so mark it as such. ### Motivation and Context 1. This simplifies `cmake/external/wil.cmake`, which no longer needs to do anything specific depending on whether WIL was fetched or found. 2. Because WIL is specific to Windows, it might not be available on other OSes in air-gapped environments such as [conan-center-index](https://github.com/conan-io/conan-center-index). This allows downstream builds not to require specific patches for something the build does not need in the first place.	2023-08-15 09:11:46 -07:00
Tianlei Wu	412b0d0831	Update BERT and GPT-2 optimization notebooks for CPU EP (#17057 ) The notebooks are out of date. (1) Update BERT and GPT-2 optimization notebooks for CPU EP with the latest PyTorch and ONNX Runtime. (2) Add links to the quantization example. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/16515	2023-08-15 00:55:03 -07:00
pengwa	abf9765d73	PythonOp Enhancement: Bool and Tuple[Bool] Constants, Materialize Grads, Empty Inputs, Save In Context (#16828 ) ### PythonOp Enhancement: Bool and Tuple[Bool] Constants, Materialize Grads, Empty Inputs, Save In Context 1. Support `bool` or `Tuple[bool]` constant types in inputs. 2. Support `ctx.set_materialize_grads(True|False)`. 3. The backward op can accept empty inputs (those that don't require grad). 4. Special handling for ORT tensors that are saved in context. Scenario: a tensor is generated by ORT and may then be saved for backward by `ctx.save_for_backward(tensor)`. Because `tensor`'s reference count is not increased in ORT's allocation plan, ORT may release the tensor data before backward uses it. Currently: we copy every tensor before running autograd.Function.forward(), which might be a problem when there are many PythonOps (for example, ZeRO stage 3). Proposal: to avoid unnecessary copies of tensors that are not saved in context, this change introduces a `_GlobalOpKernelInfoMap`. During the kernel's first run, we still copy all tensors generated by ORT and pass them to torch.autograd.Function; we then check which inputs need to be saved in context and record their input indices in `_GlobalOpKernelInfoMap`. In later iterations, we copy only what is needed.	2023-08-15 13:31:04 +08:00
Adrian Lizarraga	b734db1924	[QNN EP] Fix CI build on Windows x64 pipelines (#17152 ) ### Description - Disables Resize tests that use nearest mode on QNN CPU. - Fixes indentation problems on yaml for win x64 qnn pipeline. ### Motivation and Context The QNN windows Nuget pipeline does not run due to failing unit tests on Windows x64. These tests should not be enabled until we determine the rounding behavior of QNN's ResizeNearestNeighbor operator.	2023-08-14 21:03:14 -07:00
Justin Chu	416dc2e84d	Fix clang-format comment indents on Windows for winml/ (#17144 ) On Windows, clang-format has a bug when AlignTrailingComments.Kind is set to `Leave` (https://clang.llvm.org/docs/ClangFormatStyleOptions.html#aligntrailingcomments), where it keeps adding indentation to comments after each formatting run. This PR changes the option to always align comments so we do not hit the bug. As a consequence of the option change, we need to reformat some of the files. Note that this option is aligned with the rest of the repository.	2023-08-14 23:50:14 -04:00
xhcao	24e0bd37b4	[JS/WebGPU] Support Log operator (#17045 )	2023-08-14 18:04:12 -07:00
Baiju Meswani	289600b47d	ONNX Runtime training cpu package name for ADO (#17109 )	2023-08-14 11:32:35 -07:00
PeixuanZuo	be2200c00b	[ROCm] fix python package pipeline (#17136 ) The ROCm Python package pipeline failed because this PR (https://github.com/microsoft/onnxruntime/pull/16325) changed the onnx version to a commit, so we need to build onnx from source. A low protobuf version causes build errors. This PR removes `cmake` and `protobuf` from the Dockerfile; both are now installed by `install_os_deps.sh`.	2023-08-14 11:22:43 -07:00
Jian Chen	45f52987a2	Web CI Pipeline Isolation (#17005 )	2023-08-14 10:37:37 -07:00
Wenbing Li	d052c8a45c	Remove the extensions submodule (#17097 ) ### Description Remove the onnxruntime-extensions submodule, since it is now consumed via CMake FetchContent. ### Motivation and Context The submodule relies on an outdated version of the extensions, and the build instructions should be updated to eliminate any confusion.	2023-08-14 10:16:33 -07:00
Jian Chen	68ea9631af	Fix typo onnxruntimecpubuilpython (#17120 ) ### Description The correct name should be onnxruntimecpubuildpython. Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-08-14 08:34:43 -07:00
RandySheriffH	f71b6944bf	Fix nuget pipeline (#17110 ) Fix nuget pipeline by correcting the calling convention on c# delegate. --------- Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-08-14 09:04:37 -04:00
Vincent Wang	e55e1b7da9	Mark end of version 16 C API (#17107 ) Mark end of version 16 C API in preparation for ORT 1.16 release.	2023-08-14 14:01:55 +08:00
pengwa	cd7b3f54da	Allow defining customized PythonOp shape inferer (#17093 ) ### Allow defining customized PythonOp shape inferer A `torch.autograd.Function` is converted to a PythonOp in MSDomain, and shape inferencing for it happens in two places: 1. in SymbolicShapeInfer; 2. in the PythonOp op definition. For a common PythonOp, we don't know the relationship between inputs and outputs, so we only infer the rank from the output ranks and generate a symbolic dimension for each dim. This introduces many meaningless symbolic dimensions, which sometimes block our graph transformers from doing op fusion. This PR provides a way to define custom shape inferencing for the `torch.autograd.Function`s we define, to propagate the original dimensions across the PythonOp on a best-effort basis. The 2nd place is not covered yet; we could refine that later. Fixing the 1st one is enough for ORTModule training/evaluation.	2023-08-14 09:13:32 +08:00
Guenther Schmuelling	9204cd7392	[js/webgpu] Add C++ registration for operator Tanh in JSEP (#17124 ) add webgpu/tanh Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-08-12 11:43:39 -07:00
Yulong Wang	e7adbb38f6	[js/webgpu] disable test case 'test_batchnorm_epsilon_training_mode' temporarily (#17129 ) ### Description Test case 'test_batchnorm_epsilon_training_mode' is failing on webgpu. The issue needs time to investigate, so this comments the test out; re-enable it when the root cause is fixed.	2023-08-12 08:53:10 -07:00
Chen Fu	f2e1b91634	add int4 quantization code in python (#17077 ) ### Description Adding int4 quantization code in Python. ### Motivation and Context The Python quantization tool no longer needs to invoke a shell to call a native exe.	2023-08-11 15:17:58 -07:00
Yulong Wang	5704e71b89	update onnx.patch to apply wasm build break fix (#17104 ) ### Description This PR fixes the build break for WebAssembly introduced in `6986981482` (`435ad2b1d8`). This change updates onnx.patch in the onnxruntime repo. The corresponding PR in the onnx repo is: https://github.com/onnx/onnx/pull/5495. It may take a while until the next onnx version bump.	2023-08-11 15:00:39 -07:00
liqun Fu	6697635b91	To support size opset 19 (#15689 )	2023-08-11 14:48:53 -07:00
Yulong Wang	14a8315f10	[js/web] [webgpu] new indices helper (#16957 ) ### Description This PR introduces the new indices helper. IndicesHelper is a helper class that offers a unified way to generate WGSL code for manipulating indices and data for a shader's input or output. The following is a list of terminologies used in this class: - `offset`: a uint32 value representing the offset of an element in the data buffer. - `indices`: an abstraction of a multi-dimensional array's indices representing the data's index on each dimension. - `value`: a value of a data element. Users are expected to create an instance of this class for each shader's input or output, and use the instance to generate WGSL code for manipulating indices and data. The following 2 exported functions are for users to call to create an instance of an indices helper: - `inputVariable()`: create an indices helper instance for an input. - `outputVariable()`: create an indices helper instance for an output. An indices helper instance contains helper functions for the following operations: - access readonly basic information, including: `name` (the name of the input or output), `usage` (whether it's an input or an output) and `shape` (the passed-in shape). - `type`: access readonly type information, including: `indices` (the type of indices), `value` (the type of value at runtime), `storage` (the type of value at storage) and `tensor` (the tensor type as represented in TensorView). - generate WGSL code for converting between offset and indices. Use `offsetToIndices()` for a WGSL code snippet that calculates indices from an offset, and use `indicesToOffset()` for a WGSL code snippet that calculates an offset from indices. - to manipulate an instance of indices, use `setIndices()` and `getIndices()` to set and get the indices on an indices variable. - to manipulate data, use `set()`/`get()` to access data at the given indices from the parameter list, use `setByIndices()`/`getByIndices()` to access data at the given indices from an indices variable, and use `setByOffset()`/`getByOffset()` to access data at the given offset. - `impl`: get the WGSL code of the function implementations for the util functions mentioned above. This change applies the new IndicesHelper throughout the code, though not necessarily to all of it.	2023-08-11 11:36:59 -07:00
Changming Sun	4728f20f9a	Fix CI build (#17118 ) ### Description Some pipelines are failing because PR #16325 set the ONNX version to `rel-1.14.1`. That is a branch name, not a commit or tag name, which means that whenever the branch gets a new commit, we automatically pick it up and use it.	2023-08-11 10:56:38 -07:00
Edward Chen	e7e974b23f	Use double quotes so variable gets expanded. (#17105 )	2023-08-11 09:05:41 -07:00
Yulong Wang	ebaeda6c23	try to find patch.exe in git default installation folder (#17106 ) ### Description Updates the build.bat file in the root folder to make a best effort to locate patch.exe. patch.exe is needed to apply patches to some of the dependencies, for example #17104. However, patch.exe is not on every Windows developer's search path, and if it cannot be found, the build continues silently (as a result, the patch is not applied, and for patches like #17104 this causes a build break). This change adds the folder `C:\Program Files\Git\usr\bin`, the default git installation directory, to the PATH. This may resolve the patch-not-found issue for most (hopefully) users.	2023-08-10 21:48:13 -07:00
Baiju Meswani	54153c73f0	Batchnorm training mode support in a minimal build (#17103 )	2023-08-10 21:44:50 -07:00
Hector Li	344c41fdb9	[QNN EP] Update QNN to v2.13 (#17079 ) ### Description Update QNN SDK to v2.13, update some UTs accordingly	2023-08-10 20:47:55 -07:00
Baiju Meswani	3e7f70bf88	LeakyRelu Gradient (#17039 )	2023-08-10 20:45:34 -07:00
Jeff Bloomfield	0180c0429f	Fix DML regression from allocator refactor and enable unrounded weight allocation in ORT API (#17030 ) This addresses a DML performance regression from the following PR resulting in allocations not being rounded and pooled in the DML execution provider. https://github.com/microsoft/onnxruntime/pull/15833 This also fixes a pre-existing limitation that allocations during session initialization (primarily large weights and persistent resources) only bypassed rounding and pooling while using the Winml API. The allocator now also respects a caller's rounding mode parameter when provided.	2023-08-10 17:02:24 -07:00
Yulong Wang	9cd4e5af68	[wasm] upgrade emsdk to 3.1.44 (#17069 ) ### Description This change upgrades emsdk to 3.1.44. Because the backend is upgraded to LLVM 16, we need to fix a lot of build failures caused by "-Wshorten-64-to-32". Most of the build failures come from the generated `onnx.pb.h`, and they can be fixed by including "core/graph/onnx_protobuf.h", which detects and ignores shorten-64-to-32 warnings.	2023-08-10 16:08:36 -07:00
dependabot[bot]	66b45e0085	Bump actions/upload-pages-artifact from 1 to 2 (#16727 ) Bumps [actions/upload-pages-artifact](https://github.com/actions/upload-pages-artifact) from 1 to 2. Release notes, sourced from [actions/upload-pages-artifact's releases](https://github.com/actions/upload-pages-artifact/releases) (issue numbers below refer to that repository): v2.0.0: BREAKING CHANGE: remove built-in `chmod` commands for `v2` (#69); update README for `v2` (#70). v1.0.10: readme: fix/improve note about permissions (#65); revert `chmod` removal for `v1` (#68); add file perms handling (#64). v1.0.9: removed `chmod` as we moved towards trusting correct file permissions have been set; in the event this isn't the case, an error related to the file permissions is raised in the action. v1.0.8: fail if no artifact file is found to upload (#55); fix link to releases in README (#53); bump actions/publish-action from 0.2.1 to 0.2.2 (#47); add Dependabot config for Actions usage updates (#46). v1.0.7: don't change file permissions of other files (#44). v1.0.6: customize artifact name (#41); fix permissions (#42); print warnings about changed file permissions in bulk (#38); update to latest `actions/publish-action` (#36). ... (truncated) See the full diff in the [compare view](https://github.com/actions/upload-pages-artifact/compare/v1...v2). Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-08-10 15:00:35 -07:00
Justin Chu	83240d1346	Bump clang-format to 16.0.6 in CI (#17099 ) ### Description Bump clang-format to 16.0.6 in CI to take in fixes.	2023-08-10 13:53:04 -07:00
Bowen Bao	6986981482	Bump ONNX version (#16325 ) ### Description Bump ONNX version to https://github.com/onnx/onnx/tree/rel-1.14.1 to include a fix for segfault when shape inferencing nested onnx functions. ### Motivation and Context Resolves #16170	2023-08-10 11:27:28 -07:00
Changming Sun	6dffd1a890	Update model_tests.cc: avoid auto adding new tests from new opsets (#17084 ) ### Description 1. Update model_tests.cc: avoid auto adding new tests from new opsets. 2. Simplify the "ConcatPathComponent" function. It does not need to be a template. ### Motivation and Context All our Windows/Linux CI build machines are preloaded with some test data. In model_tests.cc, we automatically add all of it to onnxruntime_test_all.exe's unit tests. However, this causes problems when we update the CI build machine images: new data could cause pipelines to suddenly fail. Therefore, instead of auto-discovering test data and adding all of it to the tests, this PR explicitly specifies the opset names. This change doesn't impact how the Web CI pipeline runs its tests. Going forward, the workflow would be: Step 1: Update the onnx version in deps.txt. Step 2: Update js/scripts/prepare-onnx-node-tests.ts, like #16943. It is better to put step 1 and step 2 in the same PR. Step 3: The onnxruntime-es team regenerates VM images, tests them and deploys them. Step 4: Enable the new opset test data for EPs. [AB#18340](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18340)	2023-08-10 11:11:26 -07:00
PeixuanZuo	12837ba5c7	[ROCm] Update CI based on ubuntu 22.04 (#17076 ) - Update ROCm version to ROCm5.6 - Update CI based on ubuntu 22.04	2023-08-10 09:51:29 -07:00
BoarQing	87285323e6	[VITISAI] nested subgraph is unsupported for now (#17067 ) ### Description Return an empty ComputeCapability when a graph contains a nested subgraph. ### Motivation and Context For now, our architecture does not support nested subgraphs, so we return an empty ComputeCapability in this case.	2023-08-10 09:45:13 -07:00
BoarQing	1b081d51dc	[VITISAI] node arg can be used more than once (#17068 ) ### Description A node arg can be matched multiple times. ### Motivation and Context Previously, we thought a node's name must be unique and could therefore be used as an identifier. However, we recently found that a node's name can be empty, which makes it impossible to tell nodes apart by name. So we use node args to differentiate nodes, and to do so we need to match a node arg more than once.	2023-08-10 09:44:27 -07:00
satyajandhyala	e8a9d4f04d	[JS/Web] Fix Resize kMSInternalNHWCDomain (#17023 ) ### Description Fix some failing Resize tests. --------- Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>	2023-08-10 09:14:43 -07:00
guyang3532	ef6f4a4aa1	support broadcast shape for elementwise node in padding elimination (#16710 ) With the PaddingElimination optimizer, input1 of an element-wise op may be flattened from `input1 (shape: [batch_size, seq_len, ...])` to `input1 (shape: [valid_tokens, ...])`, while input2 still feeds the same element-wise op. So the shape of input2 should be processed accordingly: 1. If input2.shape.dim_size <= input1.shape.dim_size-2, i.e. input2 has no [batch_size, seq_len] at the beginning, we needn't process the shape of input2 because it is compatible with the flattened shape of input1 (shape: [valid_tokens, ...]). 2. If the shape of input2 has the same dim_size as the shape of input1 and has [batch_size, seq_len] at the beginning, then to be compatible with the flattened shape of input1 we also need to insert the flatten pattern for input2, which flattens the shape of input2 from [batch_size, seq_len, ...] to [valid_tokens, ...]. 3. (done in this PR) In other cases for the shape of input2, like [1, seq_len, ...] or [batch_size, 1, ...], we first need to expand it to [batch_size, seq_len, ...], which is convenient to flatten, and then insert the flatten pattern.	2023-08-10 19:07:22 +08:00
cloudhan	b4e0fc87ea	[ROCm] Make KE reports with better format (#17049 )	2023-08-10 17:44:32 +08:00
pengwa	0471f6fbb3	Check type for building gradient graph (#17046 ) ### Check type for building gradient graph Bug1: fix the following error when running the model with ORTModule + Stage 3: ``` Exception happens when running <bound method Function.apply of <class 'onnxruntime.training.utils.hooks._zero_offload_subscriber.ORTZeROOffloadPreForwardFunction'>> Traceback (most recent call last): File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_custom_autograd_function_runner.py", line 207, in call_python_forward_function wrapped_arg.requires_grad = is_training_mode and grad_flag RuntimeError: only Tensors of floating point and complex dtype can require gradients ``` This happens because, when running PythonOpA, the 3rd input is int64. The check in the gradient builder finds that it requires gradient, so we set its requires_grad = True, but PyTorch considers that incorrect and throws the exception. So we need to understand why the ORT gradient builder thinks the 3rd input needs gradients. `ReverseBFSWithStopGradient` does a reverse BFS from the graph outputs and collects all nodes that are needed for computing them. It defines a queue, initially adds all nodes that generate graph outputs, and then iterates over the nodes one by one, checking each node's inputs: if an input did not hit a stop edge and its node arg type is an allowed type (float, etc.), the input node is appended to the queue for a later iteration. PythonOpA is such a node needed to compute graph outputs, so IsReachable(PythonOpA) returns True. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/c4c53fb9-15f7-4e8d-9aa2-7fc20555a001) In the above code snippet, when node is PythonOpB and next_node is PythonOpA, we did not check the node_arg type between node and next_node on the connection of PythonOpA's 3rd input to PythonOpB's outputs. So we appended the int64-typed node args to the sets that require gradient. Fix1: add the node arg type check before appending to the require-grad lists. After this fix, a unit test failed: "orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax[data_type0-True-0-min] Fatal Python error: Segmentation fault". Investigation showed it is another bug. Bug2: Without Fix1, the execution graph looks like this: ![image](https://github.com/microsoft/onnxruntime/assets/10530022/b2fd4b03-95c7-414a-b268-2ba6a7300105) As you can see, the int64 type has a gradient edge built, although it is not used by any consumer, and the execution runs well. On reflection, though, an int type should not have a grad edge built. With Fix1, the execution graph looks like this: ![image](https://github.com/microsoft/onnxruntime/assets/10530022/1870d3cc-2fe5-4aa7-ad6b-0d88dcc40f8a) So the int-typed node arg no longer has a gradient edge built, which is the problem Fix1 addresses. But another bug appears with the initial "y_node_arg_names", e.g. in this case Aten's two outputs, the 1st a float and the 2nd an int. When we check the y_node (`6e6f582e08/orttraining/orttraining/core/framework/gradient_graph_builder.cc (L60C16-L60C16)`), we did not check the data type and added it to `y_node_args_`, the list of graph output node args that require gradient. So `non_differentiable_y_node_arg_names_` did not contain the int-typed graph output. Then `6e6f582e08/orttraining/orttraining/core/framework/ortmodule_graph_builder.cc (L312C18-L312C18)` tries to get the grad node arg into `yield_output_node_args`, BUT the grad node arg is not built for the int-typed node arg (with Fix1). So we insert a nullptr, and later, when using it, we get a segmentation fault. Fix2: again, add the type check when handling y_node_args, and also add a null check when getting the gradient node arg and appending it to yield_output_node_args.	2023-08-10 14:24:42 +08:00
Baiju Meswani	31cbd63af7	GRU Training and GRU Gradient Kernels (#16929 )	2023-08-09 21:24:47 -07:00
BoarQing	249c2221b6	[VITISAI] remove unused code (#17066 ) ### Description Remove unused code.	2023-08-09 21:07:36 -07:00
Jeff Daily	dbbfc249f7	[ROCm] update header and binary search paths used by cmake (#17083 ) This is in preparation for planned ROCm 6.0 changes that are not backward compatible. However, the adjustments made by this PR to the current onnxruntime cmake files will work with ROCm 5.x and 6.x.	2023-08-10 11:05:21 +08:00
PeixuanZuo	7c7c991417	[ROCm] Workaround type conversion issue (#17074 )	2023-08-10 11:04:11 +08:00
Patrice Vignola	7201dbebe5	[DML EP] Split fused kernels when the persistent resource is too big (#16780 ) The approach is the following: 1. Build partitions 2. Try compiling each partition into a `IDMLCompiledOperator` 3. If the compiled operator's persistent resource is bigger than 4GB, tell the partitioner to split the partition in the middle and try again. 4. Once all partitions have been successfully compiled into an `IDMLCompiledOperator`, fuse the partitions into an ORT operator and register them all. This change is relatively simple (basically a basic retry mechanism), but it required a lot of refactoring just to make sure that we don't modify the graph until all partitions have been compiled successfully. This is because partly modifying the graph before making sure that all partitions can be compiled will break future retries. This path is not expected to be used a lot, and even then the loop is not expected to loop more than twice very often. This is a very specific edge case for large models that were able to merge a large number of nodes into a single partition.	2023-08-09 19:53:15 -07:00
BoarQing	e951f837e4	[VITISAI] fix out of bound error on graph with loop (#17065 ) ### Description Check the bounds of node_get_inputs to avoid an out-of-bound error. ### Motivation and Context A model with a loop would encounter this error. Currently we do not support custom ops for loops, so ideally it would throw an error and fall back to CPU evaluation.	2023-08-09 18:38:30 -07:00
Baiju Meswani	f17efb5c7b	Copy to buffer for both trainable as well as non trainable parameters (#17070 )	2023-08-09 17:23:24 -07:00
Hector Li	555f346923	[QNN EP] Enable DepthToSpace & SpaceToDepth Ops (#17038 ) ### Description [QNN EP] Enable DepthToSpace & SpaceToDepth Ops	2023-08-09 16:52:15 -07:00
Zimon Tai	a3e02e8e2a	Fix Resize op input check (#16594 ) ### Description onnxjs contains a `Resize` op input check that has been outdated since opset 9. Currently `Resize` supports up to 4 inputs. This PR loosens the input check. ### Motivation and Context Fixes #15636	2023-08-09 15:42:30 -07:00
Changming Sun	7d340256f1	Add "windows_sdk_version" build arg and fix SCA build pipeline (#17062 ) ### Description 1. Add "--windows_sdk_version" argument to build.py 2. Fix Windows Static Analysis build pipeline. It is failing because it picks up a different Windows SDK version after a build machine image update. If we can explicitly specify Windows SDK version, we can avoid such things happening again. 3. Remove --enable_training from Windows Static Analysis build pipeline because PR #16993 makes it incompatible with "no_rtti". AB#18315	2023-08-09 14:01:16 -07:00
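
The three input2 shape cases listed in commit ef6f4a4aa1 (padding elimination) can be sketched as a small classifier. This is only an illustration in Python, not the optimizer's actual C++ code: the function name `classify_input2`, the returned labels, and the use of strings for symbolic dimensions are all assumptions made for the example, and shapes that fall outside the patterns named in the commit message are reported as `"not_covered"`.

```python
def classify_input2(input1_shape, input2_shape):
    """Decide how input2 must be handled once input1 has been flattened.

    Shapes are lists whose entries are ints or symbolic names (strings),
    e.g. ["batch_size", "seq_len", 768]. Hypothetical helper for illustration.
    """
    rank1, rank2 = len(input1_shape), len(input2_shape)

    # Case 1: input2 has no [batch_size, seq_len] prefix at all, so it already
    # broadcasts against the flattened [valid_tokens, ...] shape.
    if rank2 <= rank1 - 2:
        return "unchanged"

    lead1 = list(input1_shape[:2])
    lead2 = list(input2_shape[:2])

    # Case 2: same rank and the same [batch_size, seq_len] prefix,
    # so input2 needs the same flatten pattern as input1.
    if rank2 == rank1 and lead2 == lead1:
        return "flatten"

    # Case 3: same rank, but one of the leading dims is 1 (broadcast),
    # so input2 is first expanded to [batch_size, seq_len, ...], then flattened.
    if rank2 == rank1 and all(d2 == d1 or d2 == 1 for d1, d2 in zip(lead1, lead2)):
        return "expand_then_flatten"

    # Anything else is outside the patterns described in the commit message.
    return "not_covered"


if __name__ == "__main__":
    x = ["batch_size", "seq_len", 768]
    print(classify_input2(x, [768]))
    print(classify_input2(x, ["batch_size", "seq_len", 768]))
    print(classify_input2(x, [1, "seq_len", 768]))
```

Returning a label rather than rewriting a graph keeps the sketch focused on the case analysis; the real transformation also has to insert the expand and flatten nodes.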


9375 commits
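
Commit 14a8315f10 describes `offsetToIndices()` and `indicesToOffset()` as generators of WGSL snippets that convert between a flat buffer offset and per-dimension indices. The arithmetic behind such a conversion can be sketched in plain Python as below. This is not the IndicesHelper code and produces no WGSL; the function names are hypothetical, and a contiguous row-major layout is assumed, which the commit message does not state.

```python
def offset_to_indices(offset, shape):
    """Convert a flat offset into per-dimension indices (row-major assumed)."""
    indices = []
    # Peel off the fastest-varying (last) dimension first.
    for dim in reversed(shape):
        indices.append(offset % dim)
        offset //= dim
    return indices[::-1]


def indices_to_offset(indices, shape):
    """Convert per-dimension indices back into a flat offset (row-major assumed)."""
    offset = 0
    for idx, dim in zip(indices, shape):
        offset = offset * dim + idx
    return offset


if __name__ == "__main__":
    shape = [2, 3, 4]
    idx = offset_to_indices(17, shape)
    print(idx, indices_to_offset(idx, shape))
```

The two functions are inverses of each other for any offset in range, which is the property a shader relies on when it reads by indices and writes by offset.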