### Description
This change upgrades emsdk to 3.1.44.
Because the backend is upgraded to LLVM 16, we need to fix a lot of
build failures caused by "-Wshorten-64-to-32".
Most of the build failures come from the generated `onnx.pb.h`; they can
be fixed by including "core/graph/onnx_protobuf.h", which detects
and ignores shorten-64-to-32 warnings.
### Description
1. Update model_tests.cc: avoid automatically adding new tests from new opsets.
2. Simplify the "ConcatPathComponent" function. It does not need to be a
template.
### Motivation and Context
All our Windows/Linux CI build machines are preloaded with some test
data. In model_tests.cc, we automatically add all of it to
onnxruntime_test_all.exe's unit tests. However, this causes problems
when we update the CI build machine images: new data could suddenly make
pipelines fail.
Therefore, instead of auto-discovering the test data and adding all of
it to tests, this PR changes the code to explicitly specify the opset
names.
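Conceptually, the switch looks something like this Python sketch (the
real change is in the C++ model_tests.cc; the names and opset list here
are hypothetical):
```python
from pathlib import Path

# Before: every "opset*" test-data directory found on the machine became a
# test, so a VM image update could silently introduce new (failing) tests.
def discover_opsets(test_data_root: str) -> list[str]:
    return sorted(p.name for p in Path(test_data_root).iterdir()
                  if p.is_dir() and p.name.startswith("opset"))

# After: an explicit allowlist; new opsets only run once we opt in.
ENABLED_OPSETS = ["opset7", "opset8", "opset9", "opset10"]  # hypothetical
```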
This change doesn't impact how the Web CI pipeline runs its tests.
Going forward, the workflow will be:
Step 1: Update the onnx version in deps.txt.
Step 2: Update js/scripts/prepare-onnx-node-tests.ts, as in #16943. It
is better to put steps 1 and 2 in the same PR.
Step 3: The onnxruntime-es team regenerates the VM images, tests them,
and deploys them.
Step 4: Enable the new opset test data for EPs.
[AB#18340](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/18340)
### Description
<!-- Describe your changes. -->
Return an empty ComputeCapability when a graph contains a nested subgraph.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
For now, our architecture does not support nested subgraphs, so we
return an empty ComputeCapability in this case.
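For illustration, a minimal Python sketch of what "nested subgraph"
means here, using the onnx API (the helper is hypothetical; the actual
check lives in the EP's C++ GetCapability):
```python
import onnx

def has_nested_subgraph(graph: onnx.GraphProto, depth: int = 0) -> bool:
    """Return True if a subgraph (e.g. the body of If/Loop/Scan) itself
    contains another subgraph."""
    for node in graph.node:
        for attr in node.attribute:
            if attr.type == onnx.AttributeProto.GRAPH:
                if depth >= 1 or has_nested_subgraph(attr.g, depth + 1):
                    return True
    return False
```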
### Description
<!-- Describe your changes. -->
A node arg can be matched multiple times.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Previously, we thought a node's name must be unique and could thus be
used as an identifier. However, we recently found that a node's name can
be empty, in which case we fail to identify which node is which. So we
use node args to differentiate nodes; to do so, we need to allow a node
arg to be matched more than once.
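A minimal Python sketch of the idea (the helper name is hypothetical;
the actual matcher is C++):
```python
import onnx

def index_nodes_by_output_arg(graph: onnx.GraphProto) -> dict:
    """Key nodes by their output node args instead of by node name.
    Output names are unique in a valid ONNX graph even when node names
    are empty, but a single node arg may now be matched more than once
    while walking a pattern, so the matcher must allow repeated hits."""
    by_output = {}
    for node in graph.node:
        for output_name in node.output:
            by_output[output_name] = node
    return by_output
```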
### Description
Fix some Resize failing tests.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
---------
Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
With the PaddingElimination optimizer, input1 of an element-wise op may
be flattened like:
```
input1 (shape:[batch_size, seq_len, ...]) input1 (shape:[valid_tokens, ...])
\ \
\ input2 \ input2
\ / -----> \ /
\ / \ /
Element-wise Op Element-wise Op
```
So the shape of input2 should be processed accordingly:
1. If input2.shape.dim_size <= input1.shape.dim_size - 2, i.e. input2
has no [batch_size, seq_len] at the beginning,
we don't need to process the shape of input2 because it is already
compatible with the flattened shape of input1 (shape: [valid_tokens, ...]).
2. If the shape of input2 has the same dim_size as the shape of input1
and has [batch_size, seq_len] at the beginning,
then to be compatible with the flattened shape of input1 we need to
insert the flatten pattern for input2 as well,
which flattens the shape of input2 from [batch_size, seq_len, ...] to
[valid_tokens, ...].
3. (Done in this PR.) In other cases, such as input2 shapes like [1,
seq_len, ...] or [batch_size, 1, ...], we first need to expand the shape
to [batch_size, seq_len, ...], which is then convenient to flatten, and
then insert the flatten pattern; see the sketch below.
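A Python sketch of the case analysis above (names and return values are
purely illustrative):
```python
def input2_handling(input1_shape, input2_shape):
    """Return which of the three cases above applies to input2 when input1
    is flattened from [batch_size, seq_len, ...] to [valid_tokens, ...]."""
    if len(input2_shape) <= len(input1_shape) - 2:
        return "case 1: leave input2 as-is"
    if (len(input2_shape) == len(input1_shape)
            and input2_shape[:2] == input1_shape[:2]):
        return "case 2: insert the same flatten pattern for input2"
    return "case 3 (this PR): expand to [batch_size, seq_len, ...], then flatten"

# e.g. input1 = [batch, seq, hidden], input2 = [1, seq, hidden] -> case 3
print(input2_handling(["batch", "seq", 768], [1, "seq", 768]))
```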
### Check type for building gradient graph
**Bug1**:
To fix the error when running the model with ORTModule + Stage 3:
```
Exception happens when running <bound method Function.apply of <class 'onnxruntime.training.utils.hooks._zero_offload_subscriber.ORTZeROOffloadPreForwardFunction'>>
Traceback (most recent call last):
File "/bert_ort/pengwa/py38/lib/python3.8/site-packages/onnxruntime/training/ortmodule/_custom_autograd_function_runner.py", line 207, in call_python_forward_function
wrapped_arg.requires_grad = is_training_mode and grad_flag
RuntimeError: only Tensors of floating point and complex dtype can require gradients
```
This is because, when running PythonOpA, the 3rd input is int64; during
the check in the gradient builder we decide it requires a gradient, so
we set its requires_grad = True, but PyTorch considers that invalid and
throws the exception. So we need to understand why the ORT gradient
builder thinks the 3rd input needs a gradient.
`ReverseBFSWithStopGradient` does a reverse BFS from the graph outputs,
collecting all nodes that are needed to compute them. It defines a
queue, initially adds all nodes that generate graph outputs, then
iterates over the nodes one by one, checking each node's inputs: if an
input does not hit a stop edge and its node arg type is an allowed type
(float, etc.), the input's producer node is appended to the queue for
the next iteration of work.
PythonOpA is such a node that is needed to compute the graph outputs, so
IsReachable(PythonOpA) returns True.

In the above code snippet, when node is PythonOpB and next_node is
PythonOpA, we did not check the node arg type on the connection from
PythonOpA's 3rd input to PythonOpB's outputs. So we appended the
int64-typed node arg to the set of args that require gradients.
**Fix1**: add a node arg type check before appending it to the
requires-grad list.
After the fix, a unit test failed:
"orttraining_test_ortmodule_api.py::test_gradient_correctness_minmax[data_type0-True-0-min]
Fatal Python error: Segmentation fault". After investigation, this
turned out to be another bug.
**Bug2**:
Without Fix1, the execution graph looks like this:

As you can see, the int64 output has a gradient edge built even though
it is not used by any consumer, and execution runs well. But on second
thought, an int type should not have a gradient edge built at all.
With Fix1, the execution graph looks like this:

Now the int-typed node arg has no gradient edge built; **Fix1** fixes
this problem.
But another bug happens when the initial "y_node_arg_names" have mixed
types, e.g. in this case ATen's two outputs: the 1st is float, the 2nd
is int. When we check the y_node
(6e6f582e08/orttraining/orttraining/core/framework/gradient_graph_builder.cc (L60C16-L60C16)),
we did not check the data type before adding it into `y_node_args_`, the
list of graph output node args that require gradients. As a result,
`non_differentiable_y_node_arg_names_` did not contain the int-typed
graph output.
Then
6e6f582e08/orttraining/orttraining/core/framework/ortmodule_graph_builder.cc (L312C18-L312C18)
tries to put the grad node arg into `yield_output_node_args`, BUT with
**Fix1** the grad node arg is never built for the int-typed node arg. So
we insert a nullptr, and later when we use it, we get a segmentation
fault.
**Fix2**: again, we add the type check when handling y_node_args, and
also add a null check when getting the gradient node arg and appending
it into yield_output_node_args.
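A Python sketch of both fixes (the real code is C++ in
gradient_graph_builder.cc and ortmodule_graph_builder.cc; the types and
helpers here are illustrative):
```python
from dataclasses import dataclass, field

GRAD_ALLOWED_TYPES = {"float", "float16", "double", "bfloat16"}

@dataclass
class Node:
    name: str
    inputs: list = field(default_factory=list)  # node arg names

def reverse_bfs_with_stop_gradient(output_nodes, producer_of, dtype_of, stop_edges):
    """Collect node args needing gradients by walking backwards from the
    graph outputs, skipping non-floating-point args (Fix1)."""
    requires_grad, queue = set(), list(output_nodes)
    while queue:
        node = queue.pop()
        for arg in node.inputs:
            if (node.name, arg) in stop_edges or arg in requires_grad:
                continue
            if dtype_of(arg) not in GRAD_ALLOWED_TYPES:  # Fix1: type check
                continue
            requires_grad.add(arg)
            if arg in producer_of:
                queue.append(producer_of[arg])
    return requires_grad

def collect_yield_output_args(y_node_args, grad_arg_of):
    """Fix2: skip graph outputs (e.g. int-typed ones) whose gradient node
    arg was never built, instead of inserting a nullptr."""
    return [grad_arg_of(a) for a in y_node_args if grad_arg_of(a) is not None]
```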
### Description
<!-- Describe your changes. -->
Remove unused code.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Remove unused code.
This is in preparation for planned ROCm 6.0 changes that are not
backward compatible. However, the adjustments made by this PR to the
current onnxruntime cmake files will work with ROCm 5.x and 6.x.
The approach is the following:
1. Build partitions
2. Try compiling each partition into an `IDMLCompiledOperator`
3. If the compiled operator's persistent resource is bigger than 4GB,
tell the partitioner to split the partition in the middle and try again.
4. Once all partitions have been successfully compiled into an
`IDMLCompiledOperator`, fuse the partitions into an ORT operator and
register them all.
This change is relatively simple (essentially a retry mechanism), but
it required a lot of refactoring just to make sure that we don't modify
the graph until **all** partitions have been compiled successfully,
because partially modifying the graph before making sure that all
partitions can be compiled would break future retries.
This path is not expected to be hit often, and even then the loop will
rarely iterate more than twice. This is a very specific edge case for
large models that manage to merge a large number of nodes into a single
partition.
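A Python sketch of the retry loop, with hypothetical helpers standing in
for the DML partitioner and compiler:
```python
MAX_PERSISTENT_RESOURCE = 4 * 1024**3  # 4 GB

def compile_all_partitions(partitions, compile_partition, split_in_middle):
    """Retry loop: split any partition whose compiled persistent resource
    exceeds the limit; mutate the graph only after everything compiles."""
    pending, compiled = list(partitions), []
    while pending:
        part = pending.pop()
        op = compile_partition(part)  # -> object with .persistent_resource_size
        if op.persistent_resource_size > MAX_PERSISTENT_RESOURCE:
            pending.extend(split_in_middle(part))  # two halves, retried next
        else:
            compiled.append((part, op))
    # Only now is it safe to fuse the partitions into ORT operators.
    return compiled
```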
### Description
<!-- Describe your changes. -->
Check the bounds of node_get_inputs to avoid out-of-bounds errors.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Models with Loop would encounter this error. Currently we do not support
custom ops for Loop. So, ideally it should throw an error and fall back
to CPU evaluation.
### Description
onnxjs contains a `Resize` op input check which has been outdated since
opset 9. Currently `Resize` supports up to 4 inputs. This PR loosens the
input check.
### Motivation and Context
Fixes #15636
### Description
1. Add "--windows_sdk_version" argument to build.py
2. Fix Windows Static Analysis build pipeline. It is failing because it
picks up a different Windows SDK version after a build machine image
update. If we can explicitly specify Windows SDK version, we can avoid
such things happening again.
3. Remove --enable_training from Windows Static Analysis build pipeline
because PR #16993 makes it incompatible with "no_rtti".
AB#18315
### Description
Slightly increases the allowable error tolerance for ReduceProd tests on
x64 Windows/Linux with the QNN CPU backend.
### Motivation and Context
A recent [PR](https://github.com/microsoft/onnxruntime/pull/16916)
updated the input range for ReduceProd tests, which uncovered an
inaccuracy for ReduceProd on x64 Windows/Linux with the QNN CPU backend.
This PR updates the allowable error tolerance and adds a TODO for
investigation.
This is needed to ensure the QNN_Nuget_Windows pipeline runs
successfully.
OpenVINO EP ORT 5.1 branch.
Changes for the new API to take in OpenVINO provider options,
and for compatibility with OV 2023.1.
### Motivation and Context
The change is required for the new API to take in OpenVINO Provider
Options
and make it seamless.
---------
Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: saurabhintel0 <saurabh1.kale@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
### Description
Reduces precision on the CoreML provider test as it returns slightly
different answers than the other tested providers. Checked on a 2020 13"
M1 MBP.
### Motivation and Context
Fixes Java CoreML test failure after #16763.
Add a generic `UpdateCUDAProviderOptionsWithValue()` C API to update
CUDA EP provider options whose data type is a pointer and therefore
can't be represented by a string.
Note: Please see the comments on the similar
[PR](https://github.com/microsoft/onnxruntime/pull/16965) for TRT EP.
### Use full qualified name for PythonOp export
Originally, when there were identically named torch.autograd.Functions
in different modules, for example
`a.b.c.Gelu` vs. `d.e.func.<locals>.Gelu`,
we by default threw an exception to make the user aware that we cannot
distinguish the two Gelus, because during model export we did not record
the module path. The workaround was introducing
`ORTMODULE_SKIPPED_AUTOGRAD_FUNCTIONS` to ignore a duplicate-named Gelu
that is not used by the model run. This obviously has limitations, for
example when both Gelus are used in training.
This PR finds a way to construct a fully qualified name.
`def _export_pt_1_10(g, n, *args, **kwargs):`
1. In the exporter function, `kwargs` contains `name` and `module`; in
the above example:
`a.b.c.Gelu` --> name: `Gelu`, module: `a.b.c`
`d.e.func.<locals>.Gelu` --> name: `Gelu`, module: `d.e`
Using name and module is not enough to get a fully qualified name. In
the second case, `d.e` is the module path, inside which there is a
function called `func`, and inside that function there is a local
torch.autograd.Function named `Gelu` (many of our UTs look like this).
We can only get `d.e.Gelu`, which is not the correct fully qualified
name. The reason: `kwargs["name"]` (or `n.name`) only returns the
class's name, not the class's fully qualified name (note that
`kwargs["module"]` is correct).
2. `n` is a torch.Node; we can access `pyobj` to get the
torch.autograd.Function's apply method instance, then use `__self__` to
get the torch.autograd.Function class. From it we can get the module and
the class's fully qualified name; joined together, they give the full
qualified name.
With the above change, we no longer need `kwargs["name"]` and
`kwargs["module"]`, and no longer need the naming-conflict check or the
`ORTMODULE_SKIPPED_AUTOGRAD_FUNCTIONS` env var.
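A minimal sketch of step 2, assuming the `pyobj`/`__self__` access
described above:
```python
def get_fully_qualified_name(n) -> str:
    """n is the torch.Node being exported. n.pyobj() is the bound
    torch.autograd.Function.apply; __self__ recovers the Function class."""
    func_class = n.pyobj().__self__
    # __qualname__ preserves local scopes, e.g. "func.<locals>.Gelu",
    # so module + qualname distinguishes the two Gelus above.
    return f"{func_class.__module__}.{func_class.__qualname__}"
```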
Fix an obvious bug:
(1) In packing mode, the input for SkipLayerNorm has two dimensions
(introduced by #15283): [token_count, hidden_size]. The current code
`element_count = input_dims[0] * sequence_length * hidden_size` will use
element_count = token_count * hidden_size * hidden_size, which causes an
invalid memory write in the CUDA kernel and an ORT crash.
And two minor issues:
(2) potential integer overflow in `static_cast<int>(element_count)`;
(3) some dead code after `return LaunchSkipLayerNormKernel` that will
never have a chance to run.
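A quick back-of-the-envelope in Python, with hypothetical sizes, to show
the scale of the overrun in (1):
```python
# Hypothetical packing-mode sizes, just to show the scale of the overrun.
token_count, hidden_size = 1024, 768
correct_count = token_count * hidden_size              # elements actually allocated
buggy_count = token_count * hidden_size * hidden_size  # what the old code computed
print(buggy_count // correct_count)  # 768x too many elements -> out-of-bounds writes
```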
Maintaining one execution context per thread is suggested by the TRT
[doc](https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#threading)
to avoid synchronization issues.
With the previous TRT EP, we did see synchronization issues when running
multithreaded inference on some models, for example FasterRCNN.
This PR leverages the per-thread-context implementation from the CUDA
EP. The modifications are the following:
- Move the CUDA graph and IExecutionContext objects to the per-thread
context.
- Remove the lock_guard that previously covered the whole compute_func()
and put lock_guards in the blocks where multiple threads may update
kernel function state, access one builder, create/serialize/save the
engine, save the profile, and serialize/save the timing cache.
- On CentOS, don't unload the TRT EP shared library but leave it around,
so that the destructor of thread-local data is still accessible when
threads exit.
Note: Tested this PR with onnxruntime_perf_test; the overhead of
PerThreadContext is small.
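The per-thread-context pattern, sketched generically in Python with
threading.local (the real change is C++ in the TRT EP;
`create_execution_context` is a stand-in):
```python
import threading

_tls = threading.local()

def get_execution_context(engine, create_execution_context):
    """Lazily create one execution context per thread and reuse it, so the
    compute path itself needs no lock around enqueue/execute."""
    if not hasattr(_tls, "context"):
        _tls.context = create_execution_context(engine)
    return _tls.context
```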
### Description
Enable WebGPU in browser unit tests.
The CI pipeline uses Edge v113+, which enables WebGPU.
===
**UPDATE on 08/07/2023:**
- Add flags to the Edge browser launch command line so that Edge on CI
agents can initialize WebGPU correctly.
- ONLY enable WebGPU in the web release build. Other pipelines use the
flag `-b=wasm,webgl,xnnpack` to specify the other 3 backends explicitly.
- Disable the failing "Resize"-related tests. Once they are fixed, the
tests can be re-enabled.
---------
Co-authored-by: Satya Jandhyala <satya.k.jandhyala@gmail.com>
### Description
Added two kernels, for LayerNorm and InstanceNorm.
Also raised the `maxBufferSize` limit when requesting the GPU device:
by default it is limited to 256 MB, which fails to allocate the 600 MB
buffer needed when running fp32 StableDiffusion weights.
### Motivation and Context
These two are used in StableDiffusion and many other networks.
Add a script to get iOS simulator device info so we don't need to use
hardcoded specifiers which may or may not refer to a valid simulator
device.
Add the use-xcode-version step to a packaging pipeline so it uses a
consistent version of Xcode.
Update the usage of torch.onnx.OnnxRegistry, as it's officially
published in PyTorch: https://github.com/pytorch/pytorch/pull/106140.
---------
Co-authored-by: Wei-Sheng Chin <wechi@microsoft.com>
### Description
Enhanced SkipLayerNorm by implementing broadcasting for both CPU and
CUDA
### Motivation and Context
The input and skip tensors no longer have to be the same size. The skip
shape can now match the input shape, be {1, sequence_length,
hidden_size}, or be {sequence_length, hidden_size}.
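For example, the accepted skip shapes broadcast against the input like
this (NumPy sketch with illustrative sizes):
```python
import numpy as np

batch, seq_len, hidden = 2, 8, 4
x = np.random.rand(batch, seq_len, hidden).astype(np.float32)

for skip_shape in [(batch, seq_len, hidden), (1, seq_len, hidden), (seq_len, hidden)]:
    skip = np.random.rand(*skip_shape).astype(np.float32)
    assert (x + skip).shape == x.shape  # skip broadcasts across the batch
```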
---------
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
### Fix a few bugs
1. Symbolic shape inference: there is no None check before getting the
length.
2. Rename PythonOp/PythonOpGrad's attribute `name` to `func_name`;
otherwise, when we use onnx.helper.make_node to create a node, `name`
conflicts with the node name.
3. Filter shape inference warnings for PythonOp for torch 2.0 or newer.
4. Close file descriptors for log suppression. Without the fix, two
extra fds are left open after the log suppression exits its context.
Before entering log suppression (left), before exiting log suppression
(right):

With the fix, no fds are left open after the context exits:

If users set `trt_profile_min_shapes`, `trt_profile_max_shapes` and
`trt_profile_opt_shapes`, they need to provide all the dynamic-shape
inputs with associated shape profiles.
When the main graph is partitioned into TRT/CUDA subgraphs, if an input
of a TRT subgraph also has a dynamic shape, users need to provide its
shape profiles as well. Users might not notice this, so TRT EP now tells
them which input shape profiles need to be provided.
The new error message is:
```
Traceback (most recent call last):
File "/home/azureuser/disk2/debug/optional_inputs.py", line 218, in <module>
test_optional_input_dynamic(trt_profile=True, optional=True)
File "/home/azureuser/disk2/debug/optional_inputs.py", line 195, in test_optional_input_dynamic
session = ort.InferenceSession(
File "/home/azureuser/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line
419, in __init__
self._create_inference_session(providers, provider_options, disabled_optimizers)
File "/home/azureuser/anaconda3/lib/python3.9/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line
471, in _create_inference_session
sess.initialize_session(providers, provider_options, disabled_optimizers)
onnxruntime.capi.onnxruntime_pybind11_state.EPFail: [ONNXRuntimeError] : 11 : EP_FAIL : User needs to provide all the
dynamic shape inputs with associated profiles if they want to explicitly set profiles through provider options.
Please note that main graph could be partitioned into TRT/CUDA/CPU subgraphs, in this case, user also needs to provide
shape profiles for the TRT subgraph's input if it's dynamic shape input.
Following input(s) has no associated shape profiles provided: x1
```
Please see this github issue:
https://github.com/microsoft/onnxruntime/issues/16600
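For reference, the profiles are supplied through TRT EP provider options
like the following (Python; the input name and dims are just examples):
```python
import onnxruntime as ort

trt_options = {
    # Every dynamic-shape input needs an entry, including inputs of TRT
    # subgraphs created by partitioning (e.g. "x1" from the error above).
    "trt_profile_min_shapes": "x1:1x3x224x224",
    "trt_profile_max_shapes": "x1:32x3x224x224",
    "trt_profile_opt_shapes": "x1:8x3x224x224",
}
session = ort.InferenceSession(
    "model.onnx",
    providers=[("TensorrtExecutionProvider", trt_options), "CUDAExecutionProvider"],
)
```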
### Description
<!-- Describe your changes. -->
Adjust the natvis file to display tagged strings.
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Hard to debug without seeing names.
[DML] Model corruption during LayerNorm fusion, and DmlNonZeroOperator
crashes
Two issues are fixed in this PR:
1) Changes to LayerNorm fusion regressed DirectML. The fusion has been
disabled for DML to unblock models.
2) DmlNonZero needs to create an operator call that needs to know the
number of non-zero elements (size in bytes). Therefore this needs to be
allocated during compute, but it was being allocated during
initialization. This causes the output tensor size to mismatch the
operator's expectations.
---------
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>