onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-25 22:26:24 +00:00

Author	SHA1	Message	Date
dependabot[bot]	4f2d454211	Bump Sixlabors.ImageSharp from 2.1.1 to 2.1.7 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#19806 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.1 to 2.1.7. Release notes (sourced from Sixlabors.ImageSharp's releases, https://github.com/SixLabors/ImageSharp/releases): v2.1.7: Disallow allocation attempts of unrepresentable sizes (SixLabors/ImageSharp#2553); Tiff decoding robustness improvements (SixLabors/ImageSharp#2554); PBM decoder robustness improvements and BufferedReadStream observability (SixLabors/ImageSharp#2555); Backport 2681 (SixLabors/ImageSharp#2688). v2.1.6: Backport - Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack (SixLabors/ImageSharp#2524). v2.1.5: Backport #2501 (SixLabors/ImageSharp#2509). v2.1.4: Backport WebP fix to 2.1 (SixLabors/ImageSharp#2420). v2.1.3: V2 Backport: 2133, 2154 (SixLabors/ImageSharp#2157). v2.1.2: Backport - Issue 2123 (SixLabors/ImageSharp#2126). Full list of commits: https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.7 Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-05 08:32:18 -07:00
Edward Chen	2b3071119a	Add onnxruntime/test/run_benchmark.py helper script. (#19234 ) ### Description Add onnxruntime/test/run_benchmark.py helper script to repeat benchmark runs until a target coefficient of variance is reached. It works with [Google Benchmark](https://github.com/google/benchmark) programs like `onnxruntime_mlas_benchmark`. ### Motivation and Context Sometimes there is variability in benchmark run results. This automates the repeated running needed to get results that are stable enough.	2024-04-05 07:02:01 -07:00
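The repeat-until-stable idea in this commit can be sketched as follows. This is a minimal illustration, not the code of `run_benchmark.py`: the function name `run_until_stable`, its parameters, and the default thresholds are assumptions. The stability measure used here is the coefficient of variation (sample standard deviation divided by the mean), which is what the commit message calls the coefficient of variance.

```python
import statistics
from typing import Callable, List


def run_until_stable(
    run_once: Callable[[], float],
    target_cv: float = 0.05,  # hypothetical default threshold
    min_runs: int = 3,        # hypothetical: need a few samples before judging
    max_runs: int = 20,       # hypothetical: give up after this many runs
) -> List[float]:
    """Call run_once() repeatedly until the samples are stable enough."""
    samples: List[float] = []
    while len(samples) < max_runs:
        samples.append(run_once())
        if len(samples) >= min_runs:
            mean = statistics.mean(samples)
            # Coefficient of variation: sample standard deviation over the mean.
            cv = statistics.stdev(samples) / mean if mean else float("inf")
            if cv <= target_cv:
                break
    return samples
```

In a real setting `run_once` would launch the benchmark binary and parse one timing from its output; here it is left as a caller-supplied function so the stopping rule stays visible.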
Hans	6abfb6b928	[js/rn] Support load external data (#20090 ) Support load external data by passing local model path	2024-04-05 05:55:03 -07:00
Scott McKay	f61cca1b8f	NNAPI: Improve MatMul diagnostic output (#19721 ) ### Description Re-order the checks so that we don't get two messages for the same node. Currently the batched matmul 'not supported' message appears for 2D input, which is valid, and this is confusing. Change the order so we only check whether batched matmul can be used when the input ranks are > 3, as that is one of the requirements. `c311d1faf5/onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder_helpers.cc (L257-L264)`	2024-04-04 21:58:39 -07:00
Thomas Boby	254bdbb19d	OneDNN/dnnl: Fix filepath after dnnl move (#20086 ) ### Description This adjusts the path used in the nuget script for dnnl to the new location of the file. There isn't a CI pipeline for this as far as I can tell, and I can't easily confirm this change works on master, so please check. ### Motivation and Context It is currently not possible to build onednn nuget packages. It's possible that the correct action would be to move the file not fix this path, but I'm not familiar enough with the repository layout. --------- Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2024-04-04 21:24:49 -07:00
Yi Zhang	4ea54b82f9	[Fix] Upload training CUDA daily wheel (#20183 )	2024-04-03 13:18:26 +08:00
Andrew Fantino	7303a90f49	Fix build errors from date/date.h C++20 compatibility (#20139 ) ### Description For C++ standards >= 20, use `std::chrono::operator<<` in place of `date::operator<<` to fix ambiguous operator compile error. ### Motivation and Context The external dependency HowardHinnant/date has a conflict with std::chrono for >=C++20. Solves #20137	2024-04-02 22:10:25 -07:00
Yi Zhang	dae77e6014	Support building Windows CUDA with Ninja (#20176 ) ### How to run it locally 1. conda install ninja 2. "C:\Program Files\Microsoft Visual Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64 3. python.exe {ort_repo}\tools\ci_build\build.py --config RelWithDebInfo --build_dir {ort_repo}\build_cuda --skip_submodule_sync --build_csharp --update --parallel --cmake_generator "Ninja" --build_shared_lib --enable_onnx_tests --enable_pybind --build_java --build_nodejs --use_cuda "--cuda_home=C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" --enable_cuda_profiling --cmake_extra_defines CMAKE_CUDA_ARCHITECTURES=60 4. cd build_cuda\RelWithDebInfo 5. cmake --build . -j16 ### Motivation and Context In the packaging pipelines, building with CUDA on Windows intermittently takes too long, even though moving the build to a CPU machine has already reduced the build time considerably. We plan to build with Ninja instead of msbuild in the packaging pipelines so that nvcc can run in parallel. Supporting Ninja builds locally is the first step.	2024-04-03 11:19:31 +08:00
Yulong Wang	fa1917b81b	[js/webgpu] add validation to workgroup size (#20110 ) ### Description add validation to workgroup size in `shaderHelper.mainStart()`.	2024-04-02 19:29:20 -07:00
Shubham Bhokare	be831e1ba3	Export of Openai Whisper with batched prompts (#19854 ) Adds an example to demonstrate the export of the OpenAI Whisper implementation with batch_size > 1 and the addition of prompts for each audio snippet. Also handles the case where the prompts are not all the same size. For example, if our prompt ids are [p1_id_1, p1_id_2] and [p2_id_1], the final decoder_input_ids will look like this after padding: `[prev_token, p1_id_1, p1_id_2, start_token, lang_token, transcribe_token] [prev_token, p2_id_1, PAD_TOKEN, start_token, lang_token, transcribe_token]` --------- Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>	2024-04-02 17:01:48 -07:00
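The padding layout described in this commit can be sketched as below. This is an illustration under assumptions, not the exported example itself: the function name `build_decoder_input_ids` and the integer token ids in the usage note are hypothetical placeholders.

```python
from typing import List, Sequence


def build_decoder_input_ids(
    prompts: Sequence[Sequence[int]],
    prev_token: int,
    start_token: int,
    lang_token: int,
    transcribe_token: int,
    pad_token: int,
) -> List[List[int]]:
    """Right-pad each prompt to the longest one, then wrap it with the control tokens."""
    max_len = max((len(p) for p in prompts), default=0)
    rows = []
    for prompt in prompts:
        padded = list(prompt) + [pad_token] * (max_len - len(prompt))
        rows.append([prev_token] + padded + [start_token, lang_token, transcribe_token])
    return rows
```

For instance, `build_decoder_input_ids([[11, 12], [21]], prev_token=1, start_token=2, lang_token=3, transcribe_token=4, pad_token=0)` pads the second prompt with one `pad_token` so that both rows have the same length, matching the layout in the commit message.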
Rachel Guo	19793de1b3	#19921 [Dup] LLC Core count calculations updated (#20171 ) ### Description See #19921. This PR only addresses one comment: https://github.com/microsoft/onnxruntime/pull/19921#discussion_r1543398640. Since the original PR comes from an external branch, a separate pull request is needed for this. --------- Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com> Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Jian Chen <cjian@microsoft.com>	2024-04-02 16:53:47 -07:00
Dmitri Smirnov	12e2538065	Add new SessionOptions config entry to disable specific transformers and rules (#20135 ) ### Motivation and Context Certain transformers slow down session loading time while providing no runtime perf benefits. Allow clients to exclude them.	2024-04-02 16:33:05 -07:00
Chi Lo	e916929371	[TensorRT EP] Address compiler warnings on Windows (#20134 ) Previous [PR ](https://github.com/microsoft/onnxruntime/pull/19663)changes msvc compiler warning level from set_msvc_c_cpp_compiler_warning_level(3) to set_msvc_c_cpp_compiler_warning_level(4) when using CUDA EP (it also applies to TRT EP). Some warnings still need to be addressed in TRT EP code.	2024-04-02 10:39:46 -07:00
Xu Xing	a2998e5d42	[js/webgpu] Use global id in attention and instance-norm (#20008 )	2024-04-02 01:42:39 -07:00
Adam Pocock	262b6bd3b7	[java][DML EP] Modifying dml_provider_factory.h so it can compile as a C header file (#20157 ) ### Description The dml_provider_factory header file can't be used in C programs as it defines C++ inline operators. This PR rearranges that header file so that it looks like valid C when used from C, and also makes a couple of small modifications to the Java code so it correctly binds to the DML EP at build time. I'm having some difficulty testing it as I think it's pulling in the old version of DirectML on my computer and I can't figure out what the library loading path is in Java to make it look at the recent version I downloaded. So the test I added fails with: ``` InferenceTest > testDirectML() FAILED ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Exception during initialization: <path-to-ort>\onnxruntime\core\providers\dml\DmlExecutionProvider\src\AbiCustomRegistry.cpp(518)\onnxruntime.dll!00007FFF74819333: (caller: 00007FFF74793509) Exception(3) tid(4f58) 80070057 The parameter is incorrect. at app//ai.onnxruntime.OrtSession.createSession(Native Method) at app//ai.onnxruntime.OrtSession.<init>(OrtSession.java:74) at app//ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:236) at app//ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:221) at app//ai.onnxruntime.InferenceTest.openSessionSqueezeNet(InferenceTest.java:1961) at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:665) at app//ai.onnxruntime.InferenceTest.testDirectML(InferenceTest.java:657) ``` But it does correctly compile, and this error seems very similar to other issues with the DML provider when it doesn't like a model due to the loaded library being old. The test is using the squeezenet file that's been in the repo since 2019. If someone can help me figure out how to get the right version of DML in the library path I can test it more on my end. 
I tried adding the folder with the new version into the system path, but I'm not very familiar with Windows' library loading behaviour. ### Motivation and Context Fixes #19656 to allow use of the DirectML EP from ORT Java. cc @martinb35	2024-04-01 21:58:50 -07:00
Xiaoyu	3979f53aa4	Update api backward compatibility (#20136 ) ### Description Update api backward compatibility ### Motivation and Context Update api backward compatibility	2024-04-01 21:37:56 -07:00
wangshuai09	3e2b659fce	[CANN] Add dump_om_model flag (#20075 ) ### Description Adds a new `dump_om_model` flag for the CANN EP, which defaults to "True". ### Motivation and Context When building an ONNX model with the CANN EP, the intermediate OM (offline model for Ascend NPU) is automatically saved. Some users don't want to dump the OM when resources are limited. This PR resolves this by allowing `dump_om_model=False`.	2024-04-01 21:35:29 -07:00
Dhruv Matani	742d413586	Fix bug related to export failure for DynamicQuantizeLSTM [issue 15465] (#20160 ) ### Description See issue 15465: https://github.com/microsoft/onnxruntime/issues/15465 This PR just applies the workaround suggested in the thread that I and numerous others on the thread have validated to work for them and allows them to successfully export a PyTorch model with LSTM layers that are dynamically quantized by ONNX. ### Motivation and Context It is not possible to successfully export a dynamically quantized LSTM model that I have trained for use in the onnx runtime without this change. Currently, this workaround lives as a local change in my python package directory, and makes it basically impossible for anyone else at the place I work at to successfully export the quantized model that I am exporting. See issue 15465: https://github.com/microsoft/onnxruntime/issues/15465 Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>	2024-04-01 21:33:00 -07:00
Yufeng Li	91654988fd	optimize threading of mha (#20088 ) ### Description The cost computation of ComputeVxAttentionScore is wrong. It should be sequence_length * v_head_size * total_sequence_length instead of sequence_length * v_head_size * sequence_length. The PR also fine-tunes the cost computation. On my local box with an i9 CPU, the performance is the same as the unfused version, but it is much faster on an Azure VM with 16 threads. ### Motivation and Context https://github.com/microsoft/onnxruntime/issues/19924	2024-04-01 21:32:36 -07:00
Atanas Dimitrov	9d06e1bfa4	Label encoder fusion (#19761 ) ### Description Created a new `LabelEncoderFusion` pass. This is useful in models that result from automatic conversion tools related to data science: sometimes the produced model contains consecutive `LabelEncoder`-s. To merge 2 `LabelEncoder`-s, the optimizer propagates the outputs of the first encoder through the second one. ### Motivation and Context This enhances the capabilities of the `onnxruntime::optimizer` by fusing consecutive `LabelEncoder` nodes. ### Fusion examples ``` Applying fusion node1: (a,C) (b,B) (c,A) -> Default: _Unused node2: (A,1) (B,2) (C,3) -> Default: -1 fused: (a,3) (b,2) (c,1) -> Default: -1 Applying fusion node1: (a,C) (b,B) (c,A) -> Default: D node2: (A,a) (B,b) (C,c) (D,d) -> Default: default fused: (a,c) (b,b) (c,a) -> Default: d Applying fusion node1: (a,0) (b,1) (c,2) -> Default: -1 node2: (2,a) (1,b) (0,c) -> Default: default fused: (a,c) (b,b) (c,a) -> Default: default Applying fusion node1: (a,3) (b,2) (c,1) -> Default: -1 node2: (1,a) (2,b) (3,c) -> Default: d fused: (a,c) (b,b) (c,a) -> Default: d ``` --------- Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>	2024-04-01 09:41:10 -07:00
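The propagation rule behind these fusion examples can be sketched with plain dictionaries. This is a simplified model of the idea, not the C++ optimizer pass: the function name `fuse_label_encoders` and the dict-based representation of an encoder are assumptions made for illustration.

```python
from typing import Dict, Hashable, Tuple


def fuse_label_encoders(
    first: Dict[Hashable, Hashable],
    first_default: Hashable,
    second: Dict[Hashable, Hashable],
    second_default: Hashable,
) -> Tuple[Dict[Hashable, Hashable], Hashable]:
    """Push every output of the first encoder through the second encoder."""
    # A key of the first encoder maps to whatever the second encoder
    # produces for the first encoder's output (or the second's default).
    fused = {key: second.get(mid, second_default) for key, mid in first.items()}
    # Unknown keys yield first_default, which is then looked up in the second encoder.
    fused_default = second.get(first_default, second_default)
    return fused, fused_default
```

Applied to the first two examples in the commit message, this reproduces `(a,3) (b,2) (c,1) -> Default: -1` and `(a,c) (b,b) (c,a) -> Default: d`.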
Yi Zhang	523ef04240	enable lto in Python-CUDA-Packaging Pipline (#20164 ) ### Description Except for the [Python-CUDA-Packaging pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1299&_a=summary), all Windows CUDA packaging jobs are now running well. A comparison shows that enable_lto isn't set in that pipeline, which might be one root cause of the random hang.	2024-04-01 15:42:28 +08:00
Sumit Agarwal	e1e292f94c	[DML EP] DML Graph Serialization Bug (#19748 ) ### Description This pull request addresses several issues: - The DML Graph's nodes were not sorted in topological order, leading to crashes during deserialization when a child node preceded its parent node. This PR resolves this issue by applying a topological sort before serialization. - During the `RemoveUnconnectedNodes` process: - we update `intermediateEdge.FromNodeIndex`. Additionally, we must update `intermediateEdge.Name` when it includes `intermediateEdge.FromNodeIndex`, as serialization/deserialization relies heavily on edge names. - we also eliminate unused edges. Consequently, we must erase the (now unused) inputs from the corresponding maps `serializedGraphInputIndexToSubgraphInputIndex` and `serializedGraphLargeConstantNameToSubgraphInputIndex`. ### Motivation and Context A few public ONNX Zoo models were crashing during deserialization. --------- Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>	2024-03-31 14:41:42 -07:00
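The commit does not say which topological sorting algorithm it uses. As one common option, Kahn's algorithm guarantees that every parent appears before its children; the sketch below assumes nodes are identified by integer indices and edges are `(parent, child)` pairs, and the function name `topological_order` is hypothetical.

```python
from collections import deque
from typing import Iterable, List, Tuple


def topological_order(num_nodes: int, edges: Iterable[Tuple[int, int]]) -> List[int]:
    """Return node indices ordered so that each parent precedes its children."""
    children: List[List[int]] = [[] for _ in range(num_nodes)]
    indegree = [0] * num_nodes
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Start from nodes that have no unprocessed parents.
    ready = deque(i for i in range(num_nodes) if indegree[i] == 0)
    order: List[int] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order
```

Serializing nodes in this order means a deserializer that processes nodes sequentially never meets a child before its parent.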
kunal-vaishnavi	a0ebd5fee5	Add flash attention v2 and INT4 CUDA for LLaMA E2E benchmarking (#20149 ) ### Description This PR adds flash attention v2 and support for INT4 CUDA benchmarking in PyTorch. ### Motivation and Context The [flash attention v2](https://github.com/Dao-AILab/flash-attention) algorithm helps improve model performance in PyTorch. Support for INT4 CUDA in PyTorch is done through the [`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) package.	2024-03-29 23:09:37 -07:00
mo-ja	00244ea143	fix quantization errors of ConvTranspose with per_channel=True (#19996 ) ### Description - update axis value for per_channel quantization of QDQConv - we should use `axis=1` for the ConvTranspose operator. ### Motivation and Context - this PR fixes https://github.com/microsoft/onnxruntime/issues/19694, which I opened	2024-03-29 21:36:15 -07:00
Ye Wang	f3a864217f	Fix MoE tensor parallelism tests (#20147 ) ### Description Previously the expert weights were stored in row-major order. With the updated cutlass extension introduced by https://github.com/microsoft/onnxruntime/pull/20108, weights are stored in column-major order, which aligns with the PyTorch implementation. This change fixes the way the tensors are sliced across shards.	2024-03-29 16:10:09 -07:00
Jeff Bloomfield	2f31560430	Enable generic feature level devices in DML EP (#20114 ) ### Description Enable NPUs supporting DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML and D3D_FEATURE_LEVEL_1_0_GENERIC with DML EP. This also begins ingesting DX headers through the DirectX-Headers repo. Note that this includes an update to cgamanifest.json for onnx-tensorrt, which is triggered during re-generation due to prior changes to deps.txt.	2024-03-29 14:37:30 -07:00
cao lei	604b284261	add API function GetAliasMap and ReleaseAliasMap in OrtCustomOp (#20145 ) ### Description Add API functions GetAliasMap and ReleaseAliasMap in OrtCustomOp	2024-03-29 13:49:56 -07:00
inisis	8396845806	fix shape inference bug (#19848 ) ### Description For nodes like Add, the inputs should be merged dynamically. ### Motivation and Context During shape inference, `_onnx_infer_single_node` currently generates the inputs of nodes like Add from the last node's output, but they should be merged.	2024-03-29 13:06:27 -07:00
Adrian Lizarraga	b1a5eb255e	[Quant] Fix accuracy_level config option for MatMul 4bits quantizer (#20146 ) ### Description Fixes code that extracts the accuracy level when creating a MatMulNBits node in the `DefaultWeightOnlyQuantizer` class. ### Motivation and Context Error from line 443: `AttributeError: 'DefaultWeightOnlyQuantizer' object has no attribute 'accuracy_level'`. The solution is to access `self.config.accuracy_level` instead of `self.accuracy_level`. Relevant commit: https://github.com/microsoft/onnxruntime/pull/19106	2024-03-29 11:54:55 -07:00
Ye Wang	17919717b5	add QMoE (#20108 ) ### Description 1. Introduce the latest cutlass extension from TRTLLM, which gives us the opportunity to upgrade cutlass (to 3.4) from the MoE side. 2. Fix Windows build issue. 3. Add Int4 MoE op and unit test.	2024-03-29 10:24:19 -07:00
pengwa	2092bebc78	Fix transformer layer detection for recompute (#20106 ) ### Fix transformer layer detection for recompute The original logic failed to detect the layer boundary node in the Mistral model. This PR simplifies the search by using stronger pattern matching, to make sure it is flexible enough to cover different transformer variants. Also adds a UT. Adds a warning when the user enables layerwise recompute but no layer boundary nodes are found.	2024-03-29 17:44:38 +08:00
cao lei	2a184ac1a1	use OrtCustomOp's new API GetMayInplace in CreateKernelCreateInfo (#20037 ) ### Description Use OrtCustomOp's new API GetMayInplace in CreateKernelCreateInfo to hook the inplace map of custom ops	2024-03-28 20:45:37 -07:00
Adam Pocock	2f82400b13	[java] Java 21 build support (#19876 ) ### Description Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to allow compiling ORT on Java 21. The build still targets Java 8. I'm not sure if there will be CI changes necessary to use this PR, specifically for the Gradle version as I don't know if that is cached somewhere earlier in the CI build process. The new Gradle version adds a warning that using `--source` and `--target` to select the Java language version is obsolete which is annoying, we can fix it if we decide to only allow building on newer versions of Java, while still supporting running on Java 8. ### Motivation and Context Java 21 is the latest LTS release of Java and ORT should be able to build on it.	2024-03-28 15:51:22 -07:00
Yi Zhang	f7b52d2e3e	[Fix] Only copy java files when build_java is True (#20121 ) ### Description ### Motivation and Context Fix error in Nuget-CUDA-Packaging-Pipeline	2024-03-28 14:06:28 -07:00
Pranav Sharma	3ed0c81b30	Expose Reserve() in OrtAllocator to allow custom allocators to work when session.use_device_allocator_for_initializers is specified. (#19904 ) ### Description Expose Reserve() in OrtAllocator to allow custom allocators to work when session.use_device_allocator_for_initializers is specified. Update: this change has been verified by Bing Ads and brings a significant benefit in terms of memory utilization: 30GB less memory and also better CPU utilization. ### Motivation and Context https://microsoft-my.sharepoint.com/:w:/p/prs/Eeidf5YNtWtKrPVkfuTDsuABak1oL4QRpuBGuhqRbLKoJg?e=Zl3bah	2024-03-28 12:28:37 -07:00
Yi Zhang	2a38168f0b	increase cl mpcount since Compilation is moved on CPU machine (#20116 ) ### Description The CPU machine has 16 cores, so we can increase the parallel count. Comparing two runs: 1. https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=432328&view=results 2. https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=432331&view=results The compilation took about 25 minutes with a parallel count of 15, versus 41 minutes with a parallel count of 3. Co-authored-by: Yi Zhang <your@email.com>	2024-03-28 13:30:33 +08:00
Yi Zhang	c5d7310f1b	Remove TSA upload in testing stage (#20115 ) --------- Co-authored-by: Yi Zhang <your@email.com>	2024-03-28 13:15:03 +08:00
Yi Zhang	8f069f81c4	Split more windows GPU workflow into 2 stages, building and testing, to make them more stable (#20080 ) ### Description Refactor win-ci.yml to solve the random hang issue in more GPU workflows, and move the building of nuget-zip packages and Python CUDA 12 packages to a CPU machine. --------- Co-authored-by: Yi Zhang <your@email.com>	2024-03-28 12:55:44 +08:00
wejoncy	16af7adc70	[llm exporter]auto infer output shape (#20071 )	2024-03-28 09:52:10 +08:00
pengwa	55f63a48ca	Keep original name during fusion (#20097 ) ### Keep original name during fusion This helps to identify where a fused node came from, which is very useful when debugging execution order issues between different transformer layers. For example: - A node named `/_original_module/model/layers.1/self_attn/MatMul/MatmulTransposeFusion//MatMulScaleFusion/` goes through two fusion paths in the 1st transformer layer, e.g. `MatmulTransposeFusion` and `MatMulScaleFusion`. - The `/_original_module/model/layers.2/post_attention_layernorm/Mul_1/SimplifiedLayerNormFusion/` node is a node fused by `SimplifiedLayerNormFusion`.	2024-03-28 08:40:34 +08:00
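The two example names above are consistent with a simple naming rule: append `/<FusionName>/` to the name of the node being fused. The sketch below illustrates that inferred rule only; the function name `fused_node_name` and the unfused base names in the usage note are assumptions, and the actual implementation may build names differently.

```python
def fused_node_name(original_name: str, fusion_name: str) -> str:
    """Derive a fused node's name by appending the fusion pass name (inferred rule)."""
    return f"{original_name}/{fusion_name}/"
```

Starting from a hypothetical base name `/_original_module/model/layers.1/self_attn/MatMul`, applying `MatmulTransposeFusion` and then `MatMulScaleFusion` yields the doubled slash seen in the first example, because the second suffix is appended to a name that already ends in `/`.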
Ye Wang	a9d9b083e4	Fix py package pipeline (#20065 ) ### Motivation and Context Fixes #20068	2024-03-27 15:59:35 -07:00
Dmitri Smirnov	b95fd4e644	Enable CUDA EP unit testing on Windows (#20039 ) ### Description Address build issues and source code discrepancies. Fix cuda_test_provider gtest argument stack corruption. ### Motivation and Context The `OpTester` class that is widely used for kernel testing is not suitable for testing internal classes of EPs that are built as shared objects. Currently, CUDA EP tests run only on Linux. We want to enable testing and development on Windows, and create a usable pattern for testing the internals of other EPs. Alternatives considered: Abstracting EP unit tests into a separate test executable such as `onnxruntime_test_all`. This alternative was rejected as it would require many more changes to the established patterns, and could interfere with CUDA functionality while making source code maintenance more complex.	2024-03-27 13:32:36 -07:00
Yi Zhang	ab2eaedfaa	Install ONNX by buildling source code in Windows DML stage (#20079 ) ### Description In #20073, I pinned the onnx version to unblock the whole PR CI. In fact, we can use the onnx that is installed by building from source, whose version is controlled by deps.txt. For historical reasons, the DML stage installed onnx from PyPI. Now onnx can be installed the same way as in the other stages. Add an option to skip installing onnx in win-ci-prebuild-step.	2024-03-27 12:29:34 -07:00
Yi Zhang	4df9d16f98	[Fix] TSAUpload task must be in building stage (#20098 ) ### Description In #20085, TSAUpload was in testing stage so main branch failed.	2024-03-27 12:20:57 -07:00
Xiaoyu	c8676ffbff	Add ModelProto support for quantize api (#20018 ) ### Description Add ModelProto support for `quantize` api ### Motivation and Context Currently, the `quantize` API only accepts a model path as the input model. However, for large models, saving and loading from disk can be time-consuming. By adding `ModelProto` as an input option to the `quantize` API, significant time can be saved.	2024-03-27 10:40:08 -07:00
Yulong Wang	47903e701a	fix condition in web CI YAML (#20095 ) ### Description fix condition in web CI YAML	2024-03-27 10:35:43 -07:00
Nanashi	ca465dc087	[js] Make error friendly when isOrtFormat is undefined (#19958 ) ### Description Make the error friendlier when isOrtFormat is undefined (`onnxruntime.InferenceSession.create` is called with ArrayBuffer or Uint8Array). ### Motivation and Context I was trying to run my ONNX model with the WebGL EP, but it gave me the error "Cannot read properties of null (reading 'irVersion')". Using a debugger, I found that the actual error is `int64 is not supported`, but that error was invisible to me. So I made it show both errors when isOrtFormat is undefined. [d62d942](`d62d9425ba`)	2024-03-27 02:07:00 -07:00
guyang3532	4aa84003ca	support Pow/Div/Sqrt in PaddingElimination (#20083 )	2024-03-27 16:10:07 +08:00
Yulong Wang	28907d8c59	[js/web] workaround NPM test fetch failure (#20020 ) ### Description Sometimes `npm test` fails with the error "TypeError: Failed to fetch". I checked the callback entry of the localhost server started by karma. When the "Failed to fetch" error happens, no request is reflected on the server side. The root cause is still not identified. However, as this issue only happens occasionally, when the browser has just been launched by the karma runner, retrying works around the issue most of the time.	2024-03-26 21:35:49 -07:00
Chi Lo	3dcda13e62	[TensorRT EP] Fix concurrency issue for TRT custom op list (#20093 ) The `CreateTensorRTCustomOpDomainList()` function is not thread-safe due to its static variables, `created_custom_op_list` and `custom_op_domain`. This PR ensures synchronization using a mutex. See issue: https://github.com/microsoft/onnxruntime/issues/20089	2024-03-26 21:20:14 -07:00
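The pattern described here, guarding lazily initialized shared state with a mutex, can be illustrated in Python. This is an analogy only and not the C++ fix: the module-level variables stand in for the function's static variables, and `create_custom_op_domain_list` and the `"example.domain"` value are hypothetical placeholders.

```python
import threading
from typing import List

_lock = threading.Lock()     # plays the role of the mutex added by the fix
_created = False             # stands in for the static `created_custom_op_list`
_domains: List[str] = []     # stands in for the static `custom_op_domain`


def create_custom_op_domain_list() -> List[str]:
    """Build the shared list at most once, even when called from many threads."""
    global _created
    with _lock:
        # Without the lock, two threads could both see _created == False
        # and populate the shared list twice.
        if not _created:
            _domains.append("example.domain")  # hypothetical placeholder entry
            _created = True
        return list(_domains)
```

Holding the lock across both the check and the update is what makes the initialization happen once; checking the flag outside the lock would reintroduce the race.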


11997 commits