onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-09 17:28:58 +00:00

Author	SHA1	Message	Date
Hector Li	55e0aaeeef	fix android build issue (#20389 )	2024-04-19 14:21:34 -07:00
Rachel Guo	f396748ed6	Nuget .NET changes for Mac Catalyst (#19923 ) ### Description Add Nuget package changes for adding new 'net6.0-maccatalyst' platform. The output ORT Nuget package was manually tested and verified in a .NET MAUI app setup. --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net> Co-authored-by: Yi Zhang <zhanyi@microsoft.com> Co-authored-by: rachguo <rachguo@rachguos-Mac-mini.local>	2024-04-19 14:20:03 -07:00
Guenther Schmuelling	497a627a69	fix fp16 for skiplayernorm (#20381 )	2024-04-19 12:12:02 -07:00
Dmitri Smirnov	42b700d463	Eliminate stray vector and the contention it creates (#20377 ) ### Description Unused vector allocating large memory chunk within a concurrent routine creates heap contention and is eliminated. ### Motivation and Context This partially addresses https://github.com/microsoft/onnxruntime/issues/20373.	2024-04-19 10:27:42 -07:00
Patrice Vignola	4d98f06f93	[DML EP] Add GroupQueryAttention (#20327 )	2024-04-19 10:25:29 -07:00
Wanming Lin	7c80c39f74	[WebNN EP] WebNN CPU backend only support up to 4 Split outputs (#20350 )	2024-04-19 08:31:22 -07:00
sfatimar	4d1963c2a2	OpenVINO EP Rel 1.18 Changes (#20337 ) ### Description These changes include: - Support for OpenVINO 2024.1 - Import of precompiled blobs with EPContext blob - Separate device and precision as inputs - Deprecate the CPU_FP32 and GPU_FP32 terminology and introduce CPU and GPU - AUTO GPU, CPU will only create a GPU blob and not a CPU blob. ### Motivation and Context - OpenVINO 2024.1 will be out soon - Importing a precompiled blob can greatly reduce FEIL/FIL time. - Separating device and precision makes the input cleaner --------- Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2024-04-19 00:31:38 -07:00
Yueqing Zhang	9001c69b84	[VitisAI] Add Version Check. Requested by Microsoft (#20347 ) ### Description Add a version for onnxruntime_providers_vitisai.dll so that onnxruntime_vitisai_ep.dll can check whether the version is compatible. To make sure the old onnxruntime_vitisai_ep.dll still works, we offset the API struct by the version field. ### Motivation and Context This is a direct request from Microsoft. The following is the problem we are trying to solve: How would you describe the dependency between (a) onnxruntime_vitisai_ep.dll and (b) onnxruntime_providers_vitisai.dll? E.g. for each version of (a) there is a minimum required version of (b), or for each version of (b) there is minimum required version of (a). Please note that in practice we won't be able to use the exact version of ORT/EP that you tested against (because we might need to update ORT for other reasons), but we might be able to accommodate some version constraints that you specify. As we approach shipping, we'll lock the version of ORT/EP to allow for stabilization and more detailed testing (and work with you if it needs to be updated).	2024-04-18 23:05:44 -07:00
Patrice Vignola	12569626cb	Update DML to 1.14.1 (#20380 )	2024-04-18 22:43:41 -07:00
Patrice Vignola	b8c90beef2	[DML EP] Add SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20326 )	2024-04-18 22:17:31 -07:00
Chi Lo	a747a00cd3	[TensorRT EP] Use protobuf with debug build on Windows (#20378 ) TRT EP implicitly uses oss_parser with debug build on Windows, therefore it should use protobuf rather than protobuf-lite.	2024-04-18 19:39:08 -07:00
Patrice Vignola	745b426c60	[DML] Update DML to 1.14 (#20304 ) I am prefiring this change to pre-run the non-dml checks, and also to give folks the time to review it before DML gets released. When DML 1.14 officially releases, we'll only need to run the DML pipeline to automatically pick up the nuget package. This should save us some valuable time. Note that DML 1.14 is the release needed for ORT 1.17.4, and DML 1.15 will come soon after.	2024-04-18 16:22:57 -07:00
Adrian Lizarraga	e4c0cb2b9a	[Quant tool] Do not default to contrib Q/DQ ops for 16-bit (#20376 ) ### Description Updates the QDQ quantizer to use ONNX Q/DQ ops for 16-bit quantization if opset >= 21. ### Motivation and Context The QDQ quantizer previously set the 'com.microsoft' domain on inserted Q/DQ ops when the model needed 16-bit support. ONNX 1.16.0 added int16/uint16 support to the QuantizeLinear and DequantizeLinear operators, so we can change the default behavior.	2024-04-18 15:26:07 -07:00
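The rule described in the entry above can be sketched as a small decision function. This is only an illustration of the stated behavior (contrib domain for 16-bit before opset 21, standard ONNX ops from opset 21 on); the function name and constants are hypothetical and are not taken from the quantization tool's source.

```python
# Hypothetical sketch of the Q/DQ domain choice described above.
# Names are illustrative; this is not the quantizer's actual code.
ONNX_DOMAIN = ""               # default ONNX operator domain
MS_DOMAIN = "com.microsoft"    # contrib operator domain


def qdq_op_domain(opset: int, needs_16bit: bool) -> str:
    """Return the operator domain to use for inserted Q/DQ nodes."""
    # ONNX 1.16.0 (opset 21) added int16/uint16 to QuantizeLinear and
    # DequantizeLinear, so contrib ops are only needed below opset 21.
    if needs_16bit and opset < 21:
        return MS_DOMAIN
    return ONNX_DOMAIN
```

Under these assumptions, a 16-bit model at opset 19 would still get `com.microsoft` Q/DQ nodes, while the same model at opset 21 would get standard ONNX nodes.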
Chi Lo	a8f74e3ec7	[TensorRT EP] TensorRT 10 support (#20167 ) This PR adds support for the INT64 tensor type for TRT 10. It is compatible with both TRT 8.6 and TRT 10, meaning users can build ORT TRT against either. The timeline between TRT 10 GA and the ORT 1.18 release is very tight (we don't have enough time to get our CIs installed with TRT 10 GA libraries and run the build/tests), and Nvidia's new Triton release (whose timeline is also very close to TRT 10 GA) wants to integrate the TRT EP with TRT 10. Therefore, our approach is to get this PR into ORT 1.18 first, so everything is fully tested with TRT 8.6 CIs and users can still manually build ORT 1.18 against TRT 10, as in the Triton case. As for testing TRT 10, once TRT 10 GA is released, we will have another branch that includes the changes in this PR plus whatever other changes are needed, and we will update our CIs with TRT 10.	2024-04-18 14:03:04 -07:00
Yulong Wang	3577a4bd02	[Node.js binding] Allow installation to download CUDA binaries via script (#20364 ) ### Description Currently we try to include all prebuilt binaries in the NPM packages. This worked until we added libonnxruntime_providers_cuda.so (>400MB) to the NPM package. The NPM registry refuses to accept the new package because the file is too large. To make the new NPM package work, we have to remove the large file from the package and add a new script that runs on package installation. This script will try to dynamically install the onnxruntime CUDA dynamic library for Linux/x64.	2024-04-18 13:44:42 -07:00
Guenther Schmuelling	7b017cf9f8	fix web ci: csum tests need fp64 which is not supported on webgpu (#20374 )	2024-04-18 12:30:26 -07:00
Adam Louly	ee74fb6908	Introducing ORTPipelineModule - DeepSpeed Parallel Pipeline Support. (#20287 ) ### Description Introducing a new class ORTPipelineModule to handle wrapping layers in DeepSpeed pipeline parallel. ### Motivation and Context To support pipeline parallelism on ORTModule. This PR will include an initial support of deepspeed Pipeline parallelism. - [x] Support Pipeline parallel where layers are nn Modules in Sequential. - [ ] Support LayerSpec and TiedLayerSpec - [ ] Enable partitioning to accept List - [ ] Full-GPU Graph Consolidation - [ ] Subgraph Merging for Inference	2024-04-18 11:30:15 -07:00
Sumit Agarwal	f664f91298	[DML EP] Expose NPU macro via build command (#20306 ) ### Description This fixes the following things: - Expose the `ENABLE_NPU_ADAPTER_ENUMERATION` macro via the build command, so that a user can enable NPU support for the DML EP seamlessly. - Add the keyword `_dmlEp_` as part of the node name, which is useful for debugging.	2024-04-18 11:23:13 -07:00
Patrice Vignola	76434907fb	[DML EP] Add graph capture (#20257 ) This adds a new "Graph Capture" option to the DML ep, similar to the cuda graph functionality. Here's how graph capture works: - A user can enable graph capture in the session options by setting `ep.dml.enable_graph_capture` to `true` - When they want to capture a run, they set `gpu_graph_id` in their `RunOptions` to a number bigger than 0 (0 is reserved for internal use according to the cuda graph documentation). - Then, when they start the inference, the graph will be captured and stored in the DML EP for future use - When they execute the run for a second time with the same id, the `ReplayGraph` function in the DML EP will be called instead of executing the kernels, resulting in very low overhead and avoiding kernel recompilation. This feature can give up-to-par or even better performance than specifying the static dimensions at session creation time, but is also much more flexible.	2024-04-18 10:15:00 -07:00
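The capture-then-replay flow described in the entry above can be modeled with a small toy class. This is a minimal sketch of the idea only: it does not use ONNX Runtime or DirectML, and the class, method, and attribute names are hypothetical. Only the `gpu_graph_id` semantics (an id greater than 0 triggers capture on the first run and replay on later runs) come from the commit message.

```python
# Toy model of graph capture and replay keyed by a graph id.
# Hypothetical names; this is not the DML EP implementation.
from typing import Callable, Dict, Sequence


class GraphCaptureSketch:
    def __init__(self, kernels: Sequence[Callable[[float], float]]):
        self._kernels = tuple(kernels)
        self._captured: Dict[int, Callable[[float], float]] = {}
        self.kernel_launches = 0  # counts individually dispatched kernels

    def run(self, value: float, gpu_graph_id: int = 0) -> float:
        # Replay path: a graph was already captured under this id.
        if gpu_graph_id > 0 and gpu_graph_id in self._captured:
            return self._captured[gpu_graph_id](value)

        # Normal path: dispatch every kernel one by one.
        result = value
        for kernel in self._kernels:
            self.kernel_launches += 1
            result = kernel(result)

        # Capture path: remember the whole sequence for later replays.
        if gpu_graph_id > 0:
            kernels = self._kernels

            def replay(x: float) -> float:
                for kernel in kernels:
                    x = kernel(x)
                return x

            self._captured[gpu_graph_id] = replay
        return result


ep = GraphCaptureSketch([lambda v: v + 1, lambda v: v * 2])
first = ep.run(3, gpu_graph_id=1)   # executes kernels and captures
second = ep.run(5, gpu_graph_id=1)  # replays without per-kernel dispatch
```

In this model the second call with the same id skips the per-kernel dispatch loop entirely, which is the source of the low overhead the commit message describes.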
Vincent Wang	c47f446f25	Support BFloat16 for Triton Codegen (#20353 ) The previous implementation used a numpy array and numpy data_type to store the constant value and data type, which do not support BFloat16 natively. This PR switches to a torch tensor, which supports BFloat16.	2024-04-18 17:15:11 +08:00
Wanming Lin	da86f6f408	[WebNN EP] Add operators support table (#20253 )	2024-04-17 21:19:46 -07:00
Hector Li	5daeb5e0b0	enable model with external data be loaded from memory buffer (#19089 ) ### Description Background: users save a large model with the initializer data in an external file, e.g. onnx.save_model(onnx_model, "path/to/save/the/model.onnx", save_as_external_data=True, all_tensors_to_one_file=True, location="filename", size_threshold=1024). In that case, Ort loads the model, gets the external initializer information (external file name, offset, length), uses the model path to find the external file, and locates the tensor data via the offset and length. This does not work if the user loads the model from memory, since Ort loses track of the model path. This PR adds an API/session option that lets the user provide a table with the external initializer file name as the key, and the pointer to the loaded external file in memory plus the buffer length as the value, so that: 1. the user can load the model from a memory buffer with the external initializers in a memory buffer too. 2. the initializers can be shared across sessions, for different EPs. 3. the user can load the file in any way they want, e.g. mmap. Internally: 1. At session creation time, Ort goes through the external initializers in the graph and gets the file name, offset, and data length of each external initializer from the TensorProto. 2. With the file name, Ort gets the in-memory file buffer and buffer length from the table the user provided. 3. Ort locates the tensor buffer within the user-provided in-memory file buffer using the offset and data length (from the TensorProto). 4. Ort creates the Tensor and replaces the existing Tensor in the graph. ### Motivation and Context https://github.com/onnx/onnx/blob/main/docs/ExternalData.md For a model with external data, the TensorProto may have initializer data in a separate file. The external file location is set via the file path relative to the model path. The API that loads a model from a memory buffer loses track of the model path, so it causes an error if the model has external data. By adding a session option to set the external data buffer, Ort can find the external data correctly when the model is loaded from a memory buffer.	2024-04-17 19:01:01 -07:00
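The internal lookup described in the entry above (file name to in-memory buffer, then offset and length to tensor bytes) can be sketched in plain Python. This is an illustration under stated assumptions, not ONNX Runtime code: `ExternalInfo` and `resolve_external_initializer` are hypothetical names, and a Python `bytes` object stands in for the pointer-plus-length pair the commit message mentions.

```python
# Hypothetical sketch of resolving an external initializer from a
# user-provided table of in-memory file buffers.
from typing import Mapping, NamedTuple


class ExternalInfo(NamedTuple):
    file_name: str  # external file name recorded in the tensor metadata
    offset: int     # byte offset of the tensor data within that file
    length: int     # byte length of the tensor data


def resolve_external_initializer(
    info: ExternalInfo, buffers: Mapping[str, bytes]
) -> memoryview:
    """Return a zero-copy view of the tensor bytes inside the file buffer."""
    if info.file_name not in buffers:
        raise KeyError(f"no in-memory buffer provided for {info.file_name!r}")
    buffer = buffers[info.file_name]
    end = info.offset + info.length
    # Guard against metadata that points outside the provided buffer.
    if info.offset < 0 or info.length < 0 or end > len(buffer):
        raise ValueError("offset/length fall outside the provided buffer")
    return memoryview(buffer)[info.offset:end]


# Usage: one buffer can back many initializers (and many sessions).
table = {"weights.bin": bytes(range(16))}
view = resolve_external_initializer(ExternalInfo("weights.bin", 4, 4), table)
```

Returning a `memoryview` rather than a copy mirrors the sharing goal in the commit message: several consumers can reference the same loaded file without duplicating it.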
Hector Li	301240433c	Log error caused by NPU SSR. (#20356 )	2024-04-17 18:27:45 -07:00
Edward Chen	ccaa4d1db2	[MLAS][AArch64] SQNBitGemm M>1 CompFp32 kernel optimization (#20319 ) Add ARM NEON intrinsics implementation for `Q4BitBlkDequantBForSgemm_CompFp32`.	2024-04-17 17:50:26 -07:00
Patrice Vignola	6bd6d879a3	[DML EP] Improve python API perf (#20331 ) This change improves the python API perf in two ways: 1. Remove unnecessary CPU syncs by sharing a queue between the python EPs and the allocator. 2. Add an opt-in CPU spinning sync to reduce overhead in applications that run a lot of inferences per second.	2024-04-17 17:33:37 -07:00
Guenther Schmuelling	a8a77ddfdc	fix csum and enable ut (#20355 )	2024-04-17 15:01:06 -07:00
dependabot[bot]	4c3fc26255	Bump Sixlabors.ImageSharp from 2.1.7 to 2.1.8 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#20314 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.7 to 2.1.8. Release notes (sourced from [Sixlabors.ImageSharp's releases](https://github.com/SixLabors/ImageSharp/releases)): v2.1.8, What's Changed: - V2 - Limit Read Palette Indices by @JimBobSquarePants in SixLabors/ImageSharp#2719 - V2 - Clear Pixel Buffers on Decode. by @JimBobSquarePants in SixLabors/ImageSharp#2717 - V2 - Limit all memory allocations in the MemoryAllocator layer by @JimBobSquarePants in SixLabors/ImageSharp#2715 Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8 Commits: - `f21d641` Merge pull request #2715 from SixLabors/backport/v2-memlimit - `8f0b4d3` Merge pull request #2717 from SixLabors/backport/v2-clear-buffers - `cf9496d` test allocation limits - `3d298db` Adapt BmpDecoder_ThrowsException_Issue2696 for V2 - `a78ce27` Merge pull request #2719 from SixLabors/backport/v2-check-palette-indices - `e620914` Clamp read palette indices. - `c122185` Clear pixel buffers on decode. - `5c6ec5d` Limit all allocations - See full diff in the [compare view](https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-17 14:47:44 -07:00
Adrian Lizarraga	eae7b705ac	[Quant tool] Fix quantized bias's scale dtype to properly handle fp16 bias inputs (#20340 ) ### Description - Fix quantization tool bug that did not correctly set a quantized bias's scale data type to fp16 if the original bias was fp16. - Enabled fp16 ConvTranspose quantization unit tests that were disabled. ### Motivation and Context Python quantization tests for fp16 ConvTranspose were originally disabled due to a shape inference bug. It turns out that we also have a bug in our quantizer that does not properly handle fp16 bias inputs. Fixing the bug allows us to re-enable these tests with the latest version of ONNX.	2024-04-17 10:24:28 -07:00
Adrian Lizarraga	0a1902525f	Add patch for ONNX 1.16.0 shape inference bug (#20316 ) ### Description - Adds a patch that fixes a shape inference bug that caused a segfault: https://github.com/onnx/onnx/pull/6080 - Fix documentation describing why QLinearMatMul tests are currently being skipped. ### Motivation and Context The [PR for integrating with ONNX 1.16.0](https://github.com/microsoft/onnxruntime/pull/19745) disabled various python quantization tests due to a shape inference bug. This PR applies the ONNX fix as a patch. We still can't enable the tests because some of our CIs pip install onnx-1.16.0, which doesn't include the fix.	2024-04-17 10:23:22 -07:00
Hector Li	bb1972264b	Enable provider option to let user provide the profiling file path (#20285 ) Enable a provider option that lets the user provide the profiling file path. Separate out the profiling level for ETW, in case ETW is enabled when Ort creates the QNN profiling and then disabled when Ort logs the profiling events, or vice versa. Enhance the logic that decides the profiling level.	2024-04-17 09:42:40 -07:00
Scott McKay	8143b0d798	Add helper to get errno and error message (#20324 ) ### Description Add platform aware helper to fetch errno message string. ### Motivation and Context For usage in #20077 --------- Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2024-04-17 21:17:36 +10:00
Yi Zhang	4d2b98155f	More fixes on random connection exception in Mac Build. (#20328 ) ### Description Supplement to #20322 ### Motivation and Context Fixes random connection exceptions in the Mac build in the Python Packaging Pipeline https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443617&view=logs&j=5849a411-e258-5ce5-39bd-7b65d44961a0&t=ccb871c8-76d9-5e80-55b0-4279efd5567f and the iOS full xcframework https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443458&view=logs&j=370fd1a2-3dec-5916-4d2c-8aae58c72d28&t=686352ba-ee61-5ad4-8739-e8abd07372a4&s=e9aa87c8-a9ad-51f7-3b12-045ecc319776	2024-04-17 08:37:56 +08:00
Scott McKay	5c8034cc20	Avoid call to Node::ToProto on first Graph::Resolve to improve session creation performance. (#20296 ) ### Description The first call to Graph::Resolve occurs when creating the Graph instance when loading an existing model from ModelProto. As the Node instance will exactly match the source NodeProto there's no need to call Node::ToProto in this case. Add a temporary reference to the original NodeProto to avoid the call on the first Graph::Resolve. ### Motivation and Context Better alternative to #19469	2024-04-17 10:07:12 +10:00
jingyanwangms	c11941289b	Add Gemma Rotary Embedding (#20267 ) ### Description Add GemmaRotaryEmbedding kernel which includes sin and cos in GemmaRotaryEmbedding forward and apply_rotary_pos_emb. See gemma_rotary_emb_impl.cu for subgraph details	2024-04-16 15:31:56 -07:00
dependabot[bot]	7354f3cdd8	Bump transformers from 4.36.0 to 4.38.0 in /tools/ci_build (#20272 ) Bumps [transformers](https://github.com/huggingface/transformers) from 4.36.0 to 4.38.0. Release notes (sourced from [transformers's releases](https://github.com/huggingface/transformers/releases)): v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM. New model additions: 💎 Gemma 💎. Gemma is a new opensource Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned versions and you can use them via `AutoModelForCausalLM`, `GemmaForCausalLM` or `pipeline` interface! Read more about it in the Gemma release blogpost: https://hf.co/blog/gemma

```python
import torch  # needed for torch.float16 below
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.float16)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# The tokenizer returns a dict-like encoding, so unpack it into generate()
outputs = model.generate(**input_ids)
```

You can use the model with Flash Attention, SDPA, Static cache and quantization API for further optimizations! - Flash Attention 2

```python
import torch  # needed for torch.float16 below
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", device_map="auto", torch_dtype=torch.float16, attn_implementation="flash_attention_2"
)

input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# The tokenizer returns a dict-like encoding, so unpack it into generate()
outputs = model.generate(**input_ids)
```

- bitsandbytes-4bit

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")

model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", device_map="auto", load_in_4bit=True
)
```

... (truncated)
Commits: - `08ab54a` [`gemma`] Adds support for Gemma 💎 (#29167) - `2de9314` [`Maskformer`] safely get backbone config (#29166) - `476957b` 🚨 Llama: update rope scaling to match static cache changes (#29143) - `7a4bec6` Release: 4.38.0 - `ee3af60` Add support for fine-tuning CLIP-like models using contrastive-image-text exa... - `0996a10` Revert low cpu mem tie weights (#29135) - `15cfe38` [`Core tokenization`] `add_dummy_prefix_space` option to help with latest is... - `efdd436` FIX [`PEFT` / `Trainer`] Handle better peft + quantized compiled models (#29... - `5e95dca` [`cuda kernels`] only compile them when initializing (#29133) - `a7755d2` Generate: unset GenerationConfig parameters do not raise warning (#29119) - Additional commits viewable in the [compare view](https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 14:21:12 -07:00
dependabot[bot]	efa51de7e4	Bump gradle/wrapper-validation-action from 2 to 3 (#20305 ) Bumps [gradle/wrapper-validation-action](https://github.com/gradle/wrapper-validation-action) from 2 to 3. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/gradle/wrapper-validation-action/releases">gradle/wrapper-validation-action's releases</a>.</em></p> <blockquote> <h2>v2.1.3</h2> <h2>What's Changed</h2> <ul> <li>Update various NPM dependencies</li> <li>Update wrapper checksums to include Gradle 8.7</li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/gradle/wrapper-validation-action/compare/v2.1.2...v2.1.3">https://github.com/gradle/wrapper-validation-action/compare/v2.1.2...v2.1.3</a></p> <h2>v2.1.2</h2> <h2>What's Changed</h2> <ul> <li>Update various NPM dependencies</li> <li>Update wrapper checksums</li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/gradle/wrapper-validation-action/compare/v2.1.1...v2.1.2">https://github.com/gradle/wrapper-validation-action/compare/v2.1.1...v2.1.2</a></p> <h2>v2.1.1</h2> <h2>Changelog</h2> <ul> <li>[FIX] Add hardcoded checksum for Gradle 7.6.4</li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/gradle/wrapper-validation-action/compare/v2...v2.1.1">https://github.com/gradle/wrapper-validation-action/compare/v2...v2.1.1</a></p> <h2>v2.1.0</h2> <p>This release should vastly reduce the number of network requests made by the <code>wrapper-validation-action</code>, by hardcoding the checksums of all known Gradle wrapper jars at time of release. 
With this improvement, a number of long-standing issues should be addressed (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/164">#164</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/162">#162</a>, <a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/57">#57</a>).</p> <p>The action should now only make network requests to validate the checksums of an unknown <code>gradle-wrapper.jar</code>. This can happen if:</p> <ul> <li>The Gradle version was published after this action was released</li> <li>The <code>gradle-wrapper.jar</code> is truly invalid</li> </ul> <h2>Changelog</h2> <ul> <li>[NEW] Hardcode list of known checksums to avoid network requests in most cases (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/161">#161</a>)</li> </ul> <p>Huge thanks to <a href="https://github.com/Marcono1234"><code>@Marcono1234</code></a> for contributing this long-awaited improvement.</p> <h2>v2.0.1</h2> <p>This patch release fixes error reporting when failing to retrieve the checksums from services.gradle.org</p> <ul> <li>[FIX] After migration from v1 to v2 silently fails (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/174">#174</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`460a3ca55f`"><code>460a3ca</code></a> Delegate to 'gradle/actions/wrapper-validation' (<a href="https://redirect.github.com/gradle/wrapper-validation-action/issues/200">#200</a>)</li> <li>See full diff in <a href="https://github.com/gradle/wrapper-validation-action/compare/v2...v3">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility 
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=gradle/wrapper-validation-action&package-manager=github_actions&previous-version=2&new-version=3)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 14:20:51 -07:00
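The release notes quoted in the commit above describe checking a `gradle-wrapper.jar` against a hardcoded checksum list first and only going to the network for an unknown jar. A minimal sketch of that lookup order follows. It is not the action's actual code: `KNOWN_CHECKSUMS`, `validate_wrapper`, and the `fetch_remote_checksums` callback are hypothetical names, and the one "known" entry is placeholder data.

```python
import hashlib
from typing import Callable, Iterable, Tuple


def sha256_hex(data: bytes) -> str:
    """Hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical stand-in for the hardcoded list shipped with the action.
# The single entry is derived from placeholder bytes, not from a real wrapper jar.
KNOWN_CHECKSUMS = frozenset({sha256_hex(b"placeholder known wrapper")})


def validate_wrapper(
    jar_bytes: bytes,
    fetch_remote_checksums: Callable[[], Iterable[str]],
) -> Tuple[bool, bool]:
    """Return (is_valid, used_network) for one wrapper jar.

    The network callback is only invoked when the checksum is not in the
    hardcoded list, e.g. for a Gradle version published after the list was built.
    """
    digest = sha256_hex(jar_bytes)
    if digest in KNOWN_CHECKSUMS:
        return True, False
    remote = set(fetch_remote_checksums())
    return digest in remote, True
```

With this ordering, a jar that fails both lookups is reported as invalid, which corresponds to the "truly invalid" case in the release notes.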
dependabot[bot]	e1499a007a	Bump Sixlabors.ImageSharp from 2.1.7 to 2.1.8 in /csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample (#20315 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.7 to 2.1.8. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/SixLabors/ImageSharp/releases">Sixlabors.ImageSharp's releases</a>.</em></p> <blockquote> <h2>v2.1.8</h2> <h2>What's Changed</h2> <ul> <li>V2 - Limit Read Palette Indices by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2719">SixLabors/ImageSharp#2719</a></li> <li>V2 - Clear Pixel Buffers on Decode. by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2717">SixLabors/ImageSharp#2717</a></li> <li>V2 - Limit all memory allocations in the MemoryAllocator layer by <a href="https://github.com/JimBobSquarePants"><code>@JimBobSquarePants</code></a> in <a href="https://redirect.github.com/SixLabors/ImageSharp/pull/2715">SixLabors/ImageSharp#2715</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8">https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8</a></p> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a href="`f21d64188e`"><code>f21d641</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2715">#2715</a> from SixLabors/backport/v2-memlimit</li> <li><a href="`8f0b4d3e68`"><code>8f0b4d3</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2717">#2717</a> from SixLabors/backport/v2-clear-buffers</li> <li><a href="`cf9496d284`"><code>cf9496d</code></a> test allocation limits</li> <li><a href="`3d298db2cd`"><code>3d298db</code></a> Adapt 
BmpDecoder_ThrowsException_Issue2696 for V2</li> <li><a href="`a78ce27a2b`"><code>a78ce27</code></a> Merge pull request <a href="https://redirect.github.com/SixLabors/ImageSharp/issues/2719">#2719</a> from SixLabors/backport/v2-check-palette-indices</li> <li><a href="`e6209147b1`"><code>e620914</code></a> Clamp read palette indices.</li> <li><a href="`c122185ea0`"><code>c122185</code></a> Clear pixel buffers on decode.</li> <li><a href="`5c6ec5d6fb`"><code>5c6ec5d</code></a> Limit all allocations</li> <li>See full diff in <a href="https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Sixlabors.ImageSharp&package-manager=nuget&previous-version=2.1.7&new-version=2.1.8)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 14:20:16 -07:00
Yi-Hong Lyu	6b6a62fb40	Add vectorized AVX512F kernel for ReduceMaximumF32Kernel (#20268 ) ### Description <!-- Describe your changes. --> This commit introduces a new vectorized AVX512F kernel, MlasReduceMaximumF32KernelAvx512F, which efficiently computes the maximum value of the supplied buffer. Additionally, microbenchmarks have been added for MlasComputeSoftmax (inplace), MlasReduceMaximumF32KernelAvx, MlasComputeSumExpF32KernelAvx512F, and MlasComputeSoftmaxOutputF32KernelAvx. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> The goal of this commit is to enhance the performance of ReduceMaximumF32Kernel on CPUs with AVX512F instruction support.

| name | AVX iterations | AVX real_time | AVX cpu_time | AVX512 iterations | AVX512 real_time | AVX512 cpu_time | time_unit |
| -- | -- | -- | -- | -- | -- | -- | -- |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:3/real_time | 271277304 | 2.58095 | 2.58091 | 263338132 | 2.65661 | 2.65661 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:3/real_time | 271220477 | 2.58095 | 2.58095 | 263509929 | 2.65652 | 2.65649 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:3/real_time | 271240587 | 2.58064 | 2.58064 | 263479542 | 2.65671 | 2.65665 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:3/real_time | 271227745 | 2.58083 | 2.58079 | 263402506 | 2.65657 | 2.65657 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:3/real_time | 271255069 | 2.58073 | 2.58071 | 263463858 | 2.65682 | 2.65682 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:3/real_time | 271257174 | 2.58058 | 2.58052 | 263460120 | 2.65682 | 2.65682 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:4/real_time | 174395051 | 4.01401 | 4.01401 | 197330481 | 3.5465 | 3.54636 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:4/real_time | 174645502 | 3.99691 | 3.99691 | 197474831 | 3.54298 | 3.54278 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:4/real_time | 174523308 | 4.01391 | 4.01386 | 197389981 | 3.54518 | 3.54506 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:4/real_time | 174779200 | 3.99874 | 3.99874 | 197519075 | 3.54227 | 3.54209 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:4/real_time | 174642874 | 4.00645 | 4.00641 | 197642101 | 3.54195 | 3.54188 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:4/real_time | 174546754 | 4.0061 | 4.00608 | 197621033 | 3.54296 | 3.54281 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:5/real_time | 162752651 | 4.30119 | 4.30114 | 215552503 | 3.24767 | 3.24752 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:5/real_time | 162717463 | 4.30123 | 4.30116 | 215541082 | 3.24711 | 3.24695 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:5/real_time | 162718819 | 4.3016 | 4.30153 | 215589239 | 3.24725 | 3.24708 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:5/real_time | 162719596 | 4.30151 | 4.30145 | 215563846 | 3.24956 | 3.24949 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:5/real_time | 162753333 | 4.30125 | 4.30125 | 215537315 | 3.24924 | 3.24908 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:5/real_time | 162752258 | 4.3014 | 4.30141 | 215526482 | 3.24744 | 3.24735 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:7/real_time | 143579660 | 4.87526 | 4.87516 | 100000000 | 5.25767 | 5.25752 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:7/real_time | 143585097 | 4.87476 | 4.87467 | 100000000 | 5.41583 | 5.41567 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:7/real_time | 143571011 | 4.87506 | 4.87503 | 182359467 | 3.83773 | 3.83764 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:7/real_time | 143587142 | 4.87487 | 4.8748 | 182397261 | 3.83807 | 3.8379 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:7/real_time | 143578465 | 4.87525 | 4.87521 | 182428602 | 3.83777 | 3.83768 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:7/real_time | 143588555 | 4.87491 | 4.87488 | 125280452 | 5.59791 | 5.59766 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:9/real_time | 284851058 | 2.43476 | 2.43476 | 156879863 | 4.42895 | 4.42884 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:9/real_time | 270700898 | 2.59031 | 2.59024 | 157953114 | 4.42995 | 4.42968 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:9/real_time | 282871172 | 2.45385 | 2.45385 | 157801156 | 4.42817 | 4.42804 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:9/real_time | 285307738 | 2.47009 | 2.47005 | 158058507 | 4.4279 | 4.42786 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:9/real_time | 285709536 | 2.45481 | 2.45476 | 158070961 | 4.42809 | 4.42799 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:9/real_time | 285449733 | 2.47495 | 2.47491 | 158069718 | 4.45026 | 4.45017 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:11/real_time | 189213618 | 3.79684 | 3.79676 | 139459497 | 5.01882 | 5.01871 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:11/real_time | 185600468 | 3.76394 | 3.76376 | 139444892 | 5.01922 | 5.01905 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:11/real_time | 184968668 | 3.80636 | 3.80636 | 139470834 | 5.01948 | 5.01936 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:11/real_time | 183867226 | 3.80432 | 3.80427 | 139481986 | 5.01975 | 5.01944 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:11/real_time | 184301650 | 3.81634 | 3.81634 | 139452846 | 5.01983 | 5.01972 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:11/real_time | 186215795 | 3.82659 | 3.82654 | 139497736 | 5.02119 | 5.02113 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:13/real_time | 135622415 | 5.16256 | 5.16252 | 124661337 | 5.61227 | 5.61194 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:13/real_time | 135618907 | 5.15967 | 5.1596 | 124805224 | 5.6088 | 5.60854 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:13/real_time | 135612192 | 5.15506 | 5.15501 | 124803221 | 5.60901 | 5.60869 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:13/real_time | 135906082 | 5.15818 | 5.15818 | 124776601 | 5.60898 | 5.60886 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:13/real_time | 135369523 | 5.15709 | 5.15682 | 124790370 | 5.60927 | 5.60902 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:13/real_time | 135596827 | 5.1603 | 5.1603 | 124792145 | 5.61637 | 5.61614 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:15/real_time | 110947137 | 5.96511 | 5.96495 | 112861522 | 6.20035 | 6.20014 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:15/real_time | 118004792 | 6.22645 | 6.22628 | 112909900 | 6.20073 | 6.20073 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:15/real_time | 112630319 | 6.25564 | 6.25552 | 112874563 | 6.19932 | 6.19924 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:15/real_time | 117403034 | 6.17263 | 6.17258 | 112927318 | 6.19866 | 6.19842 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:15/real_time | 108921863 | 6.48624 | 6.48612 | 112927746 | 6.20057 | 6.20026 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:15/real_time | 110358148 | 6.66805 | 6.66789 | 112907312 | 6.19938 | 6.19908 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:16/real_time | 203419574 | 3.4415 | 3.44137 | 237134525 | 2.95649 | 2.95638 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:16/real_time | 203414035 | 3.4411 | 3.44099 | 237129564 | 2.95178 | 2.95171 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:16/real_time | 203404068 | 3.44157 | 3.44151 | 236981704 | 2.9518 | 2.95167 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:16/real_time | 203391471 | 3.44146 | 3.44137 | 237108807 | 2.95203 | 2.95196 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:16/real_time | 203393801 | 3.44131 | 3.44127 | 237126460 | 2.95278 | 2.95272 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:16/real_time | 203407476 | 3.44181 | 3.44162 | 237154444 | 2.95293 | 2.9528 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:500/real_time | 37551439 | 18.6407 | 18.6407 | 39222534 | 17.858 | 17.8571 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:500/real_time | 37544097 | 18.6404 | 18.6401 | 39174151 | 17.8539 | 17.8536 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:500/real_time | 37549837 | 18.6391 | 18.6391 | 39233956 | 17.8507 | 17.8505 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:500/real_time | 45996345 | 15.2157 | 15.2153 | 39285929 | 17.848 | 17.8474 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:500/real_time | 46012429 | 15.2184 | 15.2179 | 65664865 | 10.7366 | 10.7364 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:500/real_time | 45912375 | 15.2349 | 15.2346 | 65205908 | 10.8498 | 10.8492 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:2000/real_time | 9493955 | 73.7232 | 73.7203 | 10188090 | 68.7931 | 68.7908 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:2000/real_time | 9495562 | 73.7173 | 73.7173 | 10180895 | 68.7533 | 68.7511 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:2000/real_time | 9487371 | 73.7852 | 73.7831 | 10164473 | 68.7279 | 68.725 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:2000/real_time | 10816047 | 64.7322 | 64.7287 | 10168481 | 68.8109 | 68.8096 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:2000/real_time | 10808802 | 64.7232 | 64.721 | 19478320 | 36.1471 | 36.1461 | ns |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:2000/real_time | 10818192 | 64.7304 | 64.728 | 19419672 | 35.9635 | 35.9635 | ns |

	2024-04-16 13:52:43 -07:00
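To make the idea behind a vectorized maximum reduction concrete, here is a plain-Python sketch of the usual SIMD strategy: keep one running maximum per lane, reduce the lanes horizontally, then handle the leftover tail one element at a time. This is an illustration only, not the MLAS kernel from the commit above. The name `reduce_max_lanes`, the lane count of 16 (the number of float32 values in a 512-bit register), and the `-inf` result for empty input are assumptions of this sketch.

```python
import math
from typing import Sequence

LANES = 16  # a 512-bit register holds 16 float32 values


def reduce_max_lanes(values: Sequence[float], lanes: int = LANES) -> float:
    """Maximum of `values`, computed lane-wise like a SIMD reduction."""
    n = len(values)
    if n == 0:
        return -math.inf  # assumption of this sketch for empty input

    full = n - n % lanes  # number of elements covered by whole vectors

    # Per-lane running maxima stand in for a single vector accumulator.
    acc = [-math.inf] * lanes
    for start in range(0, full, lanes):
        for lane in range(lanes):
            v = values[start + lane]
            if v > acc[lane]:
                acc[lane] = v

    # Horizontal reduction across the lanes.
    result = max(acc)

    # Scalar tail for the elements that do not fill a whole vector.
    for v in values[full:]:
        if v > result:
            result = v
    return result
```

For buffers shorter than one vector, all the work falls to the scalar tail loop.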
Yifan Li	54f91ea65a	[TensorRT EP] support user_compute_stream in python API (#20168 ) ### Description <!-- Describe your changes. --> * Implement `user_compute_stream` python api for TensorRT EP * Using this option will implicitly set `has_user_compute_stream` as `true` * Extend existing TRTEP unit test to verify `user_compute_stream` option * This has been verified in local pytorch env, with `torch.cuda.Stream()` passing into `user_compute_stream`:
```python
...
# Before inference
if torch.cuda.is_available():
    s = torch.cuda.Stream()
    option = {"user_compute_stream": str(s.cuda_stream)}
    sess.set_providers(["TensorrtExecutionProvider"], [option])
    options = sess.get_provider_options()

    assert "TensorrtExecutionProvider" in options
    assert options["TensorrtExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream)
    assert options["TensorrtExecutionProvider"].get("has_user_compute_stream", "") == "1"
...
```
### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Align with existing `user_compute_stream` python implementations for [CUDA EP](https://github.com/microsoft/onnxruntime/pull/19229)/[ROCm EP](https://github.com/microsoft/onnxruntime/pull/19619)	2024-04-16 12:49:29 -07:00
dependabot[bot]	e02aef1ded	Bump transformers from 4.36.0 to 4.38.0 in /onnxruntime/python/tools/transformers/models/stable_diffusion (#20271 ) Bumps [transformers](https://github.com/huggingface/transformers) from 4.36.0 to 4.38.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/huggingface/transformers/releases">transformers's releases</a>.</em></p> <blockquote> <h2>v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM</h2> <h2>New model additions</h2> <h3>💎 Gemma 💎</h3> <p>Gemma is a new opensource Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned versions and you can use them via <code>AutoModelForCausalLM</code>, <code>GemmaForCausalLM</code> or <code>pipeline</code> interface!</p> <p>Read more about it in the Gemma release blogpost: <a href="https://hf.co/blog/gemma">https://hf.co/blog/gemma</a></p> <pre lang="python"><code>from transformers import AutoTokenizer, AutoModelForCausalLM <p>tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b") model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.float16)</p> <p>input_text = "Write me a poem about Machine Learning." input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")</p> <p>outputs = model.generate(input_ids) </code></pre></p> <p>You can use the model with Flash Attention, SDPA, Static cache and quantization API for further optimizations !</p> <ul> <li>Flash Attention 2</li> </ul> <pre lang="python"><code>from transformers import AutoTokenizer, AutoModelForCausalLM <p>tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")</p> <p>model = AutoModelForCausalLM.from_pretrained( "google/gemma-2b", device_map="auto", torch_dtype=torch.float16, attn_implementation="flash_attention_2" )</p> <p>input_text = "Write me a poem about Machine Learning." 
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")</p> <p>outputs = model.generate(input_ids) </code></pre></p> <ul> <li>bitsandbytes-4bit</li> </ul> <pre lang="python"><code>from transformers import AutoTokenizer, AutoModelForCausalLM <p>tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")</p> <p>model = AutoModelForCausalLM.from_pretrained( "google/gemma-2b", device_map="auto", load_in_4bit=True ) </tr></table> </code></pre></p> </blockquote> <p>... (truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`08ab54ada5`"><code>08ab54a</code></a> [ <code>gemma</code>] Adds support for Gemma 💎 (<a href="https://redirect.github.com/huggingface/transformers/issues/29167">#29167</a>)</li> <li><a href="`2de9314197`"><code>2de9314</code></a> [<code>Maskformer</code>] safely get backbone config (<a href="https://redirect.github.com/huggingface/transformers/issues/29166">#29166</a>)</li> <li><a href="`476957b5b4`"><code>476957b</code></a> 🚨 Llama: update rope scaling to match static cache changes (<a href="https://redirect.github.com/huggingface/transformers/issues/29143">#29143</a>)</li> <li><a href="`7a4bec6e8f`"><code>7a4bec6</code></a> Release: 4.38.0</li> <li><a href="`ee3af60be0`"><code>ee3af60</code></a> Add support for fine-tuning CLIP-like models using contrastive-image-text exa...</li> <li><a href="`0996a10077`"><code>0996a10</code></a> Revert low cpu mem tie weights (<a href="https://redirect.github.com/huggingface/transformers/issues/29135">#29135</a>)</li> <li><a href="`15cfe38942`"><code>15cfe38</code></a> [<code>Core tokenization</code>] <code>add_dummy_prefix_space</code> option to help with latest is...</li> <li><a href="`efdd436663`"><code>efdd436</code></a> FIX [<code>PEFT</code> / <code>Trainer</code> ] Handle better peft + quantized compiled models (<a href="https://redirect.github.com/huggingface/transformers/issues/29">#29</a>...</li> <li><a href="`5e95dcabe1`"><code>5e95dca</code></a> [<code>cuda 
kernels</code>] only compile them when initializing (<a href="https://redirect.github.com/huggingface/transformers/issues/29133">#29133</a>)</li> <li><a href="`a7755d2409`"><code>a7755d2</code></a> Generate: unset GenerationConfig parameters do not raise warning (<a href="https://redirect.github.com/huggingface/transformers/issues/29119">#29119</a>)</li> <li>Additional commits viewable in <a href="https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.36.0&new-version=4.38.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 11:48:54 -07:00
Adrian Lizarraga	f644ff9fc0	[QNN EP] Support per-channel quantized weights (#20154 ) ### Description - Adds general support for per-channel quantized weights to QNN EP (HTP backend). - Add QNN EP unit tests for per-channel Conv - Update quantization tool to allow selecting which ops are quantized per-channel (and which axis) via tensor-level overrides. Currently, setting `per_channel=True` assumes all Convs, MatMuls, Gemms, InstanceNormalization, and LayerNormalization ops should be quantized per-channel using some assumed default axis. #### Creating QDQ per-channel Conv model example
```python
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model


class DataReader(CalibrationDataReader):
    # TODO: See ONNX Runtime QNN docs for example of a data reader
    # https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#generating-a-quantized-model-x64
    pass


if __name__ == "__main__":
    input_model_path = "model.onnx"

    # Pre-process the original float32 model.
    preproc_model_path = "model.preproc.onnx"
    model_changed = qnn_preprocess_model(input_model_path, preproc_model_path)
    model_to_quantize = preproc_model_path if model_changed else input_model_path

    # Create the data reader only after model_to_quantize has been defined.
    my_data_reader = DataReader(model_to_quantize)

    # RELEVANT TO THIS PR:
    # Make sure Conv's weight input is quantized to int8/symmetric/per-channel with axis == 0.
    # The presence of the 'axis' key indicates that this is a per-channel quantized weight.
    init_overrides = {'weight': [{'axis': 0, 'quant_type': QuantType.QInt8, 'symmetric': True}]}

    qnn_config = get_qnn_qdq_config(model_to_quantize,
                                    my_data_reader,
                                    init_overrides=init_overrides,
                                    activation_type=QuantType.QUInt16,  # uint16 activations
                                    weight_type=QuantType.QUInt8)       # uint8 weights by default

    quantize(model_to_quantize, "model.qdq.onnx", qnn_config)
```
float32 model: <img width="683" alt="image" src="https://github.com/microsoft/onnxruntime/assets/19691973/ca650e49-1ad0-47d8-8c46-17fbc224ca39"> QDQ model (per-channel Conv weight): <img width="748" alt="image" src="https://github.com/microsoft/onnxruntime/assets/19691973/6bd469f2-968b-4d11-9526-09b3e71f98e7"> ### Motivation and Context Support more models, especially models with int4 quantized weights.	2024-04-16 08:45:35 -07:00
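As background for what "int8/symmetric/per-channel with axis == 0" means numerically, the following sketch quantizes each slice along axis 0 of a weight with its own scale and a zero point of 0. It is a standalone illustration, not the ONNX Runtime quantization tool's implementation. The helper names are hypothetical, and clamping to the symmetric range [-127, 127] is an assumption of this sketch.

```python
from typing import List, Tuple


def quantize_per_channel(
    weights: List[List[float]], qmax: int = 127
) -> Tuple[List[List[int]], List[float]]:
    """Symmetric quantization with one scale per row (channel along axis 0)."""
    quantized: List[List[int]] = []
    scales: List[float] = []
    for channel in weights:
        max_abs = max((abs(w) for w in channel), default=0.0)
        # An all-zero channel gets scale 1.0 so the division below stays defined.
        scale = max_abs / qmax if max_abs > 0 else 1.0
        q = [max(-qmax, min(qmax, round(w / scale))) for w in channel]
        quantized.append(q)
        scales.append(scale)
    return quantized, scales


def dequantize_per_channel(
    quantized: List[List[int]], scales: List[float]
) -> List[List[float]]:
    """Map the integers back to floats using each channel's scale."""
    return [[q * s for q in channel] for channel, s in zip(quantized, scales)]
```

Because every channel has its own scale, a channel with small weights is not forced onto the coarse grid dictated by a channel with large weights, which is the usual motivation for per-channel over per-tensor quantization.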
George Wu	08d208b969	[QNN EP] refactor QNN deps/copy logic. start copying deps to target python loc… (#20317 ) Copy QNN deps when building Python bindings as well. Tweak the wildcard to copy only QNN-related files: the latest SDK from Qualcomm (>= 2.21) also includes SNPE DLLs, which we don't want to include.	2024-04-15 22:33:12 -07:00
Yi Zhang	caf692e626	[Fix] Random connection exceptions in MacOS_C_API_Packaging_CPU stage (#20322 ) ### Description Add download_deps to reduce downloading from 3rd party websites. ### Motivation and Context Fix frequent random exception like ``` CMake Error at abseil_cpp-subbuild/abseil_cpp-populate-prefix/src/abseil_cpp-populate-stamp/download-abseil_cpp-populate.cmake:162 (message): Each download failed! error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20240116.0.zip' failed status_code: 35 status_string: "SSL connect error" log: --- LOG BEGIN --- Trying 20.29.134.23:443... Connected to github.com (20.29.134.23) port 443 ALPN: curl offers h2,http/1.1 (304) (OUT), TLS handshake, Client hello (1): [315 bytes data] CAfile: /etc/ssl/cert.pem CApath: none Recv failure: Operation timed out LibreSSL/3.3.6: error:02FFF03C:system library:func(4095):Operation timed out Closing connection ``` https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443278&view=logs&j=006a7a04-d43b-5fe1-df02-ecafb79c4d6e&t=110edd38-9f3b-50cf-b328-8ed0f915e5c1 --------- Co-authored-by: Yi Zhang <your@email.com>	2024-04-16 13:28:18 +08:00
Wanming Lin	fe1c3a45c1	[WebNN EP] Support NPU deviceType (#20278 )	2024-04-15 18:43:46 -07:00
Edward Chen	287ecea2f1	Fix binary size check build publish step. (#20298 ) Add `--user` option to pip install command. Error: ``` ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py' Consider using the `--user` option or check the permissions. ``` See #19877.	2024-04-15 10:15:42 -07:00
kunal-vaishnavi	6e4516cef9	Fix parity checker in LLaMA scripts (#20301 ) ### Description This PR fixes the parity checker in the LLaMA scripts by adding the following. - Enable buffer sharing manually with `use_buffer_share` instead of `use_gqa` - Get max sequence length from model's config ### Motivation and Context This PR fixes an issue with running the parity checker on other large-language models where `GroupQueryAttention` can be used without buffer sharing enabled.	2024-04-14 17:59:14 -07:00
Xiang Zhang	bf72f996e3	Enrich logic to fuse rotaryembedding with bias and support partial RE fusion into GQA (#20300 ) ### Description This PR mainly focuses on adding two functionalities:
1. Fuse RotaryEmbedding op taking output from previous layers with bias enabled.
> Matmul->RotaryEmbedding -----> Matmul->Add->RotaryEmbedding

2. Fuse GQA op for partial RotaryEmbedding applied in phi-2.
>     # Partial rotary embedding
>     query_rot, query_pass = (
>         query_states[..., : self.rotary_emb.dim],
>         query_states[..., self.rotary_emb.dim :],
>     )
>     key_rot, key_pass = (
>         key_states[..., : self.rotary_emb.dim],
>         key_states[..., self.rotary_emb.dim :],
>     )
>     # [batch_size, seq_length, num_heads, head_dim // config.partial_rotary_factor]
>     query_rot, key_rot = apply_rotary_pos_emb(query_rot, key_rot, cos, sin, position_ids)
>     # [batch_size, seq_length, num_heads, head_dim]
>     query_states = torch.cat((query_rot, query_pass), dim=-1)
>     key_states = torch.cat((key_rot, key_pass), dim=-1)

# Optimized graph ![image](https://github.com/microsoft/onnxruntime/assets/17421593/76fd8576-7e60-41af-9a4f-48d205fc6b56)	2024-04-14 17:43:00 -07:00
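The partial rotary embedding quoted in the commit above rotates only the first `rotary_emb.dim` entries of each head vector and passes the rest through unchanged. The sketch below shows that split on a single vector in plain Python. It is not the fused GQA kernel or the phi-2 source: `partial_rotary` and `rotate_half` are illustrative names, and the "rotate half" pairing and the base of 10000 are assumptions borrowed from common PyTorch implementations.

```python
import math
from typing import List


def rotate_half(x: List[float]) -> List[float]:
    """Map [x1, x2] (two halves) to [-x2, x1]."""
    half = len(x) // 2
    return [-v for v in x[half:]] + x[:half]


def partial_rotary(
    x: List[float], position: int, rotary_dim: int, base: float = 10000.0
) -> List[float]:
    """Rotate the first `rotary_dim` entries of one head vector; pass the rest through."""
    assert rotary_dim % 2 == 0 and rotary_dim <= len(x)
    x_rot, x_pass = x[:rotary_dim], x[rotary_dim:]

    half = rotary_dim // 2
    # One frequency per pair, repeated so entries i and i + half share an angle.
    inv_freq = [base ** (-2.0 * i / rotary_dim) for i in range(half)]
    angles = [position * f for f in inv_freq] * 2

    rotated = rotate_half(x_rot)
    x_rot = [v * math.cos(a) + r * math.sin(a) for v, r, a in zip(x_rot, rotated, angles)]

    # Concatenate the rotated part with the untouched pass-through part.
    return x_rot + x_pass
```

Each pair of entries is turned by a plane rotation, so the norm of the rotated slice is preserved, and at position 0 the vector comes back unchanged.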
George Wu	7ec51f0a13	pin controlnet_aux version to 0.0.7 to fix Big Models Stable Diffusion pipeline failure (#20302 ) A conflict in cv2 causes the error "LayerId = cv2.dnn.DictValue AttributeError: module 'cv2.dnn' has no attribute 'DictValue'". controlnet_aux 0.0.8 pulls in a conflicting version of opencv-python, so pin controlnet_aux to 0.0.7. The failing pipeline passes with this change: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1348876&view=results	2024-04-15 00:09:15 +08:00
Patrice Vignola	01acc25d9d	[DML EP] Fix the output shapes of nodes with multiple outputs in the graph builder (#20289 ) The graph builder currently doesn't assign the correct shapes for subgraphs that have more than 1 output, and where each output comes from a different node. `nodeOutputShapes` should be a map of shapes (1:1 relationship), and not a map of lists of shapes (1:N relationship), since an output referenced by `arg->Name()` can only have 1 shape. For example, consider a subgraph where a node has 2 outputs and each output feeds into an elementwise op. Both nodes will have a `targetIndex` of 0, and we were using this target index to query their shape, resulting in both outputs querying the same shape. What we actually need to do is use the `GraphOutputIndex` of the subgraph to query the correct output shape of the subgraph.	2024-04-12 16:34:49 -07:00
Satya Kumar Jandhyala	b33216be4c	[JS/WebGPU] Improve MatMulNBits perf (#19974 ) ### Description <!-- Describe your changes. --> Improve performance using shared memory ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2024-04-12 11:03:05 -07:00

10933 commits