onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-26 22:35:43 +00:00

Author	SHA1	Message	Date
dependabot[bot]	e1499a007a	Bump Sixlabors.ImageSharp from 2.1.7 to 2.1.8 in /csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample (#20315)
Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.7 to 2.1.8.
Release notes (sourced from [Sixlabors.ImageSharp's releases](https://github.com/SixLabors/ImageSharp/releases)):
## v2.1.8
## What's Changed
- V2 - Limit Read Palette Indices by @JimBobSquarePants in [SixLabors/ImageSharp#2719](https://redirect.github.com/SixLabors/ImageSharp/pull/2719)
- V2 - Clear Pixel Buffers on Decode. by @JimBobSquarePants in [SixLabors/ImageSharp#2717](https://redirect.github.com/SixLabors/ImageSharp/pull/2717)
- V2 - Limit all memory allocations in the MemoryAllocator layer by @JimBobSquarePants in [SixLabors/ImageSharp#2715](https://redirect.github.com/SixLabors/ImageSharp/pull/2715)
**Full Changelog**: https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8
Commits:
- f21d641 Merge pull request #2715 from SixLabors/backport/v2-memlimit
- 8f0b4d3 Merge pull request #2717 from SixLabors/backport/v2-clear-buffers
- cf9496d test allocation limits
- 3d298db Adapt BmpDecoder_ThrowsException_Issue2696 for V2
- a78ce27 Merge pull request #2719 from SixLabors/backport/v2-check-palette-indices
- e620914 Clamp read palette indices.
- c122185 Clear pixel buffers on decode.
- 5c6ec5d Limit all allocations
- See full diff in [compare view](https://github.com/SixLabors/ImageSharp/compare/v2.1.7...v2.1.8)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 14:20:16 -07:00
Yi-Hong Lyu	6b6a62fb40	Add vectorized AVX512F kernel for ReduceMaximumF32Kernel (#20268)
### Description
This commit introduces a new vectorized AVX512F kernel, MlasReduceMaximumF32KernelAvx512F, which efficiently computes the maximum value of the supplied buffer. Additionally, microbenchmarks have been added for MlasComputeSoftmax (inplace), MlasReduceMaximumF32KernelAvx, MlasComputeSumExpF32KernelAvx512F, and MlasComputeSoftmaxOutputF32KernelAvx.
### Motivation and Context
The goal of this commit is to enhance the performance of ReduceMaximumF32Kernel on CPUs with AVX512F instruction support.
| name | AVX iterations | AVX real_time | AVX cpu_time | AVX512 iterations | AVX512 real_time | AVX512 cpu_time |
| -- | -- | -- | -- | -- | -- | -- |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:3/real_time | 271277304 | 2.58095 | 2.58091 | 263338132 | 2.65661 | 2.65661 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:3/real_time | 271220477 | 2.58095 | 2.58095 | 263509929 | 2.65652 | 2.65649 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:3/real_time | 271240587 | 2.58064 | 2.58064 | 263479542 | 2.65671 | 2.65665 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:3/real_time | 271227745 | 2.58083 | 2.58079 | 263402506 | 2.65657 | 2.65657 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:3/real_time | 271255069 | 2.58073 | 2.58071 | 263463858 | 2.65682 | 2.65682 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:3/real_time | 271257174 | 2.58058 | 2.58052 | 263460120 | 2.65682 | 2.65682 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:4/real_time | 174395051 | 4.01401 | 4.01401 | 197330481 | 3.5465 | 3.54636 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:4/real_time | 174645502 | 3.99691 | 3.99691 | 197474831 | 3.54298 | 3.54278 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:4/real_time | 174523308 | 4.01391 | 4.01386 | 197389981 | 3.54518 | 3.54506 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:4/real_time | 174779200 | 3.99874 | 3.99874 | 197519075 | 3.54227 | 3.54209 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:4/real_time | 174642874 | 4.00645 | 4.00641 | 197642101 | 3.54195 | 3.54188 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:4/real_time | 174546754 | 4.0061 | 4.00608 | 197621033 | 3.54296 | 3.54281 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:5/real_time | 162752651 | 4.30119 | 4.30114 | 215552503 | 3.24767 | 3.24752 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:5/real_time | 162717463 | 4.30123 | 4.30116 | 215541082 | 3.24711 | 3.24695 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:5/real_time | 162718819 | 4.3016 | 4.30153 | 215589239 | 3.24725 | 3.24708 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:5/real_time | 162719596 | 4.30151 | 4.30145 | 215563846 | 3.24956 | 3.24949 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:5/real_time | 162753333 | 4.30125 | 4.30125 | 215537315 | 3.24924 | 3.24908 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:5/real_time | 162752258 | 4.3014 | 4.30141 | 215526482 | 3.24744 | 3.24735 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:7/real_time | 143579660 | 4.87526 | 4.87516 | 100000000 | 5.25767 | 5.25752 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:7/real_time | 143585097 | 4.87476 | 4.87467 | 100000000 | 5.41583 | 5.41567 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:7/real_time | 143571011 | 4.87506 | 4.87503 | 182359467 | 3.83773 | 3.83764 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:7/real_time | 143587142 | 4.87487 | 4.8748 | 182397261 | 3.83807 | 3.8379 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:7/real_time | 143578465 | 4.87525 | 4.87521 | 182428602 | 3.83777 | 3.83768 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:7/real_time | 143588555 | 4.87491 | 4.87488 | 125280452 | 5.59791 | 5.59766 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:9/real_time | 284851058 | 2.43476 | 2.43476 | 156879863 | 4.42895 | 4.42884 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:9/real_time | 270700898 | 2.59031 | 2.59024 | 157953114 | 4.42995 | 4.42968 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:9/real_time | 282871172 | 2.45385 | 2.45385 | 157801156 | 4.42817 | 4.42804 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:9/real_time | 285307738 | 2.47009 | 2.47005 | 158058507 | 4.4279 | 4.42786 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:9/real_time | 285709536 | 2.45481 | 2.45476 | 158070961 | 4.42809 | 4.42799 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:9/real_time | 285449733 | 2.47495 | 2.47491 | 158069718 | 4.45026 | 4.45017 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:11/real_time | 189213618 | 3.79684 | 3.79676 | 139459497 | 5.01882 | 5.01871 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:11/real_time | 185600468 | 3.76394 | 3.76376 | 139444892 | 5.01922 | 5.01905 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:11/real_time | 184968668 | 3.80636 | 3.80636 | 139470834 | 5.01948 | 5.01936 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:11/real_time | 183867226 | 3.80432 | 3.80427 | 139481986 | 5.01975 | 5.01944 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:11/real_time | 184301650 | 3.81634 | 3.81634 | 139452846 | 5.01983 | 5.01972 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:11/real_time | 186215795 | 3.82659 | 3.82654 | 139497736 | 5.02119 | 5.02113 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:13/real_time | 135622415 | 5.16256 | 5.16252 | 124661337 | 5.61227 | 5.61194 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:13/real_time | 135618907 | 5.15967 | 5.1596 | 124805224 | 5.6088 | 5.60854 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:13/real_time | 135612192 | 5.15506 | 5.15501 | 124803221 | 5.60901 | 5.60869 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:13/real_time | 135906082 | 5.15818 | 5.15818 | 124776601 | 5.60898 | 5.60886 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:13/real_time | 135369523 | 5.15709 | 5.15682 | 124790370 | 5.60927 | 5.60902 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:13/real_time | 135596827 | 5.1603 | 5.1603 | 124792145 | 5.61637 | 5.61614 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:15/real_time | 110947137 | 5.96511 | 5.96495 | 112861522 | 6.20035 | 6.20014 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:15/real_time | 118004792 | 6.22645 | 6.22628 | 112909900 | 6.20073 | 6.20073 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:15/real_time | 112630319 | 6.25564 | 6.25552 | 112874563 | 6.19932 | 6.19924 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:15/real_time | 117403034 | 6.17263 | 6.17258 | 112927318 | 6.19866 | 6.19842 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:15/real_time | 108921863 | 6.48624 | 6.48612 | 112927746 | 6.20057 | 6.20026 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:15/real_time | 110358148 | 6.66805 | 6.66789 | 112907312 | 6.19938 | 6.19908 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:16/real_time | 203419574 | 3.4415 | 3.44137 | 237134525 | 2.95649 | 2.95638 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:16/real_time | 203414035 | 3.4411 | 3.44099 | 237129564 | 2.95178 | 2.95171 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:16/real_time | 203404068 | 3.44157 | 3.44151 | 236981704 | 2.9518 | 2.95167 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:16/real_time | 203391471 | 3.44146 | 3.44137 | 237108807 | 2.95203 | 2.95196 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:16/real_time | 203393801 | 3.44131 | 3.44127 | 237126460 | 2.95278 | 2.95272 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:16/real_time | 203407476 | 3.44181 | 3.44162 | 237154444 | 2.95293 | 2.9528 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:500/real_time | 37551439 | 18.6407 | 18.6407 | 39222534 | 17.858 | 17.8571 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:500/real_time | 37544097 | 18.6404 | 18.6401 | 39174151 | 17.8539 | 17.8536 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:500/real_time | 37549837 | 18.6391 | 18.6391 | 39233956 | 17.8507 | 17.8505 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:500/real_time | 45996345 | 15.2157 | 15.2153 | 39285929 | 17.848 | 17.8474 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:500/real_time | 46012429 | 15.2184 | 15.2179 | 65664865 | 10.7366 | 10.7364 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:500/real_time | 45912375 | 15.2349 | 15.2346 | 65205908 | 10.8498 | 10.8492 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:4/D:2000/real_time | 9493955 | 73.7232 | 73.7203 | 10188090 | 68.7931 | 68.7908 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:8/D:2000/real_time | 9495562 | 73.7173 | 73.7173 | 10180895 | 68.7533 | 68.7511 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:16/D:2000/real_time | 9487371 | 73.7852 | 73.7831 | 10164473 | 68.7279 | 68.725 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:32/D:2000/real_time | 10816047 | 64.7322 | 64.7287 | 10168481 | 68.8109 | 68.8096 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:64/D:2000/real_time | 10808802 | 64.7232 | 64.721 | 19478320 | 36.1471 | 36.1461 |
| REDUCEMAXIMUMF32KERNEL[]/ByteAligned:128/D:2000/real_time | 10818192 | 64.7304 | 64.728 | 19419672 | 35.9635 | 35.9635 |
All real_time and cpu_time values are in ns.	2024-04-16 13:52:43 -07:00
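The lane-wise idea behind a vectorized maximum reduction can be modeled in plain Python. This is a minimal sketch, not the MLAS source: it assumes 16 fp32 lanes per 512-bit register, keeps one running maximum per lane, folds the buffer in 16-element blocks, reduces the lanes horizontally, and finishes the leftover tail with scalar code. `reduce_max_lanewise` is a hypothetical name, and the real kernel may unroll or mask the tail differently.

```python
import math

LANES = 16  # assumed: fp32 elements per 512-bit AVX512F register


def reduce_max_lanewise(buf, lanes=LANES):
    """Model of a SIMD max reduction: per-lane maxima, then a horizontal reduce."""
    if not buf:
        raise ValueError("buffer must be non-empty")
    n_full = len(buf) - len(buf) % lanes
    # One running maximum per lane, initialised to -inf like a broadcast register
    acc = [-math.inf] * lanes
    for start in range(0, n_full, lanes):
        block = buf[start:start + lanes]
        # Vertical (lane-wise) max: what a single vector max instruction does
        acc = [max(a, b) for a, b in zip(acc, block)]
    # Horizontal reduction across the lanes
    result = max(acc)
    # Scalar tail for the remaining len(buf) % lanes elements
    for x in buf[n_full:]:
        result = max(result, x)
    return result
```

For example, `reduce_max_lanewise([3.0, -1.0, 7.5])` returns `7.5`; a buffer shorter than one register only exercises the scalar tail.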
Yifan Li	54f91ea65a	[TensorRT EP] support user_compute_stream in python API (#20168)
### Description
* Implement the `user_compute_stream` Python API for the TensorRT EP
* Using this option implicitly sets `has_user_compute_stream` to `true`
* Extend the existing TRT EP unit test to verify the `user_compute_stream` option
* This has been verified in a local PyTorch environment, passing a `torch.cuda.Stream()` into `user_compute_stream`:
```python
import torch
# ... (`sess` is an onnxruntime.InferenceSession created earlier; setup elided)
# Before inference
if torch.cuda.is_available():
    s = torch.cuda.Stream()
    option = {"user_compute_stream": str(s.cuda_stream)}
    sess.set_providers(["TensorrtExecutionProvider"], [option])
    options = sess.get_provider_options()
    assert "TensorrtExecutionProvider" in options
    assert options["TensorrtExecutionProvider"].get("user_compute_stream", "") == str(s.cuda_stream)
    # The implicitly enabled flag is reported as the string "1"
    assert options["TensorrtExecutionProvider"].get("has_user_compute_stream", "") == "1"
...
```
### Motivation and Context
Align with the existing `user_compute_stream` Python implementations for [CUDA EP](https://github.com/microsoft/onnxruntime/pull/19229)/[ROCm EP](https://github.com/microsoft/onnxruntime/pull/19619).	2024-04-16 12:49:29 -07:00
dependabot[bot]	e02aef1ded	Bump transformers from 4.36.0 to 4.38.0 in /onnxruntime/python/tools/transformers/models/stable_diffusion (#20271)
Bumps [transformers](https://github.com/huggingface/transformers) from 4.36.0 to 4.38.0.
Release notes (sourced from [transformers's releases](https://github.com/huggingface/transformers/releases)):
## v4.38: Gemma, Depth Anything, Stable LM; Static Cache, HF Quantizer, AQLM
## New model additions
### 💎 Gemma 💎
Gemma is a new opensource Language Model series from Google AI that comes with a 2B and 7B variant. The release comes with the pre-trained and instruction fine-tuned versions and you can use them via `AutoModelForCausalLM`, `GemmaForCausalLM` or `pipeline` interface!
Read more about it in the Gemma release blogpost: https://hf.co/blog/gemma
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained("google/gemma-2b", device_map="auto", torch_dtype=torch.float16)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
```
You can use the model with Flash Attention, SDPA, Static cache and quantization API for further optimizations!
- Flash Attention 2
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", device_map="auto", torch_dtype=torch.float16, attn_implementation="flash_attention_2"
)
input_text = "Write me a poem about Machine Learning."
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
outputs = model.generate(**input_ids)
```
- bitsandbytes-4bit
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("google/gemma-2b")
model = AutoModelForCausalLM.from_pretrained(
    "google/gemma-2b", device_map="auto", load_in_4bit=True
)
```
... (truncated)
Commits:
- 08ab54a [`gemma`] Adds support for Gemma 💎 (#29167)
- 2de9314 [`Maskformer`] safely get backbone config (#29166)
- 476957b 🚨 Llama: update rope scaling to match static cache changes (#29143)
- 7a4bec6 Release: 4.38.0
- ee3af60 Add support for fine-tuning CLIP-like models using contrastive-image-text exa...
- 0996a10 Revert low cpu mem tie weights (#29135)
- 15cfe38 [`Core tokenization`] `add_dummy_prefix_space` option to help with latest is...
- efdd436 FIX [`PEFT` / `Trainer` ] Handle better peft + quantized compiled models (#29...
- 5e95dca [`cuda kernels`] only compile them when initializing (#29133)
- a7755d2 Generate: unset GenerationConfig parameters do not raise warning (#29119)
- Additional commits viewable in [compare view](https://github.com/huggingface/transformers/compare/v4.36.0...v4.38.0)
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-16 11:48:54 -07:00
Adrian Lizarraga	f644ff9fc0	[QNN EP] Support per-channel quantized weights (#20154)
### Description
- Adds general support for per-channel quantized weights to QNN EP (HTP backend).
- Adds QNN EP unit tests for per-channel Conv.
- Updates the quantization tool to allow selecting which ops are quantized per-channel (and along which axis) via tensor-level overrides. Currently, setting `per_channel=True` assumes that all Conv, MatMul, Gemm, InstanceNormalization, and LayerNormalization ops should be quantized per-channel along an assumed default axis.
#### Creating QDQ per-channel Conv model example
```python
from onnxruntime.quantization import CalibrationDataReader, QuantType, quantize
from onnxruntime.quantization.execution_providers.qnn import get_qnn_qdq_config, qnn_preprocess_model
class DataReader(CalibrationDataReader):
    # TODO: See ONNX Runtime QNN docs for example of a data reader
    # https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#generating-a-quantized-model-x64
    def __init__(self, model_path: str):
        ...  # load calibration inputs for the model at model_path (elided)
    def get_next(self):
        ...  # return the next {input_name: numpy array} dict, or None when exhausted
if __name__ == "__main__":
    input_model_path = "model.onnx"
    # Pre-process the original float32 model.
    preproc_model_path = "model.preproc.onnx"
    model_changed = qnn_preprocess_model(input_model_path, preproc_model_path)
    model_to_quantize = preproc_model_path if model_changed else input_model_path
    # Create the data reader only after model_to_quantize is defined.
    my_data_reader = DataReader(model_to_quantize)
    # RELEVANT TO THIS PR:
    # Make sure Conv's weight input is quantized to int8/symmetric/per-channel with axis == 0.
    # The presence of the 'axis' key indicates that this is a per-channel quantized weight.
    init_overrides = {'weight': [{'axis': 0, 'quant_type': QuantType.QInt8, 'symmetric': True}]}
    qnn_config = get_qnn_qdq_config(model_to_quantize,
                                    my_data_reader,
                                    init_overrides=init_overrides,
                                    activation_type=QuantType.QUInt16,  # uint16 activations
                                    weight_type=QuantType.QUInt8)       # uint8 weights by default
    quantize(model_to_quantize, "model.qdq.onnx", qnn_config)
```
float32 model:
![image](https://github.com/microsoft/onnxruntime/assets/19691973/ca650e49-1ad0-47d8-8c46-17fbc224ca39)
QDQ model (per-channel Conv weight):
![image](https://github.com/microsoft/onnxruntime/assets/19691973/6bd469f2-968b-4d11-9526-09b3e71f98e7)
### Motivation and Context
Support more models, especially models with int4 quantized weights.	2024-04-16 08:45:35 -07:00
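To make the `axis`/`symmetric` override concrete, here is a minimal NumPy sketch of symmetric int8 per-channel quantization along one axis: each channel gets its own scale and the zero point is 0. It assumes the restricted range [-127, 127] (scale = max|w| / 127), which is one common convention; ONNX Runtime's quantization tool may compute scales differently, and `quantize_per_channel_symmetric` is a hypothetical helper, not an ORT API.

```python
import numpy as np


def quantize_per_channel_symmetric(weight, axis=0, qmin=-127, qmax=127):
    """Hypothetical helper: symmetric int8 quantization, one scale per slice along `axis`."""
    # Move the channel axis to the front and flatten everything else per channel
    w = np.moveaxis(weight, axis, 0)
    flat = w.reshape(w.shape[0], -1)
    # Symmetric: zero point is 0 and the largest magnitude maps to qmax
    max_abs = np.abs(flat).max(axis=1)
    scale = np.where(max_abs > 0, max_abs / qmax, 1.0).astype(np.float32)
    # Broadcast each channel's scale over that channel's remaining dims
    shape = (-1,) + (1,) * (w.ndim - 1)
    q = np.clip(np.round(w / scale.reshape(shape)), qmin, qmax).astype(np.int8)
    # Restore the original layout; scale has one entry per channel
    return np.moveaxis(q, 0, axis), scale
```

Per-channel scales matter because a single per-tensor scale is dictated by the channel with the largest magnitude, which wastes resolution on channels with small weights.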
George Wu	08d208b969	[QNN EP] refactor QNN deps/copy logic. start copying deps to target python loc… (#20317)
Copy QNN deps when building Python bindings as well. Tweak the wildcard to copy only QNN-related files: the latest SDK from Qualcomm (>= 2.21) also includes SNPE DLLs, which we don't want to include.	2024-04-15 22:33:12 -07:00
Yi Zhang	caf692e626	[Fix] Random connection exceptions in MacOS_C_API_Packaging_CPU stage (#20322)
### Description
Add download_deps to reduce downloads from third-party websites.
### Motivation and Context
Fix frequent random exceptions such as:
```
CMake Error at abseil_cpp-subbuild/abseil_cpp-populate-prefix/src/abseil_cpp-populate-stamp/download-abseil_cpp-populate.cmake:162 (message):
Each download failed!
error: downloading 'https://github.com/abseil/abseil-cpp/archive/refs/tags/20240116.0.zip' failed
status_code: 35
status_string: "SSL connect error"
log:
--- LOG BEGIN ---
Trying 20.29.134.23:443...
Connected to github.com (20.29.134.23) port 443
ALPN: curl offers h2,http/1.1
(304) (OUT), TLS handshake, Client hello (1):
[315 bytes data]
CAfile: /etc/ssl/cert.pem
CApath: none
Recv failure: Operation timed out
LibreSSL/3.3.6: error:02FFF03C:system library:func(4095):Operation timed out
Closing connection
```
https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=443278&view=logs&j=006a7a04-d43b-5fe1-df02-ecafb79c4d6e&t=110edd38-9f3b-50cf-b328-8ed0f915e5c1
---------
Co-authored-by: Yi Zhang <your@email.com>	2024-04-16 13:28:18 +08:00
Wanming Lin	fe1c3a45c1	[WebNN EP] Support NPU deviceType (#20278)	2024-04-15 18:43:46 -07:00
Edward Chen	287ecea2f1	Fix binary size check build publish step. (#20298)
Add the `--user` option to the pip install command. Error:
```
ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/usr/local/bin/f2py'
Consider using the `--user` option or check the permissions.
```
See #19877.	2024-04-15 10:15:42 -07:00
kunal-vaishnavi	6e4516cef9	Fix parity checker in LLaMA scripts (#20301)
### Description
This PR fixes the parity checker in the LLaMA scripts with the following changes:
- Enable buffer sharing manually with `use_buffer_share` instead of `use_gqa`
- Get the max sequence length from the model's config
### Motivation and Context
This PR fixes an issue with running the parity checker on other large language models where `GroupQueryAttention` can be used without buffer sharing enabled.	2024-04-14 17:59:14 -07:00
Xiang Zhang	bf72f996e3	Enrich logic to fuse rotaryembedding with bias and support partial RE fusion into GQA (#20300)
### Description
This PR mainly adds two features:
1. Fuse a RotaryEmbedding op that takes the output of a previous layer with bias enabled: the matched pattern is extended from `MatMul -> RotaryEmbedding` to `MatMul -> Add -> RotaryEmbedding`.
2. Fuse the GQA op for the partial RotaryEmbedding applied in phi-2:
```python
# Partial rotary embedding (excerpt from phi-2's attention code; not standalone)
query_rot, query_pass = (
    query_states[..., : self.rotary_emb.dim],
    query_states[..., self.rotary_emb.dim :],
)
key_rot, key_pass = (
    key_states[..., : self.rotary_emb.dim],
    key_states[..., self.rotary_emb.dim :],
)
# [batch_size, seq_length, num_heads, head_dim * config.partial_rotary_factor]
query_rot, key_rot = apply_rotary_pos_emb(query_rot, key_rot, cos, sin, position_ids)
# [batch_size, seq_length, num_heads, head_dim]
query_states = torch.cat((query_rot, query_pass), dim=-1)
key_states = torch.cat((key_rot, key_pass), dim=-1)
```
Optimized graph:
![image](https://github.com/microsoft/onnxruntime/assets/17421593/76fd8576-7e60-41af-9a4f-48d205fc6b56)	2024-04-14 17:43:00 -07:00
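The partial rotary embedding in this commit can be sketched in NumPy: only the first `rotary_dim` features of each head are rotated, and the rest pass through unchanged before being concatenated back. This assumes the non-interleaved rotate-half convention with a frequency base of 10000; `partial_rotary` and `rotate_half` are hypothetical helpers, not ORT's GroupQueryAttention kernel.

```python
import numpy as np


def rotate_half(x):
    # Split the last dim into halves (x1, x2) and return (-x2, x1)
    half = x.shape[-1] // 2
    return np.concatenate((-x[..., half:], x[..., :half]), axis=-1)


def partial_rotary(x, positions, rotary_dim, base=10000.0):
    """Hypothetical helper: apply RoPE to the first `rotary_dim` features of x.

    x: [batch, seq, heads, head_dim]; positions: [seq]
    """
    x_rot, x_pass = x[..., :rotary_dim], x[..., rotary_dim:]
    # One frequency per rotated pair, duplicated for the two halves
    inv_freq = 1.0 / (base ** (np.arange(0, rotary_dim, 2) / rotary_dim))
    angles = np.outer(positions, inv_freq)           # [seq, rotary_dim / 2]
    emb = np.concatenate((angles, angles), axis=-1)  # [seq, rotary_dim]
    cos = np.cos(emb)[None, :, None, :]              # broadcast over batch and heads
    sin = np.sin(emb)[None, :, None, :]
    x_rot = x_rot * cos + rotate_half(x_rot) * sin
    # The untouched features are concatenated back, like torch.cat in the excerpt
    return np.concatenate((x_rot, x_pass), axis=-1)
```

Because each (i, i + rotary_dim/2) pair is rotated by the same angle, the transform preserves the norm of the rotated slice, and position 0 (angle 0) leaves the input unchanged.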
George Wu	7ec51f0a13	pin controlnet_aux version to 0.0.7 to fix Big Models Stable Diffusion pipeline failure (#20302)
A conflict in cv2 causes:
```
LayerId = cv2.dnn.DictValue
AttributeError: module 'cv2.dnn' has no attribute 'DictValue'
```
controlnet_aux 0.0.8 pulls in a conflicting version of opencv-python, so pin it to 0.0.7. The failing pipeline passes with this change: https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1348876&view=results	2024-04-15 00:09:15 +08:00
Patrice Vignola	01acc25d9d	[DML EP] Fix the output shapes of nodes with multiple outputs in the graph builder (#20289)
The graph builder currently doesn't assign the correct shapes for subgraphs that have more than one output when each output comes from a different node. `nodeOutputShapes` should be a map of shapes (1:1 relationship), not a map of lists of shapes (1:N relationship), since an output referenced by `arg->Name()` is a single output with a single shape.
For example, consider a subgraph where a node has 2 outputs and each output feeds into an elementwise op. Both elementwise nodes have a `targetIndex` of 0, and we were using this target index to query their shapes, so both outputs queried the same shape. Instead, we need to use the subgraph's `GraphOutputIndex` to query the correct output shape.	2024-04-12 16:34:49 -07:00
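The indexing bug described in this commit can be illustrated with a toy Python model. Everything here is hypothetical (node names, shapes, and the tuple layout stand in for the DML EP's C++ structures); it only shows why looking shapes up by `targetIndex` collides when both producing nodes have a single output, while `GraphOutputIndex` is unique per subgraph output.

```python
# Toy, hypothetical model of the shape lookup; not the DML EP's C++ code.
# Subgraph output shapes, indexed by the subgraph's GraphOutputIndex.
SUBGRAPH_OUTPUT_SHAPES = [[2, 3], [2, 5]]

# Each subgraph output: (producing elementwise node, targetIndex within that node,
# GraphOutputIndex in the subgraph). Both nodes have a single output, so
# targetIndex is 0 for both.
OUTPUTS = [("elementwise_a", 0, 0), ("elementwise_b", 0, 1)]


def shapes_by_target_index(outputs):
    # Buggy lookup: both outputs resolve to slot 0 and get the same shape
    return [SUBGRAPH_OUTPUT_SHAPES[target] for _, target, _ in outputs]


def shapes_by_graph_output_index(outputs):
    # Fixed lookup: each output uses its own position among the subgraph outputs
    return [SUBGRAPH_OUTPUT_SHAPES[graph_idx] for _, _, graph_idx in outputs]


print(shapes_by_target_index(OUTPUTS))        # [[2, 3], [2, 3]]
print(shapes_by_graph_output_index(OUTPUTS))  # [[2, 3], [2, 5]]
```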
Satya Kumar Jandhyala	b33216be4c	[JS/WebGPU] Improve MatMulNBits perf (#19974)
### Description
Improve performance by using shared memory.	2024-04-12 11:03:05 -07:00
Changming Sun	794d39a977	LLVM16 compat changes (#20294)
The change is similar to #15672 and #11667; it makes the code compatible with CUDA 12 and LLVM16 on Mariner2.	2024-04-12 10:16:12 -07:00
liqun Fu	cd7112f800	Integration with ONNX 1.16.0 (#19745)
### Description
Update with the ONNX 1.16.0 branch according to https://github.com/microsoft/onnxruntime/blob/main/docs/How_To_Update_ONNX_Dev_Notes.md
ONNX 1.16.0 release notes: https://github.com/onnx/onnx/releases/tag/v1.16.0
#### Updated ops for CPU EP:
- DequantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block dequantization support
- QuantizeLinear(21)
  - Added int16 and uint16 support + various optimizer tests
  - Missing int4 and uint4 support
  - Missing block quantization support
- Cast(21)
  - Missing int4 and uint4 support
- CastLike(21)
  - Missing int4 and uint4 support
- ConstantOfShape(21)
  - Missing int4 and uint4 support
- Identity(21)
  - Missing int4 and uint4 support
- If(21)
  - Missing int4 and uint4 support
- Loop(21)
  - Missing int4 and uint4 support
- Reshape(21)
  - Missing int4 and uint4 support
- Scan(21)
  - Missing int4 and uint4 support
- Shape(21)
  - Missing int4 and uint4 support
- Size(21)
  - Missing int4 and uint4 support
- Flatten(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Pad(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Squeeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Transpose(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
- Unsqueeze(21)
  - Missing float8e4m3fnuz, float8e5m2, float8e5m2fnuz, int4, and uint4 support
#### Unimplemented opset 21 features/ops
- int4 and uint4 data type
- QLinearMatMul(21)
- GroupNormalization(21)
- ai.onnx.ml.TreeEnsemble(5)
### Disabled tests
#### ORT Training
orttraining/orttraining/test/python/orttraining_test_ort_apis_py_bindings.py
- test_ort_custom_ops: Potential shape inference bug for custom ops
#### Python quantization unit tests
test/onnx/python/quantization (shape inference bug)
- test_op_conv_transpose.py: test_quantize_conv_transpose_u8u8_fp16
- test_op_conv_transpose.py: test_quantize_conv_transpose_s8s8_fp16
- test_op_gemm.py: test_quantize_qop_gemm_s8s8
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_same
- test_op_gemm.py: test_quantize_qop_gemm_e4m3fn_p3
- test_op_matmul.py: test_quantize_matmul_u8u8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_entropy
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_percentile
- test_op_matmul.py: test_quantize_matmul_s8s8_f16_distribution
- test_op_relu.py: test_quantize_qop_relu_s8s8
#### ONNX tests
- test_maxpool_2d_ceil_output_size_reduce_by_one: ONNX 1.16.0 fixed a maxpool output size bug and added this test. Enable this test when [ORT PR](https://github.com/microsoft/onnxruntime/pull/18377) is merged. Refer to original [ONNX PR](https://github.com/onnx/onnx/pull/5741).
- test_ai_onnx_ml_tree_ensemble_set_membership_cpu: new unimplemented op ai.onnx.ml.TreeEnsemble - test_ai_onnx_ml_tree_ensemble_single_tree_cpu: same - test_ai_onnx_ml_tree_ensemble_set_membership_cuda: same - test_ai_onnx_ml_tree_ensemble_single_tree_cuda: same - test_cast_INT4_to_FLOAT_cpu: ORT Cast(21) impl doesn't support int4 yet - test_cast_INT4_to_INT8_cpu: same - test_cast_UINT4_to_FLOAT_cpu: same - test_cast_UINT4_to_UINT8_cpu: same - test_cast_INT4_to_FLOAT_cuda - test_cast_INT4_to_INT8_cuda - test_cast_UINT4_to_FLOAT_cuda - test_cast_UINT4_to_UINT8_cuda - test_constantofshape_float_ones_cuda: ConstantOfShape(21) not implemented for cuda - test_constantofshape_int_shape_zero_cuda: same - test_constantofshape_int_zeros_cuda: same - test_flatten_axis0_cuda: Flatten(21) not implemented for cuda - test_flatten_axis1_cuda: same - test_flatten_axis2_cuda: same - test_flatten_axis3_cuda: same - test_flatten_default_axis_cuda: same - test_flatten_negative_axis1_cuda: same - test_flatten_negative_axis2_cuda: same - test_flatten_negative_axis3_cuda: same - test_flatten_negative_axis4_cuda: same - test_qlinearmatmul_2D_int8_float16_cpu: QLinearMatMul(21) for onnx not implemented in ORT yet - test_qlinearmatmul_2D_int8_float32_cpu: same - test_qlinearmatmul_2D_uint8_float16_cpu: same - test_qlinearmatmul_2D_uint8_float32_cpu: same - test_qlinearmatmul_3D_int8_float16_cpu: same - test_qlinearmatmul_3D_int8_float32_cpu: same - test_qlinearmatmul_3D_uint8_float16_cpu: same - test_qlinearmatmul_3D_uint8_float32_cpu: same - test_qlinearmatmul_2D_int8_float16_cuda: same - test_qlinearmatmul_2D_int8_float32_cuda: same - test_qlinearmatmul_2D_uint8_float16_cuda: same - test_qlinearmatmul_2D_uint8_float32_cuda: same - test_qlinearmatmul_3D_int8_float16_cuda: same - test_qlinearmatmul_3D_int8_float32_cuda: same - test_qlinearmatmul_3D_uint8_float16_cuda: same - test_qlinearmatmul_3D_uint8_float32_cuda: same - test_size_cuda: Size(21) not implemented for cuda - 
test_size_example_cuda: same - test_dequantizelinear_blocked: Missing implementation for block dequant for DequantizeLinear(21) - test_quantizelinear_blocked_asymmetric: Missing implementation for block quant for QuantizeLinear(21) - test_quantizelinear_blocked_symmetric: Missing implementation for block quant for QuantizeLinear(21) --------- Signed-off-by: liqunfu <liqun.fu@microsoft.com> Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: Ganesan Ramalingam <grama@microsoft.com> Co-authored-by: George Wu <jywu@microsoft.com> Co-authored-by: adrianlizarraga <adlizarraga@microsoft.com>	2024-04-12 09:46:49 -07:00
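Two of the gaps listed above, int4/uint4 and block (de)quantization, are easy to state precisely. The plain-Python sketch below follows our reading of the ONNX opset 21 semantics and is not ORT code. It assumes two things: int4 values are packed two per byte with the first element in the low nibble, and when `block_size` is set, each run of `block_size` consecutive elements along the quantization axis shares one scale and zero point. The function names are hypothetical, and only the 1-D case is shown.

```python
def unpack_int4(packed: bytes, count: int, signed: bool = True) -> list:
    """Unpack `count` 4-bit values stored two per byte, low nibble first."""
    values = []
    for byte in packed:
        for nibble in (byte & 0x0F, (byte >> 4) & 0x0F):
            # Sign-extend for int4: nibbles 8..15 represent -8..-1.
            if signed and nibble >= 8:
                nibble -= 16
            values.append(nibble)
    # An odd element count leaves an unused high nibble in the last byte.
    return values[:count]


def blocked_dequantize_1d(x, scales, zero_points, block_size):
    """y[i] = (x[i] - zp[i // block_size]) * scale[i // block_size]."""
    return [
        (q - zero_points[i // block_size]) * scales[i // block_size]
        for i, q in enumerate(x)
    ]


print(unpack_int4(bytes([0x21, 0xF7]), 4))  # [1, 2, 7, -1]
print(blocked_dequantize_1d([10, 12, 20, 30], [0.5, 2.0], [10, 20], 2))  # [0.0, 1.0, 0.0, 20.0]
```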
Patrice Vignola	a0d5067341	[DML EP] Move operators to feature level 6.4 (#20290 ) These operators won't be in the next version of DML, but they will be in the version right after it.	2024-04-12 00:02:27 -07:00
Adrian Lizarraga	327fb1fde3	[QNN EP] Use QNN's ResizeBilinear operator for specific configs of ONNX Resize (#20292 )
### Description
Uses QNN's ResizeBilinear operator for ONNX Resize with:
- input rank: 4
- mode: linear
- coordinate transformation mode: half_pixel, align_corners, or asymmetric

#### Mapping matrix of ONNX Resize w/ "linear" mode on HTP backend
Table entries correspond to the QNN operator used for the given configuration (Resize = QNN Resize op, RBL = QNN ResizeBilinear op, X = Unsupported).

| coordinate_transformation_mode | input_rank < 3 | input_rank = 3 | input_rank = 4 | input_rank = 5 | input_rank > 5 |
| --- | --- | --- | --- | --- | --- |
| half_pixel | X | Resize | RBL | Resize | X |
| pytorch_half_pixel | X | Resize | Resize | Resize | X |
| align_corners | X | Resize | RBL | Resize | X |
| asymmetric | X | Resize | RBL | Resize | X |

### Motivation and Context
QNN's ResizeBilinear operator seems to perform better (lower latency) than QNN's Resize operator for certain configurations.	2024-04-11 22:38:55 -07:00
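The mapping matrix in this commit can be encoded as a small lookup, which is handy for checking which QNN op a given Resize node should become. This is a hypothetical helper, not the QNN EP's code. Coordinate transformation modes that the table does not list are treated as unsupported here; that choice is an assumption.

```python
# Modes that map to QNN ResizeBilinear when the input rank is 4.
RBL_MODES = {"half_pixel", "align_corners", "asymmetric"}
# All coordinate_transformation_mode values listed in the mapping matrix.
TABLE_MODES = RBL_MODES | {"pytorch_half_pixel"}


def qnn_op_for_linear_resize(coord_mode: str, input_rank: int) -> str:
    """Return the QNN op used on HTP for ONNX Resize(mode="linear").

    Returns "ResizeBilinear", "Resize", or "X" (unsupported).
    """
    if coord_mode not in TABLE_MODES or not 3 <= input_rank <= 5:
        return "X"
    if input_rank == 4 and coord_mode in RBL_MODES:
        return "ResizeBilinear"
    return "Resize"


print(qnn_op_for_linear_resize("half_pixel", 4))          # ResizeBilinear
print(qnn_op_for_linear_resize("pytorch_half_pixel", 4))  # Resize
```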
dependabot[bot]	9ca1afa25c	Bump protobufjs from 7.2.4 to 7.2.5 in /js/web (#20270 ) Bumps [protobufjs](https://github.com/protobufjs/protobuf.js) from 7.2.4 to 7.2.5. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/protobufjs/protobuf.js/releases">protobufjs's releases</a>.</em></p> <blockquote> <h2>protobufjs: v7.2.5</h2> <h2><a href="https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5">7.2.5</a> (2023-08-21)</h2> <h3>Bug Fixes</h3> <ul> <li>crash in comment parsing (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1890">#1890</a>) (<a href="`eaf9f0a5a4`">eaf9f0a</a>)</li> <li>deprecation warning for new Buffer (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1905">#1905</a>) (<a href="`e93286ef70`">e93286e</a>)</li> <li>possible infinite loop when parsing option (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1923">#1923</a>) (<a href="`f2a8620179`">f2a8620</a>)</li> </ul> </blockquote> </details> <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/protobufjs/protobuf.js/blob/master/CHANGELOG.md">protobufjs's changelog</a>.</em></p> <blockquote> <h2><a href="https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5">7.2.5</a> (2023-08-21)</h2> <h3>Bug Fixes</h3> <ul> <li>crash in comment parsing (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1890">#1890</a>) (<a href="`eaf9f0a5a4`">eaf9f0a</a>)</li> <li>deprecation warning for new Buffer (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1905">#1905</a>) (<a href="`e93286ef70`">e93286e</a>)</li> <li>possible infinite loop when parsing option (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1923">#1923</a>) (<a href="`f2a8620179`">f2a8620</a>)</li> </ul> </blockquote> </details> <details> <summary>Commits</summary> <ul> <li><a 
href="`4436cc748c`"><code>4436cc7</code></a> chore: release master (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1925">#1925</a>)</li> <li><a href="`e93286ef70`"><code>e93286e</code></a> fix: deprecation warning for new Buffer (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1905">#1905</a>)</li> <li><a href="`eaf9f0a5a4`"><code>eaf9f0a</code></a> fix: crash in comment parsing (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1890">#1890</a>)</li> <li><a href="`f2a8620179`"><code>f2a8620</code></a> fix: possible infinite loop when parsing option (<a href="https://redirect.github.com/protobufjs/protobuf.js/issues/1923">#1923</a>)</li> <li>See full diff in <a href="https://github.com/protobufjs/protobuf.js/compare/protobufjs-v7.2.4...protobufjs-v7.2.5">compare view</a></li> </ul> </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-11 22:07:08 -07:00
Wanming Lin	667d2eb8e6	[WebNN EP] Support Gelu op (#20240 )	2024-04-11 19:55:44 -07:00
Patrice Vignola	12042a9387	[DML] Add FastGelu (#20066 ) Although DML doesn't have a "fast" GELU approximation operator, mapping FastGelu to DML's standard GELU operator is still faster than composing it from separate elementwise operators.	2024-04-11 14:40:28 -07:00
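For context, FastGelu (a `com.microsoft` contrib op) is the tanh approximation of GELU, with an optional bias added to the input first. This commit does not state which formulation DML's GELU uses internally. If it is the erf-based form, the plain-Python sketch below shows that the two differ only slightly on a typical input range. It is illustrative only, not DML or ORT code.

```python
import math


def gelu_erf(x: float) -> float:
    """Exact GELU: x * Phi(x), written with erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def fast_gelu(x: float, bias: float = 0.0) -> float:
    """tanh approximation of GELU, applied to x + bias."""
    x = x + bias
    k = math.sqrt(2.0 / math.pi)  # ~0.797885
    return 0.5 * x * (1.0 + math.tanh(k * (x + 0.044715 * x ** 3)))


# Compare both forms on [-6, 6] with a 0.01 step.
xs = [i / 100.0 for i in range(-600, 601)]
max_err = max(abs(fast_gelu(x) - gelu_erf(x)) for x in xs)
print(f"max |FastGelu - GELU| on [-6, 6]: {max_err:.1e}")
```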
Yulong Wang	50bd4571ac	[js/web] support SimplifiedLayerNorm and SkipSimplifiedLayerNorm (#20277 ) ### Description Support operator `SimplifiedLayerNorm` and `SkipSimplifiedLayerNorm` for WebGPU backend.	2024-04-11 14:08:50 -07:00
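SimplifiedLayerNorm is RMS normalization: unlike LayerNorm, it does not subtract the mean and has no beta. The Skip variant first adds the residual (and an optional bias) and then normalizes. The plain-Python sketch below shows the math for a single row, assuming normalization over the last axis. The function names are hypothetical, and returning the pre-normalization sum alongside the output is a convenience of this sketch, not a statement about the operator's exact outputs.

```python
import math


def simplified_layer_norm(x, gamma, epsilon=1e-5):
    """RMS-normalize one row: x / sqrt(mean(x^2) + eps) * gamma."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + epsilon)
    return [v * inv_rms * g for v, g in zip(x, gamma)]


def skip_simplified_layer_norm(x, skip, gamma, bias=None, epsilon=1e-5):
    """Add the skip (residual) input and optional bias, then RMS-normalize."""
    summed = [a + b for a, b in zip(x, skip)]
    if bias is not None:
        summed = [s + b for s, b in zip(summed, bias)]
    return simplified_layer_norm(summed, gamma, epsilon), summed


y = simplified_layer_norm([3.0, 4.0], [1.0, 1.0], epsilon=0.0)
print(y)  # RMS of [3, 4] is sqrt(12.5), so y is [3, 4] / sqrt(12.5)
```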
Maximilian Müller	2d0e1df80a	TRT detailed log and strong typed networks (#19695 ) ### Description @chilo-ms It seems sensible to me to forward the detailed log argument to the TRT logger itself. Also, when no precision downcast is wanted, this change makes sure that TRT 9+ actually sticks to the ONNX precision.	2024-04-11 13:40:13 -07:00
Jeff Bloomfield	f7e2faf961	Re-enable MatMul QDQ fusions with the DML EP (#20248 ) This re-enables MatMul QDQ fusions with the DML EP now that bugs in related DML kernels previously encountered in the pipeline are expected to be addressed.	2024-04-11 10:43:20 -07:00
Wanming Lin	ee603ee326	[WebNN EP] Fixed WebNN constant operand is detached issue (#20229 ) Wasm allows the memory size to grow, which reallocates all of its array buffers. The WebNN EP passed a view of Wasm memory directly to a WebNN constant, so after a memory growth the constant was treated as a detached buffer on the JS side. The fix simply creates a copy of the data for each WebNN constant.	2024-04-10 20:30:03 -07:00
Yufeng Li	e6ca360695	fix build break in kernel_explorer (#20235 )	2024-04-10 15:01:44 -07:00
Chi Lo	47072509c7	[TensorRT EP] Fix updating provider options from session options (#20246 ) The issue occurs when the user specifies a path for "ep.context_file_path" in session options. `context_cache_path` is a local variable, so it is destroyed on return from `UpdateOrtTensorRTProviderOptionsV2FromSessionOptionsConfigs()`, but its buffer address was saved via `context_cache_path.c_str()`. Later, `onnxruntime::TensorrtProviderFactoryCreator::Create(&new_tensorrt_options)` reads that freed memory location. Inlining `UpdateOrtTensorRTProviderOptionsV2FromSessionOptionsConfigs()` fixes the issue.	2024-04-10 12:48:37 -07:00
MasayoshiTsutsui	6a9d8a9030	[js/webgpu] implement DepthToSpace operator in webgpu (#19948 )
### Description
This PR adds support for the [DepthToSpace](https://onnx.ai/onnx/operators/onnx__DepthToSpace.html#depthtospace) operator to the WebGPU backend.
### Test
We followed the steps described on [this page](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce) to build, tested with the following command, and confirmed that the existing Model and Op tests pass (these test cases were probably written earlier for the WebGL backend).
```
# run from ~/onnxruntime/js/web
npm test -- suite0 -b=webgpu --wasm-number-threads=1 --debug
```
##### NOTE
On the main branch, 5 tests for the resize_upsample_sizes_nearest operator fail. Since this PR does not touch that code, those tests fail on this branch as well. Should I open an issue for this?
### Motivation and Context
Although the DepthToSpace operator plays a crucial role in super-resolution, it was not supported in the WebGPU backend.	2024-04-10 12:13:46 -07:00
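The ONNX specification defines DepthToSpace as a reshape, transpose, reshape sequence. It has two modes: DCR (the default) and CRD, which differ only in how the channel dimension is split. The NumPy sketch below follows that reference definition and is not the WebGPU shader from this PR. It can be used to check shader output on small inputs, where the two modes visibly differ.

```python
import numpy as np


def depth_to_space(x: np.ndarray, blocksize: int, mode: str = "DCR") -> np.ndarray:
    """Reference DepthToSpace for an NCHW tensor."""
    b, c, h, w = x.shape
    bs = blocksize
    if c % (bs * bs) != 0:
        raise ValueError("channel count must be divisible by blocksize**2")
    if mode == "DCR":
        # Channels are laid out as (block_row, block_col, out_channel).
        tmp = x.reshape(b, bs, bs, c // (bs * bs), h, w)
        tmp = tmp.transpose(0, 3, 4, 1, 5, 2)
    elif mode == "CRD":
        # Channels are laid out as (out_channel, block_row, block_col).
        tmp = x.reshape(b, c // (bs * bs), bs, bs, h, w)
        tmp = tmp.transpose(0, 1, 4, 2, 5, 3)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return tmp.reshape(b, c // (bs * bs), h * bs, w * bs)


x = np.arange(8).reshape(1, 8, 1, 1)
print(depth_to_space(x, 2, "DCR")[0, 0].tolist())  # [[0, 2], [4, 6]]
print(depth_to_space(x, 2, "CRD")[0, 0].tolist())  # [[0, 1], [2, 3]]
```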
Dmitri Smirnov	89a96bdc34	Reduce heap contention in Tokenizer (#20196 ) ### Description Re-use vector buffers to prevent frequent reallocations. ### Motivation and Context Reduce process heap contention. ![image](https://github.com/microsoft/onnxruntime/assets/11303988/f0b78062-3d86-45b7-87fd-e0696b170cf8)	2024-04-10 12:12:17 -07:00
Yifan Li	9577fe454d	[EP Perf] Customize onnx-tensorrt commit id when init CI tasks (#20175 )
### Description
Customize the onnx-tensorrt commit id through the EP Perf CI variables when testing different versions of the OSS parser.
### To Verify
![image](https://github.com/microsoft/onnxruntime/assets/109183385/9dc650d8-377d-4223-8951-f0849b1fe984)
After `onnxTensorrtCommitId` is assigned in the EP Perf CI variables, CI prints the following during the [Build latest ORT Image with TensorRT OSS parser](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396450) step:
```
Updated deps.txt with new commit id a43ce67187bab219520fd80f21af8bbd4354bc8c and hash 572535aefef477050f86744dfab1fef840198035
```
CI then [overwrites the onnx_tensorrt line in deps.txt](https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=438217&view=logs&j=b6bfa4e2-8141-507f-8ca1-59b3f929fa71&t=fc64e110-ab59-54e4-1c37-853e84a52a7e&l=396451) with:
```
onnx_tensorrt;`a43ce67187`.zip;572535aefef477050f86744dfab1fef840198035
```
### Motivation and Context
This saves the time spent modifying deps.txt and manually calculating the zip hash.	2024-04-10 09:46:05 -07:00
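The deps.txt entry shown in the log has the form `name;url;sha1`. The following is a minimal sketch of the manual work this CI change automates: hash a downloaded archive with SHA1, then rewrite the matching entry. The function names and example URLs are hypothetical, since the actual CI script is not shown in this commit. The sketch also assumes lines are passed without trailing newlines and that comment lines start with `#`.

```python
import hashlib


def sha1_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hex SHA1 digest of a file, read in chunks to bound memory use."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def update_deps_entry(lines, name, url, sha1):
    """Rewrite the `name;url;sha1` entry for `name`; other lines are kept as-is."""
    updated = []
    for line in lines:
        fields = line.split(";")
        if not line.startswith("#") and len(fields) == 3 and fields[0] == name:
            line = f"{name};{url};{sha1}"
        updated.append(line)
    return updated


deps = ["# name;url;sha1", "onnx_tensorrt;https://example.com/old.zip;aaaa"]
print(update_deps_entry(deps, "onnx_tensorrt", "https://example.com/new.zip", "cccc"))
```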
guyang3532	471e969e2f	Check padding density by input of embedding module (#19821 )
### Description
The PaddingElimination optimization is enabled when the density of the embedding padding is less than 90%, so we need to check that density to decide whether to enable the optimization. Before this PR, we only checked the graph inputs, correlating one of them with the embedding node by walking the graph back from the embedding node to a graph input. This is hard to make general, because there may be complicated patterns between the graph input and the embedding node. This PR instead checks the padding density on the direct input of the embedding module, during the first graph execution when exporting the ONNX graph. If the density is < 90%, a flag PythonOp is inserted after the embedding node:
```
Embedding
    |
PythonOp (func_name:_FlagPaddingElimination)   (inserted if density < 90%)
    |
Following graph
```
When PaddingElimination runs, it checks whether the flag PythonOp (func_name:_FlagPaddingElimination) follows the Embedding node. If it does, it removes the flag and applies the padding elimination optimization.	2024-04-10 18:45:51 +08:00
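Here is a minimal sketch of the density check. It assumes that "density" means the percentage of non-padding tokens in the embedding input, so heavy padding gives a low density and makes elimination worthwhile. The function names are hypothetical, and the real check runs on tensors during export rather than on Python lists.

```python
def padding_density(input_ids, padding_idx):
    """Percentage of non-padding tokens across a batch of token-id rows."""
    total = sum(len(row) for row in input_ids)
    if total == 0:
        return 100.0
    valid = sum(1 for row in input_ids for tok in row if tok != padding_idx)
    return 100.0 * valid / total


def should_insert_padding_flag(input_ids, padding_idx, threshold=90.0):
    """True when the flag PythonOp would be inserted after the embedding."""
    return padding_density(input_ids, padding_idx) < threshold


batch = [[5, 6, 0, 0], [7, 8, 9, 0]]  # 0 is the padding id; 5 of 8 tokens are real
print(padding_density(batch, 0))  # 62.5
```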
Yi Zhang	0acde1157a	Set parallel count to avoid OOM in training GPU packaging pipeline (#20255 ) ### Description Make the compilation work on the Azure CPU agent by reducing the parallel count. ### Motivation and Context The OOM issue mentioned in #20244 was caused by too little memory per parallel compile job (low memory/parallel_count).	2024-04-10 14:05:53 +08:00
pengwa	280b2634c5	Prompt layer-wise recompute when applicable (#20126 ) ### Prompt layer-wise recompute when applicable When export fails and the checkpoint function is in use, explicitly prompt users to enable layer-wise memory optimization. - Using the checkpoint function is a strong indicator that the model is too large to fit in GPU memory. - If we don't override the checkpoint function here, ONNX export usually fails: 1. With older PyTorch versions, handling the gradient checkpoint feature just throws an exception. 2. With newer PyTorch versions, an export failure happens. - Neither failure tells users explicitly HOW to mitigate it. This PR does. ![image](https://github.com/microsoft/onnxruntime/assets/10530022/c0476748-5818-4cc8-b2d6-88c7580fe4da)	2024-04-10 11:50:28 +08:00
Yi Zhang	14d7872ce9	Reuse T4 for Cuda12.2 training packaging pipeline. (#20244 )
### Description
The training CUDA 12.2 packaging pipeline (https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary) has consistently run out of memory since PR #19910. I tried other CPU agents, for example D64as_v5 (256 GB memory) and D32as_v4 (128 GB memory and 256 GB SSD temp storage), and they still ran out of memory, as shown below.
![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083)
The build does work on T4, although T4 has only 4 vCPUs, 28 GB memory and 180 GB temp storage, and it takes much longer.
### Motivation and Context
Restore the CUDA 12.2 training packaging pipeline first. More time is needed to investigate the root cause.
### Other Clues
These 2 compilation steps each take nearly 6 minutes with CUDA 12.2 on T4, and they run out of memory on the CPU machine. @ajindal1

cuda12.2 on T4:
```
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o
```
With CUDA 11.8, the same steps finish in about one minute on CPU:
```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

cuda11.8 on GPU
024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```	2024-04-10 09:21:40 +08:00
Dmitri Smirnov	7d8dea9f10	Reduce Heap contention in StringNormalizer (#20182 ) ### Description Re-use pre-computed and pre-allocated buffers for Unicode conversions. Make sure we do not introduce unnecessary intermediate `std::string` instances. Create a Utf8Generic converter for use on non-Windows platforms. ### Motivation and Context This reduces heap contention for a P1 customer. ![image](https://github.com/microsoft/onnxruntime/assets/11303988/fd39fb01-7361-47d2-8f83-69dbc3bbc65c)	2024-04-09 16:10:31 -07:00
pengwa	81005e2c92	Optimize constant sharing perf (#20143 ) ### Optimize constant sharing perf by avoiding renaming the first time we detect a constant pattern Currently, every time ConstantSharing runs, for each initializer whose pattern has not been seen yet, we create a new NodeArg with a unique name. Later, if other initializers share the same pattern, they are replaced by that NodeArg. The problem is that even when there are no real constant-sharing cases, we still modify the graph for every initializer, which is unnecessary.	2024-04-09 12:04:36 +08:00
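The idea behind this optimization can be sketched as follows. The first initializer seen with a given value pattern keeps its own name as the canonical entry, instead of getting a freshly named NodeArg, so the graph is only edited when a real duplicate appears. This plain-Python sketch uses hypothetical names and models each initializer as a `(name, key)` pair, where `key` is any hashable description of its value; it is not the ConstantSharing transformer's code.

```python
def plan_constant_sharing(initializers):
    """Map duplicate initializer names to the first initializer with the same value.

    `initializers` is a list of (name, key) pairs, where `key` is any hashable
    description of the constant (e.g. dtype, shape and values).
    """
    canonical = {}     # key -> name of the first initializer seen with that key
    replacements = {}  # duplicate name -> canonical name
    for name, key in initializers:
        if key in canonical:
            replacements[name] = canonical[key]
        else:
            # First occurrence: keep its existing name, so no graph edit is needed.
            canonical[key] = name
    return replacements


inits = [("a", ("float", (), 1.0)), ("b", ("float", (), 2.0)), ("c", ("float", (), 1.0))]
print(plan_constant_sharing(inits))  # {'c': 'a'}: only one graph edit
```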
danyue	07b5377f7c	Add INT16 and UINT16 compatibility for relu_quantizelinear (#20187 )
### Description
There is a problem in the relu_quantizelinear transformer that causes wrong results. This PR fixes it.
### Motivation and Context
The transformer does not take into account the case where Q's zero point is tensor(int16) or tensor(uint16), so an error occurs in that case.

How to verify:
```python
import numpy as np
import onnx
import onnxruntime as ort

model_name = "relu_quantize_testcase.onnx"
input_name = "generator/conv2d_input/conv2d/Conv2D:0"
# np.random.rand takes the dimensions as separate arguments and returns float64.
ort_input0 = np.random.rand(1, 64, 64, 128).astype(np.float32)

# Infer with all graph optimizations disabled (reference result).
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession(model_name, sess_options=so, providers=["CPUExecutionProvider"])
outputs = [x.name for x in ort_session.get_outputs()]
ort_outs_mod = ort_session.run(outputs, {input_name: ort_input0})
del ort_session

# Infer with the default graph optimization level (fusion enabled).
model_orig = onnx.load(model_name)
ort_session_orig = ort.InferenceSession(model_orig.SerializeToString(), providers=["CPUExecutionProvider"])
outputs_orig = [x.name for x in ort_session_orig.get_outputs()]
ort_outs_orig = ort_session_orig.run(outputs_orig, {input_name: ort_input0})
del ort_session_orig

# A non-zero norm means the optimized graph produces different results.
print(np.linalg.norm(ort_outs_mod[0].astype(np.float32) - ort_outs_orig[0].astype(np.float32)))
```
[relu_quantize_testcase.zip](https://github.com/microsoft/onnxruntime/files/14848160/relu_quantize_testcase.zip) --------- Co-authored-by: genmingz <genming.zhong@amd.com>	2024-04-08 19:41:43 -07:00
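Why the zero-point type matters: QuantizeLinear computes `clamp(round(x / scale) + zero_point, qmin, qmax)`. A preceding Relu therefore only changes the result when some negative `x` would quantize below `zero_point`. That cannot happen when `zero_point` equals the type's minimum, which, as we read it, is the condition under which the Relu can be dropped. The pure-Python sketch below checks this for 16-bit types. `relu_is_redundant` and the other names are hypothetical helpers, not the transformer's code.

```python
# (qmin, qmax) for each quantized type; the 16-bit entries are the ones this PR adds.
QRANGE = {
    "uint8": (0, 255),
    "int8": (-128, 127),
    "uint16": (0, 65535),
    "int16": (-32768, 32767),
}


def quantize(x: float, scale: float, zero_point: int, dtype: str) -> int:
    qmin, qmax = QRANGE[dtype]
    # Python's round() rounds half to even, like QuantizeLinear.
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def relu(x: float) -> float:
    return max(x, 0.0)


def relu_is_redundant(zero_point: int, dtype: str) -> bool:
    """Relu before QuantizeLinear is a no-op iff zero_point is the type's minimum."""
    return zero_point == QRANGE[dtype][0]


# With zp = 0 for int16, negative inputs quantize below zp, so Relu matters.
print(quantize(relu(-1.0), 0.1, 0, "int16"), quantize(-1.0, 0.1, 0, "int16"))  # 0 -10
```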
pengwa	41acd8c543	Support more ops for recompute (#20234 ) ### Support more ops for recompute To cover the Mistral model and support the padding elimination ops.	2024-04-09 09:24:48 +08:00
Adam Louly	22a61a3cf5	Fix Mixtral Parity test to keep it consistent with Transformers. (#20210 ) ### Description I recently opened a PR in the HF transformers repo to fix an issue in the indexing code (https://github.com/huggingface/transformers/issues/29857): the ONNX exporter was failing because of the tolist() conversion, so we had to remove it. The same code is also part of our codebase, so this PR keeps the two consistent.	2024-04-08 13:04:12 -07:00
wejoncy	908a76d675	fix "4bit quantization scales and zeropoint tensor shape" (#19986 )	2024-04-08 10:15:28 -07:00
Jiajie Hu	23d3afd4fe	[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209 ) ### Description https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding ### Motivation and Context As per customer request, this helps Phi-2 and Gemma.	2024-04-08 09:11:26 -07:00
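The rotation that RotaryEmbedding applies can be sketched in plain Python. Assumptions: the contrib op takes precomputed `cos_cache`/`sin_cache` inputs, whereas this sketch computes the angles inline with the conventional base of 10000. `interleaved` selects whether the rotated pairs are adjacent elements or the two halves of the head dimension. This is not the WebGPU kernel. The key property is that the dot product of two rotated vectors depends only on their relative position.

```python
import math


def rotary_embedding(x, position, base=10000.0, interleaved=False):
    """Rotate pairs of elements of one head vector `x` by position-dependent angles."""
    d = len(x)
    half = d // 2
    out = list(x)
    for i in range(half):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        # Pair adjacent elements, or element i with element i + d/2.
        a, b = (2 * i, 2 * i + 1) if interleaved else (i, i + half)
        out[a] = x[a] * c - x[b] * s
        out[b] = x[a] * s + x[b] * c
    return out


q, k = [1.0, 2.0, 3.0, 4.0], [0.5, -1.0, 2.0, 0.0]
score_5_3 = sum(a * b for a, b in zip(rotary_embedding(q, 5), rotary_embedding(k, 3)))
score_2_0 = sum(a * b for a, b in zip(rotary_embedding(q, 2), rotary_embedding(k, 0)))
print(abs(score_5_3 - score_2_0) < 1e-9)  # same relative offset, same score
```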
cloudhan	e19c778934	Improve KE for commandline and programmatically tuning dispatch (#18778 )	2024-04-08 11:08:59 +08:00
Ye Wang	cc3faba616	Support seq_len > 64K in rotary embedding cuda kernel (#20204 )	2024-04-05 19:52:55 -07:00
Francesco	6af02ae06a	Remove non-existing function call (#19416 ) This function call is confusing because the function it calls is not defined. compute_range was correctly replaced by compute_data, but the call was reintroduced in a later PR. ### Description Problem as described in [this issue](https://github.com/microsoft/onnxruntime/issues/18893). Calls to compute_range() can be found in the examples and in calibrate.py itself. The problem is that it was [replaced here](https://github.com/microsoft/onnxruntime/pull/16550/files#diff-75e84436a983e17527f8b5bc585087e7ad75b3b515c2101c2a82dcaecca490de) from `compute_range()` to `compute_data() -> TensorsData` and then mistakenly [added back as a call here](https://github.com/microsoft/onnxruntime/pull/17029/files#diff-75e84436a983e17527f8b5bc585087e7ad75b3b515c2101c2a82dcaecca490de). ### Motivation and Context This PR removes the confusing call `self.calibrate_range()` from calibrate.py. Once the change is released in a package, the examples in the onnx-runtime-examples repository must also be adapted, since they already do not work. Uses of `compute_range()` in the examples are linked in [this issue](https://github.com/microsoft/onnxruntime/issues/18893).	2024-04-05 19:48:48 -07:00
Adrian Lizarraga	05d97e8d18	Update QNN python packages to use QNN SDK version 2.19.2 (#20213 ) ### Description Update QNN python packages to use QNN SDK version 2.19.2. ### Motivation and Context Our CI builds already use QNN SDK version 2.19.2. We should make sure the ort-nightly-qnn python packages are also built with the same QNN SDK version.	2024-04-05 17:15:25 -07:00
Yi Zhang	23a5d0a305	Extend time out in Windows GPU packaging jobs (#20207 ) ### Description Extend the Windows GPU packaging job build timeout to 6 hours, and the test stage timeout to 3 hours. ### Motivation and Context There are still a few timeout issues after the refactoring; they occur in about 20% of runs in https://dev.azure.com/aiinfra/Lotus/_build?definitionId=84. I found that when the build becomes slow, it can still finish within 4 hours (https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=434340&view=logs&j=0c6ee496-b38e-55a9-3699-12934156e90f), although in most cases it takes only about 30 minutes. This differs from before, when the build could not complete at all. So this PR extends the timeout to 6 hours. Interestingly, if one Windows GPU job becomes slow, all other Windows GPU jobs in the same run become slow too, so I suspect it is related to ADO or the virtualization layer; that is, it is not completely random. https://dev.azure.com/aiinfra/Lotus/_build?definitionId=841	2024-04-06 08:03:42 +08:00
Andrew Grigorev	a6611409cc	Fix HalideIR title in third party notices reference (#20190 )	2024-04-05 11:12:43 -07:00
dependabot[bot]	2a323eb670	Bump Sixlabors.ImageSharp from 2.1.1 to 2.1.7 in /csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample (#19805 ) Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp) from 2.1.1 to 2.1.7.
Release notes (sourced from Sixlabors.ImageSharp's releases, https://github.com/SixLabors/ImageSharp/releases):
v2.1.7: What's Changed
- [release/2.1] Disallow allocation attempts of unrepresentable sizes by @antonfirsov in SixLabors/ImageSharp#2553
- [release/2.1] Tiff decoding robustness improvements (#2550) by @antonfirsov in SixLabors/ImageSharp#2554
- [release/2.1] PBM decoder robustness improvements and BufferedReadStream observability by @antonfirsov in SixLabors/ImageSharp#2555
- Backport 2681 by @JimBobSquarePants in SixLabors/ImageSharp#2688
Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7
v2.1.6: What's Changed
- Backport - Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack. by @JimBobSquarePants in SixLabors/ImageSharp#2524
Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6
v2.1.5: What's Changed
- Backport #2501 by @JimBobSquarePants in SixLabors/ImageSharp#2509
Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5
v2.1.4: What's Changed
- Backport WebP fix to 2.1 by @antonfirsov in SixLabors/ImageSharp#2420
Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4
v2.1.3: What's Changed
- V2 Backport: 2133, 2154 by @JimBobSquarePants in SixLabors/ImageSharp#2157
Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3
v2.1.2: What's Changed
- Backport - Issue 2123 by @JimBobSquarePants in SixLabors/ImageSharp#2126
Full Changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2
Commits:
- fa7d712 Merge pull request #2688 from SixLabors/js/backport-2681
- 36b3533 Use correct property to disable upstream warnings.
- 94bb761 Update ImageSharp.csproj
- 3ea2574 Update PngDecoderCore.cs
- e74a55f [release/2.1] PBM decoder robustness improvements and BufferedReadStream obse...
- 749b1c0 [release/2.1] Tiff decoding robustness improvements (#2550) (#2554)
- 3064b78 Merge pull request #2553 from SixLabors/backport/2.1.x/2545
- f36ec12 Disallow allocation attempts of unrepresentable sizes
- 688e242 Merge pull request #2524 from SixLabors/js/backport-fix-jpeg-dos
- 0f17a8b Handle EOF in Jpeg bit reader when data is bad to prevent DOS attack.
- Additional commits viewable in compare view: https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.7
Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`.
Dependabot commands and options. You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts).
Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2024-04-05 11:11:52 -07:00
Hector Li	1ccb164c12	Improve the script to add Q, DQ nodes around EPContext node (#20107 ) Improve the script that adds QuantizeLinear (Q) and DequantizeLinear (DQ) nodes around the EPContext node so that the wrapper model uses float data as inputs and outputs. Users don't need to quantize or dequantize the data in their application.	2024-04-05 10:12:01 -07:00
Guenther Schmuelling	c529e05e38	fix ConvTranspose 1D (#20194 )	2024-04-05 10:05:32 -07:00


11997 commits