Commit graph

11428 commits

Prathik Rao
ffceed9d44
ORT 1.19.2 Release: Cherry Pick Round 1 (#21861)
Approved cherry picks for ORT 1.19.2 release.

---------

Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Ye Wang <52801275+wangyems@users.noreply.github.com>
Co-authored-by: Your Name <you@example.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: aciddelgado <139922440+aciddelgado@users.noreply.github.com>
Co-authored-by: mindest <30493312+mindest@users.noreply.github.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
2024-08-30 15:02:31 -07:00
Prathik Rao
d6514636ea
ORT 1.19.1 Release: Cherry Pick Round 1 (#21796)
Approved cherry picks for ORT 1.19.1 release.

---------

Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
2024-08-20 21:21:44 -07:00
Prathik Rao
26250ae74d
ORT 1.19.0 Release: Cherry-Pick Round 2 (#21726)
### Description

PRs marked for cherry-pick & bug fixes.

### Motivation and Context

ORT 1.19.0 Release Preparation

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Co-authored-by: George Wu <jywu@microsoft.com>
Co-authored-by: liqun Fu <liqfu@microsoft.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
2024-08-14 13:45:35 -07:00
Prathik Rao
ccf6a28c3c
ORT 1.19.0 Release: Cherry-Pick Round 1 (#21619)
### Description

PRs marked for cherry-pick.

### Motivation and Context

ORT 1.19.0 Release Preparation

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
Signed-off-by: Liqun Fu <liqun_fu@hotmail.com>
Co-authored-by: liqun Fu <liqfu@microsoft.com>
Co-authored-by: Jing Fang <126209182+fajin-corp@users.noreply.github.com>
Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
Co-authored-by: Changming Sun <chasun@microsoft.com>
Co-authored-by: Sumit Agarwal <sumitagarwal330@gmail.com>
Co-authored-by: vraspar <vrajang@outlook.com>
Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
Co-authored-by: Yi Zhang <zhanyi@microsoft.com>
Co-authored-by: jingyanwangms <47403504+jingyanwangms@users.noreply.github.com>
Co-authored-by: Yi Zhang <your@email.com>
Co-authored-by: Chi Lo <54722500+chilo-ms@users.noreply.github.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
2024-08-12 16:54:25 -07:00
Prathik Rao
ee2fe87e2d
ORT 1.19.0 Release: Cherry-Pick Round 0 (#21609)
### Description

Critical changes required for an external developer (GeekBench)
 

### Motivation and Context

ORT 1.19.0 Release Preparation

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2024-08-03 22:04:57 -07:00
Yi-Hong Lyu
530a2d7b41
Enable FP16 Clip and Handle Bias in FP16 Depthwise Conv (#21493)
- Improved accuracy for face-detection, image-classification, and
object-detection in the GeekBench ML benchmark on ARM64.
- Fixed issue https://github.com/microsoft/onnxruntime/issues/18992
2024-07-30 03:49:14 -07:00
Changming Sun
82036b0497
Remove references to the outdated CUDA EP factory method (#21549)
The function "OrtSessionOptionsAppendExecutionProvider_CUDA" is
deprecated.
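For reference, a minimal sketch of the non-deprecated way to enable the CUDA EP from Python, assuming a CUDA build of onnxruntime (the model path is hypothetical):

```python
# Hedged sketch: enable the CUDA EP via the providers argument instead of the
# deprecated OrtSessionOptionsAppendExecutionProvider_CUDA factory function.
import onnxruntime as ort

sess = ort.InferenceSession(
    "model.onnx",  # hypothetical model path
    providers=[("CUDAExecutionProvider", {"device_id": 0})],
)
print(sess.get_providers())
```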
2024-07-29 21:59:16 -07:00
vraspar
07d3be5b0e
CoreML: Add ML Program Split Op (#21456)
### Description

Add support for Split Op


### Motivation and Context
Address operator gaps in a high-priority model.

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-07-30 14:04:47 +10:00
Yifan Li
5d78b9a17b
[TensorRT EP] Update TRT OSS Parser to 10.2 (#21552)
### Description
Update TRT OSS Parser to [latest 10.2-GA
branch](f161f95883)


2024-07-29 17:27:38 -07:00
mcollinswisc
8417c325ec
Keep QDQ nodes w/ nonpositive scale around MaxPool (#21182)
### Description
This change adds a check for whether the scale in the QuantizeLinear (or
DequantizeLinear) is a positive scalar, and a new selector to disallow
removing the QDQ around MaxPool if it is not.

### Motivation and Context
Currently, the DropQDQNodesRules optimization removes QuantizeLinear and
DequantizeLinear nodes from DequantizeLinear ∘ MaxPool ∘ QuantizeLinear.
However, if the x_scale/y_scale values are non-positive, the
(de-)quantization changes the ordering of the elements in the input
value, so this optimization changes the results; the sketch below
illustrates this.


https://github.com/microsoft/onnxruntime/issues/21176
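A minimal numpy sketch (illustrative only, not the ORT selector code) of why a negative scale breaks the optimization: quantizing with a negative scale reverses element ordering, so a max over the quantized integers picks a different element than a max over the real values.

```python
import numpy as np

def quantize(x, scale, zero_point):
    # QuantizeLinear: q = saturate(round(x / scale) + zero_point)
    return np.clip(np.rint(x / scale) + zero_point, -128, 127).astype(np.int8)

x = np.array([1.0, 2.0], dtype=np.float32)
scale, zero_point = -0.1, 0

q = quantize(x, scale, zero_point)   # [-10, -20]: ordering is reversed
print(q.max() * scale)               # max over quantized data -> -10 -> 1.0
print(x.max())                       # true max is 2.0, so results differ
```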

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-07-30 09:06:51 +10:00
Sophie Schoenmeyer
d98581495f
Update labeling bot (#21548)
The current labeling bot over-applies many of the labels (e.g., ep:CUDA and
platform:windows) and is missing labels for some of the APIs and EPs.

We are working on migrating this workflow to GitHub policies, but would like
to use this fix in the meantime to avoid causing any issues with ORT 1.19.

2024-07-29 16:06:03 -07:00
Adam Reeve
7543dd040b
Propagate NaNs in the CPU min and max operators (#21492)
### Description

Propagates NaN values in the min and max operators so that min or max
with a NaN in either input always produces NaN.

### Motivation and Context

Fixes #21455
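A quick numpy illustration of the intended semantics (numpy's minimum/maximum already propagate NaN this way; this is not the ORT kernel code):

```python
import numpy as np

a = np.array([1.0, np.nan, 3.0], dtype=np.float32)
b = np.array([2.0, 2.0, np.nan], dtype=np.float32)

# A NaN in either input now always produces NaN in the output.
print(np.minimum(a, b))  # [ 1. nan nan]
print(np.maximum(a, b))  # [ 2. nan nan]
```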
2024-07-30 08:50:13 +10:00
Preetha Veeramalai
c39f1c4fd8
ORT- OVEP 1.19 PR-follow up (#21546)
### Description
Follow-up PR for bug fixes on 1.19


### Motivation and Context

- Handles 1.19 Dockerfile fixes.
- Sets the default file naming of the epctx ONNX model to use the _ctx.onnx
suffix.
- Creates epctx model directories if they don't exist.

---------

Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
2024-07-29 14:12:36 -07:00
Yulong Wang
b03c9496aa
[js/web] allow load WebAssembly binary from buffer (#21534)
### Description

This PR adds a new option `ort.env.wasm.wasmBinary`, which allows users to
set a buffer containing preloaded .wasm file content.

This PR should resolve the problem from the latest discussion in #20876.
2024-07-29 13:39:38 -07:00
Xu Xing
0d7cf301a1
[js/webgpu] Add activation Tanh (#21540)
Bug: https://github.com/microsoft/onnxruntime/issues/21467

2024-07-29 11:05:34 -07:00
Jian Chen
79537d0523
Remove tools/ci_build/github/android/run_nnapi_code_coverage.sh (#21371)
### Description
Remove tools/ci_build/github/android/run_nnapi_code_coverage.sh

### Motivation and Context
This file is no longer needed
2024-07-29 10:00:52 -07:00
Jian Chen
bc3713206d
Update QNN pipeline pool (#21482)
### Description
Update QNN pipeline pool 



### Motivation and Context
Ensure all our pipelines use the latest NDK version.
2024-07-29 10:00:21 -07:00
Yi Zhang
05cef469e8
Move on-device training packages publish step (#21539)
### Description
Since the on-device training CPU packaging is now a separate pipeline, its
NuGet package publishing step must be moved as well.

### Motivation and Context
Fixes the exception in the NuGet packaging publishing pipeline caused by
#21485
2024-07-29 09:59:46 -07:00
mingyueliuh
d8888136e3
Add support tensor element type for register custom op shape infer function (#21387)
### Description
Functionality extension for the SetOutputShape method in custom op shape inference.


### Motivation and Context
- **SetOutputShape** interface enhancement: the shape inference function needs to set both the tensor type and the shape. Add a parameter **type** to allow users to specify the tensor type, with **ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT** as the default value to ensure compatibility.

Co-authored-by: mingyue <mingyue@amd.com>
2024-07-29 09:45:52 -07:00
Wanming Lin
94eb70d983
[WebNN EP] Add labels for all WebNN operators (#21516)
In order to provide more diagnosable error messages for developers.

Spec change: https://github.com/webmachinelearning/webnn/pull/742
2024-07-29 08:50:14 -07:00
Xu Xing
5bc12bf209
[js/webgpu] Add activation for conv3d naive (#21466)
2024-07-29 08:47:41 -07:00
Yulong Wang
dbff0cd098
[js/node] enable float16 support for Node.js binding (#20581)
### Description
Enable float16 support for the Node.js binding.

Data of a float16 tensor uses `Uint16Array`.
2024-07-28 13:03:17 -07:00
liqun Fu
a4d3a1ce0c
pick changes from https://github.com/onnx/onnx/pull/6195 to fix heap-buffer-overflow in onnx::convPoolShapeInference (#21507)
### Description
onnx 1.16.2 is not available before the ORT 1.19.0 code freeze, so pick up
the needed change as a patch.
2024-07-27 15:58:36 -07:00
Jian Chen
7e23212de9
Delete tools/ci_build/github/azure-pipelines/win-gpu-ci-pipeline.yml (#21529)
### Description
Delete tools/ci_build/github/azure-pipelines/win-gpu-ci-pipeline.yml


### Motivation and Context
This CI pipeline has been divided into 4 different pipelines.
2024-07-27 15:58:12 -07:00
Ranjit Ranjan
82b2955268
[AIX]test failure fix using gtest-1.15.0 for AIX (#21497)
### Description
The local CI setup for AIX reported test failures after the gtest 1.15.0
upgrade.

### Motivation and Context
The following test failures were observed after the gtest upgrade.

The following tests FAILED:
	  1 - onnxruntime_test_all (ILLEGAL)
	  7 - onnxruntime_logging_apis_test (Subprocess aborted)

To fix this, I am enabling pthread support in gtest; it was disabled with
the previous version of gtest for some reason.
With pthread support enabled, the above tests pass with gtest 1.15.0.
2024-07-27 11:17:22 -07:00
jingyanwangms
48fb8a7e56
Security fuzz address sanitizer fix Bug #2 and #3 (#21528)
### Description
Security fuzz testing with AddressSanitizer found several bugs.
2024-07-27 11:10:52 -07:00
dependabot[bot]
1ce160883f
Bump Sixlabors.ImageSharp from 2.1.8 to 2.1.9 in /csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample (#21444)
Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.8 to 2.1.9.
Release notes for v2.1.9, sourced from Sixlabors.ImageSharp's releases
(https://github.com/SixLabors/ImageSharp/releases):

- [2.1] Fix overflow in MemoryAllocator.Create(options) by @antonfirsov in SixLabors/ImageSharp#2732
- Backport GIF LZW fix to 2.1 by @antonfirsov in SixLabors/ImageSharp#2756
- Backport 2759 to 2.1.x by @antonfirsov in SixLabors/ImageSharp#2770

Full changelog: https://github.com/SixLabors/ImageSharp/compare/v2.1.8...v2.1.9

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-07-26 22:31:16 -07:00
maggie1059
10b4a3b90b
Fix conda failure for onnxruntime-directml (#21526)
The change in #21005 works for directly building wheels with `build.py`,
but ort-nightly-directml wheels, as well as the 1.18.1 release of the
onnxruntime-directml python wheel, still do not work with conda since
they're built from the `py-win-gpu.yml` pipeline, which uses
`install_third_party_deps.ps1` to set compile flags.
2024-07-26 22:26:38 -07:00
Yueqing Zhang
d01fc75ef1
[VitisAI] support vaip create ep context nodes & bug fix (#21506)
### Description
1. We decided to move the context node creation back to our own repo because it is more flexible to modify.
2. We found a bug related to the context node: it would change the inference order, so we fixed it in this PR as well.


### Motivation and Context
This is crucial for the Microsoft release next month.

---------

Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
2024-07-26 22:15:57 -07:00
zz002
690d745cbf
[VitisAI] 1. KernelDef supports StartVersion and EndVersion (#21519)
### Description

[VitisAI] 1. KernelDef supports StartVersion and EndVersion
2. CapabilityOps checks domain


Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
2024-07-26 20:28:55 -07:00
Scott McKay
5af423c7c0
Set version and other info in the C# dll (#21517)
### Description
Set version and other info in the Microsoft.ML.OnnxRuntime C# dll by
setting GenerateAssemblyInfo to true and passing in the ORT version in the
CI.

Minor re-org of the order of properties so related things are grouped a
little better.

### Motivation and Context
#21475
2024-07-27 13:22:57 +10:00
Tianlei Wu
64819f6f8c
Update benchmark_mha.py to compare with PyTorch SDPA (#21449)
### Description
* Update benchmark_mha.py to compare with PyTorch SDPA api.
* Write results to csv file.
* Use sdpa_kernel cuda provider option instead of environment variables
for better control.
* Add arguments (`--use_gpu`, `--causal`, etc.) to allow testing different
scenarios.
* Update benchmark_mha.sh to add cpu benchmarks

For the Q,K,V format, torch uses BNSH while ort uses BSNH, so the comparison
is not apples-to-apples. However, a large latency difference could still be
a warning sign. A rough sketch of the PyTorch side follows.
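The sketch below assumes PyTorch 2.x with a CUDA device; tensor shapes are BNSH, and the `sdp_kernel` context manager shown here is one way to pin a specific kernel (matching the `torch:flash` rows in the tables):

```python
import torch
import torch.nn.functional as F

batch, num_heads, seq_len, head_size = 4, 32, 2048, 128  # one row of the table
q = torch.rand(batch, num_heads, seq_len, head_size,
               device="cuda", dtype=torch.float16)
k, v = torch.rand_like(q), torch.rand_like(q)

# Restrict SDPA to the flash-attention backend (torch:flash in the table).
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False,
                                    enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v, is_causal=False)
```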

#### Example GPU results

Example results on A100-SXM4-80GB with settings (use_gpu=TRUE,
enable_cuda_graph=FALSE, causal=FALSE, past_sequence_length=0,
intra_op_num_threads=0) in Azure Linux. ORT: build from source with CUDA
12.5; PyTorch 2.3.1 for cuda 12.1.

format | batch_size | sequence_length | num_heads | head_size | latency (s) | tflops | kernel
-- | -- | -- | -- | -- | -- | -- | --
Q,KV | 4 | 2048 | 32 | 128 | 0.0015 | 179.5 | ort:flash
Q,KV | 4 | 2048 | 32 | 128 | 0.0015 | 179.0 | ort:default
Q,K,V | 4 | 2048 | 32 | 128 | 0.0016 | 170.0 | ort:default
Q,K,V | 4 | 2048 | 32 | 128 | 0.0016 | 169.5 | ort:flash
QKV | 4 | 2048 | 32 | 128 | 0.0016 | 168.5 | ort:default
QKV | 4 | 2048 | 32 | 128 | 0.0016 | 167.4 | ort:flash
Q,K,V | 4 | 2048 | 32 | 128 | 0.0017 | 159.4 | torch:default
Q,K,V | 4 | 2048 | 32 | 128 | 0.0018 | 155.0 | torch:flash
Q,KV | 4 | 2048 | 32 | 128 | 0.0030 | 92.7 | ort:efficient
Q,K,V | 4 | 2048 | 32 | 128 | 0.0030 | 90.9 | ort:efficient
QKV | 4 | 2048 | 32 | 128 | 0.0031 | 89.9 | ort:efficient
Q,K,V | 4 | 2048 | 32 | 128 | 0.0031 | 89.0 | torch:efficient
Q,K,V | 4 | 2048 | 32 | 128 | 0.0054 | 51.3 | torch:math
Q,KV | 4 | 4096 | 32 | 128 | 0.0058 | 191.0 | ort:default
Q,KV | 4 | 4096 | 32 | 128 | 0.0058 | 190.6 | ort:flash
Q,K,V | 4 | 4096 | 32 | 128 | 0.0059 | 187.8 | ort:default
Q,K,V | 4 | 4096 | 32 | 128 | 0.0059 | 186.7 | ort:flash
QKV | 4 | 4096 | 32 | 128 | 0.0059 | 185.9 | ort:flash
QKV | 4 | 4096 | 32 | 128 | 0.0059 | 185.8 | ort:default
Q,K,V | 4 | 4096 | 32 | 128 | 0.0067 | 163.4 | torch:default
Q,K,V | 4 | 4096 | 32 | 128 | 0.0070 | 157.2 | torch:flash
Q,KV | 4 | 4096 | 32 | 128 | 0.0113 | 97.6 | ort:efficient
Q,K,V | 4 | 4096 | 32 | 128 | 0.0114 | 96.4 | ort:efficient
QKV | 4 | 4096 | 32 | 128 | 0.0114 | 96.2 | ort:efficient
Q,K,V | 4 | 4096 | 32 | 128 | 0.0127 | 86.3 | torch:efficient
Q,KV | 8 | 2048 | 32 | 128 | 0.0031 | 177.8 | ort:flash
Q,KV | 8 | 2048 | 32 | 128 | 0.0031 | 177.7 | ort:default
Q,K,V | 8 | 2048 | 32 | 128 | 0.0032 | 170.8 | ort:default
Q,K,V | 8 | 2048 | 32 | 128 | 0.0032 | 170.3 | ort:flash
QKV | 8 | 2048 | 32 | 128 | 0.0032 | 169.2 | ort:default
QKV | 8 | 2048 | 32 | 128 | 0.0033 | 169.0 | ort:flash
Q,K,V | 8 | 2048 | 32 | 128 | 0.0034 | 161.9 | torch:default
Q,K,V | 8 | 2048 | 32 | 128 | 0.0036 | 152.9 | torch:flash
Q,KV | 8 | 2048 | 32 | 128 | 0.0059 | 93.5 | ort:efficient
Q,K,V | 8 | 2048 | 32 | 128 | 0.0060 | 91.3 | ort:efficient
QKV | 8 | 2048 | 32 | 128 | 0.0060 | 91.0 | ort:efficient
Q,K,V | 8 | 2048 | 32 | 128 | 0.0064 | 86.0 | torch:efficient
Q,KV | 8 | 4096 | 32 | 128 | 0.0115 | 190.8 | ort:flash
Q,KV | 8 | 4096 | 32 | 128 | 0.0115 | 190.7 | ort:default
Q,K,V | 8 | 4096 | 32 | 128 | 0.0118 | 187.1 | ort:default
Q,K,V | 8 | 4096 | 32 | 128 | 0.0118 | 187.0 | ort:flash
QKV | 8 | 4096 | 32 | 128 | 0.0118 | 185.6 | ort:default
QKV | 8 | 4096 | 32 | 128 | 0.0118 | 185.6 | ort:flash
Q,K,V | 8 | 4096 | 32 | 128 | 0.0139 | 158.7 | torch:default
Q,K,V | 8 | 4096 | 32 | 128 | 0.0139 | 158.3 | torch:flash
Q,KV | 8 | 4096 | 32 | 128 | 0.0225 | 97.7 | ort:efficient
Q,K,V | 8 | 4096 | 32 | 128 | 0.0227 | 96.8 | ort:efficient
QKV | 8 | 4096 | 32 | 128 | 0.0228 | 96.3 | ort:efficient
Q,K,V | 8 | 4096 | 32 | 128 | 0.0260 | 84.5 | torch:efficient

#### Example CPU results

Dell XPS 8960 with i9-13900 CPU (use_gpu=FALSE, causal=FALSE,
past_sequence_length=0) in Windows. ORT: build from source with CUDA
12.5; PyTorch 2.3.1 for cuda 12.1.

format | causal | batch_size | seq_len | num_heads | head_size | threads | latency (s) | kernel
-- | -- | -- | -- | -- | -- | -- | -- | --
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 8 | 0.0005 | ort:flash
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 0 | 0.0009 | ort:flash
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 0 | 0.0009 | ort:math
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 4 | 0.0009 | ort:flash
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 2 | 0.0014 | ort:flash
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 1 | 0.0025 | ort:flash
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 2 | 0.0045 | torch:default
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 24 | 0.0046 | torch:default
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 8 | 0.0046 | torch:default
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 4 | 0.0046 | torch:default
Q,K,V | FALSE | 1 | 128 | 32 | 128 | 1 | 0.0047 | torch:default
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 0 | 0.0019 | ort:flash
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 8 | 0.0019 | ort:flash
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 0 | 0.0022 | ort:math
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 4 | 0.0030 | ort:flash
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 2 | 0.0047 | ort:flash
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 1 | 0.0086 | ort:flash
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 2 | 0.0161 | torch:default
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 4 | 0.0162 | torch:default
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 8 | 0.0162 | torch:default
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 24 | 0.0165 | torch:default
Q,K,V | FALSE | 1 | 256 | 32 | 128 | 1 | 0.0166 | torch:default
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 8 | 0.0077 | ort:flash
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 0 | 0.0091 | ort:flash
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 0 | 0.0099 | ort:math
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 4 | 0.0103 | ort:flash
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 2 | 0.0177 | ort:flash
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 1 | 0.0328 | ort:flash
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 2 | 0.0624 | torch:default
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 4 | 0.0624 | torch:default
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 8 | 0.0625 | torch:default
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 24 | 0.0626 | torch:default
Q,K,V | FALSE | 1 | 512 | 32 | 128 | 1 | 0.0640 | torch:default
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 8 | 0.0286 | ort:flash
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 0 | 0.0317 | ort:flash
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 4 | 0.0367 | ort:flash
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 0 | 0.0391 | ort:math
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 2 | 0.0656 | ort:flash
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 1 | 0.1235 | ort:flash
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 24 | 0.2482 | torch:default
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 2 | 0.2483 | torch:default
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 4 | 0.2483 | torch:default
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 8 | 0.2486 | torch:default
Q,K,V | FALSE | 1 | 1024 | 32 | 128 | 1 | 0.2538 | torch:default
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 0 | 0.1038 | ort:flash
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 8 | 0.1050 | ort:flash
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 0 | 0.1368 | ort:math
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 4 | 0.1535 | ort:flash
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 2 | 0.2461 | ort:flash
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 1 | 0.4724 | ort:flash
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 8 | 0.9835 | torch:default
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 4 | 0.9841 | torch:default
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 24 | 0.9841 | torch:default
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 2 | 0.9873 | torch:default
Q,K,V | FALSE | 1 | 2048 | 32 | 128 | 1 | 0.9985 | torch:default


### Motivation and Context
To compare with PyTorch SDPA on CPU and CUDA latency.
2024-07-26 18:45:14 -07:00
Hector Li
fb61e14153
Add QNN EP option context_node_name_prefix to set EPContext node name prefix (#21236)
### Description
Add QNN EP option context_node_name_prefix to set EPContext node name prefix

### Motivation and Context
To work around the QNN context PD memory limit, users need to split the model into pieces and generate a QNN context model for each piece separately. The EPContext nodes generated in separate graphs can end up with the same node name, which causes issues when gluing those EPContext nodes together into a single model.
To avoid this, users can set context_node_name_prefix for each split piece to make the node names unique; a sketch follows.
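A hedged sketch of how this might look from Python when generating a context model for one split piece; the provider-option plumbing and the `ep.context_enable` flag are assumptions, and only `context_node_name_prefix` comes from this PR:

```python
import onnxruntime as ort

so = ort.SessionOptions()
# Assumption: session config flag that turns on EPContext model generation.
so.add_session_config_entry("ep.context_enable", "1")

sess = ort.InferenceSession(
    "model_part1.onnx",  # hypothetical split piece of the large model
    sess_options=so,
    providers=[("QNNExecutionProvider", {
        "backend_path": "QnnHtp.dll",          # assumed backend library
        "context_node_name_prefix": "part1_",  # unique prefix per split piece
    })],
)
```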
2024-07-26 16:56:44 -07:00
Jian Chen
7db7c4e5c8
Separating all GPU stages into different Pipelines (#21521)
### Description
Separating all GPU stages into different Pipelines
2024-07-26 14:54:45 -07:00
Justin Chu
bbbaef3fa6
Update text formatting in generate_cgmanifest.py (#21489)
In the only place I fixed manually, I forgot a format string.
2024-07-26 08:46:54 -07:00
Prathik Rao
278f0f5cd2
disables qnn in ort training cpu pipeline (#21510)
### Description

`enable_windows_arm64_qnn` and `enable_windows_x64_qnn` are true by
default but unnecessary for training. This change explicitly sets these
parameters to false for the training pipeline.

### Motivation and Context

ORT 1.19 Release Preparation
2024-07-26 17:23:35 +08:00
Wanming Lin
b6b29309a5
[WebNN EP] Update argMax/argMin to adapt to latest spec (#21452)
The WebNN spec recently changed the definition of argMax/argMin:
- Removed the selectLastIndex option, letting backends decide whether to
select the last index.
- Moved the axes option to an axis input.
2024-07-25 17:07:01 -07:00
aamajumder
166809425e
[DML EP] Register ReduceMin-20 (#20477)
### Description
This PR registers the ReduceMin-20 operator to the DML EP.


2024-07-25 17:06:30 -07:00
Scott McKay
e5302b23c4
Fix SkipLayerNormFusion incorrectly setting modified every time it runs (#21502)
### Description
The current behavior forces all L2 optimizers to loop until they hit the max
number of iterations.

Only update `modified` if the graph was actually modified (see the sketch
below).

### Motivation and Context
Fix unnecessary loops of L2 optimizers during model loading.
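A generic sketch (not ORT's actual C++ transformer code) of why the stale flag matters: the optimizer loop only stops early when no pass reports a change, so a pass that always reports modified=True forces the loop to run until max_num_iterations.

```python
def apply_until_fixed_point(passes, graph, max_num_iterations):
    # Each pass must return True only when it actually changed the graph;
    # otherwise this loop never exits early.
    for _ in range(max_num_iterations):
        modified = False
        for apply_pass in passes:
            modified |= apply_pass(graph)
        if not modified:
            break
    return graph
```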
2024-07-26 10:00:28 +10:00
Justin Chu
c464ab3aca
Allow cpplint to always be green (#21491)
Allow cpplint to always be green since it is optional. Also changed the
workflow name to reflect that.
2024-07-25 15:57:30 -07:00
Scott McKay
b0e1f7f798
CoreML: Aggregated changes to add all required ops for priority model (#21472)
### Description
Add these changes to one PR to simplify checkin
- Add Concat (#21423)
- Add DepthToSpace (#21426)
- Add LeakyRelu (#21453)
- Add test scripts (#21427)
- Add ability to set coreml flags from python (#21434)


Other changes
- Updated partitioning utils to support dropping constant initializers
from a ComputeCapability's inputs.
- Noticed that the list of inputs to the CoreML model was unexpectedly
long because of this.
- We copy constant initializers into the CoreML model, so we don't need
the originals; if they remain as inputs, ORT can't free them as they
appear to be in use.

2024-07-26 08:29:33 +10:00
Scott McKay
3cdf4b917b
Fix Android CI Pipeline code coverage failure (#21504)
### Description
Current failure is due to a version mismatch.

Use llvm-cov from the Android NDK instead of the system gcov so that the
version is correct.

Also comment out publishing to the Azure dashboard to simplify the
setup. The CI prints out the stats for review by developers.

### Motivation and Context
Fix CI pipeline
2024-07-26 07:36:23 +10:00
Hector Li
c23517859e
Qnn batchnorm support input with rank 2 (#21469)
### Description
QNN BatchNorm supports input with rank 2.
Update the quantization script to quantize the BatchNorm bias using int32
(see the sketch below).
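A small numpy sketch of the usual ONNX convention for int32 bias quantization; the assumption that the script follows the standard bias_scale = input_scale * weight_scale rule with zero_point 0 is mine:

```python
import numpy as np

input_scale, weight_scale = 0.02, 0.005
bias = np.array([0.15, -0.3], dtype=np.float32)

# int32 leaves plenty of headroom for the accumulated products, unlike int8.
bias_scale = input_scale * weight_scale            # 1e-4
bias_q = np.rint(bias / bias_scale).astype(np.int32)
print(bias_q)  # [ 1500 -3000]
```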

---------

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2024-07-25 11:44:10 -07:00
Changming Sun
4167b68abf
Split ondevice training cpu packaging pipeline to a separated pipeline (#21485)
### Description
Right now our "Zip-Nuget-Java-Nodejs Packaging Pipeline" is too big.
The on-device training part is independent of the others, so it can be
split out. Then our NPM packaging pipeline will not depend on this
training stuff.

### Motivation and Context
Similar to #21235 

Also, this PR fixed a problem: the "NuGet_Test_Linux_Training_CPU" job
downloads artifacts from "onnxruntime-linux-x64" to get the custom-op
shared libs, but the job forgot to declare that it depends on
"Linux_C_API_Packaging_CPU_x64", which produces that artifact. Such
problems can be hard to find when a pipeline grows big.
2024-07-25 10:58:34 -07:00
Yifan Li
ebcb7075eb
Set CUDA12 as default in GPU packages (#21438)
### Description
* Swap CUDA versions 11.8/12.2 in the GPU CIs
* Set CUDA 12 as the default version in the YAMLs for publishing
NuGet/Python/Java GPU packages
* Suppress warnings-as-errors for flash_api.cc during the ORT Windows build
2024-07-25 10:17:16 -07:00
Sophie Schoenmeyer
f3a6e58ae3
Update 05-performance.yml issue template to auto apply label (#21486)
Updating Performance issue template so "performance" label is
automatically applied

2024-07-25 09:52:37 -07:00
Yueqing Zhang
6787cf18a5
[VitisAI] use binary mode for context ep (#21474)
### Description
We found the text format could cause errors.


### Motivation and Context
Because the OS could change the string, we decided to save it as a
binary file.
2024-07-25 07:18:55 -07:00
Preetha Veeramalai
ca47f0fdd3
OVEP - PR 1.19 (#21443)
### Description
Add OVEP features for 1.19.

The PR includes:
- Added support for EpCtx with ORT session options for optimized
performance.
- Bug fixes.
- Support for OV 2024.3.

---------

Co-authored-by: ubuntu <ubuntu@ubuntu-mtlp-118727.iind.intel.com>
Co-authored-by: vthaniel <vishnudas.thaniel.s@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: Maheshkar <ankit.maheshkar@intel.com>
2024-07-24 23:45:31 -07:00
Justin Chu
ae3ec2e9ac
Ignore ruff rule N813 (#21477)
Allow importing camelcase names in lowercase
2024-07-24 17:48:22 -07:00
pengwa
08001d18ac
Fix security issue #22016 #22017 #22018 (#21333)
2024-07-25 08:25:22 +08:00