onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-11 17:48:34 +00:00

Author	SHA1	Message	Date
dependabot[bot]	ce70a30b94	Bump transformers from 4.35.2 to 4.36.0 in /onnxruntime/python/tools/transformers/models/stable_diffusion (#18896 ) Bumps [transformers](https://github.com/huggingface/transformers) from 4.35.2 to 4.36.0. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/huggingface/transformers/releases">transformers's releases</a>.</em></p> <blockquote> <h2>v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa wide-spread support</h2> <h2>New model additions</h2> <h3>Mixtral</h3> <p>Mixtral is the new open-source model from Mistral AI announced in the blog post <a href="https://mistral.ai/news/mixtral-of-experts/">Mixtral of Experts</a>. The model has been proven to have comparable capabilities to Chat-GPT according to the benchmark results shared in the release blog post.</p> <p>The architecture is a sparse Mixture of Experts with a Top-2 routing strategy, similar to the <code>NllbMoe</code> architecture in transformers. You can use it through the <code>AutoModelForCausalLM</code> interface:</p> <pre lang="py"><code>>>> import torch >>> from transformers import AutoModelForCausalLM, AutoTokenizer <p>>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B", torch_dtype=torch.float16, device_map="auto") >>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-8x7B")</p> <p>>>> prompt = "My favourite condiment is"</p> <p>>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(device) >>> model.to(device)</p> <p>>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True) >>> tokenizer.batch_decode(generated_ids)[0] </code></pre></p> <p>The model is compatible with existing optimisation tools such as Flash Attention 2, <code>bitsandbytes</code> and the PEFT library. 
The checkpoints are released under the <a href="https://huggingface.co/mistralai"><code>mistralai</code></a> organisation on the Hugging Face Hub.</p> <h3>Llava / BakLlava</h3> <p>Llava is an open-source chatbot trained by fine-tuning LLaMA/Vicuna on GPT-generated multimodal instruction-following data. It is an auto-regressive language model, based on the transformer architecture. In other words, it is a multi-modal version of LLMs fine-tuned for chat / instructions.</p> <p>The Llava model was proposed in <a href="https://arxiv.org/pdf/2310.03744">Improved Baselines with Visual Instruction Tuning</a> by Haotian Liu, Chunyuan Li, Yuheng Li and Yong Jae Lee.</p> <ul> <li>[<code>Llava</code>] Add Llava to transformers by <a href="https://github.com/younesbelkada"><code>@younesbelkada</code></a> in <a href="https://redirect.github.com/huggingface/transformers/issues/27662">#27662</a></li> <li>[LLaVa] Some improvements by <a href="https://github.com/NielsRogge"><code>@NielsRogge</code></a> in <a href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a></li> </ul> <p>The integration also includes <a href="https://github.com/SkunkworksAI/BakLLaVA"><code>BakLlava</code></a>, which is a Llava model trained with a Mistral backbone.</p> <p>The model is compatible with the <code>"image-to-text"</code> pipeline:</p> <pre lang="py"><code>from transformers import pipeline from PIL import Image import requests <p>model_id = "llava-hf/llava-1.5-7b-hf" </code></pre></p> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`14666775a2`"><code>1466677</code></a> Release: v4.36.0</li> <li><a href="`accccdd008`"><code>accccdd</code></a> [<code>Add Mixtral</code>] Adds support for the Mixtral MoE (<a href="https://redirect.github.com/huggingface/transformers/issues/27942">#27942</a>)</li> <li><a href="`0676d992a5`"><code>0676d99</code></a> [<code>from_pretrained</code>] Make from_pretrained fast again (<a href="https://redirect.github.com/huggingface/transformers/issues/27709">#27709</a>)</li> <li><a href="`9f18cc6df0`"><code>9f18cc6</code></a> Fix SDPA dispatch & make SDPA CI compatible with torch<2.1.1 (<a href="https://redirect.github.com/huggingface/transformers/issues/27940">#27940</a>)</li> <li><a href="`7ea21f1f03`"><code>7ea21f1</code></a> [LLaVa] Some improvements (<a href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a>)</li> <li><a href="`5e620a92cf`"><code>5e620a9</code></a> Fix <code>SeamlessM4Tv2ModelIntegrationTest</code> (<a href="https://redirect.github.com/huggingface/transformers/issues/27911">#27911</a>)</li> <li><a href="`e96c1de191`"><code>e96c1de</code></a> Skip <code>UnivNetModelTest::test_multi_gpu_data_parallel_forward</code> (<a href="https://redirect.github.com/huggingface/transformers/issues/27912">#27912</a>)</li> <li><a href="`8d8970efdd`"><code>8d8970e</code></a> [BEiT] Fix test (<a href="https://redirect.github.com/huggingface/transformers/issues/27934">#27934</a>)</li> <li><a href="`235be08569`"><code>235be08</code></a> [DETA] fix backbone freeze/unfreeze function (<a href="https://redirect.github.com/huggingface/transformers/issues/27843">#27843</a>)</li> <li><a href="`df5c5c62ae`"><code>df5c5c6</code></a> Fix typo (<a href="https://redirect.github.com/huggingface/transformers/issues/27918">#27918</a>)</li> <li>Additional commits viewable in <a href="https://github.com/huggingface/transformers/compare/v4.35.2...v4.36.0">compare view</a></li> 
</ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=transformers&package-manager=pip&previous-version=4.35.2&new-version=4.36.0)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. [//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- <details> <summary>Dependabot commands and options</summary> <br /> You can trigger Dependabot actions by commenting on this PR: - `@dependabot rebase` will rebase this PR - `@dependabot recreate` will recreate this PR, overwriting any edits that have been made to it - `@dependabot merge` will merge this PR after your CI passes on it - `@dependabot squash and merge` will squash and merge this PR after your CI passes on it - `@dependabot cancel merge` will cancel a previously requested merge and block automerging - `@dependabot reopen` will reopen this PR if it is closed - `@dependabot close` will close this PR and stop Dependabot recreating it. 
You can achieve the same result by closing it manually - `@dependabot show <dependency name> ignore conditions` will show all of the ignore conditions of the specified dependency - `@dependabot ignore this major version` will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this minor version` will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself) - `@dependabot ignore this dependency` will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself) You can disable automated security fix PRs for this repo from the [Security Alerts page](https://github.com/microsoft/onnxruntime/network/alerts). </details> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-12-20 22:09:02 -08:00
dependabot[bot]	379c7c43eb	Bump actions/setup-java from 3 to 4 (#18686 ) Bumps [actions/setup-java](https://github.com/actions/setup-java) from 3 to 4. <details> <summary>Release notes</summary> <p><em>Sourced from <a href="https://github.com/actions/setup-java/releases">actions/setup-java's releases</a>.</em></p> <blockquote> <h2>v4.0.0</h2> <h2>What's Changed</h2> <p>In the scope of this release, the version of the Node.js runtime was updated to 20. The majority of dependencies were updated to the latest versions. From now on, the code for the setup-java will run on Node.js 20 instead of Node.js 16.</p> <h2>Breaking changes</h2> <ul> <li>Update Node.js runtime to version 20 by <a href="https://github.com/aparnajyothi-y"><code>@aparnajyothi-y</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/558">actions/setup-java#558</a></li> </ul> <h2>Non-breaking changes</h2> <ul> <li>Adding support for microsoft openjdk 21.0.0 by <a href="https://github.com/ralfstuckert"><code>@ralfstuckert</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/546">actions/setup-java#546</a></li> <li>Update <code>@actions/cache</code> dependency and documentation by <a href="https://github.com/IvanZosimov"><code>@IvanZosimov</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/549">actions/setup-java#549</a></li> <li>Implementation of the cache-dependency-path option to control caching dependency by <a href="https://github.com/itchyny"><code>@itchyny</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/499">actions/setup-java#499</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/ralfstuckert"><code>@ralfstuckert</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/546">actions/setup-java#546</a></li> <li><a href="https://github.com/itchyny"><code>@itchyny</code></a> made their first contribution in <a 
href="https://redirect.github.com/actions/setup-java/pull/499">actions/setup-java#499</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/setup-java/compare/v3...v4.0.0">https://github.com/actions/setup-java/compare/v3...v4.0.0</a></p> <h2>v3.13.0</h2> <h2>What's changed</h2> <p>In the scope of this release, support for Dragonwell JDK was added by <a href="https://github.com/Accelerator1996"><code>@Accelerator1996</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/532">actions/setup-java#532</a></p> <pre lang="yaml"><code>steps: - name: Checkout uses: actions/checkout@v3 - name: Setup-java uses: actions/setup-java@v3 with: distribution: 'dragonwell' java-version: '17' </code></pre> <p>Several inaccuracies were also fixed:</p> <ul> <li>Fix XML namespaces wrongly using https by <a href="https://github.com/gnodet"><code>@gnodet</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/503">actions/setup-java#503</a></li> <li>Fix typo and remove unintentional(?) 
word by <a href="https://github.com/CyberFlameGO"><code>@CyberFlameGO</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/518">actions/setup-java#518</a></li> <li>Fix usage link within the README.md file by <a href="https://github.com/dassiorleando"><code>@dassiorleando</code></a> in <a href="https://redirect.github.com/actions/setup-java/pull/525">actions/setup-java#525</a></li> </ul> <h2>New Contributors</h2> <ul> <li><a href="https://github.com/CyberFlameGO"><code>@CyberFlameGO</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/518">actions/setup-java#518</a></li> <li><a href="https://github.com/dassiorleando"><code>@dassiorleando</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/525">actions/setup-java#525</a></li> <li><a href="https://github.com/gnodet"><code>@gnodet</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/503">actions/setup-java#503</a></li> <li><a href="https://github.com/Accelerator1996"><code>@Accelerator1996</code></a> made their first contribution in <a href="https://redirect.github.com/actions/setup-java/pull/532">actions/setup-java#532</a></li> </ul> <p><strong>Full Changelog</strong>: <a href="https://github.com/actions/setup-java/compare/v3...v3.13.0">https://github.com/actions/setup-java/compare/v3...v3.13.0</a></p> <h2>v3.12.0</h2> <!-- raw HTML omitted --> </blockquote> <p>... 
(truncated)</p> </details> <details> <summary>Commits</summary> <ul> <li><a href="`387ac29b30`"><code>387ac29</code></a> Upgrade Node to v20 (<a href="https://redirect.github.com/actions/setup-java/issues/558">#558</a>)</li> <li><a href="`9eda6b51cc`"><code>9eda6b5</code></a> feat: implement cache-dependency-path option to control caching dependency (#...</li> <li><a href="`78078da0cd`"><code>78078da</code></a> Update <code>@actions/cache</code> dependency and documentation (<a href="https://redirect.github.com/actions/setup-java/issues/549">#549</a>)</li> <li><a href="`5caaba646e`"><code>5caaba6</code></a> add support for microsoft openjdk 21.0.0 (<a href="https://redirect.github.com/actions/setup-java/issues/546">#546</a>)</li> <li>See full diff in <a href="https://github.com/actions/setup-java/compare/v3...v4">compare view</a></li> </ul> </details> <br /> [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=actions/setup-java&package-manager=github_actions&previous-version=3&new-version=4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores) Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting `@dependabot rebase`. 
[//]: # (dependabot-automerge-start) [//]: # (dependabot-automerge-end) --- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2023-12-20 22:08:33 -08:00
Kevin Chen	1c6cb5dfeb	Remove usage of TRT deprecated APIs (#18879 ) ### Description - Wrap usage of kENABLE_TACTIC_HEURISTIC in version-checking macros. - Use delete instead of the deprecated destroy() functions on TRT objects. ### Motivation and Context - Removes usages of deprecated TRT APIs. Signed-off-by: Kevin Chen <kevinch@nvidia.com>	2023-12-20 15:08:13 -08:00
Tianlei Wu	2d6e2e243d	update sdxl demo (#18889 ) ### Description (1) Support importing models from Olive. (2) Add the Torch backend engine (Eager and Compile modes) to the demo. (3) Use fp16 in most places. (4) Remove some old pipeline scripts that are no longer useful; they are replaced by the demo. (5) Remove old benchmark results that are out of date. (6) Add PIL image conversion to end-to-end latency (for a fair comparison with diffusers, since its default output type is PIL). (7) Remove some seldom-used options such as force-rebuild-engine, hf-token, refit, etc.	2023-12-20 14:46:22 -08:00
Yulong Wang	9a61388f0a	[js/web] revise backend registration (#18715 ) ### Description This PR revises the backend registration. The following describes the expected behavior after this change (changed behavior is in bold): - (ort.min.js - built without webgpu support) - loading: do not register 'webgpu' backend - creating session without EP list: use default EP list ['webnn', 'cpu', 'wasm'] - creating session with ['webgpu'] as EP list: should fail with backend not available - (ort.webgpu.min.js - built with webgpu support) - loading: always register 'webgpu' backend (previous behavior: only register 'webgpu' backend when `navigator.gpu` is available) - creating session without EP list: use default EP list ['webgpu', 'webnn', 'cpu', 'wasm'] - when WebGPU is available (win): use WebGPU backend - when WebGPU is unavailable (android): should fail backend init, and try to use the next backend in the list, 'webnn' (previous behavior: does not fail backend init, but fails in JSEP init, which was too late to switch to the next backend) - creating session with ['webgpu'] as EP list - when WebGPU is available (win): use WebGPU backend - when WebGPU is unavailable (android): **should fail backend init and, because no more EPs are listed, fail.** Related PRs: #18190 #18144	2023-12-20 14:45:55 -08:00
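The fallback order described in the commit above can be illustrated with a small simulation. This is only a sketch in Python, not the onnxruntime-web implementation: `pick_backend`, `registered`, and `can_init` are hypothetical names introduced here, and backend initialization is reduced to a boolean callback.

```python
def pick_backend(ep_list, registered, can_init):
    """Return the first backend in ep_list that is registered and initializes.

    ep_list    -- backend names in priority order (hypothetical input)
    registered -- set of backend names registered at load time (hypothetical)
    can_init   -- callback returning True if a backend initializes (hypothetical)
    """
    failures = []
    for name in ep_list:
        if name not in registered:
            # e.g. 'webgpu' requested from a build without WebGPU support
            failures.append(f"{name}: backend not available")
            continue
        if can_init(name):
            return name
        # Init failed early enough that the next backend can still be tried
        failures.append(f"{name}: backend init failed")
    raise RuntimeError("no available backend found: " + "; ".join(failures))


# WebGPU is registered but unavailable on the device: fall through to 'webnn'
# (assuming, for this example only, that 'webnn' initializes successfully).
all_backends = {"webgpu", "webnn", "cpu", "wasm"}
chosen = pick_backend(["webgpu", "webnn", "cpu", "wasm"], all_backends,
                      lambda name: name != "webgpu")
print(chosen)
```

With `['webgpu']` as the only entry and WebGPU unavailable, the same function raises, which matches the last case in the list above.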
Yifan Li	c0142c9108	[EP Perf] Fix model zoo url (#18808 ) ### Description The ONNX model zoo had a major update recently, and legacy models were relocated under /archive/.	2023-12-20 10:54:45 -08:00
Hector Li	8931854528	Move some QNN EP provider options to session options (#18877 ) Move QNN EP provider options to session options ### Description Session options are needed to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first. This is the first step for PR https://github.com/microsoft/onnxruntime/pull/18865	2023-12-20 00:13:38 -08:00
Ye Wang	02eb17655d	Fix a bug in 4bits quantizer script (#18878 )	2023-12-19 22:53:33 -08:00
Scott McKay	666fcbde4d	Add LeakyRelu to list of NNAPI operators (#18880 ) ### Description Add LeakyRelu to the list as support was added a while ago.	2023-12-20 14:44:31 +10:00
Changming Sun	535a2403dd	Update Nuget publishing jobs (#18851 ) ### Description 1. Add a CodeSign validation task before the binaries are published, to make sure all DLL files are signed. 2. Auto-trigger the CUDA 12 pipeline's publishing job.	2023-12-19 16:54:46 -08:00
Yulong Wang	ffa6602686	[js/node] support manually dispose session (#18655 ) ### Description Support manually disposing a session in onnxruntime-node. Feature request: #16796	2023-12-19 16:20:00 -08:00
satyajandhyala	98510fb8fb	[JS/WebGPU] fix an error in Clip (#18799 ) ### Description Check whether the min/max inputs are provided and use default values if not provided.	2023-12-19 13:51:01 -08:00
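The Clip behavior described in the commit above (optional min/max inputs with defaults) can be sketched as follows. This is an illustrative Python version, not the WebGPU kernel from the PR; the function name `clip` and the use of negative and positive infinity as the defaults are assumptions made for this example.

```python
import math


def clip(values, min_value=None, max_value=None):
    """Clamp each value to [min_value, max_value].

    A missing bound falls back to a default that leaves values unchanged
    on that side (assumption: -inf / +inf stand in for the defaults).
    """
    lo = -math.inf if min_value is None else min_value
    hi = math.inf if max_value is None else max_value
    return [min(max(v, lo), hi) for v in values]


print(clip([-5, 0.5, 9], 0, 1))        # both bounds provided
print(clip([-5, 0.5, 9], max_value=1))  # only max provided
print(clip([-5, 0.5, 9]))               # neither provided: values unchanged
```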
liqun Fu	32fcf73740	Implement dft(20) (#17821 ) ### Description DFT is updated in opset 20; implement it in ORT. ### Motivation and Context This is for the ORT 1.17.0 release. Fixes #17723 --------- Signed-off-by: Liqun Fu <liqfu@microsoft.com>	2023-12-19 10:42:54 -08:00
luoyu-intel	5f00bc9931	Integrate high-performance x64 gemm library to MLAS (#17669 ) ### Description Improve MLAS to support high-performance x64 INT4 kernels ### Motivation and Context 1. improve LLM inference performance on Intel CPUs. 2. support more 4bit quantization types: nf4, fp4 3. support dynamic block size: block size aligned with kernel's tiling size(e.g. 4 for VNNI kernel), per channel on N dimension 4. support most Intel ISAs: avx2, avx_vnni, avx512f, avx512_vnni, amx_bf16, amx_int8, avx512_fp16 5. support MatMulNBits' data format ### Tasks - [x] support block_size: 32, 128, -1(per channel) - [x] get weight pack size without memory allocation - [x] use ort's thread pool for parallelism - [x] support ISAs: avx2, avx512f, avx_vnni, avx512_vnni, amx_int8 ### Benchmark Ubuntu 20.22 + Intel(R) Xeon(R) Platinum 8480+ 56 cores Benchmark \| Time \| CPU \| Iterations -- \| -- \| -- \| -- Q4GEMM_Jblas/Q4G32SymInt8/M:1/N:4096/K:4096/Threads:56/real_time \| 47613 \| 47401 \| 12970 Q4GEMM_Jblas/Q4G32SymInt8/M:1024/N:4096/K:4096/Threads:56/real_time \| 6347792 \| 6317562 \| 109 Q4GEMM_Jblas/Q4G32SymInt8/M:2048/N:4096/K:4096/Threads:56/real_time \| 11814014 \| 11757847 \| 59 Q4GEMM_Jblas/Q4G128SymInt8/M:1/N:4096/K:4096/Threads:56/real_time \| 50222 \| 50031 \| 13759 Q4GEMM_Jblas/Q4G128SymInt8/M:1024/N:4096/K:4096/Threads:56/real_time \| 2038222 \| 2028743 \| 341 Q4GEMM_Jblas/Q4G128SymInt8/M:2048/N:4096/K:4096/Threads:56/real_time \| 3792832 \| 3774485 \| 191 Q4GEMM_Jblas/Q4GPerNSymInt8/M:1/N:4096/K:4096/Threads:56/real_time \| 58717 \| 58501 \| 11467 Q4GEMM_Jblas/Q4GPerNSymInt8/M:1024/N:4096/K:4096/Threads:56/real_time \| 1360846 \| 1354598 \| 543 Q4GEMM_Jblas/Q4GPerNSymInt8/M:2048/N:4096/K:4096/Threads:56/real_time \| 2564232 \| 2551365 \| 266 Q4GEMM_Jblas/Q4G32SymFp32/M:1/N:4096/K:4096/Threads:56/real_time \| 57929 \| 57694 \| 12047 Q4GEMM_Jblas/Q4G32SymFp32/M:1024/N:4096/K:4096/Threads:56/real_time \| 5495330 \| 5465810 \| 126 
Q4GEMM_Jblas/Q4G32SymFp32/M:2048/N:4096/K:4096/Threads:56/real_time \| 10676240 \| 10617817 \| 66 Q4GEMM_Jblas/Q4G128SymFp32/M:1/N:4096/K:4096/Threads:56/real_time \| 68305 \| 68047 \| 10026 Q4GEMM_Jblas/Q4G128SymFp32/M:1024/N:4096/K:4096/Threads:56/real_time \| 5504862 \| 5476215 \| 126 Q4GEMM_Jblas/Q4G128SymFp32/M:2048/N:4096/K:4096/Threads:56/real_time \| 11758623 \| 11697337 \| 66 Q4GEMM_Jblas/Q4GPerNSymFp32/M:1/N:4096/K:4096/Threads:56/real_time \| 67713 \| 67451 \| 10298 Q4GEMM_Jblas/Q4GPerNSymFp32/M:1024/N:4096/K:4096/Threads:56/real_time \| 5508325 \| 5480237 \| 126 Q4GEMM_Jblas/Q4GPerNSymFp32/M:2048/N:4096/K:4096/Threads:56/real_time \| 10738528 \| 10681656 \| 64 Q4GEMM_Jblas/Q4G32AsymFp32/M:1/N:4096/K:4096/Threads:56/real_time \| 60708 \| 60486 \| 11321 Q4GEMM_Jblas/Q4G32AsymFp32/M:1024/N:4096/K:4096/Threads:56/real_time \| 5523784 \| 5495736 \| 126 Q4GEMM_Jblas/Q4G32AsymFp32/M:2048/N:4096/K:4096/Threads:56/real_time \| 10829633 \| 10772161 \| 67 Reference: Benchmark \| Time \| CPU \| Iterations -- \| -- \| -- \| -- Q4GEMM/Q4Sym/M:1/N:4096/K:4096/Threads:56/real_time \| 53088 \| 52911 \| 13364 Q4GEMM/Q4Sym/M:1024/N:4096/K:4096/Threads:56/real_time \| 6268981 \| 6230335 \| 110 Q4GEMM/Q4Sym/M:2048/N:4096/K:4096/Threads:56/real_time \| 11701237 \| 11632339 \| 59 Win11+12900K 8 cores: Benchmark \| Time \| CPU \| Iterations -- \| -- \| -- \| -- Q4GEMM_Jblas/Q4G32SymInt8/M:1/N:4096/K:4096/Threads:8/real_time \| 215976 \| 211295 \| 2884 Q4GEMM_Jblas/Q4G32SymInt8/M:1024/N:4096/K:4096/Threads:8/real_time \| 60960590 \| 60937500 \| 10 Q4GEMM_Jblas/Q4G32SymInt8/M:2048/N:4096/K:4096/Threads:8/real_time \| 1.18E+08 \| 1.19E+08 \| 5 Q4GEMM_Jblas/Q4G32SymInt8/M:1/N:11008/K:4096/Threads:8/real_time \| 470377 \| 453059 \| 1414 Q4GEMM_Jblas/Q4G32SymInt8/M:1024/N:11008/K:4096/Threads:8/real_time \| 1.54E+08 \| 1.53E+08 \| 5 Q4GEMM_Jblas/Q4G32SymInt8/M:2048/N:11008/K:4096/Threads:8/real_time \| 3.18E+08 \| 3.13E+08 \| 2 
Q4GEMM_Jblas/Q4G32SymInt8/M:1/N:4096/K:11008/Threads:8/real_time \| 569072 \| 559398 \| 1229 Q4GEMM_Jblas/Q4G32SymInt8/M:1024/N:4096/K:11008/Threads:8/real_time \| 1.54E+08 \| 1.52E+08 \| 4 Q4GEMM_Jblas/Q4G32SymInt8/M:2048/N:4096/K:11008/Threads:8/real_time \| 3.22E+08 \| 3.28E+08 \| 2 Q4GEMM_Jblas/Q4G32SymInt8/M:1/N:11008/K:11008/Threads:8/real_time \| 1486055 \| 1473325 \| 403 Q4GEMM_Jblas/Q4G32SymInt8/M:1024/N:11008/K:11008/Threads:8/real_time \| 4.14E+08 \| 4.14E+08 \| 2 Q4GEMM_Jblas/Q4G32SymInt8/M:2048/N:11008/K:11008/Threads:8/real_time \| 8.88E+08 \| 8.59E+08 \| 1 --------- Signed-off-by: Mengni Wang <mengni.wang@intel.com> Co-authored-by: Mengni Wang <mengni.wang@intel.com>	2023-12-19 09:36:31 -08:00
Ashwini Khade	4dff154f51	Fix nightly pipeline failure (#18867 ) ### Description Fixes a failure in the ortmodule nightly pipeline.	2023-12-19 09:18:00 -08:00
Jian Chen	6d7519ede8	Adding new pipeline for python cuda testing (#18718 )	2023-12-18 18:13:03 -08:00
Frank	63b47ceaf8	[REACT NATIVE] Bugfix -> casing Podfile (#18861 ) ### Description The casing of Podfile is incorrect in the plugin. This causes issues when building iOS on case-sensitive systems such as Linux. ### Motivation and Context iOS builds currently fail on case-sensitive systems.	2023-12-19 10:20:46 +10:00
dependabot[bot]	3ff4a4c393	Bump actions/stale from 8.0.0 to 9.0.0 (#18774 )	2023-12-18 14:59:03 -08:00
sophies927	ea6186efa8	Update stale.yml to correct close-issue-message (#18849 )	2023-12-18 09:57:33 -08:00
Yifan Li	9426bd50cb	[TensorRT EP] Update deprecated TRT api (#18834 ) ### Description Update deprecated TRT APIs: 1. [setMaxWorkspaceSize](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_builder_config.html#a8209999988ab480c60c8a905dfd2654d)(max_workspace_size_) --> setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, max_workspace_size_) 2. [kENABLE_TACTIC_HEURISTIC](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#abdc74c40fe7a0c3d05d2caeccfbc29c1a1215692ad24465e4d9e37a8a7fce1a38) --> superseded by TRT builder optimization level 2 Perf & warning log comparison: TRT EP options \| User will see corresponding warning logs: \| Average inference time cost (FRCNN on A100) -- \| -- \| -- trt_build_heuristics_enable\\|true \| [TensorRT EP] trt_build_heuristics_enable is deprecated on TRT 8.6 onwards. Please set builder optimization level as 2 to enable builder heuristics. \| ~300ms trt_build_heuristics_enable\\|true trt_builder_optimization_level\\|2 \| [TensorRT EP] Builder heuristics are enabled automatically by builder optimization level 2. trt_build_heuristics_enable is deprecated on TRT 8.6 onwards. \| ~275ms trt_builder_optimization_level\\|2 \| \| ~275ms ### Motivation and Context Prepare for upcoming TRT 10	2023-12-18 09:16:09 -08:00
Changming Sun	ad476d5a1f	Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly (#18847 ) ### Description Change Nuget packaging pipeline's build TRT job to download CUDA SDK on-the-fly, so that we do not need to put a CUDA SDK in the build machine's image.	2023-12-15 17:44:02 -08:00
Dmitri Smirnov	50cbcf9587	Build function bodies according to the imported global opset. (#18833 ) ### Description Build function bodies according to the imported global opset. The same applies to querying ONNX functions. ### Motivation and Context This addresses issues: https://github.com/microsoft/onnxruntime/issues/18781 https://github.com/microsoft/onnxruntime/issues/16438	2023-12-15 15:56:20 -08:00
RandySheriffH	2952cf82a5	Access map by iterator to silence sanity check. (#18835 ) Use iterator to refer to the set. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-12-15 14:57:55 -08:00
Jiajia Qin	8f7b89bd5b	[js/webgpu] Optimize NCHW layout for InstanceNormalization (#18123 ) ### Description The changes in this PR include: 1) Fix f16 errors in InstanceNormalization with NCHW format. 2) Use vec to further optimize the original algorithm. 3) (Removed) Don't do layout conversion for InstanceNormalization for JSEP since InstanceNormalization itself is suitable for NCHW layout and has better performance in our current implementation. Tested on sd-vae-decoder-f16.onnx, the time drops from 314 ms to 285 ms. The aggregate GPU profiling data is shown below (note that the data is based on change 3). Before: Kernel \| Time (Ms) \| Percentage (%) -- \| -- \| -- Conv \| 201.55 \| 69.56 InstanceNormalization \| 42.49 \| 14.67 Transpose \| 28.95 \| 9.99 Mul \| 5.69 \| 1.96 Add \| 3.82 \| 1.32 MatMul \| 3.27 \| 1.13 Sigmoid \| 2.24 \| 0.77 Resize \| 1.16 \| 0.40 Softmax \| 0.34 \| 0.12 Cast \| 0.24 \| 0.08 Sum \| 289.75 \| After: Kernel \| Time (Ms) \| Percentage (%) -- \| -- \| -- Conv \| 205.44 \| 79.43 InstanceNormalization \| 18.24 \| 7.05 Transpose \| 17.64 \| 6.82 Mul \| 5.69 \| 2.20 Add \| 3.81 \| 1.47 MatMul \| 3.56 \| 1.38 Sigmoid \| 2.24 \| 0.86 Resize \| 1.19 \| 0.46 Softmax \| 0.59 \| 0.23 Cast \| 0.24 \| 0.09 Sum \| 258.65 \| From the tables above, we can see that the time of two ops is greatly reduced: InstanceNormalization and Transpose. 
The transpose time is reduced because each InstanceNormalization is surrounded by two reshape ops in sd-vae-decoder-f16.onnx. Because JSEP prefers NHWC and InstanceNormalization is a layout-sensitive op, two extra transpose ops are inserted dynamically when executing this model. After this change, those inserted transpose ops are no longer needed, so the overall transpose time is reduced.	2023-12-15 11:26:15 -08:00
Jiajia Qin	4bbed4c71a	[js/webgpu] Fix f16 errors in unary (#18839 ) ### Description This PR fixes the following error: ``` no matching overload for operator > (vec4<f16>, vec4<f32>) ```	2023-12-15 11:25:12 -08:00
Changming Sun	f52668cc68	Disable mlas unit test in ARM64EC build (#18747 ) ### Description Disable mlas unit test in ARM64EC build because the program has some link errors. We will fix the errors later. This PR only impacts Windows ARM64EC build. It has no impact on the existing build pipelines.	2023-12-15 09:17:47 -08:00
wirthual	89168b830d	Fix CI error: The workflow is not valid. .github/workflows/rust-ci.yml (Line: 27, Col: 7): Unexpected value 'ORT_RUST_STRATEGY=download' (#18836 ) Use colon for Env variable instead of =	2023-12-15 09:14:02 -08:00
Yang Gu	81ad1e6ac3	[js/webgpu] Fix typo of outputShapes in profiling message (#18837 )	2023-12-15 08:57:48 -08:00
Peishen Yan	d111eed726	[WebNN EP] Change axis to axes for argMax/argMin (#18838 ) In the latest spec, the axes option of WebNN's argMax and argMin requires the use of a sequence long type. Replace axis option (long type) with axes (sequence long type) for argMax and argMin.	2023-12-15 08:57:07 -08:00
Changming Sun	d795fc636c	FIX: Our cmake script didn't check googletest's hash (#18826 )	2023-12-15 08:48:15 -08:00
Changming Sun	fc9ecb59db	Add Windows ARM build jobs to post merge pipeline (#18832 ) ### Description Add Windows ARM build jobs to the post-merge pipeline to validate that our code is still compatible with these build settings.	2023-12-15 08:47:52 -08:00
pengwa	5eda79bdd3	Improve perf for stage3 training (#18099 ) ### Improve perf for stage3 training - first wave Port the existing PythonOp/PythonOpGrad Python runner to C++, and introduce an unsafe run mode (which skips on-the-fly detection of inplace, save-for-backward, and materialized grad). This reduces the overhead from XX~XXX us to X ~ lower end of XX us. In LLAMA2 7B training with 8x32GV100, we have observed 6.7% gains over PyTorch (1.59 vs. 1.49 it/s). Peak memory also dropped from 31GB to 28GB.	2023-12-15 13:32:19 +08:00
Changming Sun	cbad4fe49b	Update absl and googletest (#18827 ) ### Description Update absl and googletest to their latest version to include some cmake changes: 1. A googletest's cmake change that will allow using external absl and re2. 2. Nullability enhancements that will allow our clang-based static analysis detecting many kinds of null pointer errors. ### Motivation and Context To fix a C4744 link warning in our Windows pipelines. ``` LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\usage.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 
'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<int>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj] ```	2023-12-14 16:15:07 -08:00
Yueqing Zhang	b42d4b8ea6	[VitisAI] 1. api compatible 2. dynamic load onnx (#18470 ) ### Description 1. Add a backward-compatible API for compiling a model. 2. Load vitisai-ep.dll at run time. --------- Co-authored-by: Yueqing Zhang <yueqingz@amd.com> Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>	2023-12-14 14:43:41 -08:00
zesongw	6d5ee4d69b	[WebNN EP] Use explicit padding (#18688 ) WebNN will remove the autoPad option, so we need to use explicit padding values. Compute the padding values for autoPad (same-upper, same-lower) for the Pool, Conv and ConvTranspose ops.	2023-12-14 14:33:44 -08:00
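The explicit padding mentioned in #18688 can be illustrated with a minimal sketch, assuming the `SAME_UPPER` / `SAME_LOWER` rule from the ONNX operator specification for Conv and Pool along one spatial axis. The function name `same_auto_pad` and its parameters are hypothetical and are not taken from the PR. ConvTranspose uses a different formula and is not covered here.

```python
def same_auto_pad(input_size, kernel_size, stride=1, dilation=1, same_upper=True):
    """Return (pad_begin, pad_end) for one spatial axis.

    Hypothetical helper, not the WebNN EP code: it follows the ONNX
    SAME_UPPER / SAME_LOWER rule for Conv and Pool, where the output
    size is ceil(input_size / stride).
    """
    # Integer ceiling division avoids floating-point rounding.
    output_size = -(-input_size // stride)
    # Extent covered by the kernel once dilation is applied.
    effective_kernel = (kernel_size - 1) * dilation + 1
    # Total padding needed so that the last window fits.
    total = max((output_size - 1) * stride + effective_kernel - input_size, 0)
    small = total // 2
    large = total - small
    # SAME_UPPER puts the extra element at the end, SAME_LOWER at the beginning.
    return (small, large) if same_upper else (large, small)


if __name__ == "__main__":
    print(same_auto_pad(4, 3, stride=2, same_upper=True))
    print(same_auto_pad(4, 3, stride=2, same_upper=False))
```

When the total padding is odd, the two modes differ only in which side receives the extra element.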
Wanming Lin	1db1c75048	[WebNN EP] WebNN only supports 4-D input and weight for Conv/ConvTranspose (#18703 )	2023-12-14 14:33:19 -08:00
Changming Sun	b129f425fc	Fix test model URL issue (#18823 ) ### Description The ONNX model zoo changed its directory structure, so some of our pipelines are failing. To prevent this from happening again, we should read the test data from a cache on local disk instead of downloading it remotely every time.	2023-12-14 13:06:08 -08:00
Chi Lo	afe5cdc938	[TensorRT EP] Switch to enqueueV3 with support DDS output (copy version) (#18714 ) It's branched off from https://github.com/microsoft/onnxruntime/pull/17751 but removes KernelContext_SetOutput() API. It copies output allocation buffer to kernel context. --------- Co-authored-by: George Wu <jywu@microsoft.com>	2023-12-14 11:10:58 -08:00
Changming Sun	7386e21121	Replace some ORT_ENFORCE with ORT_THROW_IF_ERROR (#18812 ) ### Description Replace some ORT_ENFORCE with ORT_THROW_IF_ERROR to get better error messages.	2023-12-14 10:14:22 -08:00
Changming Sun	95193cb440	Set NDK version in Linux CPU Minimal Build E2E CI Pipeline (#18810 ) ### Description To upgrade the clang version in preparation for PR #17031 .	2023-12-14 08:08:41 -08:00
Yi Zhang	7dade5d05b	Readd basetargets in Microsoft.ML.OnnxRuntime.csproj (#18789 ) ### Motivation and Context Currently, the nightly Microsoft.ML.Onnxruntime.Managed NuGet package can't be added to a dotnet console program in VS2022 with target framework .NET 6.0. This restores the previous setting to make it work.	2023-12-14 14:44:11 +08:00
Changming Sun	7047d13c68	Update windowsai-steps.yml: enable "/profile" linker flag (#18022 ) ### Description Update windowsai-steps.yml: enable the "/profile" linker flag for an internal requirement.	2023-12-13 19:47:04 -08:00
Suryaprakash Shanmugam	0723dcb8b5	OpenVINO Execution Provider with 2023.2 support (#18596 ) - Add support for OpenVINO 2023.2 - num_of_threads provider option is mapped to the CPU device property inference_num_threads of the CPU plugin, so users can control the #threads used for inference by the CPU - Logging in Debug mode now includes the runtime properties set for devices - Fix issue in using external weights through OpenVINO --------- Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2023-12-13 15:56:43 -08:00
Rachel Guo	f3fa045681	Enable MacOS build in ORT Objc Pod (#18786 ) ### Description Add macOS build for the objc pod. ### Motivation and Context Follow-up PR for #18550 --------- Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2023-12-13 13:50:42 -08:00
Ashwini Khade	487abcd25e	Update gradient ops tests (#18783 ) ### Description TrainingSession has been deprecated for a while now, but the gradient ops tests are still using training session. This PR updates these tests to use inference session instead of training session. ### Motivation and Context This will enable us to remove all the training session related deprecated code from the repo.	2023-12-13 11:26:52 -08:00
Changming Sun	17eaf9b053	Fix a build warning in SparseTensor code for 32-bit build configs (#18766 ) ### Description The warning is: ``` C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.1812949Z with 2023-12-08T20:58:48.2144272Z [ 2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.2801935Z ] 2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 
2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.2894476Z with 2023-12-08T20:58:48.2911521Z [ 2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr, 2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) 2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3197946Z with 2023-12-08T20:58:48.3198565Z [ 2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor 2023-12-08T20:58:48.3905678Z ] 2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 
'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.3913414Z with 2023-12-08T20:58:48.3913660Z [ 2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.3914499Z ] 2023-12-08T20:58:48.3914743Z qlinear_concat.cc 2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5534583Z with 2023-12-08T20:58:48.5541266Z [ 2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5544914Z ] 2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5552099Z 
182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5553712Z with 2023-12-08T20:58:48.5555569Z [ 2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5558707Z ] 2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5566354Z with 2023-12-08T20:58:48.5568185Z [ 2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5571339Z ] 2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5578562Z with 2023-12-08T20:58:48.5580399Z [ 2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5583465Z ] 2023-12-08T20:58:48.5587661Z 
##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data 2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj] 2023-12-08T20:58:48.5591396Z with 2023-12-08T20:58:48.5593220Z [ 2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>> 2023-12-08T20:58:48.5595955Z ] ``` And the warning in #18195 ### Motivation and Context AB#22894 --------- Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>	2023-12-13 11:11:13 -08:00
Changming Sun	44054e7508	Move NuGet nightly package publishing job to a separated pipeline (#18801 ) ### Description Move the NuGet nightly package publishing job to a separate pipeline. Before this change, it ran at the end of 'Zip-Nuget-Java-Nodejs Packaging Pipeline'. This PR moves it to a separate pipeline so that we can manually trigger this step for any branch (e.g. release branches).	2023-12-13 11:10:50 -08:00
Jiajia Qin	b30e721dc8	[js/webgpu] Provide a naive vectorized matmul algorithm (#18758 ) ### Description This PR provides a vectorized matmul algorithm. In most situations, we still use the workgroup-memory-optimized matmul. But in some situations, such as when N and K are very small, the workgroup-optimized matmul can't fully utilize the underlying hardware due to the 32x32 tile size. So for very small N/K, we switch to the naive vectorized matmul algorithm to improve the hardware execution unit usage. With this PR, matmul with input0: [1, 36864, 3], input1: [1, 3, 3], input2: [3] drops from 4.34 ms to less than 1 ms on Intel Gen9 GPUs.	2023-12-13 09:03:23 -08:00
Ted Themistokleous	1ad6eb1359	Add DynamicQuantizeLinear as supported OP (#18798 ) Support was added in MIGraphX, so it should be in the operator list. ### Description Simple change to add support to the EP for DynamicQuantizeLinear ### Motivation and Context Changes were added in MIGraphX. They should also be available in the EP to run models that are int8 quantized. Currently we fail and fall back ops to ROCm->CPU EPs Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>	2023-12-13 16:25:56 +08:00
pengwa	dbe886abb3	Disable test_bert_result_with_layerwise_recompute (#18800 )	2023-12-13 12:16:39 +08:00

10297 commits