Bumps [transformers](https://github.com/huggingface/transformers) from
4.35.2 to 4.36.0.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/huggingface/transformers/releases">transformers's
releases</a>.</em></p>
<blockquote>
<h2>v4.36: Mixtral, Llava/BakLlava, SeamlessM4T v2, AMD ROCm, F.sdpa
wide-spread support</h2>
<h2>New model additions</h2>
<h3>Mixtral</h3>
<p>Mixtral is the new open-source model from Mistral AI, announced in the
blogpost <a href="https://mistral.ai/news/mixtral-of-experts/">Mixtral
of Experts</a>. According to the benchmark results shared in the release
blogpost, its capabilities are comparable to ChatGPT.</p>
<p>The architecture is a sparse Mixture of Experts with a Top-2 routing
strategy, similar to the <code>NllbMoe</code> architecture in transformers.
You can use it through the <code>AutoModelForCausalLM</code> interface:</p>
<pre lang="py"><code>>>> import torch
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> model = AutoModelForCausalLM.from_pretrained("mistralai/Mixtral-8x7B", torch_dtype=torch.float16, device_map="auto")
>>> tokenizer = AutoTokenizer.from_pretrained("mistralai/Mixtral-8x7B")
>>> prompt = "My favourite condiment is"
>>> # device_map="auto" already places the weights, so only the inputs are moved
>>> model_inputs = tokenizer([prompt], return_tensors="pt").to(model.device)
>>> generated_ids = model.generate(**model_inputs, max_new_tokens=100, do_sample=True)
>>> tokenizer.batch_decode(generated_ids)[0]
</code></pre>
<p>The model is compatible with existing optimisation tools such as Flash
Attention 2, <code>bitsandbytes</code> and the PEFT library. The checkpoints
are released under the <a
href="https://huggingface.co/mistralai"><code>mistralai</code></a>
organisation on the Hugging Face Hub.</p>
<h3>Llava / BakLlava</h3>
<p>Llava is an open-source chatbot trained by fine-tuning LLaMA/Vicuna
on GPT-generated multimodal instruction-following data. It is an
auto-regressive language model based on the transformer architecture.
In other words, it is a multimodal version of LLMs fine-tuned for chat
/ instructions.</p>
<p>The Llava model was proposed in <a
href="https://arxiv.org/pdf/2310.03744">Improved Baselines with Visual
Instruction Tuning</a> by Haotian Liu, Chunyuan Li, Yuheng Li and Yong
Jae Lee.</p>
<ul>
<li>[<code>Llava</code>] Add Llava to transformers by <a
href="https://github.com/younesbelkada"><code>@younesbelkada</code></a>
in <a
href="https://redirect.github.com/huggingface/transformers/issues/27662">#27662</a></li>
<li>[LLaVa] Some improvements by <a
href="https://github.com/NielsRogge"><code>@NielsRogge</code></a> in <a
href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a></li>
</ul>
<p>The integration also includes <a
href="https://github.com/SkunkworksAI/BakLLaVA"><code>BakLlava</code></a>,
which is a Llava model trained with a Mistral backbone.</p>
<p>The model is compatible with the <code>"image-to-text"</code>
pipeline:</p>
<pre lang="py"><code>from transformers import pipeline
from PIL import Image
import requests
model_id = "llava-hf/llava-1.5-7b-hf"
</code></pre>
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="14666775a2"><code>1466677</code></a>
Release: v4.36.0</li>
<li><a
href="accccdd008"><code>accccdd</code></a>
[<code>Add Mixtral</code>] Adds support for the Mixtral MoE (<a
href="https://redirect.github.com/huggingface/transformers/issues/27942">#27942</a>)</li>
<li><a
href="0676d992a5"><code>0676d99</code></a>
[<code>from_pretrained</code>] Make from_pretrained fast again (<a
href="https://redirect.github.com/huggingface/transformers/issues/27709">#27709</a>)</li>
<li><a
href="9f18cc6df0"><code>9f18cc6</code></a>
Fix SDPA dispatch & make SDPA CI compatible with torch<2.1.1 (<a
href="https://redirect.github.com/huggingface/transformers/issues/27940">#27940</a>)</li>
<li><a
href="7ea21f1f03"><code>7ea21f1</code></a>
[LLaVa] Some improvements (<a
href="https://redirect.github.com/huggingface/transformers/issues/27895">#27895</a>)</li>
<li><a
href="5e620a92cf"><code>5e620a9</code></a>
Fix <code>SeamlessM4Tv2ModelIntegrationTest</code> (<a
href="https://redirect.github.com/huggingface/transformers/issues/27911">#27911</a>)</li>
<li><a
href="e96c1de191"><code>e96c1de</code></a>
Skip <code>UnivNetModelTest::test_multi_gpu_data_parallel_forward</code>
(<a
href="https://redirect.github.com/huggingface/transformers/issues/27912">#27912</a>)</li>
<li><a
href="8d8970efdd"><code>8d8970e</code></a>
[BEiT] Fix test (<a
href="https://redirect.github.com/huggingface/transformers/issues/27934">#27934</a>)</li>
<li><a
href="235be08569"><code>235be08</code></a>
[DETA] fix backbone freeze/unfreeze function (<a
href="https://redirect.github.com/huggingface/transformers/issues/27843">#27843</a>)</li>
<li><a
href="df5c5c62ae"><code>df5c5c6</code></a>
Fix typo (<a
href="https://redirect.github.com/huggingface/transformers/issues/27918">#27918</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/huggingface/transformers/compare/v4.35.2...v4.36.0">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
### Description
- Wrap usage of kENABLE_TACTIC_HEURISTIC in version-checking macros.
- Use delete instead of the deprecated destroy() functions on TRT objects.
### Motivation and Context
- Removes usages of deprecated TRT APIs.
Signed-off-by: Kevin Chen <kevinch@nvidia.com>
### Description
(1) Support importing models from Olive.
(2) Add the Torch backend engine (Eager and Compile modes) to the demo.
(3) Use fp16 in most places.
(4) Remove some old pipeline scripts that are no longer useful. They
are replaced by the demo.
(5) Remove old benchmark results that are out of date.
(6) Add PIL image conversion to end-to-end latency (for a fair comparison
with diffusers, whose default output type is PIL).
(7) Remove some seldom-used options such as force-rebuild-engine,
hf-token, refit, etc.
### Description
This PR revises the backend registration.
The following describes the expected behavior after this change
(**changed behavior is in bold**):
- (ort.min.js - built without webgpu support)
  - loading: do not register 'webgpu' backend
  - creating session without EP list: use default EP list ['webnn', 'cpu', 'wasm']
  - creating session with ['webgpu'] as EP list: should fail with backend not available
- (ort.webgpu.min.js - built with webgpu support)
  - loading: **always register 'webgpu' backend**
    (previous behavior: only register 'webgpu' backend when `navigator.gpu` is available)
  - creating session without EP list: use default EP list ['webgpu', 'webnn', 'cpu', 'wasm']
    - when WebGPU is available (win): use WebGPU backend
    - when WebGPU is unavailable (android): **should fail backend init,** and try to use the next backend in the list, 'webnn'
      (previous behavior: does not fail backend init, but fails in JSEP init, which was too late to switch to the next backend)
  - creating session with ['webgpu'] as EP list
    - when WebGPU is available (win): use WebGPU backend
    - when WebGPU is unavailable (android): **should fail backend init**, and because no more EPs are listed, fail.

related PRs: #18190, #18144
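
The fallback behavior described in this PR can be modeled with a short sketch. This is an illustrative Python model of the selection logic only, not the onnxruntime-web implementation; `Backend`, `BackendInitError`, `resolve_backend` and the `available` flag are hypothetical names introduced for the example.

```python
from dataclasses import dataclass


class BackendInitError(RuntimeError):
    """Raised when a backend cannot initialize on this platform."""


@dataclass
class Backend:
    name: str
    available: bool  # hypothetical stand-in for a check such as `navigator.gpu`

    def init(self) -> None:
        # Fail at backend init (early), so the caller can still try the next EP.
        if not self.available:
            raise BackendInitError(f"backend '{self.name}' is not available")


def resolve_backend(registered: dict, ep_list: list) -> str:
    """Return the first EP in ep_list whose backend is registered and initializes."""
    errors = []
    for name in ep_list:
        backend = registered.get(name)
        if backend is None:
            errors.append(f"{name}: backend not registered")
            continue
        try:
            backend.init()
            return name
        except BackendInitError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("no available backend found. " + "; ".join(errors))


# WebGPU build on a device without WebGPU: 'webgpu' is registered but fails init.
registered = {
    "webgpu": Backend("webgpu", available=False),
    "webnn": Backend("webnn", available=True),
    "wasm": Backend("wasm", available=True),
}
print(resolve_backend(registered, ["webgpu", "webnn", "cpu", "wasm"]))
```

With `['webgpu']` as the only entry in the EP list, the same call raises, which matches the last case in the list above.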
### Description
The ONNX model zoo had a major update recently, and legacy models were
relocated under /archive/.
Move QNN EP provider options to session options
### Description
Session options are needed to support multi-partition for the context cache feature. To smooth the transition, move the provider options to session options first.
This is the first step for https://github.com/microsoft/onnxruntime/pull/18865
### Description
Add LeakyRelu to the list as support was added a while ago.
### Description
1. Add a CodeSign validation task before the binaries are published, to
make sure all DLL files are signed.
2. Auto-trigger the CUDA 12 pipeline's publishing job.
### Description
Check whether the min/max inputs are provided and use default values if not provided.
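
A minimal sketch of this kind of fallback, assuming a Clip-style operator whose `min` and `max` inputs are optional. The function name `clip` and the choice of infinite default bounds are assumptions made for illustration; they are not taken from the code changed in this PR.

```python
import math
from typing import Optional, Sequence


def clip(values: Sequence[float],
         min_value: Optional[float] = None,
         max_value: Optional[float] = None) -> list:
    """Clamp values; a missing bound falls back to a default (assumed +/-inf here)."""
    lo = -math.inf if min_value is None else min_value
    hi = math.inf if max_value is None else max_value
    return [min(max(v, lo), hi) for v in values]


print(clip([-2.0, 0.5, 3.0], max_value=1.0))  # only the upper bound is provided
```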
### Description
DFT is updated in opset 20. Implement it in ORT.
### Motivation and Context
This is for the ORT 1.17.0 release.
Fixes #17723
---------
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
### Description
Fixes a failure in the ortmodule nightly pipeline.
### Description
The casing of Podfile is incorrect in the plugin. This causes issues
when building iOS on case-sensitive systems such as Linux.
### Motivation and Context
Without this fix, iOS cannot be built on case-sensitive systems.
### Description
Update deprecated TRT APIs:
1. [setMaxWorkspaceSize](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_builder_config.html#a8209999988ab480c60c8a905dfd2654d)(max_workspace_size_) -> setMemoryPoolLimit(nvinfer1::MemoryPoolType::kWORKSPACE, max_workspace_size_)
2. [kENABLE_TACTIC_HEURISTIC](https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/namespacenvinfer1.html#abdc74c40fe7a0c3d05d2caeccfbc29c1a1215692ad24465e4d9e37a8a7fce1a38) -> superseded by TRT builder optimization level 2
Perf & warning log comparison:

TRT EP options | User will see corresponding warning logs | Average inference time cost (FRCNN on A100)
-- | -- | --
trt_build_heuristics_enable\|true | [TensorRT EP] trt_build_heuristics_enable is deprecated on TRT 8.6 onwards. Please set builder optimization level as 2 to enable builder heuristics. | ~300ms
trt_build_heuristics_enable\|true trt_builder_optimization_level\|2 | [TensorRT EP] Builder heuristics are enabled automatically by builder optimization level 2. trt_build_heuristics_enable is deprecated on TRT 8.6 onwards. | ~275ms
trt_builder_optimization_level\|2 | | ~275ms
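
The three rows of the comparison can be restated as a small decision function. This is a sketch of the behavior the table describes, assuming TRT 8.6 or later and string-valued options; `heuristics_warning` is a hypothetical helper, not a function in the TensorRT EP.

```python
def heuristics_warning(options: dict) -> str:
    """Return the warning for a set of TRT EP options, assuming TRT 8.6 or later."""
    heuristics = options.get("trt_build_heuristics_enable") == "true"
    level_2 = options.get("trt_builder_optimization_level") == "2"
    if heuristics and level_2:
        return ("[TensorRT EP] Builder heuristics are enabled automatically by builder "
                "optimization level 2. trt_build_heuristics_enable is deprecated on TRT "
                "8.6 onwards.")
    if heuristics:
        return ("[TensorRT EP] trt_build_heuristics_enable is deprecated on TRT 8.6 "
                "onwards. Please set builder optimization level as 2 to enable builder "
                "heuristics.")
    # Only the optimization level is set (or neither option): nothing to warn about.
    return ""


print(repr(heuristics_warning({"trt_builder_optimization_level": "2"})))
```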
### Motivation and Context
Prepare for upcoming TRT 10
### Description
Change Nuget packaging pipeline's build TRT job to download CUDA SDK
on-the-fly, so that we do not need to put a CUDA SDK in the build
machine's image.
### Description
The changes in this PR include:
1) Fix f16 errors in InstanceNormalization with NCHW format.
2) Use vec to further optimize the original algorithm.
3) (Removed) Don't do layout conversion for InstanceNormalization for
JSEP since InstanceNormalization itself is suitable for NCHW layout and
has better performance in our current implementation.
Tested on sd-vae-decoder-f16.onnx, the latency drops from 314 ms to 285 ms. The
aggregate GPU profiling data is shown below (note that the data is
based on change 3):
Before:

Kernel | Time (ms) | Percentage (%)
-- | -- | --
Conv | 201.55 | 69.56
InstanceNormalization | 42.49 | 14.67
Transpose | 28.95 | 9.99
Mul | 5.69 | 1.96
Add | 3.82 | 1.32
MatMul | 3.27 | 1.13
Sigmoid | 2.24 | 0.77
Resize | 1.16 | 0.40
Softmax | 0.34 | 0.12
Cast | 0.24 | 0.08
Sum | 289.75 |

After:

Kernel | Time (ms) | Percentage (%)
-- | -- | --
Conv | 205.44 | 79.43
InstanceNormalization | 18.24 | 7.05
Transpose | 17.64 | 6.82
Mul | 5.69 | 2.20
Add | 3.81 | 1.47
MatMul | 3.56 | 1.38
Sigmoid | 2.24 | 0.86
Resize | 1.19 | 0.46
Softmax | 0.59 | 0.23
Cast | 0.24 | 0.09
Sum | 258.65 |

From the tables above, we can see that the time of two ops is greatly
reduced: InstanceNormalization and Transpose. The transpose time is
reduced because each InstanceNormalization is surrounded by two reshape
ops in sd-vae-decoder-f16.onnx. Because JSEP prefers NHWC and
InstanceNormalization is a layout-sensitive op, two extra transpose ops
are inserted dynamically when executing this model. After this change,
those inserted transpose ops are no longer needed, so the overall
transpose time is reduced.
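
To illustrate why InstanceNormalization maps naturally onto NCHW, here is a plain Python reference over a flat NCHW buffer: each (n, c) plane is normalized independently, and in NCHW its H*W elements form one contiguous slice. This is a readability sketch only, not the JSEP shader; `instance_norm_nchw` and the `epsilon` default are assumptions made for the example.

```python
import math


def instance_norm_nchw(x, n, c, h, w, scale, bias, epsilon=1e-5):
    """Normalize a flat NCHW buffer; each (n, c) plane is one contiguous slice."""
    plane = h * w
    out = [0.0] * len(x)
    for i in range(n * c):
        start = i * plane
        values = x[start:start + plane]
        mean = sum(values) / plane
        var = sum((v - mean) ** 2 for v in values) / plane
        inv_std = 1.0 / math.sqrt(var + epsilon)
        ch = i % c  # channel index selects the per-channel scale and bias
        for j, v in enumerate(values):
            out[start + j] = (v - mean) * inv_std * scale[ch] + bias[ch]
    return out


# One image, two channels, 1x2 spatial size.
y = instance_norm_nchw([1.0, 3.0, 10.0, 10.0], 1, 2, 1, 2, [1.0, 1.0], [0.0, 0.5])
print([round(v, 3) for v in y])
```

In NHWC the elements of one plane are strided by the channel count, which is why a layout-sensitive kernel otherwise needs transposes around it.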
### Description
Disable mlas unit test in ARM64EC build because the program has some
link errors. We will fix the errors later.
This PR only impacts Windows ARM64EC build. It has no impact on the
existing build pipelines.
In the latest spec, the axes option of WebNN's argMax and argMin
requires the use of a sequence long type. Replace axis option (long
type) with axes (sequence long type) for argMax and argMin.
### Improve perf for stage3 training - first wave
Port the existing PythonOp/PythonOpGrad Python runner to C++, and introduce
an unsafe run mode (which skips in-place, save-for-backward and materialized-grad
detection on the fly).
This reduces the overhead from XX~XXX us to X ~ lower end of XX us. In
LLAMA2 7B training with 8x32G V100, we have observed a 6.7% gain over
PyTorch (1.59 vs. 1.49 it/s).
Peak memory also dropped from 31GB to 28GB.
### Description
Update absl and googletest to their latest versions to include some CMake
changes:
1. A googletest CMake change that allows using external absl and
re2.
2. Nullability enhancements that allow our clang-based static
analysis to detect many kinds of null pointer errors.
### Motivation and Context
To fix a C4744 link warning in our Windows pipelines.
```
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\parse.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\usage.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<bool>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<class std::basic_string<char,struct std::char_traits<char>,class std::allocator<char> > >::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
LINK : warning C4744: 'static char const absl::lts_20230802::base_internal::FastTypeTag<int>::dummy_var' has different type in 'd:\a\_work\_temp\abseil_cpp\abseil-cpp-20230802.0\absl\flags\internal\flag.cc' and 'd:\a\_work\1\b\relwithdebinfo\_deps\googletest-src\googletest\src\gtest-all.cc': 'signed char' and 'unsigned char' [D:\a\_work\1\b\RelWithDebInfo\onnxruntime_mlas_test.vcxproj]
```
### Description
1. Add a backward-compatible API for compiling a model.
2. Load vitisai-ep.dll at run time.
---------
Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
WebNN will remove the autoPad option, so we need to use explicit padding
values.
Compute the padding values for autoPad (same-upper, same-lower) for the Pool,
Conv and ConvTranspose ops.
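
As a rough illustration of the computation, the sketch below derives explicit begin/end padding for one spatial axis using the usual SAME padding rule for Conv/Pool-style ops (output size is the input size divided by the stride, rounded up). `same_padding` is a hypothetical helper, not the WebNN EP code, and ConvTranspose uses a different formula that is not covered here.

```python
import math


def same_padding(input_size, kernel, stride=1, dilation=1, mode="same-upper"):
    """Return (begin, end) padding for one spatial axis of a Conv/Pool-style op."""
    output_size = math.ceil(input_size / stride)
    effective_kernel = (kernel - 1) * dilation + 1
    total = max((output_size - 1) * stride + effective_kernel - input_size, 0)
    small, large = total // 2, total - total // 2
    # same-upper puts the extra element at the end, same-lower at the beginning.
    return (small, large) if mode == "same-upper" else (large, small)


print(same_padding(5, kernel=2, stride=1, mode="same-upper"))
print(same_padding(5, kernel=2, stride=1, mode="same-lower"))
```

The two modes differ only when the total padding is odd, in which case the extra element goes to the end (same-upper) or the beginning (same-lower).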
### Description
The ONNX model zoo changed its directory structure, so some of our pipelines
are failing. To prevent this from happening again, we should read the
test data from a cache on local disk instead of downloading it remotely
every time.
It's branched off from
https://github.com/microsoft/onnxruntime/pull/17751 but removes
KernelContext_SetOutput() API. It copies output allocation buffer to
kernel context.
---------
Co-authored-by: George Wu <jywu@microsoft.com>
### Description
### Motivation and Context
Currently, the nightly Microsoft.ML.OnnxRuntime.Managed NuGet package cannot
be added to a dotnet console program in VS2022 with target framework .NET
6.0.
This change restores the previous setting to make it work.
- Add support for OpenVINO 2023.2
- The num_of_threads provider option is mapped to the CPU device property
inference_num_threads of the CPU plugin, so users can control the
number of threads used for inference on the CPU
- Logging in Debug mode now includes the runtime properties set for
devices
- Fix issue in using external weights through OpenVINO
---------
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
### Description
Add macos build for objc pod.
### Motivation and Context
Follow-up PR for #18550
---------
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
### Description
TrainingSession has been deprecated for a while now, but the gradient
ops tests are still using training session. This PR updates these tests
to use inference session instead of training session.
### Motivation and Context
This will enable us to remove all the training session related
deprecated code from the repo.
### Description
The warning is:
```
C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.1812949Z with
2023-12-08T20:58:48.2144272Z [
2023-12-08T20:58:48.2145285Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.2801935Z ]
2023-12-08T20:58:48.2804047Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(82,8): message : while compiling class template member function 'void onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()(const onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const onnxruntime::SparseTensor &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2806197Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(302,27): message : see the first reference to 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>::operator ()' in 'onnxruntime::utils::mltype_dispatcher_internal::CallableDispatchableHelper::Invoke' (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2871783Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(438,100): message : see reference to class template instantiation 'onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr<uint64_t>' being compiled (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc) [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2893010Z C:\a\_work\1\s\include\onnxruntime\core/framework/data_types_internal.h(414,5): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::InvokeWithLeadingTemplateArgs<Fn,onnxruntime::TypeList<>,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.2894476Z with
2023-12-08T20:58:48.2911521Z [
2023-12-08T20:58:48.2912457Z Fn=onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,
2023-12-08T20:58:48.3067840Z T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3068863Z ] (compiling source file C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc)
2023-12-08T20:58:48.3195854Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,11): message : see reference to function template instantiation 'void onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke<onnxruntime::contrib::`anonymous-namespace'::SparseToDenseCsr,onnxruntime::contrib::`anonymous-namespace'::ComputeCtx&,const T&,const onnxruntime::Tensor&,onnxruntime::Tensor&>(onnxruntime::contrib::`anonymous-namespace'::ComputeCtx &,const T &,const onnxruntime::Tensor &,onnxruntime::Tensor &) const' being compiled [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3197946Z with
2023-12-08T20:58:48.3198565Z [
2023-12-08T20:58:48.3199093Z T=onnxruntime::SparseTensor
2023-12-08T20:58:48.3905678Z ]
2023-12-08T20:58:48.3907275Z C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(198,36): message : see the first reference to 'onnxruntime::utils::MLTypeCallDispatcher<float,double,int32_t,uint32_t,int64_t,uint64_t>::Invoke' in 'onnxruntime::contrib::SparseToDenseMatMul::Compute' [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3910999Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3912734Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,43): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.3913414Z with
2023-12-08T20:58:48.3913660Z [
2023-12-08T20:58:48.3914001Z Derived=Eigen::Map<const Eigen::SparseMatrix<uint64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.3914499Z ]
2023-12-08T20:58:48.3914743Z qlinear_concat.cc
2023-12-08T20:58:48.3917082Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.3918624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,74): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5534583Z with
2023-12-08T20:58:48.5541266Z [
2023-12-08T20:58:48.5542401Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5544914Z ]
2023-12-08T20:58:48.5548670Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5552099Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(92,63): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5553712Z with
2023-12-08T20:58:48.5555569Z [
2023-12-08T20:58:48.5556779Z Derived=Eigen::Map<const Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5558707Z ]
2023-12-08T20:58:48.5561428Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5565624Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,90): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5566354Z with
2023-12-08T20:58:48.5568185Z [
2023-12-08T20:58:48.5569305Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5571339Z ]
2023-12-08T20:58:48.5574864Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5577866Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(93,77): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5578562Z with
2023-12-08T20:58:48.5580399Z [
2023-12-08T20:58:48.5581503Z Derived=Eigen::Map<Eigen::Matrix<uint64_t,-1,-1,1,-1,-1>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5583465Z ]
2023-12-08T20:58:48.5587661Z ##[warning]onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): Warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data
2023-12-08T20:58:48.5590705Z 182>C:\a\_work\1\s\onnxruntime\contrib_ops\cpu\math\sparse_dense_matmul.cc(88,54): warning C4244: 'argument': conversion from 'const __int64' to 'Eigen::EigenBase<Derived>::Index', possible loss of data [C:\a\_work\1\b\RelWithDebInfo\onnxruntime_providers.vcxproj]
2023-12-08T20:58:48.5591396Z with
2023-12-08T20:58:48.5593220Z [
2023-12-08T20:58:48.5593693Z Derived=Eigen::Map<const Eigen::SparseMatrix<int64_t,1,int64_t>,0,Eigen::Stride<0,0>>
2023-12-08T20:58:48.5595955Z ]
```
And the warning in #18195
### Motivation and Context
AB#22894
---------
Co-authored-by: Dmitri Smirnov <yuslepukhin@users.noreply.github.com>
### Description
Move the NuGet nightly package publishing job to a separate pipeline.
Before this change, it ran at the end of the 'Zip-Nuget-Java-Nodejs
Packaging Pipeline'. This PR moves it to a separate pipeline so that we
can manually trigger this step for any branch (e.g. release branches).
### Description
This PR provides a vectorized matmul algorithm. In most situations, we
still use the workgroup-memory-optimized matmul. But in some
situations, such as when N and K are very small, the workgroup-optimized
matmul can't fully utilize the underlying hardware due to the 32x32 tile
size. So for very small N/K, we switch to the naive vectorized matmul
algorithm to improve the hardware execution unit usage.
With this PR, matmul with input0: [1, 36864, 3], input1: [1, 3, 3],
input2: [3] drops from 4.34 ms to under 1 ms on Intel Gen9 GPUs.
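
A back-of-the-envelope way to see the under-utilization is to compare the real output size with the area covered by 32x32 output tiles. This is a simplified model that assumes each workgroup computes one 32x32 output tile and ignores tiling along K; `tile_occupancy` is a hypothetical helper, not part of the shader code.

```python
import math


def tile_occupancy(m, n, tile=32):
    """Fraction of output-tile elements that map to real output elements."""
    padded = math.ceil(m / tile) * tile * math.ceil(n / tile) * tile
    return (m * n) / padded


# Shapes from the example: [1, 36864, 3] x [1, 3, 3] gives M=36864, N=3.
print(tile_occupancy(36864, 3))
```

Under this model only 3 of the 32 columns in each tile carry useful work for the example shapes, which is consistent with the motivation for a separate small-N/K path.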
Support was added in MIGraphX, so it should be in the operator list.
### Description
Simple change to add support for DynamicQuantizeLinear to the EP.
### Motivation and Context
The changes were added in MIGraphX and should also be available in the EP to run
models that are int8 quantized. Currently we fail and fall back ops to the
ROCm->CPU EPs.
Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
### Disable test_bert_result_with_layerwise_recompute