Commit graph

10864 commits

Author SHA1 Message Date
Yi Zhang
14d7872ce9
Reuse T4 for Cuda12.2 training packaging pipeline. (#20244)
### Description
The training CUDA 12.2 packaging pipeline
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1308&_a=summary
has consistently run out of memory since PR #19910.
I tried other CPU agents, for example D64as_v5 (256 GB memory) and
D32as_v4 (128 GB memory and 256 GB SSD temp storage), and they still run out of
memory, as shown in the image below.

![image](https://github.com/microsoft/onnxruntime/assets/16190118/5acde9ef-674f-4b6d-a1b3-b54647645083)


The build does work on T4, even though T4 has only 4 vCPUs, 28 GB memory and 180 GB temp
storage, but it takes much longer.

### Motivation and Context
Restore the CUDA 12.2 training packaging pipeline first.
More time is needed to investigate the root cause.


### Other Clues.
These two compilation steps take nearly 6 minutes with CUDA 12.2 on T4,
and they run out of memory on the CPU machine. @ajindal1
CUDA 12.2 on T4:
```
2024-03-14T05:39:08.7726865Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-14T05:45:01.3223393Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

2024-03-14T05:46:07.9218003Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim96_fp16_sm80.cu.o
2024-03-14T05:52:59.2387051Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/group_query_attention_impl.cu.o

```

With CUDA 11.8 on CPU, however, the same step finishes in about one minute:
```
cuda11.8 on CPU
2024-04-09T11:34:35.0849836Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-04-09T11:35:53.6648154Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o

cuda11.8 on GPU
2024-03-13T12:16:33.4102477Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim32_fp16_sm80.cu.o
2024-03-13T12:19:58.8268272Z [ 90%] Building CUDA object CMakeFiles/onnxruntime_providers_cuda.dir/onnxruntime_src/onnxruntime/contrib_ops/cuda/bert/flash_attention/flash_fwd_split_hdim64_bf16_sm80.cu.o
```
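
The per-file compile times quoted above come from the gaps between consecutive log lines. A minimal sketch for computing them, assuming the timestamps keep the format shown in the logs (seven fractional digits and a trailing `Z`); the helper names `parse_log_timestamp` and `elapsed_seconds` are hypothetical and not part of any pipeline tooling:

```python
from datetime import datetime


def parse_log_timestamp(stamp: str) -> datetime:
    """Parse a log timestamp such as 2024-03-14T05:39:08.7726865Z."""
    head, _, fraction = stamp.rstrip("Z").partition(".")
    # datetime stores at most microseconds, so keep only 6 fractional digits.
    micro = fraction[:6].ljust(6, "0")
    return datetime.strptime(f"{head}.{micro}", "%Y-%m-%dT%H:%M:%S.%f")


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two log lines (end minus start)."""
    return (parse_log_timestamp(end) - parse_log_timestamp(start)).total_seconds()


# CUDA 12.2 on T4: gap after flash_fwd_split_hdim32_fp16_sm80.cu started.
t4 = elapsed_seconds("2024-03-14T05:39:08.7726865Z", "2024-03-14T05:45:01.3223393Z")
# CUDA 11.8 on CPU: gap for the same file.
cpu = elapsed_seconds("2024-04-09T11:34:35.0849836Z", "2024-04-09T11:35:53.6648154Z")
print(f"T4: {t4:.1f}s, CPU: {cpu:.1f}s")
```

With a parallel build the gap between two log lines is only an approximation of a single file's compile time, so treat the numbers as a rough comparison.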
2024-04-10 09:21:40 +08:00
Dmitri Smirnov
7d8dea9f10
Reduce Heap contention in StringNormalizer (#20182)
### Description
Re-use pre-computed and pre-allocated buffers for UNICODE conversions.
Make sure we do not introduce unnecessary intermediate `std::string`
instances.
Create a Utf8Generic converter for use with non-Windows platforms.

### Motivation and Context
This reduces heap contention for a P1 customer.


![image](https://github.com/microsoft/onnxruntime/assets/11303988/fd39fb01-7361-47d2-8f83-69dbc3bbc65c)
2024-04-09 16:10:31 -07:00
pengwa
81005e2c92
Optimize constant sharing perf (#20143)
### Optimize constant sharing perf

by avoiding renaming the first time we detect a constant pattern.

Currently, every time ConstantSharing runs, for each initializer whose
pattern does not exist yet, we create a new NodeArg with a
unique name. Later, if other initializers share the same pattern,
they are replaced by that NodeArg.

The problem is that even when there are no real constant sharing cases, we still
modify the graph for each initializer. This is not needed.
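
The idea can be sketched as follows. This is a simplified Python model, not the actual C++ ConstantSharing transformer; `share_constants` and the dictionary-based graph representation are hypothetical. The first initializer seen with a given pattern keeps its own name, and only later duplicates are redirected to it, so a graph with nothing to share is left untouched.

```python
def share_constants(initializers):
    """Map duplicate initializers onto the first one with the same pattern.

    `initializers` maps an initializer name to a hashable pattern
    (for example a (dtype, values) tuple). Returns {duplicate: kept_name}.
    """
    first_seen = {}    # pattern -> name of the first initializer that has it
    replacements = {}  # duplicate name -> name it should be replaced by
    for name, pattern in initializers.items():
        if pattern not in first_seen:
            # First occurrence: remember it, but do not rename anything.
            first_seen[pattern] = name
        else:
            # Later occurrence: reuse the initializer recorded earlier.
            replacements[name] = first_seen[pattern]
    return replacements


graph = {
    "a": ("float32", (1.0,)),
    "b": ("float32", (1.0,)),
    "c": ("int64", (0,)),
}
print(share_constants(graph))  # {'b': 'a'}
```

An empty result means the graph needs no modification at all, which is the case this PR aims to make cheap.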

### Motivation and Context
2024-04-09 12:04:36 +08:00
danyue
07b5377f7c
Add INT16 and UINT16 compatibility for relu_quantizelinear (#20187)
### Description
The relu_quantizelinear transformer has a problem that causes wrong
results. This PR fixes it.


### Motivation and Context
The transformer does not take into account the case where Q's zero point is
tensor(int16) or tensor(uint16), so an error occurs when it
is.
How to verify:
```python
import onnx
import onnxruntime as ort
import numpy as np
model_name = 'relu_quantize_testcase.onnx'
model = onnx.load(model_name)
# np.random.rand takes the dimensions as separate ints and returns float64
ort_input0 = np.random.rand(1, 64, 64, 128).astype(np.float32)
# infer with GraphOptimizationLevel=0
so = ort.SessionOptions()
so.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
ort_session = ort.InferenceSession(
    model_name,
    providers=["CPUExecutionProvider"],
    sess_options=so
)

outputs = [x.name for x in ort_session.get_outputs()]

ort_outs_mod = ort_session.run(outputs, { 'generator/conv2d_input/conv2d/Conv2D:0': ort_input0} )
del ort_session

# infer with GraphOptimizationLevel=default
model_orig = onnx.load(model_name)
ort_session_orig = ort.InferenceSession(model_orig.SerializeToString())

outputs_orig = [x.name for x in ort_session_orig.get_outputs()]

ort_outs_orig = ort_session_orig.run(outputs_orig,  { 'generator/conv2d_input/conv2d/Conv2D:0': ort_input0}  )

# diff
print(np.linalg.norm(ort_outs_mod[0].astype(np.float32) - ort_outs_orig[0].astype(np.float32)))

del ort_session_orig
```
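
Why the zero point type matters can be sketched in plain Python. This is a minimal illustration under the assumption that Relu can be dropped in front of Q only when Q's zero point equals the lowest value of the quantized type (the PR text does not spell out the exact condition); `quantize_linear` and `relu_is_redundant` are hypothetical helpers, not onnxruntime APIs.

```python
# Value ranges of the quantized types discussed in this PR.
QUANT_RANGES = {
    "int8": (-128, 127),
    "uint8": (0, 255),
    "int16": (-32768, 32767),
    "uint16": (0, 65535),
}


def quantize_linear(x, scale, zero_point, dtype):
    """Scalar version of q = saturate(round(x / scale) + zero_point)."""
    qmin, qmax = QUANT_RANGES[dtype]
    # Python's round() rounds halves to even, like ONNX QuantizeLinear.
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def relu(x):
    return max(x, 0.0)


def relu_is_redundant(zero_point, dtype):
    """Assumed condition: negative inputs already saturate to Q(0)."""
    return zero_point == QUANT_RANGES[dtype][0]


for x in (-3.0, -0.1, 0.0, 0.2, 5.0):
    with_relu = quantize_linear(relu(x), 0.25, -32768, "int16")
    without_relu = quantize_linear(x, 0.25, -32768, "int16")
    print(x, with_relu, without_relu)
```

When the zero point is not the lowest value (for example 0 with int16), negative inputs quantize to values below the zero point and Relu is no longer redundant.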

[relu_quantize_testcase.zip](https://github.com/microsoft/onnxruntime/files/14848160/relu_quantize_testcase.zip)

---------

Co-authored-by: genmingz <genming.zhong@amd.com>
2024-04-08 19:41:43 -07:00
pengwa
41acd8c543
Support more ops for recompute (#20234)
### Support more ops for recompute

To cover Mistral model, and support padding elimination ops.

### Motivation and Context
2024-04-09 09:24:48 +08:00
Adam Louly
22a61a3cf5
Fix Mixtral Parity test to keep it consistent with Transformers. (#20210)
### Description
I recently opened a PR in the Hugging Face Transformers repo to fix an issue in the
indexing part.

https://github.com/huggingface/transformers/issues/29857

The ONNX exporter was failing because of the tolist() conversion, so we had
to remove it.

The same code is also part of our codebase, so this PR
keeps the two consistent.
2024-04-08 13:04:12 -07:00
wejoncy
908a76d675
fix "4bit quantization scales and zeropoint tensor shape" (#19986)
### Description



### Motivation and Context
2024-04-08 10:15:28 -07:00
Jiajie Hu
23d3afd4fe
[js/webgpu] Implement com.microsoft.RotaryEmbedding (#20209)
### Description

https://github.com/microsoft/onnxruntime/blob/main/docs/ContribOperators.md#commicrosoftrotaryembedding

### Motivation and Context
As per customer request, this helps Phi-2 and Gemma.
2024-04-08 09:11:26 -07:00
cloudhan
e19c778934
Improve KE for commandline and programmatically tuning dispatch (#18778) 2024-04-08 11:08:59 +08:00
Ye Wang
cc3faba616
Support seq_len > 64K in rotary embedding cuda kernel (#20204)
### Description



### Motivation and Context
2024-04-05 19:52:55 -07:00
Francesco
6af02ae06a
Remove non-existing function call (#19416)
This function call is confusing because the function it calls has no
definition. It was correctly replaced from compute_data
to compute_range, but the function call was reintroduced in a later PR.

### Description
Problem as described in [this
issue](https://github.com/microsoft/onnxruntime/issues/18893).

In the examples, different calls of compute_range() from calibrate.py
can be found, and also in calibrate.py itself.

The problem is that it was [replaced
here](https://github.com/microsoft/onnxruntime/pull/16550/files#diff-75e84436a983e17527f8b5bc585087e7ad75b3b515c2101c2a82dcaecca490de)
from `compute_range()` to `compute_data() -> TensorsData` and then
mistakenly [added as a call
here](https://github.com/microsoft/onnxruntime/pull/17029/files#diff-75e84436a983e17527f8b5bc585087e7ad75b3b515c2101c2a82dcaecca490de).

### Motivation and Context
This PR proposes removing the confusing call
`self.calibrate_range()` in calibrate.py. Once the change is
packaged, the examples in the onnx-runtime-examples repository
must be adapted, since they already do not work. Uses of
`compute_range()` in the examples are linked in [this
issue](https://github.com/microsoft/onnxruntime/issues/18893).
2024-04-05 19:48:48 -07:00
Adrian Lizarraga
05d97e8d18
Update QNN python packages to use QNN SDK version 2.19.2 (#20213)
### Description
Update QNN python packages to use QNN SDK version 2.19.2.



### Motivation and Context
Our CI builds already use QNN SDK version 2.19.2. We should make sure
the ort-nightly-qnn python packages are also built with the same QNN SDK
version.
2024-04-05 17:15:25 -07:00
Yi Zhang
23a5d0a305
Extend time out in Windows GPU packaging jobs (#20207)
### Description
Extend Windows GPU Packaging job building time out to 6 hours, and test
stage to 3 hours.



### Motivation and Context
There are still a few timeout issues after refactoring. The probability
is about 20% in
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=84.
I found that when the build becomes slow it can still finish in 4 hours
(https://dev.azure.com/aiinfra/Lotus/_build/results?buildId=434340&view=logs&j=0c6ee496-b38e-55a9-3699-12934156e90f),
although in most cases it takes only about 30 minutes.
This is unlike before, when the build could not be completed.
So this PR extends the timeout to 6 hours.

One interesting observation: if one Windows GPU job becomes slow, all
other Windows GPU jobs in the same run become slow too.
So I suspect it has something to do with ADO or virtualization. That is,
it is not completely random.
https://dev.azure.com/aiinfra/Lotus/_build?definitionId=841
2024-04-06 08:03:42 +08:00
Andrew Grigorev
a6611409cc
Fix HalideIR title in third party notices reference (#20190) 2024-04-05 11:12:43 -07:00
dependabot[bot]
2a323eb670
Bump Sixlabors.ImageSharp from 2.1.1 to 2.1.7 in /csharp/sample/Microsoft.ML.OnnxRuntime.ResNet50v2Sample (#19805)
Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.1 to 2.1.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/SixLabors/ImageSharp/releases">Sixlabors.ImageSharp's
releases</a>.</em></p>
<blockquote>
<h2>v2.1.7</h2>
<h2>What's Changed</h2>
<ul>
<li>[release/2.1] Disallow allocation attempts of unrepresentable sizes
by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2553">SixLabors/ImageSharp#2553</a></li>
<li>[release/2.1] Tiff decoding robustness improvements (<a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2550">#2550</a>)
by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2554">SixLabors/ImageSharp#2554</a></li>
<li>[release/2.1] PBM decoder robustness improvements and
BufferedReadStream observability by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2555">SixLabors/ImageSharp#2555</a></li>
<li>Backport 2681 by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2688">SixLabors/ImageSharp#2688</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7">https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7</a></p>
<h2>v2.1.6</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport - Handle EOF in Jpeg bit reader when data is bad to prevent
DOS attack. by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2524">SixLabors/ImageSharp#2524</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6">https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6</a></p>
<h2>v2.1.5</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2501">#2501</a>
by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2509">SixLabors/ImageSharp#2509</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5">https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5</a></p>
<h2>v2.1.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport WebP fix to 2.1 by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2420">SixLabors/ImageSharp#2420</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4">https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4</a></p>
<h2>v2.1.3</h2>
<h2>What's Changed</h2>
<ul>
<li>V2 Backport: 2133, 2154 by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2157">SixLabors/ImageSharp#2157</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3">https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3</a></p>
<h2>v2.1.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport - Issue 2123 by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2126">SixLabors/ImageSharp#2126</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2">https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="fa7d712702"><code>fa7d712</code></a>
Merge pull request <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2688">#2688</a>
from SixLabors/js/backport-2681</li>
<li><a
href="36b3533cc3"><code>36b3533</code></a>
Use correct property to disable upstream warnings.</li>
<li><a
href="94bb7615a1"><code>94bb761</code></a>
Update ImageSharp.csproj</li>
<li><a
href="3ea2574726"><code>3ea2574</code></a>
Update PngDecoderCore.cs</li>
<li><a
href="e74a55fbfd"><code>e74a55f</code></a>
[release/2.1] PBM decoder robustness improvements and BufferedReadStream
obse...</li>
<li><a
href="749b1c04d7"><code>749b1c0</code></a>
[release/2.1] Tiff decoding robustness improvements (<a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2550">#2550</a>)
(<a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2554">#2554</a>)</li>
<li><a
href="3064b78927"><code>3064b78</code></a>
Merge pull request <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2553">#2553</a>
from SixLabors/backport/2.1.x/2545</li>
<li><a
href="f36ec12695"><code>f36ec12</code></a>
Disallow allocation attempts of unrepresentable sizes </li>
<li><a
href="688e242a84"><code>688e242</code></a>
Merge pull request <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2524">#2524</a>
from SixLabors/js/backport-fix-jpeg-dos</li>
<li><a
href="0f17a8be9c"><code>0f17a8b</code></a>
Handle EOF in Jpeg bit reader when data is bad to prevent DOS
attack.</li>
<li>Additional commits viewable in <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Sixlabors.ImageSharp&package-manager=nuget&previous-version=2.1.1&new-version=2.1.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-05 11:11:52 -07:00
Hector Li
1ccb164c12
Improve the script to add Q, DQ nodes around EPContext node (#20107)
Improve the script that adds Q and DQ nodes around the EPContext node so that the wrapper model uses float data as inputs and outputs. Users don't need to quantize or dequantize the data in their application.
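
The effect of the wrapper can be illustrated with scalar arithmetic. This is a minimal sketch and not the script itself: `quantize`, `dequantize` and `wrap_with_q_dq` are hypothetical names, the quantized model is a stand-in callable, and the uint16 default range is an assumption chosen for illustration.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=65535):
    """Q node: float -> integer, saturated to [qmin, qmax]."""
    return max(qmin, min(qmax, round(x / scale) + zero_point))


def dequantize(q, scale, zero_point):
    """DQ node: integer -> float."""
    return (q - zero_point) * scale


def wrap_with_q_dq(quantized_model, in_params, out_params):
    """Return a float-in, float-out callable around a quantized model.

    in_params and out_params are (scale, zero_point) pairs.
    """
    def float_model(x):
        q_in = quantize(x, *in_params)          # Q in front of the model
        q_out = quantized_model(q_in)           # model sees only integers
        return dequantize(q_out, *out_params)   # DQ behind the model
    return float_model


# Stand-in quantized model that passes its input through unchanged.
float_model = wrap_with_q_dq(lambda q: q, (0.25, 128), (0.25, 128))
print(float_model(1.5))  # 1.5
```

The caller only ever handles floats; the scale and zero point stay inside the wrapper, which is what lets an application skip its own quantization code.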
2024-04-05 10:12:01 -07:00
Guenther Schmuelling
c529e05e38
fix ConvTranspose 1D (#20194) 2024-04-05 10:05:32 -07:00
dependabot[bot]
4f2d454211
Bump Sixlabors.ImageSharp from 2.1.1 to 2.1.7 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#19806)
Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.1 to 2.1.7.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/SixLabors/ImageSharp/releases">Sixlabors.ImageSharp's
releases</a>.</em></p>
<blockquote>
<h2>v2.1.7</h2>
<h2>What's Changed</h2>
<ul>
<li>[release/2.1] Disallow allocation attempts of unrepresentable sizes
by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2553">SixLabors/ImageSharp#2553</a></li>
<li>[release/2.1] Tiff decoding robustness improvements (<a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2550">#2550</a>)
by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2554">SixLabors/ImageSharp#2554</a></li>
<li>[release/2.1] PBM decoder robustness improvements and
BufferedReadStream observability by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2555">SixLabors/ImageSharp#2555</a></li>
<li>Backport 2681 by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2688">SixLabors/ImageSharp#2688</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7">https://github.com/SixLabors/ImageSharp/compare/v2.1.6...v2.1.7</a></p>
<h2>v2.1.6</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport - Handle EOF in Jpeg bit reader when data is bad to prevent
DOS attack. by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2524">SixLabors/ImageSharp#2524</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6">https://github.com/SixLabors/ImageSharp/compare/v2.1.5...v2.1.6</a></p>
<h2>v2.1.5</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2501">#2501</a>
by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2509">SixLabors/ImageSharp#2509</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5">https://github.com/SixLabors/ImageSharp/compare/v2.1.4...v2.1.5</a></p>
<h2>v2.1.4</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport WebP fix to 2.1 by <a
href="https://github.com/antonfirsov"><code>@​antonfirsov</code></a> in
<a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2420">SixLabors/ImageSharp#2420</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4">https://github.com/SixLabors/ImageSharp/compare/v2.1.3...v2.1.4</a></p>
<h2>v2.1.3</h2>
<h2>What's Changed</h2>
<ul>
<li>V2 Backport: 2133, 2154 by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2157">SixLabors/ImageSharp#2157</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3">https://github.com/SixLabors/ImageSharp/compare/v2.1.2...v2.1.3</a></p>
<h2>v2.1.2</h2>
<h2>What's Changed</h2>
<ul>
<li>Backport - Issue 2123 by <a
href="https://github.com/JimBobSquarePants"><code>@​JimBobSquarePants</code></a>
in <a
href="https://redirect.github.com/SixLabors/ImageSharp/pull/2126">SixLabors/ImageSharp#2126</a></li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2">https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.2</a></p>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="fa7d712702"><code>fa7d712</code></a>
Merge pull request <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2688">#2688</a>
from SixLabors/js/backport-2681</li>
<li><a
href="36b3533cc3"><code>36b3533</code></a>
Use correct property to disable upstream warnings.</li>
<li><a
href="94bb7615a1"><code>94bb761</code></a>
Update ImageSharp.csproj</li>
<li><a
href="3ea2574726"><code>3ea2574</code></a>
Update PngDecoderCore.cs</li>
<li><a
href="e74a55fbfd"><code>e74a55f</code></a>
[release/2.1] PBM decoder robustness improvements and BufferedReadStream
obse...</li>
<li><a
href="749b1c04d7"><code>749b1c0</code></a>
[release/2.1] Tiff decoding robustness improvements (<a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2550">#2550</a>)
(<a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2554">#2554</a>)</li>
<li><a
href="3064b78927"><code>3064b78</code></a>
Merge pull request <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2553">#2553</a>
from SixLabors/backport/2.1.x/2545</li>
<li><a
href="f36ec12695"><code>f36ec12</code></a>
Disallow allocation attempts of unrepresentable sizes </li>
<li><a
href="688e242a84"><code>688e242</code></a>
Merge pull request <a
href="https://redirect.github.com/SixLabors/ImageSharp/issues/2524">#2524</a>
from SixLabors/js/backport-fix-jpeg-dos</li>
<li><a
href="0f17a8be9c"><code>0f17a8b</code></a>
Handle EOF in Jpeg bit reader when data is bad to prevent DOS
attack.</li>
<li>Additional commits viewable in <a
href="https://github.com/SixLabors/ImageSharp/compare/v2.1.1...v2.1.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=Sixlabors.ImageSharp&package-manager=nuget&previous-version=2.1.1&new-version=2.1.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2024-04-05 08:32:18 -07:00
Edward Chen
2b3071119a
Add onnxruntime/test/run_benchmark.py helper script. (#19234)
### Description
Add onnxruntime/test/run_benchmark.py helper script to repeat benchmark
runs until a target coefficient of variation is reached. It works with
[Google Benchmark](https://github.com/google/benchmark) programs like
`onnxruntime_mlas_benchmark`.

### Motivation and Context
Sometimes there is variability in benchmark run results. This automates
the repeated running needed to get results that are stable enough.
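
The stopping rule described above can be sketched like this. It is an illustration of the technique only, not the contents of run_benchmark.py; `run_until_stable`, its parameters and its defaults are hypothetical, and `run_once` stands for whatever launches the benchmark and returns one timing.

```python
import statistics


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean."""
    return statistics.stdev(samples) / statistics.mean(samples)


def run_until_stable(run_once, target_cv=0.05, min_runs=3, max_runs=20):
    """Call run_once() until the timings are stable enough or max_runs is hit.

    Returns the list of collected timings.
    """
    samples = []
    for _ in range(max_runs):
        samples.append(run_once())
        # stdev needs at least two samples, so never check before that.
        enough = len(samples) >= max(2, min_runs)
        if enough and coefficient_of_variation(samples) <= target_cv:
            break
    return samples


# A perfectly steady benchmark stops as soon as min_runs is reached.
steady = run_until_stable(lambda: 10.0)
print(len(steady))  # 3
```

Capping the loop with `max_runs` keeps a noisy machine from repeating the benchmark forever.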
2024-04-05 07:02:01 -07:00
Hans
6abfb6b928
[js/rn] Support load external data (#20090)
Support load external data by passing local model path
2024-04-05 05:55:03 -07:00
Scott McKay
f61cca1b8f
NNAPI: Improve MatMul diagnostic output (#19721)
### Description
Re-order so that we don't get two messages for the one node.

Currently the batched matmul 'not supported' message will appear for 2D
input which is valid, which can be confusing to understand.

Change the order so we only check if batched matmul can be used when the
input ranks are > 3, as that is one of the requirements.

c311d1faf5/onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder_helpers.cc (L257-L264)
2024-04-04 21:58:39 -07:00
Thomas Boby
254bdbb19d
OneDNN/dnnl: Fix filepath after dnnl move (#20086)
### Description
This adjusts the path used in the nuget script for dnnl to the new
location of the file.

There isn't a CI pipeline for this as far as I can tell, and I can't
easily confirm this change works on master, so please check.

### Motivation and Context
It is currently not possible to build onednn nuget packages. It's
possible that the correct action would be to move the file rather than fix this
path, but I'm not familiar enough with the repository layout.

---------

Co-authored-by: Tianlei Wu <tlwu@microsoft.com>
2024-04-04 21:24:49 -07:00
Yi Zhang
4ea54b82f9
[Fix] Upload training CUDA daily wheel (#20183)
### Description



### Motivation and Context
2024-04-03 13:18:26 +08:00
Andrew Fantino
7303a90f49
Fix build errors from date/date.h C++20 compatibility (#20139)
### Description
For C++ standards >= 20, use `std::chrono::operator<<` in place of
`date::operator<<` to fix ambiguous operator compile error.

### Motivation and Context
The external dependency HowardHinnant/date has a conflict with
std::chrono for >=C++20.
Solves #20137
2024-04-02 22:10:25 -07:00
Yi Zhang
dae77e6014
Support building Windows CUDA with Ninja (#20176)
### How to run it locally
1. conda install ninja
2. "C:\Program Files\Microsoft Visual
Studio\2022\Enterprise\VC\Auxiliary\Build\vcvarsall.bat" x64
3. python.exe {ort_repo}\tools\ci_build\build.py --config RelWithDebInfo
--build_dir {ort_repo}\build_cuda --skip_submodule_sync --build_csharp
--update --parallel --cmake_generator "Ninja" --build_shared_lib
--enable_onnx_tests --enable_pybind --build_java --build_nodejs
--use_cuda "--cuda_home=C:\Program Files\NVIDIA GPU Computing
Toolkit\CUDA\v11.8" --enable_cuda_profiling --cmake_extra_defines
CMAKE_CUDA_ARCHITECTURES=60
4. cd build_cuda\RelWithDebInfo
5.  cmake --build . -j16

### Motivation and Context
In packaging pipelines, we often hit an intermittent issue where
building with CUDA on Windows takes too much time,
although moving the build to the CPU
machine has reduced it a lot.
We plan to build with Ninja instead of msbuild in the packaging
pipelines so that nvcc can run in parallel.
Supporting it locally is the first step.
2024-04-03 11:19:31 +08:00
Yulong Wang
fa1917b81b
[js/webgpu] add validation to workgroup size (#20110)
### Description
add validation to workgroup size in `shaderHelper.mainStart()`.
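
As an illustration of what such a validation involves, here is a sketch in Python rather than the TypeScript used by the WebGPU backend. The limit values are the default limits from the WebGPU specification; `validate_workgroup_size` is a hypothetical helper and the actual check in `shaderHelper.mainStart()` may differ.

```python
# Default limits from the WebGPU specification (a device may allow more).
DEFAULT_LIMITS = {
    "maxComputeWorkgroupSizeX": 256,
    "maxComputeWorkgroupSizeY": 256,
    "maxComputeWorkgroupSizeZ": 64,
    "maxComputeInvocationsPerWorkgroup": 256,
}


def validate_workgroup_size(x, y=1, z=1, limits=DEFAULT_LIMITS):
    """Raise ValueError if (x, y, z) cannot be used as a workgroup size."""
    for axis, size in (("X", x), ("Y", y), ("Z", z)):
        limit = limits[f"maxComputeWorkgroupSize{axis}"]
        if not isinstance(size, int) or not 1 <= size <= limit:
            raise ValueError(f"workgroup size {axis}={size} must be in [1, {limit}]")
    total = x * y * z
    if total > limits["maxComputeInvocationsPerWorkgroup"]:
        raise ValueError(f"workgroup size {x}x{y}x{z} has {total} invocations")
    return (x, y, z)


print(validate_workgroup_size(64))  # (64, 1, 1)
```

Checking the product as well as each axis matters: 16x16x2 passes every per-axis limit but exceeds the default of 256 invocations per workgroup.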
2024-04-02 19:29:20 -07:00
Shubham Bhokare
be831e1ba3
Export of Openai Whisper with batched prompts (#19854)
Adds an example to demonstrate the export of the OpenAI Whisper
implementation with batch_size > 1 and the addition of prompts for each audio
snippet.

Also handles the scenario for when prompts are not of the same size. For
example if our prompt ids are [p1_id_1, p1_id_2] and [p2_id_1], the
final decoder_input_ids will look as such after padding:
`[prev_token, p1_id_1, p1_id_2, start_token, lang_token,
transcribe_token]
[prev_token, p2_id_1, PAD_TOKEN, start_token, lang_token,
transcribe_token]`
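
The padding layout shown above can be reproduced with a few lines of Python. This is a sketch of the layout only, not the code added by the PR; `build_decoder_input_ids` is a hypothetical helper and the token ids in the usage example are made-up placeholders.

```python
def build_decoder_input_ids(prompts, prev_token, start_tokens, pad_token):
    """Pad each prompt to the longest one, then append the start tokens.

    Each row is [prev_token, *prompt, *padding, *start_tokens], so all rows
    have the same length and can be batched.
    """
    longest = max((len(p) for p in prompts), default=0)
    rows = []
    for prompt in prompts:
        padding = [pad_token] * (longest - len(prompt))
        rows.append([prev_token, *prompt, *padding, *start_tokens])
    return rows


# Placeholder ids: 1 = prev_token, 0 = PAD_TOKEN,
# (2, 3, 4) = start_token, lang_token, transcribe_token.
batch = build_decoder_input_ids([[11, 12], [21]], 1, (2, 3, 4), 0)
print(batch)  # [[1, 11, 12, 2, 3, 4], [1, 21, 0, 2, 3, 4]]
```

The padding goes between the prompt and the start tokens, so the start, language and transcribe tokens line up in the same columns for every row.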

---------

Co-authored-by: kunal-vaishnavi <115581922+kunal-vaishnavi@users.noreply.github.com>
2024-04-02 17:01:48 -07:00
Rachel Guo
19793de1b3
#19921 [Dup] LLC Core count calculations updated (#20171)
### Description

See #19921. This PR only addresses one comment:
https://github.com/microsoft/onnxruntime/pull/19921#discussion_r1543398640

Since #19921 is on an external branch, another pull request had to be opened for
this.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Sai Kishan Pampana <sai.kishan.pampana@intel.com>
Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
Co-authored-by: Jian Chen <cjian@microsoft.com>
2024-04-02 16:53:47 -07:00
Dmitri Smirnov
12e2538065
Add new SessionOptions config entry to disable specific transformers and rules (#20135)
### Description
<!-- Describe your changes. -->

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Certain transformers slow down session loading time while providing no
runtime perf benefits.
Allow clients to exclude them.
2024-04-02 16:33:05 -07:00
Chi Lo
e916929371
[TensorRT EP] Address compiler warnings on Windows (#20134)
A previous [PR](https://github.com/microsoft/onnxruntime/pull/19663)
changed the MSVC compiler warning level from
set_msvc_c_cpp_compiler_warning_level(3) to
set_msvc_c_cpp_compiler_warning_level(4) when using the CUDA EP (it also
applies to the TRT EP).
Some warnings still need to be addressed in the TRT EP code.
2024-04-02 10:39:46 -07:00
Xu Xing
a2998e5d42
[js/webgpu] Use global id in attention and instance-norm (#20008)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-02 01:42:39 -07:00
Adam Pocock
262b6bd3b7
[java][DML EP] Modifying dml_provider_factory.h so it can compile as a C header file (#20157)
### Description
The dml_provider_factory header file can't be used in C programs as it
defines C++ inline operators. This PR rearranges that header file so
that it looks like valid C when used from C, and also makes a couple of
small modifications to the Java code so it correctly binds to the DML EP
at build time.

I'm having some difficulty testing it as I think it's pulling in the old
version of DirectML on my computer and I can't figure out what the
library loading path is in Java to make it look at the recent version I
downloaded. So the test I added fails with:

```
InferenceTest > testDirectML() FAILED
    ai.onnxruntime.OrtException: Error code - ORT_RUNTIME_EXCEPTION - message: Exception during initialization: <path-to-ort>\onnxruntime\core\providers\dml\DmlExecutionProvider\src\AbiCustomRegistry.cpp(518)\onnxruntime.dll!00007FFF74819333: (caller: 00007FFF74793509) Exception(3) tid(4f58) 80070057 The parameter is incorrect.
        at app//ai.onnxruntime.OrtSession.createSession(Native Method)
        at app//ai.onnxruntime.OrtSession.<init>(OrtSession.java:74)
        at app//ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:236)
        at app//ai.onnxruntime.OrtEnvironment.createSession(OrtEnvironment.java:221)
        at app//ai.onnxruntime.InferenceTest.openSessionSqueezeNet(InferenceTest.java:1961)
        at app//ai.onnxruntime.InferenceTest.runProvider(InferenceTest.java:665)
        at app//ai.onnxruntime.InferenceTest.testDirectML(InferenceTest.java:657)
```

But it does correctly compile, and this error seems very similar to
other issues with the DML provider when it doesn't like a model due to
the loaded library being old. The test is using the squeezenet file
that's been in the repo since 2019. If someone can help me figure out
how to get the right version of DML in the library path I can test it
more on my end. I tried adding the folder with the new version into the
system path, but I'm not very familiar with Windows' library loading
behaviour.

### Motivation and Context
Fixes #19656 to allow use of the DirectML EP from ORT Java.

cc @martinb35
2024-04-01 21:58:50 -07:00
Xiaoyu
3979f53aa4
Update api backward compatibility (#20136)
### Description
Update api backward compatibility

### Motivation and Context
Update api backward compatibility
2024-04-01 21:37:56 -07:00
wangshuai09
3e2b659fce
[CANN] Add dump_om_model flag (#20075)
### Description
Adds a new `dump_om_model` flag for the **CANN EP**, which defaults to "True".

### Motivation and Context
When building an ONNX model with the CANN EP, the intermediate **OM
(offline model for Ascend NPU)** is automatically saved. Some users
don't want to dump the OM when resources are limited.
This PR resolves this situation with `dump_om_model=False`.
2024-04-01 21:35:29 -07:00
Dhruv Matani
742d413586
Fix bug related to export failure for DynamicQuantizeLSTM [issue 15465] (#20160)
### Description

See issue 15465: https://github.com/microsoft/onnxruntime/issues/15465

This PR just applies the workaround suggested in the thread, which I and
numerous others on the thread have validated. It makes it possible to
successfully export a PyTorch model with LSTM layers that are
dynamically quantized by ONNX.



### Motivation and Context

It is not possible to successfully export a dynamically quantized LSTM
model that I have trained for use in the onnx runtime without this
change.

Currently, this workaround lives as a local change in my python package
directory, and makes it basically impossible for anyone else at the
place I work at to successfully export the quantized model that I am
exporting.


See issue 15465: https://github.com/microsoft/onnxruntime/issues/15465

Co-authored-by: Dhruv Matani <dhruv.matani@grammarly.com>
2024-04-01 21:33:00 -07:00
Yufeng Li
91654988fd
optimize threading of mha (#20088)
### Description
<!-- Describe your changes. -->
The cost computation of ComputeVxAttentionScore is wrong. It should be
sequence_length * v_head_size * total_sequence_length instead of
sequence_length * v_head_size * sequence_length.
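
A small arithmetic sketch shows how far apart the two formulas can be. It assumes that `total_sequence_length` covers past plus current tokens, which is not spelled out above, and the shapes are hypothetical. The function name is illustrative, not the one used in the kernel.

```python
def vx_attention_cost(sequence_length: int, v_head_size: int,
                      total_sequence_length: int) -> int:
    """Corrected cost estimate: sequence_length * v_head_size * total_sequence_length."""
    return sequence_length * v_head_size * total_sequence_length


# Hypothetical decoding step: 1 new token, 1024 cached tokens (assumption),
# so total_sequence_length = 1025.
sequence_length, v_head_size, total_sequence_length = 1, 64, 1025

old_cost = sequence_length * v_head_size * sequence_length  # previous formula
new_cost = vx_attention_cost(sequence_length, v_head_size, total_sequence_length)
print(old_cost, new_cost)  # 64 65600
```

With a long cache, the previous formula underestimates the work by roughly the ratio `total_sequence_length / sequence_length`.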

The PR also fine-tunes the cost computation.

On my local box with an i9 CPU, the performance is the same as the
unfused version, but it is much faster on an Azure VM with 16 threads.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

https://github.com/microsoft/onnxruntime/issues/19924
2024-04-01 21:32:36 -07:00
Atanas Dimitrov
9d06e1bfa4
Label encoder fusion (#19761)
### Description
Created a new `LabelEncoderFusion` pass. This is useful in models that
result from automatic conversion tools related to data science:
sometimes the produced model contains consecutive `LabelEncoder`-s.
To merge 2 `LabelEncoder`-s, the optimizer propagates the outputs of the
first encoder through the second one.


### Motivation and Context
This enhances the capabilities of the `onnxruntime::optimizer` by fusing
consecutive `LabelEncoder` nodes.


### Fusion examples
```
Applying fusion
node1: (a,C) (b,B) (c,A) -> Default: _Unused
node2: (A,1) (B,2) (C,3) -> Default: -1
fused: (a,3) (b,2) (c,1) -> Default: -1
Applying fusion
node1: (a,C) (b,B) (c,A) -> Default: D
node2: (A,a) (B,b) (C,c) (D,d) -> Default: default
fused: (a,c) (b,b) (c,a) -> Default: d
Applying fusion
node1: (a,0) (b,1) (c,2) -> Default: -1
node2: (2,a) (1,b) (0,c) -> Default: default
fused: (a,c) (b,b) (c,a) -> Default: default
Applying fusion
node1: (a,3) (b,2) (c,1) -> Default: -1
node2: (1,a) (2,b) (3,c) -> Default: d
fused: (a,c) (b,b) (c,a) -> Default: d
```
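
The propagation rule behind these examples can be sketched with plain Python dicts. This is only an illustration of the idea, not the optimizer's C++ implementation; `fuse_label_encoders` is a hypothetical helper.

```python
def fuse_label_encoders(first, first_default, second, second_default):
    """Fuse two consecutive key->value mappings into a single mapping.

    Each output of the first encoder (including its default) is looked up
    in the second encoder; values missing there fall back to the second
    encoder's default.
    """
    fused = {key: second.get(value, second_default) for key, value in first.items()}
    fused_default = second.get(first_default, second_default)
    return fused, fused_default


# First fusion example from the log above.
fused, default = fuse_label_encoders(
    {"a": "C", "b": "B", "c": "A"}, "_Unused",
    {"A": 1, "B": 2, "C": 3}, -1,
)
print(fused, default)  # {'a': 3, 'b': 2, 'c': 1} -1
```

The default of the first encoder is treated like any other output, which is why the second example yields `d` as the fused default.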

---------

Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2024-04-01 09:41:10 -07:00
Yi Zhang
523ef04240
enable lto in Python-CUDA-Packaging Pipline (#20164)
### Description
Except for the [Python-CUDA-Packaging
pipeline](https://dev.azure.com/aiinfra/Lotus/_build?definitionId=1299&_a=summary),
all Windows CUDA packaging jobs are now running well.
A comparison shows that enable_lto isn't set in this pipeline, which
might be one root cause of the random hang.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-04-01 15:42:28 +08:00
Sumit Agarwal
e1e292f94c
[DML EP] DML Graph Serialization Bug (#19748)
### Description
This pull request addresses several issues:

- The DML Graph's nodes were not sorted in a topologically ordered
sequence, leading to crashes during deserialization when a child node
preceded its parent node. This PR resolves this issue by implementing a
topological sorting algorithm before serialization.

- During the `RemoveUnconnectedNodes` process:
  - We update `intermediateEdge.FromNodeIndex`. Additionally, we must
    update `intermediateEdge.Name` when it includes
    `intermediateEdge.FromNodeIndex`, as serialization/deserialization
    relies heavily on edge names.
  - We also eliminate unused edges. Consequently, we must erase the
    now-unused inputs from the corresponding maps
    `serializedGraphInputIndexToSubgraphInputIndex` and
    `serializedGraphLargeConstantNameToSubgraphInputIndex`.
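
The ordering requirement from the first item (every parent node is serialized before its children) can be illustrated with Kahn's algorithm. The description above does not say which topological sorting algorithm is used, so this is only a sketch under that assumption; the node indices and edge list are hypothetical.

```python
from collections import deque


def topological_order(num_nodes, edges):
    """Return node indices so that every edge (src, dst) has src before dst."""
    successors = [[] for _ in range(num_nodes)]
    in_degree = [0] * num_nodes
    for src, dst in edges:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Start with nodes that have no incoming edges.
    ready = deque(i for i in range(num_nodes) if in_degree[i] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    if len(order) != num_nodes:
        raise ValueError("graph contains a cycle")
    return order


# Node 0 is a child of node 2, so index order alone would put the child first.
order = topological_order(3, [(2, 0), (0, 1)])
print(order)  # [2, 0, 1]
```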


### Motivation and Context
Why is this change required? What problem does it solve?
A few public ONNX Zoo models were crashing during
deserialization.
<!-- - - If it fixes an open issue, please link to the issue here. -->

---------

Co-authored-by: Jeff Bloomfield <38966965+jeffbloo@users.noreply.github.com>
2024-03-31 14:41:42 -07:00
kunal-vaishnavi
a0ebd5fee5
Add flash attention v2 and INT4 CUDA for LLaMA E2E benchmarking (#20149)
### Description
This PR adds flash attention v2 and support for INT4 CUDA benchmarking
in PyTorch.

### Motivation and Context
The [flash attention v2](https://github.com/Dao-AILab/flash-attention)
algorithm helps improve model performance in PyTorch. Support for INT4
CUDA in PyTorch is done through the
[`bitsandbytes`](https://github.com/TimDettmers/bitsandbytes) package.
2024-03-29 23:09:37 -07:00
mo-ja
00244ea143
fix quantization errors of ConvTranspose with per_channel=True (#19996)
### Description
<!-- Describe your changes. -->
 - update axis value for per_channel quantization of QDQConv
   - we should use `axis=1` for ConvTranspose operator.
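
A short sketch of why the axis differs, based on the ONNX weight layouts and assuming `group = 1` for simplicity. The shapes and the helper name are hypothetical and not taken from the quantizer code.

```python
def per_channel_scale_count(weight_shape, axis):
    """Number of per-channel scales produced when quantizing along `axis`."""
    return weight_shape[axis]


# ONNX weight layouts (group = 1 assumed):
#   Conv:          (C_out, C_in, kH, kW)  -> output channels on axis 0
#   ConvTranspose: (C_in, C_out, kH, kW)  -> output channels on axis 1
conv_transpose_weight_shape = (8, 3, 3, 3)  # hypothetical: 8 input, 3 output channels
out_channels = 3

scales_axis0 = per_channel_scale_count(conv_transpose_weight_shape, axis=0)
scales_axis1 = per_channel_scale_count(conv_transpose_weight_shape, axis=1)
print(scales_axis0, scales_axis1)  # 8 3
```

Only `axis=1` yields one scale per output channel for ConvTranspose.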


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
- this PR fixes https://github.com/microsoft/onnxruntime/issues/19694,
which I have opened
2024-03-29 21:36:15 -07:00
Ye Wang
f3a864217f
Fix MoE tensor parallelism tests (#20147)
### Description
<!-- Describe your changes. -->
Previously, the expert weights were stored in row-major order. With the
updated cutlass extension introduced by
https://github.com/microsoft/onnxruntime/pull/20108, weights are stored
in column-major order, which aligns with the PyTorch implementation.
This change fixes the way the tensors are sliced across shards.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-29 16:10:09 -07:00
Jeff Bloomfield
2f31560430
Enable generic feature level devices in DML EP (#20114)
### Description
Enable NPUs supporting DXCORE_ADAPTER_ATTRIBUTE_D3D12_GENERIC_ML and
D3D_FEATURE_LEVEL_1_0_GENERIC with DML EP. This also begins ingesting DX
headers through the DirectX-Headers repo.

Note that this includes an update to cgamanifest.json for onnx-tensorrt,
which is triggered during re-generation due to prior changes to
deps.txt.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-29 14:37:30 -07:00
cao lei
604b284261
add API function GetAliasMap and ReleaseAliasMap in OrtCustomOp (#20145)
### Description
<!-- Describe your changes. -->
Add API function GetAliasMap and ReleaseAliasMap in OrtCustomOp 


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Add API function GetAliasMap and ReleaseAliasMap in OrtCustomOp
2024-03-29 13:49:56 -07:00
inisis
8396845806
fix shape inference bug (#19848)
### Description
For nodes like Add, their inputs should be merged dynamically.

### Motivation and Context
During shape inference for nodes like Add, `_onnx_infer_single_node` currently generates their inputs from the last node's output, but the inputs should be merged instead.
2024-03-29 13:06:27 -07:00
Adrian Lizarraga
b1a5eb255e
[Quant] Fix accuracy_level config option for MatMul 4bits quantizer (#20146)
### Description
Fixes code that extracts the accuracy level when creating a MatMulNBits
node in the `DefaultWeightOnlyQuantizer` class.


### Motivation and Context
Error from line 443: `AttributeError: 'DefaultWeightOnlyQuantizer'
object has no attribute 'accuracy_level'`. The solution is to access
`self.config.accuracy_level` instead of `self.accuracy_level`.
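
The failure mode can be reproduced with a minimal stand-in. The classes below are hypothetical and much smaller than the real quantizer. They only show why reading the attribute from the object fails while reading it from its config works.

```python
from dataclasses import dataclass


@dataclass
class QuantConfig:  # hypothetical stand-in for the quantizer's config object
    accuracy_level: int = 4


class WeightOnlyQuantizerSketch:  # hypothetical stand-in, not the real class
    def __init__(self, config: QuantConfig):
        self.config = config  # accuracy_level lives on the config only

    def node_attributes_buggy(self) -> dict:
        # Raises AttributeError: the quantizer itself has no accuracy_level.
        return {"accuracy_level": self.accuracy_level}

    def node_attributes_fixed(self) -> dict:
        return {"accuracy_level": self.config.accuracy_level}


quantizer = WeightOnlyQuantizerSketch(QuantConfig(accuracy_level=4))
print(quantizer.node_attributes_fixed())  # {'accuracy_level': 4}
```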

Relevant commit: https://github.com/microsoft/onnxruntime/pull/19106
2024-03-29 11:54:55 -07:00
Ye Wang
17919717b5
add QMoE (#20108)
### Description
<!-- Describe your changes. -->
1. Introduce the latest cutlass extension from TRTLLM, which gives us
the opportunity to upgrade cutlass (to 3.4) from the MoE side.
2. Fix a Windows build issue.
3. Add the Int4 MoE op and a unit test.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-03-29 10:24:19 -07:00
pengwa
2092bebc78
Fix transformer layer detection for recompute (#20106)
### Fix transformer layer detection for recompute

The original logic failed to detect the layer boundary node in the
Mistral model. This PR simplifies the search by using stronger pattern
matching, to make sure it is flexible enough to cover different
transformer variants.

Also adds a UT.

Adds a warning when the user enables layerwise recompute but no layer
boundary nodes are found.
2024-03-29 17:44:38 +08:00
cao lei
2a184ac1a1
use OrtCustomOp's new API GetMayInplace in CreateKernelCreateInfo (#20037)
### Description
<!-- Describe your changes. -->
use OrtCustomOp's new API GetMayInplace in CreateKernelCreateInfo to
hook the inplace map of custom ops


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
This PR is to use OrtCustomOp's new API GetMayInplace in
CreateKernelCreateInfo to hook the inplace map of custom ops
2024-03-28 20:45:37 -07:00
Adam Pocock
2f82400b13
[java] Java 21 build support (#19876)
### Description
Bump spotless and the Gradle wrapper to 6.25.0 and 8.6 respectively to
allow compiling ORT on Java 21. The build still targets Java 8.

I'm not sure if there will be CI changes necessary to use this PR,
specifically for the Gradle version as I don't know if that is cached
somewhere earlier in the CI build process.

The new Gradle version adds a warning that using `--source` and
`--target` to select the Java language version is obsolete, which is
annoying. We can fix it if we decide to only allow building on newer
versions of Java, while still supporting running on Java 8.

### Motivation and Context
Java 21 is the latest LTS release of Java and ORT should be able to
build on it.
2024-03-28 15:51:22 -07:00