Commit graph

12304 commits

Author SHA1 Message Date
edgchen1
c7358dad6f Merge remote-tracking branch 'origin/main' into edgchen1/remove_duplicate_header 2025-02-06 18:00:40 -08:00
microsoft-github-policy-service[bot]
65008cbb73
Auto-generated baselines by 1ES Pipeline Templates (#23603) 2025-02-06 17:06:29 -08:00
Tianlei Wu
09e5724f3b
[CUDA] Fix beam search of num_beams > 32 (#23599)
### Description
* Pass topk_scores to beam scorer in slow topk path.
* Add an env variable `ORT_BEAM_SEARCH_USE_FAST_TOPK` to enable/disable fast topk.
* Add a test case for slow topk path.

### Motivation and Context

This bug was introduced in
https://github.com/microsoft/onnxruntime/pull/16272

Beam search uses a fast CUDA kernel when the number of beams is <= 32. When
the beam size is larger than that threshold, another code path (a slower
CUDA kernel) is used to get the top-k. In this `slow topk path`, topk_scores
should be passed to the beam scorer, but it was not.

This bug causes incorrect results when num_beams > 32. It was not
found previously since such a large beam size is rarely used.
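The dispatch and the fix can be sketched host-side in plain C++ (hypothetical names, not ORT's actual CUDA kernel code):

```cpp
#include <algorithm>
#include <cstdlib>
#include <utility>
#include <vector>

// Sketch of the dispatch described above: the fast top-k kernel handles
// num_beams <= 32 unless ORT_BEAM_SEARCH_USE_FAST_TOPK=0 forces the slow
// path. The bug was that the slow path found the top-k indices but never
// forwarded topk_scores to the beam scorer; both paths must return them.
constexpr int kFastTopKMaxBeams = 32;

bool UseFastTopK(int num_beams) {
  const char* env = std::getenv("ORT_BEAM_SEARCH_USE_FAST_TOPK");
  if (env != nullptr && env[0] == '0') return false;
  return num_beams <= kFastTopKMaxBeams;
}

// Returns the k largest (score, index) pairs in descending score order.
// Assumes k <= scores.size().
std::vector<std::pair<float, int>> TopK(const std::vector<float>& scores, int k) {
  std::vector<std::pair<float, int>> v;
  for (int i = 0; i < static_cast<int>(scores.size()); ++i)
    v.emplace_back(scores[i], i);
  std::partial_sort(v.begin(), v.begin() + k, v.end(),
                    [](const auto& a, const auto& b) { return a.first > b.first; });
  v.resize(k);
  return v;
}
```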
2025-02-06 16:50:31 -08:00
Sushanth Rajasankar
82840f635d
Implement Flash Attention 2 for webgpu EP (#23576)
### Description
This change implements FlashAttention 2 for the webgpu EP for the MHA
operator.

Numbers from an Alderlake device show a 2.2x speedup for prefill.
Considering that attention is 50% of the prefill phase (the other 50% being
MatMul), this implies a 4x speedup for attention with this implementation.
This is in line with the expected 2-4x perf gain of FlashAttention over
regular attention.

```
Baseline
PS C:\onnxruntime> C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web\ -l 1000
Batch size: 1, prompt tokens: 1001, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       9.54997e+06   <<<<<
        avg (tokens/s): 104.817
        p50 (us):       9.49218e+06
        stddev (us):    251442
        n:              5 * 1001 token(s)
------
With FlashAttention 2
PS C:\onnxruntime> C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web\ -l 1000
Batch size: 1, prompt tokens: 1001, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       4.27937e+06     <<<<<
        avg (tokens/s): 233.913
        p50 (us):       4.27687e+06
        stddev (us):    5344.1
        n:              5 * 1001 token(s)
```

### Motivation and Context

On integrated GPUs memory bandwidth is at a premium. Flash attention makes
the softmax computation (and therefore the output attention vector
computation) a running operation instead of maintaining the full QKt
attention scores in memory. As a result, we see significant improvements in
prefill speed - a 200% speedup measured here.
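The running-softmax idea can be sketched in scalar form (illustrative only; the real kernel operates on attention tiles in WGSL, not scalars):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One streaming pass keeps only a running max m, normalizer z, and weighted
// accumulator acc, rescaling them whenever a new max appears. The full score
// row is never materialized in memory, which is the point of FlashAttention.
float OnlineSoftmaxWeightedSum(const std::vector<float>& scores,
                               const std::vector<float>& values) {
  float m = -std::numeric_limits<float>::infinity();
  float z = 0.0f, acc = 0.0f;
  for (std::size_t i = 0; i < scores.size(); ++i) {
    const float m_new = std::max(m, scores[i]);
    const float scale = std::exp(m - m_new);  // exp(-inf) == 0 on first step
    const float e = std::exp(scores[i] - m_new);
    z = z * scale + e;                 // rescale old normalizer, add new term
    acc = acc * scale + e * values[i]; // rescale old accumulator likewise
    m = m_new;
  }
  return acc / z;
}
```

The result matches a conventional two-pass softmax-weighted sum up to rounding.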

This change uses techniques from co-operative matrix multiply, using
registers from a subgroup for fast in-register matrix multiplication.
Without the co-operative matrix multiply technique, ALD showed about a 6.0s
prefill time.

Tested on ALD/TGL intel integrated and Nvidia 4070.

### Future Work
- Fine tuning and profiling optimizations.
- The current implementation is for prefill only. A generation-phase
optimized FA2 implementation is possible; however, attention is a tiny part
of the generation phase.
2025-02-06 16:32:05 -08:00
Ankit Maheshkar
a6ea57b8f3
OpenVINO EP Weights Sharing Feature (#23553)
### Description
These changes ensure that weight sharing happens between two models using the session context option ep_weight_sharing.

Key changes introduced in this feature:

* Creating a shared context between the two models.
* Extracting external constant initializers and relabelling them as inputs to the model, to allow weight loading from the direct blob.
* Creating EP Context nodes when subgraph partitioning is happening.

### Motivation and Context
This change was required to ensure that LLMs with prefill and kvcache models can use the same shared weights.
The change was also required to ensure EP Context nodes can be formed even when the model is being subgraph partitioned.

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: saurabh <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
2025-02-06 14:57:38 -08:00
Tianlei Wu
2c2ff4aef9
[CUDA] Fix BeamSearchTest.DummyT5WithSequenceInputIds test failure in Windows (#23596)
### Description
BeamSearchTest.DummyT5WithSequenceInputIds failed on Windows because early
stopping was triggered. The cause: state.early_stopping_ was interpreted as
true in the CUDA kernel at some point, even though printf still showed its
value as false. The root cause is unknown.

Updating the code to use early_stopping as a template parameter seems to
work around the issue.
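The workaround pattern, hoisting a runtime flag into a compile-time template parameter, looks roughly like this (illustrative names in plain C++, not the actual CUDA kernel):

```cpp
// The early_stopping flag becomes a template parameter, so the (device)
// code reads a compile-time constant instead of the runtime member that
// was being mis-read. A host-side dispatcher picks the instantiation.
template <bool early_stopping>
bool ShouldStop(float best_finished_score, float worst_alive_score) {
  if constexpr (early_stopping)
    return best_finished_score >= worst_alive_score;  // stop the beam
  return false;  // never stop early in this instantiation
}

bool ShouldStopDispatch(bool early_stopping,
                        float best_finished_score, float worst_alive_score) {
  return early_stopping
             ? ShouldStop<true>(best_finished_score, worst_alive_score)
             : ShouldStop<false>(best_finished_score, worst_alive_score);
}
```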

Other changes:
* Add some debug code (not built into the binary unless DEBUG_GENERATION is
defined) to assist debugging the beam search scorer in CUDA.
* Enable the DummyT5WithSequenceInputIds test in CI. This test was
previously not run in the Windows CUDA CI pipeline.

### Motivation and Context

Fix a unit test BeamSearchTest.DummyT5WithSequenceInputIds failure in
Windows.
2025-02-06 13:15:09 -08:00
Joshua Lochner
d981b153d3
[webgpu/js] Optimize resize webgpu op & fix precision issues (#23591)
### Description
<!-- Describe your changes. -->

This PR is a follow-up to
https://github.com/microsoft/onnxruntime/pull/23488 and partially
improves upon https://github.com/microsoft/onnxruntime/issues/23403. It
does the following:
- Prevents unnecessary cache shader recompilation for 'nearest' resize
operation.
- Fixes precision (offset-by-one) errors with the asymmetric coordinate
transform. When running the Kokoro TTS model, values for
`/decoder/decoder/generator/f0_upsamp/Resize_output_0` show differences at
the end bounds due to precision issues when dividing 21600 by 72 (which
should be 300 but seemingly results in 299.999, causing issues when
flooring).
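The failure mode in miniature (the drifted value below is an illustrative stand-in; the real drift came from the shader's arithmetic, not from a literal):

```cpp
#include <cmath>

// Flooring a coordinate that should be an exact integer: 21600.0f / 72.0f
// divides exactly to 300.0f, but a value that drifts just below it floors
// to 299 -- the offset-by-one error described above.
constexpr float kExact = 21600.0f / 72.0f;  // exactly 300.0f
constexpr float kDrifted = 299.999f;        // stand-in for the drifted value

inline int FloorToIndex(float coord) {
  return static_cast<int>(std::floor(coord));
}
```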

### Motivation and Context

I did a deep dive over the weekend to try to fix Kokoro TTS on WebGPU and
found that the above node had a large difference. Thinking this was a
major issue, I spent some time fixing it. It turns out it only happens for
a small number of values, leading to a high maximum error, but most values
are correct (as seen here).

BEFORE:
```
[/decoder/decoder/generator/f0_upsamp/Resize_output_0] atol: 78.6640682220459 | rtol: 24.13991587587724 | avgDiff: 0.009967932171121087 | medianDiff: 0.000030517578125
```

AFTER:
```
[/decoder/decoder/generator/f0_upsamp/Resize_output_0] atol: 0.0011138916015625 | rtol: 0.0020059924232260704 | avgDiff: 0.00008570214675873825 | medianDiff: 0.000030517578125
```

So, although it has a very small impact on the final output (waveform),
this bug could appear with other models in a more severe way.

BEFORE:
```
[waveform] atol: 0.04784199967980385 | rtol: 1366.0462001093495 | avgDiff: 0.0009544936942737713 | medianDiff: 0.00015346752479672432
```

AFTER:
```
[waveform] atol: 0.04775865003466606 | rtol: 1354.7002460360852 | avgDiff: 0.000954830244055033 | medianDiff: 0.00015274062752723694
```
2025-02-06 10:26:25 -08:00
Changming Sun
328a13c06d
Enable VCPKG in more pipelines (#23590)
### Description
Enable VCPKG in more pipelines
2025-02-06 10:10:31 -08:00
Yifan Li
6728d6085d
[TensorRT EP] support TensorRT 10.8-GA (#23592)
2025-02-06 10:05:57 -08:00
Jambay Kinley
d1fb58b0f2
Quantization tool: Allow user to override calibrator's session EP (#23559)
### Description
The quantization calibrators have `execution_providers` attributes but
there is no way for a user to provide their own providers when using the
`quantize` or `quantize_static` functions. This PR adds a
`calibration_providers` parameter to allow users to specify the
execution providers to use during calibration. It is helpful when
quantizing large models which are slow to calibrate on the CPU.
- Chose `calibration_providers` as the name since the docstrings refer to
another `execution_provider` parameter
169917b1e7/onnxruntime/python/tools/quantization/quantize.py (L204)

169917b1e7/onnxruntime/python/tools/quantization/quantize.py (L415)
which is not present anywhere in the code.
- The name can be changed to something else if needed, e.g.
`calibrator_providers`, and/or made into a string instead of a
providers list.
2025-02-05 22:38:21 -08:00
Hector Li
649ced4a60
Enable user loading model with external data from memory buffer (#23557)
Add a session option to enable users to load a model with external data from a memory buffer. Users want to set the folder path for the external data files.

### Description
In some cases users load the model from a memory buffer, but they can't load the external data files into memory. They need a way to set the folder path for the external data files so that ORT can figure out the external data location.
2025-02-05 22:31:13 -08:00
Satya Kumar Jandhyala
544bdd6073
Fix ConvTranspose for certain attribute combinations (#23488)
### Description
Convert the output_padding attribute from 1D to 2D for ConvTranspose.



### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/23403
2025-02-05 12:22:47 -08:00
Changming Sun
8f6ddf3bd5
Delete extra cgmanifest entries and files (#23583)
Remove the auto-generated cgmanifest.json, because we can now get the
same information from vcpkg.
Also, remove some outdated entries in the main cgmanifest.json file.
2025-02-05 11:21:21 -08:00
Changming Sun
5f6a3158f8
Enable VCPKG in CI build (#23426)
### Description
1. Enable VCPKG flag in Windows CPU CI build pipelines. 
2. Increased the min supported cmake version from 3.26 to 3.28. Because of
this, dropped support for the old way of finding Python via
"find_package(PythonLibs)". Therefore, build.py no longer sets the
"PYTHON_EXECUTABLE" cmake var when doing cmake configure.
3. Added "xnnpack-ep" as a feature for ORT's vcpkg config.
4. Added asset cache support for ORT's vcpkg build
5. Added VCPKG triplet files for Android build.
6. Set VCPKG triplet to "universal2-osx" if CMAKE_OSX_ARCHITECTURES was
found in cmake extra defines.
7. Removed a small piece of code in build.py that supported CUDA
versions < 11.8.
8. Fixed an issue that CMAKE_OSX_ARCHITECTURES sometimes got specified
twice when build.py invoked cmake.
9. Added more model tests to Android build. After this change, we will
test all ONNX versions instead of just the latest one.
10. Fixed issues that are related to build.py's "--build_nuget"
parameter. Also, enable the flag in most Windows CPU CI build jobs.
11. Removed a restriction in build.py that disallowed cross-compiling
Windows ARM64 nuget package on Windows x86.
 
### Motivation and Context
Adopt vcpkg.
2025-02-05 10:58:53 -08:00
dependabot[bot]
e1e3f623f6
Bump lintrunner from 0.12.5 to 0.12.7 (#23326)
Bumps [lintrunner](https://github.com/suo/lintrunner) from 0.12.5 to
0.12.7.
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/suo/lintrunner/blob/main/CHANGELOG.md">lintrunner's
changelog</a>.</em></p>
<blockquote>
<h2>[0.12.7] - 2024-12-05</h2>
<h3>Bug Fixes</h3>
<ul>
<li>Build x86_64 wheels for Windows (<a
href="a4d6b74693">a4d6b74</a>)</li>
<li>Fix <a href="https://doc.rust-lang.org/clippy/">Clippy</a>
violatoins (<a
href="05ff6431bb">05ff643</a>)</li>
<li>Fetch all commit history to fix MacOS builds (<a
href="3770be65ee">3770be6</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="1b70da01a6"><code>1b70da0</code></a>
chore(release): prep for 0.12.7</li>
<li><a
href="3770be65ee"><code>3770be6</code></a>
[CI] Fetch full commit history (<a
href="https://redirect.github.com/suo/lintrunner/issues/81">#81</a>)</li>
<li><a
href="b2482aff48"><code>b2482af</code></a>
[CI] Use <code>actions/checkout@v4</code> (<a
href="https://redirect.github.com/suo/lintrunner/issues/80">#80</a>)</li>
<li><a
href="05ff6431bb"><code>05ff643</code></a>
Fix clippy violations (<a
href="https://redirect.github.com/suo/lintrunner/issues/79">#79</a>)</li>
<li><a
href="1be20c6b8f"><code>1be20c6</code></a>
chore(release): prep for 0.12.6</li>
<li><a
href="a4d6b74693"><code>a4d6b74</code></a>
fix(build): build x86_64 wheels for Windows (<a
href="https://redirect.github.com/suo/lintrunner/issues/73">#73</a>)</li>
<li>See full diff in <a
href="https://github.com/suo/lintrunner/compare/v0.12.5...v0.12.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=lintrunner&package-manager=pip&previous-version=0.12.5&new-version=0.12.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.


---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-04 19:50:56 -08:00
Jon Campbell
cd8775f518
Fix Node JS Samples (#23581)
### Description
The Node JS Samples included in the repository have outdated package
references that are broken, which are fixed in this PR.

### Motivation and Context
The samples included in this repository should just work, but sadly do
not. The reason is that they are using very outdated references for the
npm modules. This fix updates the dependencies to the current
onnxruntime-node, which fixes the samples. Also adds a small update to
the .gitignore to exclude the node_modules directories in the samples
directory, which keeps the local repo changelist cleaner.
2025-02-04 19:50:29 -08:00
Prathik Rao
6b4f9c481d
[WebGPU EP] Batch Norm Implementation (#23525)
Increases operator coverage for webgpu ep.

---------

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-02-04 17:38:45 -08:00
Gavin Kinsey
1fce51b3b2
Fix all instances of 4244 and 4267 warnings in OV EP code (#23567)
### Description
Remove MSVC warnings 4244, 4267 from the list of disabled warnings in
cmake.
Fix the code that generates the warnings so that it no longer does.
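A typical pattern for fixing C4267 (size_t to int narrowing) without silencing the warning, sketched here as an illustration rather than code from the PR:

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Make the size_t -> int narrowing explicit and checked, so the conversion
// warning can stay enabled project-wide instead of being disabled in cmake.
inline int CheckedIntCast(std::size_t n) {
  assert(n <= static_cast<std::size_t>(std::numeric_limits<int>::max()));
  return static_cast<int>(n);
}
```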

### Motivation and Context
This makes onnxruntime_providers_openvino.dll pass BinSkim analysis.
Without this change BinSkim complains about the disabled warnings.
2025-02-04 17:13:27 -08:00
Hector Li
c29ca1cb41
Update QNN default version to 2.31 (#23573)
Update QNN default version to 2.31
2025-02-04 16:24:54 -08:00
Caroline Zhu
2fc75a45a2
[mobile] Add Android BrowserStack test project back (#23551)
### Description
Follow-up for #23383 and #23474

* Adds android BrowserStack test back in
* Modifies MAUI csproj file to build into an APK


### Motivation and Context
There were 2 issues with the previous PRs:
1. The updated MAUI .csproj file configuration failed when building for
iOS and MacCatalyst. This caused problems in the packaging pipeline
because we build all C# projects in the .sln file in the packaging
pipeline. Removed the Mac & iOS build targets for now.

2. The previous MAUI .csproj file configuration did not build into an
APK. It was missing the `<OutputType>` tag and the Android package
type tag.
2025-02-04 14:39:50 -08:00
Tianlei Wu
9e18b6a0f3
[CUDA] Update nvcc flags (#23572)
### Description
(1) Remove `if (CMAKE_CUDA_COMPILER_VERSION VERSION_GREATER_EQUAL 11)`
since build requires cuda >= 11.4.
(2) Add sm_86 and sm_89 since we generate SASS code for specified cuda
architectures only. This change could support popular consumer GPUs
(like RTX 30X0 and RTX 40X0).
(3) Add sm_120 to support Blackwell GPUs (like RTX 50X0 etc).
(4) Add `-Xfatbin=-compress-all` to reduce wheel size. When
CMAKE_CUDA_ARCHITECTURES is not specified, the Linux wheel size built with
CUDA 12.8 is reduced by 8% (from 324MB to 299MB).

### Motivation and Context

To support popular consumer GPUs (RTX 30x0, 40x0, 50x0) in the default
setting. Reduce binary size.

Note that the default SM settings do not impact the officially released
binaries. ORT's official released binaries are built with explicit settings
like CMAKE_CUDA_ARCHITECTURES=75;80;90, which include both SASS (real) and
PTX (virtual) by default. See
https://cmake.org/cmake/help/latest/prop_tgt/CUDA_ARCHITECTURES.html for
more info.
2025-02-04 11:47:02 -08:00
Adrian Lizarraga
b47e1e64d7
[QNN EP] Make offloading graph input/output quantization (to CPU) the default (#23368)
### Description
Makes the QNN provider option `offload_graph_io_quantization` enabled by
default. It was previously disabled by default.



### Motivation and Context
Enabling this option significantly decreases inference latency for many
models.
2025-02-04 11:42:46 -08:00
Tianlei Wu
75a9b40da2
[ROCm] Update CI to use rocm 6.3.2 (#23577)
### Description
* Update rocm to 6.3.2;
* Remove dependency on cupy (which does not support rocm 6.3 yet).

2025-02-04 11:01:12 -08:00
dependabot[bot]
26ff2b66ef
Bump ruff from 0.9.3 to 0.9.4 (#23563)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.9.3 to 0.9.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.9.4</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>airflow</code>] Extend airflow context parameter check for
<code>BaseOperator.execute</code> (<code>AIR302</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15713">#15713</a>)</li>
<li>[<code>airflow</code>] Update <code>AIR302</code> to check for
deprecated context keys (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15144">#15144</a>)</li>
<li>[<code>flake8-bandit</code>] Permit suspicious imports within stub
files (<code>S4</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15822">#15822</a>)</li>
<li>[<code>pylint</code>] Do not trigger <code>PLR6201</code> on empty
collections (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15732">#15732</a>)</li>
<li>[<code>refurb</code>] Do not emit diagnostic when loop variables are
used outside loop body (<code>FURB122</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15757">#15757</a>)</li>
<li>[<code>ruff</code>] Add support for more <code>re</code> patterns
(<code>RUF055</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15764">#15764</a>)</li>
<li>[<code>ruff</code>] Check for shadowed <code>map</code> before
suggesting fix (<code>RUF058</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15790">#15790</a>)</li>
<li>[<code>ruff</code>] Do not emit diagnostic when all arguments to
<code>zip()</code> are variadic (<code>RUF058</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15744">#15744</a>)</li>
<li>[<code>ruff</code>] Parenthesize fix when argument spans multiple
lines for <code>unnecessary-round</code> (<code>RUF057</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15703">#15703</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>Preserve quote style in generated code (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15726">#15726</a>,
<a
href="https://redirect.github.com/astral-sh/ruff/pull/15778">#15778</a>,
<a
href="https://redirect.github.com/astral-sh/ruff/pull/15794">#15794</a>)</li>
<li>[<code>flake8-bugbear</code>] Exempt <code>NewType</code> calls
where the original type is immutable (<code>B008</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15765">#15765</a>)</li>
<li>[<code>pylint</code>] Honor banned top-level imports by
<code>TID253</code> in <code>PLC0415</code>. (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15628">#15628</a>)</li>
<li>[<code>pyupgrade</code>] Ignore <code>is_typeddict</code> and
<code>TypedDict</code> for <code>deprecated-import</code>
(<code>UP035</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15800">#15800</a>)</li>
</ul>
<h3>CLI</h3>
<ul>
<li>Fix formatter warning message for <code>flake8-quotes</code> option
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/15788">#15788</a>)</li>
<li>Implement tab autocomplete for <code>ruff config</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15603">#15603</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>[<code>flake8-comprehensions</code>] Do not emit
<code>unnecessary-map</code> diagnostic when lambda has different arity
(<code>C417</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15802">#15802</a>)</li>
<li>[<code>flake8-comprehensions</code>] Parenthesize
<code>sorted</code> when needed for
<code>unnecessary-call-around-sorted</code> (<code>C413</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15825">#15825</a>)</li>
<li>[<code>pyupgrade</code>] Handle end-of-line comments for
<code>quoted-annotation</code> (<code>UP037</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15824">#15824</a>)</li>
</ul>
<h3>Documentation</h3>
<ul>
<li>Add missing config docstrings (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15803">#15803</a>)</li>
<li>Add references to <code>trio.run_process</code> and
<code>anyio.run_process</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15761">#15761</a>)</li>
<li>Use <code>uv init --lib</code> in tutorial (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15718">#15718</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AlexWaygood"><code>@​AlexWaygood</code></a></li>
<li><a
href="https://github.com/Garrett-R"><code>@​Garrett-R</code></a></li>
<li><a
href="https://github.com/InSyncWithFoo"><code>@​InSyncWithFoo</code></a></li>
<li><a
href="https://github.com/JelleZijlstra"><code>@​JelleZijlstra</code></a></li>
<li><a href="https://github.com/Lee-W"><code>@​Lee-W</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a
href="https://github.com/charliermarsh"><code>@​charliermarsh</code></a></li>
<li><a
href="https://github.com/dcreager"><code>@​dcreager</code></a></li>
<li><a
href="https://github.com/dhruvmanila"><code>@​dhruvmanila</code></a></li>
<li><a href="https://github.com/dylwil3"><code>@​dylwil3</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="854ab03078"><code>854ab03</code></a>
Bump version to 0.9.4 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15831">#15831</a>)</li>
<li><a
href="b0b8b06241"><code>b0b8b06</code></a>
Remove semicolon after TypeScript interface definition (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15827">#15827</a>)</li>
<li><a
href="451f251a31"><code>451f251</code></a>
[red-knot] Clarify behavior when redeclaring base class attributes (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15826">#15826</a>)</li>
<li><a
href="13cf3e65f1"><code>13cf3e6</code></a>
[<code>flake8-comprehensions</code>] Parenthesize <code>sorted</code>
when needed for `unnecessary-...</li>
<li><a
href="56f956a238"><code>56f956a</code></a>
[<code>pyupgrade</code>] Handle end-of-line comments for
<code>quoted-annotation</code> (<code>UP037</code>) (...</li>
<li><a
href="7a10a40b0d"><code>7a10a40</code></a>
[<code>flake8-bandit</code>] Permit suspicious imports within stub files
(<code>S4</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15822">#15822</a>)</li>
<li><a
href="3125332ec1"><code>3125332</code></a>
[red-knot] Format mdtest snippets with the latest version of black (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15819">#15819</a>)</li>
<li><a
href="15d886a502"><code>15d886a</code></a>
[red-knot] Consider all definitions after terminal statements
unreachable (<a
href="https://redirect.github.com/astral-sh/ruff/issues/1">#1</a>...</li>
<li><a
href="e1c9d10863"><code>e1c9d10</code></a>
[<code>flake8-comprehensions</code>] Do not emit
<code>unnecessary-map</code> diagnostic when lambd...</li>
<li><a
href="23c98849fc"><code>23c9884</code></a>
Preserve quotes in generated f-strings (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15794">#15794</a>)</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.9.3...0.9.4">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.9.3&new-version=0.9.4)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)


Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-02-04 10:55:27 -08:00
Jian Chen
b2560a75cf
Update react-native to 0.72 (#23509)
…-andriod-e2e-test-job.yml

2025-02-04 09:53:20 -08:00
Yulong Wang
faee9125fb
[js] update JavaScript API to support QNN EP options (#23486)
### Description

As a pre-requisite of #23468
2025-02-03 17:38:50 -08:00
Yifan Li
816e8cb2fb
[EP Perf] Update env to ubuntu 22.04 (#23570)
### Description
* Update env to cuda 12.6/ubuntu 22.04 (ubuntu 20.04 uses outdated py38
by default)
* Clean old trt8.6 test config


2025-02-03 17:35:33 -08:00
Dmitri Smirnov
cddc271b0b
Use Eigen in Round implementation (#23571)
### Description
Attempt to make it more consistent across platforms.

### Motivation and Context
A customer reports a big difference in the performance of Round between
Windows and Linux.
2025-02-03 17:06:12 -08:00
Jambay Kinley
e8b0bdb127
Shape inference: ReduceMean dispatcher, quant_pre_process: skip_symbolic_shape bugfix (#23558)
### Description
- Add a symbolic shape inference dispatcher for `ReduceMean`.
- `ReduceMean` is used in RMSNorm, so shape inference fails for llama, phi,
and other torch-exported models.
- Reuse the dispatcher for `ReduceSum`, since `ReduceMean` 18+ and `ReduceSum`
13+ have the same spec apart from the type of reduction performed.
- Fix an issue with the `quant_pre_process` tool where the external data
file is missing if `skip_symbolic_shape=True` and
`skip_optimization=False`.
- Add `"session.optimized_model_external_initializers_file_name"` to the
session options so that the external data gets saved in the same temp
directory as the optimized model.
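The dispatcher reuse described above can be sketched in plain Python (hypothetical helper names, not the actual `symbolic_shape_infer.py` code): one shape handler serves both ops, because they differ only in the reduction applied, not in the output shape.

```python
# Minimal sketch (hypothetical names) of reusing one symbolic shape handler
# for ReduceSum and ReduceMean: both reduce the same axes, so the output
# shape logic is identical.
def infer_reduce_shape(input_shape, axes, keepdims):
    axes = [a % len(input_shape) for a in axes]  # normalize negative axes
    if keepdims:
        return [1 if i in axes else d for i, d in enumerate(input_shape)]
    return [d for i, d in enumerate(input_shape) if i not in axes]

# One entry per op type, both pointing at the same handler.
dispatch = {"ReduceSum": infer_reduce_shape, "ReduceMean": infer_reduce_shape}
```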



2025-01-31 19:37:07 -08:00
Xinpeng Dou
267b49353b
delete the supported domain version upper bounds (#23237)
### Description

This PR removes the upper bound on the range of ONNX domain versions
supported by CANN graph inference (the previous range was 8 to 15), because
CANN has been further upgraded to support some developers' requirements for
higher ONNX versions.
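A minimal sketch of the kind of change described (hypothetical function names; the real check lives in the CANN EP source): the upper bound on the supported domain version is dropped while the lower bound stays.

```python
# Hypothetical illustration of removing the upper bound of the supported
# domain version range; only the lower bound (8) is kept.
def is_supported_before(domain_version: int) -> bool:
    return 8 <= domain_version <= 15  # old check: capped at 15

def is_supported_after(domain_version: int) -> bool:
    return domain_version >= 8  # new check: no upper limit
```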
2025-01-31 18:21:41 -08:00
George Wu
bb7f9616e6
remove log spam from cpuinfo (#23548)
cpuinfo outputs errors when the CPU is not recognized.
This has been a longstanding issue, e.g.:
https://github.com/microsoft/onnxruntime/issues/21947
https://github.com/microsoft/onnxruntime/issues/21393

The issue has been exacerbated by
https://github.com/microsoft/onnxruntime/pull/22856
This change
4fa0f1e0ed/onnxruntime/core/mlas/lib/qnbitgemm_kernel_neon.cpp (L189)
causes the messages to appear during static initialization.

This means that for Python, you see the errors immediately when you import
onnxruntime.
```
>>> import onnxruntime
Error in cpuinfo: Unknown chip model name 'snapdragon (tm) 8cx gen 3 @ 3.40 GHz'.
Please add new Windows on Arm SoC/chip support to arm/windows/init.c!
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
unknown Qualcomm CPU part 0x1 ignored
```

The fix is to patch pytorch_cpuinfo and comment out the std::cerr lines in
cpuid_uarch.cc. The errors are not actionable by the user, so they should
not be emitted.

Tested that after these changes, these errors no longer show up.
2025-01-31 18:16:24 -08:00
PARK DongHa
169917b1e7
Use latest vcpkg commit in configuration, sync manifest with deps.txt (#23554)
### Description

The `python3` dependency was removed in the `onnx` port of the
https://github.com/microsoft/vcpkg upstream.

* https://github.com/microsoft/vcpkg/pull/43236
*
https://github.com/microsoft/onnxruntime/pull/23285#issuecomment-2579073056
(Previous work)

Removed `nsync`, and use ONNX 1.17.0+ in vcpkg.json (the manifest).

### Motivation and Context

* Help #23158
* #23456
2025-01-31 12:34:07 -08:00
Corentin Maravat
a9d4d08ed1
Add of ReduceMax Gradient (#23501) 2025-01-31 10:37:41 -08:00
Yulong Wang
6bbf1bd948
[js/web] upgrade version of flatbuffers (#23545)
### Description

Upgrade version of flatbuffers to latest. This change fixes #23361.
2025-01-31 10:28:53 -08:00
Sushanth Rajasankar
271c509d59
DP4AMatMul perf refinements (#23539)
In this change

1. Vectorization of k is updated to 4.
2. Tile_A, Tile_B are stored transposed in shared memory. This makes it
so that memory locality is improved for our access pattern.
3. Lane output is switched to individual vectors and its loop is unrolled;
this solves the problem where the lane output was not kept in registers
before.

Perf improvements are not very consistent with this change. On Tigerlake
GPU with 32.0.101.6460 (latest intel drivers)
```
Baseline

model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web\ -l 1000
Batch size: 1, prompt tokens: 1001, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       7.36557e+06                         <<<<
        avg (tokens/s): 135.903
        p50 (us):       7.35498e+06
        stddev (us):    27599
        n:              5 * 1001 token(s)

With Change

model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web\ -l 1000
Batch size: 1, prompt tokens: 1001, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       6.52302e+06                           <<<<
        avg (tokens/s): 153.457
        p50 (us):       6.52224e+06
        stddev (us):    10407.3
        n:              5 * 1001 token(s)
```

However, comparing before-and-after profiles in Intel GPA, one can clearly
see straight runs of ALU work that are no longer interspersed with writebacks
to the local memory that previously held lane_output.


![image](https://github.com/user-attachments/assets/e01d3474-8406-4a61-b352-2ecbf0855a7f)
2025-01-31 10:20:01 -08:00
kunal-vaishnavi
cb69c59863
Add fusions for SigLIP and Conformer-Encoder (#23528)
### Description
This PR adds fusions for [Google's SigLIP
model](https://huggingface.co/google/siglip-base-patch16-224/) and
Microsoft's internal conformer-encoder model.

Here is an example of how to run the ORT transformer optimizer for the
SigLIP model.
```
$ git clone https://github.com/microsoft/onnxruntime
$ cd onnxruntime/onnxruntime/python/tools/transformers
$ python3 optimizer.py --input /path/to/model.onnx --output /path/to/model_opt.onnx --model_type clip --num_heads 16 --hidden_size 1152 --use_external_data_format --opt_level 0 --disable_shape_inference
```

Here is an example of how to run the ORT transformer optimizer for the
conformer-encoder model.
```
$ git clone https://github.com/microsoft/onnxruntime
$ cd onnxruntime/onnxruntime/python/tools/transformers
$ python3 optimizer.py --input /path/to/model.onnx --output /path/to/model_opt.onnx --model_type conformer --num_heads 16 --hidden_size 1024 --use_external_data_format --opt_level 0 --disable_shape_inference --convert_attribute
```

### Motivation and Context
This PR helps optimize multi-modal models that use SigLIP for the vision
encoder and conformer-encoder for the speech encoder.

This PR uses changes from the following PRs:
- https://github.com/pytorch/pytorch/pull/144801
- https://github.com/microsoft/onnxscript/pull/2018
- https://github.com/microsoft/onnxscript/pull/2019
- https://github.com/microsoft/onnxscript/pull/2020
- https://github.com/microsoft/onnxscript/pull/2021
- https://github.com/microsoft/onnxscript/pull/2022
- https://github.com/microsoft/onnxscript/pull/2024
- https://github.com/microsoft/onnxscript/pull/2025
- https://github.com/microsoft/onnxscript/pull/2029
- https://github.com/microsoft/onnxscript/pull/2033

### Introduction of ONNX Script

This PR introduces [ONNX
Script](https://github.com/microsoft/onnxscript) into the ORT
transformer optimizer as an optional step via the
`fold_transpose_initializers()` method of the `DynamoOnnxHelper` class.
2025-01-31 09:17:49 -08:00
Changming Sun
61fae9bb91
Remove "--enable_pybind" from webgpu pipeline (#23550)
There is a crash in the WebGPU CI pipeline. It crashed at process
shutdown when unloading onnxruntime_pybind11_state.pyd.
Here is the callstack:

```
 	dxil.dll!DxcSwapThreadMalloc()	Unknown
 	dxil.dll!DxcThreadMalloc::DxcThreadMalloc(struct IMalloc *)	Unknown
 	dxil.dll!DxcValidator::Release(void)	Unknown
 	[Inline Frame] webgpu_dawn.dll!Microsoft::WRL::ComPtr<IDxcValidator>::InternalRelease() Line 235	C++
 	[Inline Frame] webgpu_dawn.dll!Microsoft::WRL::ComPtr<IDxcValidator>::{dtor}() Line 290	C++
 	webgpu_dawn.dll!dawn::native::d3d12::Backend::`scalar deleting destructor'(unsigned int)	C++
 	webgpu_dawn.dll!`eh vector destructor iterator'(void * ptr, unsigned __int64 size, unsigned __int64 count, void(*)(void *) destructor)	C++
 	webgpu_dawn.dll!dawn::native::InstanceBase::~InstanceBase() Line 197	C++
 	webgpu_dawn.dll!dawn::native::InstanceBase::`scalar deleting destructor'(unsigned int)	C++
 	webgpu_dawn.dll!dawn::native::InstanceBase::DeleteThis() Line 218	C++
 	ucrtbase.dll!<lambda>(void)()	Unknown
 	ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda>(void) &,<lambda_3883c3dff614d5e0c5f61bb1ac94921c>>()	Unknown
 	ucrtbase.dll!_execute_onexit_table()	Unknown
 	onnxruntime_pybind11_state.pyd!dllmain_crt_process_detach(const bool is_terminating) Line 182	C++
>	onnxruntime_pybind11_state.pyd!dllmain_dispatch(HINSTANCE__ * const instance, const unsigned long reason, void * const reserved) Line 293	C++
 	ntdll.dll!LdrpCallInitRoutine()	Unknown
 	ntdll.dll!LdrShutdownProcess()	Unknown
 	ntdll.dll!RtlExitUserProcess()	Unknown
 	kernel32.dll!ExitProcessImplementation()	Unknown
 	ucrtbase.dll!exit_or_terminate_process()	Unknown
 	ucrtbase.dll!common_exit()	Unknown
 	python312.dll!00007ff9cab3ec8d()	Unknown
 	python312.dll!00007ff9cab3efbf()	Unknown
 	python312.dll!00007ff9cab3edee()	Unknown
 	python312.dll!00007ff9cab57f4c()	Unknown
 	python312.dll!00007ff9cab57579()	Unknown
 	python312.dll!00007ff9cab573be()	Unknown
 	python312.dll!00007ff9cab5729b()	Unknown
 	python312.dll!00007ff9cabacfcb()	Unknown
 	python312.dll!00007ff9cabacd7d()	Unknown
 	python312.dll!00007ff9cab99e2d()	Unknown
 	python.exe!00007ff78a641230()	Unknown
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown
```
It might be because the destruction order of some global variables was wrong.
I saw the DirectX DLLs getting destroyed earlier than the WebGPU instance in
our code in onnxruntime_pybind11_state.pyd.
2025-01-31 08:43:58 -08:00
Tianlei Wu
0bb4ea6797
Update BiasGelu fusion and related ops (#23518)
### Description
(1) Update BiasGelu fusion to support onnx Gelu-20

Since onnx Gelu-20 supports float/double/bf16/fp16, here we update
related ops to support these data types in CUDA and ROCm execution
providers:
(2) Add double support for Gelu/FastGelu op in CUDA/ROCm execution
provider
(3) Add BFloat16 support for Gelu ops in CUDA execution provider

(4) Add unit tests
(5) Update operator documents

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/23491
2025-01-30 22:53:59 -08:00
Caroline Zhu
4dde74a393
Add more details to BrowserStack script failure (#23520)
### Description
Add details about how to access the BrowserStack logs

### Motivation and Context
- browserstack link on its own is confusing to people who don't have
context.

Let me know if you have suggestions to make the text clearer or more
informative.
2025-01-30 22:25:12 -08:00
Changming Sun
ead9d5cf43
Set ANDROID_USE_LEGACY_TOOLCHAIN_FILE to false (#23544)
The NDK has two toolchain CMake files, as you can see in

https://android.googlesource.com/platform/ndk/+/refs/heads/main/build/cmake

By default, the NDK uses the legacy one to provide the best compatibility. We
don't need that, so this PR changes the build to use the new one.

The new toolchain CMake file uses standard CMake flags like
CMAKE_ANDROID_RTTI to control C++ features.
2025-01-30 16:10:09 -08:00
Takeshi Watanabe
7e2408880e
Enable dlpack by default (#23110)
### Description
This PR enables the Python dlpack interface by default.


### Motivation and Context

The dlpack Python interface is useful in inference mode, not only in training
mode, since some pre-processing of inference results may be written in torch,
and unnecessary device transfers should be reduced in those cases.
closes https://github.com/microsoft/onnxruntime/issues/15963 closes
https://github.com/microsoft/onnxruntime/issues/22061

TODOs:
- [x] Add tests like
5407c69028/orttraining/orttraining/test/python/orttraining_test_ortvalue.py
that's unrelated to training feature

---------

Co-authored-by: Xavier Dupré <xadupre@users.noreply.github.com>
Co-authored-by: Justin Chu <justinchuby@users.noreply.github.com>
2025-01-30 23:23:56 +01:00
Edward Chen
dc2f7a9a0c
Add overload of TryParseStringWithClassicLocale() that uses std::from_chars() (#23541)
Add overload of `TryParseStringWithClassicLocale()` that uses `std::from_chars()` for certain types.

Reduce binary size. It recently increased after PR #23526.
2025-01-30 13:55:54 -08:00
Hector Li
5407c69028
Fix the issue that the new generated EP context model not able to find external data (#23537)

### Description
The newly generated EP context model was not able to find the external data
file because it lost track of the source model path, which is used to locate
the external initializers.
Relate to issue: https://github.com/microsoft/onnxruntime/issues/23358
2025-01-29 22:01:13 -08:00
Yulong Wang
fbae88f5ad
[js/web] use the recommended workaround for Vite (#23531)
### Description

After some investigation and debug, I decided to follow the recommended
workaround as suggested in https://github.com/vitejs/vite/issues/8427.

### Motivation and Context

There is a known issue with Vite 5.x when using WebAssembly package.
Detail information is in https://github.com/vitejs/vite/issues/8427.

There were previous attempts to fix this problem (#23487). I tried
various ways to make it work out of the box for Vite users, but none
of them worked: some "fixes" fixed the usage of Vite but broke other
use cases/bundlers, and some introduced other issues. Eventually I figured
out that there is no good way to fix this inside ONNX Runtime.

Considering that the root cause is inside Vite and that it may be fixed in
Vite v6, I think the best way now is to follow the recommended workaround.
2025-01-29 17:38:22 -08:00
Edward Chen
d5338da1f5
Fix tensor external data info length parsing issue. (#23526)
Fix tensor external data info length parsing issue.

The old implementation was parsing a `size_t` value with `strtol` (via `OrtStrToPtrDiff`) on ARM64 MSVC.

bf023ab3d5/onnxruntime/core/platform/path_lib.h (L74)

If we have `sizeof(size_t) == 8` and `sizeof(long) == 4` (as is the case for x64 and ARM64 MSVC), `strtol` will return a maximum value of `2^31-1` even for a larger, valid `size_t` value. `strtol` will also set `errno` to `ERANGE`, but we weren't checking that.

Updated to use `ParseStringWithClassicLocale` which will parse directly to the target type.

Added some tests.
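As a hedged illustration of this bug class (not ORT's actual code), the sketch below emulates how a 32-bit `strtol` behaves when asked to parse a 64-bit external-data length, which is what happened on ARM64 MSVC where `long` is 32 bits:

```python
# Illustrative sketch only: emulate strtol with a 32-bit `long`, as on MSVC.
# Large but valid size_t lengths saturate at LONG_MAX instead of failing
# loudly, unless the caller also checks errno for ERANGE.
LONG_MAX_32 = 2**31 - 1

def strtol_like(s: str) -> int:
    value = int(s)
    return min(value, LONG_MAX_32)  # strtol clamps and sets errno = ERANGE

# A 4 GiB external-data length is valid for size_t but silently truncated.
length = strtol_like(str(4 * 1024**3))
```

Parsing directly into the target type (as `ParseStringWithClassicLocale` or C++17 `std::from_chars` does) avoids the intermediate narrower type entirely.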
2025-01-29 13:35:25 -08:00
Ted Themistokleous
e3e41739a7
[ROCm EP] Fix transpose helper for gfx gridsize constraints (#23527)
Remove inline default transposeHelper and ensure we use the proper check
via CanUse_hipBlasTransposeHelper_MLFloat16

Related to change in ROCm Onnxruntime repo:
https://github.com/ROCm/onnxruntime/pull/82

### Description

Required to correctly limit grid size of transpose helper kernel

### Motivation and Context
Compilation was defaulting to the inline constructor that was removed,
instead of using the overloaded case with the proper checks.
Removed the inline default "true" case, as it is incorrect for newer
AMD cards/targets.

Co-authored-by: Ted Themistokleous <tedthemistokleous@amd.com>
2025-01-29 10:41:16 -08:00
Hector Li
80bc1d25f0
Enable Ep context with external data for CPU nodes (#23498)
### Description
When the user dumps the EP context model, if some nodes are not partitioned to the EP and they have external initializers, the dumped model still points to the old external data file. It does not make sense for the newly generated model to still point to the old external data file.
Example: a model has nodes A, B, C, D, all with external initializers in ext.bin, so ext.bin contains the data for A, B, C, D.
After dumping the EP context model, node A is on CPU, and nodes B, C, D are on the EP and dumped as EPContext nodes. If A's data is still in ext.bin, the newly generated model has to depend on the old ext.bin, which contains all the external data for the old model, which is a big overhead.

Fix:
For the newly generated model, the user should have the option to specify a new external data file, so that the model either packs all initializers into the ONNX model or keeps all initializers in the new external data file.
Add the option ep.context_model_external_initializers_file_name to specify the new external data file and size threshold. All initializers will be inside the external data file if the option is specified; otherwise all initializers will be inside the EP context ONNX model.
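A minimal pure-Python sketch of the placement decision described above (hypothetical helper; the real logic lives in the EP context dumping code): initializers go to the named external file when the option is set, and are embedded in the model otherwise.

```python
# Hypothetical sketch: decide where initializers of the newly generated EP
# context model are stored, based on the session option described above.
EXT_FILE_OPTION = "ep.context_model_external_initializers_file_name"

def initializer_destination(session_options: dict) -> str:
    target = session_options.get(EXT_FILE_OPTION)
    # With the option set, every initializer goes to the new external file;
    # without it, every initializer is embedded in the EP context ONNX model.
    return target if target else "<embedded in model>"
```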

### Motivation and Context
Fix the issue https://github.com/microsoft/onnxruntime/issues/23358
2025-01-28 20:22:22 -08:00
Yulong Wang
bf023ab3d5
[js/web] allow import .mjs/.wasm file (#23487)
### Description

Allow importing the `.mjs` and `.wasm` files.

When using Vite, this enables a web app to consume ORT-web with a simplified
setup:
   ```js
   import * as ort from 'onnxruntime-web';

   import wasmFileUrl from 'onnxruntime-web/.wasm?url';
   ort.env.wasm.wasmPaths = { wasm: wasmFileUrl };
   ```
2025-01-28 16:24:41 -08:00
Karim Vadsariya
655a23ff1d
[onnxruntime/build] Add new flag enable_generic_interface to build primary EPs by default (#23342)
### Description
- Add a new build flag in build.py to build onnxruntime.dll supporting
interfaces for all primary EPs (QNN, TensorRT, OpenVINO, VitisAI).
- Modify the onnxruntime.dll/onnxruntime_shared.dll build settings to remove
the dependency on the IHV SDK toolset being installed on the system.
- Change CMake variables to be explicit about building an EP vs. ORT, e.g.
onnxruntime_USE_TENSORRT vs. onnxruntime_USE_TENSORRT_INTERFACE, to
evolve the build system toward building ORT independently of the EPs.

### Motivation and Context
Changes in the build system are required to evolve the repo to build the
components independently while removing unnecessary dependencies.

---------

Co-authored-by: Lei Cao <jslhcl@gmail.com>
Co-authored-by: Karim Vadsariya <kvadsariya@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2025-01-28 15:24:09 -08:00
Jian Chen
a770a8dec8
Update RN to 0.71.19 (#23381)
### Description

Upgrading RN to 0.71.19, including Android and iOS changes. This PR also
includes the E2E test changes.

Used React-Native upgrade
[helper](https://react-native-community.github.io/upgrade-helper/?from=0.70.15&to=0.71.19&package=onnxruntime-android&name=onnxruntime)
as the reference.



### Motivation and Context
Need newer RN version to fix S360 work items.
2025-01-28 09:53:13 -08:00