Commit graph

12250 commits

Author SHA1 Message Date
Liqun Fu
b4aad0134c disable 2bit test
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
2025-02-03 14:05:51 -08:00
Liqun Fu
0130061001 fix neon build
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
2025-02-03 12:50:04 -08:00
Liqun Fu
3e1a951448 add apis to neon and other avxs
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
2025-02-03 12:24:40 -08:00
Liqun Fu
f6f22e30d5 some fixes
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
2025-01-30 22:41:16 -08:00
Liqun Fu
8c1cfe11d3 add and pass q4dq tests for q2bit - rename file and test name later
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
2025-01-30 16:36:41 -08:00
Liqun Fu
5484560d5f init code structure for matmul 2 bits
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
2025-01-29 19:11:36 -08:00
Jing Fang
13348c572a
[ARM CPU] hgemm optimized for gqa (#23107)
### Description
Add fp16 kernels for GQA matmul on ARM CPU.
The kernels are mlas hgemm for C = alpha * A x B' + beta * C


### Motivation and Context
Add fp16 support for GQA, speed up the operator and reduce memory usage.

__Token Generation__
| | HGEMM Runtime (ns) | SGEMM Runtime (ns) | Speed-up (%) |

|---------------------------------|--------------------|--------------------|--------------|
| M:1/N:4096/K:4096 | 251551 | 1775905 | 85.84 |
| M:1/N:11008/K:4096 | 892507 | 4649145 | 80.80 |
| M:1/N:4096/K:11008 | 866860 | 3240015 | 73.25 |
| M:1/N:11008/K:11008 | 2631615 |8783877 | 70.04 |

__Prompting__
| | HGEMM Runtime (ns) | SGEMM Runtime (ns) | Speed-up (%) |

|---------------------------------|--------------------|--------------------|--------------|
| M:1024/N:4096/K:4096 | 90508701 | 111283029 | 18.67 |
| M:2048/N:4096/K:4096 | 181307522 | 240211107 | 24.52 |
| M:1024/N:11008/K:4096 | 241120234 | 307707933 | 21.64 |
| M:2048/N:11008/K:4096 | 481091232 | 648921367 | 25.86 |
| M:1024/N:4096/K:11008 | 241736343 | 310129880 | 22.05 |
| M:2048/N:4096/K:11008 | 480456703 | 644814999 | 25.49 |
| M:1024/N:11008/K:11008 | 642121440 | 847925766 | 24.27 |
| M:2048/N:11008/K:11008 | 1276097154 | 1731314509 | 26.29
2025-01-24 15:25:24 -08:00
Grégoire
c89a798b73
Enable opti on Microsoft.ML.OnnxRuntime with RelWithDebInfo config (#23463)
Microsoft.ML.OnnxRuntime is not built with the Release configuration but
RelWithDebInfo which is not recognized by the MSBuild SDK. Consequently,
the optimizations are not enabled. A fix would be to simply force the
configuration to be Release when building the .NET code even if it was
set to RelWithDebInfo in the command line arguments but I could not find
an easy way to do that. Instead, I try to mimic the behavior of the
Release configuration by setting the optimize property.

I can see a 15% performance improvement using this simple model summing
up the 3 inputs:
```csharp
using System.Buffers;
using System.Collections.Frozen;
using System.Net;
using System.Net.Sockets;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Text;
using System.Text.RegularExpressions;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Running;
using Microsoft.ML.OnnxRuntime;

var config = DefaultConfig.Instance; //.WithOptions(ConfigOptions.DisableOptimizationsValidator);
BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args, config);

public class OnnxBench
{
    private const int Iterations = 100_000;
    private const int BatchSize = 50;
    
    private InferenceSession _session = default!;
    private string[] _inputNames = default!;
    private OrtValue[] _inputValues = default!;
    private RunOptions _runOptions = default!;

    [GlobalSetup]
    public void GlobalSetup()
    {
        using SessionOptions sessionOptions = new();
        sessionOptions.InterOpNumThreads = 1;
        sessionOptions.IntraOpNumThreads = 1;
        sessionOptions.GraphOptimizationLevel = GraphOptimizationLevel.ORT_ENABLE_ALL;
        sessionOptions.ExecutionMode = ExecutionMode.ORT_SEQUENTIAL;

        _session = new InferenceSession(
            Convert.FromBase64String("CAo6cAoOCgFBCgFCEgFEIgNBZGQKDgoBQwoBRBIBWCIDQWRkEgJscloRCgFBEgwKCggBEgYKAAoCCAFaEQoBQhIMCgoIARIGCgAKAggBWhEKAUMSDAoKCAESBgoACgIIAWIRCgFYEgwKCggBEgYKAAoCCAFCBAoAEBU="),
            sessionOptions);
        _inputNames = ["A", "B", "C"];
        _inputValues =
        [
            OrtValue.CreateTensorValueFromMemory(new float[BatchSize], [BatchSize, 1]),
            OrtValue.CreateTensorValueFromMemory(new float[BatchSize], [BatchSize, 1]),
            OrtValue.CreateTensorValueFromMemory(new float[BatchSize], [BatchSize, 1]),
        ];
        _runOptions = new RunOptions();
    }

    [Benchmark(OperationsPerInvoke = Iterations)]
    public float Run()
    {
        var inputValues0Span = _inputValues[0].GetTensorMutableDataAsSpan<float>();
        var inputValues1Span = _inputValues[1].GetTensorMutableDataAsSpan<float>();
        var inputValues2Span = _inputValues[2].GetTensorMutableDataAsSpan<float>();
        for (int i = 0; i < BatchSize; i += 1)
        {
            inputValues0Span[i] = Random.Shared.NextSingle();
            inputValues1Span[i] = Random.Shared.NextSingle();
            inputValues2Span[i] = Random.Shared.NextSingle();
        }
        
        float sum = 0f;
        for (int i = 0; i < Iterations; i += 1)
        {
            using var output = _session.Run(_runOptions, _inputNames, _inputValues, _session.OutputNames);
            ReadOnlySpan<float> outputData = output[0].GetTensorDataAsSpan<float>();
            for (int j = 0; j < outputData.Length; j += 1)
            {
                sum += outputData[j];
            }
        }
        
        return sum;
    }
}
```

| Method | Mean     | Error     | StdDev    |
|------- |---------:|----------:|----------:|
| Before | 5.003 us | 0.0318 us | 0.0297 us |
| After   | 4.325 us | 0.0568 us | 0.0503 us |
2025-01-24 09:52:05 -08:00
Caroline Zhu
d00ae325ce
Revert "[Mobile] Add BrowserStack Android MAUI Test (#23383)" (#23474)
This reverts commit 9f9fcf74ff.

### Motivation and Context
- NuGet packaging pipelines failing with this error:
```Files\dotnet\packs\Microsoft.NET.Runtime.MonoTargets.Sdk\8.0.12\Sdk\RuntimeComponentManifest.targets(3,5):
error : Empty ResolveFrameworkReference.RuntimePackPath while trying to
read runtime components manifest. ResolvedFrameworkReference available:
{ Microsoft.NETCore.App, RuntimePackPath: }```
2025-01-23 21:48:27 -08:00
Ti-Tai Wang
8b1d3b3d57
Align AvgPool ceil_mode on last value to torch (#16752)
Fix #16203

Previous to this PR, if `ceil_mode` is on, the calculation of a value
would divide the kernel size, even if remaining pixels is less than the
kernel size, which causes the difference in this operator between ORT
and torch.

However, this fix only applies to the change in #15597, which only
supports AvgPool since 19. The older opset version is remain the same,
as it's using mlas files.

Also, the PR fixes the shape mismatch caused by sliding window starting
from padding. More detail: https://github.com/onnx/onnx/pull/6650 (And
this PR is also validated with the tests added in
https://github.com/onnx/onnx/pull/6650)
2025-01-23 17:35:11 -08:00
Adrian Lizarraga
06fc73b7d4
[TRT EP Perf Tool] Add annotations import to python script to support annotations on Python 3.8 (#23466)
### Description
Adds `from __future__ import annotations` to python script to support
annotations on Python 3.8.



### Motivation and Context
Pipeline that runs this script is using Ubuntu 20.04's default python
version (3.8), which does not support annotations unless one imports
from __future__.
2025-01-23 08:54:55 -08:00
Adrian Lizarraga
46dc0b5f21
[QNN EP] Add LoggingManager::HasDefaultLogger() to provider bridge API (#23467)
### Description
Fixes QNN EP builds due to missing function in provider bridge API:
`logging::LoggingManager::HasDefaultLogger()`



### Motivation and Context
A [recent PR](https://github.com/microsoft/onnxruntime/pull/23120) made
QNN EP a shared library. A [different
PR](https://github.com/microsoft/onnxruntime/pull/23435) added use of a
new function to QNN EP that was not part of the provider bridge API. The
CI did not catch it because main was not merged into the first PR before
merging.
2025-01-22 21:26:24 -08:00
Adrian Lizarraga
3b4c7df4e9
[QNN EP] Make QNN EP a shared library (#23120)
### Description
- Makes QNN EP a shared library **by default** when building with
`--use_qnn` or `--use_qnn shared_lib`. Generates the following build
artifacts:
- **Windows**: `onnxruntime_providers_qnn.dll` and
`onnxruntime_providers_shared.dll`
- **Linux**: `libonnxruntime_providers_qnn.so` and
`libonnxruntime_providers_shared.so`
  - **Android**: Not supported. Must build QNN EP as a static library.
- Allows QNN EP to still be built as a static library with `--use_qnn
static_lib`. This is primarily for the Android QNN AAR package.
- Unit tests run for both the static and shared QNN EP builds.

### Detailed changes
- Updates Java bindings to support both shared and static QNN EP builds.
- Provider bridge API:
- Adds logging sink ETW to the provider bridge. Allows EPs to register
ETW callbacks for ORT logging.
- Adds a variety of methods for onnxruntime objects that are needed by
QNN EP.
- QNN EP:
- Adds `ort_api.h` and `ort_api.cc` that encapsulates the API provided
by ORT in a manner that allows the EP to be built as either a shared or
static library.
- Adds custom function to transpose weights for Conv and Gemm (instead
of adding util to provider bridge API).
- Adds custom function to quantize data for LeakyRelu (instead of adding
util to provider bridge API).
  - Adds custom ETW tracing for QNN profiling events:
    - shared library: defines its own TraceLogging provider handle
- static library: uses ORT's TraceLogging provider handle and existing
telemetry provider.
- ORT-QNN Packages:
- **Python**: Pipelines build QNN EP as a shared library by default.
User can build a local python wheel with QNN EP as a static library by
passing `--use_qnn static_lib`.
- **NuGet**: Pipelines build QNN EP as a shared library by default.
`build.py` currently enforces QNN EP to be built as a shared library.
Can add support for building a QNN NuGet package with static later if
deemed necessary.
- **Android**: Pipelines build QNN EP as a **static library**.
`build.py` enforces QNN EP to be built as a static library. Packaging
multiple shared libraries into an Android AAR package is not currently
supported due to the added need to also distribute a shared libcpp.so
library.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2025-01-22 12:11:00 -08:00
Changming Sun
77adf4b040
Add custom vcpkg ports (#23456)
### Description
Add custom vcpkg ports for the following packages:
1. cpuinfo
2. onnx 
3. pthreadpool  
4. xnnpack

Because:

- The cpuinfo/pthreadpool/xnnpack packages in the official vcpkg repo
are too old.
   - XNNPack's version is updated from 2022-12-22 to 2025-01-17
   - CPUINFO's version is updated from 2022-07-19 to 2024-12-09
- Pthreadpool's version is updated from 2020-04-10 to 2024-12-17, and
the source code location is changed from
https://github.com/Maratyszcza/pthreadpool to
https://github.com/google/pthreadpool
- The onnx package in the official repo requires building python from
source, which then requires a lot of additional dependencies to be
installed. This PR removes them.
- Added a disable_gcc_warning.patch file for xnnpack for addressing the
issue reported in https://github.com/google/XNNPACK/issues/7650. I will
remove this patch when the issue is fully addressed.
- Added " -DONNX_DISABLE_STATIC_REGISTRATION=ON" to ONNX's config
options.
-
2025-01-22 11:49:16 -08:00
Changming Sun
3dcc90119b
Update the compile flags for vcpkg packages (#23455)
### Description

This PR updates the triplets files that manage the compile flags for
vcpkg packages.
All the changes are autogenerated except for the gen.py file in this PR.

Main changes:
1. Enable debug info for all Linux build config(Release and Debug)
2. Set CMAKE_CXX_STANDARD in each triplet. The value is set to 20 for
macOS targets and 17 for the others.
3. Only set _FORTIFY_SOURCE in release build. This is to address a build
issue on some platforms with the following glibc change:
"Warn if user requests __FORTIFY_SOURCE but it is disabled"

https://sourceware.org/git/?p=glibc.git;a=commit;f=include/features.h;h=05c2c9618f583ea4acd69b3fe5ae2a2922dd2ddc


### Motivation and Context
Address a Linux build error.
2025-01-22 11:48:38 -08:00
Caroline Zhu
9f9fcf74ff
[Mobile] Add BrowserStack Android MAUI Test (#23383)
### Description
Add test project that will perform an automated UI test that runs the
unit tests on Android.

### Motivation
- Enables end-to-end on-device MAUI unit testing which we want to add to
the packaging pipelines

### Context
Microsoft.ML.OnnxRuntime.Tests.MAUI uses DeviceRunners.VisualRunners to
allow running the unit tests (found in
Microsoft.ML.OnnxRuntime.Tests.Common) across multiple devices.
DeviceRunners.VisualRunners provides a simple UI with a button that will
run the unit tests and a panel with the unit test results.

In order to automate the process of running the unit tests across mobile
devices, Appium is used for UI testing orchestration (it provides a way
to interact with the UI), and BrowserStack automatically runs these
Appium tests across different mobile devices.

This project does not include the capability to start an Appium server
locally or attach to a local emulator or device.

## Build & run instructions
### Requirements
* A BrowserStack account with access to App Automate
* You can set BrowserStack credentials as environment variables as shown
[here](https://www.browserstack.com/docs/app-automate/appium/getting-started/c-sharp/nunit/integrate-your-tests#CLI)
* ONNXRuntime NuGet package
1. You can either download the [stable NuGet
package](https://www.nuget.org/packages/Microsoft.ML.OnnxRuntime) then
follow the instructions from [NativeLibraryInclude.props
file](../Microsoft.ML.OnnxRuntime.Tests.Common/NativeLibraryInclude.props)
to use the downloaded .nupkg file
2. Or follow the [build
instructions](https://onnxruntime.ai/docs/build/android.html) to build
the Android package locally
* The dotnet workloads for maui and maui-android, which will not always
automatically install correctly
    1. `dotnet workload install maui`
    2. `dotnet workload install maui-android`
* [Appium](https://appium.io/docs/en/latest/quickstart/) and the
[UiAutomator2
driver](https://appium.io/docs/en/latest/quickstart/uiauto2-driver/)

### Run instructions
1. Build the Microsoft.ML.OnnxRuntime.Tests.MAUI project into a signed
APK.
1. Run the following: `dotnet publish -c Release -f net8.0-android` in
the Microsoft.ML.OnnxRuntime.Tests.MAUI directory.
2. Search for the APK files generated. They should be located in
`bin\Release\net8.0-android\publish`.
3. If they're in a different location, edit the `browserstack.yml` file
to target the path to the signed APK.
2. Ensure you've set the BrowserStack credentials as environment
variables.
3. Run the following in the
Microsoft.ML.OnnxRuntime.Tests.Android.BrowserStack directory: `dotnet
test`
4. Navigate to the [BrowserStack App Automate
dashboard](https://app-automate.browserstack.com/dashboard/v2/builds) to
see your test running!
2025-01-22 10:57:09 -08:00
Jiajia Qin
25f427466e
[js/webgpu] Optimize ConvTranspose (Continue) (#23429)
BUG #23273

This PR does below optimizations:
1. When output channels is one, 1) calculate the offset before the
inchannel loop to reduce indices to offsets calculation, 2) split the
`inputChannelsPerGroup` into `inputChannelsPerGroupInt` and
`inputChannelsRemainder` parts so that we can always access 4 data for
`inputChannelsPerGroupInt`.
2. Use precise initial value to reduce useless loop iterations. Thanks
@jiangzhaoming 's suggestion's on this.

With this PR, ConvTranspose becomes 3.7s from 8.4s on Intel Meteor Lake.
On NV RTX 2000 Ada, it becomes 1.6s from 2.7s.
2025-01-22 08:59:17 -08:00
Changming Sun
ff8465eda4
Use onnx_protobuf.h to suppress some GCC warnings (#23453)
### Description
Use onnx_protobuf.h to suppress some GCC warnings.

All the changes are autogenerated by a shell command.
```bash
find . -type f -exec sed -i 's/#include\s\+<onnx\/onnx_pb.h>/#include "core\/graph\/onnx_protobuf.h"/g' {} \;
```

### Motivation and Context
This PR is needed for making vcpkg work(without disabling all warnings)
This PR is split from another bigger PR per request from a reviewer.
2025-01-21 20:25:12 -08:00
Changming Sun
c9614fbf90
Suppress some strict-aliasing related warnings in WebGPU EP (#23454)
### Description
Suppress some strict-aliasing related warnings in WebGPU EP

For example:
```
/home/chasun/src/onnxruntime/onnxruntime/core/providers/webgpu/math/unary_elementwise_ops.cc:208:30: error: dereferencing type-punned pointer will break strict-aliasing rules [-Werror=strict-aliasing]
  208 |       float encoded_value = *reinterpret_cast<const float*>(attr);
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

This PR does not really fix the problems. It just suppresses the
warnings to make build pass. Some issues related to strict aliasing may
be fixed by using std::bit_cast, which requires c++20 however.


### Motivation and Context
Build the code on Azure Linux 3 fails. To reproduce the issue, you may
get an AzureLinux3 machine and run:
```
 python3 tools/ci_build/build.py --update --build  --build_wheel --use_xnnpack --build_nodejs   --use_webgpu --build_dir b --skip_submodule_sync --parallel --use_binskim_compliant_compile_flags --build_shared_lib --config Release
```
2025-01-21 17:37:08 -08:00
dependabot[bot]
87582ba9b7
Bump ruff from 0.9.1 to 0.9.2 (#23427)
Bumps [ruff](https://github.com/astral-sh/ruff) from 0.9.1 to 0.9.2.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/releases">ruff's
releases</a>.</em></p>
<blockquote>
<h2>0.9.2</h2>
<h2>Release Notes</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>airflow</code>] Fix typo &quot;security_managr&quot; to
&quot;security_manager&quot; (<code>AIR303</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15463">#15463</a>)</li>
<li>[<code>airflow</code>] extend and fix AIR302 rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15525">#15525</a>)</li>
<li>[<code>fastapi</code>] Handle parameters with <code>Depends</code>
correctly (<code>FAST003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15364">#15364</a>)</li>
<li>[<code>flake8-pytest-style</code>] Implement pytest.warns
diagnostics (<code>PT029</code>, <code>PT030</code>, <code>PT031</code>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/15444">#15444</a>)</li>
<li>[<code>flake8-pytest-style</code>] Test function parameters with
default arguments (<code>PT028</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15449">#15449</a>)</li>
<li>[<code>flake8-type-checking</code>] Avoid false positives for
<code>|</code> in <code>TC008</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15201">#15201</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>flake8-todos</code>] Allow VSCode GitHub PR extension style
links in <code>missing-todo-link</code> (<code>TD003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15519">#15519</a>)</li>
<li>[<code>pyflakes</code>] Show syntax error message for
<code>F722</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15523">#15523</a>)</li>
</ul>
<h3>Formatter</h3>
<ul>
<li>Fix curly bracket spacing around f-string expressions containing
curly braces (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15471">#15471</a>)</li>
<li>Fix joining of f-strings with different quotes when using quote
style <code>Preserve</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15524">#15524</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Avoid indexing the same workspace multiple times (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15495">#15495</a>)</li>
<li>Display context for <code>ruff.configuration</code> errors (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15452">#15452</a>)</li>
</ul>
<h3>Configuration</h3>
<ul>
<li>Remove <code>flatten</code> to improve deserialization error
messages (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15414">#15414</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Parse triple-quoted string annotations as if parenthesized (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15387">#15387</a>)</li>
<li>[<code>fastapi</code>] Update <code>Annotated</code> fixes
(<code>FAST002</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15462">#15462</a>)</li>
<li>[<code>flake8-bandit</code>] Check for <code>builtins</code> instead
of <code>builtin</code> (<code>S102</code>, <code>PTH123</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15443">#15443</a>)</li>
<li>[<code>flake8-pathlib</code>] Fix <code>--select</code> for
<code>os-path-dirname</code> (<code>PTH120</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15446">#15446</a>)</li>
<li>[<code>ruff</code>] Fix false positive on global keyword
(<code>RUF052</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15235">#15235</a>)</li>
</ul>
<h2>Contributors</h2>
<ul>
<li><a
href="https://github.com/AlexWaygood"><code>@​AlexWaygood</code></a></li>
<li><a
href="https://github.com/BurntSushi"><code>@​BurntSushi</code></a></li>
<li><a
href="https://github.com/Daverball"><code>@​Daverball</code></a></li>
<li><a
href="https://github.com/Garrett-R"><code>@​Garrett-R</code></a></li>
<li><a
href="https://github.com/Glyphack"><code>@​Glyphack</code></a></li>
<li><a
href="https://github.com/InSyncWithFoo"><code>@​InSyncWithFoo</code></a></li>
<li><a href="https://github.com/Lee-W"><code>@​Lee-W</code></a></li>
<li><a
href="https://github.com/MichaReiser"><code>@​MichaReiser</code></a></li>
<li><a
href="https://github.com/cake-monotone"><code>@​cake-monotone</code></a></li>
</ul>
<!-- raw HTML omitted -->
</blockquote>
<p>... (truncated)</p>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/astral-sh/ruff/blob/main/CHANGELOG.md">ruff's
changelog</a>.</em></p>
<blockquote>
<h2>0.9.2</h2>
<h3>Preview features</h3>
<ul>
<li>[<code>airflow</code>] Fix typo &quot;security_managr&quot; to
&quot;security_manager&quot; (<code>AIR303</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15463">#15463</a>)</li>
<li>[<code>airflow</code>] extend and fix AIR302 rules (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15525">#15525</a>)</li>
<li>[<code>fastapi</code>] Handle parameters with <code>Depends</code>
correctly (<code>FAST003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15364">#15364</a>)</li>
<li>[<code>flake8-pytest-style</code>] Implement pytest.warns
diagnostics (<code>PT029</code>, <code>PT030</code>, <code>PT031</code>)
(<a
href="https://redirect.github.com/astral-sh/ruff/pull/15444">#15444</a>)</li>
<li>[<code>flake8-pytest-style</code>] Test function parameters with
default arguments (<code>PT028</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15449">#15449</a>)</li>
<li>[<code>flake8-type-checking</code>] Avoid false positives for
<code>|</code> in <code>TC008</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15201">#15201</a>)</li>
</ul>
<h3>Rule changes</h3>
<ul>
<li>[<code>flake8-todos</code>] Allow VSCode GitHub PR extension style
links in <code>missing-todo-link</code> (<code>TD003</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15519">#15519</a>)</li>
<li>[<code>pyflakes</code>] Show syntax error message for
<code>F722</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15523">#15523</a>)</li>
</ul>
<h3>Formatter</h3>
<ul>
<li>Fix curly bracket spacing around f-string expressions containing
curly braces (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15471">#15471</a>)</li>
<li>Fix joining of f-strings with different quotes when using quote
style <code>Preserve</code> (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15524">#15524</a>)</li>
</ul>
<h3>Server</h3>
<ul>
<li>Avoid indexing the same workspace multiple times (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15495">#15495</a>)</li>
<li>Display context for <code>ruff.configuration</code> errors (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15452">#15452</a>)</li>
</ul>
<h3>Configuration</h3>
<ul>
<li>Remove <code>flatten</code> to improve deserialization error
messages (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15414">#15414</a>)</li>
</ul>
<h3>Bug fixes</h3>
<ul>
<li>Parse triple-quoted string annotations as if parenthesized (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15387">#15387</a>)</li>
<li>[<code>fastapi</code>] Update <code>Annotated</code> fixes
(<code>FAST002</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15462">#15462</a>)</li>
<li>[<code>flake8-bandit</code>] Check for <code>builtins</code> instead
of <code>builtin</code> (<code>S102</code>, <code>PTH123</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15443">#15443</a>)</li>
<li>[<code>flake8-pathlib</code>] Fix <code>--select</code> for
<code>os-path-dirname</code> (<code>PTH120</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15446">#15446</a>)</li>
<li>[<code>ruff</code>] Fix false positive on global keyword
(<code>RUF052</code>) (<a
href="https://redirect.github.com/astral-sh/ruff/pull/15235">#15235</a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="0a39348381"><code>0a39348</code></a>
Include build binaries</li>
<li><a
href="027f8009e5"><code>027f800</code></a>
Comment out non-npm-publish jobs</li>
<li><a
href="425870df76"><code>425870d</code></a>
Upload npm publish logs when failed</li>
<li><a
href="c20255abe4"><code>c20255a</code></a>
Bump version to 0.9.2 (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15529">#15529</a>)</li>
<li><a
href="420365811f"><code>4203658</code></a>
Fix joining of f-strings with different quotes when using quote style
`Preser...</li>
<li><a
href="fc9dd63d64"><code>fc9dd63</code></a>
[airflow] extend and fix AIR302 rules (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15525">#15525</a>)</li>
<li><a
href="79e52c7fdf"><code>79e52c7</code></a>
[<code>pyflakes</code>] Show syntax error message for <code>F722</code>
(<a
href="https://redirect.github.com/astral-sh/ruff/issues/15523">#15523</a>)</li>
<li><a
href="cf4ab7cba1"><code>cf4ab7c</code></a>
Parse triple quoted string annotations as if parenthesized (<a
href="https://redirect.github.com/astral-sh/ruff/issues/15387">#15387</a>)</li>
<li><a
href="d2656e88a3"><code>d2656e8</code></a>
[<code>flake8-todos</code>] Allow VSCode GitHub PR extension style links
in `missing-tod...</li>
<li><a
href="c53ee608a1"><code>c53ee60</code></a>
Typeshed-sync workflow: add appropriate labels, link directly to failing
run ...</li>
<li>Additional commits viewable in <a
href="https://github.com/astral-sh/ruff/compare/0.9.1...0.9.2">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=ruff&package-manager=pip&previous-version=0.9.1&new-version=0.9.2)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-21 17:21:21 -08:00
Wanming Lin
18a54284c8
[WebNN] Remove workarounds for TFLite backend (#23406)
The WebNN CPU device type may now target different backends, such as
CoreML. Legacy special workarounds for the TFLite backend should be
removed and allowed to fail as is, as these are implementation issues.

Additionally, the WebNN EP should adhere to the WebNN API conformance.
We assume all the WebNN ops should be supported, so remove the WebNN op
support status for different device types in webnn-operators.md as well.
2025-01-21 17:20:19 -08:00
dependabot[bot]
f4dc965522
Bump vite from 6.0.7 to 6.0.11 in /js/web/test/e2e/exports/testcases/vite-default (#23446)
Bumps [vite](https://github.com/vitejs/vite/tree/HEAD/packages/vite)
from 6.0.7 to 6.0.11.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/releases">vite's
releases</a>.</em></p>
<blockquote>
<h2>v6.0.11</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.11/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v6.0.10</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.10/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v6.0.9</h2>
<p>This version contains a breaking change due to security fixes. See <a
href="https://github.com/vitejs/vite/security/advisories/GHSA-vg6x-rcgg-rjx6">https://github.com/vitejs/vite/security/advisories/GHSA-vg6x-rcgg-rjx6</a>
for more details.</p>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.9/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
<h2>v6.0.8</h2>
<p>Please refer to <a
href="https://github.com/vitejs/vite/blob/v6.0.8/packages/vite/CHANGELOG.md">CHANGELOG.md</a>
for details.</p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/vitejs/vite/blob/main/packages/vite/CHANGELOG.md">vite's
changelog</a>.</em></p>
<blockquote>
<h2><!-- raw HTML omitted -->6.0.11 (2025-01-21)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: <code>preview.allowedHosts</code> with specific values was not
respected (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19246">#19246</a>)
(<a
href="aeb3ec84a2">aeb3ec8</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19246">#19246</a></li>
<li>fix: allow CORS from loopback addresses by default (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19249">#19249</a>)
(<a
href="3d03899737">3d03899</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19249">#19249</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.0.10 (2025-01-20)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: try parse <code>server.origin</code> URL (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19241">#19241</a>)
(<a
href="2495022420">2495022</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19241">#19241</a></li>
</ul>
<h2><!-- raw HTML omitted -->6.0.9 (2025-01-20)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix!: check host header to prevent DNS rebinding attacks and
introduce <code>server.allowedHosts</code> (<a
href="bd896fb5f3">bd896fb</a>)</li>
<li>fix!: default <code>server.cors: false</code> to disallow fetching
from untrusted origins (<a
href="b09572acc9">b09572a</a>)</li>
<li>fix: verify token for HMR WebSocket connection (<a
href="029dcd6d77">029dcd6</a>)</li>
</ul>
<h2><!-- raw HTML omitted -->6.0.8 (2025-01-20)<!-- raw HTML omitted
--></h2>
<ul>
<li>fix: avoid SSR HMR for HTML files (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19193">#19193</a>)
(<a
href="3bd55bcb7e">3bd55bc</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19193">#19193</a></li>
<li>fix: build time display 7m 60s (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19108">#19108</a>)
(<a
href="cf0d2c8e23">cf0d2c8</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19108">#19108</a></li>
<li>fix: don't resolve URL starting with double slash (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19059">#19059</a>)
(<a
href="35942cde11">35942cd</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19059">#19059</a></li>
<li>fix: ensure <code>server.close()</code> only called once (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19204">#19204</a>)
(<a
href="db81c2dada">db81c2d</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19204">#19204</a></li>
<li>fix: resolve.conditions in ResolvedConfig was
<code>defaultServerConditions</code> (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19174">#19174</a>)
(<a
href="ad75c56dce">ad75c56</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19174">#19174</a></li>
<li>fix: tree shake stringified JSON imports (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19189">#19189</a>)
(<a
href="f2aed62d0b">f2aed62</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19189">#19189</a></li>
<li>fix: use shared sigterm callback (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19203">#19203</a>)
(<a
href="47039f4643">47039f4</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19203">#19203</a></li>
<li>fix(deps): update all non-major dependencies (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19098">#19098</a>)
(<a
href="8639538e64">8639538</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19098">#19098</a></li>
<li>fix(optimizer): use correct default install state path for yarn PnP
(<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19119">#19119</a>)
(<a
href="e690d8bb1e">e690d8b</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19119">#19119</a></li>
<li>fix(types): improve <code>ESBuildOptions.include / exclude</code>
type to allow <code>readonly (string | RegExp)[]</code> (<a
href="ea53e70952">ea53e70</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19146">#19146</a></li>
<li>chore(deps): update dependency pathe to v2 (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19139">#19139</a>)
(<a
href="71506f0a8d">71506f0</a>),
closes <a
href="https://redirect.github.com/vitejs/vite/issues/19139">#19139</a></li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="a0ed4057c9"><code>a0ed405</code></a>
release: v6.0.11</li>
<li><a
href="3d03899737"><code>3d03899</code></a>
fix: allow CORS from loopback addresses by default (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19249">#19249</a>)</li>
<li><a
href="aeb3ec84a2"><code>aeb3ec8</code></a>
fix: <code>preview.allowedHosts</code> with specific values was not
respected (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19246">#19246</a>)</li>
<li><a
href="9654348258"><code>9654348</code></a>
release: v6.0.10</li>
<li><a
href="2495022420"><code>2495022</code></a>
fix: try parse <code>server.origin</code> URL (<a
href="https://github.com/vitejs/vite/tree/HEAD/packages/vite/issues/19241">#19241</a>)</li>
<li><a
href="a55f8ba3e4"><code>a55f8ba</code></a>
release: v6.0.9</li>
<li><a
href="bd896fb5f3"><code>bd896fb</code></a>
fix!: check host header to prevent DNS rebinding attacks and introduce
`serve...</li>
<li><a
href="029dcd6d77"><code>029dcd6</code></a>
fix: verify token for HMR WebSocket connection</li>
<li><a
href="b09572acc9"><code>b09572a</code></a>
fix!: default <code>server.cors: false</code> to disallow fetching from
untrusted origins</li>
<li><a
href="c0f72a695c"><code>c0f72a6</code></a>
release: v6.0.8</li>
<li>Additional commits viewable in <a
href="https://github.com/vitejs/vite/commits/v6.0.11/packages/vite">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=vite&package-manager=npm_and_yarn&previous-version=6.0.7&new-version=6.0.11)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)
You can disable automated security fix PRs for this repo from the
[Security Alerts
page](https://github.com/microsoft/onnxruntime/network/alerts).

</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-21 17:18:39 -08:00
Changming Sun
368e243194
Make ORT and Dawn use the same protobuf/abseil source code (#23447)
### Description
Make ORT and Dawn use the same protobuf/abseil source code
2025-01-21 17:17:47 -08:00
Jian Chen
628c0e00c4
Change MacOS-13 to ubuntu on for android-java-api-aar-test.yml. (#23444)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2025-01-21 17:07:20 -08:00
sushraja-msft
58c29d34b2
WIP: Dp4MatMulNBits accuracy level 4 matmul for WebGPU EP (#23365)
### Description

This change implements accuracy level 4 - quantize A to int8 matmul for
the WebGPU EP. The matmul kernel here uses DP4A for matrix
multiplication, in order to keep the DP4A fed co-operative matrix
multiplication is implemented which preloads the row/col into local
variables before the multiplication operation.

Credits to @qjia7 for help with the quantizer shader.

Performance metrics on intel ADL/TGL GPU.

```
PS C:\onnxruntime> C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web -l 500
Batch size: 1, prompt tokens: 501, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       2.76762e+06
        **avg (tokens/s): 181.022**   <<< Prefill speed
        p50 (us):       2.74843e+06
        stddev (us):    41756.4
        n:              5 * 501 token(s)
Token generation:
        avg (us):       81500.7
        avg (tokens/s): 12.2698
        p50 (us):       81104.1
        stddev (us):    2961.31
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       13.1836
        avg (tokens/s): 75851.9
        p50 (us):       12
        stddev (us):    6.47085
        n:              640 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       13120
        p50 (ms):       13081.6
        stddev (ms):    114.689
        n:              5
Peak working set size (bytes): 5467533312
WebGPU device lost (2): Device was destroyed.

```
This kernel is 2.10x faster than its F16 counterpart for a 500 token
prefill. Previous prefill record is 86tks/s.

In order to support devices with subgroup size 8/32, a no subgroup
version of the same shader is included. Performance is slower than the
subgroup version on ADL.

```
PS C:\onnxruntime> C:\model_benchmark\model_benchmark.exe -i C:\Phi-3.5-mini-instruct-onnx-web\Phi-3.5-mini-instruct-onnx-web -l 500 
Batch size: 1, prompt tokens: 501, tokens to generate: 128
Prompt processing (time to first token):
        avg (us):       4.11989e+06
        avg (tokens/s): 121.605
        p50 (us):       4.11847e+06
        stddev (us):    2147.48
        n:              5 * 501 token(s)
Token generation:
        avg (us):       81174.9
        avg (tokens/s): 12.3191
        p50 (us):       81301.1
        stddev (us):    2177.2
        n:              635 * 1 token(s)
Token sampling:
        avg (us):       14.7998
        avg (tokens/s): 67568.3
        p50 (us):       12.3
        stddev (us):    11.5481
        n:              640 * 1 token(s)
E2E generation (entire generation loop):
        avg (ms):       14431.1
        p50 (ms):       14433.8
        stddev (ms):    5.02473
        n:              5
Peak working set size (bytes): 5466480640
WebGPU device lost (2): Device was destroyed.
```
2025-01-21 15:46:51 -08:00
Adrian Lizarraga
c7f764c941
[QNN EP]: Clean up QNN logging resources if an error occurs during initialization (#23435)
### Description
Re-implementation of https://github.com/microsoft/onnxruntime/pull/23320
(which was reverted).

- Cleans up QNN logging resources if an error occurs during
initialization.
- Updates `QnnLogging()`, which is a logging callback called by QNN
libs, to handle situations in which ORT logging is unavailable, thus
avoiding a segmentation fault.
- Updates `QnnBackendManager::CreateHtpPowerCfgId()` and
`QnnBackendManager::SetHtpPowerConfig()` to check that backend setup is
complete. These functions get called in QNN EP's `OnRunStart()` even if
QNN backend setup failed and the model is assigned to a different EP.
This prevents a segmentation fault. Our Android tests ran into this
issue because the QNN backend setup failed, the model was then assigned
to CPU EP, and the QNN EP's `OnRunStart()` was still called with an
invalid backend.


### Motivation and Context
If QNN initialization fails at any point, we have to properly clean up
the logging resources so that QNN does not call our `QnnLogging()`
callback after the EP has been destroyed.
2025-01-21 12:48:08 -08:00
dependabot[bot]
997ab24c08
Bump clang-format from 19.1.6 to 19.1.7 (#23428)
Bumps [clang-format](https://github.com/ssciwr/clang-format-wheel) from
19.1.6 to 19.1.7.
<details>
<summary>Commits</summary>
<ul>
<li><a
href="f865928dd2"><code>f865928</code></a>
Bump to v19.1.7</li>
<li>See full diff in <a
href="https://github.com/ssciwr/clang-format-wheel/compare/v19.1.6...v19.1.7">compare
view</a></li>
</ul>
</details>
<br />


[![Dependabot compatibility
score](https://dependabot-badges.githubapp.com/badges/compatibility_score?dependency-name=clang-format&package-manager=pip&previous-version=19.1.6&new-version=19.1.7)](https://docs.github.com/en/github/managing-security-vulnerabilities/about-dependabot-security-updates#about-compatibility-scores)

Dependabot will resolve any conflicts with this PR as long as you don't
alter it yourself. You can also trigger a rebase manually by commenting
`@dependabot rebase`.

[//]: # (dependabot-automerge-start)
[//]: # (dependabot-automerge-end)

---

<details>
<summary>Dependabot commands and options</summary>
<br />

You can trigger Dependabot actions by commenting on this PR:
- `@dependabot rebase` will rebase this PR
- `@dependabot recreate` will recreate this PR, overwriting any edits
that have been made to it
- `@dependabot merge` will merge this PR after your CI passes on it
- `@dependabot squash and merge` will squash and merge this PR after
your CI passes on it
- `@dependabot cancel merge` will cancel a previously requested merge
and block automerging
- `@dependabot reopen` will reopen this PR if it is closed
- `@dependabot close` will close this PR and stop Dependabot recreating
it. You can achieve the same result by closing it manually
- `@dependabot show <dependency name> ignore conditions` will show all
of the ignore conditions of the specified dependency
- `@dependabot ignore this major version` will close this PR and stop
Dependabot creating any more for this major version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this minor version` will close this PR and stop
Dependabot creating any more for this minor version (unless you reopen
the PR or upgrade to it yourself)
- `@dependabot ignore this dependency` will close this PR and stop
Dependabot creating any more for this dependency (unless you reopen the
PR or upgrade to it yourself)


</details>

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-01-21 12:41:16 -08:00
junchao-zhao
5b5aa11b83
Fix eigen external deps (#23439)
### Description
<!-- Describe your changes. -->

I think we should not use the eigen in the system directly, but should
first use the eigen specified in deps.txt. in ubuntu22.04, ORT fails to
compile when I install libeigen3-dev (which ROS2 humble depends on). The
error message is below:

```
[ 62%] Built target onnxruntime_lora
[ 62%] Building CXX object CMakeFiles/onnxruntime_session.dir/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/session/IOBinding.cc.o
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc: In member function ‘onnxruntime::common::Status onnxruntime::Min_6<T>::Compute(onnxruntime::OpKernelContext*) const [with T = float]’:
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: error: no matching function for call to ‘Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >::min<Eigen::PropagateNaN>(Eigen::ArrayWrapper<Eigen::Map<const Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >)’
  750 |     min = min.array().template min<Eigen::PropagateNaN>(EigenMap<float>(data_n).array());
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/eigen3/Eigen/Core:19,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:10,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:4:
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: candidate: ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >]’
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note:   template argument deduction/substitution failed:
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: error: type/value mismatch at argument 1 in template parameter list for ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::ArrayWrapper<Eigen::Map<Eigen::Matrix<float, -1, 1>, 0, Eigen::Stride<0, 0> > >]’
  750 |     min = min.array().template min<Eigen::PropagateNaN>(EigenMap<float>(data_n).array());
      |           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:750:56: note:   expected a type, got ‘Eigen::PropagateNaN’
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc: In lambda function:
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: error: no matching function for call to ‘Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >::min<Eigen::PropagateNaN>(Eigen::half)’
  802 |           output_vec_map = input_1_vec_map.template min<Eigen::PropagateNaN>(
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  803 |               static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
In file included from /usr/include/eigen3/Eigen/Core:19,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/util/math_cpuonly.h:68,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.h:10,
                 from /home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:4:
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note: candidate: ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >]’
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/usr/include/eigen3/Eigen/src/Core/../plugins/ArrayCwiseBinaryOps.h:33:28: note:   template argument deduction/substitution failed:
   33 | EIGEN_MAKE_CWISE_BINARY_OP(min,min)
      |                            ^~~
/usr/include/eigen3/Eigen/src/Core/util/Macros.h:1339:4: note: in definition of macro ‘EIGEN_MAKE_CWISE_BINARY_OP’
 1339 |   (METHOD)(const EIGEN_CURRENT_STORAGE_BASE_CLASS<OtherDerived> &other) const \
      |    ^~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: error: type/value mismatch at argument 1 in template parameter list for ‘template<class OtherDerived> const Eigen::CwiseBinaryOp<Eigen::internal::scalar_min_op<typename Eigen::internal::traits<T>::Scalar, typename Eigen::internal::traits<OtherDerived>::Scalar>, const Derived, const OtherDerived> Eigen::ArrayBase<Derived>::min(const Eigen::ArrayBase<OtherDerived>&) const [with OtherDerived = OtherDerived; Derived = Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >]’
  802 |           output_vec_map = input_1_vec_map.template min<Eigen::PropagateNaN>(
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  803 |               static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:802:77: note:   expected a type, got ‘Eigen::PropagateNaN’
/home/junchao/work/plugin/ai/onnxruntime/onnxruntime/core/providers/cpu/math/element_wise_ops.cc:805:77: error: no matching function for call to ‘Eigen::Map<const Eigen::Array<Eigen::half, -1, 1, 0, -1, 1>, 0, Eigen::Stride<0, 0> >::max<Eigen::PropagateNaN>(Eigen::half)’
  805 |           output_vec_map = input_1_vec_map.template max<Eigen::PropagateNaN>(
      |                            ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
  806 |               static_cast<Eigen::half>(per_iter_bh.ScalarInput0<MLFloat16>()));
      |               ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```
### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Fix #23407

@lixing-star
2025-01-21 12:40:42 -08:00
Jian Chen
899ea21ffe
Moving RN_CI Android Testing to Linux (#23422)
### Description
Moving Android E2E test steps from Mac-OS13 to unbunt22.04



### Motivation and Context
Deduced the dependency on MacOS, which is deprecating the x64 version.
2025-01-21 11:55:29 -08:00
Adrian Lizarraga
3e4c5e6487
[QNN EP] workaround for QNN validation bug for Tanh with uint16 quantized output (#23432)
### Description
- Skip QNN validation for Tanh with uint16 quantized output (workaround
for QNN validation bug).
- Re-enables unit test for Tanh with uint16 quantized output.

The [QNN
documentation](https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/HtpOpDefSupplement.html#tanh)
states that the output scale and offset for `ufixed_point_16` should be
(1/32768) and -32768, respectively. However, the QNN validator
incorrectly rejects these values. So, we skip validation for this
configuration of Tanh. Building an actual QNN graph with the correct
scale/offset still works.


### Motivation and Context
This QNN validation bug appeared in QNN SDK 2.28.0 and is still present
in QNN SDK 2.30.0. A previous PR disabled the corresponding unit test:
https://github.com/microsoft/onnxruntime/pull/22724/files#diff-57f590c6c548b073ba8cd8af6cf198799906f7059ea46b31cd33972ea9b01983R232
2025-01-20 21:35:47 -08:00
Jian Chen
83cb1e4a3c
Seperate RN andriod and IOS into 2 separated Stages. (#23400)
### Description
Seperate RN andriod and IOS into 2 separated Stages.



### Motivation and Context
Speed up the PR process.
2025-01-20 18:08:01 -08:00
Alexis Tsogias
e20b529a32
Implement some missing element wise Add/Sub/Mul/Div/Neg operations for CPU and CUDA EPs (#23090)
* [CPU EP] Implement Add/Sub/Mul/Div element wise operations for
(u)int8, (u)int16, uint32 and uint64.
* [CPU EP] Implement Neg unary operation for int16
* [CUDA EP] Implement Add/Sub/Mul/Div element wise operations for
(u)int8 and (u)int16

### Motivation and Context
This solves https://github.com/microsoft/onnxruntime/issues/23051
2025-01-20 16:46:45 -08:00
Jian Chen
7c90a9b2a1
Upgrade Java version from react-native/android to Java 17 (#23066)
### Description
Upgrade Java version from react-native/android to Java 17. This PR does
not update the e2e Java 17 version
2025-01-18 08:51:06 -08:00
Hector Li
f35924a891
Update Qnn SDK default version to 2.30 (#23411)
### Description
Update Qnn SDK default version to 2.30
2025-01-17 22:36:35 -08:00
Tianlei Wu
97812e5818
Fix type cast build error (#23423)
### Description

- Fix a type cast in
https://github.com/microsoft/onnxruntime/pull/23363.
- Include some headers which are suggested by code scanning in that PR.

### Motivation and Context

PostMerge has build error:
```
onnxruntime\core\framework\print_tensor_statistics_utils.h(92,55): error C2220: the following warning is treated as an error [D:\a\_work\1\b\Debug\onnxruntime_framework.vcxproj]
```
2025-01-17 21:20:43 -08:00
Peishen Yan
a05203b91c
[WebNN EP] Fix AddInitializersToSkip issues (#23354)
### Description
<!-- Describe your changes. -->
When the onnx model reuses initializers in more than one ops, if one of
the ops wants to add this initializer to the skipped list, but the other
ops still need this initializer, it will cause the process to crash.

Therefore, like other EPs, we count `initializer_usage_`, the number of
occurrences of each initializer in all ops and modify the
`AddInitializersToSkip` to minus the corresponding initializers'
statistic one when adding the specific operators.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2025-01-17 17:18:38 -08:00
Adrian Lizarraga
a9bf0bedd8
[QNN EP] Fix regression for MatMul with two quantized/dynamic uint16 inputs (#23419)
### Description
- Fixes regression for MatMul with two quantized/dynamic uint16 inputs.
We need to convert input[1] to uint8 to pass QNN validation.
- Separates translation of `ONNX MatMul -> QNN MatMul` and `ONNX MatMul
-> QNN FullyConnected` to separate functions to make the code more
readable.


### Motivation and Context
The following PR updated the handling of MatMul. The logic to handle
MatMul with two non-const uint16 inputs was not ported from
[simple_op_builder.cc](c64fa18834/onnxruntime/core/providers/qnn/builder/opbuilder/simple_op_builder.cc (L107))
to the new
[matmul_op_builder.cc](c64fa18834/onnxruntime/core/providers/qnn/builder/opbuilder/matmul_op_builder.cc (L57)).

https://github.com/microsoft/onnxruntime/pull/22639
2025-01-17 15:45:49 -08:00
Changming Sun
d461ca9dcd
Update onnxruntime binary size checks ci pipeline's docker image (#23405)
1. Update onnxruntime binary size checks ci pipeline's docker image. Use
a different docker image that is not manylinux based. The new one is
smaller.
2. Add flatbuffers tools/ci_build/requirements/pybind/requirements.txt
3. Delete
tools/ci_build/github/azure-pipelines/py-package-build-pipeline.yml. The
pipeline was for generating packages for Olive, but it went unused. And
the content is highly duplicated with our official python packaging
pipeline.
4. A lot of YAML files reference pypa/manylinux git repo but do not use
it. This PR removes the references.
2025-01-17 15:29:17 -08:00
Edward Chen
db8e10b0b9
Revert "[QNN EP] Clean up correctly from a partial setup (#23320)" (#23420)
### Description
<!-- Describe your changes. -->

This reverts commit 5d215ff810.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

The reverted change causes a packaging pipeline to fail due to a crash
in one of the E2E Android tests.

Reverting this first to fix the pipeline. We should come up with an
alternative way to properly do the necessary clean up.
2025-01-17 14:35:25 -08:00
Justin Chu
ad312d9677
Enable comprehension simplification in ruff rules (#23414)
Enable comprehension simplification rules (C4) for ruff and apply
autofix.
2025-01-17 08:43:06 -08:00
Yulong Wang
a350040aa7
bugfix: string_view of invalid memory (#23417)
### Description

the `std::unordered_map` uses a `std::string_view` as key, while the
string view may refer to invalid memory. Function `IdentityBuilder`
returns a `std::string` which goes out of scope quickly.

```c++
  unordered_map<string_view, std::vector<NodeIndex>> identical_children_map;
  for (auto i = node->OutputEdgesBegin(); i != node->OutputEdgesEnd(); ++i) {
    if (i->GetNode().OpType() == op) {
      identical_children_map[IdentityBuilder(graph, i->GetNode())].push_back(i->GetNode().Index());
    }
  }
```

This code will cause a waring as error in EMSDK v4.0.1:

```
C:/code/o2/onnxruntime/core/optimizer/identical_children_consolidation.cc:51:30: error: object whose reference is captured by 'identical_children_map' will be destroyed at the end of the full-expression [-Werror,-Wdangling-capture]
   51 |       identical_children_map[IdentityBuilder(graph, i->GetNode())].push_back(i->GetNode().Index());
      |                              ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 error generated.
```
2025-01-17 07:56:26 -08:00
Yulong Wang
b8599b786e
fix crash when first input of BatchNormalization is 1-D (#23387)
### Description

fix crash when first input of BatchNormalization is 1-D
2025-01-16 21:11:43 -08:00
Justin Chu
09c4cc7b36
Target py310 and modernize codebase with ruff (#23401)
Change `target-version = "py310"` and modernize the code base with ruff.
2025-01-16 19:10:14 -08:00
Adrian Lizarraga
2ab51c7013
[QNN EP] Fix segfault when unregistering HTP shared memory handles (#23402)
### Description
- Fixes segfault when the function that cleans up HTP memory handles
uses an invalid Logger.
- Fixes unit test that compares output from QNN EP with exact float
values. QNN HTP runs float32 models with float16 precision, so need to
use a tolerance in the comparison.



### Motivation and Context
Fixes issues with using QNN HTP memory sharing on Windows ARM64. This is
also needed to test HTP shared memory with
https://github.com/microsoft/onnxruntime/pull/23120.
2025-01-16 16:20:33 -08:00
Peishen Yan
80f686e055
[WebNN EP] Optimize model partitioning (#23332)
### Description
<!-- Describe your changes. -->

The old `GetCapability` function of WebNN EP is just a very simple
search for groups of nodes that can be handled. This doesn't work well
in the following example graph, where A and D could be handled by the
EP, but B is between them in the topological order, as you get two
single node capabilities. However, it may also be advantageous if C and
E could be handled by the EP, since they would be combined with D even
though they are not connected.
```
    A  B  C
    | /   |
    D     E
    |     |
```
Therefore, we improve partitioning results by reusing
`utils::CreateSupportedPartitions`, which walks the edges for each node
that the EP can handle as they are iterated in topological order. This
would guarantee that all connected nodes that can be handled are grouped
together. Correspondingly, we modify the `webnn::GetSupportedNodes`
function to return the supported nodes instead of the group of supported
partitions.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2025-01-16 13:26:22 -08:00
Tianlei Wu
5735e1bce0
Dump nodes with potential overflow in half conversion (#23363)
Add a tool to generate node_block_list used in [float16 conversion tool](04030f64be/onnxruntime/python/tools/transformers/float16.py (L175)).

Previously, we have a feature to dump statistics data (like min, max) of
each node input/output. However, it is time consuming to generate a list
of nodes that need to be kept in float32 when model is large.

This could help speed up the process by outputting a list of nodes that
have potential overflow in float-to-half conversion.

Usage is to build onnxruntime from source with ` --cmake_extra_defines
onnxruntime_DEBUG_NODE_INPUTS_OUTPUTS=1`, then set some environment
variables before running float32 optimized onnx model like:
```
export ORT_DEBUG_NODE_IO_DUMP_HALF_CONVERSION_OVERFLOW=1
export ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD=50000

python benchmark.py -e optimum --height 1024 --width 1024 --steps 3 -b 1 -v Flux.1D -p flux1_dev_onnx/fp32_opt --skip_warmup
```

The threshold `ORT_DEBUG_NODE_IO_HALF_OVERFLOW_THRESHOLD` shall be <=
65504. The default value is 50000 if the environment variable is not
set. It is better to leave some margin if number of samples are not
large enough in the test.

As a demo, we add an option --skip_warmup to benchmark.py for Flux, so
that we can reduce the time on dumping warm-up runs.

Example snippet of stdout (each inference session has such a summary
when session ended):
```
Total counter in node dumping: 141
Found 2 nodes cannot be converted to half precision due to potential input/output overflow.
Operator frequencies for these nodes:
Softmax : 1
MatMul : 1
# -------
# Example python script for float16 conversion
# For details, search `node_block_list` in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/transformers/float16.py
# -------
from onnxruntime.transformers.onnx_model import OnnxModel
m = OnnxModel(onnx.load('flux1_dev_onnx/fp32_opt/vae_decoder/model.onnx'))
node_block_list = [
  '/decoder/mid_block/attentions.0/Softmax',
  '/decoder/mid_block/attentions.0/MatMul',
]
m.convert_float_to_float16(keep_io_types=False, node_block_list=node_block_list)
m.save_model_to_file('fp16/optimized.onnx', use_external_data_format=False)
```
Then you can use the python script to convert corresponding model to
float16.

### Motivation and Context

It is a tool used to generate node_block_list used in float16 conversion
of stable diffusion 3.x and flux models in
https://github.com/microsoft/onnxruntime/pull/22986.

In stable diffusion or Flux pipeline, there are multiple models and
there could be multiple session runs for each model. Without a proper
tool, it is time consuming to get node_block_list for each model.
2025-01-16 12:54:46 -08:00
Ti-Tai Wang
a08211febb
Register opset 22 (#23344)
### Description
Follw up #21897 
To be compatible with onnx 17.0, Registering opset 22 is required in
terms of the [updated operators
(bfloat16)](https://github.com/onnx/onnx/releases/tag/v1.17.0)



### Motivation and Context
Fix #23162 
Fix #23161
Fix #23164 (Xnnpack)

### Remaining issue
#23163 (QNN) See [the
file](https://github.com/microsoft/onnxruntime/pull/23344/files#diff-04f5d6db0a6873f7299ed06ff1ec45a49e69f0865cb32f4397cd56db0cd0a784)

### Result of `find_optimizer_opset_version_updates_required.py (cpu
only)`
```
[WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_add_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.IsInf. Latest:20 Optimizer support ends at 10. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/isinf_reducesum_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.HardSigmoid. Latest:22 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_add_act_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/layer_norm_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.AveragePool. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.BatchNormalization. Latest:15 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.Upsample. Latest:10 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.Resize. Latest:19 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.GlobalMaxPool. Latest:22 Optimizer support ends at 1. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.GlobalAveragePool. Latest:22 Optimizer support ends at 1. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/nchwc_transformer.cc
[WARNING] - Newer opset found for kOnnxDomain.Shape. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pre_shape_node_elimination.cc
[WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_bn_fusion.cc
[ERROR] - Call/Declaration is split over multiple lines. Please check manually.File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/label_encoder_fusion.cc Line:49
[ERROR] - Failed to find version information for "ai.onnx.ml".LabelEncoder. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/label_encoder_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.HardSigmoid. Latest:22 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_activation_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Dropout. Latest:22 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/dropout_elimination.cc
[WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/gemm_transpose_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/gemm_transpose_fusion.cc
[ERROR] - Symbolic name of 'ignorable_nodes[index].first' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/matmul_bn_fusion.cc
[ERROR] - Symbolic name of 'dest.first' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/matmul_bn_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.AveragePool. Latest:22 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Pad. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/pad_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Dropout. Latest:22 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/bias_dropout_fusion.cc
[ERROR] - Failed to find version information for kMSDomain.BitmaskDropout. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/bias_dropout_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Clip. Latest:13 Optimizer support ends at 6. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/relu_clip_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/fast_gelu_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Cast. Latest:21 Optimizer support ends at 19. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/fast_gelu_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Reshape. Latest:21 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/reshape_fusion.cc
[ERROR] - Failed to find version information for kMSDomain.ConcatTraining. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/reshape_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Where. Latest:16 Optimizer support ends at 9. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/not_where_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Where. Latest:16 Optimizer support ends at 9. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/not_where_fusion.cc
[WARNING] - Newer opset found for kOnnxDomain.Conv. Latest:22 Optimizer support ends at 11. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/conv_mul_fusion.cc
[ERROR] - Symbolic name of 'QOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc
[ERROR] - Symbolic name of 'QOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc
[ERROR] - Symbolic name of 'DQOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc
[ERROR] - Symbolic name of 'DQOpName' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_util.cc
[ERROR] - Call/Declaration is split over multiple lines. Please check manually.File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/avx2_weight_s8_to_u8.cc Line:170
[WARNING] - Newer opset found for kOnnxDomain.MaxPool. Latest:22 Optimizer support ends at 12. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/qdq_transformer/qdq_propagation.cc
[ERROR] - Symbolic name of 'current_node.OpType(' found for op. Please check manually. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/compute_optimizer/upstream_transformer_base.cc
[WARNING] - Newer opset found for kOnnxDomain.Reshape. Latest:21 Optimizer support ends at 14. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/compute_optimizer/upstream_reshape.cc
[WARNING] - Newer opset found for kOnnxDomain.Transpose. Latest:21 Optimizer support ends at 13. File:/home/titaiwang/onnxruntime/onnxruntime/core/optimizer/attention_fusion_helper.h
```
2025-01-16 11:26:34 -08:00
Justin Chu
c7c8757a1c
Use ruff as the formatter to replace black-isort (#23397)
Use ruff as the code formatter in place of black and isort since it is
much faster, and as projects like PyTorch and ONNX have adopted ruff
format as well.

This PR include only auto-fixed changes in formatting.
2025-01-16 11:14:15 -08:00
Yulong Wang
080c67e900
[WebGPU] allow build WebGPU EP for WebAssembly (#23364)
### Description

This PR allows WebGPU EP to be built with Emscripten for WebAssembly,
Including:


- cmake build files update to support correct setup for Emscripten.
- code changes to fix build breaks for wasm
- change in Web CI pipeline to add a build-only target for wasm with
`--use_webgpu`.
2025-01-16 10:52:17 -08:00
Changming Sun
c64fa18834
Change docker buildx's driver to default (#23388)
### Description
Docker's buildx has four different drivers:
1. default
2. docker-container
3. kubernetes
4. remote 
 
Now we are using "docker-container". This PR change it to the default
driver, because the container driver needs to fetch an image from docker
hub which is no longer free and has a rate limit.
2025-01-16 09:38:34 -08:00