Commit graph

7104 commits

Author SHA1 Message Date
Xu Xing
c19617a24a
[js/webgpu] Add GatherND (#22847)
2024-12-04 09:57:32 -08:00
Yulong Wang
a615bd6688
Bump version of Dawn to 12a3b24c4 (#23002)
### Description

Upgrade version of Dawn.

Removed dawn.patch, because all patches are included in upstream.

Updated code affected by API changes (`const char*` ->
`WGPUStringView`).


2024-12-04 09:47:16 -08:00
Chi Lo
9b9f881475
[TensorRT EP] Use TRT/CUDA/ORT version from runtime instead of build time to generate hash value (#22921)
Use the TensorRT and CUDA versions fetched at **runtime** to compute the hash
value that determines the cache name.

Previously the versions were captured at compile/build time, which can cause
problems in some cases. For example, TRT EP records the TRT version that we or
users built against at compile time. However, users can switch to a different
TRT version at run time. Because TRT EP always checks the "fixed" build-time
TRT version rather than the one it is actually using, it can end up loading an
incompatible TRT engine cache.

see the github issue here:

https://github.com/microsoft/onnxruntime/issues/22382#issuecomment-2404140754
2024-12-03 21:58:43 -08:00
Prathik Rao
5c644d3747
[WebGPU EP] Flatten implementation (#22964)
Implements flatten operator for native webgpu.
2024-12-03 14:40:57 -08:00
Kee
8c52fa3924
[VSINPU]Split/Pad and some element-wise OPs support (#22916)
### Description
- Add split/pad/neg/not/ceil/round/min/max op support
- Fix conv2d op default pads value issue
- Add VSINPU EP to support python bindings


### Motivation and Context
- New op support for VSINPU EP

---------

Signed-off-by: Kee <xuke537@hotmail.com>
2024-12-02 13:57:30 -08:00
Satya Kumar Jandhyala
e8bf46a70e
[WebGPU EP] Support GroupQueryAttention (#22658)
### Description
Support GroupQueryAttention operator for native webgpu ep.


### Motivation and Context
This is required for inferencing some LLMs.
2024-12-02 12:40:03 -08:00
Chi Lo
e234023d11
[TensorRT EP] Fix wrong input order when generating IndexedSubGraph (#22857)
The input order of generated indexedSubGraph needs to be consistent with
the input order of original graph.

This PR will also fix the github issue
https://github.com/microsoft/onnxruntime/issues/22729
2024-12-02 01:45:29 -08:00
Chi Lo
49a80df77f
Keep the model metadata on the generated EP context model (use bridge api) (#22860)
In addition to the
[PR](https://github.com/microsoft/onnxruntime/pull/22825) which directly
uses internal graph api, this PR updates the bridge api for the case of
TRT EP and OpenVINO EP.
2024-12-01 21:57:45 -08:00
Vincent Wang
1128882bfd
Quantize Bias for Conv/Gemm on Quantized Model (#22889)
Some quantized models leave the bias of Conv/Gemm nodes in float rather than
quantizing it. This PR creates a sub-graph that quantizes the bias for
Conv/Gemm nodes with scale = scale_input_0 * scale_input_1 and zp = 0. This is
done only when the bias is an initializer, so that ConstantFolding folds the
sub-graph into a real quantized int32 bias initializer during the next round
of graph optimization.
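
The bias quantization rule above can be sketched in plain Python. This is only an illustration of the arithmetic, not the PR's graph transformation; the function name `quantize_bias` and the sample scales are hypothetical.

```python
def quantize_bias(bias, input_scale, weight_scale):
    """Quantize a float bias to int32 with zero point 0.

    bias_scale = input_scale * weight_scale, as stated in the commit message.
    The function name and the sample values below are illustrative only.
    """
    bias_scale = input_scale * weight_scale
    int32_min, int32_max = -(2**31), 2**31 - 1
    quantized = []
    for value in bias:
        q = round(value / bias_scale)  # zero point is 0, so no offset is added
        quantized.append(max(int32_min, min(int32_max, q)))  # saturate to int32
    return quantized, bias_scale


q_bias, scale = quantize_bias([0.5, -1.25, 0.0], input_scale=0.02, weight_scale=0.005)
```

With these sample values the bias scale is about 1e-4, so a float bias of 0.5 maps to the integer 5000.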
2024-11-28 10:10:24 +08:00
Vincent Wang
42ecb05080
[QNN] ReduceL2 Support (#22636)
Add ReduceL2 support to QNN EP. Some of the QNN AI Hub models contain
ReduceL2, such as openai_clip_CLIPTextEncoder and
openai_clip_CLIPImageEncoder. Without this PR, ReduceL2 is assigned to the
CPU and the graph is split into 2 QNN graphs; with this PR, all nodes run on
QNN EP.
2024-11-28 10:09:13 +08:00
Jing Fang
08abab0b14
[CPU] Fix MatMulNBits accuracy level (#22963)
### Description
Fix MatMulNBits accuracy level



2024-11-27 17:40:04 -08:00
wejoncy
a24723df16
[CoreML ] ML Program more operators support [3/N] (#22710)
### Description
- Erf
- Round
- Max
- ReduceMax
- ReduceMean
- ReduceSum
- Unsqueeze
- Squeeze
- Softmax




---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-28 09:21:02 +08:00
wejoncy
c284a686f2
[CoreML] Create EP by AppendExecutionProvider (#22675)
### Description
AppendExecutionProvider("CoreML", {{"MLComputeUnits","MLProgram"}})




---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-27 09:26:31 +08:00
Chen Feiyue
487184fa42
[VSINPU] update crosscompiling patch (#22937)
### Description
Update this patch because the original file has changed.


2024-11-26 14:35:16 -08:00
amancini-N
8826e39a81
#22890 Fix profiling on empty Optional (#22891)
### Description
Fix sequential_executor.cc to avoid a segfault when profiling is used on
a model with an empty Optional.



### Motivation and Context
Fixes #22890
2024-11-26 11:18:47 -08:00
shiyi
afbb53937c
[WebNN] Support negative steps for slice (#22871)
Slice with negative steps can be emulated by reverse+slice.
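
As a rough illustration of the reverse+slice idea, the sketch below works on a 1-D Python list and follows Python's slicing rules for `start`/`stop`. The function name is hypothetical, and the actual WebNN EP code operates on tensors and ONNX Slice attributes rather than lists.

```python
def slice_negative_step(values, start, stop, step):
    """Emulate values[start:stop:step] for step < 0 via reverse + forward slice."""
    if step >= 0:
        raise ValueError("this sketch only handles negative steps")
    n = len(values)
    # Normalize start/stop as Python does for a negative step:
    # both end up in the range [-1, n - 1].
    norm_start, norm_stop, _ = slice(start, stop, step).indices(n)
    reversed_values = list(reversed(values))  # stand-in for a reverse op
    # Index i in the original sequence is index n - 1 - i after the reverse,
    # so the bounds are mirrored and the step becomes positive.
    return reversed_values[n - 1 - norm_start : n - 1 - norm_stop : -step]


result = slice_negative_step([0, 1, 2, 3, 4, 5], 4, 0, -2)  # [4, 2]
```

The mirrored bounds are the key step: after reversing, walking forward with step `-step` visits the same elements in the same order as walking backward in the original.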
2024-11-25 23:06:23 -08:00
Bin Miao
558ae8621c
[WebNN EP] Fix an issue of CumSum operator (#22936)
This PR limits the axis of the CumSum operator to be a constant when
using WebNN EP.
@Honry  @fdwr PTAL.
2024-11-25 21:05:53 -08:00
Yi Zhang
85751e7276
Build DML in Windows GPU CI pipeline (#22869)
### Description
Add a new stage that builds CUDA and DML in the Windows GPU CI pipeline (PR
checks) to prevent regressions introduced by new CUDA tests.
Add the prefix CudaEp to the names of all tests in cuda/testcases so they can
be skipped easily.

### Motivation and Context
1. CudaNhwcEP is added by default when using the CUDA EP.
2. If onnxruntime_ENABLE_CUDA_EP_INTERNAL_TESTS is enabled, the tests in
tests/provider/cuda/testcases are added too.

### To do
Add enable_pybind to the new stage.
Currently, --enable_pybind triggers some Python tests, such as
onnxruntime_test_python.py, which uses the get_available_providers() API.
More discussion is needed to decide how to make this work.
2024-11-25 10:50:52 +08:00
Xavier Dupré
a2ba3cb547
Implementation of TreeEnsemble ai.onnx.ml==5 (#22333)
### Description
Merges PR #21851, #21222.

Implements TreeEnsemble from ai.onnx.ml==5 (CPU).

---------

Co-authored-by: Bilyana Indzheva <bilyana2002@gmail.com>
Co-authored-by: Bilyana Indzheva <36890669+bili2002@users.noreply.github.com>
Co-authored-by: Christian Bourjau <cbourjau@users.noreply.github.com>
2024-11-22 19:48:23 +01:00
Scott McKay
b1ccbe2a8e
Minor update to onnxruntime_perf_test usage info for -I (#22810)
### Description
Update comment for `-I` to mention that symbolic dim values can be
provided with `-f`.


2024-11-22 16:38:25 +11:00
Aleksei Nikiforov
f6e1d44829
Add option to force generic algorithms on x86 (#22917)
Option is named onnxruntime_FORCE_GENERIC_ALGORITHMS

Follow up to https://github.com/microsoft/onnxruntime/pull/22125.

### Description
This change adds a compile-time option to disable optimized algorithms and
use generic algorithms (excluding AVX*, SSE, etc. in GEMM) on x86. This
new option is intended only for testing these algorithms, not for
production use.

The following build command on Linux x86_64 builds onnxruntime with the new
option enabled:
`./build.sh --parallel --cmake_extra_defines onnxruntime_FORCE_GENERIC_ALGORITHMS=1`

### Motivation and Context
This change allows testing generic algorithms. This may be needed for
platforms which don't have optimized implementations available, like in
https://github.com/microsoft/onnxruntime/pull/22125.
2024-11-21 13:45:46 -08:00
Tianlei Wu
8d99b1a8dc
reduce GQA test combinations (#22918)
### Description
* Reduce GQA test combinations to save about 35 minutes of test time in CI
pipelines.
* Show latency of transformers tests.
* Use a seed in the DMMHA test to avoid random failures.
* For test_flash_attn_rocm.py, change the skip condition from "has cuda
ep" to "not has rocm ep", so that it does not run in CPU builds.
* For test_flash_attn_cuda.py, move the flash attention and memory efficient
attention tests to different classes, so that a whole test suite can be
skipped instead of checking in each test.

### Motivation and Context
It takes too long to run GQA tests in CI pipelines since there are too
many combinations.

###### Linux GPU CI Pipeline
Before: 5097 passed, 68 skipped, 8 warnings in 1954.64s (0:32:34)
After:  150 passed, 176 skipped, 8 warnings in 530.38s (0:08:50)
Time Saved: **1424** seconds (0:23:44)

###### Windows GPU CUDA CI Pipeline
Before: 1781 passed, 72 skipped, 6 warnings in 605.48s (0:10:05)
After: 116 passed, 118 skipped, 6 warnings in 275.48s (0:04:35) 
Time Saved: **330** seconds (0:05:30)

###### Linux CPU CI Pipeline
Before: 5093 passed, 72 skipped, 4 warnings in 467.04s (0:07:47)
- 212.96s transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 154.12s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 26.45s transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

After: 116 passed, 210 skipped, 4 warnings in 93.41s (0:01:33)
- 0.97s  transformers/test_gqa_cpu.py::TestGQA::test_gqa_past
- 19.23s transformers/test_gqa_cpu.py::TestGQA::test_gqa_no_past
- 2.41s  transformers/test_gqa_cpu.py::TestGQA::test_gqa_interactive_one_batch

Time Saved: **374** seconds (0:06:14).
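
The "Time Saved" figures can be re-derived from the before/after durations reported by pytest. The snippet below is just a quick check of that arithmetic; the helper name `time_saved` is made up for this example.

```python
def time_saved(before_seconds, after_seconds):
    """Return (whole seconds saved, h:mm:ss string) for one pipeline."""
    seconds = round(before_seconds - after_seconds)
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    return seconds, f"{hours}:{minutes:02d}:{secs:02d}"


# Durations copied from the pytest summaries above.
pipelines = {
    "Linux GPU": (1954.64, 530.38),
    "Windows GPU CUDA": (605.48, 275.48),
    "Linux CPU": (467.04, 93.41),
}
savings = {name: time_saved(*pair) for name, pair in pipelines.items()}
total_seconds = sum(seconds for seconds, _ in savings.values())
```

Summed over the three pipelines, the savings come to 2128 seconds, which matches the "about 35 minutes" in the description.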
2024-11-21 12:26:46 -08:00
Tianlei Wu
55f0559e5d
Update attention fusion to support SDPA pattern (#22629)
### Description
Match the new SDPA pattern for the huggingface BERT model exported from
the latest transformers package.

Some changes to the transformers tests in the CI pipeline:
(1) Enable tests for bert, distilbert and roberta models in CI.
(2) Remove out-of-date tests for huggingface models that were marked as
slow and not enabled in CI pipeline.
(3) Upgrade transformers package version to the latest.

### Motivation and Context

Recent huggingface transformers use torch SDPA in BERT modeling. The
resulting graph pattern change breaks attention fusion. Update
the fusion script to match the new pattern.
2024-11-21 09:42:41 -08:00
kailums
1e605be166
bigmodel pipeline update cp38 to cp310 (#22793)
### Description
When updating from cp38 to cp310, there were some issues with the bigmodel
pipeline. Two jobs failed: stable_diffusion and whisper.

1. For stable_diffusion, we were using
"nvcr.io/nvidia/pytorch:22.11-py3" from the NVIDIA repo, which targets CUDA 11
and Python 3.8. NVIDIA does not provide a Python 3.10 version of this image
for CUDA 11; the latest version of this Docker image targets CUDA 12 and
Python 3.10. To solve this, I use an Ubuntu 22.04 Docker image
and install all the Python packages needed for this job.
2. For whisper, the original Docker image is Ubuntu 20.04, which doesn't
have Python 3.10, so it had to be updated to Ubuntu 22.04.
2024-11-21 07:25:01 -08:00
Aleksei Nikiforov
e430795332
Fix MlasSgemmKernel: properly process more than 2 rows (#22125)
This change fixes multiple tests, such as QDQTransformerTests.MatMul_U8S8S8,
on all architectures where an architecture-specific
optimized function is not available yet, like s390x.

### Description
Matrix B is packed in groups of 16 elements, so each new row starts 16 items later.
Also, when moving to the next C, increment the index by only 1 for each increment of C.


### Motivation and Context
This change fixes mlas sgemm fallback implementation for all
architectures which don't have architecture-specific implementations
available, like s390x.
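
To illustrate why the row stride matters, here is a small reference GEMM that reads B from a 16-wide packed layout and handles any number of rows. The layout shown (16-column panels stored one after another, zero-padded) is an assumption for illustration and is not taken from the MLAS source; `pack_b` and `sgemm_packed` are hypothetical names.

```python
PACK_WIDTH = 16  # per the commit message: B is packed in groups of 16 elements


def pack_b(b, k, n):
    """Pack a row-major K x N matrix into 16-column panels, zero-padded.

    Assumed layout: panels are stored one after another, and inside a
    panel, row r starts at offset r * 16.
    """
    packed = []
    for col0 in range(0, n, PACK_WIDTH):
        for row in range(k):
            for c in range(col0, col0 + PACK_WIDTH):
                packed.append(b[row * n + c] if c < n else 0.0)
    return packed


def sgemm_packed(a, packed_b, m, k, n):
    """Compute C = A * B for any number of rows M using the packed B."""
    c = [0.0] * (m * n)
    for row in range(m):
        for col in range(n):
            panel, lane = divmod(col, PACK_WIDTH)
            base = panel * k * PACK_WIDTH
            acc = 0.0
            for kk in range(k):
                # Moving to the next row of B advances 16 elements in the panel.
                acc += a[row * k + kk] * packed_b[base + kk * PACK_WIDTH + lane]
            c[row * n + col] = acc
    return c
```

A kernel that steps through the packed buffer with any other row stride would read the wrong elements of B once it processes rows beyond the ones it was tuned for.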
2024-11-20 16:00:23 -08:00
Edward Chen
af0303f9b4
Simplify CPU allocator arena usage helper function, fix unit tests that check old ifdefs. (#22876)
2024-11-19 14:24:52 -08:00
Changming Sun
13346fdf18
Cleanup code (#22827)
### Description
1. Delete TVM EP because it is no longer maintained.
2. Delete ortmodule-related docker files and scripts.
2024-11-19 14:13:33 -08:00
Wanming Lin
5b787121e8
[WebNN] Check split's output name (#22884)
Chromium will rename split's output name from "output" to "outputs" in
`OpSupportLimits` to align with the spec. The EP should check which name is
available so that it stays compatible with both.
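
The compatibility check amounts to probing for either key. The sketch below models the split entry of `OpSupportLimits` as a plain dictionary, which is an assumption made for illustration; the function name is hypothetical and the real EP code queries the browser object instead.

```python
def split_output_key(split_limits):
    """Return whichever output key the split entry exposes.

    `split_limits` stands in for the split entry of OpSupportLimits,
    modelled here as a plain dict for illustration.
    """
    for key in ("outputs", "output"):  # prefer the spec-aligned name
        if key in split_limits:
            return key
    raise KeyError("split exposes neither 'outputs' nor 'output'")


old_browser = {"input": {}, "output": {}}
new_browser = {"input": {}, "outputs": {}}
keys = (split_output_key(old_browser), split_output_key(new_browser))
```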
2024-11-19 12:44:23 -08:00
Chi Lo
56e4fda8a8
[TensorRT EP] Revert "Add new provider option to exclude nodes from running on TRT" (#22878)
- Revert https://github.com/microsoft/onnxruntime/pull/22681
- DDS ops are still implicitly excluded for TRT 10. A better PR that adds the
trt_op_types_to_exclude provider option will follow later.
2024-11-19 09:08:54 -08:00
Adrian Lizarraga
497b06f0a9
[QNN EP] QNN SDK 2.28.2 (#22844)
### Description
- Updates pipelines to use QNN SDK 2.28.2.241116.
- Re-enable LayerNormalization unit tests that failed with accuracy
errors with the previous QNN SDK (2.28.0).
- Update QNN EP to no longer provide a dummy bias for LayerNorm if the
QNN SDK version is >= 2.28.0.


### Motivation and Context
Use the latest QNN SDK. This version improves inference latency for
certain customer models.
2024-11-18 20:10:36 -08:00
Tianlei Wu
c4f3742bb4
Replace INFINITY by std::numeric_limits<float>::infinity() (#22868)
Replace INFINITY by `std::numeric_limits<float>::infinity()` to avoid
build errors with Visual Studio 2022 v17.12 Preview 5

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/22728
2024-11-18 09:16:41 -08:00
Yi-Hong Lyu
02a0be3599
Optimize Transpose around QLinearSoftmax (#22849)
### Description

- Improved Transpose around QLinearSoftmax in Level 3 NHWC Transformer.
- Removed redundant code HandleQLinearConcat, HandleQLinearBinaryOp.

### Motivation and Context

By merging and eliminating redundant transposes, the Image Segmentation
i8 model (MobileNetv2 + DeepLabv3) achieves a 2.34X speedup.
2024-11-18 06:58:21 -08:00
Yi Zhang
135d8b2beb
Fix CUDA/DML package exception caused by ENABLE_CUDA_NHWC_OPS (#22851)
### Description
ENABLE_CUDA_NHWC_OPS is now enabled by default.
This adds another path that creates the CUDA provider when both CUDA and DML
are enabled.


2024-11-18 10:46:23 +08:00
liqun Fu
101ed10e5e
Refactor SkipLayerNorm and handle beta properly (#22862)
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-17 14:51:16 -08:00
Peishen Yan
5928009553
[WebNN EP] Support Einsum op (#19558)
Adds support for einsum via WebNN matmul, transpose, reshape, reducesum,
identity and element-wise binary ops.
2024-11-15 17:58:35 -08:00
Jing Fang
c73a3d1804
[ARM] MatMulNBits fp16 support - connect kernels (#22856)
### Description
A breakdown PR of https://github.com/microsoft/onnxruntime/pull/22651



2024-11-15 14:59:11 -08:00
Po-Wei (Vincent)
bbe7c87738
Fix 1.20 cuda minimal build failure (#22751)
### Description
Fixes build failure for the cuda minimal build




### Motivation and Context
[This change](https://github.com/microsoft/onnxruntime/pull/19470) in
1.20 is causing build failures for the cuda minimal build.
Essentially, some cudnn logic was not guarded by `USE_CUDA_MINIMAL`.
Also, the build looks for cudnn even though the cuda minimal build
shouldn't depend on it, resulting in a linking error.


cc @gedoensmax @chilo-ms
2024-11-15 10:50:55 -08:00
Preetha Veeramalai
ac9c135b95
Ovep develop 1.21 (#22824)
### Description
OVEP development changes for ORT 1.21 Release


### Motivation and Context
- Critical bug fixes
- Support for concurrent execution of models
- Support for OV 2024.5
- Memory optimizations for the NPU platform

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
2024-11-14 20:10:07 -08:00
Adrian Lizarraga
0733733307
[Quant tool] Handle input models with pre-quantized weights (#22633)
### Description
Allows the QDQ quantizer to handle input models that already have some
pre-quantized weights. In this case, the qdq quantizer will properly
skip/handle the pre-quantized weights.

Also handles an operator (e.g., Conv) with a pre-quantized weight and a
float bias. The tool will read the pre-quantized weight's quantization
scale to compute the bias's scale (`bias_scale = input_scale *
weight_scale`).

Input model (pre-quantized Conv weight):

![image](https://github.com/user-attachments/assets/7d2626e4-49ad-47ae-bd0e-6339ac590435)

Output QDQ model (everything is quantized):

![image](https://github.com/user-attachments/assets/393804d3-f042-47bd-895f-3d667fb2ae94)


### Motivation and Context
Customers may use external tools to quantize some weights (e.g., int4
for Conv/MatMul). The qdq quantizer should still be able to quantize the
rest of the model (float weights and activations) in this case.
2024-11-14 13:48:46 -08:00
Yifan Li
562ddce270
Re-enable test symbolic shape infer (#22737)
### Description
It seems that after CI was updated to py310, numpy was updated to 2.0 and sympy
1.2 failed to cast float numpy arrays.
Pin sympy to 1.13 when py>=3.9 and re-enable the unit test.

### Motivation and Context
Error: Linux CPU CI
Jing Fang
c02b398980
[ARM] MatMulNBits Fp16 support - API change only (#22826)
### Description
A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651
Op API change only.
- add template to functions and classes that support fp32 and fp16
- rename functions, classes and files that support fp32 and fp16 from
SQNBxxx to QNBxxx


2024-11-14 10:38:59 -08:00
dtang317
12dfe2859c
Register groupnorm for opset 21 (#22830)
### Description
This PR registers GroupNormalization for opset 21



2024-11-14 10:06:30 -08:00
Tianlei Wu
09c98433e7
[CUDA] stable diffusion benchmark allows IO binding for optimum (#22834)
### Description

Update stable diffusion benchmark:
(1) allow IO binding for optimum.
(2) do not use num_images_per_prompt across all engines for fair
comparison.

Example to run benchmark of optimum on stable diffusion 1.5:
```
git clone https://github.com/tianleiwu/optimum
cd optimum
git checkout tlwu/diffusers-io-binding
pip install -e .

pip install -U onnxruntime-gpu
git clone https://github.com/microsoft/onnxruntime
cd onnxruntime/onnxruntime/python/tools/transformers/models/stable_diffusion
git checkout tlwu/benchmark_sd_optimum_io_binding
pip install -r requirements/cuda12/requirements.txt

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5  --task text-to-image ./sd_onnx_fp32

python optimize_pipeline.py -i ./sd_onnx_fp32 -o ./sd_onnx_fp16 --float16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16 --use_io_binding
```

Example output on H100_80GB_HBM3: 572 ms with IO binding; 588 ms without
IO binding; IO binding gains 16 ms, or 2.7%.

### Motivation and Context

Optimum is working on enabling I/O binding:
https://github.com/huggingface/optimum/pull/2056. This could help
test the impact of I/O binding on the performance of stable
diffusion.
2024-11-14 00:09:07 -08:00
Michael Tyler
dd99e34d66
Enable ConvReplaceWithQLinear when using ACL (#22823)
### Description
Enable the ConvReplaceWithQLinear graph optimization when using the ACL
execution provider.



### Motivation and Context
Fixes an issue where quantized Conv nodes followed by ReLU don't get
converted to QLinearConv, so ACL sees the weights as mutable and
therefore cannot run the Conv node.

Signed-off-by: Michael Tyler <michael.tyler@arm.com>
2024-11-13 21:44:50 -08:00
Bin Miao
a15381d7fc
[WebNN EP] Fix issues of GRU operator (#22123)
### Description
This PR fixes the spelling of the key value of the GRU operator in the
map in the `GetSupportedNodes` function (Gru -> GRU) and removes the
data type check for the fifth input (sequence_lens) of the GRU operator.

PTAL, thanks!
2024-11-13 13:34:34 -08:00
Hector Li
a9b62fa8da
Keep the model metadata on the generated EP context model (#22825)
### Description
Keep the model metadata on the generated EP context model
2024-11-13 11:52:21 -08:00
Chi Lo
fa4cbcd36b
[TensorRT EP] Add new provider option to exclude nodes from running on TRT (#22681)
Add new provider option `trt_op_types_to_exclude`:
- Users can provide a list of op types to exclude from running on TRT
- e.g. `trt_op_types_to_exclude="MaxPool"`

There is a known performance issue with the DDS ops (NonMaxSuppression,
NonZero and RoiAlign) from TRT versions 10.0 to 10.7. TRT EP excludes
DDS ops from running on TRT by default; users can override the default value
with an empty string to include all ops.
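
The exclusion semantics described above can be modelled in a few lines. This is a sketch of the behaviour only, not the TRT EP implementation: the function name is hypothetical, and the comma-separated list format for multiple op types is an assumption.

```python
# Default exclusion list: the DDS ops named in the commit message.
DEFAULT_EXCLUDED = "NonMaxSuppression,NonZero,RoiAlign"


def ops_allowed_on_trt(op_types, trt_op_types_to_exclude=DEFAULT_EXCLUDED):
    """Return the op types that remain eligible to run on TRT.

    Assumes a comma-separated option string; an empty string excludes nothing.
    """
    excluded = {
        name.strip()
        for name in trt_op_types_to_exclude.split(",")
        if name.strip()
    }
    return [op for op in op_types if op not in excluded]


graph_ops = ["Conv", "NonZero", "MaxPool"]
by_default = ops_allowed_on_trt(graph_ops)
include_all = ops_allowed_on_trt(graph_ops, "")
custom = ops_allowed_on_trt(graph_ops, "MaxPool")
```

Note that passing a custom list replaces the default rather than extending it, so `"MaxPool"` alone lets the DDS ops through in this sketch.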
2024-11-13 11:34:43 -08:00
shiyi
3adcf4d714
[WebNN] Remove validation for coordinate_transformation_mode (#22811)
The performance cost of falling back to the CPU EP is high for several
resampling nodes and causes multiple partitions in SD Turbo and VAE
decoder. Since the asymmetric mode with nearest to floor and integer
scales is identical to half_pixel anyway, stick with the WebNN EP.
2024-11-13 11:12:00 -08:00
Xu Xing
ff57ac4f3d
[js/webgpu] Add scatterND (#22755)
2024-11-13 09:13:00 -08:00
liqun Fu
bc2b1b5e37
Fix issue #22796 - a typo: (__GNUC__ > 9) -> (__GNUC__ > 10) (#22807)
### Description
fix #22796 
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
2024-11-12 18:56:35 -08:00