Commit graph

12015 commits

Author SHA1 Message Date
Chi Lo
56e4fda8a8
[TensorRT EP] Revert "Add new provider option to exclude nodes from running on TRT" (#22878)
- Revert https://github.com/microsoft/onnxruntime/pull/22681
- But still implicitly exclude DDS ops for TRT 10. Will later provide
better PR to add trt_op_types_to_exclude provider option.
2024-11-19 09:08:54 -08:00
Changming Sun
a0d36a508c
Move C# doc Github Action to Windows (#22880)
### Description
Move C# doc Github Action to Windows machines, to avoid having
dependency on Mono which I think is getting deprecated.


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-18 23:56:59 -08:00
Adrian Lizarraga
497b06f0a9
[QNN EP] QNN SDK 2.28.2 (#22844)
### Description
- Updates pipelines to use QNN SDK 2.28.2.241116.
- Re-enable LayerNormalization unit tests that failed with accuracy
errors with the previous QNN SDK (2.28.0).
- Update QNN EP to no longer provide a dummy bias for LayerNorm if the
QNN SDK version is >= 2.28.0.


### Motivation and Context
Use the latest QNN SDK. This version improves inference latency for
certain customer models.
2024-11-18 20:10:36 -08:00
Jiajia Qin
e597eaed4a
[js/webgpu] Optimize transpose as reshape when suitable (#22870)
BUG #22031
2024-11-18 12:52:48 -08:00
Tianlei Wu
c4f3742bb4
Replace INFINITY by std::numeric_limits<float>::infinity() (#22868)
Replace INFINITY by `std::numeric_limits<float>::infinity()` to avoid
build errors with Visual Studio 2022 v17.12 Preview 5

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/22728
2024-11-18 09:16:41 -08:00
Yi-Hong Lyu
02a0be3599
Optimize Transpose around QLinearSoftmax (#22849)
### Description
<!-- Describe your changes. -->

- Improved Transpose around QLinearSoftmax in Level 3 NHWC Transformer.
- Removed redundant code HandleQLinearConcat, HandleQLinearBinaryOp.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->

By merging and eliminating redundant transpose , the Image Segmentation
i8 model (MobileNetv2 + DeepLabv3) achieves a 2.34X speedup.
2024-11-18 06:58:21 -08:00
Yi Zhang
135d8b2beb
Fix CUDA/DML package exception caused by ENABLE_CUDA_NHWC_OPS (#22851)
### Description
Now,  ENABLE_CUDA_NHWC_OPS is enabled by default.
It adds a new chance to create cuda provider while both cuda/dml are
enabled


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-18 10:46:23 +08:00
liqun Fu
101ed10e5e
Refactor SkipLayerNorm and handle beta properly (#22862)
Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Signed-off-by: Liqun Fu <liqun.fu@microsoft.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-11-17 14:51:16 -08:00
Peishen Yan
5928009553
[WebNN EP] Support Einsum op (#19558)
Adds support for einsum via WebNN matmul, transpose, reshape, reducesum,
identity and element-wise binary ops.
2024-11-15 17:58:35 -08:00
Jing Fang
c73a3d1804
[ARM] MatMulNBits fp16 support - connect kernels (#22856)
### Description
A breakdown PR of https://github.com/microsoft/onnxruntime/pull/22651



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-15 14:59:11 -08:00
Po-Wei (Vincent)
bbe7c87738
Fix 1.20 cuda minimal build failure (#22751)
### Description
Fixes build failure for the cuda minimal build




### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
[This change](https://github.com/microsoft/onnxruntime/pull/19470) in
1.20 is causing build failures for the cuda minimal build.
Essentially, some cudnn logic was not guarded by the `USE_CUDA_MINIMAL`.
Also the build is looking for cudnn while in the cuda minimal build it
shouldn't depend on it, resulting in linking error.


cc @gedoensmax @chilo-ms
2024-11-15 10:50:55 -08:00
Preetha Veeramalai
ac9c135b95
Ovep develop 1.21 (#22824)
### Description
OVEP development changes for ORT 1.21 Release


### Motivation and Context
Has critical bug fixes
Support for concurrency execution of models is enabled
Support for OV 2024.5
Memory optimizations for NPU platform

---------

Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
Co-authored-by: Ankit Maheshkar <ankit.maheshkar@intel.com>
Co-authored-by: sfatimar <sahar.fatima@intel.com>
Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: TejalKhade28 <tejal.khade@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
2024-11-14 20:10:07 -08:00
Jian Chen
632a36a233
Update Gradle version 8.7 and java version 17 within onnxruntime/java (#22771)
### Description
This change is to update the Gradle version within java project to 8.7,
it also upgrades the JAVA to 17. Gradle version from react-native was
also updated to 7.5 to make it compatible with changes from the Java
directory. However, the target java version remains the same. Java
version from these will be upgraded in a separated PR.

This is spited from #22206

### Motivation and Context
This is the first step to upgrade the react native version.
2024-11-14 17:10:44 -08:00
Adrian Lizarraga
0733733307
[Quant tool] Handle input models with pre-quantized weights (#22633)
### Description
Allows the QDQ quantizer to handle input models that already have some
pre-quantized weights. In this case, the qdq quantizer will properly
skip/handle the pre-quantized weights.

Also handles an operator (e.g., Conv) with a pre-quantized weight and a
float bias. The tool will read the pre-quantized weight's quantization
scale to compute the bias's scale (`bias_scale = input_scale *
weight_scale`).

Input model (pre-quantized Conv weight):

![image](https://github.com/user-attachments/assets/7d2626e4-49ad-47ae-bd0e-6339ac590435)

Output QDQ model (everything is quantized):

![image](https://github.com/user-attachments/assets/393804d3-f042-47bd-895f-3d667fb2ae94)


### Motivation and Context
Customers may use external tools to quantize some weights (e.g., int4
for Conv/MatMul). The qdq quantizer should still be able to quantize the
rest of the model (float weights and activations) in this case.
2024-11-14 13:48:46 -08:00
Yifan Li
562ddce270
Re-enable test symbolic shape infer (#22737)
### Description
<!-- Describe your changes. -->
It seems after CI updated to py310, numpy got updated to 2.0 and sympy
1.2 failed to cast float numpy array.
Pointing sympy to 1.13 when py>=3.9 and re-enable unit test

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Error: Linux CPU
CI
2024-11-14 11:28:00 -08:00
Jing Fang
c02b398980
[ARM] MatMulNBits Fp16 support - API change only (#22826)
### Description
A break-down PR of https://github.com/microsoft/onnxruntime/pull/22651
Op API change only.
- add template to functions and classes that support fp32 and fp16
- rename functions, classes and files that support fp32 and fp16 from
SQNBxxx to QNBxxx


### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-14 10:38:59 -08:00
Jian Chen
c645bd202c
Fix spellchecks from Optional Lint (#22802)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-14 10:27:33 -08:00
dtang317
12dfe2859c
Register groupnorm for opset 21 (#22830)
### Description
This PR registers GroupNormalization for opset 21



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-14 10:06:30 -08:00
Jian Chen
5659d055ee
Fix Linux CI pipeline where ep was not provided for py-packaging-linux-test-cpu.yml (#22828)
### Description
Current linux-ci-pipeline was broken due to missing parameters from
`py-packaging-linux-test-cpu.yml` template


### Motivation and Context
Fix Linux CI pipeline
2024-11-14 09:41:37 -08:00
Tianlei Wu
09c98433e7
[CUDA] stable diffusion benchmark allows IO binding for optimum (#22834)
### Description

Update stable diffusion benchmark:
(1) allow IO binding for optimum.
(2) do not use num_images_per_prompt across all engines for fair
comparison.

Example to run benchmark of optimum on stable diffusion 1.5:
```
git clone https://github.com/tianleiwu/optimum
cd optimum
git checkout tlwu/diffusers-io-binding
pip install -e .

pip install -U onnxruntime-gpu
git clone https://github.com/microsoft/onnxruntime
cd onnxruntime/onnxruntime/python/tools/transformers/models/stable_diffusion
git checkout tlwu/benchmark_sd_optimum_io_binding
pip install -r requirements/cuda12/requirements.txt

optimum-cli export onnx --model runwayml/stable-diffusion-v1-5  --task text-to-image ./sd_onnx_fp32

python optimize_pipeline.py -i ./sd_onnx_fp32 -o ./sd_onnx_fp16 --float16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16
python benchmark.py -e optimum -r cuda -v 1.5 -p ./sd_onnx_fp16 --use_io_binding
```

Example output in H100_80GB_HBM3: 572 ms with IO Binding; 588 ms without
IO Binding; IO binding gains 16ms, or 2.7%,

### Motivation and Context

Optimum is working on enabling I/O binding:
https://github.com/huggingface/optimum/pull/2056. This could help
testing the impact of I/O binding on the performance of the stable
diffusion.
2024-11-14 00:09:07 -08:00
Michael Tyler
dd99e34d66
Enable ConvReplaceWithQLinear when using ACL (#22823)
### Description
Enable the ConvReplaceWithQLinear graph optimization when using the ACL
execution provider.



### Motivation and Context
Fixes an issue where quantized Conv nodes followed by ReLU don't get
converted to QLinearConv, so ACL sees the weights as mutable and
therefore cannot run the Conv node.

Signed-off-by: Michael Tyler <michael.tyler@arm.com>
2024-11-13 21:44:50 -08:00
Wanming Lin
82681205e4
[WebNN] Fix MLTensorUsage is undefined issue (#22831)
`MLTensorUsage` has been removed from Chromium:
https://chromium-review.googlesource.com/c/chromium/src/+/6015318, but
we still need to make it compatible with old Chrome versions, so just
make it `undefined` for latest Chrome version.
2024-11-13 20:22:22 -08:00
Jian Chen
f423b737a9
Fix Linux python CUDA package pipeline (#22803)
### Description
Making ::p optional in the Linux python CUDA package pipeline



### Motivation and Context
Linux stage from Python-CUDA-Packaging-Pipeline has failed since merge
of #22773
2024-11-13 14:20:21 -08:00
microsoft-github-policy-service[bot]
6d7603f054
Auto-generated baselines by 1ES Pipeline Templates (#22817) 2024-11-13 13:50:52 -08:00
Bin Miao
a15381d7fc
[WebNN EP] Fix issues of GRU operator (#22123)
### Description
This PR fixes the spelling of the key value of the GRU operator in the
map in the `GetSupportedNodes` function (Gru -> GRU) and removes the
data type check for the fifth input (sequence_lens) of the GRU operator.

PTAL, thanks!
2024-11-13 13:34:34 -08:00
Hector Li
a9b62fa8da
Keep the model metadata on the generated EP context model (#22825)
### Description
Keep the model metadata on the generated EP context model
2024-11-13 11:52:21 -08:00
Chi Lo
fa4cbcd36b
[TensorRT EP] Add new provider option to exclude nodes from running on TRT (#22681)
Add new provider option `trt_op_types_to_exclude`:
- User can provide op type list to be excluded from running on TRT
- e.g. `trt_op_types_to_exclude="MaxPool"`

There is a known performance issue with the DDS ops (NonMaxSuppression,
NonZero and RoiAlign) from TRT versions 10.0 to 10.7. TRT EP excludes
DDS ops from running on TRT by default, user can override default value
with empty string to include all ops.
2024-11-13 11:34:43 -08:00
shiyi
3adcf4d714
[WebNN] Remove validation for coordinate_transformation_mode (#22811)
The performance cost of falling back to the CPU EP is high for several
resampling nodes and causes multiple partitions in SD Turbo and VAE
decoder. Since the asymmetric mode with nearest to floor and integer
scales is identical to half_pixel anyway, stick with the WebNN EP.
2024-11-13 11:12:00 -08:00
Xu Xing
ff57ac4f3d
[js/webgpu] Add scatterND (#22755)
### Description
<!-- Describe your changes. -->



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-13 09:13:00 -08:00
liqun Fu
bc2b1b5e37
Fix issue #22796 - a typo: (__GNUC__ > 9) -> (__GNUC__ > 10) (#22807)
### Description
fix #22796 
Signed-off-by: liqunfu <liqun.fu@microsoft.com>
2024-11-12 18:56:35 -08:00
Xiang Zhang
69a36eb231
Revert Implement DML copy for Lora Adapters (#22814)
Revert https://github.com/microsoft/onnxruntime/pull/22396
2024-11-12 17:45:59 -05:00
Jing Fang
7fa69461fd
[ARM] MatMulNBits FP16 support - kernels only (#22806)
### Description
A break down PR of https://github.com/microsoft/onnxruntime/pull/22651
Add fp16 kernels.



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-12 14:28:47 -08:00
Jiajia Qin
7e0dd9d433
[js/webgpu] Optimize Expand (#22752)
Use components = 4 if possible.

llama3.2-1B becomes 20 tokens/s from 18 tokens/s on my iGPUs.
2024-11-12 12:37:19 -08:00
Jiajia Qin
05c8dc9d1c
[js/webgpu] Optimize ConvTranspose (#22774)
BUG #22031 

The overall time of ConvTranspose in Demucs model becomes 517.41 ms from
1415.65 ms on my iGPUs.
2024-11-12 12:37:07 -08:00
Bin Miao
67f5be0da2
[WebNN EP] Support LRN operator (#22775)
WebNN doesn't provide dedicate op for LRN, use a couple of WebNN ops to
emulate it in WebNN EP:
pow -> transpose -> pad -> averagePool -> transpose -> mul -> add -> pow
-> div
@Honry @fdwr PTAL, thanks!
2024-11-12 11:53:52 -08:00
junchao-zhao
fd5b1a18ee
Fix LARCH64 compile error (#22759)
### Description

Currently loongarch has not implemented AIsSigned qgemm, so I added
bypass for it
2024-11-12 11:47:43 -08:00
Jian Chen
75a44582ba
Update all JDK version to 17 (#22786) 2024-11-12 11:42:18 -08:00
Ted Themistokleous
2b0f3435d2
[MIGraphX EP] Add support for Gelu, BiasGelu, FastGelu operators (#22808)
### Description
Adds support for different flavours of gelu already supported in
MIGraphX
2024-11-12 11:04:15 -08:00
dtang317
9836ef1c89
register Identity and QLinearMatmul for opset21 (#22804)
### Description
This PR registers the following opset 21 operators:

Idenity-21
OlieanrMatmul-21



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-12 09:36:19 -08:00
amarin16
f0ac5e0d3d
Update skip layer norm (#22719)
Update the `SkipLayerNorm` implementation to address issues.
2024-11-12 07:01:30 -08:00
Wanming Lin
cdc8db9984
[WebNN] Fixed WebNN Module undefined issue (#22795)
`Module.jsepRegisterMLConstant` will be shorten by Closure Compiler in
offical release, this would cause undefined error.

Fix it by using `Module['jsepRegisterMLConstant']`.
2024-11-11 21:31:24 -08:00
Adrian Lizarraga
0ad44d0f79
[Quant Tool] Flaky test due to Pad reflect bug (#22798)
### Description
Fixes a unit test that would fail intermittently due to an existing bug
with Pad (reflect mode). When the number of padded values is >= the
inner dimension size, the ORT Pad implementation accesses invalid
memory. This PR makes the number of padding values less than the inner
dimension size to avoid triggering the bug.


### Motivation and Context
See related issues:
https://github.com/microsoft/onnxruntime/issues/8265
https://github.com/microsoft/onnxruntime/issues/11828
https://github.com/microsoft/onnxruntime/issues/20801

Here's a valgrind trace obtained on a Linux machine (with
`sess_options.enable_cpu_mem_arena = False`)
```
==864228== Invalid read of size 4
==864228==    at 0x2716272A: void onnxruntime::PadInnermostAxis<unsigned int>(unsigned int*, unsigned int*, long, unsigned long) (pad.cc:370)
==864228==    by 0x2715D213: onnxruntime::common::Status onnxruntime::PadImpl<unsigned int>(onnxruntime::OpKernelContext*, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, absl::lts_20240722::InlinedVector<long, 10ul, std::allocator<long> > const&, onnxruntime::Mode const&, unsigned int) (pad.cc:551)
==864228==    by 0x2715B2BB: onnxruntime::Pad::Compute(onnxruntime::OpKernelContext*) const (pad.cc:725)
==864228==    by 0x276FF6A7: onnxruntime::ExecuteKernel(onnxruntime::StreamExecutionContext&, unsigned long, unsigned long, bool const&, onnxruntime::SessionScope&) (sequential_executor.cc:484)
==864228==    by 0x276F4A04: onnxruntime::LaunchKernelStep::Execute(onnxruntime::StreamExecutionContext&, unsigned long, onnxruntime::SessionScope&, bool const&, bool&) (execution_steps.cc:73)
...
```

The above is obtained with the basic Pad(reflect) example on the [ONNX
Pad operator spec
page](https://onnx.ai/onnx/operators/onnx__Pad.html#summary):

```python
data = [
    [1.0, 1.2],
    [2.3, 3.4],
    [4.5, 5.7],
]

pads = [0, 2, 0, 0]

mode = 'reflect'

# Expected output by ONNX spec
expected_output = [
    [1.0, 1.2, 1.0, 1.2],
    [2.3, 3.4, 2.3, 3.4],
    [4.5, 5.7, 4.5, 5.7],
]

# Bugged output from onnxruntime has invalid/uninitialized data for the first element in the inner dimension
# invalid data may be 0.0, inf, nan, etc.
ort_output = [
    [inf, 1.2, 1.0, 1.2],
    [inf, 3.4, 2.3, 3.4],
    [inf, 5.7, 4.5, 5.7],
]
```
2024-11-11 19:49:27 -08:00
shiyi
f7d1f0fc5e
Reland "[WebNN] Fallback the node when its output doesn't have shape info" (#22685)
The previous PR was reverted because it causes the whole model to
fallback when there is output shape info missing. This PR fixes the
issue by removing redundant fallbacks.
2024-11-11 16:30:10 -08:00
Adrian Lizarraga
b1e0930eab
Fix build for linux python wheel (#22801)
### Description
Fixes command for building Linux python packages by preventing an empty
`-p` command-line option from being passed to a subsequent build script:
1f3b675453/tools/ci_build/github/linux/run_python_dockerbuild.sh (L37)



### Motivation and Context
A recent [PR
](https://github.com/microsoft/onnxruntime/pull/22773)introduced a new
optional command-line option (`-p`) to pass custom python exe paths. We
need to check if the option is empty before forwarding the option to a
separate build script.
2024-11-11 15:20:07 -08:00
Jian Chen
885a7acd45
Fix warning - LegacyKeyValueFormat: "ENV key=value" should be used instead of legacy "ENV key value" format (#22800)
### Description
This PR Fix warning - `LegacyKeyValueFormat: "ENV key=value" should be
used instead of legacy "ENV key value" format` from all Dockerfile



### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
2024-11-11 13:05:34 -08:00
Xavier Dupré
1f3b675453
Fix MatMulBnFusion to exclude cases when tensors are not 2D tensors (#22762)
### Description
Fixes #22512, MatMul, Add can be fused into a single Gemm even if
tensors dimensions are > 2. The PR excludes that cases.



### Motivation and Context
ORT crashes on valid models due to that unexpected fusion.
2024-11-11 19:48:25 +01:00
Dmitri Smirnov
c5276ac448
Revert "enable serialize prepacked weights into data file (#22256)" (#22788)
This reverts commit c5b6be045f.

### Description
Revert

### Motivation and Context
This needs simpler and more robust approach
2024-11-11 09:59:05 -08:00
sheetalarkadam
e8f1d73b0b
Add Android QNN Browserstack test (#22434)
Add Android QNN Browserstack test



### Motivation and Context
Real device test in CI
2024-11-10 16:10:29 -08:00
Preetha Veeramalai
c9ed016b12
OVEP Dynamic WorkloadType support (#22779)
### Description
Support to set EPdynamic options in OVEP

### Motivation and Context
relate to https://github.com/microsoft/onnxruntime/pull/22282

---------

Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
2024-11-09 23:26:29 -08:00
shiyi
63cb53257b
[WebNN] Support steps >= 1 for slice operator (#22708)
Co-authored-by: Wanming Lin <wanming.lin@intel.com>
2024-11-09 18:20:52 -08:00