11597 commits

Author SHA1 Message Date
Scott McKay
44fc7b443c
Update C# test projects (#21631)
### Description
Update various test projects to .net8 from EOL frameworks.
Replace the Xamarin based Android and iOS test projects with a MAUI
based project that uses .net 8.
Add new CoreML flags to C# bindings

### Motivation and Context
Remove usage of EOL frameworks.
2024-09-05 08:21:23 +10:00
Scott McKay
8632e67dc3
Update C# E2E project's test package versions (#21975)
### Description
Update C# test package dependencies to match #21913

This csproj isn't included in the main sln and was overlooked. We need
the newer xunit version for Assert.Fail which is used in shared unit
test source that is included here as well.

### Motivation and Context
Fix CI failure
2024-09-05 07:53:53 +10:00
Jian Chen
09d786fc14
Rename ios_packaging.requirements.txt to ios_packaging/requirements.txt (#21936)
### Description
Rename ios_packaging.requirements.txt to ios_packaging/requirements.txt



### Motivation and Context
By doing this, the packages listed in ios_packaging/requirements.txt can be
scanned by the CG task
2024-09-04 13:18:05 -07:00
Jiajia Qin
a80bfed5b4
[js/webgpu] Optimize transpose (#21964)
### Description
Fix bugs in the previous implementation and route more cases through the
optimized path.

The following cases take the optimized path:
1. 2d inputs or squeezed 2d inputs
2. channels last or channels first transpose. For example, channel last
transpose: [1, 256, 512, 512] -> [1, 512, 512, 256]
For this case, the transpose becomes [256, 512x512] -> [512x512, 256]
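
The channels-last/channels-first collapse described above can be sketched as follows. This is a minimal Python illustration, not the WebGPU implementation: the helper name `collapse_to_2d` is hypothetical, and it assumes a leading batch dimension of size 1 and a permutation that rotates the remaining axes.

```python
from math import prod


def collapse_to_2d(shape, perm):
    """Return (in_2d, out_2d) if the transpose reduces to a 2D transpose, else None.

    Hypothetical helper: assumes shape[0] == 1 and perm[0] == 0, and checks
    whether perm rotates the remaining axes, e.g. (0, 2, 3, 1) or (0, 3, 1, 2).
    """
    rank = len(shape)
    if rank < 3 or shape[0] != 1 or perm[0] != 0:
        return None
    for k in range(2, rank):
        # A rotation moves axes [k, rank) in front of axes [1, k).
        if list(perm[1:]) == list(range(k, rank)) + list(range(1, k)):
            rows = prod(shape[1:k])
            cols = prod(shape[k:])
            return [rows, cols], [cols, rows]
    return None


# The example from the description: [1, 256, 512, 512] -> [1, 512, 512, 256]
in_2d, out_2d = collapse_to_2d([1, 256, 512, 512], (0, 2, 3, 1))
print(in_2d, out_2d)  # [256, 262144] [262144, 256], where 262144 = 512 * 512
```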

### Motivation and Context
For the SD Turbo demo, the total transpose time drops from 122.09ms to
39.98ms, and its share of the total time drops from 11.05% to 3.89%.

This PR also helps #21618: the total transpose time in that demo
drops from 70.25 ms to 17.32 ms on my iGPUs.
2024-09-04 12:04:04 -07:00
Hector Li
190588bb64
Enable QNN weight sharing (#21077)
### Description
Enable QNN weight sharing across graphs in single context
Create tool to generate QNN context cache model with weight sharing enabled.
2024-09-04 11:20:33 -07:00
Yueqing Zhang
9031112c8e
[VitisAI] add registered custom op for perf test (#21336)
### Description
Register custom ops when running the performance test


### Motivation and Context
This is needed for providers to test their implementation
2024-09-04 11:13:35 -07:00
zz002
bf8a8e7e36
[VitisAI] Bug fixes in model_clone (#21950)
### Description

VitisAI bug fixes in model clone

### Motivation and Context

Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
2024-09-04 10:29:17 -07:00
Edward Chen
cbf3c50d75
Improve stability of Android ReactNative E2E test (#21969)
- Remove redundant `OnnxruntimeModuleExampleE2ETest CheckOutputComponentExists` test
- Attempt to close any Application Not Responding (ANR) dialog prior to running Android test
- Add `--take-screenshots failing` option to detox test commands to save screenshots on failure
2024-09-04 08:41:07 -07:00
Chen Feiyue
d4290f6e7f
Update vsinpu ep cross-compiling patch (#21963)
- Block the bf16 && ummla gemm functions because we cannot support these features yet
2024-09-03 22:54:43 -07:00
Yueqing Zhang
dd2425932d
[VitisAI] Fix model path (#21911)
### Description
Change the .data path so that it is in the same directory as the model.


### Motivation and Context
This fixes an issue where, if a model has a .data file, the executable
can't read the data when the model is in another directory.
2024-09-03 22:42:01 -07:00
Yulong Wang
decb3852a0
refactor: extract shared util function ComputeBroadcastOutputShape (#21940)
### Description

This is used in multiple places.
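
For readers unfamiliar with what such a helper computes, here is a minimal Python sketch of NumPy/ONNX-style multidirectional broadcasting. It is an illustration only and does not reproduce the signature or error handling of the C++ utility; the function name below is hypothetical.

```python
def compute_broadcast_output_shape(lhs, rhs):
    """Hypothetical sketch: broadcast two shapes, aligning from the trailing axis."""
    out = []
    for i in range(1, max(len(lhs), len(rhs)) + 1):
        # Missing leading dimensions are treated as size 1.
        a = lhs[-i] if i <= len(lhs) else 1
        b = rhs[-i] if i <= len(rhs) else 1
        if a != b and a != 1 and b != 1:
            raise ValueError(f"cannot broadcast {lhs} with {rhs}")
        # A size-1 dimension stretches to match the other operand.
        out.append(b if a == 1 else a)
    return out[::-1]


print(compute_broadcast_output_shape([2, 1, 4], [3, 1]))  # [2, 3, 4]
```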
2024-09-03 18:21:36 -07:00
Tianlei Wu
628c0a8f0e
Remove unused find_cudnn_supported_cuda_versions (#21620)
### Description

The function find_cudnn_supported_cuda_versions is not used anymore.
Remove it.

### Motivation and Context
2024-09-03 14:38:33 -07:00
sfatimar
8dba8e3e24
Memory Optimization for Compilation in OVEP (#21872)
Use split API calls (Read+Model) in lieu of the unified Compile Model call
in the export compile flow to reduce memory use. Free the model proto and
serialized string, and read the OV IR model later, to free up memory for
the rest of the pipeline.
Optimization during the EpCtxt flow:
All Graph-related operations require every Node Attribute to be set when
model instances are handled internally. In the existing implementation,
these attributes are copied when a Graph is constructed dynamically at
runtime.
This change uses the attributes in place, without creating a copy, to
avoid memory allocation and copying when calling these Graph-related
functions.
Includes bug fixes related to the OpenVINO version and the epctxt file
path.
Move the compiler version to C++20 to benefit from r-value memory
optimizations.

### Motivation and Context
This change is required because memory consumption during the compilation
flow is too high.

---------

Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: ankitm3k <ankit.maheshkar@intel.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
2024-09-03 13:52:31 -07:00
Yi Zhang
4962252c8f
Enable xnnpack ep works in current windows xnn ci (#21951)
### Description
The EP wasn't added to the session options in onnxruntime_test_all.



### Motivation and Context
After this PR
onnxruntime_test_all --gtest_filter=\*xnnpack\*maxpool\* can step into
8c5336449d/onnxruntime/core/providers/xnnpack/nn/max_pool.cc (L209)

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-09-03 10:02:00 -07:00
Chester Liu
5c74539ab7
Fix copying ORT dylib into wheel on macOS (#21931)
Fix #21223 on macOS

---------

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2024-09-03 11:08:25 +08:00
Yulong Wang
257792225f
revert forceinline for MakeString (#21943)
### Description

revert forceinline for MakeString.

This change reverts https://github.com/microsoft/onnxruntime/pull/21893.
The forceinline was introduced for performance reasons; however, it
turns out to cause a notable binary size increase, which is a concern
for binary-size-sensitive platforms like Android.

I ran a few tests locally and found that the increase is not related to
whether the template struct `if_char_array_make_ptr_t` trick is used, so
the change has to be reverted.
2024-09-02 19:01:08 -07:00
Scott McKay
e788b3d30e
Fix C# warnings. (#21913)
### Description
Update some testing dependencies.
Fix various warnings. Mainly around documentation (existing) and unit
test usage (mainly resulting from xunit update).

Invalid angle brackets for generics in documentation were changed to use
curly braces based on
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/xmldoc/
> To refer to generic identifiers in code reference (cref) elements, you
can use either the escape characters (for example, cref="List&lt;T&gt;")
or braces (cref="List{T}"). As a special case, the compiler parses the
braces as angle brackets to make the documentation comment less
cumbersome to the author when referring to generic identifiers.

### Motivation and Context
2024-09-03 10:08:29 +10:00
Yulong Wang
bad00a3657
Add dependency dawn into deps.txt (#21910)
### Description

Add dependency dawn into deps.txt. This is a preparation for introducing
WebGPU EP.
2024-09-02 04:24:28 -07:00
Kyle
b1ae43cbcb
Add Files Signature Validation after Signed by ESRP (#21949)
### Description
Validate file signatures after the files are signed by ESRP.


### Motivation and Context
- Add validation after the ESRP process.
- Make sure the files matching the target pattern/suffix are signed
successfully by ESRP.
- If a signature is not valid, the following stages fail.
2024-09-02 17:16:59 +08:00
Yulong Wang
8c5336449d
Stop VSCode appending file associations to settings.json (#21944)
### Description

If you open the onnxruntime source code in VSCode with the C/C++ extension,
it keeps adding file associations for C/C++ headers to this
settings.json. This is annoying when staging/committing changes.

Add a configuration to disable this behavior.

see:
-
https://stackoverflow.com/questions/65220185/how-to-stop-vs-code-to-keep-adding-standard-c-libraries-to-the-file-associatio
-
https://github.com/microsoft/vscode-cpptools/issues/722#issuecomment-480329005
2024-08-31 19:04:12 -07:00
mingyueliuh
047f32c79d
[VitisAI] Remove shape infer from bridge ort (#21331)
### Description
Vitis AI EP's custom ops are completely self-contained within the Vitis AI
EP implementation (rather than needing static functions added in
provider_bridge).

---------

Co-authored-by: liumingyue <mingyue@xilinx.com>
2024-08-31 08:57:23 -07:00
aciddelgado
509cb54d6f
softcap gqa (#21683)
### Description
Implement softcap for gqa.

### Motivation and Context
Fixes certain models like Gemma-2, which need softcap to work so they
don't output NaNs.
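
As a rough illustration of what soft-capping does to attention scores, here is a minimal Python sketch. It assumes the usual tanh form, `cap * tanh(score / cap)`, applied before softmax; the function names are hypothetical and this is not the GQA kernel code.

```python
import math


def softcap(scores, cap):
    """Squash each score smoothly into [-cap, cap] (assumed tanh form)."""
    return [cap * math.tanh(s / cap) for s in scores]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw_scores = [1e4, -1e4, 3.0]  # extreme logits
capped = softcap(raw_scores, cap=50.0)
probs = softmax(capped)
print(capped)
print(probs)
```

Small scores pass through almost unchanged, while very large ones saturate at the cap, which keeps the inputs to softmax bounded.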
2024-08-30 19:11:04 -07:00
Jing Fang
5dee95fa10
[CUDA] Support CUDA EP blocked quantization in Q/DQ ops. (#21846)
### Description
1. Added CUDA EP support for blocked quantization in QuantizeLinear and
DequantizeLinear ops.
2. Currently CUDA EP blocked quantization only supports int4/uint4
quantized types and float32/float16 unquantized types.
3. Added CUDA EP support in QDQ selector/action transformer. CUDA EP is
only added to the DQ + MatMul -> MatMulNBits rule. EP support for other
rules is not changed.



### Motivation and Context
ONNX opset 21 introduced blocked quantization for Q/DQ ops. ORT
originally supported blocked quantization only on the CPU EP.
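
To make "blocked quantization" concrete: instead of one scale per tensor or per axis, each run of `block_size` consecutive elements along the quantization axis shares its own scale and zero point. The sketch below is a minimal 1-D Python illustration under that assumption; the function name is hypothetical and it is unrelated to the CUDA kernels added here.

```python
def dequantize_blocked(q, scales, zero_points, block_size):
    """Hypothetical 1-D blocked DequantizeLinear: y = (q - zp) * scale per block."""
    if len(scales) * block_size < len(q) or len(scales) != len(zero_points):
        raise ValueError("scales/zero_points do not cover the input")
    out = []
    for i, value in enumerate(q):
        block = i // block_size  # index of the block this element belongs to
        out.append((value - zero_points[block]) * scales[block])
    return out


# Two blocks of two elements each, with different scales and zero points.
print(dequantize_blocked([1, 2, 3, 4], scales=[0.5, 2.0], zero_points=[0, 1], block_size=2))
# [0.5, 1.0, 4.0, 6.0]
```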
2024-08-30 18:28:00 -07:00
Yi Zhang
60b07623a2
Add a reminder in set-trigger-rules script (#21929)
### Description
After editing the set-trigger-rules.py, we must run the file.



### Motivation and Context
Obviously the script wasn't run, because some file names are incorrect.
2024-08-30 12:18:10 -07:00
Ranjit Ranjan
02e3a430af
[AIX] Python binding enablement and gcc support (#21934)
### Description
Enabling python binding and gcc support for AIX.



### Motivation and Context
The code changes in this PR cover:
1. python binding enablement
2. gcc building support


Below is the list of files changed, with a description of each.

1. cmake/CMakeLists.txt
[gcc building support] -no-unused-function compiler flag addition for
IBMClang
2. cmake/external/eigen.cmake
[gcc building support] AIX check for applying the AIX patch
3. cmake/onnxruntime_python.cmake
[python binding ] putting NOT AIX check for -Xlinker
4. cmake/onnxruntime_unittests.cmake
[gcc building support] Fix for gtest behavior. See the comment.
[python binding ] using -Wl,-brtl for linking
onnxruntime_providers_shared in test_execution_provider
5. cmake/patches/eigen/eigen-aix.patch
[gcc building support] With gcc on AIX, we hit
__builtin_cpu_supports("mma"), which is not supported yet, so the code
for this method is patched. The patched code checks for a P10 processor
at run time and sets the routine accordingly.
6. onnxruntime/python/onnxruntime_validation.py
[python binding ] Adding AIX check in check_distro_info()
7. onnxruntime/test/providers/cpu/generator/random_test.cc
[gcc building support] Update the previous check for AIX to apply along
with clang, so that with gcc the else block is taken.
8. onnxruntime/test/python/onnxruntime_test_python.py
[python binding ] powerpc check on platform.processor()
9. setup.py
[python binding ] Adding AIX check for list of libs.
2024-08-30 12:17:26 -07:00
Changming Sun
1f879c3282
Disable absl symbolize in Windows Release build (#21923)
### Description
This change disables Abseil's symbolize functionality in Windows
non-debug builds.
### Motivation and Context
To solve #21826. Avoid having a dependency on dbghelp.dll.
2024-08-30 12:03:17 -07:00
mindest
bfa4da4f65
Add Linux ROCm CI Pipeline (#21798)
### Description

* Add new ROCm CI pipeline (`Linux ROCm CI Pipeline`) focusing on
inference.
* Resolve test errors; disable flaky tests.

based on test PR #21614.
2024-08-30 14:50:32 +08:00
dependabot[bot]
924259617d
Bump Sixlabors.ImageSharp from 2.1.8 to 2.1.9 in /csharp/sample/Microsoft.ML.OnnxRuntime.FasterRcnnSample (#21920)
Bumps [Sixlabors.ImageSharp](https://github.com/SixLabors/ImageSharp)
from 2.1.8 to 2.1.9.
2024-08-29 21:58:02 -07:00
dependabot[bot]
4ac1558498
Bump torch from 1.13.1+cpu to 2.2.0 in /tools/ci_build/github/linux/docker/scripts/training/ortmodule/stage1/torch_eager_cpu (#21919)
Bumps [torch](https://github.com/pytorch/pytorch) from 1.13.1+cpu to
2.2.0.
2024-08-29 21:57:24 -07:00
Wanming Lin
7550fec4aa
Revert "[WebNN EP] Remove NHWC preferred layout" (#21905)
Reverts microsoft/onnxruntime#21570
2024-08-29 18:01:56 -07:00
aciddelgado
0223e8647b
Fix num splits bug (#21899)
### Description
Found a bug with num splits where the heuristic isn't being performed
properly due to incorrect passing of sequence length to heuristic
function.



### Motivation and Context
We were experiencing significant performance issues with long sequence
length with flash attention due to this misconfiguration.
2024-08-29 15:00:53 -07:00
Jian Chen
fd88474077
Fix a CG issue that require upgrade transformer from 4.36 to 4.38 (#21900)
### Description
Fix a CG issue that requires upgrading transformer from 4.36 to 4.38



### Motivation and Context
See CG
[link](https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11474680?typeId=26218094&pipelinesTrackingFilter=0)
Also the other [CG
item](https://aiinfra.visualstudio.com/Lotus/_componentGovernance/218239/alert/11474678?typeId=26218094&pipelinesTrackingFilter=0)
to request update 4.72 to 4.38
2024-08-29 14:53:15 -07:00
Sheil Kumar
867e0401a7
Catch statement causing build failures for flavors with EHsc disabled (#21902)
### Description
The catch in etw_sink.cc is causing build failures for flavors with EHsc
disabled.
Remove the catch and set the Failure state in response to the FAILED
check.


### Motivation and Context
Catch in etw_sink.cc is causing build failures for flavors with EHsc
disabled.

---------

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2024-08-29 12:15:39 -07:00
Yulong Wang
32af2ba68f
enhance string util functions (#21893)
### Description
- make `MakeString` force inline
- refactor ORT_FORCEINLINE macro - move to one place to avoid macro
redefinition error
- ~~add a `StringJoin` utility~~


### Motivation and Context
2024-08-29 10:37:50 -07:00
Xu Xing
01673389b8
[js/webgpu] Enable conv+clip fuse on mobilenetv2-12-f16 (#21234)
There are failures for some inputs.

### Description



### Motivation and Context

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-08-29 08:03:02 -07:00
Yi Zhang
be76e1e1b8
Add dependent stages in nuget packaging pipelines (#21886)
### Description
Since the stage needs to download drop-extra, it should declare the
stages it depends on



### Motivation and Context
2024-08-29 11:34:10 +08:00
Guenther Schmuelling
4fece0430f
remove duplicate function definition (#21903)
2024-08-28 16:18:56 -07:00
duanshengliu
7df8776322
Add overflow protection for quantization bias to reduce quantization precision loss (#21645)
### Description

When the scale of the bias is too small, the quantized bias may exceed
the range of `int32`, leading to significant loss of precision.
Therefore, before converting quantized bias to `int32`, it needs to be
clipped within the range of `int32` to reduce the loss of quantization
precision.
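
The clipping step described above can be sketched in a few lines of Python. This is a minimal illustration and not the quantization tool's code; the function name is hypothetical, and `bias_scale` is passed in directly rather than derived from input and weight scales.

```python
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def quantize_bias(bias, bias_scale):
    """Hypothetical sketch: quantize float bias values, saturating to int32."""
    quantized = []
    for b in bias:
        q = round(b / bias_scale)
        # Without this clip, a tiny bias_scale yields values that wrap around
        # when cast to int32, which is the precision loss being fixed.
        quantized.append(min(max(q, INT32_MIN), INT32_MAX))
    return quantized


print(quantize_bias([1.0, -1.0, 0.5], bias_scale=1e-12))
```

With a scale of `1e-12`, all three values fall far outside the `int32` range, so they saturate at the nearest bound instead of overflowing.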


### Motivation and Context
Fix the issue https://github.com/microsoft/onnxruntime/issues/21000
2024-08-28 14:29:17 -07:00
xhcao
3bfb5e4f62
[js/webgpu] support float16 for Clip (#21584)
### Description



### Motivation and Context
2024-08-28 13:19:20 -07:00
Wanming Lin
59114227fd
[WebNN EP] Remove NHWC preferred layout (#21570)
The WebNN CPU backend in Chromium now supports the NCHW layout, so we
can drop the NHWC preferred layout for the CPU backend in the WebNN EP to
simplify the code.
2024-08-28 13:17:34 -07:00
Ye Wang
bf8855ba3c
Support Smooth Softmax in fmha (#21885)
### Description
refer to https://github.com/microsoft/onnxruntime/pull/21867


### Motivation and Context

---------

Co-authored-by: Your Name <you@example.com>
2024-08-28 09:29:33 -07:00
AlbertGuan9527
ef073fd8f4
Add session and run option workload_type for applications to set efficient mode. (#21781)
### Description
This PR adds the session and run option workload_type. This option is the
knob for applications to enable or disable the processor performance
efficient mode.



### Motivation and Context
The efficient mode is co-engineered with processor vendors to allow
applications to voluntarily be serviced at a more energy-efficient
performance level. Long-running, latency-insensitive applications can use
this functionality to reduce energy consumption.
2024-08-28 08:17:01 -07:00
Jian Chen
e95277484e
Adding $(Build.SourcesDirectory)s to the ignoreDirectories (#21878)
2024-08-27 19:56:48 -07:00
George Wu
23f3912334
support both qnn x64 and arm64ec stages in py packaging pipeline (#21880)
both arm64ec and x64 packages are needed.  
x64 is needed for offline context binary generation
and arm64ec is needed for interop with python packages that don't have
prebuilt arm64 packages and only have x64.
2024-08-27 15:07:30 -07:00
Yulong Wang
d2a1b7a353
Introduce custom external data loader (#21634)
### Description

This PR introduces support for custom external data loader. An EP can
register a custom external data loader to override the default behavior,
making it possible to upload initializers directly to GPU.



### Motivation and Context

- In ONNX Runtime Web, WebAssembly uses 32-bit pointers
(`sizeof(size_t)==4`), which means there is a 4GB hard limit on the
maximum memory. As ONNX models get larger, this becomes a blocker
for supporting medium-sized language models.

- ORT runs out of memory because the current code always loads data into
CPU memory, including the .onnx file (protobuf) and external data
file(s). However, if using GPU EP, the big data does not need to be kept
on CPU because the only thing that ORT does is to load the data into
memory, upload to GPU and then release them.

- Some platforms offer developers a way to upload data directly to the
GPU. For example, WebGPU allows uploading from any ArrayBuffer (which can
be a side buffer that does not count toward the 4GB limit) to the GPU
directly. This helps to reduce CPU memory usage significantly.

### Design

Class `ExternalDataLoader` and `ExternalDataLoaderManager` are
introduced. They are similar to `DataTransfer` and
`DataTransferManager`. `InferenceSession` owns the manager object, and
`SessionState` keeps a reference to it.

Added a new method `GetExternalDataLoader` in `IExecutionProvider`. An
EP can override the method to register an instance of custom external
data loader.

The key function in an `ExternalDataLoader` class is the `LoadTensor` method:

```c++
  // the tensor is pre-created using the TensorProto info of the initializer and the MemoryInfo (from allocation plan).
  virtual common::Status LoadTensor(const Env& env,
                                    const std::filesystem::path& data_file_path,
                                    FileOffsetType data_offset,
                                    SafeInt<size_t> data_length,
                                    Tensor& tensor) const;
```

An EP can register a loader that provides this function. The call goes
through a few layers and eventually reaches `DeserializeTensorProto()` in
the finalizing stage of session initialization, where initializer tensors
are created. The new behavior is to first look up a registered external
data loader that can handle the current memory info. If one is
available, it is used; otherwise the old code path is followed.
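
The lookup-then-fallback behavior can be sketched independently of the C++ classes. The Python below is a language-neutral illustration only: `LoaderRegistry`, `load_initializer`, and the string memory-info keys are all hypothetical and do not correspond to ORT APIs.

```python
class LoaderRegistry:
    """Hypothetical stand-in for a manager that holds registered loaders."""

    def __init__(self):
        self._loaders = []  # list of (can_load predicate, load function)

    def register(self, can_load, load):
        self._loaders.append((can_load, load))

    def find(self, memory_info):
        # Return the first loader that claims it can handle this memory info.
        for can_load, load in self._loaders:
            if can_load(memory_info):
                return load
        return None


def load_initializer(registry, memory_info, path, offset, length, default_load):
    """Use a registered loader if one matches, else fall back to the old path."""
    loader = registry.find(memory_info)
    if loader is not None:
        return loader(path, offset, length)
    return default_load(path, offset, length)


registry = LoaderRegistry()
registry.register(lambda info: info == "gpu", lambda p, o, n: f"gpu-upload:{p}[{o}:{o + n}]")
cpu_path = lambda p, o, n: f"cpu-read:{p}[{o}:{o + n}]"

print(load_initializer(registry, "gpu", "weights.bin", 0, 16, cpu_path))
print(load_initializer(registry, "cpu", "weights.bin", 0, 16, cpu_path))
```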
2024-08-27 12:18:52 -07:00
Caroline Zhu
b7f09d4c27
Increase timeout for orttraining-linux-gpu pipeline (#21844)
### Description
Increase timeout to 160 minutes

### Motivation and Context
- Recent runs of orttraining-linux-gpu pipeline have been timing out
2024-08-27 11:47:12 -07:00
Jian Chen
7f851f4e61
Removing docker_base_image parameter and variables (#21864)
### Description
Removing the `docker_base_image` parameter and variables from the Cuda
Packaging pipeline.



### Motivation and Context
Since the docker image is hard-coded in

`onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda12/Dockerfile`
and

`onnxruntime/tools/ci_build/github/linux/docker/inference/x86_64/default/cuda11/Dockerfile`,
this parameter and these variables are no longer needed.
2024-08-27 10:36:17 -07:00
Ye Wang
1d059b8702
Phi3 MoE cuda kernel (#21819)
### Description



### Motivation and Context

---------

Co-authored-by: Your Name <you@example.com>
2024-08-27 09:21:30 -07:00
Jiajia Qin
252222034f
[js/webgpu] Support Reshape/Shape 21+ on jsep (#21871)
### Description
#21618

With this PR, the cross-device copying (`MemcpyToHost`) can be removed
entirely for model `wav2vec2`, and the overall time drops from 604ms to
48ms.

### Motivation and Context
2024-08-27 09:02:39 -07:00
mcollinswisc
5d54dc1462
Drop QDQ around more nodes (#21376)
### Description

Extends the Drop QDQ optimization to remove DequantizeLinear and
QuantizeLinear nodes from around operators:

- Flatten
- Expand
- Tile
- Slice
- GatherElements
- ReduceMin
- ReduceMax

### Motivation and Context

To reduce floating-point conversions in quantized inference. Mainly
motivated by the Flatten case, since that shows up in graphs
exported from PyTorch to ONNX. For completeness, the change also
extends to a larger set of ops for which this optimization is valid.
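
To see why dropping the Q/DQ pair can be valid for an op such as ReduceMax, here is a small Python check. It is a sketch under a stated assumption, namely that the DequantizeLinear and QuantizeLinear nodes share the same positive scale and the same zero point; the helper names are hypothetical and this is not the optimizer's code.

```python
def quantize(x, scale, zero_point, qmin=0, qmax=255):
    """Quantize floats to uint8-range integers."""
    return [min(max(round(v / scale) + zero_point, qmin), qmax) for v in x]


def dequantize(q, scale, zero_point):
    """Map quantized integers back to floats."""
    return [(v - zero_point) * scale for v in q]


def reduce_max_with_qdq(q, scale, zero_point):
    """Original pattern: DequantizeLinear -> ReduceMax -> QuantizeLinear."""
    return quantize([max(dequantize(q, scale, zero_point))], scale, zero_point)


def reduce_max_quantized(q):
    """After the optimization: ReduceMax runs directly on quantized values."""
    return [max(q)]


q_in = [100, 130, 255, 7]
print(reduce_max_with_qdq(q_in, scale=0.5, zero_point=128))
print(reduce_max_quantized(q_in))
```

Because dequantization with a positive scale preserves ordering, the maximum element is the same before and after, so both paths select the same quantized value. Pure data-movement ops such as Flatten or Slice are simpler still, since they never change element values.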

https://github.com/microsoft/onnxruntime/issues/21375

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-08-27 16:54:37 +10:00