Commit graph

11626 commits

Author SHA1 Message Date
jingyanwangms
4a5d66c15f
Default value 10.2->10.3 in linux-gpu-tensorrt-daily-perf-pipeline.yml (#21823)
### Description
Fix default value 10.2->10.3 in
linux-gpu-tensorrt-daily-perf-pipeline.yml

### Motivation and Context
2024-09-10 15:26:16 -07:00
George Wu
31ae11788a
[QNN EP] Update QNN SDK to 2.26 (#22037)
* update default QNN SDK version to 2.26
* enable layernorm implicit bias workaround for QNN 2.26
* update artifact names for py win arm64 and arm64ec to re-enable
ort-qnn-nightly arm64 python packages
2024-09-10 14:03:06 -07:00
Sophie Schoenmeyer
e7107f41de
Decrease API docs artifact retention days (#22003)
### Description
When API docs workflows fail, we typically don't catch the issue until
the most recently generated artifact expires. The current artifact
retention is 60 days, so by decreasing to 30 days, we can ensure that
we're resolving the workflow failures more quickly.



### Motivation and Context
2024-09-10 10:44:08 -07:00
Erick Muñoz
7489bfee53
Enable AVX NE CONVERT for FP16 to FP32 cast (#21183)
### Description
Implementation of a new cast assembly kernel that uses AVX_NE_CONVERT
instructions to accelerate casting from FP16 to FP32. Added CPUID checks
to determine support of the ISA.

### Motivation and Context
Currently, FP16 models executed on systems that lack complete FP16
operator support run every node in single precision. The original FP16
weights therefore have to be cast to FP32 to run the model properly.
This change accelerates the cast with upconvert instructions and thereby
improves performance.
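
For reference, the scalar conversion that such a kernel vectorizes can be sketched in Python. This is only an illustration of FP16 to FP32 upconversion using the standard `struct` module, not the assembly kernel from this PR; the helper name `fp16_bits_to_fp32` is hypothetical.

```python
import struct

def fp16_bits_to_fp32(bits):
    """Upconvert raw IEEE 754 half-precision bit patterns to float values.

    Hypothetical helper for illustration only; the PR performs this cast
    with AVX_NE_CONVERT instructions on several elements at a time.
    """
    out = []
    for b in bits:
        # Reinterpret the 16-bit pattern as a half float ('e' format).
        (value,) = struct.unpack("<e", struct.pack("<H", b))
        # Round-trip through 'f': every FP16 value is exactly representable in FP32.
        (as_fp32,) = struct.unpack("<f", struct.pack("<f", value))
        out.append(as_fp32)
    return out

print(fp16_bits_to_fp32([0x3C00, 0xC000, 0x3800]))  # [1.0, -2.0, 0.5]
```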
2024-09-09 21:19:31 -07:00
Jake Mathern
d4d419f789
fix more dml warnings (#21980)
### Description
Fixes more warnings in the DML execution provider that BinSkim flags as
security issues.


### Motivation and Context
OS components that include ORT must treat certain warnings as errors,
and cannot disable critical compiler warnings

https://github.com/microsoft/binskim/blob/main/src/BinSkim.Rules/PERules/BA2007.EnableCriticalCompilerWarnings.cs
2024-09-09 17:50:17 -07:00
Jian Chen
93c4c9cb6a
Using wostringstream only on Windows (#21938)
### Description
Using wostringstream only on Windows



### Motivation and Context
From line
[62](https://github.com/microsoft/onnxruntime/pull/21938/files#diff-47776d020ac08134de4059eab473550237f4999c598ab56afad3676d2f193edcR62),
`stream_` can currently be either `wostringstream` or `ostringstream`
depending on the OS. On Unix-like systems, however, `stream_` should be
`ostringstream` instead of `wostringstream`.
2024-09-09 13:20:17 -07:00
Adrian Lizarraga
c7ae9b977a
[Quantization] Apply workaround for crash when using histogram-based calibrators (#21972)
### Description
- Applies a workaround that prevents the histogram-based calibrators
(percentile, entropy, distribution) from crashing. The workaround
involves copying inference outputs that come directly from model inputs.
A description of the bug is here:
https://github.com/microsoft/onnxruntime/issues/21922. **This PR does
not fix the root bug, but instead provides a workaround to _unblock_
users using histogram-based calibration.**
- Adds a unit test that runs all histogram-based calibrators to help
catch future regressions. We didn't have unit tests that ran these
calibration methods.

### Motivation and Context
Trying to quantize a model with the percentile, entropy, or distribution
calibration methods raises an exception:
```shell
  File "/.../site-packages/onnxruntime/quantization/quantize.py", line 691, in quantize
    quantize_static(
  File "/.../site-packages/onnxruntime/quantization/quantize.py", line 525, in quantize_static
    calibrator.collect_data(calibration_data_reader)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 571, in collect_data
    self.collector.collect(clean_merged_dict)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 746, in collect
    return self.collect_value(name_to_arr)
  File "/.../site-packages/onnxruntime/quantization/calibrate.py", line 836, in collect_value
    hist, hist_edges = np.histogram(data_arr, self.num_bins, range=(-threshold, threshold))
  File "<__array_function__ internals>", line 180, in histogram
  File ".../site-packages/numpy/lib/histograms.py", line 793, in histogram
    bin_edges, uniform_bins = _get_bin_edges(a, bins, range, weights)
  File "/.../site-packages/numpy/lib/histograms.py", line 426, in _get_bin_edges
    first_edge, last_edge = _get_outer_edges(a, range)
  File "/.../site-packages/numpy/lib/histograms.py", line 315, in _get_outer_edges
    raise ValueError(
ValueError: supplied range of [nan, nan] is not finite
```

The calibrators create an augmented model with all tensors (including
model inputs) set as model outputs. The data for outputs that are also
model inputs is corrupted as described in
https://github.com/microsoft/onnxruntime/issues/21922. The corrupted
data sometimes contains `NaN` values that cause numpy's histogram
utilities to raise an exception.
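
The failure mode can be reproduced without numpy. The sketch below is an illustration only: `histogram_range` is a hypothetical helper, not part of `calibrate.py`, and it mimics the finiteness check that numpy applies to the `range=(-threshold, threshold)` argument shown in the traceback.

```python
import math

def histogram_range(values):
    """Return a symmetric (-t, t) range, rejecting non-finite data.

    Hypothetical helper: a single NaN in the data makes the threshold NaN,
    which is what leads to the "[nan, nan] is not finite" error.
    """
    threshold = 0.0
    for v in values:
        if math.isnan(v):
            # NaN poisons the threshold, as corrupted output data would.
            threshold = math.nan
            break
        threshold = max(threshold, abs(v))
    if not math.isfinite(threshold):
        raise ValueError(
            f"supplied range of [{-threshold}, {threshold}] is not finite"
        )
    return (-threshold, threshold)

print(histogram_range([1.0, -3.0, 2.5]))  # (-3.0, 3.0)
```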
2024-09-09 12:05:41 -07:00
Peishen Yan
2cdc05f189
Move Gelu and LayerNorm fusion to L1 optimization (#21332)
According to https://github.com/microsoft/onnxruntime/issues/20915, we
move the Gelu and LayerNorm fusion to L1 with a condition on the ONNX
opset the model imports (LayerNorm requires opset 16+ and Gelu requires
opset 20+.) If the opset version doesn't meet the requirements, the
fusion is delayed to L2 optimization since the internal contrib op
doesn't have a requirement for any specific ONNX opset.
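
The level selection described above can be sketched as follows. This is an illustration, not the optimizer code: `MIN_ONNX_OPSET` and `fusion_level` are hypothetical names, and the opset thresholds are the ones stated in this commit message.

```python
# Minimum ONNX opset the model must import for the fusion to run at L1
# (values taken from the commit message).
MIN_ONNX_OPSET = {"LayerNorm": 16, "Gelu": 20}

def fusion_level(op_type, model_opset):
    """Return the optimization level at which the fusion would run.

    Hypothetical helper: if the model's opset is too old for the fused
    ONNX op, the fusion is delayed to L2, where the internal contrib op
    has no ONNX opset requirement.
    """
    return "L1" if model_opset >= MIN_ONNX_OPSET[op_type] else "L2"

print(fusion_level("LayerNorm", 17))  # L1
print(fusion_level("Gelu", 17))       # L2
```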

---------

Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-09-09 13:27:52 +10:00
Yi Zhang
de7a02beef
Add parameter for flexdownload (#22009)
### Description



### Motivation and Context
With this parameter, the Nuget_Packaging_GPU stage can be run directly.
2024-09-08 14:17:55 +08:00
Wanming Lin
ad9afbb042
[WebNN EP] Remove workaround for CPU op supported list (#21962)
We assume all WebNN ops are supported across all backends.
2024-09-06 22:14:52 -07:00
Edward Chen
f3725b9f06
Use output variable from InstallAppleProvisioningProfile task to set provisioning profile UUID. (#22018)
This is more flexible than hardcoding the provisioning profile name or UUID. The name shouldn't usually change but it is not guaranteed to remain constant.
2024-09-06 18:00:34 -07:00
zz002
28b550f091
[VitisAI] Add processing for sessionOptions.AppendExecutionProvider("VitisAI", options) (#21839)
### Description

[VitisAI] Add processing for
sessionOptions.AppendExecutionProvider("VitisAI", options)

### Motivation and Context

---------

Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
2024-09-06 14:06:33 -07:00
Arne H Juul
493159b481
near-zero negative values must convert to 0 not NAN (#18473)
For the Float8 types with unsigned zero, we must clear the sign bit when
rounding to zero;
otherwise we end up with 0x80, which is the encoding for NAN.

### Description
Handle all zero and near-zero values the same way, rounding to positive
zero.
Note that I removed one "if" level but did not re-indent the code in
this PR, to make it
easier to see what the actual changes are.

### Motivation and Context
For the two new 8-bit floating point types Float8E4M3FNUZ and
Float8E5M2FNUZ,
converting from a near-zero negative value would end up with the sign
bit set only;
this bit pattern is not negative zero but instead means NAN.
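
The rule can be sketched in a few lines of Python. This is a simplified illustration, not the converter from this PR: `pack_fnuz` is a hypothetical helper that only shows the final step of combining an already rounded 7-bit magnitude with the sign bit.

```python
NAN_FNUZ = 0x80  # FNUZ types: sign bit set with all other bits zero encodes NaN

def pack_fnuz(sign, magnitude):
    """Combine a sign bit and 7 rounded magnitude bits into one FNUZ byte.

    Hypothetical helper: when the magnitude rounds to zero, the sign bit
    is cleared so the result is positive zero (0x00) rather than NaN (0x80).
    """
    if magnitude == 0:
        return 0x00
    return (sign << 7) | (magnitude & 0x7F)

print(hex(pack_fnuz(1, 0)))  # 0x0
print(hex(pack_fnuz(1, 1)))  # 0x81
```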
2024-09-06 11:41:48 -07:00
Arne H Juul
605a84ffc9
remove unused and confusing float16 constants (#21999)
### Description
Remove unused and confusing special constants in MLFloat16 and BFloat16
types.

### Motivation and Context
While looking at adding a specialization for std::numeric_limits for the
16-bit floating point types, I found that there are various special
constants in those types that are confusing or just wrong.

MLFloat16::Epsilon is not an epsilon at all, but approximates "e". Looks
like a copy-paste bug.
BFloat16::Epsilon does not correspond to `numeric_limits::epsilon()`,
nor even to the C# Float.Epsilon.
Instead, it corresponds to `numeric_limits::min()`, which was really
confusing to me.

The "MinValue" constant does correspond to the C# `Float.MinValue`
constant, but this is C++ so it would be better renamed to "LowestValue"
since it corresponds to `numeric_limits::lowest()`. As it was unused
except for some unit tests I have replaced it with the equivalent
`MaxValue.Negate()` here.

There's also an unused `kSignaling_NaNBits` constant which is just wrong
(has the same value as `kPositiveInfinityBits` instead of a NaN).
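
For comparison, the IEEE 754 half-precision values that the `numeric_limits` members refer to can be checked with Python's `struct` module. This is a sketch for orientation only; the constant and helper names below are hypothetical and are not the ones used in MLFloat16.

```python
import math
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

def half_from_bits(bits):
    """Interpret a raw 16-bit pattern as a half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

EPSILON = 2.0 ** -10     # like numeric_limits::epsilon(): gap between 1.0 and the next value
MIN_NORMAL = 2.0 ** -14  # like numeric_limits::min(): smallest positive normal value
MAX_VALUE = 65504.0      # like numeric_limits::max()
LOWEST = -MAX_VALUE      # like numeric_limits::lowest()

print(to_half(1.0 + EPSILON) > 1.0)        # True
print(to_half(1.0 + EPSILON / 2) == 1.0)   # True (the tie rounds to even)
print(half_from_bits(0x7C00))              # inf
print(math.isnan(half_from_bits(0x7E00)))  # True
```

The last two lines show why a NaN constant cannot share its bit pattern with positive infinity: 0x7C00 has a zero mantissa, while any NaN needs a nonzero one.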
2024-09-05 22:00:48 -07:00
Edward Chen
970ebc2ccf
Fix typo in coreml_supported_mlprogram_ops.md (#22004)
### Description

Fix typo: ai:onnx -> ai.onnx

### Motivation and Context

Typo.
2024-09-06 12:50:56 +10:00
Edward Chen
0c398b3e52
Update Android NDK version to 27.0.12077973. (#21989)
Upgrade to newer version. r26 will be unsupported soon.
2024-09-05 17:57:24 -07:00
Adrian Lizarraga
b011f6fbf6
[TransposeOptimizer] Support Unsqueeze/Transpose of input consumed by per-axis DQ (#21821)
### Description
Follow-up to: https://github.com/microsoft/onnxruntime/pull/21793

- Support looking past a per-axis DQ to do in-place Unsqueeze/Transpose
of initializers
- Support looking past a per-axis DQ to cancel a Transpose or Squeeze.

### Test models
For all test models, the transpose optimizer pushes a Transpose through
a Mul's input[0]. The Mul's input[1] is optionally unsqueezed and then
transposed.

### I. Test in-place unsqueeze and transpose of per-axis quantized
weight
Original model has input[1] with shape (3,)
<details><summary>click to expand model image</summary>
<img
src="https://github.com/user-attachments/assets/37b6f60c-77d2-4bd3-8ca2-58dc7c88a304"
/>
</details>

Optimized model has input[1] with shape (1, 3, 1, 1). The initializer
was unsqueezed and transposed in-place.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/adb72757-a164-400c-bfef-2a05f0e35825"
/>
</details>
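
The shape change in test I can be traced with a small shape-only sketch. It is an illustration under assumptions, not the TransposeOptimizer code: the helper names are hypothetical, and the unsqueeze axes `[0, 1, 2]` and permutation `(0, 3, 1, 2)` are assumed values chosen to reproduce the (3,) to (1, 3, 1, 1) change described above. It also shows how the axis of a per-axis DQ would have to follow the data.

```python
def unsqueeze_shape(shape, axes):
    """Insert size-1 dims at the given non-negative output axes."""
    out = list(shape)
    for axis in sorted(axes):
        out.insert(axis, 1)
    return tuple(out)

def transpose_shape(shape, perm):
    """Shape after transposing with `perm` (output dim i is input dim perm[i])."""
    return tuple(shape[p] for p in perm)

def axis_after_unsqueeze(axis, axes):
    """New index of an original axis once dims are inserted at `axes`."""
    for a in sorted(axes):
        if a <= axis:
            axis += 1
    return axis

def axis_after_transpose(axis, perm):
    """New index of `axis` after transposing with `perm`."""
    return list(perm).index(axis)

shape = (3,)               # per-axis quantized weight, quantization axis 0
axes = [0, 1, 2]           # assumed unsqueeze axes
perm = (0, 3, 1, 2)        # assumed permutation

unsqueezed = unsqueeze_shape(shape, axes)
print(unsqueezed)                           # (1, 1, 1, 3)
print(transpose_shape(unsqueezed, perm))    # (1, 3, 1, 1)
dq_axis = axis_after_transpose(axis_after_unsqueeze(0, axes), perm)
print(dq_axis)                              # 1
```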

### II. Test canceling existing Squeeze before per-axis DQ
Original model has input[1] that is squeezed.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/f27e6742-b563-42a9-ad06-bb3178b0ceb8"
/>
</details>

Optimized model unsqueezed and transposed input[1]. The original squeeze
was removed due to the unsqueeze, leaving only the Transpose.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/e56261d4-eba6-4a9f-847b-dcd33548dd07"
/>
</details>

### III. Test canceling existing Transpose before per-axis DQ
Original model has input[1] that is transposed.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/f157e04a-572a-479d-8e3b-cf57954df5c0"
/>
</details>

Optimized model transposed input[1], thus canceling the existing
transpose.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/63d742ce-3762-4ab2-bdb0-1b507886da9d"
/>
</details>

### IV. Test QDQ fix-up of Transpose/Unsqueeze for per-axis quantization
Original model has input[1] that can be broadcasted.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/96c0092c-22ec-486d-882e-e2cb59ffe324"
/>
</details>

The main transpose optimization loop inserts float32 Unsqueeze and
Transpose after the DQ. The qdq fix-up pass inserts new per-axis Q/DQ
ops after the inserted nodes.
<details><summary>click expand model image</summary>
<img
src="https://github.com/user-attachments/assets/b6f89c11-974d-4b35-922f-11effdf06883"
/>
</details>


### Motivation and Context
Enables the TransposeOptimizer to support more models with per-axis QDQ
nodes. Per-axis quantization can improve model accuracy and is used by
EPs like QNN.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2024-09-05 17:26:17 -07:00
Wanming Lin
23f6604c39
[WebNN EP] Use identity for one input of Max/Min (#21974)
Now WebNN supports `identity` op, use it for `Max` and `Min` ops with
only one input.
2024-09-05 16:47:40 -07:00
Scott McKay
20c802afd4
Add better native nuget package readme (#21889)
### Description
Request from Nuget team to add a better readme to the nuget package so
it is displayed nicely on nuget.org.

Previously we were using the ORT repo readme.md but that a) doesn't
display correctly due to limited markdown support on nuget.org, and b)
has a lot of irrelevant info like build pipeline status.

- Created a generic readme.md that includes the ORT description from the
main readme, includes the ORT logo via an acceptable link, and lists the
native nuget packages so the file can be included in any of them as-is.
- Updated the nuget packaging script to add the `readme` tag and use
this file.


### Motivation and Context
Request from MS Nuget team to MS package owners to add.
2024-09-06 08:28:14 +10:00
Tianlei Wu
c7d0ded079
[CUDA] Update Dockerfile.cuda with cuda 12.5.1 and cudnn 9 (#21987)
### Description
The previous image was based on cuda 12.1 and cudnn 8, which is out of
date because we moved to cudnn 9 in the 1.19 release.
(1) Upgrade base image to cuda 12.5.1 and cudnn 9.
(2) Update CMAKE_CUDA_ARCHITECTURES from 52;60;61;70;75;86 to
61;70;75;80;86;90 to support A100 and H100
(3) Make the build faster: exclude unit test; use ninja etc.
(4) upgrade some packages (like packaging etc) before building to avoid
build error.

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/21792
https://github.com/microsoft/onnxruntime/issues/21532
2024-09-05 15:25:40 -07:00
0xdr3dd
2dae8aaced
[Fuzzer] Add fuzzer support for linux (#21996)
### Description
Added some changes to the fuzzer project code so that it also supports Linux.

How to test on Linux:
1. Make sure you have installed clang/llvm.
2. Run the command below to build the ASan-instrumented project:
```
CFLAGS="-g -fsanitize=address -shared-libasan -fprofile-instr-generate -fcoverage-mapping" CXXFLAGS="-g -shared-libasan -fsanitize=address -fprofile-instr-generate -fcoverage-mapping" CC=clang CXX=clang++ ./build.sh --update --build --config Debug --compile_no_warning_as_error --build_shared_lib --skip_submodule_sync --skip_tests --use_full_protobuf  --parallel --fuzz_testing --build_dir build/
```

3. Run the fuzzer for some time; it will generate *.profraw files:
```
LLVM_PROFILE_FILE="%p.profraw" ./build/Debug/onnxruntime_security_fuzz /t /v onnxruntime/test/testdata/bart_tiny.onnx 1 m
```
4. Get the coverage by running the commands below:
```
llvm-profdata merge -sparse *.profraw -o default.profdata
llvm-cov report ./build/Debug/onnxruntime_security_fuzz  -instr-profile=default.profdata
```

<img width="1566" alt="Screenshot 2024-09-05 at 4 25 08 PM"
src="https://github.com/user-attachments/assets/2aa0bb83-6634-4d33-b026-3535e97df431">



### Motivation and Context
1. Currently the fuzzer only supports Windows and MSVC, and we can't
generate code coverage using MSVC. With clang/llvm we can use clang
instrumentation and llvm tools like llvm-cov.
2. In the future we can add a coverage-guided fuzzer (libfuzzer) to the
same project. (Working on it)
2024-09-05 11:52:15 -07:00
Yueqing Zhang
f4d62eeb2e
[VitisAI] remove unused header (#21890)
### Description
Removed unused headers


### Motivation and Context
This would cause a compile error on machines that don't have nlohmann installed.

Co-authored-by: Yueqing Zhang <yueqingz@amd.com>
2024-09-05 08:37:15 -07:00
Javier Martinez
840f896c5f
Uncomment line in OVEP that was commented out in error (#21973)
### Description
One line change to re-enable a line incorrectly commented out in an
earlier commit



### Motivation and Context
Fix issue introduced with [PR
21872](https://github.com/microsoft/onnxruntime/pull/21872#discussion_r1736744441)
2024-09-05 08:34:55 -07:00
Scott McKay
8b661f7157
Fix DML packaging CIs (#21997)
### Description
The DML CIs build native and C# as well as sign DLLs in the same CI.
Some parts of that require .net 8 and some .net 6.
Update to use .net 8 in general, and revert to .net 6 for the signing.


### Motivation and Context
Fix packaging pipeline.
2024-09-05 22:30:40 +08:00
Scott McKay
5e24c5d5f8
Fix C# doc generation workflow (#21988)
### Description
- Update docfx usage. 
  - The docfx cli is now a dotnet tool.
  - Split some commands up so it's easier to debug failures
- Update to .net8.
- Exclude mobile targets from build as the workloads aren't available
and it doesn't change the generated documentation.
- The mobile specific APIs (e.g. enable CoreML EP) still exist in this
case as we check in the implementation if it's valid to use them or not,
so the workloads are not required to generate complete API
documentation.

### Motivation and Context
Fix doc gen.
2024-09-05 13:54:17 +10:00
Yulong Wang
2e83541eba
fix one build warning in MSVC (#21983)
### Description

Fix one MSVC warning member not initialized


```
Warning	C26495	Variable 'onnxruntime::ITuningContext::allocators_' is uninitialized. Always initialize a member variable (type.6).  C:\code\onnxruntime\onnxruntime\core\framework\tuning_context.h	22		
```
2024-09-04 17:51:14 -07:00
Jiajia Qin
3580e01348
[js/webgpu] Optimize grouped conv (#21892)
### Description
#21618

This PR optimizes grouped conv by 1) making memory access on the GPU
more sequential and 2) reusing the input's data to reduce the number of
global memory accesses.

The `Conv|GroupedConv` op in
[Wav2Vec2](https://huggingface.co/facebook/wav2vec2-base-960h) drops
from 1058 ms to 92 ms on iGPUs with 32 EU.

For the whole model on my iGPUs with 32 EU:
- wav2vec2 drops from 1942 ms to 982 ms.
- squeezebert-uncased drops from 431.77 ms to 71.86 ms.


### Motivation and Context
2024-09-04 17:16:35 -07:00
mindest
30f07758a2
Add packaging version constraint. (#21814)
### Description
Newer `setuptools` requires a newer version of `packaging` due to a
function update.

### Motivation and Context
Fixes #21792
2024-09-04 16:57:04 -07:00
Prathik Rao
ed232dc1ef
Sets enable_windows_arm64ec_qnn to false in training CI (#21981)
### Description



### Motivation and Context
2024-09-04 16:01:14 -07:00
Scott McKay
44fc7b443c
Update C# test projects (#21631)
### Description
Update various test projects to .net8 from EOL frameworks.
Replace the Xamarin based Android and iOS test projects with a MAUI
based project that uses .net 8.
Add new CoreML flags to C# bindings

### Motivation and Context
Remove usage of EOL frameworks.
2024-09-05 08:21:23 +10:00
Scott McKay
8632e67dc3
Update C# E2E project's test package versions (#21975)
### Description
Update C# test package dependencies to match #21913

This csproj isn't included in the main sln and was overlooked. We need
the newer xunit version for Assert.Fail which is used in shared unit
test source that is included here as well.

### Motivation and Context
Fix CI failure
2024-09-05 07:53:53 +10:00
Jian Chen
09d786fc14
Rename ios_packaging.requirements.txt to ios_packaging/requirements.txt (#21936)
### Description
Rename ios_packaging.requirements.txt to ios_packaging/requirements.txt



### Motivation and Context
By doing this, the packages listed in ios_packaging/requirements.txt can
be scanned by the CG task.
2024-09-04 13:18:05 -07:00
Jiajia Qin
a80bfed5b4
[js/webgpu] Optimize transpose (#21964)
### Description
Fix bugs in the previous implementation and route more cases to the
optimized path.

The following cases now take the optimized path:
1. 2D inputs or squeezed 2D inputs
2. Channels-last or channels-first transposes. For example, the
channels-last transpose [1, 256, 512, 512] -> [1, 512, 512, 256]
becomes the 2D transpose [256, 512x512] -> [512x512, 256].
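
The reduction to a 2D transpose can be sketched in Python. This is an illustration of the idea only, not the WebGPU implementation: `collapse_to_2d` is a hypothetical helper that ignores size-1 dims and checks whether the remaining axes are merely rotated, in which case the transpose is equivalent to swapping two flattened dims.

```python
from math import prod

def collapse_to_2d(shape, perm):
    """Return (rows, cols) if the transpose equals a 2D transpose, else None.

    Hypothetical helper: size-1 dims do not affect memory layout, so they
    are dropped. If the remaining axes are rotated by k positions, the
    first k axes form the rows and the rest form the columns.
    """
    kept = [a for a in range(len(shape)) if shape[a] != 1]
    order = [a for a in perm if shape[a] != 1]
    for k in range(1, len(kept)):
        if order == kept[k:] + kept[:k]:
            rows = prod(shape[a] for a in kept[:k])
            cols = prod(shape[a] for a in kept[k:])
            return rows, cols
    return None

# Channels-last example from the commit message.
print(collapse_to_2d([1, 256, 512, 512], [0, 2, 3, 1]))  # (256, 262144)
```

Here 262144 is 512x512, so the result matches the [256, 512x512] -> [512x512, 256] form given above.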

### Motivation and Context
For the SD Turbo demo, the total transpose time drops from 122.09 ms to
39.98 ms, and its share of the total time in this demo drops from 11.05%
to 3.89%.

This PR also helps #21618: the total transpose time in that demo drops
from 70.25 ms to 17.32 ms on my iGPUs.
2024-09-04 12:04:04 -07:00
Hector Li
190588bb64
Enable QNN weight sharing (#21077)
### Description
Enable QNN weight sharing across graphs in single context
Create tool to generate QNN context cache model with weight sharing enabled.
2024-09-04 11:20:33 -07:00
Yueqing Zhang
9031112c8e
[VitisAI] add registered custom op for perf test (#21336)
### Description
Register custom ops when running the performance test.


### Motivation and Context
This is needed for providers to test their implementation
2024-09-04 11:13:35 -07:00
zz002
bf8a8e7e36
[VitisAI] Bug fixes in model_clone (#21950)
### Description

VitisAI bug fixes in model clone

### Motivation and Context

Co-authored-by: Zhenze Wang <zhenzew@xilinx.com>
2024-09-04 10:29:17 -07:00
Edward Chen
cbf3c50d75
Improve stability of Android ReactNative E2E test (#21969)
- Remove redundant `OnnxruntimeModuleExampleE2ETest CheckOutputComponentExists` test
- Attempt to close any Application Not Responding (ANR) dialog prior to running Android test
- Add `--take-screenshots failing` option to detox test commands to save screenshots on failure
2024-09-04 08:41:07 -07:00
Chen Feiyue
d4290f6e7f
Update vsinpu ep cross-compiling patch (#21963)
- Block the bf16 && ummla gemm functions because we cannot support these features yet
2024-09-03 22:54:43 -07:00
Yueqing Zhang
dd2425932d
[VitisAI] Fix model path (#21911)
### Description
Change the .data path so that it is in the same directory as the model.


### Motivation and Context
This fixes an issue where, if a model has a .data file, the executable
can't read the data when the model is in another directory.
2024-09-03 22:42:01 -07:00
Yulong Wang
decb3852a0
refactor: extract shared util function ComputeBroadcastOutputShape (#21940)
### Description

This is used in multiple places.
2024-09-03 18:21:36 -07:00
Tianlei Wu
628c0a8f0e
Remove unused find_cudnn_supported_cuda_versions (#21620)
### Description

The function find_cudnn_supported_cuda_versions is not used anymore.
Remove it.

### Motivation and Context
2024-09-03 14:38:33 -07:00
sfatimar
8dba8e3e24
Memory Optimization for Compilation in OVEP (#21872)
- Calling Split API Calls Read+Model in lieu of the unified Compile Model
call for the export compile flow to ensure memory optimization. Freeing
up the model proto and serialized string, and reading the model OV IR
later, to free up memory for the pipeline ahead.
- Optimization during the EpCtxt flow: all the Graph-related operations
require all the Node Attributes to be set while dealing with model
instances internally. In the existing implementation, these attributes
are copied when a Graph is constructed dynamically during runtime. We
propose to use these attributes in place, without creating a copy, to
avoid memory allocation / copy while calling these Graph-related
functions.
- Changes to ensure the bug fixes related to the openvino version and
the epctxt file path.
- Moving the compiler version to C++20 to benefit from r-value memory
optimizations.

### Motivation and Context
This change is required because memory consumption during the
compilation flow is too high.

---------

Co-authored-by: saurabhkale17 <saurabh1.kale@intel.com>
Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>
Co-authored-by: Vishnudas Thaniel S <vishnudas.thaniel.s@intel.com>
Co-authored-by: Javier E. Martinez <javier.e.martinez@intel.com>
Co-authored-by: jatinwadhwa921 <110383850+jatinwadhwa921@users.noreply.github.com>
Co-authored-by: ankitm3k <ankit.maheshkar@intel.com>
Co-authored-by: jatinwadhwa921 <jatin.wadhwa@intel.com>
2024-09-03 13:52:31 -07:00
Yi Zhang
4962252c8f
Enable xnnpack ep works in current windows xnn ci (#21951)
### Description
The EP wasn't added to the session options in onnxruntime_test_all.



### Motivation and Context
After this PR
onnxruntime_test_all --gtest_filter=\*xnnpack\*maxpool\* can step into
8c5336449d/onnxruntime/core/providers/xnnpack/nn/max_pool.cc (L209)

---------

Co-authored-by: Yi Zhang <your@email.com>
2024-09-03 10:02:00 -07:00
Chester Liu
5c74539ab7
Fix copying ORT dylib into wheel on macOS (#21931)
Fix #21223 on macOS

---------

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2024-09-03 11:08:25 +08:00
Yulong Wang
257792225f
revert forceinline for MakeString (#21943)
### Description

revert forceinline for MakeString.

This change reverts https://github.com/microsoft/onnxruntime/pull/21893.
The forceinline was introduced for performance reasons; however, it
turns out to cause a notable binary size increase, which is a concern
for binary-size-sensitive platforms like Android.

I ran a few tests locally and found that the increase is not related to
whether the template struct `if_char_array_make_ptr_t` trick is used, so
I have to revert the change.
2024-09-02 19:01:08 -07:00
Scott McKay
e788b3d30e
Fix C# warnings. (#21913)
### Description
Update some testing dependencies.
Fix various warnings. Mainly around documentation (existing) and unit
test usage (mainly resulting from xunit update).

Invalid angle brackets for generics in documentation were changed to use
curly braces based on
https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/xmldoc/
> To refer to generic identifiers in code reference (cref) elements, you
can use either the escape characters (for example, cref="List&lt;T&gt;")
or braces (cref="List{T}"). As a special case, the compiler parses the
braces as angle brackets to make the documentation comment less
cumbersome to the author when referring to generic identifiers.

### Motivation and Context
2024-09-03 10:08:29 +10:00
Yulong Wang
bad00a3657
Add dependency dawn into deps.txt (#21910)
### Description

Add dependency dawn into deps.txt. This is a preparation for introducing
WebGPU EP.
2024-09-02 04:24:28 -07:00
Kyle
b1ae43cbcb
Add Files Signature Validation after Signed by ESRP (#21949)
### Description
Validate file signatures after the files are signed by ESRP.


### Motivation and Context
- Add validation after the ESRP process.
- Make sure the files matching the target pattern/suffix are signed
successfully by ESRP.
- If a signature is not valid, fail the following stages.
2024-09-02 17:16:59 +08:00
Yulong Wang
8c5336449d
Stop VSCode appending file associations to settings.json (#21944)
### Description

If you open the onnxruntime source code in VSCode with the C/C++
extension, it keeps adding file associations for C/C++ headers to
settings.json. This is annoying when staging/committing changes.

Add a configuration to disable this behavior.

see:
-
https://stackoverflow.com/questions/65220185/how-to-stop-vs-code-to-keep-adding-standard-c-libraries-to-the-file-associatio
-
https://github.com/microsoft/vscode-cpptools/issues/722#issuecomment-480329005
2024-08-31 19:04:12 -07:00
mingyueliuh
047f32c79d
[VitisAI] Remove shape infer from bridge ort (#21331)
### Description
Vitis AI EP's custom ops are now completely self-contained within the
Vitis AI EP implementation (rather than needing static functions added
to provider_bridge).

---------

Co-authored-by: liumingyue <mingyue@xilinx.com>
2024-08-31 08:57:23 -07:00