11730 commits

Author SHA1 Message Date
Jian Chen
ebcf2fcd16
Replace gradle/wrapper-validation-action with gradle/actions/wrapper-validation-action (#22224)
### Description
Replace gradle/wrapper-validation-action with
gradle/actions/wrapper-validation-action


### Motivation and Context
This is recommended by
https://github.com/gradle/wrapper-validation-action. This job uses
deprecated functionality from the 'gradle/wrapper-validation-action'
action.
2024-09-30 14:29:16 -07:00
Ranjit Ranjan
812075731c
[AIX] Build fix for using system installed protobuf/onnx (#22272)
### Description
Fixes build issues on AIX when using the system-installed
protobuf/onnx.

### Motivation and Context
The code changes in this PR contain:

1. A fix for the compilation issue below.
```
collect2: fatal error: library liblibprotobuf-lite not found
compilation terminated.
```
2. Adding the onnx library to the dependency list for test applications.
2024-09-30 12:36:21 -07:00
Yi Zhang
d069475a63
Make A100 jobs in PR checks again (#22261)
### Description
If the UseA100 variable is 1, the A100 jobs run in PR checks.
Fixes
[AB#50333](https://aiinfra.visualstudio.com/6a833879-cd9b-44a4-a9de-adc2d818f13c/_workitems/edit/50333)


### Motivation and Context
We want more big models that need A100 to be tested in PR
checks, but Azure sometimes decommissions A100 agents without notice,
which blocks merging PRs.
This PR improves on the current workaround, which makes those jobs
run only on the main branch.
Once we find that the A100 agents have all been decommissioned by Azure, we can
set the UseA100 variable to 0 to disable the A100 jobs in PR checks.
2024-09-30 08:29:30 -07:00
wejoncy
2cfe1f031d
[CoreML MLProgram] Support Float16 (1/N) (#22068)
### Description
Support Float16 for CoreML MLProgram EP.
Operations:
"Add", "Mul", "Sub", "Div", "Pow", "Sqrt", "Reciprocal", "Sigmoid",
"Tanh", "Relu", "LeakyRelu", "Concat", "GridSample", "GlobalAveragePool",
"Clip", "DepthToSpace", "Resize", "Slice", "Conv", "ConvTranspose",
"GlobalMaxPool", "Gemm", "MatMul", "AveragePool", "MaxPool", "Reshape",
"Split", "Transpose"


---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-09-30 17:56:47 +08:00
Yang Gu
434f0fa536
[js/webgpu] Fix the crash issue in unsqueeze (#22264)
Because axes in unsqueeze may be a scalar, its shape can't
always be accessed like a vector. This PR fixes issue #22031 so that the
original model runs correctly.
2024-09-30 02:28:16 -07:00
Yulong Wang
1bda91fc57
[js/webgpu] fix external buffer registration (#22254)
### Description

Fixes a failure that occurs when GPU inputs are shuffled
between iterations.
2024-09-28 10:36:40 -07:00
Enrico Galli
52a8c1cae8
[WebNN EP] Enable IO Bindings with MLTensor (#21301)
### Description
Enables using the MLTensor to pass data between models. 


### Motivation and Context
Using MLTensor instead of ArrayBuffers reduces the number of copies
between the CPU and devices as well as the renderer and GPU process in
Chromium.
2024-09-27 17:24:21 -07:00
Patrice Vignola
ebda23be16
[DML EP] Fix Clip clamping (#22251)
2024-09-27 16:24:37 -07:00
shiyi
1e3cd86d80
[WebNN EP] Support LSTM op (#20293)
2024-09-27 14:23:08 -07:00
liqun Fu
f410e7c4cf
Fix mlas bench crash (#22248)
Fix mlas bench crash

---------

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2024-09-27 13:50:42 -07:00
Sumit Agarwal
529835cc46
[DML EP] Update DML to 1.15.2 (#22247)
### Description
Update DML binary to the current latest redist version
[1.15.2](https://www.nuget.org/packages/Microsoft.AI.DirectML/1.15.2).
2024-09-27 13:20:29 -07:00
Patrice Vignola
20be51525b
Support if node with sequence outputs (#22234)
`If` nodes can have sequence outputs. Those nodes are mapped to the DML
EP to be able to keep the outputs on the GPU, but they actually execute
on the CPU by selecting either the `then` subgraph or the `else`
subgraph.
2024-09-27 12:40:01 -07:00
Patrice Vignola
14ba2fb83c
[DML EP] Add intermediate tensor dumping for DML (#22246)
2024-09-27 12:39:45 -07:00
Hector Li
6e3163faa5
Update code regarding some QNN bug fixes (#22222)
### Description
Update code regarding some QNN bug fixes:
1. QnnProfile_ExtendedEventData_t.version is not initialized in Qnn
2. Failed to finalize the graph for HardSigmoid with FP16 precision
2024-09-27 09:51:47 -07:00
Kyle
b81e76b9a6
Jar Maven Signing - GnuPG and sha256 (#22217)
### Description
Jar maven signing: 
- GnuPG 
- sha256.

Jar packages artifacts: 
- onnxruntime-android-full-aar
- onnxruntime-java
- onnxruntime-java-gpu


### Motivation and Context
Previously, the packages were signed manually.
Goal: automate the signing.
2024-09-27 17:50:06 +08:00
Tianlei Wu
ff8a48ef3b
Update SAM2 benchmark script and doc (#22238)
(1) Fix a parameter-order bug.
(2) Update benchmark script:
* download the test image if it does not exist
* combine multiple csv files into one file, and remove duplicated lines
(3) Add a section for benchmark in README.md
2024-09-26 20:57:03 -07:00
Scott McKay
3846f84218
Increase React Native E2E (#22230)
### Description
Increase the detox setup timeout to 4 minutes. 

The iOS RN E2E tests take around 2 minutes to set up, which causes
flakiness.

### Motivation and Context
Improve RN CI pass rate
2024-09-27 08:59:36 +10:00
Tianlei Wu
2deab75d39
Add numeric_limits for float8 types (#22228)
Add std::numeric_limits for float8 data types to provide a consistent
way to access limits of those types.

Reference:
* https://onnx.ai/onnx/technical/float8.html
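
As a rough illustration of what such a specialization can look like, the sketch below specializes `std::numeric_limits` for a hypothetical `Fp8E4M3FN` struct. The struct, the `ToFloat` helper, and the selection of members are assumptions made for this example and are not the types added by the PR; the bit patterns follow the E4M3FN layout described in the reference above.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical 8-bit float in the E4M3FN layout: 1 sign bit, 4 exponent bits,
// 3 mantissa bits, exponent bias 7, no infinities, NaN encoded as S.1111.111.
struct Fp8E4M3FN {
  std::uint8_t bits;
};

// Decodes the raw bits to float. Used here only to sanity-check the limits.
inline float ToFloat(Fp8E4M3FN v) {
  const int sign = (v.bits & 0x80) ? -1 : 1;
  const int exponent = (v.bits >> 3) & 0x0F;
  const int mantissa = v.bits & 0x07;
  if (exponent == 0x0F && mantissa == 0x07) {
    return std::numeric_limits<float>::quiet_NaN();
  }
  if (exponent == 0) {
    return sign * std::ldexp(mantissa / 8.0f, -6);  // subnormal values
  }
  return sign * std::ldexp(1.0f + mantissa / 8.0f, exponent - 7);
}

namespace std {
template <>
class numeric_limits<Fp8E4M3FN> {
 public:
  static constexpr bool is_specialized = true;
  static constexpr bool is_signed = true;
  static constexpr bool has_infinity = false;  // E4M3FN has no infinities
  static constexpr bool has_quiet_NaN = true;
  static constexpr int digits = 4;  // 3 stored mantissa bits plus the implicit bit

  static constexpr Fp8E4M3FN max() noexcept { return Fp8E4M3FN{0x7E}; }     // 448
  static constexpr Fp8E4M3FN lowest() noexcept { return Fp8E4M3FN{0xFE}; }  // -448
  static constexpr Fp8E4M3FN min() noexcept { return Fp8E4M3FN{0x08}; }     // smallest normal, 2^-6
  static constexpr Fp8E4M3FN quiet_NaN() noexcept { return Fp8E4M3FN{0x7F}; }
};
}  // namespace std
```

With a specialization like this in place, generic code such as a Clip kernel can query `std::numeric_limits<T>::lowest()` and `max()` uniformly across float, double, and the 8-bit type.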
2024-09-26 14:42:36 -07:00
Jing Fang
1942e40e05
[ARM64] MatMulNBits: use neon intrinsics to convert between fp16 and fp32 (#22195)
### Description
For fp16 Atype, the fallback operation converts the data to fp32 and
then computes.
This PR adds a NEON intrinsics version to speed up the conversion.

Store address alignment and loop unrolling have insignificant impact on
latency so they are omitted.

| Benchmark | Time | CPU |
|-----------|------|-----|
| M_ConvertF16ToF32/baseline/real_time | 1076961 ns | 1083398 ns |
| M_ConvertF16ToF32/aligned:0/real_time | 46785 ns | 46516 ns |
| M_ConvertF16ToF32/aligned:1/real_time | 46631 ns | 46391 ns |
| M_ConvertF16ToF32_unroll2/aligned:0/real_time | 44074 ns | 44392 ns |
| M_ConvertF16ToF32_unroll2/aligned:1/real_time | 44726 ns | 45226 ns |
| M_ConvertF32ToF16/baseline/real_time | 520109 ns | 527329 ns |
| M_ConvertF32ToF16/aligned:0/real_time | 73610 ns | 74015 ns |
| M_ConvertF32ToF16/aligned:1/real_time | 71557 ns | 71525 ns |
| M_ConvertF32ToF16_unroll2/aligned:0/real_time | 64227 ns | 63374 ns |
| M_ConvertF32ToF16_unroll2/aligned:1/real_time | 67428 ns | 67989 ns |
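
For readers unfamiliar with the conversion being accelerated, here is a portable scalar sketch of fp16 to fp32 conversion on raw IEEE 754 binary16 bits. It is not the NEON code from this PR and not necessarily the baseline that was benchmarked; the function names `HalfBitsToFloat` and `ConvertF16ToF32` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Converts one IEEE 754 binary16 value (given as raw bits) to float.
inline float HalfBitsToFloat(std::uint16_t h) {
  const std::uint32_t sign = static_cast<std::uint32_t>(h & 0x8000u) << 16;
  const std::uint32_t exponent = (h >> 10) & 0x1Fu;
  std::uint32_t mantissa = h & 0x3FFu;
  std::uint32_t bits = 0;

  if (exponent == 0x1Fu) {
    // Infinity or NaN: all-ones exponent, mantissa payload is preserved.
    bits = sign | 0x7F800000u | (mantissa << 13);
  } else if (exponent != 0) {
    // Normal number: rebias the exponent from 15 to 127 (difference 112).
    bits = sign | ((exponent + 112u) << 23) | (mantissa << 13);
  } else if (mantissa == 0) {
    // Signed zero.
    bits = sign;
  } else {
    // Subnormal: shift left until the implicit bit appears, adjusting the exponent.
    std::uint32_t e = 113;  // 127 - 15 + 1
    while ((mantissa & 0x400u) == 0) {
      mantissa <<= 1;
      --e;
    }
    bits = sign | (e << 23) | ((mantissa & 0x3FFu) << 13);
  }

  float out;
  std::memcpy(&out, &bits, sizeof(out));  // bit cast without violating aliasing rules
  return out;
}

// Element-wise conversion of a buffer, the shape of loop a SIMD version would replace.
inline void ConvertF16ToF32(const std::uint16_t* src, float* dst, std::size_t count) {
  for (std::size_t i = 0; i < count; ++i) {
    dst[i] = HalfBitsToFloat(src[i]);
  }
}
```

A vectorized version processes several elements per instruction instead of branching per element, which is consistent with the large gap between the baseline and intrinsics rows in the table.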



### Motivation and Context
Speed up the fallback implementation of fp16 MatMulNBits.
2024-09-26 13:55:40 -07:00
jingyanwangms
d0b0ecfdb9
[Running CI] Update TensorRT to 10.4 (#22049)
### Description
TensorRT 10.4 is GA now, update to 10.4



2024-09-26 11:10:52 -07:00
Tianlei Wu
7880342e5e
Add numeric_limits for MLFloat16 and BFloat16 (#22197)
### Description
* Add std::numeric_limits for MLFloat16 and BFloat16.
* Update some comments in csharp ORTFloat16.shared.cs.
* Add unit tests (including Clip)

Note that the canonical NaN is not consistent between C++ and C#. C# uses
a negative quiet NaN as the canonical NaN, while C++ uses a positive quiet NaN.
The C# Float16.NaN was chosen to be consistent with
System.Half.NaN.

FP16 data returned from CUDA might use 0x7FFF as NaN, while FP16 data from the CPU
provider might use 0x7E00 as NaN. In any case, there is no consistent
canonical NaN in ORT right now. Because all of these NaNs conform to the
IEEE spec, this should not cause issues downstream.
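
To see why differing NaN bit patterns are harmless, recall that a binary16 value is a NaN whenever its exponent bits are all ones and its mantissa is non-zero. The check below is a minimal sketch on raw bits; `IsFp16NaN` is a hypothetical helper, not an ORT function.

```cpp
#include <cassert>
#include <cstdint>

// True when the raw binary16 bits encode a NaN:
// exponent field (0x7C00) all ones and mantissa field (0x03FF) non-zero.
constexpr bool IsFp16NaN(std::uint16_t bits) {
  return (bits & 0x7C00) == 0x7C00 && (bits & 0x03FF) != 0;
}
```

Both 0x7FFF and 0x7E00 pass this check, as does a negative quiet NaN such as 0xFE00, while the infinities 0x7C00 and 0xFC00 do not. Code that tests for NaN this way, rather than comparing against one canonical pattern, behaves the same for all of them.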

### Motivation and Context
std::numeric_limits is used in codebase but not defined for MLFloat16
and BFloat16. It causes some bugs like
https://github.com/microsoft/onnxruntime/issues/21957 introduced by
https://github.com/microsoft/onnxruntime/pull/21493.
2024-09-25 17:10:05 -07:00
liqun Fu
72b0979e8a
Fix a wrong assignment that causes mlas benchmark to crash (#22221)

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2024-09-25 15:53:28 -07:00
saurabh
4d6019fa02
OVEP: Tensor caching fix (#22218)
### Description
1. Change `emplace` to `[]`. The difference matters: `emplace` only
creates a new entry if the key doesn't already exist in the map, whereas
assigning through `[]` also overwrites an existing entry.
2. Change the caching lookup logic to key off input/output
names instead of ORT raw pointers.
3. Change OV tensor creation for CPU-allocated input/output ORT
tensors. The CPU-allocated input/output tensor path was re-allocating OV
tensors based on the ORT input/output tensors, so we'd get 2 copies: ORT
input/output tensor -> OV tensor (OVEP) -> NPU Tensor (NPU plugin).
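
The first point can be demonstrated with a small standalone sketch. The `TensorCache` alias and the two helper functions are hypothetical; the `int` merely stands in for whatever cached tensor handle the real code stores.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical cache keyed by input/output name. The int is a placeholder
// for a cached tensor handle.
using TensorCache = std::map<std::string, int>;

// emplace leaves an existing entry untouched. The returned bool reports
// whether a new entry was actually inserted.
inline bool CacheWithEmplace(TensorCache& cache, const std::string& name, int value) {
  return cache.emplace(name, value).second;
}

// operator[] creates the entry if it is missing and always overwrites the value.
inline void CacheWithSubscript(TensorCache& cache, const std::string& name, int value) {
  cache[name] = value;
}
```

Calling `CacheWithEmplace` twice with the same name keeps the first value, so a stale entry would survive. `CacheWithSubscript` replaces it, which is the behavior a cache needs when the underlying tensor changes between runs.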

---------

Co-authored-by: Eric Crawford <eric.r.crawford@intel.com>
2024-09-25 14:58:04 -07:00
Hector Li
50d9612bc0
change shared_ptr to unique_ptr to make the ownership clear (#22209)
### Description
change shared_ptr to unique_ptr to make the ownership clear.
2024-09-25 12:58:46 -07:00
Claude
3494f80e83
Check if HTMLCanvasElement exists (i.e. we are not running in a webworker) (#22153)
This fixes #22152


### Description
Tensor.fromImage fails in a webworker context, because HTMLCanvasElement
does not exist:

> HTMLCanvasElement is not defined



### Motivation and Context
This fixes #22152

---------

Co-authored-by: Yulong Wang <7679871+fs-eire@users.noreply.github.com>
2024-09-25 11:52:52 -07:00
Adrian Lizarraga
a47254eaef
Remove empty (DQ -> Q -> graph output) sequence in TransposeOptimizer (#22172)
### Description
Updates the TransposeOptimizer to also remove empty (DQ -> Q) sequences
that occur at a graph output. An empty DQ->Q sequence results from a
Transpose being optimized out.

Consider the following example model:

![image](https://github.com/user-attachments/assets/4e7bc4eb-ea8a-463b-9672-c4ec5ef779b2)

The TransposeOptimizer removes the final Transpose and leaves an empty
DQ->Q->output_0 sequence. This PR ensures that the final DQ->Q is also
removed.

### Motivation and Context
Models with quantized output can run on QNN EP. The inference latency of
a customer model is impacted by the unnecessary DQ->Q sequence at the
output.

---------

Co-authored-by: Scott McKay <skottmckay@gmail.com>
2024-09-24 21:02:17 -07:00
Caroline Zhu
ee6a91533c
Add BrowserStack mention to project ReadMe (#22207)
### Description
Mentioning BrowserStack in the README is a condition for [BrowserStack support for open-source
projects](https://www.browserstack.com/open-source).

### Motivation and Context
- Considering using BrowserStack for our end-to-end tests for iOS and
Android
2024-09-24 17:14:14 -07:00
Adrian Lizarraga
7811839265
[QNN EP] Always fuse (DQ->Q) to a QNN Convert operator (#22205)
### Description
Previously, we only fused (DQ -> Q) into a QNN Convert if the
quantization types differed (e.g., converting uint8 to uint16). This PR
always fuses DQ -> Q regardless of the quantization type because a
single QNN Convert op is faster than two separate ops.

Example fusions:
- [CURRENTLY SUPPORTED] Convert uint8 to uint16:
  - `uint8 -> DQ -> Q -> uint16` becomes `uint8 -> Convert -> uint16`
- [CURRENTLY SUPPORTED] Convert uint16 to uint8:
  - `uint16 -> DQ -> Q -> uint8` becomes `uint16 -> Convert -> uint8`
- [NEW] Convert uint8 (zp0, scale0) to uint8 (zp1, scale1):
  - `uint8(zp0/scale0) -> DQ -> Q -> uint8(zp1/scale1)` becomes `uint8(zp0/scale0) -> Convert -> uint8(zp1/scale1)`
- [NEW] Convert uint16 (zp0, scale0) to uint16 (zp1, scale1):
  - `uint16(zp0/scale0) -> DQ -> Q -> uint16(zp1/scale1)` becomes `uint16(zp0/scale0) -> Convert -> uint16(zp1/scale1)`
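
The arithmetic that a single fused op has to reproduce for the same-type case can be sketched as below. This only illustrates the DQ followed by Q math for uint8 with per-tensor parameters; `RequantizeU8` is a hypothetical name and this is not how QNN implements its Convert operator.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Maps a uint8 value quantized with (scale0, zp0) to uint8 quantized with (scale1, zp1).
inline std::uint8_t RequantizeU8(std::uint8_t q, float scale0, int zp0, float scale1, int zp1) {
  // DequantizeLinear: recover the real value.
  const float real = static_cast<float>(static_cast<int>(q) - zp0) * scale0;
  // QuantizeLinear: rescale, round to nearest, add the new zero point.
  const int requantized = static_cast<int>(std::nearbyint(real / scale1)) + zp1;
  // Saturate to the uint8 range.
  return static_cast<std::uint8_t>(std::max(0, std::min(255, requantized)));
}
```

For example, `RequantizeU8(10, 0.5f, 0, 0.25f, 0)` yields 20, because the real value 5.0 needs twice as many steps at half the scale.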


### Motivation and Context
The Transpose optimizer will normally remove empty DQ->Q sequences if
the quantization params are equal. However, for cases in which the
quantization params are not equal, QNN EP should convert DQ->Q to a
single QNN Convert op for performance. This affects a customer model.
2024-09-24 15:51:32 -07:00
amarin16
eb2506d77a
Add MLFloat16 support for LayerNormalization, SkipLayerNormalization (#22063)
Add `MLFloat16` support for:
- `LayerNormalization`
- `SimplifiedLayerNormalization`
- `SkipLayerNormalization`
- `SkipSimplifiedLayerNormalization`

There are existing `LayerNormTest` unit tests that cover the `MLFloat16`
functionality for `LayerNormalization` once `MLFloat16` is registered
(for example
[`LayerNormTest.LayerNorm_Scale_Float16Input`](91c916f9c6/onnxruntime/test/contrib_ops/layer_norm_op_test.cc (L112))).

Similarly, there are unit tests such as
[`SkipLayerNormTest.SkipLayerNormBatch1_Float16`](91c916f9c6/onnxruntime/test/contrib_ops/skiplayernorm_op_test.cc (L255))
that cover MLFloat16 inputs for `SkipLayerNormalization`.
2024-09-24 15:06:27 -07:00
chenduan-amd
61996332ad
[VitisAI] support run_options in vitisai EP end (#22029)
### Description
Add the OnRunStart() method to the Vitis AI execution provider.



### Motivation and Context
To dynamically obtain some runtime parameters during execution, use
run_options within the Vitis AI execution provider (EP).
2024-09-24 14:37:05 -07:00
George Wu
7727b4b909
[TensorRT EP] update gen_trt_engine_wrapper_onnx_model.py script (#22184)
Update the script, which was using the deprecated num_bindings, to use num_io_tensors.
Tested on an engine dumped by trtexec, with the engine loaded using the
onnxruntime-gpu 1.19.2 Python package.
2024-09-24 14:34:05 -07:00
Ye Wang
6cc06ad069
GQA MLFloat16 cpu (#22102)

---------

Co-authored-by: Your Name <you@example.com>
2024-09-24 09:51:59 -07:00
Hector Li
5fa4505d1b
Set enable_htp_fp16_precision default to true (#22186)
### Description
Set enable_htp_fp16_precision default to true for HTP backend.
2024-09-24 09:37:53 -07:00
Edward Chen
209ff86d52
Get build working on Xcode 16 (#22168)
2024-09-24 08:33:03 -07:00
Adam Reeve
ce13f651d8
Fix NaN propagation for float16 min and max operators (#22161)
This makes min and max with NaN for either operand always return NaN for
float16 data, matching the behaviour of float and double.

The behaviour for floats and doubles was previously fixed for the CPU
provider in #21492 and the CUDA provider in #19984, but these PRs didn't
fix the behaviour for float16 due to tests causing asan errors. The
memory access violations with float16 data have now been fixed in
#22135, so this PR is a follow up to make float16 min and max behave the
same as float and double for both the CPU and CUDA providers now that we
can add tests for this.
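
The intended semantics can be sketched with plain `float`, since the rule is the same for every floating-point type. The helper names are hypothetical and this is not the ORT kernel code.

```cpp
#include <cassert>
#include <cmath>

// Returns NaN if either operand is NaN, otherwise the smaller value.
// A plain comparison such as (b < a ? b : a) would return a when b is NaN,
// silently dropping the NaN.
inline float MinPropagateNaN(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return b < a ? b : a;
}

// Returns NaN if either operand is NaN, otherwise the larger value.
inline float MaxPropagateNaN(float a, float b) {
  if (std::isnan(a)) return a;
  if (std::isnan(b)) return b;
  return a < b ? b : a;
}
```

The explicit `isnan` checks are needed because every ordered comparison involving NaN evaluates to false, so the result of a comparison-only min or max depends on operand order.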

### Motivation and Context

Relevant previous issues (not float16 specific):
* #21455
* https://github.com/onnx/onnx/issues/6003
2024-09-24 08:25:20 -07:00
Adam Pocock
cfa45df6b5
[java] Migrate OnnxTensors created from arrays over to a backing Java buffer (#18556)
### Description
Following from #16578 and #16835 this migrates over
`OnnxTensor.createTensor(<array>)` to first instantiate a
`java.nio.Buffer` and then copy the array into that buffer in Java
before creating the tensor. It also changes the `OnnxTensor.getValue()`
method which returns a multidimensional array so it does the array
construction and value copy in Java. This allows the removal of some
unpleasant recursive C code which repeatedly calls into the JVM to
traverse Java's arrays. The equivalent Java code is still unpleasant and
recursive, but it's easier to reason about and memory safe. As a bonus,
more `OnnxTensor`s are now backed by buffers which allow users to pin
memory and reduce allocations by reusing them for same sized inputs.

Some of the JNI code which parses Java arrays still exists as it's used
by `OnnxMap`, removing that will be the target of a future refactor.
Strings are still processed in JNI as it is easier to work with String
tensors and UTF-8 arrays in C.

### Motivation and Context
Minimizing the amount of JNI code makes it easier to maintain and using
buffers in preference to arrays allows for fewer allocations.
2024-09-24 15:36:52 +10:00
Scott McKay
ae66d0e7cf
Update ROCm reduction to match recent CUDA change (#22192)
### Description
Add handling of a missing optional axes input to the ROCm reduction ops.
Matches CUDA EP change from #22149


### Motivation and Context
Fix pipeline.
2024-09-24 11:58:48 +10:00
Tianlei Wu
0806879ad4
Update lintrunner requirements (#22185)
### Description
* Add lintrunner to requirements-lintrunner.txt
* Lock lintrunner and lintrunner-adapter version
* Update documentation

### Motivation and Context
The documentation is out of date.
2024-09-23 18:27:16 -07:00
Dmitri Smirnov
a7c9f27d2d
Remove training pipelines from Win CPI CI as redundant (#22190)
2024-09-23 18:15:41 -07:00
Yulong Wang
df25006d1b
upgrade micromatch to v4.0.8 (#22174)
### Description

Upgrade `micromatch` to v4.0.8

https://github.com/advisories/GHSA-952p-6rrq-rcjv
2024-09-23 14:39:32 -07:00
Hann Wang
7a782b7213
[ROCm] fix rocm-6.2 build issues (#21993)
Composable Kernel build fails under ROCm 6.2.

This PR patches Composable Kernel the same way as
https://github.com/ROCm/composable_kernel/pull/1346

* fix buffer resource to match "s" constraint
* add missing memory clobber
2024-09-23 14:01:54 -07:00
Christian Bourjau
1a84f53c35
Make argmin/argmax support identical data types and add int64 support (#21641)
2024-09-23 13:02:29 -07:00
Jiajia Qin
80e9df826e
[js/webgpu] Optimize InstanceNormalization (#21995)
### Description
InstanceNormalization computes `y = scale * (x - mean) / sqrt(variance + epsilon) + B`,
where mean and variance are computed per
instance per channel. Calculating the mean and variance per channel is a
reduction, which is NCHW-layout friendly because
adjacent threads can access contiguous data in GPU memory.

This PR optimizes both NHWC and NCHW InstanceNormalization. To
efficiently calculate the mean and variance, we need to make sure the
input is NCHW instead of NHWC. We then use shared memory to do the reduce
operation to get `channel_scale` and `channel_shift`.

With this PR, `channel_scale` and `channel_shift` are computed the same way for
NHWC and NCHW InstanceNormalization, and the overall performance of the two is now
very close.
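
One way to read `channel_scale` and `channel_shift` is as the per-channel constants that turn the formula above into a single multiply-add, `y = x * channel_scale + channel_shift`. That interpretation is an assumption drawn from the names, and the CPU sketch below (with hypothetical struct and function names) only illustrates the algebra, not the WebGPU shader.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct ChannelScaleShift {
  float scale;
  float shift;
};

// x holds the H*W values of one (instance, channel) pair and must be non-empty.
// gamma and beta correspond to `scale` and `B` in the formula above.
inline ChannelScaleShift ComputeChannelScaleShift(const std::vector<float>& x, float gamma,
                                                  float beta, float epsilon) {
  // First reduction: mean.
  float sum = 0.0f;
  for (float v : x) sum += v;
  const float mean = sum / static_cast<float>(x.size());

  // Second reduction: population variance.
  float squared_sum = 0.0f;
  for (float v : x) squared_sum += (v - mean) * (v - mean);
  const float variance = squared_sum / static_cast<float>(x.size());

  // Fold normalization and the affine transform into two constants.
  const float scale = gamma / std::sqrt(variance + epsilon);
  return {scale, beta - mean * scale};
}

// Applies the folded constants to one element.
inline float Normalize(float v, const ChannelScaleShift& p) {
  return v * p.scale + p.shift;
}
```

Once the two constants are known, the element-wise pass no longer depends on the reduction and can run in either layout.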

Below data comes from SD Turbo profiling results.
Before (InstanceNormalization overall time: 140.84 ms)

| Kernel | Time (ms) |
| -- | -- |
| InstanceNormalization\|InstanceNormComputeMean | 129.70 |
| InstanceNormalization\|InstanceNormalizationNHWC | 10.55 |
| InstanceNormalization\|InstanceNormComputeChannelScaleShift | 0.59 |

After (InstanceNormalization overall time: 59.44 ms)

| Kernel | Time (ms) |
| -- | -- |
| InstanceNormalization\|InstanceNormComputeChannelScaleShift | 28.57 |
| InstanceNormalization\|TransposeShared | 20.19 |
| InstanceNormalization\|InstanceNormalizationNHWC | 10.68 |
2024-09-23 11:32:09 -07:00
Chester Liu
9b37b3ea44
Specify the paths of system tools when building Apple framework (#22056)
### Description

Specify the paths of `ar`, `ld` and `libtool` when building the Apple
framework.


### Motivation and Context

Sometimes non-system executables come before the system-provided
ones. This PR prevents that from happening.
2024-09-23 17:19:30 +08:00
Hector Li
b636b275aa
Fix an issue that QNN models shared from other session use the session logger from that session (#22170)
### Description
Fix an issue where QNN models shared from another session also use the session logger from that producer session, which causes confusion. Make the QNN model compute function use the session logger from the current session.
2024-09-21 20:41:56 -07:00
Tianlei Wu
171b901e32
Add benchmark script for segment anything v2 (#22169)
### Description
Add a benchmark script for Segment Anything v2.
It depends on https://github.com/microsoft/onnxruntime/pull/22119 for
onnx export, and https://github.com/microsoft/onnxruntime/pull/22167 for
sam2 graph fusion.

### Motivation and Context

Benchmark SAM2 model performance.
2024-09-20 21:32:37 -07:00
Tianlei Wu
1431215dcf
Add fusion script for segment anything v2 (#22167)
### Description
* Add MultiHeadAttention fusion for SAM2.
* Add LayerNormalization fusion for NCHW format by inserting Transpose
from NCHW to NHWC before layer normalization, and add another Transpose
after layer norm to convert NHWC back to NCHW. Hopefully, those extra
Transpose nodes will be removed when prefer_nhwc is enabled later.
* Add a condition that the input shall be 3D when fusing SkipLayerNorm.
* Update convert_to_onnx.py to add `--optimize` and `--use_gpu` options
to output optimized onnx model for CPU/CUDA eps.
* Add an option `--dtype fp16|fp32` in convert_to_onnx.py to support
converting optimized model to float16.
* Update the demo to use the optimized onnx models.

### Motivation and Context
To support optimization of SAM2 for CPU/CUDA eps that is exported in
https://github.com/microsoft/onnxruntime/pull/22119
2024-09-20 21:32:16 -07:00
Dmitri Smirnov
fe8a10caa4
Address ZeroK case for Gemm for CPU and CUDA (#22111)
### Description
When K == 0, output an MxN matrix filled with the bias if present, or filled
with zeros otherwise.
This brings Gemm in line with MatMul behavior, especially when Gemm is used
to fuse MatMul with Add.
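
The behavior falls out naturally from a reference loop: with K == 0 the inner accumulation never runs, so each output keeps its initial value. The sketch below assumes row-major storage and a bias of length N broadcast across rows; `GemmRowMajor` is a hypothetical name, and the real operator also supports alpha, beta, transposes, and other bias shapes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Computes Y = A * B + bias for row-major A (MxK) and B (KxN).
// bias is either empty or holds N values broadcast to every row.
inline std::vector<float> GemmRowMajor(const std::vector<float>& a, const std::vector<float>& b,
                                       const std::vector<float>& bias, std::size_t M,
                                       std::size_t K, std::size_t N) {
  std::vector<float> y(M * N, 0.0f);
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      // Start from the bias (or zero). With K == 0 this is the final value.
      float acc = bias.empty() ? 0.0f : bias[n];
      for (std::size_t k = 0; k < K; ++k) {
        acc += a[m * K + k] * b[k * N + n];
      }
      y[m * N + n] = acc;
    }
  }
  return y;
}
```

With M = 2, K = 0, N = 2 and bias {1, 2}, the result is {1, 2, 1, 2}. Without a bias it is all zeros, matching what a MatMul over an empty inner dimension produces.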


### Motivation and Context
* Comply with numpy spec of MatMul
* Address a case when empty initializers are used for computation.
2024-09-20 17:24:13 -07:00
Yi Zhang
8d2d40781c
set CMAKE_SYSTEM_PROCESSOR in xnnpack.cmake (#22155)


### Motivation and Context
By default, CMAKE_SYSTEM_PROCESSOR is the same as CMAKE_HOST_SYSTEM_PROCESSOR:
https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_PROCESSOR.html
KleidiAI uses CMAKE_SYSTEM_PROCESSOR to determine whether to include
some arm64 ukernels:
https://gitlab.arm.com/kleidi/kleidiai/-/blob/main/CMakeLists.txt#L134
In the iOS packaging pipeline, we use a Mac with an Intel CPU to cross-compile for ARM Macs,
so we need to set CMAKE_SYSTEM_PROCESSOR to the same value as ORT_TARGET_PROCESSOR.
2024-09-20 15:19:26 -07:00
Scott McKay
d4692835bf
Fix std::chrono/date conflict for mac builds with C++20 (#22138)
### Description
Fix usage of C++ std::chrono::operator<< in Mac builds for a wider range
of Xcode versions and targets.

### Motivation and Context

#21033
2024-09-20 11:18:24 -07:00