11685 commits

Author SHA1 Message Date
Tianlei Wu
171b901e32
Add benchmark script for segment anything v2 (#22169)
### Description
Add a benchmark script for Segment Anything v2 (SAM2).
It depends on https://github.com/microsoft/onnxruntime/pull/22119 for
ONNX export, and https://github.com/microsoft/onnxruntime/pull/22167 for
SAM2 graph fusion.

### Motivation and Context

Benchmark SAM2 model performance.
2024-09-20 21:32:37 -07:00
Tianlei Wu
1431215dcf
Add fusion script for segment anything v2 (#22167)
### Description
* Add MultiHeadAttention fusion for SAM2.
* Add LayerNormalization fusion for NCHW format by inserting Transpose
from NCHW to NHWC before layer normalization, and add another Transpose
after layer norm to convert NHWC back to NCHW. Hopefully, those extra
Transpose nodes will be removed when prefer_nhwc is enabled later.
* Add a condition that the input shall be 3D when fusing SkipLayerNorm.
* Update convert_to_onnx.py to add `--optimize` and `--use_gpu` options
to output an optimized ONNX model for CPU/CUDA EPs.
* Add an option `--dtype fp16|fp32` in convert_to_onnx.py to support
converting optimized model to float16.
* Update the demo to use the optimized onnx models.
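
The NCHW LayerNormalization fusion described above relies on the fact that normalizing over the channel axis of an NCHW tensor gives the same result as transposing to NHWC, normalizing over the last axis, and transposing back. A minimal NumPy sketch of that equivalence follows; the function names are illustrative and not part of the fusion script, and scale and bias are omitted for brevity.

```python
import numpy as np


def layer_norm_last_axis(x, eps=1e-5):
    """Normalize over the last axis (scale and bias omitted)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm_nchw_direct(x, eps=1e-5):
    """Reference: normalize over the channel axis of an NCHW tensor."""
    mean = x.mean(axis=1, keepdims=True)
    var = x.var(axis=1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def layer_norm_nchw_via_transposes(x, eps=1e-5):
    """Transpose NCHW -> NHWC, normalize the last axis, transpose back."""
    nhwc = np.transpose(x, (0, 2, 3, 1))
    normalized = layer_norm_last_axis(nhwc, eps)
    return np.transpose(normalized, (0, 3, 1, 2))


x = np.random.default_rng(0).standard_normal((2, 8, 4, 4)).astype(np.float32)
direct = layer_norm_nchw_direct(x)
via_transposes = layer_norm_nchw_via_transposes(x)
```

The two extra Transpose nodes cost memory traffic, which is why the description hopes they can be removed later when `prefer_nhwc` is enabled.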

### Motivation and Context
Support optimization of the SAM2 model for CPU/CUDA EPs, as exported in
https://github.com/microsoft/onnxruntime/pull/22119
2024-09-20 21:32:16 -07:00
Dmitri Smirnov
fe8a10caa4
Address ZeroK case for Gemm for CPU and CUDA (#22111)
### Description
When K == 0, output an MxN matrix filled with the bias if present, or
with zeros otherwise.
This brings Gemm in line with MatMul behavior, especially when Gemm is
used to fuse MatMul with Add.
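
As a rough illustration of the intended K == 0 semantics, the NumPy sketch below uses a reference Gemm (`alpha * A @ B + beta * C`). The helper `gemm_reference` is an assumption made for this example and is not the ORT kernel.

```python
import numpy as np


def gemm_reference(a, b, c=None, alpha=1.0, beta=1.0):
    """Reference Gemm: alpha * (A @ B) + beta * C, with C optional."""
    y = alpha * (a @ b)
    if c is not None:
        y = y + beta * c
    return y


M, K, N = 3, 0, 4
a = np.zeros((M, K), dtype=np.float32)   # empty along K
b = np.zeros((K, N), dtype=np.float32)   # empty along K
bias = np.arange(N, dtype=np.float32)

no_bias = gemm_reference(a, b)           # MxN zeros, as NumPy matmul does
with_bias = gemm_reference(a, b, bias)   # MxN filled with the broadcast bias
```

NumPy's `matmul` already returns an MxN matrix of zeros when the inner dimension is empty, which is the behavior the commit aligns Gemm with.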


### Motivation and Context
* Comply with numpy spec of MatMul
* Address a case when empty initializers are used for computation.
2024-09-20 17:24:13 -07:00
Yi Zhang
8d2d40781c
set CMAKE_SYSTEM_PROCESSOR in xnnpack.cmake (#22155)
### Description



### Motivation and Context
By default, CMAKE_SYSTEM_PROCESSOR is the same as CMAKE_HOST_SYSTEM_PROCESSOR.
https://cmake.org/cmake/help/latest/variable/CMAKE_SYSTEM_PROCESSOR.html
KleidiAI uses CMAKE_SYSTEM_PROCESSOR to determine whether to include
some arm64 ukernels.
https://gitlab.arm.com/kleidi/kleidiai/-/blob/main/CMakeLists.txt#L134
The iOS packaging pipeline uses a Mac with an Intel CPU to cross-compile
for ARM, so CMAKE_SYSTEM_PROCESSOR needs to be set to the same value as
ORT_TARGET_PROCESSOR.
2024-09-20 15:19:26 -07:00
Scott McKay
d4692835bf
Fix std::chrono/date conflict for mac builds with C++20 (#22138)
### Description
Fix usage of C++ `std::chrono::operator<<` in macOS builds for a wider
range of Xcode versions and targets.

### Motivation and Context

#21033
2024-09-20 11:18:24 -07:00
Scott McKay
da3bd45cdd
Fix CUDA reduction ops handling of optional axes input (#22149)
### Description
The optional `axes` input may exist with an empty name and be a nullptr.

Update the CUDA implementation to handle this.

### Motivation and Context

#22035
2024-09-20 13:44:47 +10:00
Adam Reeve
f3cbe76059
Fix memory access violations in the CPU float16 min and max operators (#22135)
### Description

Fixes the logic for getting the number of elements for the input and
output spans in the `MinMaxMLFloat16` method. This was incorrectly using
the full number of elements in the output rather than the number of
elements in the current span, which worked fine with 1D inputs but
breaks with 2D inputs.

This meant that as the `BroadcastLooper` iterated over spans,
`MinMaxMLFloat16` would start at a position further forward in the input
and output and read and write further beyond the end of the input and
output respectively, causing the asan error in #21558 and sometimes
segfaults in larger examples.
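
The sketch below illustrates the span-length point in NumPy, assuming a 2D input whose rows are the spans and a 1D second input that is broadcast across rows. The function name is hypothetical and the real fix is in C++. Python slices clamp at the array end, so the sketch only shows the correct indexing and cannot reproduce the out-of-bounds access.

```python
import numpy as np


def max_fp16_by_span(a, b_row):
    """Elementwise max of a 2D float16 array against a broadcast 1D row."""
    rows, span_len = a.shape
    flat_in = a.reshape(-1)
    flat_out = np.empty(flat_in.size, dtype=np.float16)
    for row in range(rows):
        start = row * span_len
        # Use the length of the current span. Using flat_out.size here is
        # the kind of mistake described above: from the second span onward,
        # start + count would point past the end of both buffers.
        count = span_len
        flat_out[start:start + count] = np.maximum(
            flat_in[start:start + count], b_row[:count]
        )
    return flat_out.reshape(rows, span_len)


a = np.array([[1.0, 5.0, 3.0], [4.0, 2.0, 6.0]], dtype=np.float16)
b_row = np.array([2.0, 2.0, 5.0], dtype=np.float16)
result = max_fp16_by_span(a, b_row)
```

With a 1D input there is a single span whose length equals the output size, which is consistent with the bug only showing up for 2D inputs.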

### Motivation and Context

Fixes #21558.

From further testing, this issue not only caused ASan errors in tests
but also causes segfaults with larger inputs.
2024-09-19 18:04:10 -07:00
Jing Fang
b0ef1f3923
[CPU EP] Refactor MatMulNBits to decouple type implementation (#22140)
### Description
Decouple implementation for different A types to improve readability and
maintainability.

### Motivation and Context
As more types are added, the implementation can differ a lot between
types. Besides, different hardware may require different
implementations.
This PR creates an abstraction boundary where different implementations
can plug in easily.
2024-09-19 17:57:35 -07:00
George Wu
c270fe6dd3
[qnn ep] fix naming convention of ort-nightly-qnn package (#22157)
The package name followed the ROCm example below it, which isn't the
naming convention we want to follow. ROCm is left unchanged because it
is unclear whether there are consumers relying on its naming convention.
2024-09-19 17:33:31 -07:00
Hector Li
03ce996b7c
Fix QNN random crash for UT with multi-thread run (#22160)
### Description
Fix a random crash in QNN UTs with multi-threaded runs, such as
QnnHTPBackendTests.MultithreadHtpPowerCfgDefaultAndRunOption.

Root cause: a last-minute code change in

b4e26bd5f9
changed `static std::mutex mutex;` to `OrtMutex mutex;` and
dropped the `static`.
2024-09-19 16:39:13 -07:00
raoanag
73b5c3354c
Set Transpose Attribute instead for manipulating MatMul Strides (#21927)
### Description
Update the DML EP so that the `FusedMatMul` ORT graph node has its
TransA/B attributes set instead of having its strides updated.
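
To make the difference concrete, the NumPy sketch below computes the same product both ways: by reinterpreting the strides of `A`, and by passing a transpose flag to the matmul. `fused_matmul` and its `trans_a`/`trans_b` parameters are Python stand-ins chosen for this example, not the DML EP code.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided


def fused_matmul(a, b, trans_a=False, trans_b=False, alpha=1.0):
    """Matmul with transpose flags, in the spirit of FusedMatMul."""
    if trans_a:
        a = np.swapaxes(a, -1, -2)
    if trans_b:
        b = np.swapaxes(b, -1, -2)
    return alpha * (a @ b)


a = np.arange(6, dtype=np.float32).reshape(3, 2)    # stored 3x2, used as 2x3
b = np.arange(12, dtype=np.float32).reshape(3, 4)

# Stride approach: view the same memory as the transposed matrix.
a_strided = as_strided(a, shape=a.shape[::-1], strides=a.strides[::-1])
via_strides = a_strided @ b

# Attribute approach: keep the tensor layout and flag the transpose.
via_attribute = fused_matmul(a, b, trans_a=True)
```

Both paths produce the same values; the attribute form leaves the tensor description untouched and lets the operator decide how to apply the transpose.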



### Motivation and Context
2024-09-19 16:26:20 -07:00
Scott McKay
bd60add8ce
Update nuget.exe used in WindowsAI nuget packaging so readme property is supported. (#22141)
### Description
Use the latest nuget.exe for the `readme` property to be supported.

### Motivation and Context
#22137
2024-09-19 19:06:47 +10:00
Scott McKay
99ee6eeca2
Enable Android 16 KB page size support (#22076)
### Description
Add linker flags to support 16 KB page sizes on Android.

See
https://source.android.com/docs/core/architecture/16kb-page-size/16kb#build-lib-16kb-alignment

### Motivation and Context
#21837
2024-09-19 18:53:57 +10:00
Wanming Lin
e33b08ead1
[WebNN EP] Use both MLOperandDescriptor.dimensions and MLOperandDescriptor.shape (#22121)
The spec renames MLOperandDescriptor.dimensions to
MLOperandDescriptor.shape. To support older Chromium versions, we will
keep both in the WebNN EP for a while.

Fixed #22120
2024-09-19 01:20:40 -07:00
George Wu
944d87381d
[QNN EP] set up py packaging pipeline for Linux x64 (#22132)
Set up a pipeline to produce nightly Linux x64 wheels for onnxruntime-qnn.
These can be used for offline context binary generation.
2024-09-18 23:24:32 -07:00
mguynn-intc
d5f6343a4a
Implementation of AVX-VNNI-INT8 dot product instructions into MLAS GEMM (#21984)
### Description
The ONNX Runtime implementation of S8S8 was using the default C++
implementation; with this new ISA, all variants of QGemm Int8 can
use VNNI dot product and full AVX2 instructions.

All signed/unsigned variants support VNNI instructions starting with
LNL.
Renamed structs and functions to better indicate support for all Int8
variants rather than only U8X8.


### Motivation and Context
LNL hardware implements a new ISA, and this code enables that ISA in
QGemm. S8S8 speed is improved to match the existing U8S8 code. S8U8
would also match that speed if ONNX formally accepted the data type.
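
For readers unfamiliar with VNNI-style dot products, the NumPy sketch below models the basic operation for signed 8-bit inputs: each 32-bit accumulator lane receives the sum of four adjacent int8 products. It is a simplified, non-saturating model written for illustration; the function name is hypothetical and it does not reproduce any specific instruction or the MLAS kernels.

```python
import numpy as np


def dot4_accumulate(acc, a, b):
    """acc[i] += sum(a[4*i + k] * b[4*i + k] for k in range(4)), in int32."""
    a32 = a.astype(np.int32).reshape(-1, 4)   # widen before multiplying
    b32 = b.astype(np.int32).reshape(-1, 4)
    return acc + (a32 * b32).sum(axis=1, dtype=np.int32)


a = np.array([127, -128, 1, 2, 3, 4, 5, 6], dtype=np.int8)
b = np.array([127, -128, 1, 1, -1, -1, -1, -1], dtype=np.int8)
acc = np.zeros(2, dtype=np.int32)
result = dot4_accumulate(acc, a, b)
```

Widening to int32 before multiplying matters: the product of two int8 values does not fit in 8 bits, which is why these dot products accumulate into 32-bit lanes.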
2024-09-18 22:18:23 -07:00
Yi Zhang
560778fd07
use mac 12 for esrp code sign (#22134)
### Description
Fix regression caused by #17361 



### Motivation and Context
2024-09-19 12:06:41 +08:00
Tianlei Wu
a9740d6f96
Add onnx export script for segment anything v2 (#22119)
### Description
Add ONNX export script for segment anything v2 (SAM2).

### Limitations
* Does not support video. Only images are supported right now.
* The decoder does not support batch inference.

### Credits
The demo is based on the [SAM2
notebook](https://github.com/facebookresearch/segment-anything-2/blob/main/notebooks/image_predictor_example.ipynb),
modified to run with ORT.

The export of the decoder is inspired by
https://github.com/vietanhdev/samexporter.

### Demo
Example output of demo:

![sam2_demo](https://github.com/user-attachments/assets/9a9fa360-8c20-482e-9935-a7aba9cf15de)

### Motivation and Context
Support optimization of SAM2 image segmentation.
2024-09-18 14:31:59 -07:00
Patrice Vignola
05acfb90ab
[DML EP] Add QDQ+MatMul fusion into MatMulNBits (#22114)
### Description



### Motivation and Context
2024-09-17 22:37:45 -07:00
Adrian Lizarraga
b8dae685e4
[QNN EP] Build Python 3.12 wheel for Windows ARM64 (#22118)
### Description
Builds arm64 python 3.12 wheel for QNN EP.


### Motivation and Context
2024-09-17 21:16:31 -07:00
Fangjun Kuang
c6dc787a3d
Update q4common.h to include the missing header (#21786)
Fixes #21748

CC @gyagp
2024-09-17 20:55:56 -07:00
dependabot[bot]
7e98926810
Bump body-parser from 1.20.1 to 1.20.3 in /onnxruntime/test/wasm (#22106)
2024-09-17 22:59:40 +00:00
Atanas Dimitrov
275eb404bf
Speedup CumSum for large arrays (#22048)
### Description
This PR refactors the `CPU` kernel for the `CumSum` operator. The new
implementation strives to have as little indirection as possible.


### Motivation and Context
Currently the `CumSum` operator performs very poorly on 1D tensors (it
was slower than a Python loop). This is caused by the extensive use of
`SliceIterator`s.

Here is a relevant snippet:
```python
import time
import ndonnx as ndx
import onnxruntime as ort
import numpy as np
import onnx

def test_cumsum(sz):
    a = ndx.array(shape=(sz,), dtype=ndx.int64)
    b = ndx.cumsum(a)
    model = ndx.build({'a': a}, {'b': b})
    onnx.save(model, "model.onnx")

    input = np.ones(sz, np.int64)
    start = time.time()
    result = ort.InferenceSession(model.SerializeToString()).run(None, {'a': input})
    end = time.time()
    return end - start

def test_cumsum_by_hand(sz):
    input = np.ones(sz, np.int64)
    start = time.time()
    answer = [0]
    for i in input:
        answer.append(answer[-1] + i)
    end = time.time()
    return end - start

print(test_cumsum(int(1e7))) 
print(test_cumsum_by_hand(int(1e7))) 
```

Before
```console
0.9794480800628662
0.4518160820007324
```

After
```console
0.02483987808227539
0.5496008396148682
```

The `model.onnx`: 
<img width="214" alt="image"
src="https://github.com/user-attachments/assets/a213d6ff-86c3-49b5-a493-ebfd97deaa41">

The flame graph:

![profile-3](https://github.com/user-attachments/assets/c7418a05-cb65-4d72-a76d-6a6b05b4ba4d)
2024-09-17 15:53:07 -07:00
Yi Zhang
b94ba09e4f
Upgrade XNNPACK to latest version (#22012)
### Description
Update XNNPACK to the latest version (Sep 4).
- Some op outputs have changed; channel or stride parameters have moved
into the reshape function,
e.g.
96962a602d
- The input parameters of XNNPACK's resize-related functions have changed a lot.
- KleidiAI is added as a dependency on ARM64.
- The latest XNNPACK includes 2 static libs: microkernels-prod and
xnnpack.
Without microkernels-prod, the build fails with "Undefined symbols" errors.
- Add ORT_TARGET_PROCESSOR to get the real processor target in CMake.
2024-09-17 10:12:16 -07:00
Jian Chen
fa68ae2def
Update pool to MacOS-13 (#17361)
### Description
See https://github.com/microsoft/onnxruntime-extensions/pull/476
and https://github.com/actions/runner-images/issues/7671

### Motivation and Context

### Current issue
- [ ] For the default Xcode 15.2 that comes with macOS-13, we need to
update the boost container header boost/container_hash/hash.hpp version
to pass the build.
- [x] For Xcode 14.2, the build passed but the `Run React Native Detox
Android e2e Test` step failed.
Possible flaky test, https://github.com/microsoft/onnxruntime/pull/21969
- [x] For Xcode 14.3.1, we encountered the following issue in `Build React
Native Detox iOS e2e Tests`:
```
ld: file not found: /Applications/Xcode_14.3.1.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/arc/libarclite_iphonesimulator.a
clang: error: linker command failed with exit code 1 (use -v to see invocation)
```
Applying the following code at the end of both ios/Podfile files fixed
the issue:
```
post_install do |installer|
    installer.generated_projects.each do |project|
        project.targets.each do |target|
            target.build_configurations.each do |config|
                config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '13.0'
            end
        end
    end
end
```


- [x] https://github.com/facebook/react-native/issues/32483

Applied the following changes to ios/Podfile:
```
pre_install do |installer|
  # Custom pre-install script or commands
  puts "Running pre-install script..."

  # Recommended fix for https://github.com/facebook/react-native/issues/32483
  # from https://github.com/facebook/react-native/issues/32483#issuecomment-966784501
  system("sed -i '' 's/typedef uint8_t clockid_t;//' \"${SRCROOT}/Pods/RCT-Folly/folly/portability/Time.h\"")
end
```

- [ ] Detox environment setup exceeded the timeout of 120000 ms during
the iOS e2e test


### dependent 

- [x] https://github.com/microsoft/onnxruntime/pull/21159

---------

Co-authored-by: Changming Sun <chasun@microsoft.com>
2024-09-17 10:07:30 -07:00
Chi Lo
6dcdc70aa7
[TensorRT EP] Add supportsModelV2 (#22081)
`supportsModel` is deprecated in TRT 10.1.
Add `supportsModelV2` but keep `supportsModel`, as we still need to
support TRT 8.6, where `supportsModelV2` is not supported.
2024-09-17 09:52:28 -07:00
Wanming Lin
9786909ab5
[WebNN EP] Support QuantizeLinear and DequantizeLinear ops (#22097)
2024-09-17 08:18:47 -07:00
Xu Xing
afd642a194
[js/webgpu] Replace array with string in transpose perm (#21930)
Perf test data (100000 iterations):
Array: 12.599999997764826ms
String: 1.6000000014901161ms

Perf test case:

```
const permFunctionBodyArray = (rank: number, input: string): string => {
  const reverseFunc = [];
  reverseFunc.push(`fn perm(i: int) -> int {
    var a: int;`);
  for (let i = 0; i < rank; ++i) {
    reverseFunc.push(input);
  }
  reverseFunc.push('return a;}');
  return reverseFunc.join('\n');
};

const permFunctionBodyString = (rank: number, input: string): string => {
  let reverseFunc = `fn perm(i: int) -> int {
    var a: int;`;
  for (let i = 0; i < rank; ++i) {
    reverseFunc+=input;
  }
  reverseFunc+='return a;}';
  return reverseFunc;//.join('\n');
};
const count = 100000;
let start, end
console.time('array');
start = performance.now();
for(let i =0 ; i < count; i ++) {
    permFunctionBodyArray(3, 'input');
}
end = performance.now();
console.timeEnd('array');
console.log("Array: "+ (end-start));

console.time('string');
start = performance.now();
for(let i =0 ; i < count; i ++) {
    permFunctionBodyString(3, 'input');
}
end = performance.now();
console.log("String: " +(end-start));
console.timeEnd('string');
```

### Description



### Motivation and Context
2024-09-16 23:17:46 -07:00
Yang Gu
2db6b734f5
[js/webgpu] Fix issue to run model demucs (#22074)
This fixes issue #22031 so that the demucs model can run.
For conv-transpose, outputPadding.length could be 1, while spatialRank
is 2. The fix is to append enough 0s to outputPadding. For conv, the
issue is similar. kernelShape.length sometimes could be 1, while
inputs[1].dims.length is 4. The fix is also to append enough 0s to
kernelShape.
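
The padding step described above can be sketched as follows. The helper name is hypothetical, the actual change is in the js/webgpu TypeScript code, and the target length of 2 for `kernelShape` is an assumption (a 4D weight with two spatial dimensions).

```python
def pad_with_trailing_zeros(values, target_length):
    """Append zeros until the list has target_length entries."""
    values = list(values)
    missing = target_length - len(values)
    return values + [0] * max(missing, 0)


# outputPadding has one entry while the spatial rank is 2.
output_padding = pad_with_trailing_zeros([1], 2)

# kernelShape has one entry while the weight tensor is 4D.
kernel_shape = pad_with_trailing_zeros([3], 2)

# Already long enough: returned unchanged.
unchanged = pad_with_trailing_zeros([1, 1], 2)
```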
2024-09-16 23:17:10 -07:00
Yulong Wang
291a5352b2
[js/web] remove training release (#22103)
### Description

Remove training from onnxruntime-web

Follow-up to #22082.
2024-09-16 10:56:22 -07:00
Erick Muñoz
e93f14e00d
Check partial conversion on FP16 to FP32 AVX Cast kernel (#22091)
### Description
Added checks to convert partial vectors in the early stages of the FP16
to FP32 cast using AVX NE CONVERT ISA.



### Motivation and Context
Avoid storing data outside of the output buffer. These checks were
missing from the [original
PR](https://github.com/microsoft/onnxruntime/pull/21183).
This fix prevents memory corruption when the output buffer has a size
in the range [n*16 + 1, n*16 + 7] with n > 0.
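
The NumPy sketch below shows the general shape of a blocked conversion with a guarded tail. The block widths of 16 and 8 are assumptions inferred from the size range quoted above, and the function is illustrative rather than a port of the AVX kernel.

```python
import numpy as np


def cast_fp16_to_fp32(src, dst):
    """Convert float16 to float32 in blocks, storing only valid lanes."""
    n = src.size
    i = 0
    while n - i >= 16:                       # full 16-element blocks
        dst[i:i + 16] = src[i:i + 16].astype(np.float32)
        i += 16
    if n - i >= 8:                           # one full 8-element block
        dst[i:i + 8] = src[i:i + 8].astype(np.float32)
        i += 8
    remaining = n - i
    if remaining > 0:                        # partial vector: 1..7 elements
        lanes = np.zeros(8, dtype=np.float16)
        lanes[:remaining] = src[i:]
        converted = lanes.astype(np.float32)
        dst[i:n] = converted[:remaining]     # never write past dst[n - 1]
    return dst


GUARD = np.float32(-1.0)
src = np.arange(19, dtype=np.float16)        # 16 + 3 elements
buffer = np.full(src.size + 8, GUARD, dtype=np.float32)
cast_fp16_to_fp32(src, buffer[:src.size])    # guard region must stay intact
```

The guard values after the first 19 elements stand in for memory that belongs to something else; a full-width store in the tail would overwrite them.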
2024-09-16 09:20:06 -07:00
George Wu
1a1669fe81
use node name in transpose optimizer when adding nodes rather than optype (#22084)
patch from @john-dance

"The main change is simple: Use the original node name rather than the
original node op_type when creating new nodes. Here are my comments on
the change:
------
The onnx runtime uses the op_type as the basis for a new node name, so a
node claimed by QNN EP might be named
Conv_token_1 with no relation to the original /conv1/Conv. This patch:
1. Adds OpName as a virtual function in NodeRef and implements it in
ApiNode.
2. AddNode now takes an op_name and op_type and passes them both to
CreateNodeHelper.
3. CreateNodeHelper uses the op_name rather than the op_type in
GenerateNodeName
4. Direct calls to AddNode are modified to either use the NodeRef if
available, or just repeat the op_type if not available.
The result is that the new nodes are named something like
/conv1/Conv_token_1, allowing a straightforward mapping back to the
original model node (if it exists in the original graph)."
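
The effect of the change can be sketched with a small name generator. The helper below and its `_token_N` suffix scheme are assumptions modeled on the example names in the patch notes, not the actual `GenerateNodeName` implementation.

```python
def generate_node_name(base, existing_names):
    """Return the first unused name of the form '<base>_token_<n>'."""
    index = 1
    while f"{base}_token_{index}" in existing_names:
        index += 1
    name = f"{base}_token_{index}"
    existing_names.add(name)
    return name


existing = {"/conv1/Conv"}

# Before: the op_type is the base, so the link to the original node is lost.
from_op_type = generate_node_name("Conv", set(existing))

# After: the original node name is the base, so the origin stays visible.
from_op_name = generate_node_name("/conv1/Conv", set(existing))
```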
2024-09-16 09:12:13 -07:00
Adam Pocock
6d7235ba5a
[Java] Exposing SessionOptions.SetDeterministicCompute (#18998)
### Description
Exposes `SetDeterministicCompute` in Java, added to the C API by #18944.

### Motivation and Context
Parity between C and Java APIs.
2024-09-16 11:55:38 +10:00
Adam Pocock
02e00dc023
[java] Adding ability to load a model from a memory mapped byte buffer (#20062)
### Description
Adds support for constructing an `OrtSession` from a
`java.nio.ByteBuffer`. These buffers can be memory mapped from files
which means there doesn't need to be copies of the model protobuf held
in Java, reducing peak memory usage during session construction.

### Motivation and Context
Reduces memory usage on model construction by not requiring as many
copies on the Java side. Should help with #19599.
2024-09-16 08:31:55 +10:00
Wanming Lin
c63dd0234b
[WebNN EP] Use opSupportLimits to dynamically check data type support (#22025)
- Remove hard-coded data type checks and use WebNN's opSupportLimits
instead
- Add HasSupportedOutputsImpl for output data type validation
- Get preferred layout info from opSupportLimits
- Move the Not op to logical_op_builder.cc because it belongs there. This
avoids the inconsistent input names in `unary_op_builder.cc`.
2024-09-13 21:36:20 -07:00
liqun Fu
a89bddd5c2
Matmul_nbits kernel for mlas sqnbits to support Fp16 inputs (#21807)
2024-09-13 14:55:08 -07:00
aciddelgado
7e2c722459
Add Continuous Decoding support in GQA (#21523)
### Description
This PR adds support for continuous decoding for batch_size = 1
input. GQA can now take input of arbitrary length, using seqlens_k
as total_sequence_length - 1 and the sequence length of qkv as
new_sequence_length.
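
As a small worked example of the length bookkeeping, the sketch below assumes that total_sequence_length is the number of past tokens plus the number of new tokens. The helper and its dictionary keys are illustrative and not part of the GQA operator interface.

```python
def continuous_decoding_lengths(past_sequence_length, new_sequence_length):
    """Derive the lengths described above for a batch_size = 1 input."""
    total_sequence_length = past_sequence_length + new_sequence_length
    return {
        "total_sequence_length": total_sequence_length,
        "seqlens_k": total_sequence_length - 1,
        "qkv_sequence_length": new_sequence_length,
    }


# Ten tokens already in the past context, four new tokens in this step.
multi_token = continuous_decoding_lengths(10, 4)

# Ordinary token-by-token decoding is the special case of one new token.
single_token = continuous_decoding_lengths(10, 1)
```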

**This change will not affect the default behavior of GQA**



### Motivation and Context
Prior to this change it was impossible to support sequence_length > 1
inputs when past context was given. This use case is essential to making
continuous decoding work, which is one of our current efforts in
ORT-GenAI.
2024-09-13 13:21:11 -07:00
Changming Sun
59b7b6bb7c
Remove training from web ci pipeline (#22082)
### Description
Remove training from web ci pipeline


### Motivation and Context
2024-09-13 09:52:49 -07:00
Michael Tyler
904b850b44
Update Arm Compute Library Execution Provider (#22032)
### Description
This PR makes the following updates to the Arm Compute Library execution
provider:

- Target Arm Compute Library 24.07  
- Add support for the following operators: 
  - Conv (FP16) 
  - NhwcConv 
  - QLinearConv 
  - MatMul 
  - FusedMatMul 
  - MatMulIntegerToFloat 
- Optimize memory usage and performance
- Expose the enable_fast_math setting 
- Use the main runtime thread pool 



### Motivation and Context
These updates improve performance and memory usage, and enable use of a
more recent version of Arm Compute Library.

@microsoft-github-policy-service agree company="Arm Ltd"

---------

Signed-off-by: Michael Tyler <michael.tyler@arm.com>
2024-09-12 20:51:59 -07:00
Adam Pocock
22437b581b
[java] Fix for OnnxTensor creation when passing in a ByteBuffer containing elements of a different type (#21774)
### Description
Fixes a bug where the buffer offset and position were incorrectly
computed if the user supplied a `ByteBuffer` to `createTensor` but set
the type of the tensor to something other than `INT8`. This would be
more common if the user was trying to load the initializers from a
serialized representation and didn't want to bother with the type
information (which is the case in #21321).
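
The underlying pitfall is mixing byte units with element units when a raw byte buffer holds wider elements. The NumPy sketch below shows the arithmetic for a float32 payload; it is an analogy for the Java `ByteBuffer` case, not the Java fix itself.

```python
import numpy as np

raw = np.arange(6, dtype=np.float32).tobytes()    # 24 bytes holding 6 floats
byte_position = 8                                 # skip the first two floats
element_size = np.dtype(np.float32).itemsize      # 4 bytes per element

# Offsets into the raw buffer are in bytes; counts are in elements.
remaining_elements = (len(raw) - byte_position) // element_size
tensor = np.frombuffer(
    raw, dtype=np.float32, offset=byte_position, count=remaining_elements
)
```

For an INT8 tensor the element size is one byte, so the two units coincide, which fits the description that the bug only appeared for other element types.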

### Motivation and Context
Partial fix for #21321. The remainder of the fix is to add a helper
which allows users to load initializers out of an `onnx_data` file, but
that will require adding protobuf as a dependency for the Java API to
allow the parsing of an ONNX file separately from the native code. It
might be nicer to put that functionality into ORT's C API so it can
return the lengths & offsets of the initializers when provided with an
ONNX file containing external initializers. We hit this kind of thing in
Java more often than other languages as in Java models can be supplied
as classpath resources which we can easily read, but not materialize on
disk for the ORT native library to read.
2024-09-13 12:38:17 +10:00
Adrian Lizarraga
f7bf5a19ba
[QNN EP] Ensure QNN EP rejects nodes with I/O of dynamic shape (#22066)
### Description
Updates QNN EP to properly reject nodes that have inputs or outputs with
dynamic shapes.


### Motivation and Context
Currently, QNN EP does not properly offload subgraphs with dynamic
shapes to the CPU EP. This PR ensures that QNN EP rejects nodes that
consume or generate I/O with dynamic shapes.
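
A minimal sketch of such a check is shown below, assuming shapes are represented as lists where a static dimension is a non-negative `int` and a dynamic dimension is `None` or a symbolic name. Both helpers are hypothetical and do not mirror the QNN EP code.

```python
def has_dynamic_shape(shape):
    """True if the shape is unknown or has any non-static dimension."""
    if shape is None:                      # unknown rank
        return True
    return any(not isinstance(dim, int) or dim < 0 for dim in shape)


def node_is_supported(input_shapes, output_shapes):
    """Reject the node if any input or output has a dynamic shape."""
    all_shapes = [*input_shapes, *output_shapes]
    return not any(has_dynamic_shape(shape) for shape in all_shapes)


static_node = node_is_supported([[1, 3, 224, 224]], [[1, 1000]])
dynamic_batch = node_is_supported([["batch", 3, 224, 224]], [["batch", 1000]])
unknown_output = node_is_supported([[1, 3, 224, 224]], [None])
```

A rejected node is left for another execution provider, such as the CPU EP, to handle.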
2024-09-12 17:18:50 -07:00
mingyueliuh
55ab13e7ca
[VitisAI] support memory buffer contains the TensorProto external data (#22042)
### Description
Extend VitisAI EP `tensor_proto_as_raw` API to support memory buffer
containing the TensorProto external data


### Motivation and Context
To reduce peak memory usage, the VitisAI EP needs to support ORT format
models and the session option
`session.use_ort_model_bytes_for_initializers`, which enables using
the model bytes directly for initializers.

Co-authored-by: mingyue <mingyue@xilinx.com>
2024-09-12 16:23:09 -07:00
0xdr3dd
5c361106e6
[Fuzzer] Add two new ORT libfuzzer (Linux clang support for now) (#22055)
### Description
This PR adds two new libFuzzer fuzzers to the fuzzer project:
1. Binary libfuzzer
2. libprotobuf-fuzzer

To compile, run the following command on Linux:
```
LLVM_PROFILE_FILE="%p.profraw" CFLAGS="-g -fsanitize=address,fuzzer-no-link -shared-libasan -fprofile-instr-generate -fcoverage-mapping" CXXFLAGS="-g -shared-libasan -fsanitize=address,fuzzer-no-link -fprofile-instr-generate -fcoverage-mapping" CC=clang CXX=clang++ ./build.sh --update --build --config Debug --compile_no_warning_as_error --build_shared_lib --skip_submodule_sync --use_full_protobuf  --parallel --fuzz_testing --build_dir build/
```
Run fuzzer:
```
LD_PRELOAD=$(clang -print-file-name=libclang_rt.asan-x86_64.so) build/Debug/onnxruntime_libfuzzer_fuzz  testinput -rss_limit_mb=8196 -max_total_time=472800 -fork=2 -jobs=4 -workers=4 -ignore_crashes=1 -max_len=2097152 2>&1 | grep -v "\[libprotobuf ERROR"
```


### Motivation and Context
The existing custom fuzzer is not coverage guided, is slow, and works
on one model mutation at a time. The new fuzzers are coverage
guided, and we can use more model files as a corpus to increase the
coverage.
2024-09-12 11:50:34 -07:00
wangshuai09
d539c27de8
Fix version check for using -mavxvnni (#21616)
### Description
Change the `CMAKE_CXX_COMPILER_VERSION` check to greater than `11` for
using `-mavxvnni`.


### Motivation and Context


`CMakeFiles/onnxruntime_mlas.dir/root/Git.d/onnxruntime/onnxruntime/core/mlas/lib/x86_64/QgemmU8S8KernelAvx2.S.o
cc: error: unrecognized command-line option ‘-mavxvnni’; did you mean
‘-mavx512vnni’?` using `gcc (GCC) 10.3.1`.

`-mavxvnni` has been supported since the [GCC 11
Release](https://gcc.gnu.org/gcc-11/changes.html); this PR changes the
version check accordingly.
2024-09-12 11:42:17 -07:00
Clément Péron
10883d7997
Suppress GCC warning in TreeEnsembleAggregator (#22062)
### Description
When building with GCC 14.2.1, I got the following warning:

onnxruntime/core/providers/cpu/ml/tree_ensemble_aggregator.h:329:59:
error: template-id not allowed for constructor in C++20
[-Werror=template-id-cdtor]

Remove template parameters from the constructor: The constructor
TreeAggregatorMax<InputType, ThresholdType, OutputType> has been
simplified to TreeAggregatorMax, because the compiler already knows the
template parameters from the class definition.

### Motivation and Context
Fix the build issue

Signed-off-by: Clément Péron <peron.clem@gmail.com>
2024-09-12 19:46:27 +02:00
Yulong Wang
84f73327f5
allow scalar axes for Unsqueeze for WebGPU (#22054)
### Description

Align with CPU behavior.


https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/cpu/tensor/unsqueeze.cc#L60-L62
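
One way to accept a scalar `axes` value is to promote it to a one-element vector before applying the usual Unsqueeze logic. The NumPy sketch below illustrates that idea; it is not the WebGPU or CPU implementation.

```python
import numpy as np


def unsqueeze(x, axes):
    """Insert size-1 dimensions; axes may be a scalar or a 1D sequence."""
    axes = np.atleast_1d(np.asarray(axes, dtype=np.int64))  # scalar -> [scalar]
    return np.expand_dims(x, axis=tuple(int(a) for a in axes))


x = np.zeros((3, 4), dtype=np.float32)
from_scalar = unsqueeze(x, 0)          # scalar axes
from_vector = unsqueeze(x, [0])        # equivalent 1D axes
two_axes = unsqueeze(x, [0, -1])       # negative axes index the output rank
```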
2024-09-12 10:33:37 -07:00
mindest
951b1b7160
[CI] Linux ROCm CI Pipeline: fix error, set trigger rules. (#22069)
### Description
* Correct the wrong EP name for ROCm, fix CI error.
* Update `set-trigger-rules.py`.
* Modify the .yml via `set-trigger-rules.py`
2024-09-12 09:54:32 -07:00
Yi Zhang
ae39c40e5b
fix typo in iOS pipeline (#22067)
### Description



### Motivation and Context
The parameter isn't correct.
It may have had no negative impact so far only by chance.

d8e64bb529/cmake/CMakeLists.txt (L1712-L1717)
2024-09-12 19:07:42 +08:00
Prathik Rao
d495e6cf1c
adds support for Uint8ClampedArray (#21985)
Fixes https://github.com/microsoft/onnxruntime/issues/21753
2024-09-11 22:02:30 -07:00
Lennart Hannink
d8e64bb529
Refactor CoreMLExecution to C++ bridge class (#21857)
Refactor Objective-C++ class `CoreMLExecution` into existing C++ bridge class `onnxruntime::coreml::Execution`.
2024-09-11 16:05:37 -07:00