Commit graph

8943 commits

Author SHA1 Message Date
Edward Chen
b668a6da96
Treat Objective-C static analysis warnings as errors (#16293)
- Update Objective-C static analysis check to fail on warnings.
- Address warning.
- Clean up build definition.
2023-06-09 08:51:49 -07:00
Scott McKay
443f553782
Fix native onnxruntime library not loading in Azure App Service (#16286)
### Description
SetThreadDescription isn't available in an Azure App Service sandbox.
#15219 removed a check that it was available, making it a hard
dependency. When it's not available the dll load fails with a 'procedure
not found' error.

Add back the check.

### Motivation and Context

#15375 - although note this has nothing to do with the original issue.
This is just for
https://github.com/microsoft/onnxruntime/issues/15375#issuecomment-1579464889
2023-06-09 18:40:51 +10:00
Hector Li
a9d47f72a4
[QNN EP] Add model description into context binary file metadata for validation (#16248)
### Description
Add model description into context binary file metadata for validation

### Motivation and Context
Dump more information for validation

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-06-08 22:13:43 -07:00
Hector Li
d1e8d4a261
[QNN EP] Fix an issue for Conv with dynamic weights (#16235)
### Description
Fix an issue for Conv with dynamic weights
Root cause:
The Conv op builder creates the weight input tensor with the wrong name. With dynamic weights, a Transpose node is inserted, so the Conv op builder should use the new name, which is the Transpose output. Using the old name causes the weight producer to have the wrong output shape.
2023-06-08 17:09:35 -07:00
Jhen-Jie Hong
ac8444f299
[js/rn] Implement dispose native method (#16131)
### Description

Implement `dispose` react native method.

### Motivation and Context

Currently we cannot release the memory used by a model in the JS
runtime when we no longer need it. The only way to do so is to
reload the app in debug builds or restart the app in release builds.
2023-06-09 09:17:33 +10:00
Adrian Lizarraga
b48628f1cd
[QNN EP] Add tests for large inputs that trigger memory alloc errors (#16223)
### Description
Adds tests for operators that return error 1002
(QNN_COMMON_ERROR_MEM_ALLOC) when the call to graphFinalize() fails.
This seems to happen for large input sizes.

Operators:
- Sub
- Div
- Conv
- MaxPool



### Motivation and Context
This documents bugs that need to be addressed with unit tests.
2023-06-08 15:47:51 -07:00
Changming Sun
b72fe664c1
Refactor prepack buffer code (#16280)
### Description
1. Use IAllocatorUniquePtr to replace BufferUniquePtr. It will ensure
the deleter is always right.
2. Change some std::unique_ptr to std::optional
3. Bypass Arena allocator when allocating the prepack buffers for mlas.
In this special case, Arena doesn't help at all. This is purely an
internal implementation change; it doesn't affect our public interface.
2023-06-08 14:42:02 -07:00
Sheil Kumar
9d52632da9
[DML EP] Register Div with int64 and NonZero with bool (#16276)
[DML] Register Div with int64 and NonZero with bool

These data types are supported by DML
2023-06-08 13:49:39 -07:00
kunal-vaishnavi
79e0230002
Add vocab masks to Whisper export with beam search (#16180)
### Description
This PR adds flags for exporting Whisper with vocab masks for logits
processing. This PR also sets `input_features` back to FP32 precision
for the user and casts `input_features` to FP16 precision when needed.



### Motivation and Context
This helps enable specific logits processing for the exported Whisper
model.
2023-06-08 12:36:35 -07:00
Yuriy Chernyshov
a3a443c804
Support re2 == 2023-06-02 (#16257)
### Description

google/re2 [was
switched](49d776b9d2)
to absl::string_view in version 2023-06-02.

As `absl::string_view` is a drop-in replacement for `std::string_view`,
it does not have an `as_string()` method.
This PR ensures forward compatibility with the newest versions of the
re2 library.
2023-06-08 11:26:26 -07:00
Scott McKay
b07b647f66
Fix some issues with NNAPI Softmax (#16095)
### Description
Update NNAPI Softmax to coerce to 2D when opset is < 13. This prevents
the layout change to NHWC from breaking the implementation, as well as
making it work correctly when the ONNX node's axis != 1.

Add check for opset 13+ that axis is inner-most dimension as we don't
currently handle any other value correctly.
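
The "coerce to 2D" behaviour for opsets below 13 can be sketched in plain Python. This is only an illustration of the ONNX Softmax semantics, not the NNAPI EP code: the function name `softmax_coerce_2d` and the flat row-major list layout are assumptions made for the example.

```python
import math


def softmax_coerce_2d(data, shape, axis=1):
    """Softmax for opset < 13: flatten to 2D at `axis`, then normalise each row.

    `data` is a flat row-major list of floats; `shape` is the tensor shape.
    """
    if axis < 0:
        axis += len(shape)
    rows = math.prod(shape[:axis])  # product of the leading dimensions
    cols = math.prod(shape[axis:])  # product of the trailing dimensions
    out = []
    for r in range(rows):
        row = data[r * cols:(r + 1) * cols]
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out  # same flat layout and shape as the input


# A [2, 2, 2] tensor with axis=1 is treated as 2 rows of 4 values each.
result = softmax_coerce_2d([0.0] * 8, [2, 2, 2], axis=1)
```

With `axis=1` every row spans all trailing dimensions, which is why a layout change that reorders those dimensions would alter the result unless the input is coerced first.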

Update tests to add model to check NHWC layout, as well as 4D tests. We
didn't notice the issues with the NNAPI EP as it was only processing
input shapes that were 2D or 4D (which was overly restrictive as well).

### Motivation and Context
#15949
2023-06-08 13:56:06 +10:00
Artur
dc1312cfb1
[web] fix: Provide typings for exports (#16249)
### Description
Adds typings to be compatible with `moduleResolution: bundler`


### Motivation and Context
Fixes #16242
2023-06-07 14:52:36 -07:00
Changming Sun
fe0cc8ce62
Remove some usages of CUDA_VERSION macro (#16199)
### Description
We should avoid using the macro because its value reflects the CUDA
version used at build time, not the version present at run time. For example, our prebuilt packages are built with CUDA 11.8
but people may run the binaries with CUDA 11.4. (The minimum CUDA version we support is CUDA 11.4.)
A runtime function should be used to determine the CUDA version, for example:
```C++
  int cuda_runtime_version = 0;
  CUDA_CALL_THROW(cudaRuntimeGetVersion(&cuda_runtime_version));
  ORT_ENFORCE(cuda_runtime_version >= 11040, "ONNX Runtime needs CUDA runtime 11.4 or higher");
```
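
The constant `11040` in the check above follows the CUDA convention of encoding a version as `1000 * major + 10 * minor`. A minimal Python sketch of that arithmetic, with `decode_cuda_version` and `meets_minimum` as hypothetical helper names that are not part of ONNX Runtime:

```python
def decode_cuda_version(version):
    """Split an integer such as 11040 into (major, minor)."""
    major = version // 1000
    minor = (version % 1000) // 10
    return major, minor


def meets_minimum(runtime_version, minimum=11040):
    """Mirror the >= comparison used in the C++ check above."""
    return runtime_version >= minimum
```

For instance, `decode_cuda_version(11080)` gives `(11, 8)`, matching the CUDA 11.8 build mentioned in the description.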
2023-06-07 14:34:22 -07:00
Dmitri Smirnov
908e940660
[CPP Api] Remove deprecated CustomOp API (#16256)
### Description
Custom Op API has been deprecated in 1.15 release. We are removing it.
2023-06-07 14:03:13 -07:00
Vrajang Parikh
67f4a4fd16
Objective-C binding for ORT training (#16127)
### Description
Implement Objective-C binding for `ORTCheckPoint`. Additionally, 
- Modify `onnxruntime_objectivec.cmake` to only include training header
and sources when training flag is enabled
- Enable objective-c binding for `orttraining-mac-ci-pipeline`

### Motivation and Context
This PR is part of implementing Objective-C bindings for training API.
It implements objective-c binding for ORTCheckPoint class. The
objective-C API closely resembles the C++ API.

**Note**: The test for saving checkpoint is skipped as it requires use
of training session. It will be added when the objective-c binding for
`ORTTrainingSession` is added.
2023-06-07 14:01:30 -07:00
Adam Pocock
bca49d62a0
Fixing CoreML in Java (#16231)
### Description
The name of the flag we set when compiling the JNI binding to enable the CoreML EP changed at some point in the past. This PR fixes it by updating the flag in the JNI. I also added a quick smoke test for the CoreML provider to make sure it doesn't crash and can be enabled.

### Motivation and Context
All the EPs should work as expected in Java. Fixes #16230.
2023-06-07 12:24:57 -07:00
Edward Chen
1261d0b8ba
Fix some build issues on MacOS with Xcode 14.3. (#15878)
- Fix flatbuffers flatc warning, unused-but-set-variable.
- Address `-Wshorten-64-to-32` warnings (fix in our code, allow in dependencies' code).
- Update CI builds to use Xcode 14.3.
- Update minimum iOS version to 12.0.
- Update Mac hosted agents to MacOS 13 where possible.
2023-06-07 12:07:11 -07:00
Adrian Lizarraga
b8858f034e
[QNN EP] Increase conv test tolerance for Windows x64 (#16241)
### Description
Increases the allowable accuracy tolerance for a specific Conv op test on the QNN
CPU backend (Windows x64).


### Motivation and Context
Allow QNN NuGet pipeline to run. PR
https://github.com/microsoft/onnxruntime/pull/15975 introduced a failing
test on Windows x64.
2023-06-07 10:52:56 -07:00
Wanming Lin
a8c2f24ae0
[WebNN EP] Merge support for segment anything into main branch (#16208)
We implemented a number of new ops and data types to support running
segment anything model on Chromium WebNN DML backend (POC) in a forked
branch https://github.com/honry/onnxruntime/tree/stable-diffusion

In this PR, we migrate the changes from the forked branch to the main branch,
including:
- 22 new ops
- New tensor data types: bool, int32, uint32, uint64, int64, float16 (as
JavaScript hasn't shipped Float16Array, we use Uint16Array as a
workaround)
- Handle empty input tensors and duplicated outputs
- Fixed some nits
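
The float16 workaround in the list above amounts to carrying each half-precision value as its raw 16-bit pattern. A rough Python analogue using the standard `struct` module; this is not the WebNN EP code, and the helper names are made up for illustration:

```python
import struct


def float_to_half_bits(value):
    """Return the IEEE 754 binary16 bit pattern of `value` as an int in [0, 65535]."""
    return struct.unpack("<H", struct.pack("<e", value))[0]


def half_bits_to_float(bits):
    """Reinterpret a 16-bit unsigned integer as a binary16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


# The integers in `encoded` are what a uint16 buffer would hold for these values.
encoded = [float_to_half_bits(v) for v in (1.0, 0.5, -2.0)]
```

The buffer only stores bits; any arithmetic still requires converting back to a floating-point type first.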
2023-06-07 09:56:37 -07:00
cloudhan
05bea0d3c3
Add new cases for non biased mha tests (#16097)
1. Add new test data GetSelfAttentionData_WithPastAndPresent_HeadSize8_NoMask_NoRelPosBias, also added non-biased data
2. Add new test data GetCrossAttentionData_DiffSequenceLengths_HeadSize8, also added non-biased data
3. Disabled the new tests for CUDA EP because qkv is not correctly transposed.
2023-06-07 15:04:27 +08:00
cloudhan
3373160863
[CPU EP] Refactor CPU mha (#16247)
Followup of #16075
2023-06-07 14:41:14 +08:00
cloudhan
f013965831
Add non qkv biased version mha unittests (#16075)
1. Add nonbiased mha unittests data
2. Update CPU and CUDA EP to accept inputs with `qkv_bias`
2023-06-06 09:18:41 +08:00
Adam Pocock
3c2a11f2f1
[java] Allow the creation of boolean tensors from ByteBuffer (#15556)
### Description
The tensor creation code now allows the creation of boolean tensors from
non-direct `ByteBuffer` instances. It previously only allowed them from
arrays and direct `ByteBuffer` instances and this fixes that
inconsistency. The boolean tensor test has been updated to cover all
three cases.

### Motivation and Context
Fixes #15509.
2023-06-05 09:58:50 -07:00
PeixuanZuo
a95f8ae53c
[ROCm] Update ROCm/MIGraphX CI pipeline (#16215)
MIGraphX CI

- Change docker container user name to `onnxruntimedev`

ROCm CI

- Build the Docker image in every job instead of using a prebuilt image.
- Every job creates a container with only one GPU, using the command
`docker run -it --device=/dev/kfd --device=/dev/dri/renderDxxx`
- Remove tests that are unstable or use outdated interfaces.
- Enable training ortmodule test.
2023-06-05 10:28:10 +08:00
ashari4
18c97381cd
Detect fake tensor mode if it has already been created. (#16220)
### Description

Detect fake tensor mode if it has already been created. Follows this
example in pytorch:
86c7652503/torch/_inductor/compile_fx.py (L280)


### Motivation and Context
As of torch nightly 6/2/23, when trying to run a torch dynamo graph on
the ORT backend, we observe

```
E           torch._dynamo.exc.BackendCompilerFailed: backend='compiler_fn' raised:
E           AssertionError: Mixing fake modes NYI
E           
E           
E           You can suppress this exception and fall back to eager by setting:
E               import torch._dynamo
E               torch._dynamo.config.suppress_errors = True
```
The issue is that `ort_backend.py` creates a new fake tensor mode even
though one has already been created by torch.
2023-06-02 23:17:49 -07:00
Somdev Sangwan
2e66bc8669
prevent object destruction compile error (#16134)
### Description
The proposed fix is to store the result of AsBlockSparse() in a variable
to ensure the object isn't destroyed until the end of the current scope.


### Motivation and Context
"own_buffer_tensor" is a temporary object that is destroyed at the end
of the expression and causes a compile error.
2023-06-02 11:19:53 -07:00
Changming Sun
6b5b79872b
Avoid taking dependency on dl.fedoraproject.org (#16202)
### Description
1. Avoid taking dependency on dl.fedoraproject.org
The website is not very stable. Our build pipelines often fail to fetch
packages from there.

2. Update manylinux to the latest version
2023-06-02 07:41:46 -07:00
Changming Sun
7686193c40
Fix DNNL build (#16201)
2023-06-02 09:46:03 +08:00
Yulong Wang
319a0dc6aa
[js/doc] allow deduplicate opset version (#16182)
### Description
Allow deduplicating opset versions in the generated document
webgpu-operators.md
2023-06-01 17:28:08 -07:00
Dale Phurrough
6e1c3003ff
DML EP and MLAS buffer allocator - increase alignment to 64 bytes for AVX-512 processing (#15141)
Fixes #13119 top concerns by

* using `onnxruntime::AllocatorDefaultAlloc` instead of `malloc`
* setting `MLAS_DEFAULT_PREFERRED_BUFFER_ALIGNMENT=64`, which cascades that
value to several members and functions not directly related to MLAS.

### Motivation and Context

* Fixes #13119 top concerns. Otherwise, alignment is to 16 bytes circa
1990s 👴
* Does not yet enable flexible alignment. Instead fixed at 64 (64 x 8
bits=512 bits) for modern NN hardware like AVX-512
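
The arithmetic behind a fixed 64-byte alignment can be sketched as follows. This is a generic illustration in Python, not the allocator code from the PR; `align_up` and `is_aligned` are hypothetical names.

```python
def align_up(size, alignment=64):
    """Round `size` up to the next multiple of `alignment` (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (size + alignment - 1) & ~(alignment - 1)


def is_aligned(address, alignment=64):
    """True when `address` starts on an `alignment`-byte boundary."""
    return address % alignment == 0


# 64 bytes x 8 bits per byte covers one 512-bit AVX-512 register.
register_bits = 64 * 8
```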
2023-06-01 16:32:55 -07:00
Adrian Lizarraga
5a4c3b7937
[QNN EP] Support Equal, Less, LessOrEqual, Greater, GreaterOrEqual operators on HTP backend (#16171)
### Description
- Updates QDQ transformer to handle QDQ logical operators (Equal, Less,
LessOrEqual, Greater, GreaterOrEqual).
  - Expects 2 DQ inputs and no Qs in the output, which is boolean.

### Motivation and Context
This is needed to enable QDQ models with logical comparison operators to
run on QNN EP.
2023-06-01 15:07:15 -07:00
Hector Li
f72dc198c6
[QNN EP]Add UT for cached Qnn context binary (#16184)
### Description
1. Add UT for cached Qnn context binary
2. Minor change: set model path to "" if model_path is not available
since the model could be loaded from buffer instead of Onnx file

### Motivation and Context
Support more scenarios.

---------

Co-authored-by: Adrian Lizarraga <adlizarraga@microsoft.com>
2023-06-01 14:28:46 -07:00
Changming Sun
5bfa1183d1
Add a Memory Profiling build job in post merge pipeline (#16172)
### Description
1. Add a Memory Profiling build job
2. Remove no absl build job since the feature will be removed
3. Simplify post-merge-jobs.yml by unifying the pool names

### Motivation and Context
To catch build errors in #16124
2023-06-01 13:00:44 -07:00
Alexander Visheratin
e6c6184fee
[JS/WebGPU] Unsqueeze operator implementation (#16138)
### Description

This PR adds an implementation of the Unsqueeze operator to WebGPU JSEP.
The implementation follows the [operator
schema](https://github.com/onnx/onnx/blob/main/docs/Operators.md#Unsqueeze).

To implement the `Unsqueeze` operator in the same fashion as the
`Squeeze`, I added the `ComputeOutputShape()` method to the
`UnsqueezeBase` class and made some slight modifications. Please let me
know if it is a bad idea and if I should move this method to the JS
implementation.
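
For orientation, the shape computation that Unsqueeze performs can be sketched in a few lines of Python. This follows the ONNX operator schema rather than the C++ `ComputeOutputShape()` method added in the PR, and the function name is an assumption for the example.

```python
def unsqueeze_output_shape(input_shape, axes):
    """Insert size-1 dimensions at `axes`, interpreted against the output rank."""
    out_rank = len(input_shape) + len(axes)
    normalized = set()
    for axis in axes:
        if not -out_rank <= axis < out_rank:
            raise ValueError(f"axis {axis} is out of range for rank {out_rank}")
        axis = axis % out_rank  # map negative axes to their positive position
        if axis in normalized:
            raise ValueError(f"axis {axis} appears more than once")
        normalized.add(axis)
    remaining = iter(input_shape)
    # Walk the output positions: new axes get 1, the rest take the input dims in order.
    return [1 if i in normalized else next(remaining) for i in range(out_rank)]


shape = unsqueeze_output_shape([3, 4, 5], [1, 4])
```

Here `shape` is `[3, 1, 4, 5, 1]`, the same input and output shapes used by the single-operator test model in this description.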

I also uncommented test case lines in the `suite-test-list.jsonc` file
for both Squeeze and Unsqueeze operators following @hariharans29's
[comment](https://github.com/microsoft/onnxruntime/pull/16024#issuecomment-1565113633).

### How was it tested

1. I created a model with only one operator:

```Python
import onnx
from onnx import TensorProto, helper

node = helper.make_node(
    "Unsqueeze",
    inputs=["T", "axes"],
    outputs=["y"],
)
# T: float [3, 4, 5]; axes: int64 [2]; y: float [3, 1, 4, 5, 1]
inputs = [
    helper.make_tensor_value_info("T", TensorProto.FLOAT, [3, 4, 5]),
    helper.make_tensor_value_info("axes", TensorProto.INT64, [2]),
]
outputs = [helper.make_tensor_value_info("y", TensorProto.FLOAT, [3, 1, 4, 5, 1])]
graph = helper.make_graph([node], "test", inputs, outputs)
onnx.save(helper.make_model(graph), "unsqueeze.onnx")
```

2. I compiled the runtime using @fs-eire's
[instructions](https://gist.github.com/fs-eire/a55b2c7e10a6864b9602c279b8b75dce).
3. I ran the test models in the browser using this minimal setup:
```HTML
<html>
    <script src="./dist/ort.webgpu.min.js"></script>
    <script>
        async function run() {
            const session = await ort.InferenceSession.create('unsqueeze.onnx', {executionProviders: ['webgpu']});
            console.log(session);
            const input = new ort.Tensor('float32', new Float32Array(60), [3, 4, 5]);
            const dim = new ort.Tensor('int64', [1n, 4n], [2]);
            const output = await session.run({ "T": input, "axes": dim });
            console.log(output);
        }
        run();
    </script>
</html>
```

### Motivation and Context

Improve operator coverage for WebGPU JSEP.
2023-06-01 12:23:02 -07:00
Changming Sun
5b08176314
Exclude shufflenet from DNNL's model tests (#16126)
2023-06-01 10:56:24 -07:00
FFFrog
d185bf444d
[CANN] Add IOBinding Support For CANN EP (#15802)
### Description
Add IOBinding Support For CANN EP

### Motivation and Context
Users can now use the IOBinding feature to speed up inference on CANN.
2023-06-01 03:13:38 -07:00
FFFrog
8c85d990c2
add third-party pipeline status to README.md (#16155)
Refer to this
[issue](https://github.com/microsoft/onnxruntime/issues/16154), please.
2023-05-31 22:14:39 -07:00
PeixuanZuo
1b518c6836
[ROCm] add early stop to tunable profile progress (#15716)
For TunableOp, some instances may perform very badly and take a long
time during the profiling process.
Add the `tunable_op_max_tuning_duration_ms` parameter to limit the maximum
tuning time.
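
The general idea of stopping a tuning loop once a time budget is spent can be sketched as below. This is a simplified Python illustration under assumed names (`tune`, `candidates`, `clock`), not the actual TunableOp implementation; only `tunable_op_max_tuning_duration_ms` comes from the PR, and here it is modelled by the `max_tuning_duration_ms` argument.

```python
import time


def tune(candidates, max_tuning_duration_ms, clock=time.perf_counter):
    """Return the fastest candidate name found before the time budget runs out.

    `candidates` maps a name to a zero-argument callable to be timed.
    """
    start = clock()
    best_name, best_time = None, float("inf")
    for name, run in candidates.items():
        spent_ms = (clock() - start) * 1000.0
        # Always time at least one candidate, then stop once the budget is spent.
        if best_name is not None and spent_ms >= max_tuning_duration_ms:
            break
        begin = clock()
        run()
        elapsed = clock() - begin
        if elapsed < best_time:
            best_name, best_time = name, elapsed
    return best_name
```

The trade-off in this sketch is that a faster candidate later in the list may never be measured, in exchange for a bounded tuning time.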
2023-06-01 10:18:25 +08:00
pengwa
65b316a138
Consolidate ORTModule logging (#16078)
### Consolidate ORTModule logging

There are a few improvements to ORTModule logging:
- All ORTModule logging now uses the logger that is initialized in
`ortmodule.py`.
- Manage all export logs the same way, e.g. use
`_logger.suppress_os_stream_output(log_level=self._debug_options.logging.log_level)`
to control whether export-related logs are suppressed. If any warnings or
errors are suppressed, `self._warning_log_detected_during_export` is
set to True, and when we log the ORTModule feature matrix we also
tell users that some logs were suppressed.
- Downgrade some warnings. We have had some warnings for years, and it looks
like many models trigger them by default with no action we can actually take, so
downgrade them to make user logging cleaner.
- PyTorch export requires custom export functions to use an updated
signature; otherwise `_symbolic_context_handler` emits warnings.
Update the custom export function adaptation for PyTorch >= 1.13.
- Add an ORTModule feature matrix summary. **This is supposed to be the only
place users see our logs by default** (unless they use INFO or
VERBOSE). Feature ON/OFF states are shown clearly in case users
want to try features that are OFF. This log only shows up on rank
0 (if there are multiple ranks); the intention is for users to see
useful and clean output from ORTModule by default. The output is shown
below:



![image](https://github.com/microsoft/onnxruntime/assets/10530022/9c6653ac-50fa-4b2d-ba7f-4d5ce44b25b2)


![image](https://github.com/microsoft/onnxruntime/assets/10530022/10dff5a9-2d46-4646-a4b4-2c515566376e)


- `reinitialize_ortmodule` in util.py is only used by ortmodule.py,
so move it into ortmodule.py. utils then no longer depends on
`orttraining/orttraining/python/training/ortmodule/_custom_op_symbolic_registry.py`,
so `_custom_op_symbolic_registry.py` can call functions defined in
utils.py (without a circular import).
2023-06-01 10:09:12 +08:00
Changming Sun
d19e5c0abb
Fix a misaligned error in CUDA GEMM (#16130)
### Description

Fix an issue where FusedMatMulOpTest.FloatTypeTransposeBatch fails to run on GPUs with TF32 support.


Authored-by: Tianlei Wu <tlwu@microsoft.com>
2023-05-31 18:10:17 -07:00
Yulong Wang
f67f7c0f0b
[js/web] disable node fallback in webpack (#16166)
### Description
Disable webpack's polyfill for Node's `global`, `__filename` and
`__dirname` in the web build. The polyfill confuses the environment
detection generated by Emscripten.

see https://webpack.js.org/configuration/node/
2023-05-31 16:47:00 -07:00
cao lei
13d6ac74de
fix memory profile build (#16177)
### Description
This PR is to fix the build break when onnxruntime_ENABLE_MEMORY_PROFILE
is on


### Motivation and Context
This PR is to fix the build break when onnxruntime_ENABLE_MEMORY_PROFILE
is on.
It fixes this issue
https://github.com/microsoft/onnxruntime/issues/16124

Co-authored-by: Lei Cao <leca@microsoft.com>
2023-05-31 16:08:14 -07:00
dependabot[bot]
a55637a103
Bump socket.io-parser from 4.2.2 to 4.2.3 in /onnxruntime/test/wasm (#16067)
2023-05-31 21:55:00 +00:00
Aung T Naing
3cca32beec
[QNN EP] expand convolution test coverage. (#15975)
### Description
Convolution with padding and convolution with large inputs and outputs.




### Motivation and Context
This is mainly to check the CPU vs QNN EP output mismatch for models.

./onnxruntime_test_all --gtest_filter=*.TestQDQConvU8U8S32*
Failed tests with mismatch.
[  FAILED  ] 2 tests, listed below:
[ FAILED ]
QnnHTPBackendTests.TestQDQConvU8U8S32_large_input1_padding_bias_initializer
[ FAILED ]
QnnHTPBackendTests.TestQDQConvU8U8S32_large_input2_bias_initializer


./onnxruntime_test_all --gtest_filter=*.TestCPUConvf32_*
[ FAILED ]
QnnCPUBackendTests.TestCPUConvf32_large_input1_pad_bias_initializer
2023-05-31 10:12:35 -07:00
Yi Zhang
e0199cfbd9
extend mac packaging timeout limit (#16173)
### Description

### Motivation and Context
The MacOS_py_wheels jobs often fail due to timeouts.
2023-05-31 18:31:28 +08:00
Yulong Wang
ba5f5e3198
[js] allow manually release inference session (#16169)
### Description
This change adds a new instance function (method) to type
`InferenceSession` to allow users to manually release an inference
session instance.

#16131 depends on this change to work correctly.
2023-05-31 00:31:38 -07:00
PeixuanZuo
3dc5179a36
[ROCm] Change ortmodule test (#15884)
Change ortmodule test because rocm ep behaves differently than cuda.
The warning from torch `The first argument to symbolic functions is
deprecated in 1.13 and will be removed in the future. Please annotate
treat the first argument (g) as GraphContext and use context information
from the object instead.` appears twice on ROCm EP.

On ROCm EP, the log is shown as below:
```
The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead.
The first argument to symbolic functions is deprecated in 1.13 and will be removed in the future. Please annotate treat the first argument (g) as GraphContext and use context information from the object instead.
User Module's attribute name _torch_module collides with ORTModule's attribute name. User Module's attribute may not be returned when trying to retrieve the attribute through ORTModule.
User Module's attribute name load_state_dict collides with ORTModule's attribute name. User Module's method may not be called upon invocation through ORTModule.
```
2023-05-31 15:14:10 +08:00
dependabot[bot]
03216e2313
Bump socket.io-parser from 4.2.2 to 4.2.3 in /js/web (#16068)
2023-05-31 02:15:23 +00:00
Baiju Meswani
7edc4b105d
Copy missing training header files to the package archive (#16119)
2023-05-30 16:45:40 -07:00
RandySheriffH
2802614846
Condition the usage of variadic callback by version (#16112)
For older versions of custom ops, optional and variadic callbacks are
null pointers, hence adding conditions to scope the usage.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-05-30 16:43:22 -07:00