Commit graph

9025 commits

Author SHA1 Message Date
Ryan Hill
d5b606d50d
Remove now duplicated symbol (#16458)
### Description
Change #16161 broke rocm by duplicating this symbol. This removes the
duplicate to unblock the tests.
2023-06-23 09:21:03 -07:00
Chen Fu
5c125b4366
Cfu revertamx (#16455)
### Description

This is to revert two PRs that aim at reducing AMX toolchain
requirements. Unfortunately we still have some pipeline issues.

https://github.com/microsoft/onnxruntime/pull/16390
https://github.com/microsoft/onnxruntime/pull/16086

### Motivation and Context

Looks like gcc link time optimization does not work very well with
inline assembly in the above PRs.
2023-06-23 09:20:23 -07:00
Rachel Guo
04dbdc96bf
[js/rn] Fix React Native CI pipeline E2E test (#16447)
### Description

Based on this kindly provided quick fix:
https://github.com/microsoft/onnxruntime/pull/16411

See the PR linked above for more details, such as bumping the AGP
version.

Also fixed import header file path in detox e2e test.

### Motivation and Context

Good build:

https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1041757&view=logs&j=de302ec2-2305-57e0-e8c6-cd89c569f2a3&t=9894c870-b8ce-548d-51ff-8f44d21a4117&l=18
2023-06-22 14:33:49 -07:00
Baiju Meswani
10ba1e270c
Minimal Build for On-Device Training (#16326)
🛠️ __Changes in this pull request:__

This pull request introduces two significant changes to the project:

- Changing on device training checkpoint format: The current
implementation stores the on device training checkpoint as a sequence of
tensors in multiple files inside a checkpoint folder, which can be
inefficient in terms of storage and performance. In this PR, I have
modified the checkpoint format to utilize the flatbuffer table to save
the checkpoint to a single file, providing a more compact and efficient
representation. The changes around this are twofold:
- Add the checkpoint flatbuffer schema that will generate the necessary
checkpoint source files.
- Update the checkpoint saving and loading functionality to use the new
format.

- Adding support for onnxruntime minimal build: To support scenarios
where binary size is a constraint, I made changes to ensure that the
training build can work well with the minimal build.

🔍 __Open Issues:__
- In order to extract the optimizer type, the existing implementation
re-loaded the onnx optimizer model and parsed it. This is no longer
possible, since the model format can either be onnx or ort. One idea is
to do the same for ort format optimizer model. This needs some
investigation.
- Changes to the offline tooling to generate ort format training
artifacts.
- End-to-end training example showcasing the use of the minimal training
build.
- Add support for export model for inferencing in a minimal build.
2023-06-22 12:27:23 -07:00
dependabot[bot]
97f4484df9
Bump actions/setup-python from 3 to 4 (#16404) 2023-06-22 18:12:11 +00:00
Yi Zhang
8e8840f1de
Enable Web CI on Linux (#16419)
### Description
1. Enable Web ci on Linux

### Motivation and Context
1. Speed up web CI: the duration can be reduced from 160 minutes to 130
minutes, a time saving of roughly 20%.
The total computation time is 455 minutes now. After moving to Linux, it
could be reduced to 336 minutes.
2. It's the first step to enable compilation cache for emscripten
3. per Yulong's request, build_web stages are still using windows pool


![image](https://github.com/microsoft/onnxruntime/assets/16190118/c9496408-74bd-45ea-b4ae-a4dd2a574d17)


https://dev.azure.com/onnxruntime/onnxruntime/_build/results?buildId=1038382&view=results
2023-06-22 15:42:58 +08:00
Pranav Sharma
a270d8407e
Allow saving of large models after optimization (github issue 12882) (#16440)
### Description
Allow saving of large models after optimization.

### Motivation and Context
Addresses https://github.com/microsoft/onnxruntime/issues/12882
2023-06-21 22:46:26 -07:00
Yufeng Li
89f8f20a61
fix protobuf copyfrom 2G limit (#16422)
### Description
protobuf CopyFrom doesn't work for model > 2GB for version 4.23. This PR
removes the copy for Calibrator.

Currently Calibrator copies the ModelProto to avoid changing it. The
reason is that quantize_static passes a ModelProto to Calibrator to
calibrate quantization parameters and then uses it for quantization. If
Calibrator changes the ModelProto, quantization won't work.

This PR changes quantize_static to pass in a model path to Calibrator
instead of a ModelProto, and makes Calibrator take only a model path as
input, which is how it is used in most cases.

This PR also removes the optimization from quantization. Users need to
call pre-process to optimize the model.
2023-06-21 20:45:11 -07:00
kunal-vaishnavi
4b69226fca
Fix input typo in Whisper export with beam search (#16439)
### Description
This PR fixes a typo with assigning the `repetition_penalty` input in
the Whisper export with beam search model. It is a follow-up to the
[export stabilization
PR](https://github.com/microsoft/onnxruntime/pull/16297).


### Motivation and Context
The `repetition_penalty` input should be set to `repetition_penalty`
instead of `input_features`.
2023-06-21 18:59:11 -07:00
Baiju Meswani
42489a8a24
Add ability to create ort format models from training offline utility (#16360) 2023-06-21 18:51:43 -07:00
yf711
0ad0d6ebbf
Unblock Linux MultiGPU TensorRT CI (#16446)
### Description
Revert docker base image to
nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04@sha256:b754c43fe9d62e88862d168c4ab9282618a376dbc54871467870366cacfa456e



### Motivation and Context
The default image environment of nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04
received a minor upgrade, which makes the Linux MultiGPU TensorRT CI (NV12
instance with Maxwell GPU) fail on three CApiTestGlobalThreadPoolsWithProvider
tests (these three tests show errors above the tolerance).

That minor upgrade includes cudnn 8.7.0->8.9.0, which might be a factor
that makes the Maxwell GPU generate higher error. CIs with T4 GPU are not
affected.
2023-06-21 17:15:39 -07:00
Yulong Wang
de476c8075
[js/web] update webgl context creating (#16436)
### Description
Modify how the WebGL context is created.

Previous behavior:
STEP.1 - create canvas (document.createElement), if failed, goto step.2
else step.3
STEP.2 - create offscreenCanvas, if failed abort
STEP.3 - use the canvas created in step.1 or 2 to create webgl context.
if successful return context else abort

New behavior:
STEP.1 create offscreenCanvas, if failed goto step.3
STEP.2 use it to create webgl context. if successful, return context
STEP.3 create canvas  (document.createElement). if failed, abort
STEP.4 use it to create webgl context. if successful, return context
else abort

Motivation:
We found that in some environments, normalCanvas.getContext() returns null
but offscreenCanvas.getContext() returns the context object. When
offscreenCanvas is available, it is a good idea to always prefer it.
2023-06-21 17:10:26 -07:00
pallavides
3c2d52a995
DORT support for custom ops (#16392)
### Description
DORT support for custom ops


### Motivation and Context
Custom ops registered via a custom_ops shared library cannot currently be
run using DORT. This PR enables it by:
1. registering the custom_ops supported in DORT
2. plumbing down the session_options (which were used to register the
custom_ops shared library via
`session_options.register_custom_ops_library(shared_library)`) from
OrtBackend when creating the InferenceSession
2023-06-21 15:58:05 -07:00
Yulong Wang
da532f3f5a
[js/webgpu] fix GPU to GPU memcpy (#16393)
### Description
Fixes a GPU to GPU memory copy bug which causes #16267
2023-06-21 15:50:08 -07:00
Tianlei Wu
52e2bdf541
Add license header to CUDA related files (#16437)
Add license header for files under core/providers/cuda or contrib_ops/cuda/
2023-06-21 13:31:43 -07:00
RandySheriffH
6e29e185f3
Clean AzureEP logics (#16367)
Move AzureEP invokers out of the core runtime.

---------

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-06-21 09:38:52 -07:00
Chi Lo
4e3cff60fd
CUDA graph support for TRT EP (#16081)
CUDA EP already supports [CUDA
graph](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#cuda-graphs),
and we observed that some models can benefit from using CUDA graph with
`trtexec`. Therefore, this PR enables CUDA graph support for TRT EP.

The implementation is based on
https://github.com/microsoft/onnxruntime/pull/9978 with the same
[constraints](https://github.com/microsoft/onnxruntime/pull/9978) as
below:

- Models with control-flow ops (i.e. If, Loop and Scan ops) are not
supported.
- Usage of CUDA Graphs is limited to models wherein all the model ops
(graph nodes) can be partitioned to the TRT EP.
- The input/output types of models need to be tensors.
- Shapes of inputs/outputs cannot change across inference calls.
- IObinding is required.
2023-06-21 09:36:45 -07:00
Rachel Guo
961fa7274a
[NNAPI doc] add reducemean to supported op list (#16414)
### Description

As title.

### Motivation and Context
2023-06-21 00:29:20 -07:00
Rachel Guo
b4b126ffb0
Set onnxruntime-c local pod path environment variable for react native e2e tests on ci (#16431)
### Description
Set onnxruntime-c local pod path environment variable for react native
e2e tests on react-native-ci.yml


### Motivation and Context

Previously the E2E test project was not properly consuming a locally built
onnxruntime-c pod.


https://github.com/microsoft/onnxruntime/pull/16411#issuecomment-1598512816
2023-06-21 00:27:36 -07:00
yf711
9a80955c45
Add compute capacity to trtep engine cache file (#16356)
### Description

Add "_smXX" to the TRT EP engine cache file name, where "sm" stands for
"Streaming Multiprocessor".

> The GPU compute capability version is prefixed with "SM" because
NVIDIA typically improves and updates the SM in each new GPU
architecture.

### Motivation and Context

Github issue: https://github.com/microsoft/onnxruntime/issues/15982

Reduce the chance of misusing an incompatible engine cache when the user
switches between GPU devices with different compute capabilities.

* The prevention can't be 100%, as model size & GPU memory size could be
another factor to make cache incompatible
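
A minimal sketch of the naming idea described above, assuming the compute capability is available as a (major, minor) pair. The helper names `add_sm_suffix` and `cache_matches_device` and the cache stem are hypothetical illustrations, not the TRT EP's actual code or file layout:

```python
def add_sm_suffix(cache_stem: str, major: int, minor: int) -> str:
    """Append a "_smXX" tag built from the compute capability, e.g. (8, 0) -> "_sm80"."""
    return f"{cache_stem}_sm{major}{minor}"


def cache_matches_device(cache_stem: str, major: int, minor: int) -> bool:
    """Return True only if the cache stem was tagged for this compute capability."""
    return cache_stem.endswith(f"_sm{major}{minor}")


# Hypothetical cache stem; a cache built on an sm80 device is rejected on sm75.
stem = add_sm_suffix("my_model_engine", 8, 0)
print(stem)
print(cache_matches_device(stem, 8, 0))
print(cache_matches_device(stem, 7, 5))
```

As the note above says, a matching suffix is only a necessary condition; the sketch does not check model size or GPU memory.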
2023-06-20 23:34:24 -07:00
zesongw
64b22cd00f
[WebNN EP] Support Where Op (#16380)
### Description
Add Where Op for WebNN EP as ternary conditional operator.

---------

Co-authored-by: Dwayne Robinson <dwayner@microsoft.com>
2023-06-20 16:45:40 -07:00
Justin Chu
e2381c42f2
Use M_PI to replace 3.14 constants (#16421)
### Description

Use M_PI to replace 3.14 constants

### Motivation and Context

Fixes #16413
2023-06-20 15:09:10 -07:00
Yuhong Guo
48e6186b1a
Move tests from core/providers/cuda/test/* to test/providers/cuda/ and refactor CUDA UT (#16161)
### Description

1. Add a new test lib `onnxruntime_providers_cuda_ut` which is similar
to `onnxruntime_providers_cuda` but `onnxruntime_providers_cuda_ut` is
only built if `onnxruntime_BUILD_UNIT_TESTS` is set. We can call all
CUDA UTs through this ut lib without affecting production lib
`onnxruntime_providers_cuda`.
2. Move all test cases from `core/providers/cuda/test/` to
`test/providers/cuda/`. These test cases are built into lib
`onnxruntime_providers_cuda_ut` and run by `./onnxruntime_test_all
--gtest_filter="*CUDA_EP_Unittest*"`. Since the lib is only for test, we
can use gtest macros in the test cases. The previous implementation did not
support using the gtest lib in the CUDA UT cases.
3. The cmake code in `cmake/onnxruntime_providers.cmake` is refactored a
bit. A new function `onnxruntime_add_object_library` builds an
object target. The 2 libs `onnxruntime_providers_cuda_ut` &
`onnxruntime_providers_cuda` share most of the code, so the object files
can be used in both libs, which helps reduce build time. Another
function `config_cuda_provider_shared_module` is used to configure all 3
similar
targets(onnxruntime_providers_cuda_obj/onnxruntime_providers_cuda/onnxruntime_providers_cuda_ut).
4. Refactored the test to call `testing::InitGoogleTest` &
`RUN_ALL_TESTS` in `libonnxruntime_providers_cuda_ut.so`'s `TestAll`.
After this change, we can see all the cases running in
`CUDA_EP_Unittest.All`:

![image](https://github.com/microsoft/onnxruntime/assets/19584326/8ff80df6-060b-4ef0-90b7-657e68d3db87)




### Motivation and Context

After https://github.com/microsoft/onnxruntime/pull/13016, there are
still test files in test/providers/cuda/ that are not moved to
core/providers/cuda/test/ and the test cases are disabled. This PR helps
to clean the unfinished TODOs.

Even though onnxruntime_shared_lib_test covers some tests for the CUDA
provider, it works like a coarse-grained end-to-end test for the CUDA
provider. If the CUDA unit tests can run cases for a single component,
this would be helpful for CUDA developers.

---------

Co-authored-by: Yuhong Guo <yuhong.gyh@antgroup.com>
2023-06-20 14:54:55 -07:00
Aung T Naing
e83993bbaf
Added MatMul tests for QNN EP (#15956)
### Description
Added test coverage for QNN EP MatMul op


### Motivation and Context
Created test coverage for HTP based MatMul with broadcasting.

---------

Co-authored-by: Hector Li <hecli@microsoft.com>
2023-06-20 13:58:56 -07:00
Sheil Kumar
7b51a1b17d
Enable Microsoft.AI.MachineLearning NuGet with WinUI projects (#16415)
Microsoft.AI.MachineLearning NuGet fails to build on WinUI projects due
to the conflict between the ReferenceCopy of binaries that occurs with
managed applications, with the manual binplacement that occurs with
native applications.

With WinUI, both cases are triggered, and a duplicate binplace is
detected as an error.

Fix: Don't rely on the ReferenceCopy for WinUI applications, and
manually binplace the Microsoft.AI.MachineLearning dll.
2023-06-20 13:10:19 -07:00
Wei-Sheng Chin
c8de3eaac6
Update DORT to follow PyTorch changes (#16394)
Fix #16355. The root cause change in PyTorch is
[#103302](https://github.com/pytorch/pytorch/pull/103302), which seems
to block calling make_fx inside a dynamo backend.

Changes:
1. Move decomposition to `register_backend.py`, so we don't have to call
`make_fx` inside DORT, which triggers a bunch of new exceptions.
2. Remove shape inference based on FakeTensorProp since the FX graph
received from dynamo contains all shapes now.
3. Fix a macro bug so that DORT can build without CUDA.

Before (3),
```
#if defined(USE_CUDA) || defined(USE_ROCM)
  virtual PhiloxGenerator& PhiloxGenerator__Default() = 0;
#ifdef ENABLE_TRAINING_TORCH_INTEROP
...
#endif
#endif
```
After (3),
```
#if defined(USE_CUDA) || defined(USE_ROCM)
  virtual PhiloxGenerator& PhiloxGenerator__Default() = 0;
#endif
#ifdef ENABLE_TRAINING_TORCH_INTEROP
...
#endif
```
The latter looks better since the `ENABLE_TRAINING_TORCH_INTEROP` macro is
for Python bridge code, not for the random-number-generating
`PhiloxGenerator` kernels.
2023-06-20 12:06:50 -07:00
Rachel Guo
30bb0959dc
[NNAPI EP] Add ReduceMean Op support (#16294)
### Description

As title.

Special cases for ReduceMean:
[UPDATE] The following cases are supported now by converting to
providing an input with all axes for NNAPI.
Behaviors when axes is not provided or axes provided as an empty vector:
For ReduceMean Opset version 18:
- Support case `axes` is provided as empty with `noop_with_empty_axes`
set to true.
- Support case `axes` is not provided with `noop_with_empty_axes` set to
true.
All treat as identity op.
- Does not support the case when `axes` is not provided/provided as
empty but `noop_with_empty_axes` is set to false.

For ReduceMean OpSet Version 13-:
- Does not support when `axes` attribute is not provided. (as onnx
treats it as default behavior to reduce all dimensions, and the case is
not implemented by NNAPI.)


https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaabbe492c60331b13038e39d4207940e0a047fe95a35b27f45c05432b6ca18eb6c

> 1: A 1-D Tensor of
[ANEURALNETWORKS_TENSOR_INT32](https://developer.android.com/ndk/reference/group/neural-networks#group___neural_networks_1ggaf06d1affd33f3bc698d0c04eceb23298ac34965d8e76ac5acfddf5acd9e40f896).
The dimensions to reduce. Must be in the range [-rank(input_tensor),
rank(input_tensor)). NOTE: When the operation was introduced, the
documentation incorrectly stated that if dimensions were empty, the
operation would reduce across all dimensions. This behavior was never
implemented.
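
A minimal sketch of the "provide all axes explicitly" idea, under one reading of the [UPDATE] note above: when ONNX semantics call for reducing every dimension, the axes are spelled out rather than passed as empty, since NNAPI never implemented "empty means all". The function name `resolve_reduce_mean_axes` is hypothetical and is not taken from the NNAPI EP source:

```python
from typing import List, Optional, Sequence


def resolve_reduce_mean_axes(
    rank: int,
    axes: Optional[Sequence[int]] = None,
    noop_with_empty_axes: bool = False,
) -> Optional[List[int]]:
    """Return explicit non-negative axes to reduce, or None for an identity op."""
    if not axes:  # axes not provided, or provided as empty
        if noop_with_empty_axes:
            return None  # treated as an identity op
        # Reduce-all case: list every dimension explicitly for NNAPI.
        return list(range(rank))
    resolved = []
    for axis in axes:
        if not -rank <= axis < rank:
            raise ValueError(f"axis {axis} out of range for rank {rank}")
        # Normalize negative axes into [0, rank).
        resolved.append(axis + rank if axis < 0 else axis)
    return sorted(set(resolved))


print(resolve_reduce_mean_axes(3))                                  # reduce-all case
print(resolve_reduce_mean_axes(3, [], noop_with_empty_axes=True))   # identity case
print(resolve_reduce_mean_axes(4, [-1, 1]))                         # explicit axes
```

For opset 13 and earlier, `noop_with_empty_axes` does not exist, so the default of `False` applies in this sketch.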

### Motivation and Context

Fixes issue #16194

---------

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2023-06-20 11:09:00 -07:00
Yufeng Li
d190db7fcd
Update default external_data_location for pre-process of quantization (#16399)
external_data_location should be a string/file_name to indicate the file
name of external data instead of a directory
2023-06-20 09:37:17 -07:00
Yulong Wang
b8917ad84f
[js/web] fix nodejs detection (#16400)
### Description
We used to use `typeof fetch === 'undefined'` as the condition to detect
whether the environment is Node.js. This worked before Node.js v18.
However, Node.js v18 introduced the `fetch` function, so this check no
longer works.

This PR changes the condition to check whether `process`,
`process.versions` and `process.versions.node` exist.

Checking whether `process` exists is not enough, because in some
configurations webpack may polyfill Node.js's process.
2023-06-20 00:20:58 -07:00
Prateek Chokse
12dffef768
added support for cmake "find_package" (#8919)
**Description**: 
Adds support for cmake find_package.

**Motivation and Context**
As mentioned in issue #7150 onnxruntime doesn't have support for CMake
find_package, this PR adds that and also adds the CMake package version
file. Now anyone can link onnxruntime like this:
```cmake
find_package(onnxruntime)
add_executable(test Source.cpp)
target_link_libraries(test PRIVATE onnxruntime::onnxruntime)
```
this also simplifies #3124
2023-06-19 22:20:31 -07:00
cao lei
dd72192cf4
ExecutionProvider API refactor - move allocator from EP level to SessionState level and indexed by OrtDevice (#15833)
### Description
This PR is to refactor ExecutionProvider API for memory management,
which is to move allocators from EP level to SessionState level and
indexed by OrtDevice



### Motivation and Context
With allocators moved from the EP level to the SessionState level and
indexed by OrtDevice, the EP level no longer bears the burden of
maintaining allocators, which is friendlier for EP developers.

---------

Co-authored-by: Lei Cao <leca@microsoft.com@orttrainingdev8.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:44:45 -07:00
jingyanwangms
5dcaf70501
Adding this set_to_none flag to zero_grad to have signature parity with pytorch Adam (#16375)
### Description
torch.optim Adam zero_grad() signature is
zero_grad(set_to_none=True)

https://pytorch.org/docs/stable/generated/torch.optim.Adam.html#torch.optim.Adam.zero_grad

We set this flag in initialization, similar to deepspeed:
https://deepspeed.readthedocs.io/en/latest/optimizers.html#deepspeed.ops.adam.FusedAdam

Adding this flag to have signature parity with pytorch Adam
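
A toy sketch of the arrangement described above, in plain Python with no PyTorch dependency. The `Param` and `ToyOptimizer` classes and the constructor flag name `set_grad_none` (modeled on DeepSpeed's naming) are illustrative assumptions, and having the constructor flag decide the behavior is one plausible reading of "we set this flag in initialization", not a copy of the actual optimizer:

```python
class Param:
    """Stand-in for a tensor parameter with a scalar gradient."""

    def __init__(self, value: float):
        self.value = value
        self.grad = None


class ToyOptimizer:
    def __init__(self, params, set_grad_none: bool = True):
        self.params = list(params)
        # Chosen once, at initialization (assumption of this sketch).
        self._set_grad_none = set_grad_none

    def zero_grad(self, set_to_none: bool = True):
        # `set_to_none` is accepted so that callers written against
        # zero_grad(set_to_none=True) keep working; in this sketch the
        # constructor flag decides what actually happens.
        for p in self.params:
            if p.grad is None:
                continue
            p.grad = None if self._set_grad_none else 0.0


p = Param(1.0)
p.grad = 0.5
ToyOptimizer([p], set_grad_none=True).zero_grad()
print(p.grad)
```

Setting gradients to `None` instead of zero releases the gradient storage, while zeroing keeps the buffer allocated.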

### Motivation and Context
Easier model integration

Co-authored-by: Jingyan Wang <jingywa@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2023-06-19 17:27:41 -07:00
PeixuanZuo
470d6c1cce
[ROCm] Delete unused file to fix Component Governance Alert (#16407)
Delete unused file to fix Component Governance Alert
2023-06-19 11:28:32 -07:00
guyang3532
341484e67c
Embedding sparsity optimization (#16141)
### Description
Optimize compute graph by eliminating padding in embedding.


### Motivation and Context
The computation for padding in nodes after embedding is unnecessary and
wastes computation resources.
This PR adds a PaddingElimination optimizer that checks for and
eliminates the padding after embedding automatically by modifying the
graph.

### Implementation:
1. Find and check embedding node in graph.
2. Iterate over the subgraph after the embedding node and record all the
input nodes and output nodes to this subgraph.
3. Insert 'Reshape + ShrunkenGather' to flatten each input node shape
from [batch_size, seqlen, ...] to [valid_token_without_padding, ...],
and insert 'GatherGrad + Reshape' to unflatten each output node shape
from [valid_token_without_padding, ...] to [batch_size, seqlen, ...]
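
The flatten and unflatten steps above can be illustrated on plain Python lists. This is only a sketch of the data movement: `flatten_valid` plays the role of 'Reshape + ShrunkenGather' and `unflatten` the role of 'GatherGrad + Reshape'. The function names and the padding id of 0 are assumptions made for the example, not part of the optimizer:

```python
from typing import List, Tuple


def flatten_valid(ids: List[List[int]], pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """[batch_size, seqlen] -> [valid_token_without_padding], plus kept positions."""
    flat = [tok for row in ids for tok in row]          # reshape to one dimension
    valid_idx = [i for i, tok in enumerate(flat) if tok != pad_id]
    return [flat[i] for i in valid_idx], valid_idx      # gather non-padding tokens


def unflatten(values: List[int], valid_idx: List[int],
              batch_size: int, seqlen: int, fill: int = 0) -> List[List[int]]:
    """[valid_token_without_padding] -> [batch_size, seqlen], padding restored."""
    flat = [fill] * (batch_size * seqlen)
    for value, i in zip(values, valid_idx):             # scatter back to original slots
        flat[i] = value
    return [flat[r * seqlen:(r + 1) * seqlen] for r in range(batch_size)]


ids = [[5, 6, 0], [7, 0, 0]]
values, idx = flatten_valid(ids)
print(values, idx)
print(unflatten(values, idx, batch_size=2, seqlen=3))
```

Everything between the two calls operates on 3 tokens instead of 6, which is where the saved computation comes from.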

---------

Co-authored-by: mindest <linminuser@gmail.com>
2023-06-19 20:34:53 +08:00
PeixuanZuo
1418d8728c
[ROCm] Fix CI Pipeline (#16409)
1. add `set -ex` before commands.
2. update ccache.
2023-06-19 15:22:13 +08:00
Yi Zhang
8b9eab093b
keep symlinks in maven package (#16376)
### Description
1. Keep symlink in the package.
2. keep the artifact package format

### Motivation and Context
2023-06-19 09:41:39 +08:00
Dipanjan Sengupta
35fa6af428
Fix for the build break in AMX feature on Mac OS. (#16390)
### Description
Fixing the build break issue in Apple pipeline due to AMX flag removal.
2023-06-16 21:00:41 -07:00
Scott McKay
8fdfd20191
Separate out operator vs model testing. (#16228)
### Description
Split up OpTester to separate out operator vs model testing. This led to
a lot of other cleanups/refactoring.

- create BaseTester class and derived OpTester/ModelTester classes to
limit APIs to what is applicable for each test type
  - e.g. adding an attribute isn't relevant to a model test
- cleanup structure
- don't expose member variables either directly or via public methods
returning them
  - split out checkers so they can be easily re-used
- refactor so there's one public Check method for comparing two OrtValue
instances containing any data type
  - refactor the GradientOpTester usage
- it required a lot of OpTester internals to be exposed and no other
tests needed this
- it also returned Status through various parts which prevented the
usage of the google test macros which provide better output. change to
return void and use the macros.
- fix some other minor issues
  - update some cmake files so all the source files are included
  - remove some low value helpers (FetchTensor and GetShapeVector)
- remove some outdated code to allow unreleased opset versions from when
onnx opset 15 wasn't released
  - move files from test/util/include/test to test/util/include
- there doesn't seem to be any reason for the additional subdirectory given
they're not files used to test the code in test/util
    - files were moved with no changes
    
### Motivation and Context
Cleanup test infrastructure.

---------

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-06-17 12:58:57 +10:00
saurabh
a6ce7b339f
Enable model subgraph execution in OVEP and setting the OpenVINO dll's to the path from the OpenVINO pypi packge in OVEP and fix OVEP windows io buffer sample (#16147)
### Description
This PR enables execution of subgraphs in OVEP. Currently, when OVEP
developers install the onnxruntime-openvino package on Windows from
PyPI, they have to additionally download the OpenVINO Windows binaries
and run the setupvars.bat script, which sets the environment PATH to
locate the OV DLLs. This PR also fixes issues in the OVEP Windows io
buffer sample.



### Motivation and Context
Fix: We want to make the user experience easy for OVEP Python developers
on the Windows platform.
This fix introduces a function add_openvino_libs_to_path at the
location tools/python/util/add_openvino_win_libs.py.
OVEP Python users can call this function in their application code; it
takes care of adding the OpenVINO DLLs from the installed OpenVINO PyPI
package (openvino) to the path.
This change also makes sure that the add_openvino_libs_to_path() function
is added to the onnxruntime Python package
only when it is built for the OpenVINO Execution Provider for ONNXRuntime
and not for default ORT Python package builds.

New user experience for Python OVEP developers on windows platform:
step 1: pip install onnxruntime-openvino
step 2: pip install openvino
step 3: <Add these 2 lines in the application code>
import onnxruntime.tools.add_openvino_win_libs as utils
utils.add_openvino_libs_to_path()

---------

Signed-off-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: MaajidKhan <n.maajid.khan@intel.com>
Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com>
2023-06-16 19:47:09 -07:00
kunal-vaishnavi
3f7f90aed0
Stabilize Whisper export with beam search (#16297)
### Description
This PR stabilizes the Whisper export with beam search by adding the
following:
- Remove unused ONNX models and extra folders generated during the
export process
- Specify the Whisper with beam search model's IR version for E2E
integration
- Parity check for Whisper with beam search model between PyTorch and
ORT
- Remove previously exported Whisper with beam search model before
saving newly exported model


### Motivation and Context
- Removing the unused ONNX models and extra folders frees up disk space
after exporting and makes it easier to copy and move the output folder
to other environments.
- Specifying the IR version fixes an issue with generating the ONNX E2E
model
- Adding a parity check helps detect runtime issues during the export
process
- Removing the previously exported Whisper with beam search model
prevents the data file size from doubling when the newly exported model
is saved with the same filename
2023-06-16 18:56:52 -07:00
dependabot[bot]
dd660c054e
Bump transformers from 4.24.0 to 4.30.0 in /tools/ci_build (#16331) 2023-06-16 13:08:46 -07:00
zesongw
d813d991b1
[WebNN EP] Support Squeeze Op (#16361)
### Description
Adds support for the Squeeze Op to WebNN EP.
It shares similar parameters with Unsqueeze, so the two are merged.


### Motivation and Context
Enable more models to run on WebNN EP.

---------

Co-authored-by: Dwayne Robinson <fdwr@hotmail.com>
2023-06-16 11:18:58 -07:00
Chi Lo
fbf08c4b4d
Fix minor TRT EP provider option issue (#16107)
Several TRT EP provider options are not included when calling
OrtApis::GetTensorRTProviderOptionsAsString().
This issue doesn't affect TRT EP itself, but users calling the above API
to get all the provider options will find some of them missing from the
string.
2023-06-16 10:07:40 -07:00
Silvio Traversaro
4915191e63
Fix build of Python wheel on Windows with single-config generator (#16337)
### Description

Before this PR, the CMake code assumed that when on Windows a
multiple-config CMake generator was used, while on non-Windows there was
the assumption of a single-config CMake generator. After this PR this
information is obtained from the
[`GENERATOR_IS_MULTI_CONFIG`](https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html)
global CMake property.



### Motivation and Context

I discovered this problem when building with Ninja generator on Windows,
but I guess this should fix problems also on non-Windows platforms when
using a multiple-config generator (such as Xcode on macOS or "Ninja
Multi-Config" on all platforms).

See
https://cmake.org/cmake/help/latest/prop_gbl/GENERATOR_IS_MULTI_CONFIG.html
for more info.
2023-06-16 09:17:49 -07:00
Jhen-Jie Hong
685816bb0a
[js/rn] Add executionProviders support (#16233)
### Description

This PR adds support for the `executionProviders` option in the
react-native package, supporting:

- Android: cpu / xnnpack / nnapi
- iOS: cpu / xnnpack /  coreml

### Motivation and Context

In my case I want to enable Core ML / NNAPI EP for react-native project.
2023-06-16 19:38:41 +10:00
Jhen-Jie Hong
ea1a5cf920
[js/rn] Implement blob exchange by JSI instead of use base64 (#16094)
### Description

- Create `OnnxruntimeJSIHelper` native module to provide two JSI
functions
- `jsiOnnxruntimeStoreArrayBuffer`: Store buffer in Blob Manager &
return blob object (iOS: RCTBlobManager, Android: BlobModule)
  - `jsiOnnxruntimeResolveArrayBuffer`: Use blob object to get buffer
- Part of the implementation references
[react-native-blob-jsi-helper](https://github.com/mrousavy/react-native-blob-jsi-helper)
- Replace base64 encode/decode
  - `loadModelFromBlob`: Rename from `loadModelFromBase64EncodedBuffer`
  - `run`: Use blob object to replace input.data & results[].data

For [this
context](https://github.com/microsoft/onnxruntime/issues/16031#issuecomment-1556527812),
it saves a lot of time and avoids blocking the JS thread while decoding
the returned data: 3700ms -> 5~20ms for that case (the resolve function
itself only takes 0.x ms).

### Motivation and Context

It’s related to #16031, but is not a full implementation of the migration
to JSI.

It just uses JSI through BlobManager to replace the slow part (base64
encode / decode).

Rewriting it entirely in JSI could be complicated, for example type
conversion and threading. This PR might be considered a minor change.

/cc @skottmckay
2023-06-16 19:37:02 +10:00
cloudhan
9110e5b9bd
[ROCm] Add attention kv cache for decoding (#16076) 2023-06-16 14:17:56 +08:00
Tianlei Wu
96471491d7
Fix test failure in debug CUDA build (#16370)
Fix assertion failure in onnxruntime_test_all in debug build with CUDA,
which is caused by a test case added in
https://github.com/microsoft/onnxruntime/pull/16075.

Remove an assumption that bias exists in MultiHeadAttention.
2023-06-15 23:16:16 -07:00
Tianlei Wu
1866a9d818
Use the lowest float for causal mask (#16369)
Always set causal mask to the lowest float. 

Note that since huggingface transformers v4.21, gpt2 uses lowest half
for FP16, and lowest float for FP32:

66fd3a8d62/src/transformers/models/gpt2/modeling_gpt2.py (L199)
Assume that most fp16 ONNX models are converted from fp32 models. We
decided to use lowest float32 for both half and float model for
consistency.

The mask_filter_value only applies to raw attention mask (2D, 3D or 4D).
For 1D mask, masked item is 0.0 after softmax so mask filter value is
the lowest float for 1D mask.
* For the BERT model, when users use a 1D mask (required by FMHA),
mask_filter_value is not applicable.
* For BERT or GPT-2, when fused kernel is used, mask_filter_value has no impact
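
A small sketch of why the lowest float works as a mask filter value, in plain Python. It assumes the filter value is added to the scores of masked positions before softmax; the function name `masked_softmax` is hypothetical, and Python floats are doubles here, so this only illustrates the arithmetic, not the kernels:

```python
import math

# Lowest finite values of IEEE float32 and float16.
LOWEST_FLOAT32 = -(2 - 2**-23) * 2.0**127   # about -3.4028235e38
LOWEST_FLOAT16 = -(2 - 2**-10) * 2.0**15    # -65504.0


def masked_softmax(scores, keep, mask_filter_value=LOWEST_FLOAT32):
    """Softmax over `scores`, pushing positions with keep=False to zero probability."""
    masked = [s if k else s + mask_filter_value for s, k in zip(scores, keep)]
    peak = max(masked)                          # subtract the max for stability
    exps = [math.exp(m - peak) for m in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([1.0, 2.0, 3.0], keep=[True, True, False])
print(probs)
```

The masked position underflows to exactly 0.0 after `exp`, so the remaining probabilities are the softmax of the unmasked scores alone.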

### Motivation and Context
https://github.com/microsoft/onnxruntime/issues/12843
https://github.com/microsoft/onnxruntime/issues/14363
2023-06-15 21:32:29 -07:00
PeixuanZuo
bcdb81c563
[Whisper] add a fusion option to split input bias from MHA/DMHA (#16049)
The Whisper model contains five different types of attention; the q, k, v
bias was fused into the Attention/MHA/DMHA op:

encoderdecoderinit subgraph
- Attention: encoder attention
- Attention: decoder self attention + present k, v
- MultiHeadAttention: decoder cross attention + present k and v. q and v
have bias.

decoder subgraph
- DecoderMultiHeadAttention: decoder cross attention + past k, v. q has
bias
- DecoderMultiHeadAttention: decoder self attention + past/present k, v.
q, k, v have bias.

For ROCm EP, MHA/DMHA doesn't support additional bias. This PR adds a
fusion option `disable_multi_head_attention_bias` to split the q, k, v
bias from MHA/DMHA.