Commit graph

8076 commits

Author SHA1 Message Date
Yi Zhang
80f807c03d
upgrade protobuf to 3.20.2 and onnx to 1.13 (#14279)
### Description
upgrade protobuf to 3.20.2, same as onnx 1.13.0

### Motivation and Context
Per component governance requirement and Fixes #14060

The unused-parameter error occurs in 2 conditions.
1. compile protobuf

`onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66:
error: unused parameter ‘prototype’ [-Werror=unused-parameter]`
2. include onnx_pb.h
```
2023-01-28T10:20:15.0410853Z FAILED: CMakeFiles/onnxruntime_pybind11_state.dir/onnxruntime_src/onnxruntime/python/onnxruntime_pybind_iobinding.cc.o 
......
2023-01-28T10:20:15.0466024Z                  from /build/Debug/_deps/onnx-src/onnx/onnx_pb.h:51,
2023-01-28T10:20:15.0466958Z                  from /onnxruntime_src/include/onnxruntime/core/framework/to_tensor_proto_element_type.h:10,
....
2023-01-28T10:20:15.0609678Z /build/Debug/_deps/onnx-build/onnx/onnx-operators-ml.pb.h:1178:25:   required from here
2023-01-28T10:20:15.0610895Z /onnxruntime_src/cmake/external/protobuf/src/google/protobuf/repeated_ptr_field.h:752:66: error: unused parameter ‘prototype’ [-Werror=unused-parameter]
2023-01-28T10:20:15.0611707Z cc1plus: all warnings being treated as errors

```

https://dev.azure.com/onnxruntime/2a773b67-e88b-4c7f-9fc0-87d31fea8ef2/_apis/build/builds/874605/logs/22
2023-01-31 12:55:09 -08:00
pengwa
e2dd1315c7
Fix build for --enable_language_interop_ops + DISABLE_ABSEIL=ON (#14469)
### Fix build error on Windows when building with
"--enable_language_interop_ops --cmake_extra_defines
onnxruntime_DISABLE_ABSEIL=ON"

This is a subsequent fix after
https://github.com/microsoft/onnxruntime/pull/14309, which fixed build
for onnxruntime_DISABLE_ABSEIL=ON build.

Going further, if we enable --enable_language_interop_ops, there are the
following two errors:

```
 test_symm_qgemm.cpp
  test_transpose.cpp
onnxruntime_session.lib(inference_session.obj) : error LNK2019: unresolved external symbol "void __cdecl onnxruntime::L
oadInterOp(class std::basic_string<wchar_t,struct std::char_traits<wchar_t>,class std::allocator<wchar_t> > const &,cla
ss std::vector<struct Ort::CustomOpDomain,class std::allocator<struct Ort::CustomOpDomain> > &,class std::function<void
 __cdecl(char const *)> const &)" (?LoadInterOp@onnxruntime@@YAXAEBV?$basic_string@_WU?$char_traits@_W@std@@V?$allocato
r@_W@2@@std@@AEAV?$vector@UCustomOpDomain@Ort@@V?$allocator@UCustomOpDomain@Ort@@@std@@@3@AEBV?$function@$$A6AXPEBD@Z@3
@@Z) referenced in function "public: __cdecl <lambda_f3a907e0b0a0e11d80d305605215cce8>::operator()(class std::shared_pt
r<class onnxruntime::Model> &)const " (??R<lambda_f3a907e0b0a0e11d80d305605215cce8>@@QEBA@AEAV?$shared_ptr@VModel@onnxr
untime@@@std@@@Z) [C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_test_trainer.vcxproj]
onnxruntime_session.lib(inference_session.obj) : error LNK2019: unresolved external symbol "void __cdecl onnxruntime::L
oadInterOp(class onnx::ModelProto const &,class std::vector<struct Ort::CustomOpDomain,class std::allocator<struct Ort:
:CustomOpDomain> > &,class std::function<void __cdecl(char const *)> const &)" (?LoadInterOp@onnxruntime@@YAXAEBVModelP
roto@onnx@@AEAV?$vector@UCustomOpDomain@Ort@@V?$allocator@UCustomOpDomain@Ort@@@std@@@std@@AEBV?$function@$$A6AXPEBD@Z@
5@@Z) referenced in function "public: __cdecl <lambda_340b7b787b9c0f81848d348e60fe6c91>::operator()(class std::shared_p
tr<class onnxruntime::Model> &)const " (??R<lambda_340b7b787b9c0f81848d348e60fe6c91>@@QEBA@AEAV?$shared_ptr@VModel@onnx
runtime@@@std@@@Z) [C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_test_trainer.vcxproj]
C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\RelWithDebInfo\onnxruntime_test_trainer.exe : fatal error
LNK1120: 2 unresolved externals [C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_test_trainer.
vcxproj]
  onnxruntime.vcxproj -> C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\RelWithDebInfo\onnxruntime.dll
  onnxruntime_test_utils.vcxproj -> C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\RelWithDebInfo\onnxrun
  time_test_utils.lib
CUDACOMPILE : nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may
 be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [C:\Users\pengwa\dev\onnxruntime
\build\Windows\RelWithDebInfo\custom_op_library.vcxproj]
  cuda_ops.cu
CUDACOMPILE : nvcc warning : The 'compute_35', 'compute_37', 'sm_35', and 'sm_37' architectures are deprecated, and may
 be removed in a future release (Use -Wno-deprecated-gpu-targets to suppress warning). [C:\Users\pengwa\dev\onnxruntime
\build\Windows\RelWithDebInfo\onnxruntime_test_cuda_ops_lib.vcxproj]
```



```
  kernel_type_str_resolver_utils_test.cc
  local_kernel_registry_test.cc
C:\Users\pengwa\dev\onnxruntime\onnxruntime\test\framework\allocation_planner_test.cc(1388,9): error C2220: the followin
g warning is treated as an error [C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebInfo\onnxruntime_test_all.vcxp
roj]
C:\Users\pengwa\dev\onnxruntime\onnxruntime\test\framework\allocation_planner_test.cc(1388,9): warning C4067: unexpected
 tokens following preprocessor directive - expected a newline [C:\Users\pengwa\dev\onnxruntime\build\Windows\RelWithDebI
nfo\onnxruntime_test_all.vcxproj]
```


2023-01-31 12:34:45 +08:00
Ankit
a5b620e79d [Build] Fix arm64 Docker build (#14283) 2023-01-30 16:25:19 -08:00
Wei-Sheng Chin
679ae7ff33
[Java] Fix warnings (#14076)
Fix C6011, C6385, C6386 found by Visual Studio. Basically, I set the
maximum number of options for every EP to 128. To my knowledge, 128 is
big enough to support all EPs.

To support an arbitrary number of EP options, we probably need #13999 and
to create a "std::vector"-like struct in C.
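
A minimal sketch of the fixed-capacity approach described above, written in C++ for brevity. The names `ProviderOptions`, `Add`, and `kMaxProviderOptions` are hypothetical and are not taken from the Java binding sources; only the limit of 128 comes from the commit message. The point is that every write is bounds-checked, which is the kind of guarantee that buffer-overrun warnings such as C6385 and C6386 ask for.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Limit taken from the commit message; everything else here is illustrative.
constexpr std::size_t kMaxProviderOptions = 128;

// Hypothetical fixed-capacity container for key/value option pointers.
struct ProviderOptions {
  std::array<const char*, kMaxProviderOptions> keys{};
  std::array<const char*, kMaxProviderOptions> values{};
  std::size_t count = 0;

  // Returns false instead of writing past the end when the buffer is full.
  bool Add(const char* key, const char* value) {
    assert(key != nullptr && value != nullptr);
    if (count >= kMaxProviderOptions) return false;
    keys[count] = key;
    values[count] = value;
    ++count;
    return true;
  }
};
```

A caller would check the return value of `Add` and report an error once the limit is reached, rather than silently dropping options.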
2023-01-30 09:22:28 -08:00
Ashwini Khade
764202d740
fix prefast warning (#14446)
### Description
Fixes a prefast warning:
https://aiinfra.visualstudio.com/ONNX%20Runtime/_workitems/edit/11113



2023-01-30 09:13:39 -08:00
cloudhan
3b6d551c35
Enable ccache for HIP objects (#14465)
This enables the HIP compiler to be launched with `ccache` when building with `--use_cache`
2023-01-28 22:34:24 +08:00
Vincent Wang
7aecb2150f
Fix onnxruntime-CI-nightly-ort-pipeline Failure (#14464)
PyTorch skipped version 1.14 and jumped to 2.0, while the image for the
onnxruntime-CI-nightly-ort-pipeline is still using
nightly-ubuntu2004-cu116-py38-torch1140dev. Switch to the new torch
version image to fix the failure of the pipeline.
2023-01-28 16:05:56 +08:00
Vincent Wang
91d42e9d85
Tool to Convert ONNX Model to TFEvents (#14160)
A tool to convert ONNX model to tfevents so that we can use tensorboard
to open it for visualization. This is especially useful for debugging
when the ONNX model is too large to open in Netron.

usage: onnx2tfevents.py [-h] [--logdir LOGDIR] [--model MODEL]
2023-01-28 15:09:15 +08:00
Yulong Wang
d9219685ad
always set OpSchema in CreateNodeHelper() (#14356)
### Description
as a more generic solution to #13660, always set OpSchema in
CreateNodeHelper() so that added nodes by transformers will have
OpSchema set

2023-01-27 16:56:14 -08:00
dependabot[bot]
b5b70eaa8c
Bump ua-parser-js from 0.7.31 to 0.7.33 in /js/web (#14435) 2023-01-27 23:22:48 +00:00
Zhang Lei
f87dd408f6
Support long sequence in attention (#14371)
Support long sequence in attention operator for (1) raw mask of 2/3/4-D,
(2) no mask.
Set longer greedy search max length.
2023-01-27 09:39:09 -08:00
shalvamist
368d2fc11e
Added E2E test for Image Tensor API (#14406)
### Description
Added E2E test, currently covering:
URL --> Tensor
ImageData --> Tensor
HTML Image Element --> Tensor
Tensor --> ImageData

---------

Co-authored-by: shalvamist <shalva.mist@microsoft.com>
2023-01-27 08:54:27 -08:00
Wei-Sheng Chin
4ef64f3681
Fix warning c26409 (#14079)
We should avoid using `new` and `delete` in C/C++ code whenever possible
as suggested by VC compiler.
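
A minimal sketch of the pattern this warning points toward. The names `Buffer`, `MakeBuffer`, and `BufferSize` are hypothetical and do not come from the ORT code base; the changes in the PR itself may look different. Ownership is held by `std::unique_ptr`, so neither `new` nor `delete` appears explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical type used only for illustration.
struct Buffer {
  explicit Buffer(std::size_t n) : data(n, 0.0f) {}
  std::vector<float> data;
};

// Ownership is expressed in the return type; no matching `delete` is needed.
std::unique_ptr<Buffer> MakeBuffer(std::size_t n) {
  return std::make_unique<Buffer>(n);
}

std::size_t BufferSize(std::size_t n) {
  auto buffer = MakeBuffer(n);  // released automatically at scope exit
  assert(buffer != nullptr);
  return buffer->data.size();
}
```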
2023-01-26 15:43:53 -08:00
Yulong Wang
de11527d76
[js] fix js/web bundle (#14434)
### Description
make sure "crypto" is not processed by webpack for browser configuration
2023-01-26 14:43:09 -08:00
Rui Ren
eacd829d23
Bump ORT version number (#14226)
### Description
Bump ort version after the creation of release candidate of 1.14

Co-authored-by: ruiren <ruiren@microsoft.com>
2023-01-26 12:33:47 -08:00
Ye Wang
d9c744ed9a
Fix a bug in t5 beamsearch with half precision (#14436)
the CreateEncoderInputs functor was passed to the ctor as nullptr when
type is MLFloat16.

2023-01-26 11:14:22 -08:00
liqun Fu
2b1a59f01a
cpu support of LpPool(18) (#14205)
Signed-off-by: Liqun Fu <liqfu@microsoft.com>

### Description
To support LpPool (18)



### Motivation and Context
for Ort 1.14 release

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
2023-01-25 23:14:56 -08:00
Sumit Agarwal
edb377f2cb
[DML EP] Upgrade DML to 1.10.1 (#14433)
### Description
Updated DirectML version to 1.10.1
(https://www.nuget.org/packages/Microsoft.AI.DirectML/1.10.1)



2023-01-25 21:07:10 -08:00
Pranav Sharma
3b8dfe2e27
Don't use free to satisfy Prefast requirements (#14354)
### Description
Don't use free to satisfy Prefast requirements

### Motivation and Context
Fix ADO#9004
2023-01-25 18:50:18 -08:00
Yulong Wang
4d9ddb5193
[js] upgrade packages in js/web/test/e2e (#14334)
### Description
upgrade versions to latest to avoid security vulnerabilities.
2023-01-25 18:03:48 -08:00
Thiago Crepaldi
32c05fcdd1
Add Col2Im CPU op (#12311)
**Description**
This PR implements N-dimensional Col2Im as a contrib CPU Op as specified
by ONNX's https://github.com/onnx/onnx/pull/3948

**Motivation and Context**
- Col2Im enables models such as:
  - [SS-DCNet](https://github.com/xhp-hust-2018-2011/SS-DCNet)
  - [DSTT](https://github.com/ruiliu-ai/DSTT)
- It also serves to document the ORT's obscure `math::Col2ImNd` utility

Signed-off-by: Liqun Fu <liqfu@microsoft.com>
Co-authored-by: Liqun Fu <liqfu@microsoft.com>
2023-01-25 12:23:00 -08:00
Tianlei Wu
94b1791974
Upgrade CUTLASS to v2.11 and add sequence length threshold for cutlass FMHA (#14401)
### Description
Add sequence length threshold for triggering cutlass FMHA in FP32. See
performance test results in
https://github.com/microsoft/onnxruntime/pull/14343 to see how this
threshold is selected.

Upgrade cutlass to v2.11 and update deps.txt and cgmanifest for nuget
pipeline build (test build:
https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=268574&view=results)
2023-01-25 09:43:48 -08:00
Edward Chen
7cc9aed314
Android package custom build script update (#14403)
Update Android package custom build script.
- Use later version of various dependencies (CMake, JDK, Android command line tools, Android NDK, Ubuntu). The CMake version was too old for the current ORT code.
- Do in-container build in a directory that is not shared with the host. Resolves some file permission issues and speeds up file access.

Add a nightly build to make sure the script works with the latest ORT.
2023-01-25 09:19:05 -08:00
Edward Chen
3bc092b1ea
Update ORT format v5 change docs to cover limited backwards compatibility in 1.14. (#14413) 2023-01-25 08:23:12 -08:00
Adrian Lizarraga
85d7e9c596
Fix unused variable for CUDA EP builds with USE_FLASH_ATTENTION off (#14404)
### Description
Fixes unused `use_memory_efficient_attention` variable in
contrib_ops/cuda/bert/attention_impl.cu.



### Motivation and Context
ORT with CUDA version < 11.6 fails to build for release configurations
due to an unused variable.

```shell
c:\...\onnxruntime\onnxruntime\contrib_ops\cuda\bert\attention_impl.cu(420): error : variable "use_memory_efficient_attention" was declared but never referenced [C:\...\onnxruntime\build\Windows\RelWithDebInfo\onnx
runtime_providers_cuda.vcxproj]
            detected during instantiation of "onnxruntime::common::Status onnxruntime::contrib::cuda::QkvToContext(const cudaDeviceProp &, cublasHandle_t &, cudaStream_t, onnxruntime::contrib::AttentionParameters &, onnxruntime::contrib::cuda::AttentionData<T> &) [wit
  h T=float]"
  (923): here
```

This happens for CUDA < 11.6. Our cmake script turns off
onnxruntime_USE_FLASH_ATTENTION for CUDA < 11.6, which leaves the
aforementioned variable unused outside of asserts (which are removed in
release builds).
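
The failure mode described above can be reproduced in a few lines. The sketch below is not the fix applied in this PR; it only shows one common way to keep such a variable warning-free. `SelectKernel`, `HYPOTHETICAL_USE_FLASH_ATTENTION`, and the threshold of 256 are invented for illustration.

```cpp
#include <cassert>

// Hypothetical stand-in for a flag that is only consumed when a feature macro is set.
int SelectKernel(int sequence_length) {
  // [[maybe_unused]] (C++17) suppresses "declared but never referenced" when the
  // only remaining use is inside assert(), which expands to nothing under NDEBUG.
  [[maybe_unused]] const bool use_efficient_path = sequence_length > 256;
#if defined(HYPOTHETICAL_USE_FLASH_ATTENTION)
  if (use_efficient_path) return 1;  // feature compiled in: take the fast path
#else
  assert(!use_efficient_path || sequence_length > 256);  // debug-only use
#endif
  return 0;
}
```

With the feature macro undefined and `NDEBUG` set, the variable has no remaining uses, which is the situation that breaks a warnings-as-errors build when the attribute is absent.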

The USE_FLASH_ATTENTION option was added by
https://github.com/microsoft/onnxruntime/pull/14343
2023-01-24 09:31:57 -08:00
Edward Chen
3c1ef7dee6
Fix CI build with no Abseil. (#14400)
Use '||' instead of 'or' in onnxruntime/core/optimizer/attention_fusion_helper.h.
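
The commit does not say why `or` broke only this configuration. As background, and as an assumption about the cause: `or` is a standard alternative token for `||`, but MSVC accepts it only in conforming mode (`/permissive-`) or when `<iso646.h>` has been included, so code can compile by accident when some other header pulls that in. The function below is a hypothetical example, not code from `attention_fusion_helper.h`.

```cpp
#include <cassert>

// Spelling the operator as `||` compiles in every compiler mode and does not
// depend on <iso646.h> being included somewhere upstream.
bool IsOutOfRange(int value, int low, int high) {
  assert(low <= high);
  return value < low || value > high;
}
```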
2023-01-24 09:17:35 -08:00
Kevin Chen
81120e9e8b
Add custom tolerance option for onnx_test_runner (#13683)
Signed-off-by: Kevin Chen <kevinch@nvidia.com>

### Description
Add a `-t` option for `onnx_test_runner` to allow users to specify
custom tolerance values when running ONNX models.


### Motivation and Context
For some backends, the default tolerance of 1e-5 is too tight to pass
accuracy checks with ONNX model zoo reference values, especially if only
one or two values are mismatched. Having a custom option will allow
different backends to specify their own custom tolerance when running
these models.
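
To make the effect of a tolerance setting concrete, here is a generic sketch of an element-wise comparison. It is an assumption about the general technique, not the comparison code used by `onnx_test_runner`; the function `AllClose` and its parameters are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns true when every element of `actual` is within
// abs_tol + rel_tol * |expected| of the corresponding reference value.
bool AllClose(const std::vector<float>& actual, const std::vector<float>& expected,
              double abs_tol, double rel_tol) {
  assert(abs_tol >= 0.0 && rel_tol >= 0.0);
  if (actual.size() != expected.size()) return false;
  for (std::size_t i = 0; i < actual.size(); ++i) {
    const double ref = static_cast<double>(expected[i]);
    const double diff = std::fabs(static_cast<double>(actual[i]) - ref);
    const double limit = abs_tol + rel_tol * std::fabs(ref);
    if (!(diff <= limit)) return false;  // written this way so NaN also fails
  }
  return true;
}
```

With a single value that is off by roughly 5e-5, a check at 1e-5 fails while a check at 1e-3 passes, which is the situation the motivation describes.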

Signed-off-by: Kevin Chen <kevinch@nvidia.com>
2023-01-23 16:42:36 -08:00
liqun Fu
7b6d880b28
cpu to support bitwise ops (#14197) 2023-01-23 16:42:18 -08:00
sfatimar
77b455b969
Ort openvino 4.3 cli (#14341)
### Description
Introduce cache_dir CLI for graph serialisation.
Replace existing use_compile_network and blob_dump_path cli options for
openvino with a single command line option "cache_dir" specifying the
path that needs to be passed for blob dump/load improving the developer
experience.

### Motivation and Context
Previously two options had to be set for the cache directory, which was unnecessary.

Co-authored-by: Preetha <preetha.veeramalai@intel.com>
2023-01-23 14:17:52 -08:00
Scott McKay
c252a7f992
Remove exclusions for ONNX model tests that now pass. (#14337)
### Description
Remove exclusions for ONNX model tests that now pass due to kernels
being implemented.
Update ONNX update doc to point to correct location for tests.

### Motivation and Context
Run as many tests as possible.

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2023-01-24 08:04:27 +10:00
liqun Fu
05915d8393
support Pad(18) (#14219) 2023-01-23 12:14:35 -08:00
Hector Li
f03c507cf0
Fix fuzz test (#14385)
Fix fuzz test
2023-01-22 22:17:43 -08:00
Nat Kershaw (MSFT)
abaed6f474
Add link to Python API examples (#14345) 2023-01-21 16:23:16 -08:00
Tianlei Wu
a95fcb4345
UNet fusion and fp16 conversion for stable diffusion (#14248)
Add script to fuse nodes to optimized operators in stable diffusion 1.5
models, and a script to convert fp32 models to fp16 models. Tested with
stable diffusion 1.5.

Note that the optimized model needs onnxruntime-gpu v1.14 (release candidate
will be available soon).

Note: We will update the script to work with latest diffusers and stable
diffusion v2 and v2.1 models.
2023-01-21 10:16:44 -08:00
Nat Kershaw (MSFT)
e57c312f9d
Pin sphinx to avoid broken link (#14383) 2023-01-21 09:50:56 -08:00
Yi Zhang
cf3661ff6d
Revert "Allow PostAnalysis@2 task to continue on error for Windows_Packaging_CPU_x86_default (#14332)" (#14375)

This reverts commit a491f33f54.

### Motivation and Context
It looks like an ADO issue.
It has now recovered, so the task can be re-enabled.
2023-01-21 09:32:39 +08:00
Nat Kershaw (MSFT)
0d40119624
Fix broken link (#14368)
Fixes #11661
2023-01-20 15:55:03 -08:00
Ye Wang
de7a868d5f
Update quantization_defs.cc (#14380)
2023-01-20 15:03:50 -08:00
Hariharan Seshadri
2d8ee5251c
Misc transformer fixes - 3 (#14320) 2023-01-20 13:57:57 -08:00
kunal-vaishnavi
72821a6113
Add PyTorch 2.0 to ORT transformer benchmarking (#14300)
### Description
This PR adds PyTorch 2.0 as an option when running the ORT transformer
benchmarking script.


### Motivation and Context
PyTorch released [PyTorch
2.0](https://pytorch.org/get-started/pytorch-2.0/) in the nightly
binaries and a stable release of PyTorch 2.0 is expected in March 2023.
2023-01-20 12:50:53 -08:00
Tianlei Wu
414b012f42
Add memory efficient attention from CUTLASS (#14343)
### Description
Add memory efficient attention from CUTLASS.

TODO (in next pull request): 
(1) Need performance tests on different GPUs, then add a sequence length
threshold (only activate it for long sequence length).
(2) Merge changes from https://github.com/NVIDIA/cutlass/pull/773 when
it is in cutlass master.
2023-01-20 12:33:01 -08:00
Zhang Lei
e64f357ad4
Fix some prefast checking found problems. (#14342)
Fix : BUG 8989, BUG 9014
2023-01-20 11:04:52 -08:00
Edward Chen
3b382ea7e1
Free OrtStatus in ASSERT_ORT_STATUS_OK, make run_android_emulator.py work with newer JDK version (#14369)
- Free OrtStatus in ASSERT_ORT_STATUS_OK in model_tests.cc
- Make run_android_emulator.py work with newer JDK version
2023-01-20 09:27:47 -08:00
cao lei
22fdc31667
remove unnecessary waitOnEPStep when current node and the consumer node are in the same stream (#14173)
### Description
Remove the unnecessary WaitOnEPStep if the current operator node and its
consumer are in the same stream while there are notifications filed in
the current node



### Motivation and Context
In the current code, the WaitOnEPStep is always launched as long as a
notification is filed in the input node, regardless of whether the
current node and the input node are in the same stream, which is
unnecessary.
This PR removes the WaitOnEPStep for the same-stream case.

Co-authored-by: Lei Cao <leca@microsoft.com>
2023-01-20 07:35:15 -08:00
Kyushick Lee
cd24f0794a
Extend ort_backend.py for another ep (#14349)
### Description
This PR extends OrtBackend to allow for configuring an EP based on the
name, and falls back to the existing mechanism that infers the EP based on
tensor affinity if nothing is provided.

### Motivation and Context
Currently OrtBackend needs `get_ort_device()` with the device tag
inferred from torch.Tensor, but the ort device is not yet supported for
dort. The change allows running dort with a supported EP, by configuring
dort with a desired EP and letting dort (ort InferenceSession) take
CPU-affined PyTorch Tensors as inputs and then inject data transfer nodes
internally.
2023-01-20 07:30:00 -08:00
Yi Zhang
3d6cea14f4
Remove intermediate obj files once build finished (#14361)
### Description
Remove intermediate obj files and re-enable cache

### Motivation and Context
Recently, the training_debug_x64 pipeline has often failed due to
insufficient disk space.
Deleting obj files frees nearly 8 GB of space,
so the compilation cache can be re-enabled.
2023-01-20 13:37:15 +08:00
Ye Wang
668586e8f8
Support muP in Attention (#14348)
Co-authored-by: Ubuntu <wy@v100-2.0cdb2e52twzevn1i4fi45bylyg.jx.internal.cloudapp.net>
2023-01-19 20:36:55 -08:00
Tianlei Wu
1dd07d147d
fix windows build error (#14362)
### Description
Fix https://github.com/microsoft/onnxruntime/issues/14359

test\greedy_search_top_one.cc(21,44): warning C4244: '=': conversion from 'int32_t' to '_Ty', possible loss of data [C:\Users\11000978\onnxruntime\build\Windows\Debug\onnxruntime_providers_cuda.vcxproj]



2023-01-19 18:20:46 -08:00
Wei-Sheng Chin
432a9912a3
Fix LORT CI failure due to PyTorch change (#14367)
As title. The fuser in LORT doesn't like "scalar". With a recent PyTorch
change, a scalar is introduced somewhere it was not present before. For now, a
simple fix is to check whether all inputs are tensors or some specially
allowed cases before sending ops to ORT.
2023-01-19 16:02:40 -08:00
RandySheriffH
36ba3d8d21
Exclude a multi-stream case from reduced ops build (#14351)
Exclude a multi-stream case from reduced ops build to unblock
[pipeline](https://dev.azure.com/onnxruntime/onnxruntime/_build?definitionId=120&_a=summary).

Co-authored-by: Randy Shuai <rashuai@microsoft.com>
2023-01-19 14:39:25 -08:00