Commit graph

6152 commits

Author SHA1 Message Date
Edward Chen
3bc91c2151
Move reduced ops files into build directory (#10030)
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
2021-12-28 19:04:20 -08:00
Scott McKay
a367f0664d
Starting with Python 3.8, the current directory must be added explicitly for libraries to be loaded from it. Update onnxruntime_test_python.py to handle that. (#10129) 2021-12-28 16:10:26 +10:00
George Wu
3d6786c92e
update tensorrt multi gpu pipeline to tensorrt 8.2 (#10141) 2021-12-27 15:43:27 -08:00
Vincent Wang
ceb17f82ff
Use FusedMatMul When Transpose is Between First Dim and Contiguous Batch Dims (#9734)
* fusedmatmul support transpose batches

* fix win build

* fix contrib op md

* more comments
2021-12-27 10:49:46 +08:00
Vincent Wang
f780f06240
ConcatGrad for OpSet13 (#10109) 2021-12-24 10:02:52 +08:00
stevenlix
05d20343ee
Remove duplicated constant initializer copies for TensorRT nodes (#10105)
* add new field constant_initializers in metadef and remove constant initializers from trt node inputs

* remove redundancy

* use GetConstantInitializer() to get constant initializers

* add ORT_ENFORCE check

Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
2021-12-22 12:19:56 -08:00
Sheil Kumar
ce1a9ca618
Fix Microsoft.AI.MachineLearning NuGet App failure with multiple binaries copied to same destination (#10076)
* Include onnxruntime binary when not using a package reference or UAP app.

* Remove the lib\uap10.0 build from the NuGet package, since it caused conflicts

* Add UWP test

* remove build files

* remove local change

* reset mimalloc and onnx-tensorrt

* change username to Microsoft

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-12-21 12:34:03 -08:00
Ye Wang
7a1bdc2052
Don't check cache shape when using dynamic axis (#10090)
Co-authored-by: Ubuntu <wy@linux-v100.aidmrjtolptuzevavgwhrapqcd.jx.internal.cloudapp.net>
2021-12-20 21:19:29 -08:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
satyajandhyala
bd4fb4c5da
Coding style fix. (#10080) 2021-12-18 12:05:48 -08:00
ashari4
cdbd678192
Check whether kMSDomain already exists before registering it (#10078)
* Check domain before registration
2021-12-17 17:55:15 -08:00
Yufeng Li
12ee2e942f
add int8_t for Resize (#10067)
Because quantization now supports the s8s8 format, Resize also needs to support int8_t.
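A minimal sketch of why int8_t support is largely a matter of instantiating the kernel for another element type. It assumes nearest-neighbor mode with the "asymmetric" coordinate transform and floor rounding, where the output only copies existing input elements, so no type-specific arithmetic is involved. `ResizeNearest1D` is a hypothetical helper, not the ONNX Runtime Resize kernel. Linear modes would need extra care with int8_t rounding and saturation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 1-D nearest-neighbor resize (not the ORT kernel).
// Works unchanged for uint8_t (u8 activations) and int8_t (s8 activations).
template <typename T>
std::vector<T> ResizeNearest1D(const std::vector<T>& input, std::size_t output_size) {
  assert(!input.empty() || output_size == 0);  // need at least one element to sample
  std::vector<T> output(output_size);
  if (output_size == 0) return output;
  // Scale along the resized axis: output_size / input_size.
  const float scale = static_cast<float>(output_size) / static_cast<float>(input.size());
  for (std::size_t i = 0; i < output_size; ++i) {
    // "asymmetric" transform: x_original = x_resized / scale, then round down.
    auto src = static_cast<std::size_t>(std::floor(static_cast<float>(i) / scale));
    if (src >= input.size()) src = input.size() - 1;  // guard against float rounding
    output[i] = input[src];  // plain copy: no int8_t-specific math needed
  }
  return output;
}
```

For example, upsampling `{-128, -1, 127}` to six elements repeats each int8_t value twice.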
2021-12-17 15:36:09 -08:00
Moshe David
4fd85cd97a
Fix broken link to TRT doc in exception details (#9496)
Co-authored-by: Moshe <modav@microsoft.com>
2021-12-17 09:00:33 -08:00
Faith Xu
d42feae042
Add citation file (#10061)
* Add citation file

* Fix typos
2021-12-16 19:56:21 -08:00
Guoyu Wang
f3c72de718
[QDQ] Add shared NodeUnit class (#10052)
* initial change

* move more functions to node_unit

* Remove commented code

* Minor update

* Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder.cc

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* address CR comments

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2021-12-16 17:37:51 -08:00
Tianlei Wu
ef36488df0
Add BeamSearch operator for GPT-2 decoding (#9680)
* Add BeamSearch operator and CPU implementation
* Add ONNX conversion script
2021-12-16 16:08:05 -08:00
Yufeng Li
fab39b4704
Update optimization level message in perf_test tool (#9972) 2021-12-16 13:49:18 -08:00
Bowen Bao
102f9b05e1
Support new symbolic function api from PyTorch with PythonOp (#9880)
* Support new symbolic function api from PyTorch with PythonOp

* Specify exact exception

* add comments

* move comments and arg
2021-12-16 11:08:06 -05:00
George Nash
93636cbd20
Reduce ops for DNNL ep (#10056)
* Add Reduce Ops to DNNL ep

Combine the Reduction ops into one class

Add ReduceL1, ReduceL2, ReduceSum, ReduceMax, ReduceMin, ReduceProd,
ReduceSumSquare, ReduceLogSum, and ReduceLogSumExp

Reduce code now also handles the keepdims attribute

Also updated code to use HandleNegativeAxis function from
the providers/common.h code instead of manually calculating.

In-code documentation explains the more complex parts of the reduction op code.

Add elementwise ops to the Reduction op capability code and remove the keepdims
check from it.

Updated the error_tolerance for LogGrad (DNNL EP only) after finding a few
instances where the tests were slightly out of tolerance.

Signed-off-by: George Nash <george.nash@intel.com>

* Documentation cleanup in dnnl_qattention

Cleaned up the comments documenting the QAttention operator.
A number of extra newlines had been introduced into the
comment, making it harder to read.

Signed-off-by: George Nash <george.nash@intel.com>
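The keepdims and negative-axis handling described in this commit can be sketched as a shape computation. This sketch makes the following assumptions:

* `NormalizeAxis` is a hypothetical stand-in for the HandleNegativeAxis helper in providers/common.h. Its real signature and error handling may differ.
* `ReducedShape` is illustrative, not DNNL EP code.
* An empty `axes` list means "reduce over every dimension", which is the ONNX default when `noop_with_empty_axes` is 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical stand-in for HandleNegativeAxis: maps an axis in
// [-rank, rank - 1] to [0, rank - 1].
int64_t NormalizeAxis(int64_t axis, int64_t rank) {
  assert(axis >= -rank && axis < rank);  // the real helper enforces a similar check
  return axis < 0 ? axis + rank : axis;
}

// Output shape of a Reduce* op over `axes` (empty = all dims).
// keepdims = true keeps each reduced dim as size 1; false drops it.
std::vector<int64_t> ReducedShape(const std::vector<int64_t>& input_shape,
                                  const std::vector<int64_t>& axes, bool keepdims) {
  const auto rank = static_cast<int64_t>(input_shape.size());
  std::set<int64_t> reduced;
  for (int64_t a : axes) reduced.insert(NormalizeAxis(a, rank));
  std::vector<int64_t> out;
  for (int64_t d = 0; d < rank; ++d) {
    const bool is_reduced = axes.empty() || reduced.count(d) > 0;
    if (!is_reduced) {
      out.push_back(input_shape[static_cast<std::size_t>(d)]);
    } else if (keepdims) {
      out.push_back(1);
    }
  }
  return out;
}
```

For example, reducing a `{2, 3, 4}` tensor over axis `-1` gives `{2, 3, 1}` with keepdims and `{2, 3}` without it.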
2021-12-16 07:31:16 -08:00
Changming Sun
44c701192b
Revert a bad change in bfc_arena.cc (#10057) 2021-12-15 23:38:45 -08:00
Tang, Cheng
6357c12977
use inplace reshape (#9991)
Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
2021-12-15 21:17:29 -08:00
George Nash
7922a8c22f
Optimize Convolution op when using DNNL EP (#10051)
If the group attribute is 1, allow the oneDNN library to optimize the memory
layout for the device the Convolution operator runs on.

Without this optimization, the default NCHW memory layout is used; on CPUs,
the NCHW memory layout can cause a significant performance
decrease.

Signed-off-by: George Nash <george.nash@intel.com>
2021-12-15 20:28:34 -08:00
Edward Chen
3466ee45a3
Add hash value typedef. (#9710)
Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.
2021-12-15 19:07:17 -08:00
Chih-Hsuan Yen
4e73cc83d6
Fix building DNNL EP with clang (#10014)
Before this change, building DNNL EP from onnxruntime 1.10.0 with clang failed with:

In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc:4:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.h:5:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_subgraph.h:10:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/shared_library/provider_api.h:19:
In file included from /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/common.h:36:
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:33:6: error: call to function 'operator<<' that is neither visible in the template definition nor found by argument-dependent lookup
  ss << t;
     ^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<std::vector<long>>' requested here
  MakeStringImpl(ss, args...);
  ^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<long, const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, long, const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<unsigned long, const char *, long, const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:46:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, unsigned long, const char *, long, const char *, std::vector<long>>' requested here
  MakeStringImpl(ss, args...);
  ^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:93:18: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, unsigned long, const char *, long, const char *, std::vector<long>>' requested here
  return detail::MakeStringImpl(detail::if_char_array_make_ptr_t<Args const&>(args)...);
                 ^
/build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc:46:7: note: in instantiation of function template specialization 'onnxruntime::MakeString<char [20], unsigned long, char [23], long, char [9], std::vector<long>>' requested here
      ORT_ENFORCE(data_dims[i] == 1, "Dimension of input ", i, " must be 1 instead of ", data_dims[i],
      ^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/common.h:184:64: note: expanded from macro 'ORT_ENFORCE'
                                                ::onnxruntime::MakeString(__VA_ARGS__)); \
                                                               ^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:147:15: note: 'operator<<' should be declared prior to the call site
std::ostream& operator<<(std::ostream& out, const TensorShape& shape);
              ^
1 error generated.
make[2]: *** [CMakeFiles/onnxruntime_providers_dnnl.dir/build.make:384: CMakeFiles/onnxruntime_providers_dnnl.dir/build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc.o] Error 1

Both phases of two-phase lookup fail:

1. Visible in the template definition: fails because `std::ostream& operator<<(std::ostream& out, const TensorShape& shape)` (from include/onnxruntime/core/framework/tensor_shape.h) is declared after `template <typename... Args> std::string MakeString(const Args&... args)` (from include/onnxruntime/core/common/make_string.h), as the `clang++ -E` output shows
2. Argument-dependent lookup: fails because the argument data_dims has type `std::vector<long>` (via a typedef in dnnl.hpp), while `std::ostream& operator<<(std::ostream& out, const TensorShape& shape)` is in namespace onnxruntime instead of std

There are several possible fixes:

* Make operator<< appear before MakeString by adjusting the order of header files: I consider this fragile
* Also define operator<< in namespace std: may result in namespace pollution
* Pass an argument whose type is a class in the onnxruntime namespace: the approach taken in this commit
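The lookup failure and the chosen fix can be reproduced in a self-contained sketch. The names are hypothetical stand-ins:

* `demo` stands in for the onnxruntime namespace.
* `MakeString` stands in for `onnxruntime::MakeString`. It uses a C++17 fold expression instead of the recursive `MakeStringImpl` helpers.
* `Shape` stands in for `TensorShape`.

The `operator<<` for `Shape` is declared after the template, so it is not visible at the template definition. Argument-dependent lookup still finds it at instantiation, because `Shape` lives in namespace `demo`. Passing a bare `std::vector<int64_t>` instead would make ADL search only namespace `std`, and the call would fail to compile just like the error above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

namespace demo {  // hypothetical stand-in for namespace onnxruntime

// Defined before any operator<< for Shape exists, so the `ss << args` call
// below can only reach demo::operator<< through ADL at instantiation time.
template <typename... Args>
std::string MakeString(const Args&... args) {
  std::ostringstream ss;
  (ss << ... << args);  // C++17 left fold: ((ss << a1) << a2) ...
  return ss.str();
}

// A class in namespace demo: ADL on a Shape argument searches namespace demo.
struct Shape {
  std::vector<int64_t> dims;
};

// Declared after MakeString, mirroring the header order in the clang error.
std::ostream& operator<<(std::ostream& out, const Shape& shape) {
  out << '{';
  for (std::size_t i = 0; i < shape.dims.size(); ++i) out << (i ? "," : "") << shape.dims[i];
  return out << '}';
}

}  // namespace demo
```

For example, `demo::MakeString("dims: ", demo::Shape{{2, 3}})` produces `"dims: {2,3}"`.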
2021-12-15 17:08:57 -08:00
Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support of external data for onnx model

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options; get output shapes for fixed input shapes via the TVM API

* support AUTO_TVM tuning log file inside ORT; the selector between Ansor and Auto_TVM is a provider option (tuning_type)

* add fp16

* add conversion of the model layout to NHWC if needed. The necessary parameter was added to the STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unnecessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for the future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
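A minimal sketch of the precedence rule in the "get tuning log file from ep options" bullet. It assumes that AUTOTVM_TUNING_LOG is read as an environment variable. `ResolveTuningLogFile` is a hypothetical helper, not the STVM EP's actual option parsing.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical helper (not STVM EP source). Precedence:
// 1. a tuning-log path given in the provider options;
// 2. otherwise, the AUTOTVM_TUNING_LOG environment variable (assumed);
// 3. otherwise, an empty string, meaning no tuning log.
std::string ResolveTuningLogFile(const std::string& option_value) {
  if (!option_value.empty()) return option_value;
  if (const char* env = std::getenv("AUTOTVM_TUNING_LOG")) return env;
  return {};
}
```

With this ordering, a per-session option can override a machine-wide environment setting.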
2021-12-15 16:59:20 -08:00
George Wu
16274beb6f
update TensorRT EP to use TensorRT 8.2 (#9981)
* update base image from 11.4.0 to 11.4.2

* update Linux TRT GPU pipeline to TRT 8.2

* update onnx-tensorrt to 8.2-GA

* disable failing TensorRT 8.2 tests.

* update pad test.

* fix

* update win trt ci pipeline to trt 8.2

* test run with cuda 11.4 and cudnn 8.2

* increase timeout

* revert

* revert

* update packaging pipelines to use trt 8.2

* fix typo

* update trt gpu perf pipeline to trt 8.2

* increase timeout

* delete deprecated ci-perf-pipeline.yml

* bump timeout

* adjust timeout packaging
2021-12-15 15:59:31 -08:00
Yufeng Li
ee975de77b
reorganize quantization files (#10023)
* reorganize quantization files
2021-12-15 15:45:04 -08:00
Edward Chen
6cdab06255
Enable argument files in build.py. (#10040) 2021-12-15 08:22:15 -08:00
Changming Sun
20f8a06f1f
Remove OpenMP code (#10032) 2021-12-15 00:58:42 -08:00
jingyanwangms
8043a9facc
Bump master version to 1.11 (#9957)
* Bump master version to 1.11

* Update Windows AI version

* update version in onnxruntime_c_api.cc
2021-12-14 23:32:06 -08:00
Changming Sun
91096781c3
A small fix to allocators (#10042) 2021-12-14 21:21:07 -08:00
Changming Sun
9d9ebd3b85
Fix some static analysis warnings in the core framework (#10033) 2021-12-14 14:41:42 -08:00
Changming Sun
e0a0f385bb
Fix some warnings in mlas (#10034) 2021-12-14 14:41:11 -08:00
ashari4
af71da0ac6
Yield op supports bf16 (#10035) 2021-12-14 13:12:37 -08:00
Ye Wang
703becd796
Fix a bug in fusion_embedlayer.py (#10022) 2021-12-14 12:50:35 -08:00
Ginés Hidalgo
5be0fa13c0
[DML] Fixed a major bug in ORT_NO_EXCEPTIONS handling for the DML back end: the check was reversed 2021-12-14 10:17:06 -08:00
ashari4
9e04b7e59b
Remove memcpy in in-place ATen ops (#9913)
* Make ops in-place

* Add comment
2021-12-14 08:28:12 -08:00
Vincent Wang
a7c2d1cb09
bf16 for dlpack (#10016) 2021-12-14 13:34:14 +08:00
Chen Fu
cd0af7ad44
Symmetric quantized convolution kernel ARM64 (#9772)
Adding a symmetric quantized convolution kernel for ARM64

Note:
Indirect convolution performs worse for shallow convolutions (those with few input channels). This is especially true on low-end CPUs without dot-product instructions, where indirect convolution is faster only at 128 or more input channels. On CPUs with dot-product support, it is already faster at 32 input channels.
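The thresholds in the note can be read as a simple dispatch rule. The sketch below is hypothetical: the function name and the cut-offs at 32 and 128 input channels come from the note's description, not from the MLAS dispatch code, which may also weigh other factors.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical dispatch heuristic (not MLAS code): prefer indirect convolution
// only when the convolution is deep enough to benefit on this CPU class.
bool PreferIndirectConv(std::size_t input_channels, bool cpu_has_dot_product) {
  // Per the note: >= 32 channels on dot-product CPUs, >= 128 on older CPUs.
  const std::size_t min_channels = cpu_has_dot_product ? 32 : 128;
  return input_channels >= min_channels;
}
```

For example, a 64-channel convolution would use indirect convolution on a dot-product CPU but not on a pre-dot CPU.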

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-12-13 21:14:45 -08:00
Suffian Khan
7e55a942cd
Add torch 1.10 requirements for rocm (#10028) 2021-12-13 20:39:58 -08:00
Sunghoon
6de2a878cb
[js/react_native] Fix a broken manual build (#10012)
* Fix a broken manual build

* Keep the same file structures
2021-12-13 19:02:10 -08:00
Changming Sun
7b63d1102b
Fix some warnings in orttraining code (#10009) 2021-12-13 15:28:21 -08:00
Gani Nazirov
c82160bbd0
Add AtenOp at:bitwise_or (#9662)
* Add AtenOp at:bitwise_or

* Specify overload name for bitwise_or

* undo unnecessary import

* set output element type to BOOL

* Add broadcasting support

* Fix test

Co-authored-by: Gani Nazirov <ganaziro@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
Co-authored-by: Gani Nazirov <ganaziro@microsoft.com>
2021-12-13 14:36:15 -08:00
Rajalakshmi Srinivasaraghavan
ad99dff298
POWER10: Update builtins for DGEMM
This patch selects builtin names in DGEMM based on endianness.
It also changes some casting style in the SGEMM and DGEMM code for POWER10.
2021-12-13 21:43:01 +00:00
Edward Chen
5d821b5bd9
Address null dereference warning in div_grad_impl.cu. (#10010) 2021-12-13 13:26:56 -08:00
Abhishek Jindal
777a80fbc1
Abjindal/eager onnx operators fix (#9968)
* adding view operator changes

* adding the slice operator definition

* moving to opgen script for slice op and removing redundant steps in view op and reshape_copy

* adding for at definition

* adding for at::infer_size definition

* changing template style for reshape_copy to ensure int64_t type
2021-12-13 13:23:46 -08:00
George Nash
d0b08af37a
Implementation of QAttention for the DNNL execution provider (#10004)
* Add QAttention to DNNL EP

Add QAttention to DNNL EP (limited support; disabled for GPU)

update ONEDNN version to 2.4.4

bug fix in getcapability

add memory debug print

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Address Code Review + MatMulInteger Fix

clean up code and add comments

fix matmulinteger and add fusion rule to enable initialized vector weight zero
points of 0s

update DNNL_TAG to v2.5

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Linux Compile Fix + rollback ONEDNN to 2.4.4

Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com>

* Fix QAttention Debug build

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Fix QAttention build if USE_DNNL not specified

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>
2021-12-10 21:50:13 -08:00
Zhang Lei
787755328b
Add s8s8 for depthwise qconv 3x3 5x5 (#10008)
* Add depthwise conv s8s8 routine for 3x3 5x5 on arm64. And its testcase.

* fix some comments.
2021-12-10 17:52:51 -08:00
Nat Kershaw (MSFT)
b4434c7694
Automate generation of C/C++ API docs (#9997) 2021-12-10 17:45:50 -08:00
Zhang Lei
b000ec91cc
Add quantization tool and its unittest with s8s8 support (#10007)
* Add quantization tool with s8s8 support
  * Add unittest for existing s8s8 support operators
  * Add commented-out unittests, ready for the upcoming s8s8 operators (ConvInteger and Resize)
  * Minor change on quantization tools

* Use a different s8 min value depending on whether the tensor is a weight or an activation.

* use the same qmin for reduced-range s8.
2021-12-10 16:40:01 -08:00