Commit graph

975 commits

Author SHA1 Message Date
Weixing Zhang
ea9c8a7cdc
support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368)
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>

Support MIGraphXEP to work with ROCMEP for inference on AMD GPU
2022-01-26 15:52:56 -08:00
Yulong Wang
847801f5be
[wasm] update emscripten v2.0.34 (#10391) 2022-01-26 14:46:02 -08:00
Guoyu Wang
4af116649c
[QDQ] Hookup NNAPI GetCapability/Compile with shared QDQ selectors (#10347)
* add qdqgroup as input for NodeUnit

* minor update

* hookup nnapi_ep

* minor update

* update compiler setting

* Add a simple UT

* Pipeline change to add build minimal extended with NNAPI for Android

* move GetAllNodeUnits to node_unit.h, add UT for NodeUnits, minor updates

* minor updates

* address CR comments

Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>
2022-01-25 17:13:46 -08:00
Xavier Dupré
6e95c0316d
Builds onnxruntime + eager mode with the same value for _GLIBCXX_USE_CXX11_ABI as pytorch (#10114)
* add _GLIBCXX_USE_CXX11_ABI
* restrict to eager mode
2022-01-25 11:25:31 +01:00
Chen Fu
2afce4830c
Symmetric QGEMM (#10289)
Adding code for symmetric quantized matrix multiplication. Used in quantized convolution, achieving significant perf gain.

TODO, use Symmetric Quantized GEMM in other operators!

TODO address activation buffer overread in custom allocators and tensors supplied by users.

DOT kernel perf test:

Pixel 5a:

Cartoongan	513.539 ms	471.786 ms
Efficient	57.5169 ms	56.4174 ms
Edgetpu	14.6673 ms	13.5959 ms
NEON kernel perf test

Pixel 3a

Cartoongan	1423.53 ms	1069.92 ms
Efficient	114.086 ms	107.968 ms
Edgetpu	39.2632 ms	36.9839 ms


Co-authored-by: Chen Fu <fuchen@microsoft.com>
2022-01-24 10:49:04 -08:00
Dmitri Smirnov
7e092a7e3f
Reduce number of memory allocations based on a customer profiling case (#10193)
Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> to make interfaces accept different types of vector like args.
Introduce InineShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TesnorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TenshroShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Path with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
2022-01-24 10:40:46 -08:00
Abhishek Jindal
4aa7cee0d8
Abjindal/clean eager backend (#10055)
* clearing map for eager mode backends

* clearing map for eager mode backends manager

* making OrtBackendsManager an extern variable and trying to delete it

* cleaning backends manager when the python interpret exits

* adding ifdef for eager mode code

* disabling warning for pybind state file

* disabling warning for python module file

* running clang auto format and reducing redundancy

* remove new line

* moving declaration to a new header file

* adding the header file for eager mode for python module

* removing source files for eager mode

* add source file for python module in eager mode

* Update orttraining/orttraining/python/orttraining_python_module_eager.h

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>

Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
2022-01-19 14:20:09 -08:00
Sunghoon
b038f4e56f
Add a build option to create a WebAssembly static library (#10184)
* add p50 in test

* Add a build option to create a WebAssembly static library

Co-authored-by: Yulong Wang <yulongw@microsoft.com>
2022-01-18 18:05:04 -08:00
Rachel Guo
a099bd454b
[QDQ] Add shared qdq selectors (#10178)
* wip

* wip

* wip

* wip

* wip

* save

* minor changes

* update test graph name

* address pr comments

* update

* address pr comments

* address pr comments

* fix warning

* minor include fix

* update to nodegroupselectors

* delete unnecessary includes

Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>
2022-01-11 19:41:45 -08:00
Shucai Xiao
ce103ace93
Amdmigraphx fix build error (#9272)
* fix build error

* rename a missing api for the MIGraphX EP
2022-01-10 15:18:43 -08:00
Ye Wang
5ebb857501
Update onnxruntime_unittests.cmake (#10215) 2022-01-07 16:14:15 -08:00
Edward Chen
34c025109c
Exclude graph_runtime_optimization_test.cc from reduced ops build. (#10191) 2022-01-05 09:22:38 -08:00
Ye Wang
2803a9465d
Add example of registering custom cuda op as shared lib (#10025) 2022-01-05 09:22:15 -08:00
Edward Chen
792db33f01
Enable loading of ORT format model graph runtime optimizations (#9901)
Initial implementation of load/replay of runtime optimizations in an ORT format model.
2022-01-04 12:09:07 -08:00
Tongliang Liao
1d3b34cc92 Add .git suffix to github URL.
Although github works with both, this is more precise.
Having an extension also makes it easy to match with regex, when we want to inject code to reroute traffic to our own git mirror.
2022-01-03 14:38:35 -08:00
Yufeng Li
7208fcbe1c
use wasmscalar as default kernel (#9988)
* use wasmscalar as default kernel
2022-01-03 10:55:08 -08:00
Edward Chen
3bc91c2151
Move reduced ops files into build directory (#10030)
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
2021-12-28 19:04:20 -08:00
Changming Sun
4e9e01cb3c
Fix SDL warnings in CPU EP (#9975) 2021-12-19 20:54:29 -08:00
Guoyu Wang
f3c72de718
[QDQ] Add shared NodeUnit class (#10052)
* initial change

* move more function to node_unit

* Remove commented code

* Minor update

* Update onnxruntime/core/providers/nnapi/nnapi_builtin/builders/op_builder.cc

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>

* address CR comments

Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
2021-12-16 17:37:51 -08:00
Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support of external data for onnx model

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API

* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)

* add fp16

* add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unneccessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for the future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2021-12-15 16:59:20 -08:00
George Wu
16274beb6f
update TensorRT EP to use TensorRT 8.2 (#9981)
* update base image from 11.4.0 to 11.4.2

* update Linux TRT GPU pipeline to TRT 8.2

* update onnx-tensorrt to 8.2-GA

* disable failing TensorRT 8.2 tests.

* update pad test.

* fix

* update win trt ci pipeline to trt 8.2

* test run with cuda 11.4 and cudnn 8.2

* increase timeout

* revert

* revert

* update packaging pipelines to use trt 8.2

* fix typo

* update trt gpu perf pipeline to trt 8.2

* increase timeout

* delete deprecated ci-perf-pipeline.yml

* bump timeout

* adjust timeout packaging
2021-12-15 15:59:31 -08:00
Changming Sun
20f8a06f1f
Remove OpenMP code (#10032) 2021-12-15 00:58:42 -08:00
Chen Fu
cd0af7ad44
Symmetric quantized convolution kernel ARM64 (#9772)
Adding a symmetric quantized convolution kernel for ARM64

Note:
Indirect conv performs worse for shallow convs (input channels are small). This is much more so for low end pre-dot CPUs, where only 128 or deeper conv is faster with indirect conv. With DOT-CPUs, 32 deep conv is already faster

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-12-13 21:14:45 -08:00
George Nash
d0b08af37a
Implementation of QAttention for the DNNL execution provider (#10004)
* Add QAttention to DNNL EP

Add QAttention to DNNL EP (limited support and disable for gpu)

update ONEDNN version to 2.4.4

bug fix in getcapability

add memory debug print

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Address Code Review + MatMulInteger Fix

clean up code and add comments

fix matmulinteger and add fusion rule to enable initialized vector weight zero
points of 0s

update DNNL_TAG to v2.5

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Linux Compile Fix + rollback ONEDNN to 2.4.4

Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com>

* Fix QAttention Debug build

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Fix QAttention build if USE_DNNL not specified

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>
2021-12-10 21:50:13 -08:00
Chi Lo
4669048b47
Handle compiler warnings for TRT EP (#9956)
* fix error C4996

* remove wd4996 and fix error C4966

* fix typo

* remove wd4996 for onnx-tensorrt

* remove more /wd for onnx-tensorrt

* gix bug for strncpy_s of (Buffer is too small && 0)

* fix code to remove warning 4244

* fix code to remove warning 4267

* remove /wd4267 /wd4244

* fix bug

* change int to size_t

* using size_t instead of int

* use float instead of double

* Use size_t instead of int

* use size_t instead of int

* use size_t instead of int. Also fix typo
2021-12-09 15:33:52 -08:00
Dmitri Smirnov
a7abd541c7
Correct message type (#9973) 2021-12-09 10:00:44 -08:00
Patrik Vavercak
fb30e9fdae
Remove /safeseh link option from non-msvc builds (#9744) (#9935) 2021-12-08 11:44:00 -08:00
Yi-Hong Lyu
f60a287a64
Add __x86.get_pc_thunk.bx to avoid dependency (#9955) 2021-12-08 04:50:41 -08:00
Dmitri Smirnov
a7f649db7c
Enable proper override using MIMalloc (#9944)
Redirect memory allocations to MiMalloc and advance its version to v2.0.3
Refactor for a universal ifdef
2021-12-07 17:56:58 -08:00
Guoyu Wang
b34b991aea
Improve reduced ops and types build (#9908)
* Improve reduceops and types build

* minor update

* fix test error

* fix minimal build break

* minor update and add comments

* Address CR comments
2021-12-07 13:02:05 -08:00
Justin Stoecker
63c8889944
Restore arm64x onnxruntime binaries (#9950) 2021-12-07 12:39:46 -08:00
Yufeng Li
e613019174
add s8s8 support for quantized conv and gemm (#9902)
* add s8s8 support for quantized conv and gemm
2021-12-03 14:55:18 -08:00
Jeff Daily
8d88a6ac7f
add --amdgpu-target=gfx90a (#9820) 2021-12-01 22:28:52 -08:00
Abhishek Jindal
740679d329
Abjindal/fix windows ci pipeline (#9883)
* switching to /wd4800 for eager mode

* fixing compile flags ignore warnings, previously it was only using the last one
2021-11-30 10:33:13 -08:00
RandySheriffH
9345894c82
Add build option to enable cuda profiling (#9875) 2021-11-29 22:44:50 -08:00
Maajid khan
0ae0f29f14
[OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848)
* Changes to ensure openvino build go through in Windows

* Modified Hetero plugin Logic

*Modified Hetero Feature logic. In Hetero,
if the operator to be marked true in getcapability(),
it should be supported by either of the devices
specified with HETERO in the device_type.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV updated to 2021.4.2 version

* OV updated to 2021.4.2 version

* Updated OV to 2021.4.2 version, mono download  link and dotnet version

* Copying Managed nugets in openvino c# docker file

*Copying Managed nuget to nugets artifacts
directory

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
2021-11-23 13:12:08 -08:00
RajalakshmiSR
8564fc1933
POWER10: Add optimized dgemm kernel (#9652)
* POWER10: Add optimized dgemm kernel

This patch makes use of POWER10 matrix multiply assist feature and
adds new DGEMM kernel.

* Indentation update

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-11-22 20:28:21 -08:00
Dwayne Robinson
32419974ad Merge remote-tracking branch 'origin/master' into user/dwayner/DML1.8forORT1.10 2021-11-19 05:20:26 -08:00
Dwayne Robinson
e0ffc30a0b Update to 1.8.0 2021-11-19 04:44:32 -08:00
Zhang Lei
8ef6aff734
Zhalei/dwqconv3x3 5x5 arm64 (#9714)
* Arm64 Depthwise Convolution 3x3.

* Add 5x5 intrinsic dwqconv for arm64

* rebase to master, remove no-need logic after arm64 convsym enabled.

* Some more adjustment on the instrunction pipeling.

* Add specific test cases.

* Fix test dimension too small.

* Fix build warning as error on some CI.

* better format, etc.
2021-11-18 13:57:16 -08:00
Changming Sun
76715ad525
Delete ioscross code (#9793) 2021-11-18 11:31:13 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745) 2021-11-17 19:13:28 -08:00
Dwayne Robinson
99afb87a02 Update DirectML 1.5.1 to 1.8.0 for ORT1.10 2021-11-15 21:17:25 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* hanged OpenVINO to to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Indo

* Erronous commit

* Removing Extra logs

* Merge CMAKE

* Not copy from a  local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Chen Fu
1c84621020
Adding ARM64 depthwise convolution kernel for symmetric quantization (#9655)
Adding ARM64 depthwise convolution kernel for symmetric quantization

Motivation and Context
Two improvements against current kernel code :

1. Signed int8 based instructions, no need to extend from 8b to 16b before multiplication.
2. Unrolled loop with manual software pipelining

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-11-15 12:18:43 -08:00
Tang, Cheng
99257eb8e3
support build option to include external graph transformers (#9478)
* temp code

* support external graph transformer  from build script

* remove debug code

* add test case

* support register rewrite rule

* fix source_group issue if external source is not share any common prefix

* fix python code style checker

* resolve merge conflict

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-15 08:16:20 -08:00
Edward Chen
9f69d8bbae
Disable partial runtime optimization implementation by default (#9748)
* Only serialize runtime optimization records container if non-empty.

* Remove runtime optimizations from onnxruntime/core/flatbuffers/schema/README.md as it's not completely implemented yet.

* Disable partial runtime optimization implementation by default.
2021-11-12 17:37:29 -08:00
Sheil Kumar
a17bdaf725
Enable JoinModels API in WinML+RT Experimental API (#9746)
* Dynamic onnx model fusion

* empty node names shoudl remain empty

* comments and cleanup

* logic reversed for promoting_unlined_outputs

* PR feedback

* type

* typo

* fix model outputs with promote unlinked output

* remove disembodied model

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-11-12 16:56:31 -08:00
Edward Chen
997266a620
Add build.py option to disable ORT format model runtime optimization (#9723)
ORT format model runtime optimization implementation is in progress.
This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.
2021-11-11 18:05:45 -08:00
Tang, Cheng
6420530b3a
fix the mkl dependency for eager mode (#9702)
* explicit link with libtorch instead of use cmake var to avoid introduce mkl dependency

* use find_lib to get libtorch lib name

* temp fix

* add missing libraries

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-09 08:52:55 -08:00