Commit graph

956 commits

Author SHA1 Message Date
Valery Chernov
b327e89efa
Standalone TVM Executor Provider (#10019)
* squashed commit for standalone tvm execution provider

* critical fix for correct python build with stvm ep

* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG

* updates and fixes

* update parsing of stvm provider options

* add support of external data for onnx model

* add conditional dump of subgraphs

* remove unused code

* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API

* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)

* add fp16

* add functionality to convert the model layout to NHWC if needed. The necessary parameter was added to STVM provider options

* fix license text in header. fix log format

* small fixes

* fix issues from flake8

* remove model proto construction from GetCapability

* reserve memory for vector of DLTensors

* add simple tutorial for STVM EP

* STVM docs

* jroesch/tvm -> apache/tvm

* remove dead code, unnecessary logs and comments

* fix in readme

* improve tutorial notebook

* tvm update

* update STVM_EP.md

* fix default value

* update STVM_EP.md

* some TODOs for the future development

* shorten long lines

* add hyperlink to STVM_EP.md

* fix Linux CI error

* fix error in csharp test

Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
2021-12-15 16:59:20 -08:00
George Wu
16274beb6f
update TensorRT EP to use TensorRT 8.2 (#9981)
* update base image from 11.4.0 to 11.4.2

* update Linux TRT GPU pipeline to TRT 8.2

* update onnx-tensorrt to 8.2-GA

* disable failing TensorRT 8.2 tests.

* update pad test.

* fix

* update win trt ci pipeline to trt 8.2

* test run with cuda 11.4 and cudnn 8.2

* increase timeout

* revert

* revert

* update packaging pipelines to use trt 8.2

* fix typo

* update trt gpu perf pipeline to trt 8.2

* increase timeout

* delete deprecated ci-perf-pipeline.yml

* bump timeout

* adjust timeout packaging
2021-12-15 15:59:31 -08:00
Changming Sun
20f8a06f1f
Remove OpenMP code (#10032) 2021-12-15 00:58:42 -08:00
Chen Fu
cd0af7ad44
Symmetric quantized convolution kernel ARM64 (#9772)
Adding a symmetric quantized convolution kernel for ARM64

Note:
Indirect conv performs worse for shallow convs (those with few input channels). The effect is stronger on low-end pre-DOT CPUs, where indirect conv is faster only for convs 128 deep or more. On CPUs with DOT support, a 32-deep conv is already faster.

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-12-13 21:14:45 -08:00
George Nash
d0b08af37a
Implementation of QAttention for the DNNL execution provider (#10004)
* Add QAttention to DNNL EP

Add QAttention to DNNL EP (limited support and disable for gpu)

update ONEDNN version to 2.4.4

bug fix in getcapability

add memory debug print

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Address Code Review + MatMulInteger Fix

clean up code and add comments

fix matmulinteger and add fusion rule to enable initialized vector weight zero
points of 0s

update DNNL_TAG to v2.5

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Linux Compile Fix + rollback ONEDNN to 2.4.4

Signed-off-by: Zhaoyang Wang <zhaoyang.wang@intel.com>

* Fix QAttention Debug build

Signed-off-by: Wang <zhaoyang.wang@intel.com>

* Fix QAttention build if USE_DNNL not specified

Signed-off-by: George Nash <george.nash@intel.com>

Co-authored-by: Wang <zhaoyang.wang@intel.com>
Co-authored-by: MTC <63478620+jeyblu@users.noreply.github.com>
2021-12-10 21:50:13 -08:00
Chi Lo
4669048b47
Handle compiler warnings for TRT EP (#9956)
* fix error C4996

* remove wd4996 and fix error C4966

* fix typo

* remove wd4996 for onnx-tensorrt

* remove more /wd for onnx-tensorrt

* fix bug for strncpy_s of (Buffer is too small && 0)

* fix code to remove warning 4244

* fix code to remove warning 4267

* remove /wd4267 /wd4244

* fix bug

* change int to size_t

* using size_t instead of int

* use float instead of double

* Use size_t instead of int

* use size_t instead of int

* use size_t instead of int. Also fix typo
2021-12-09 15:33:52 -08:00
Dmitri Smirnov
a7abd541c7
Correct message type (#9973) 2021-12-09 10:00:44 -08:00
Patrik Vavercak
fb30e9fdae
Remove /safeseh link option from non-msvc builds (#9744) (#9935) 2021-12-08 11:44:00 -08:00
Yi-Hong Lyu
f60a287a64
Add __x86.get_pc_thunk.bx to avoid dependency (#9955) 2021-12-08 04:50:41 -08:00
Dmitri Smirnov
a7f649db7c
Enable proper override using MIMalloc (#9944)
Redirect memory allocations to MiMalloc and advance its version to v2.0.3
Refactor for a universal ifdef
2021-12-07 17:56:58 -08:00
Guoyu Wang
b34b991aea
Improve reduced ops and types build (#9908)
* Improve reduceops and types build

* minor update

* fix test error

* fix minimal build break

* minor update and add comments

* Address CR comments
2021-12-07 13:02:05 -08:00
Justin Stoecker
63c8889944
Restore arm64x onnxruntime binaries (#9950) 2021-12-07 12:39:46 -08:00
Yufeng Li
e613019174
add s8s8 support for quantized conv and gemm (#9902)
* add s8s8 support for quantized conv and gemm
2021-12-03 14:55:18 -08:00
Jeff Daily
8d88a6ac7f
add --amdgpu-target=gfx90a (#9820) 2021-12-01 22:28:52 -08:00
Abhishek Jindal
740679d329
Abjindal/fix windows ci pipeline (#9883)
* switching to /wd4800 for eager mode

* fixing compile flags ignore warnings, previously it was only using the last one
2021-11-30 10:33:13 -08:00
RandySheriffH
9345894c82
Add build option to enable cuda profiling (#9875) 2021-11-29 22:44:50 -08:00
Maajid khan
0ae0f29f14
[OpenVINO-EP] V3.4 Release with OpenVINO 2021.4.2 LTS Release (#9848)
* Changes to ensure openvino build go through in Windows

* Modified Hetero plugin Logic

* Modified Hetero feature logic. In Hetero mode,
for an operator to be marked true in getcapability(),
it should be supported by at least one of the devices
specified with HETERO in the device_type.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* OV updated to 2021.4.2 version

* OV updated to 2021.4.2 version

* Updated OV to 2021.4.2 version, mono download link and dotnet version

* Copying Managed nugets in openvino c# docker file

* Copying Managed nuget to nugets artifacts
directory

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: saharfraza <sfatima.3001@gmail.com>
Co-authored-by: mayavijx <mayax.vijayan@intel.com>
Co-authored-by: Aravind Gunda <aravindx.gunda@intel.com>
2021-11-23 13:12:08 -08:00
RajalakshmiSR
8564fc1933
POWER10: Add optimized dgemm kernel (#9652)
* POWER10: Add optimized dgemm kernel

This patch makes use of POWER10 matrix multiply assist feature and
adds new DGEMM kernel.

* Indentation update

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-11-22 20:28:21 -08:00
Dwayne Robinson
32419974ad Merge remote-tracking branch 'origin/master' into user/dwayner/DML1.8forORT1.10 2021-11-19 05:20:26 -08:00
Dwayne Robinson
e0ffc30a0b Update to 1.8.0 2021-11-19 04:44:32 -08:00
Zhang Lei
8ef6aff734
Zhalei/dwqconv3x3 5x5 arm64 (#9714)
* Arm64 Depthwise Convolution 3x3.

* Add 5x5 intrinsic dwqconv for arm64

* rebase to master, remove no-need logic after arm64 convsym enabled.

* Some more adjustments to the instruction pipelining.

* Add specific test cases.

* Fix test dimension too small.

* Fix build warning as error on some CI.

* better format, etc.
2021-11-18 13:57:16 -08:00
Changming Sun
76715ad525
Delete ioscross code (#9793) 2021-11-18 11:31:13 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745) 2021-11-17 19:13:28 -08:00
Dwayne Robinson
99afb87a02 Update DirectML 1.5.1 to 1.8.0 for ORT1.10 2021-11-15 21:17:25 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erroneous couts added

* erroneous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erroneous couts added

* erroneous entry rectified

* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* Changed OpenVINO to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Info

* Erroneous commit

* Removing Extra logs

* Merge CMAKE

* Do not copy from a local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Chen Fu
1c84621020
Adding ARM64 depthwise convolution kernel for symmetric quantization (#9655)
Adding ARM64 depthwise convolution kernel for symmetric quantization

Motivation and Context
Two improvements over the current kernel code:

1. Signed int8 based instructions, no need to extend from 8b to 16b before multiplication.
2. Unrolled loop with manual software pipelining

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-11-15 12:18:43 -08:00
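The point about signed int8 arithmetic in #9655 above can be illustrated with a scalar sketch. This is not the ARM64 kernel from the commit; the function names are hypothetical, and it assumes the usual meaning of symmetric quantization, namely that the weight zero point is 0. Under that assumption the inner loop needs only plain signed 8-bit products, and the activation zero point can be folded in once after the loop.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference form: subtract the activation zero point inside the loop.
// Weights are assumed symmetric, so their zero point is 0 and drops out.
std::int32_t DotReference(const std::vector<std::int8_t>& x, std::int32_t x_zero_point,
                          const std::vector<std::int8_t>& w) {
  std::int32_t acc = 0;
  for (std::size_t i = 0; i < x.size() && i < w.size(); ++i) {
    acc += (static_cast<std::int32_t>(x[i]) - x_zero_point) * static_cast<std::int32_t>(w[i]);
  }
  return acc;
}

// Rearranged form: sum((x - zx) * w) == sum(x * w) - zx * sum(w).
// The loop body is a pure int8 x int8 multiply-accumulate; the zero-point
// correction is applied once at the end (and sum(w) could be precomputed).
std::int32_t DotSymmetric(const std::vector<std::int8_t>& x, std::int32_t x_zero_point,
                          const std::vector<std::int8_t>& w) {
  std::int32_t acc = 0;
  std::int32_t weight_sum = 0;
  for (std::size_t i = 0; i < x.size() && i < w.size(); ++i) {
    acc += static_cast<std::int32_t>(x[i]) * static_cast<std::int32_t>(w[i]);
    weight_sum += static_cast<std::int32_t>(w[i]);
  }
  return acc - x_zero_point * weight_sum;
}
```

Both functions return the same value for the same inputs; the second simply moves the zero-point handling out of the hot loop.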
Tang, Cheng
99257eb8e3
support build option to include external graph transformers (#9478)
* temp code

* support external graph transformer from build script

* remove debug code

* add test case

* support register rewrite rule

* fix source_group issue if external sources do not share any common prefix

* fix python code style checker

* resolve merge conflict

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-15 08:16:20 -08:00
Edward Chen
9f69d8bbae
Disable partial runtime optimization implementation by default (#9748)
* Only serialize runtime optimization records container if non-empty.

* Remove runtime optimizations from onnxruntime/core/flatbuffers/schema/README.md as it's not completely implemented yet.

* Disable partial runtime optimization implementation by default.
2021-11-12 17:37:29 -08:00
Sheil Kumar
a17bdaf725
Enable JoinModels API in WinML+RT Experimental API (#9746)
* Dynamic onnx model fusion

* empty node names should remain empty

* comments and cleanup

* logic reversed for promoting_unlined_outputs

* PR feedback

* type

* typo

* fix model outputs with promote unlinked output

* remove disembodied model

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-11-12 16:56:31 -08:00
Edward Chen
997266a620
Add build.py option to disable ORT format model runtime optimization (#9723)
ORT format model runtime optimization implementation is in progress.
This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.
2021-11-11 18:05:45 -08:00
Tang, Cheng
6420530b3a
fix the mkl dependency for eager mode (#9702)
* explicitly link with libtorch instead of using a cmake var, to avoid introducing an mkl dependency

* use find_lib to get libtorch lib name

* temp fix

* add missing libraries

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-09 08:52:55 -08:00
Changming Sun
53afaefe3b
Refactor Windows CI pipeline yaml files (#9672) 2021-11-08 11:11:49 -08:00
Ginés Hidalgo
13e64f8ff7
Remove all warnings C4800: Implicit conversion from 'int32_t/int64_t' to bool. Possible information loss (#9535) 2021-11-08 10:12:27 -08:00
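As a minimal sketch of the kind of change #9535 above describes (the helper name is hypothetical and not taken from the commit), MSVC warning C4800 goes away when an integer is converted to `bool` through an explicit comparison instead of an implicit conversion:

```cpp
#include <cstdint>

// Hypothetical helper: an integer attribute interpreted as a flag.
// Writing `bool enabled = value;` is what C4800 reports, because the
// conversion is implicit. The comparison below states the intent explicitly.
inline bool ToBool(std::int64_t value) {
  return value != 0;
}
```

Any non-zero value, including one that would not fit in 32 bits, maps to `true`, and only zero maps to `false`.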
Yulong Wang
c6fddb263f
Add Node.js binding support to packaging pipeline (#9577) 2021-11-05 15:29:40 -07:00
Changming Sun
1cbbafdbe0
Change the default value of onnxruntime_DISABLE_RTTI (#9674) 2021-11-05 15:27:04 -07:00
Weixing Zhang
e11fde0179
libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. (#9618)
* libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-11-01 19:12:09 -07:00
Edward Chen
c315d1b3cd
Always enable ORT format model loading. (#9586) 2021-11-01 10:00:08 +10:00
Ginés Hidalgo
79436a2d5b
Avoided warning C5038 (#9543)
Updated several DML EP files to avoid warning C5038: data member 'member1' will be initialized after data member 'member2' / base class 'base_class'

More information:
https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/c5038?view=msvc-160
2021-10-30 00:36:22 -07:00
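For context on C5038 in #9543 above: C++ always initializes data members in declaration order, whatever order the constructor's initializer list uses, and the warning fires when the two orders disagree. The class below is a hypothetical sketch, not code from the DML EP; it shows the warning-free pattern, with the initializer list written in declaration order.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical class used only for illustration.
class Buffer {
 public:
  // size_ is declared before data_, so it is initialized first and it is
  // safe for data_ to depend on it. Listing data_ before size_ here would
  // not change the real order; it would only trigger C5038 (or -Wreorder
  // on GCC/Clang) and mislead the reader.
  explicit Buffer(std::size_t size) : size_(size), data_(size_, 0.0f) {}

  std::size_t size() const { return data_.size(); }

 private:
  std::size_t size_;
  std::vector<float> data_;
};
```

If the declarations were swapped so that `data_` came first, `data_(size_, 0.0f)` would read `size_` before it was initialized, which is the class of bug the warning is meant to surface.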
Jingqiao Fu
f7774a91d6
Add api-ms-win-core-com-l1-1-0.dll, shlwapi.dll, oleaut32.dll to delay load (#9619) 2021-10-29 18:54:23 -07:00
Hariharan Seshadri
b5f7bb7d10
Update ONNX (#9462) 2021-10-29 10:33:40 -07:00
TomWildenhain-Microsoft
e8268c9a18
Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284)
* Add Transpose Optimizer and modify nhwc optimizer to use it.

* Fix casts

* Fix casts2

* Fix move

* Add tests

* Add headers

* Fixes and tests

* Remove explicit template instantiation

* Fix build warning

* Name unit tests

* Code review fixes

* Add some comments

* Fix some casts

* Make optimization slightly less aggressive

* Some unit test fixes

* Update Attention pattern to work with transpose optimizer

* Update attention fuser

* Fix attention fusion python script

* Improve transpose optimizer documentation

* Create OptimizerCtx struct

* Disable Slice handler for testing

* Implement Slice int32

* Only push transposes leading up to other transposes

* Improve optimization heuristic

* Add exemption for MaxPool

* Document transpose optimizer api.h

* Revert fusion tests to master

* Remove temp files

* Replace typedef with using

* Trim trailing whitespace

* Move class declarations from api_impl.h to api_impl.cc

* Remove copy constructors and move allocator

* Alphabetize headers

* Add override keyword

* Comments for nhwc_transformer

* Rename OrtGraph to ApiGraph, etc.

* Wrap line

* Remove extra qualifier on ApiGraph

* Refactor attention fusion

* Remove c-style casts from api_impl.cc

* Improve documentation

* Avoid printing vector in ORT_ENSURES

* Revert attention fusion refactor

* Remove duplicate cost heuristics and improve documentation

* Fix size_t casts

* Fixes from Scott's review

* Unrevert attention refactor and more updates from Scott's review

* Revert api_impl.cc ValueInfo change

* only optimize first transpose input

* Unrevert api_impl.cc changes

* Make vector call reserve

* transpose_optimizer.cc update from Scott's comments

* Rename api::Graph to api::GraphRef etc.

* Consider domains 'onnx.ai' and '' equal

* Replace AddInput with SetInput

* Improve tests

* quantization and heuristic tests

* Comments for tests

* Replace const string_view with string_view and update tests

* Fixes requested by Edward

* Fix std::string to string_view conversion

* Add <string> to includes

* Fix bug for broadcasting ops with unknown rank. Slight safety improvements

* Changes requested by Edward

* Fix formatting

* Improve description of cost metric
2021-10-27 22:10:39 -07:00
Scott McKay
b5a652c578
Add Xamarin support (#9436)
Add Xamarin support to the ORT nuget packages.
  - Update C# code to support Xamarin builds for iOS and Android
  - refactor some things to split out common code
  - include iOS and Android ORT native shared library in native nuget package
2021-10-27 20:07:07 +10:00
RajalakshmiSR
c54ad0dd0b
POWER: Add Dgemm kernel for POWER processor (#9459)
* POWER: Add Dgemm kernel for POWER processor

This patch adds new dgemm kernel specific to POWER processor.

* POWER: Restrict new functions to VSX in header

* Remove warning check in header

* POWER: Dgemm Adjust indentation

Fixing indentation based on review comments.

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-10-26 20:27:24 -07:00
Yulong Wang
90555bf96d
[node.js binding] enable CI for macOS arm64 (#9532)
* nodejs aggr

* add dependency

* no unzip

* fix aggregation

* add arm64 for mac

* mac arm64 build

* fix commandline

* add check for multi-CMAKE_OSX_ARCHITECTURES

* fix
2021-10-26 16:42:19 -07:00
Changming Sun
f39821adbc
Fix a bug in CMakeLists.txt when handling NO RTTI (#9547) 2021-10-26 14:29:29 -07:00
Jingqiao Fu
da15f5fc2f
change cmake condition to prevent WCOS from linking advapi32 (#9500)
* change condition to prevent WCOS from linking advapi32.dll

* Remove linkage to advapi32.lib
2021-10-26 12:16:49 -07:00
Stella Stamenova
542f1a9737
Cleanup some whitespace and capitalization for set (#9504) 2021-10-26 12:02:07 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: 1) optimize python list <==> std::vector copy, 2) launch the kernels once num_elem reaches the threshold. This helps reduce CUDA idle time.

* refine the logic a bit after validating

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
Changming Sun
f92b8e2ac8
Clean up optional-lite references (#9534) 2021-10-25 21:05:45 -07:00
Yulong Wang
bf4c3fa3d6
[node.js binding] aggregate binaries for multiple platforms in single NPM package (#9501) 2021-10-25 20:16:10 -07:00