Commit graph

939 commits

Author SHA1 Message Date
RajalakshmiSR
8564fc1933
POWER10: Add optimized dgemm kernel (#9652)
* POWER10: Add optimized dgemm kernel

This patch makes use of POWER10 matrix multiply assist feature and
adds new DGEMM kernel.

* Indentation update

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-11-22 20:28:21 -08:00
Dwayne Robinson
32419974ad Merge remote-tracking branch 'origin/master' into user/dwayner/DML1.8forORT1.10 2021-11-19 05:20:26 -08:00
Dwayne Robinson
e0ffc30a0b Update to 1.8.0 2021-11-19 04:44:32 -08:00
Zhang Lei
8ef6aff734
Zhalei/dwqconv3x3 5x5 arm64 (#9714)
* Arm64 Depthwise Convolution 3x3.

* Add 5x5 intrinsic dwqconv for arm64

* rebase to master, remove no-need logic after arm64 convsym enabled.

* Some more adjustment on the instrunction pipeling.

* Add specific test cases.

* Fix test dimension too small.

* Fix build warning as error on some CI.

* better format, etc.
2021-11-18 13:57:16 -08:00
Changming Sun
76715ad525
Delete ioscross code (#9793) 2021-11-18 11:31:13 -08:00
Hariharan Seshadri
e23892ddbe
Support disabling support for the optional type in ORT builds (#9745) 2021-11-17 19:13:28 -08:00
Dwayne Robinson
99afb87a02 Update DirectML 1.5.1 to 1.8.0 for ORT1.10 2021-11-15 21:17:25 -08:00
sfatimar
1d03baa8cc
Openvino ep 2021.4 v3.3 (#9588)
* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Remote Context Plugin

* changes for IO Buffer plugin

* erronous couts added

* erronous entry rectified

* Added checks for Hetero/Multi

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Set the Openvino OP Buffer also as output

* Enable AUTO plugin in OpenVINO EP

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Please commit error message and rectification of param.context

* Alignment fixed

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the string to OpenVINO_GPU

* hanged OpenVINO to to OpenVINO_CPU

* Onnxruntime updated API for memory location

* Removing Duplicate LOG Error

* Tensor.h removed DeviceType function. Updated comment

* API Comments updated

* Removing changes to Provider Indo

* Erronous commit

* Removing Extra logs

* Merge CMAKE

* Not copy from a  local location

* Duplicate Entry

* Remove extra line

Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
2021-11-15 13:41:12 -08:00
Chen Fu
1c84621020
Adding ARM64 depthwise convolution kernel for symmetric quantization (#9655)
Adding ARM64 depthwise convolution kernel for symmetric quantization

Motivation and Context
Two improvements against current kernel code :

1. Signed int8 based instructions, no need to extend from 8b to 16b before multiplication.
2. Unrolled loop with manual software pipelining

Co-authored-by: Chen Fu <fuchen@microsoft.com>
2021-11-15 12:18:43 -08:00
Tang, Cheng
99257eb8e3
support build option to include external graph transformers (#9478)
* temp code

* support external graph transformer  from build script

* remove debug code

* add test case

* support register rewrite rule

* fix source_group issue if external source is not share any common prefix

* fix python code style checker

* resolve merge conflict

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-15 08:16:20 -08:00
Edward Chen
9f69d8bbae
Disable partial runtime optimization implementation by default (#9748)
* Only serialize runtime optimization records container if non-empty.

* Remove runtime optimizations from onnxruntime/core/flatbuffers/schema/README.md as it's not completely implemented yet.

* Disable partial runtime optimization implementation by default.
2021-11-12 17:37:29 -08:00
Sheil Kumar
a17bdaf725
Enable JoinModels API in WinML+RT Experimental API (#9746)
* Dynamic onnx model fusion

* empty node names shoudl remain empty

* comments and cleanup

* logic reversed for promoting_unlined_outputs

* PR feedback

* type

* typo

* fix model outputs with promote unlinked output

* remove disembodied model

Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
2021-11-12 16:56:31 -08:00
Edward Chen
997266a620
Add build.py option to disable ORT format model runtime optimization (#9723)
ORT format model runtime optimization implementation is in progress.
This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.
2021-11-11 18:05:45 -08:00
Tang, Cheng
6420530b3a
fix the mkl dependency for eager mode (#9702)
* explicit link with libtorch instead of use cmake var to avoid introduce mkl dependency

* use find_lib to get libtorch lib name

* temp fix

* add missing libraries

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-11-09 08:52:55 -08:00
Changming Sun
53afaefe3b
Refactor Windows CI pipeline yaml files (#9672) 2021-11-08 11:11:49 -08:00
Ginés Hidalgo
13e64f8ff7
Remove all warnings C4800: Implicit conversion from 'int32_t/int64_t' to bool. Possible information loss (#9535) 2021-11-08 10:12:27 -08:00
Yulong Wang
c6fddb263f
Add Node.js binding support to packaging pipeline (#9577) 2021-11-05 15:29:40 -07:00
Changming Sun
1cbbafdbe0
Change the default value of onnxruntime_DISABLE_RTTI (#9674) 2021-11-05 15:27:04 -07:00
Weixing Zhang
e11fde0179
libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package. (#9618)
* libonnxruntime_providers_rocm.so and libonnxruntime_providers_shared.so are not included in python package.

Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
2021-11-01 19:12:09 -07:00
Edward Chen
c315d1b3cd
Always enable ORT format model loading. (#9586) 2021-11-01 10:00:08 +10:00
Ginés Hidalgo
79436a2d5b
Avoided warning C5038 (#9543)
Updated several DML EP files to avoid warning C5038: data member 'member1' will be initialized after data member 'member2' / base class 'base_class'

More information:
https://docs.microsoft.com/en-us/cpp/error-messages/compiler-warnings/c5038?view=msvc-160
2021-10-30 00:36:22 -07:00
Jingqiao Fu
f7774a91d6
Add api-ms-win-core-com-l1-1-0.dll, shlwapi.dll, oleaut32.dll to delay load (#9619) 2021-10-29 18:54:23 -07:00
Hariharan Seshadri
b5f7bb7d10
Update ONNX (#9462) 2021-10-29 10:33:40 -07:00
TomWildenhain-Microsoft
e8268c9a18
Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284)
* Add Transpose Optimizer and modify nhwc optimizer to use it.

* Fix casts

* Fix casts2

* Fix move

* Add tests

* Add headers

* Fixes and tests

* Remove explicit template instantiation

* Fix build warning

* Name unit tests

* Code review fixes

* Add some comments

* Fix some casts

* Make optimization slightly less agressive

* Some unit test fixes

* Update Attention pattern to work with transpose optimizer

* Update attention fuser

* Fix attention fusion python script

* Improve transpose optimizer documentation

* Create OptimizerCtx struct

* Disable Slice handler for testing

* Implement Slice int32

* Only push transposes leading up to other transposes

* Improve optimization heuristic

* Add exemption for MaxPool

* Document transpose optimizer api.h

* Revert fusion tests to master

* Remove temp files

* Replace typedef with using

* Trim trailing whitespace

* Move class declarations from api_impl.h to api_impl.cc

* Remove copy constructors and move allocator

* Alphabetize headers

* Add override keyword

* Comments for nhwc_transformer

* Rename OrtGraph to ApiGraph, etc.

* Wrap line

* Remove extra qualifier on ApiGraph

* Refector attention fusion

* Remove c-style casts from api_impl.cc

* Improve documentation

* Avoid printing vector in ORT_ENSURES

* Revert attention fusion refactor

* Remove duplicate cost heuristics and improve documentation

* Fix size_t casts

* Fixes from Scott's review

* Unrevert attention refactor and more updates from Scott's review

* Revert api_impl.cc ValueInfo change

* only optimize first transpose input

* Unrevert api_impl.cc changes

* Make vector call reserve

* transpose_optimizer.cc update from Scott's comments

* Rename api::Graph to api::GraphRef etc.

* Consider domains 'onnx.ai' and '' equal

* Replace AddInput with SetInput

* Improve tests

* quantization and heuristic tests

* Comments for tests

* Replace const string_view with string_view and update tests

* Fixes requested by Edward

* Fix std::string to string_view conversion

* Add <string> to includes

* Fix bug for broadcasting ops with unknown rank. Slight safety improvements

* Changes requested by Edward

* Fix formatting

* Improve description of cost metric
2021-10-27 22:10:39 -07:00
Scott McKay
b5a652c578
Add Xamarin support (#9436)
Add Xamarin support to the ORT nuget packages.
  - Update C# code to support Xamarin builds for iOS and Android
  - refactor some things to split out common code
  - include iOS and Android ORT native shared library in native nuget package
2021-10-27 20:07:07 +10:00
RajalakshmiSR
c54ad0dd0b
POWER: Add Dgemm kernel for POWER processor (#9459)
* POWER: Add Dgemm kernel for POWER processor

This patch adds new dgemm kernel specific to POWER processor.

* POWER: Restrict new functions to VSX in header

* Remove warning check in header

* POWER: Dgemm Adjust indentation

Fixing indentation based on review comments.

Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
2021-10-26 20:27:24 -07:00
Yulong Wang
90555bf96d
[node.js binding] enable CI for macOS arm64 (#9532)
* nodejs aggr

* add dependency

* no unzip

* fix aggregation

* add arm64 for mac

* mac arm64 build

* fix commandline

* add check for multi-CMAKE_OSX_ARCHITECTURES

* fix
2021-10-26 16:42:19 -07:00
Changming Sun
f39821adbc
Fix a bug in CMakeLists.txt when handling NO RTTI (#9547) 2021-10-26 14:29:29 -07:00
Jingqiao Fu
da15f5fc2f
change cmake condition to prevent WCOS fom linking advapi32 (#9500)
* change condition to prevent WCOS fom linking advapi32.dll

* Remove linkage to advapi32.lib
2021-10-26 12:16:49 -07:00
Stella Stamenova
542f1a9737
Cleanup some whitespace and capitalization for set (#9504) 2021-10-26 12:02:07 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency furthur, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as long as num_elem reach thresh hold. This help reduce the CUDA idel time.

* refine the logic a bit after validating

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
Changming Sun
f92b8e2ac8
Clean up optional-lite references (#9534) 2021-10-25 21:05:45 -07:00
Yulong Wang
bf4c3fa3d6
[node.js binding] aggregate binaries for multiple platforms in single NPM package (#9501) 2021-10-25 20:16:10 -07:00
marcusfreisleben
651955d3c9
CUDA: Enable parallel compilation (#8974)
* Pass on parallel option to nvcc

* Fixed build.py

* Added missing string conversion

* Adressed review points
2021-10-25 16:42:58 -07:00
Stella Stamenova
d608504438
Don't use legacy mode for protobuf (#9498) 2021-10-22 16:50:29 -07:00
Changming Sun
d83adaaf9f
Remove optional-lite (#9424) 2021-10-22 16:45:45 -07:00
Stella Stamenova
49b66c7486
NFC: Normalize whitespace around if statements in CMakeLists.txt (#9464)
Always add a space after if to make the file consistent
2021-10-21 15:35:58 -07:00
Stella Stamenova
9fc53df33a
Only add aliasing to targets if the corresponding package was found (#9404) 2021-10-20 11:32:08 -07:00
Changming Sun
406f1629c1
Remove Featurizers code (#9300) 2021-10-20 10:20:35 -07:00
Yufeng Li
da3dd398c5
Kernels for QLinearConv with symmetrically quantized filter (#9323)
Add kernels for QLinearConv with symmetric quantized filter, e.g., filter type is int8 and zero point of filter is 0. This PR includes kernels for avx2, avxvnni, avx512 and avx 512 vnni. Will adds kernels for ARM64 in following PR.

Kernels uses direct input buffer directly for pointwise, and in-direct buffer for depthwise and non-group conv.

The advantages of those new kernels are:

no need to compute the sum of each pixel output image, and sum/offset of filter can be combined with bias.
with in-direct buffer, im2col returns an array of buffer pointers instead of memcpy'ing the original data. This saves memcpy time and reduces the size of the intermediate buffer needed to hold the im2col transform. In the future, will compute im2col ahead of time for input with fixed input size.
2021-10-18 19:40:18 -07:00
Jeff Daily
c8789d3047
[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877)
* re-hipify all rocm EP sources

* fix all other files affected by re-hipify

* add cuda_provider_factory.h to amd_hipify.py

* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration

* Fix ReduceConsts template specialization introduced in #9101.

Fixes the error when building for ROCm 4.3.1:

error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)

* fix flake8 error in amd_hipify.py

* speed up hipify with concurrent.futures

* flake8 fix in amd_hipify.py
2021-10-14 15:15:51 -07:00
Abhishek Jindal
23700a15a0
Abjindal/eager windows build (#9326)
* removing warnings which are causing errors from torch and changing flags for Windows

* adding MKL library resolution and comments

* cleaning up the code

* fixing onnxruntime_python file for windows build

* fix the include order to aovid the python_d.lib issue on win debug build

* changes for warnings, typos and other comments

* merge conflict

* adding fix for mkl library error

* Revert "adding fix for mkl library error"

This reverts commit 73b87c73c2.

* fix for dll path for windows

* typo for dll path

Co-authored-by: Cheng Tang <chenta@microsoft.com>
2021-10-14 12:54:49 -07:00
Sunghoon
2f1204a5d5
[js/web] Enable wasm profiling and preserve function names in profiling (#9314)
* add p50 in test

* allow WebAssembly profiling and preserve function names

Co-authored-by: Yulong Wang <yulongw@microsoft.com>
2021-10-11 22:04:50 -07:00
Maajid khan
72c4cea9e6
[OpenVINO-EP] V3.2 Release (#9232)
* model caching changes for 2021.4

Signed-off-by: Your Name <you@example.com>

* changed the ov version check

* Minor changes added

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added support for external data format

Starting from OpenVINO 2021.4 version, OpenVINO-EP
will support onnx models with Weights saved in external
file location.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Introduced Hetero/Multi options for perf_test

Enabled to use HETERO/MULTI device feature from
OpenVINO-EP using the onnxruntime_perf_test tool.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* cleaned up CMake code for older OV version support

OV 2020.3 is now longer supported by OpenVINO-EP.
This check is not required now.

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Add option to disable graph partitioning

Added a option to diable graph partitioning
during build time for OpenVINO-EP.

with this option, when the model is not fully
supported on OpenVINO-EP, the model fully fall
backs to default CPU EP (MLAS).

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Changed the flag for diabling graph partitioning

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixes the flake8 check error

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Added changes for disable graph partition option

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

* Fixed flake8 indentation error

Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>

Co-authored-by: Your Name <you@example.com>
2021-10-07 16:02:19 -07:00
Gary Miguel
e2b1852eec
Build: respect onnxruntime_PREFER_SYSTEM_LIB for more things (#9181)
This is based on a patch applied locally by
https://github.com/conda-forge/onnxruntime-feedstock. Having this in
master seems useful.
2021-10-06 13:49:28 -07:00
baijumeswani
bcdb411c8d
Implement FusedAdam for ORT adapted from DeepSpeed (#9266) 2021-10-05 20:50:34 -07:00
Tiago Koji Castro Shibata
11a391a88f
Port ARM64x support (#9230) 2021-10-01 13:06:43 -07:00
Yulong Wang
e2d779246a
[wasm] remove deprecated prefix 'EXTRA_' in emcc flags (#9211) 2021-09-30 16:02:24 -07:00
Yulong Wang
8c57d51928
support WebAssembly SIMD for qgemm (#9191)
* support WebAssembly SIMD for qgemm

* remove '--experimental-wasm-bulk-memory' for test
2021-09-30 12:40:56 -07:00
Thiago Crepaldi
ceb51dda4a
Support external torch cpp extensions on ORTModule (#9223) 2021-09-30 10:37:35 -04:00