* Exception when duplicated autograd.Function name detected
* reorder a bit for a little better perf
* fix a bug in previous PR :(
* correct the error message a bit
* Support opset-13 for squeeze, unsqueeze, maxpool, pad, cast, clip
* merge master and update operators.md
* resolve comment. revise pool and cast kernel implementation.
* skip fusion when Clip min and max are not constant initializers
* re-hipify all rocm EP sources
* fix all other files affected by re-hipify
* add cuda_provider_factory.h to amd_hipify.py
* do not use cudnn_conv_algo_search in ROCm EP; add missing ReduceMin registration
* Fix ReduceConsts template specialization introduced in #9101.
Fixes the error when building for ROCm 4.3.1:
error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)
* fix flake8 error in amd_hipify.py
* speed up hipify with concurrent.futures
* flake8 fix in amd_hipify.py
* removing torch warnings that were causing errors and changing flags for Windows
* adding MKL library resolution and comments
* cleaning up the code
* fixing onnxruntime_python file for windows build
* fix the include order to avoid the python_d.lib issue on win debug build
* changes for warnings, typos and other comments
* merge conflict
* adding fix for mkl library error
* Revert "adding fix for mkl library error"
This reverts commit 73b87c73c2.
* fix for dll path for windows
* typo for dll path
Co-authored-by: Cheng Tang <chenta@microsoft.com>
* Added null check before filling tensor with a value. Passing optional parameter for EinSum in case of MatMul type
* Addressed comment on the PR
Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>
* The serialization can be very heavy for large models
* Only use the serialized model check on compatible onnx versions
* onnx version >= 1.10.0 supports serialized model check
Signed-off-by: IceTDrinker <49040125+IceTDrinker@users.noreply.github.com>
* Transpose for DNNL EP
Transpose reorders the memory to the right format but has the wrong
dimensions and memory::format, so a new memory descriptor is created
that points to the reordered memory. However, that memory is in a
different location than the output expects. An extra parameter was
added to SetMemory to specify when memory must be copied if it is
output from the subgraph.
Signed-off-by: George Nash <george.nash@intel.com>
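The copy-on-output flag described above could be sketched as follows. This is a minimal stand-in, not the actual DNNL EP code; `SubgraphPrimitive`, `SetMemory`, and `WriteOutput` are illustrative names, and plain `std::vector<float>` stands in for `dnnl::memory`.

```cpp
#include <cstring>
#include <set>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical sketch: memory is registered with a flag marking tensors
// whose data must be copied into the subgraph's output buffer, because the
// primitive's memory (e.g. reordered memory from Transpose) lives at a
// different location than the output expects.
class SubgraphPrimitive {
 public:
  void SetMemory(const std::string& name, std::vector<float> mem,
                 bool always_copy_output) {
    memory_[name] = std::move(mem);
    if (always_copy_output) copy_on_output_.insert(name);
  }

  // When the subgraph produces its outputs, flagged tensors are copied into
  // the buffer the caller expects; unflagged tensors already live there.
  void WriteOutput(const std::string& name, float* out_buf, size_t n) const {
    if (copy_on_output_.count(name) != 0) {
      const std::vector<float>& src = memory_.at(name);
      std::memcpy(out_buf, src.data(), n * sizeof(float));
    }
  }

 private:
  std::unordered_map<std::string, std::vector<float>> memory_;
  std::set<std::string> copy_on_output_;
};
```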
* Implementation of Reshape op for dnnl ep
Signed-off-by: George Nash <george.nash@intel.com>
* Add Pow op to dnnl execution provider
This Pow is limited: the exponent must be a scalar or a one-dimensional
tensor with a single element. The exponent must also be a constant
initializer, since it is only read when the primitive is created; oneDNN
has no way to change the exponent after the primitive is created.
The GraphViewer is now passed into the NodeCapability code, since the
GraphViewer is needed to determine whether an input is a constant initializer.
The unit tests for "Pow" did not make the exponent a constant initializer.
To help verify the DNNL execution provider's Pow function, a version of the
Pow unit tests was created for the DNNL execution provider that made the
exponent a constant initializer.
Signed-off-by: George Nash <george.nash@intel.com>
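The capability check described above might look roughly like this. The names (`GraphView`, `PowIsSupported`) are illustrative, not the real NodeCapability API; the real code queries the ORT GraphViewer.

```cpp
#include <set>
#include <string>

// Hypothetical sketch: Pow is only claimed by the DNNL EP when the exponent
// is a single-element constant initializer, since oneDNN bakes the exponent
// into the primitive at creation time and cannot change it afterwards.
struct GraphView {
  std::set<std::string> constant_initializers;
  bool IsConstantInitializer(const std::string& name) const {
    return constant_initializers.count(name) != 0;
  }
};

bool PowIsSupported(const GraphView& graph, const std::string& exponent_input,
                    size_t exponent_element_count) {
  // The exponent must hold exactly one value and be known at creation time.
  return exponent_element_count == 1 &&
         graph.IsConstantInitializer(exponent_input);
}
```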
* Add LeakyRelu to DNNL execution provider
LeakyRelu was added to the dnnl elementwise ops.
In the elementwise op, the GetAlpha method was modified
to take the default value for alpha as a parameter instead
of reading it from a member variable, which seemed less
likely to cause programmer error.
Signed-off-by: George Nash <george.nash@intel.com>
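The GetAlpha change could be sketched as below; `ElementwiseOp` and the attribute map are stand-ins for the real attribute reader, used only to show the default moving to the call site.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: each elementwise op supplies its own default for the
// "alpha" attribute at the call site instead of the reader holding a single
// default in a member variable.
class ElementwiseOp {
 public:
  std::unordered_map<std::string, float> attrs;

  float GetAlpha(float default_alpha) const {
    auto it = attrs.find("alpha");
    return it != attrs.end() ? it->second : default_alpha;
  }
};
```

With this shape, LeakyRelu would call `GetAlpha(0.01f)` while another op passes its own default, making the expected default visible where the op is implemented.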
* Switch dnnl_code_capability DataTypes from strings to enums
Signed-off-by: George Nash <george.nash@intel.com>
* Update DnnlSubgraphPrimitive.GetMemory function input
This updates the GetMemory member function to take a DnnlTensor
instead of a string, for two reasons. Every call already passed
DnnlTensor.Name(), so this reduces code repetition; the function
was never called with a saved string. It also makes the function's
inputs more closely match GetMemoryAndReshape, reducing the
differences between member functions.
Signed-off-by: George Nash <george.nash@intel.com>
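The signature change amounts to something like the following sketch. `TensorLike` and `Primitive` are illustrative stand-ins (an `int` handle replaces `dnnl::memory`); only the shape of the API is the point.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical sketch: the memory map is still keyed by name internally,
// but GetMemory takes the tensor object so call sites no longer repeat
// tensor.Name() at every call.
struct TensorLike {
  std::string name;
  const std::string& Name() const { return name; }
};

class Primitive {
 public:
  void SetMemory(const TensorLike& t, int memory_handle) {
    memory_[t.Name()] = memory_handle;
  }
  int GetMemory(const TensorLike& t) const {
    return memory_.at(t.Name());  // was: GetMemory(t.Name()) at every call site
  }

 private:
  std::unordered_map<std::string, int> memory_;  // placeholder for dnnl::memory
};
```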
* resolve the provider options before creating the training session in orttrainer
* Update orttraining/orttraining/python/orttraining_pybind_common.h
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* support clearing the training EP instance pool
* fix status error
Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>
* bias dropout improvement
* add transform case for same shape case
* combine kernel
* merge with vectorized kernel
* use "has_same_shape_bias"
* minor: a "N % 4 != 0" case
* add op UT for has_same_shape_bias
* address comments; add param case for 1d bias;
add param case tests for 1d and same-shape bias
* rewrite logic condition
Co-authored-by: Peng Wang <pengwa@microsoft.com>
* Use thrust::transform_iterator when feeding input to cub::DeviceScan::InclusiveScan() to make sure the accumulator type is wide enough not to overflow.
* model caching changes for 2021.4
Signed-off-by: Your Name <you@example.com>
* changed the ov version check
* Minor changes added
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Added support for external data format
Starting from OpenVINO 2021.4, OpenVINO-EP supports
ONNX models with weights saved in an external
file location.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Introduced Hetero/Multi options for perf_test
Enabled the HETERO/MULTI device feature of
OpenVINO-EP from the onnxruntime_perf_test tool.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* cleaned up CMake code for older OV version support
OV 2020.3 is no longer supported by OpenVINO-EP,
so this check is no longer required.
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Add option to disable graph partitioning
Added an option to disable graph partitioning
at build time for OpenVINO-EP.
With this option, when the model is not fully
supported on OpenVINO-EP, the model falls back
entirely to the default CPU EP (MLAS).
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Changed the flag for disabling graph partitioning
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixes the flake8 check error
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Added changes for the disable-graph-partitioning option
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Fixed flake8 indentation error
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
Co-authored-by: Your Name <you@example.com>