* fix segmentation fault
* fix typo
* fix bug
* make the logic the same as the CUDA EP
* Modify for OpenVINO
* Add env variable check for OpenVINO
* refine the code
* refine EP failed registration warning messages.
* update OpenVINO exception message.
Co-authored-by: George Wu <jywu@microsoft.com>
* Implementation of Squeeze op for DNNL EP
Signed-off-by: George Nash <george.nash@intel.com>
* Implementation of Unsqueeze op for DNNL EP
Tests were added to unsqueeze_op_test to test the Unsqueeze op
with a scalar input.
The oneDNN (DNNL) EP automatically converts scalars to one-dimensional
tensors. For most operations this causes no problems. However, for
Unsqueeze the difference between a scalar and a tensor cannot be
ignored. An IsScalar member function was added to the DnnlSubgraphPrimitive
class that returns true if the ORT tensor was a scalar. IsScalar()
is then used inside the Unsqueeze code.
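The scalar-versus-tensor distinction matters because Unsqueeze inserts size-1 dimensions relative to the input rank. Below is a minimal Python sketch of the ONNX Unsqueeze shape rule (axes supplied as a list, opset 13 style), not the DNNL EP code. `unsqueeze_shape` and `as_dnnl_shape` are hypothetical helpers; the latter only models the scalar-to-1-D promotion described above.

```python
from typing import Sequence, Tuple


def unsqueeze_shape(shape: Sequence[int], axes: Sequence[int]) -> Tuple[int, ...]:
    """Output shape of ONNX Unsqueeze: insert a size-1 dim at each axis.

    Negative axes are relative to the *output* rank (input rank + len(axes)).
    """
    out_rank = len(shape) + len(axes)
    normalized = {a + out_rank if a < 0 else a for a in axes}
    if len(normalized) != len(axes) or any(not 0 <= a < out_rank for a in normalized):
        raise ValueError("axes must be unique and within the output rank")
    dims = iter(shape)
    # Walk the output positions: 1 for an inserted axis, else the next input dim.
    return tuple(1 if i in normalized else next(dims) for i in range(out_rank))


def as_dnnl_shape(shape: Sequence[int]) -> Tuple[int, ...]:
    """Hypothetical model of the EP promoting a scalar () to a 1-D shape (1,)."""
    return tuple(shape) if len(shape) > 0 else (1,)


scalar = ()
correct = unsqueeze_shape(scalar, [0])                 # rank 0 -> rank 1
promoted = unsqueeze_shape(as_dnnl_shape(scalar), [0])  # rank 1 -> rank 2
```

Without an IsScalar()-style check, the promoted input yields `(1, 1)` instead of `(1,)` for `axes=[0]`, so the output rank is off by one.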
Updated the Squeeze node capability to only accept ConstantInitializer
inputs.
All unsqueeze op tests that tested opset 13 now run with and without
constant initializers.
Signed-off-by: George Nash <george.nash@intel.com>
* Arm64 Depthwise Convolution 3x3.
* Add 5x5 intrinsic dwqconv for arm64
* Rebase to master; remove logic that is no longer needed now that arm64 convsym is enabled.
* Some more adjustments to the instruction pipelining.
* Add specific test cases.
* Fix test dimension that was too small.
* Fix build warnings treated as errors on some CI pipelines.
* better format, etc.
* add use_tensorrt build option
* Add use_tensorrt when running tests
* add use_tensorrt for Windows
* make TRT EP skip backend tests
* make TRT EP skip backend tests
* Fix bug
* Add/Modify description
* modify for debug
* switch pool to test
* modify to debug
* modify to debug
* add verbosity
* refine the code
* refine the code
* refine the code
* fix flake8 warning
* refine the code
* add pre_load check for TRT and add the CUPTI lib to CUDA dependencies
* modify script to make trt build path the same as cuda
* show error message when user wants to run TensorRT but TensorRT is not installed in the env
* fix bug
* fix bug
* add trt lib for manylinux
* include cuda_dependencies for trt
* rewrite the condition to throw exception
* make code more compact
* Add an intermediate header between the ORT code and pybind11 to work around an issue with VS2022 debug builds by making sure corecrt.h is included first.
This avoids the _STL_ASSERT macro being defined in an incompatible way in a debug build, which happens because pybind11 includes the Python headers with _DEBUG temporarily undefined.
See #9735 for details.
* register custom symbolic for einsum
* bug fix for the case that needs a permute at the end
* refactor
* refactor equation parser
* support new case, use ReduceProd
* optimize perf and graph
* remove some Gather nodes
* add more unit tests, fix Gemm transpose fusion
* Update required operators for prebuilt package to add opsets 14 and 15.
Add a helper script to check whether the prebuilt package will support the model and, if not, why not.
* Add support for multiple opsets being specified on a single line in the required operators config. This makes it easier to update the pre-built package config.
It's also required for validation tools to work as they only have a single opset from the model and not per-operator opsets. If we only list the incremental ops we could merge in the ops from the previous opset, but that wouldn't give a way to drop an operator from being supported.
Left the info on which ops changed though so we have a better feel for the cost of supporting each opset.
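As a rough illustration of why listing full per-opset operator sets helps a single-opset validator, here is a minimal Python sketch. It assumes the config line format `<domain>;<opset>[,<opset>...];<op>[,<op>...]` with `#` comment lines; `parse_required_ops` and `missing_ops` are hypothetical names, and this is not the helper script added in this change.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

OpsetKey = Tuple[str, int]  # (domain, opset)


def parse_required_ops(text: str) -> Dict[OpsetKey, Set[str]]:
    """Expand each config line into one entry per listed opset.

    Assumed format (hypothetical): <domain>;<opset>[,<opset>...];<op>[,<op>...]
    Blank lines and lines starting with '#' are ignored.
    """
    required: Dict[OpsetKey, Set[str]] = defaultdict(set)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        domain, opsets, ops = line.split(";")
        op_names = {op.strip() for op in ops.split(",") if op.strip()}
        for opset in opsets.split(","):
            required[(domain.strip(), int(opset))].update(op_names)
    return dict(required)


def missing_ops(required: Dict[OpsetKey, Set[str]], domain: str,
                opset: int, model_ops: Iterable[str]) -> Set[str]:
    """Ops the model uses that are not listed for its single (domain, opset)."""
    return set(model_ops) - required.get((domain, opset), set())


CONFIG = """# example only
ai.onnx;12,13;Add,Mul
ai.onnx;14;Add
"""
REQUIRED = parse_required_ops(CONFIG)
```

Because each opset line carries its complete operator list, leaving `Mul` off the opset 14 line is enough to report it as unsupported for an opset 14 model.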
* Added checks for Hetero/Multi
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Remote Context Plugin
* changes for IO Buffer plugin
* erroneous couts added
* erroneous entry rectified
* Set the OpenVINO OP Buffer also as output
* Enable AUTO plugin in OpenVINO EP
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Remote Context Plugin
* changes for IO Buffer plugin
* erroneous couts added
* erroneous entry rectified
* Added checks for Hetero/Multi
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Set the OpenVINO OP Buffer also as output
* Enable AUTO plugin in OpenVINO EP
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Commit error message and rectification of param.context
* Alignment fixed
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Changed the string to OpenVINO_GPU
* Changed OpenVINO to OpenVINO_CPU
* Onnxruntime updated API for memory location
* Removing Duplicate LOG Error
* Removed the DeviceType function from Tensor.h; updated comment
* API Comments updated
* Removing changes to Provider Info
* Erroneous commit
* Removing Extra logs
* Merge CMAKE
* Do not copy from a local location
* Duplicate Entry
* Remove extra line
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
Adding ARM64 depthwise convolution kernel for symmetric quantization
Motivation and Context
Two improvements over the current kernel code:
1. Signed int8 based instructions, so there is no need to extend from 8-bit to 16-bit before multiplication.
2. Unrolled loop with manual software pipelining
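For reference, the computation such a kernel accelerates can be written as a plain Python sketch. It assumes channels-last layout, stride 1, no padding, and zero points of 0 for both input and weights; `depthwise_conv_s8` is a hypothetical name, and none of the NEON instruction selection or software pipelining is modeled. Each int8 × int8 product fits in 16 bits (|product| ≤ 128 × 128 = 16384), which is presumably what lets signed 8-bit multiply instructions be used without widening first; the running sums still need 32-bit accumulators.

```python
from typing import List

Tensor3 = List[List[List[int]]]  # indexed [row][col][channel]


def depthwise_conv_s8(x: Tensor3, w: Tensor3) -> Tensor3:
    """Reference depthwise convolution on signed int8 values.

    Assumptions (illustrative only): stride 1, no padding, zero point 0 for
    input and weights. Channel c of the input is convolved only with
    filter channel c. Returns raw accumulators before requantization.
    """
    height, width, channels = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w), len(w[0])
    out: Tensor3 = []
    for oy in range(height - kh + 1):
        row = []
        for ox in range(width - kw + 1):
            acc = [0] * channels  # 32-bit accumulators in a real kernel
            for ky in range(kh):
                for kx in range(kw):
                    for c in range(channels):
                        # int8 * int8 always fits in int16; the sum may not.
                        acc[c] += x[oy + ky][ox + kx][c] * w[ky][kx][c]
            row.append(acc)
        out.append(row)
    return out


# 3x3 input and 3x3 filter with 2 channels -> a single 1x1x2 output.
x_ext = [[[1, -128] for _ in range(3)] for _ in range(3)]
w_ext = [[[1, -128] for _ in range(3)] for _ in range(3)]
extreme = depthwise_conv_s8(x_ext, w_ext)
```

In the second channel every product is 16384, but nine of them sum to 147456, which overflows int16 and shows why the accumulation is done in wider registers.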
Co-authored-by: Chen Fu <fuchen@microsoft.com>