* POWER10: Add optimized dgemm kernel
This patch makes use of the POWER10 Matrix-Multiply Assist (MMA) feature and
adds a new DGEMM kernel.
* Indentation update
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
- Only set them as targets for the ORT nuget package
- Use OrtPackageId as the condition for inclusion, if installed
- need to do the nuget restore via msbuild so that this property is set correctly
- Add desktop-only version of the C# sln as there is no way to exclude the mobile specific csproj's from an sln
- use this when applicable if someone is running build.py with the `--build_nuget` flag
Other
- remove attempt to include symbols in the nuget package as nuget doesn't support symbols in native packages
- update build.py to use `nuget` and not a windows specific path and filename for a linux build with `--build_nuget`
* es2017 by default for ort-common
* add visualizer and define plugin
* es2017 for ort-web. also add build target for es5
* add multiple reduced size build for ort-web
* resolve comments, add e2e tests and add docs
* fix segmentation fault
* fix typo
* fix bug
* make logic the same as CUDA ep
* Modify for OpenVINO
* Add env variable check for OpenVINO
* refine the code
* refine EP failed registration warning messages.
* update OpenVINO exception message.
Co-authored-by: George Wu <jywu@microsoft.com>
* Implementation of Squeeze op for dnnl ep
Signed-off-by: George Nash <george.nash@intel.com>
* Implementation of Unsqueeze op for dnnl ep
Tests were added to the unsqueeze_op_test to test Unsqueeze op
with a scalar input.
The OneDNN (dnnl) ep automatically converts scalars to a one-dimensional
tensor. For most operations this causes no problems. However, for
Unsqueeze the difference between a scalar and a tensor couldn't be
ignored. An IsScalar member function was added to the DnnlSubgraphPrimitive
class that returns true if the ORT tensor was a scalar. IsScalar()
is then used inside the Unsqueeze code.
Updated the squeeze node capability to only accept ConstantInitializer
inputs.
All unsqueeze op tests that tested opset 13 now run with and without
constant initializers.
Signed-off-by: George Nash <george.nash@intel.com>
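The scalar issue described above can be illustrated with a minimal shape-only sketch. This is not the dnnl ep code; `unsqueeze_shape` is a hypothetical helper that follows the ONNX Unsqueeze shape rule, and it only shows why a scalar promoted to a one-dimensional tensor yields a different output shape.

```python
def unsqueeze_shape(shape, axes):
    """Return the output shape of an ONNX-style Unsqueeze (hypothetical helper)."""
    out_rank = len(shape) + len(axes)
    # Negative axes are interpreted relative to the output rank.
    normalized = sorted(a + out_rank if a < 0 else a for a in axes)
    if len(set(normalized)) != len(normalized) or any(
        a < 0 or a >= out_rank for a in normalized
    ):
        raise ValueError(f"invalid axes {axes} for output rank {out_rank}")
    dims = iter(shape)
    # Insert a 1 at every requested axis; copy the input dims everywhere else.
    return tuple(1 if i in normalized else next(dims) for i in range(out_rank))


scalar_shape = ()      # a true scalar has rank 0
promoted_shape = (1,)  # the same value after promotion to a 1-D tensor

print(unsqueeze_shape(scalar_shape, [0]))    # (1,)
print(unsqueeze_shape(promoted_shape, [0]))  # (1, 1)
```

The two results differ in rank, which is why the ep has to know whether the original ORT tensor was a scalar before computing the Unsqueeze output.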
* Arm64 Depthwise Convolution 3x3.
* Add 5x5 intrinsic dwqconv for arm64
* rebase to master, remove unneeded logic after arm64 convsym was enabled.
* Some more adjustment on the instruction pipelining.
* Add specific test cases.
* Fix test dimension too small.
* Fix build warning as error on some CI.
* better format, etc.
* add use_tensorrt build option
* Add use_tensorrt to running tests
* add use_tensorrt for Windows
* make trt ep to skip backend test
* make trt ep to skip backend test
* Fix bug
* Add/Modify description
* modify for debug
* switch pool to test
* modify to debug
* modify to debug
* add verbosity
* refine the code
* refine the code
* refine the code
* fix flake8 warning
* refine the code
* add pre_load check for trt as well as add cupti lib to cuda dependencies
* modify script to make trt build path the same as cuda
* show error message when user wants to run TensorRT but TensorRT is not installed in the env
* fix bug
* fix bug
* add trt lib for manylinux
* include cuda_dependencies for trt
* rewrite the condition to throw exception
* make code more compact
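A minimal sketch of the "show an error when TensorRT is requested but not installed" idea from the list above. This is not the actual ORT pre-load code: the function name `check_tensorrt_available`, the library base name `nvinfer`, and the message text are all assumptions made for illustration.

```python
import ctypes.util


def check_tensorrt_available(find_library=ctypes.util.find_library):
    """Raise a descriptive error if the TensorRT runtime library cannot be located.

    `find_library` is injectable so the check can be exercised without TensorRT.
    """
    # Assumption: "nvinfer" is the base name of the TensorRT runtime library.
    if find_library("nvinfer") is None:
        raise RuntimeError(
            "TensorRT execution provider was requested, but the TensorRT "
            "libraries were not found. Install TensorRT or choose another provider."
        )
```

Failing early with a message like this is easier to act on than a loader error raised later when the provider library is first used.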
* Add intermediate header between the ORT code and pybind11 to work around an issue with VS2022 debug builds by making sure corecrt.h is included first.
This avoids the _STL_ASSERT macro being defined in an incompatible way for a debug build by pybind including the python headers with _DEBUG temporarily undefined.
See #9735 for details.
* register custom symbolic for einsum
* bugfix for the case that needs a permute at the end
* refactor
* refactor equation parser
* support new case, use ReduceProd
* optimize perf and graph
* remove some Gather node
* add more ut, fix gemm trans fusion
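For readers unfamiliar with what an einsum equation parser has to produce, here is a minimal sketch. It is not the parser from this change; `parse_einsum_equation` is a hypothetical name, ellipsis is deliberately unsupported, and the implicit-output rule follows the usual einsum convention (labels that appear exactly once, in alphabetical order).

```python
from collections import Counter


def parse_einsum_equation(equation):
    """Split an einsum equation into per-input label lists and output labels."""
    equation = equation.replace(" ", "")
    if "..." in equation:
        raise NotImplementedError("ellipsis is out of scope for this sketch")
    if "->" in equation:
        lhs, rhs = equation.split("->")
    else:
        lhs, rhs = equation, None
    inputs = [list(term) for term in lhs.split(",")]
    if rhs is None:
        # Implicit mode: keep labels that appear exactly once, sorted alphabetically.
        counts = Counter(label for term in inputs for label in term)
        rhs = "".join(sorted(label for label, n in counts.items() if n == 1))
    return inputs, list(rhs)


print(parse_einsum_equation("ij,jk->ki"))  # ([['i', 'j'], ['j', 'k']], ['k', 'i'])
```

In the example the requested output order `ki` differs from the natural `ik` order of a matrix product, which is the kind of case where a final permute is needed.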
* Update required operators for prebuilt package to add opsets 14 and 15.
Add helper script to check whether the prebuilt package will support the model and, if not, why not.
* Add support for multiple opsets being specified on a single line in the required operators config. This makes it easier to update the pre-built package config.
It's also required for validation tools to work as they only have a single opset from the model and not per-operator opsets. If we only list the incremental ops we could merge in the ops from the previous opset, but that wouldn't give a way to drop an operator from being supported.
Left the info on which ops changed though so we have a better feel for the cost of supporting each opset.
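A minimal sketch of how a config line with several opsets could be expanded, assuming a line format of `domain;opset[,opset...];op[,op...]`. The function name `parse_required_ops_line` is hypothetical and this is not the ORT config reader; it only shows that listing every supported opset explicitly lets a tool look up a single model opset directly.

```python
def parse_required_ops_line(line):
    """Parse 'domain;opset[,opset...];op[,op...]' into {(domain, opset): ops}."""
    line = line.strip()
    # Blank lines and '#' comments carry no operator information.
    if not line or line.startswith("#"):
        return {}
    domain, opsets, ops = [field.strip() for field in line.split(";")]
    op_names = {op.strip() for op in ops.split(",") if op.strip()}
    # Every listed opset gets its own copy of the full operator set.
    return {(domain, int(opset)): set(op_names) for opset in opsets.split(",")}


required = parse_required_ops_line("ai.onnx;14,15;Add,Relu")
print(sorted(required))  # [('ai.onnx', 14), ('ai.onnx', 15)]
```

Because each opset maps to its complete operator set, dropping an operator for one opset is just a matter of listing that opset on a separate line without it.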