Although GitHub works with both, this is more precise.
Having an extension also makes it easy to match with a regex when we want to inject code that reroutes traffic to our own git mirror.
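The rerouting code itself is not part of this message. As an illustration only, here is a C++ sketch (using `std::regex`) of why the explicit `.git` suffix helps: it gives the pattern an unambiguous end for the repository name. `RerouteToMirror` and the mirror URL are hypothetical, not the project's actual mechanism.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite "https://github.com/<owner>/<repo>.git" to the
// same owner/repo path under mirror_base. Non-matching URLs are returned unchanged.
std::string RerouteToMirror(const std::string& url, const std::string& mirror_base) {
  // The explicit "\.git" suffix marks where the repository name ends,
  // so the whole URL can be matched precisely with "^...$".
  static const std::regex kGitHubRepo(R"(^https://github\.com/([^/]+)/([^/]+)\.git$)");
  return std::regex_replace(url, kGitHubRepo, mirror_base + "/$1/$2.git");
}
```

For example, `RerouteToMirror("https://github.com/onnx/onnx.git", "https://mirror.example.com/git")` yields `https://mirror.example.com/git/onnx/onnx.git`, while a URL without the suffix passes through untouched.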
In a reduced ops build, some source files get updated. This change moves the updated files into the build directory. This way, it is easier to simultaneously manage different build directories (with possibly different reduced ops configurations) based on a single source directory.
* add new field constant_initializers to metadef and remove constant initializers from TRT node inputs
* remove redundancy
* use GetConstantInitializer() to get constant initializers
* add ORT_ENFORCE check
Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
* Include the onnxruntime binary when not using a package reference or a UAP app.
* Remove the lib\uap10.0 build from the NuGet package, as it causes conflicts
* Add UWP test
* remove build files
* remove local change
* reset mimalloc and onnx-tensorrt
* change username to Microsoft
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
* Add Reduce ops to the DNNL EP
Combine the Reduction ops into one class
Add ReduceL1, ReduceL2, ReduceSum, ReduceMax, ReduceMin, ReduceProd,
ReduceSumSquare, ReduceLogSum, and ReduceLogSumExp
Reduce code now also handles the keepdims attribute
Also updated code to use HandleNegativeAxis function from
the providers/common.h code instead of manually calculating.
In-code documentation was added to help explain the complex reduction op code.
Added elementwise ops to the Reduction op capability code and removed the keepdims
check from it.
Updated the error_tolerance for LogGrad (DNNL EP only) after finding a few
instances where the tests were slightly out of tolerance.
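A minimal sketch of the axis handling described above, assuming the default ONNX Reduce* semantics (negative axes count from the end; an empty axes list reduces every dimension). `NormalizeAxis` is a hypothetical re-implementation in the spirit of `HandleNegativeAxis`, not its actual source, and `ReducedShape` is likewise illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical: map an axis in [-rank, rank) to [0, rank).
int64_t NormalizeAxis(int64_t axis, int64_t rank) {
  if (axis < -rank || axis >= rank) throw std::out_of_range("axis out of range");
  return axis < 0 ? axis + rank : axis;
}

// Hypothetical: output shape of a Reduce* op. Reduced dimensions become 1
// when keepdims is true and are dropped when it is false.
std::vector<int64_t> ReducedShape(const std::vector<int64_t>& in_dims,
                                  const std::vector<int64_t>& axes,
                                  bool keepdims) {
  const int64_t rank = static_cast<int64_t>(in_dims.size());
  // An empty axes list marks every dimension for reduction.
  std::vector<bool> reduce(in_dims.size(), axes.empty());
  for (int64_t a : axes) reduce[static_cast<size_t>(NormalizeAxis(a, rank))] = true;

  std::vector<int64_t> out;
  for (size_t i = 0; i < in_dims.size(); ++i) {
    if (!reduce[i]) {
      out.push_back(in_dims[i]);
    } else if (keepdims) {
      out.push_back(1);
    }
  }
  return out;
}
```

With input dims `{2, 3, 4}` and axes `{-1}`, keepdims=1 gives `{2, 3, 1}` and keepdims=0 gives `{2, 3}`.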
Signed-off-by: George Nash <george.nash@intel.com>
* Documentation cleanup in dnnl_qattention
Cleaned up the comments documenting the QAttention operator.
A number of stray newlines had been introduced into the
comment, making it harder to read.
Signed-off-by: George Nash <george.nash@intel.com>
If the group attribute is 1, allow the oneDNN library to optimize the memory
layout for the device the Convolution operator runs on.
Without this optimization the default NCHW memory layout is used, and
on CPUs the NCHW layout can cause a significant performance
decrease.
Signed-off-by: George Nash <george.nash@intel.com>
Before this change, building the DNNL EP from onnxruntime 1.10.0 with Clang fails with:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc:4:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.h:5:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_subgraph.h:10:
In file included from /build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/shared_library/provider_api.h:19:
In file included from /build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/common.h:36:
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:33:6: error: call to function 'operator<<' that is neither visible in the template definition nor found by argument-dependent lookup
ss << t;
^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<std::vector<long>>' requested here
MakeStringImpl(ss, args...);
^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<long, const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, long, const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:39:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<unsigned long, const char *, long, const char *, std::vector<long>>' requested here
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:46:3: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, unsigned long, const char *, long, const char *, std::vector<long>>' requested here
MakeStringImpl(ss, args...);
^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/make_string.h:93:18: note: in instantiation of function template specialization 'onnxruntime::detail::MakeStringImpl<const char *, unsigned long, const char *, long, const char *, std::vector<long>>' requested here
return detail::MakeStringImpl(detail::if_char_array_make_ptr_t<Args const&>(args)...);
^
/build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc:46:7: note: in instantiation of function template specialization 'onnxruntime::MakeString<char [20], unsigned long, char [23], long, char [9], std::vector<long>>' requested here
ORT_ENFORCE(data_dims[i] == 1, "Dimension of input ", i, " must be 1 instead of ", data_dims[i],
^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/common/common.h:184:64: note: expanded from macro 'ORT_ENFORCE'
::onnxruntime::MakeString(__VA_ARGS__)); \
^
/build/python-onnxruntime/src/onnxruntime/include/onnxruntime/core/framework/tensor_shape.h:147:15: note: 'operator<<' should be declared prior to the call site
std::ostream& operator<<(std::ostream& out, const TensorShape& shape);
^
1 error generated.
make[2]: *** [CMakeFiles/onnxruntime_providers_dnnl.dir/build.make:384: CMakeFiles/onnxruntime_providers_dnnl.dir/build/python-onnxruntime/src/onnxruntime/onnxruntime/core/providers/dnnl/subgraph/dnnl_squeeze.cc.o] Error 1
Both phases of two-phase name lookup fail:
1. visible in the template definition: fails because `std::ostream& operator<<(std::ostream& out, const TensorShape& shape)` (from include/onnxruntime/core/framework/tensor_shape.h) is declared after `template <typename... Args> std::string MakeString(const Args&... args)` (from include/onnxruntime/core/common/make_string.h), as the `clang++ -E` output shows
2. argument-dependent lookup: fails because the argument data_dims has type `std::vector<long>` (via a typedef in dnnl.hpp), so ADL searches only namespace std, while `std::ostream& operator<<(std::ostream& out, const TensorShape& shape)` is in namespace onnxruntime
There are several possible fixes:
* Make operator<< appear before MakeString by adjusting the order of header files (I consider this fragile)
* Also define operator<< in namespace std (may result in namespace pollution)
* Pass an argument whose type is a class in the onnxruntime namespace (this commit)
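A self-contained sketch of the failure mode and of the fix chosen here. `demo::Format` and `demo::Shape` are hypothetical stand-ins for `MakeString` and `TensorShape`; this illustrates the lookup rules and is not onnxruntime code.

```cpp
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

namespace demo {

// Stand-in for MakeStringImpl: `ss << t` is a dependent call, so non-member
// operator<< candidates come from (1) declarations visible here, at the
// template definition, and (2) ADL on the argument types at instantiation.
template <typename T>
std::string Format(const T& t) {
  std::ostringstream ss;
  ss << t;
  return ss.str();
}

// Stand-in for TensorShape. The constructor is implicit on purpose, so a
// std::vector<long> could in principle convert to Shape.
struct Shape {
  Shape(std::vector<long> d) : dims(std::move(d)) {}
  std::vector<long> dims;
};

// Declared after Format, mirroring tensor_shape.h vs. make_string.h.
inline std::ostream& operator<<(std::ostream& out, const Shape& shape) {
  out << '{';
  for (std::size_t i = 0; i < shape.dims.size(); ++i) {
    if (i != 0) out << ',';
    out << shape.dims[i];
  }
  return out << '}';
}

// Format(some_std_vector_long) is rejected by Clang: the operator above is
// not visible at Format's definition, and ADL for std::vector<long> searches
// only namespace std, so the conversion to Shape is never considered.
// The fix: pass a Shape, so ADL on demo::Shape finds demo::operator<<.
inline std::string FormatDims(const std::vector<long>& dims) {
  return Format(Shape(dims));
}

}  // namespace demo
```

`demo::FormatDims({1, 3, 224, 224})` produces `{1,3,224,224}`. Wrapping the vector in a class from the operator's own namespace keeps the fix independent of header include order.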
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from EP options. It takes priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support for external data in ONNX models
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. Get output shapes for fixed input shapes via the TVM API
* support AUTO_TVM tuning log file inside ORT. The choice between Ansor and Auto_TVM is made by a provider option (tuning_type)
* add fp16
* add conversion of the model layout to NHWC if needed. The necessary parameter was added to the STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unnecessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
* update base image from 11.4.0 to 11.4.2
* update Linux TRT GPU pipeline to TRT 8.2
* update onnx-tensorrt to 8.2-GA
* disable failing TensorRT 8.2 tests.
* update pad test.
* fix
* update win trt ci pipeline to trt 8.2
* test run with CUDA 11.4 and cuDNN 8.2
* increase timeout
* revert
* revert
* update packaging pipelines to use trt 8.2
* fix typo
* update trt gpu perf pipeline to trt 8.2
* increase timeout
* delete deprecated ci-perf-pipeline.yml
* bump timeout
* adjust packaging timeout
Add a symmetric quantized convolution kernel for ARM64
Note:
Indirect conv performs worse for shallow convs (those with few input channels). This is especially pronounced on low-end CPUs without the dot-product instructions, where indirect conv is only faster for convs with 128 or more input channels. On CPUs with dot-product instructions, indirect conv is already faster at 32 input channels.
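To make "indirect conv" concrete, here is a scalar sketch under stated assumptions: a single NHWC int8 image, weights laid out [KH][KW][C][M], stride 1, zero points of 0 for both input and weights, and int32 accumulation. `IndirectConv` is hypothetical and is not the ARM64 kernel added here. It shows one plausible reading of the note: each pointer in the indirection buffer feeds only C * M multiply-accumulates, so with few input channels the per-pointer overhead is amortized over less work.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical scalar indirect convolution (see assumptions above).
std::vector<int32_t> IndirectConv(const std::vector<int8_t>& input, int H, int W, int C,
                                  const std::vector<int8_t>& weights, int KH, int KW, int M,
                                  int pad) {
  const int OH = H + 2 * pad - KH + 1;
  const int OW = W + 2 * pad - KW + 1;
  const std::vector<int8_t> zeros(static_cast<size_t>(C), 0);  // row used for padding

  // Step 1: build the indirection buffer, one pointer per (output pixel,
  // kernel tap), each pointing at a C-deep input row or at the zero row.
  std::vector<const int8_t*> indirection;
  indirection.reserve(static_cast<size_t>(OH) * OW * KH * KW);
  for (int oh = 0; oh < OH; ++oh) {
    for (int ow = 0; ow < OW; ++ow) {
      for (int kh = 0; kh < KH; ++kh) {
        for (int kw = 0; kw < KW; ++kw) {
          const int ih = oh + kh - pad;
          const int iw = ow + kw - pad;
          const bool inside = ih >= 0 && ih < H && iw >= 0 && iw < W;
          indirection.push_back(
              inside ? &input[(static_cast<size_t>(ih) * W + iw) * C] : zeros.data());
        }
      }
    }
  }

  // Step 2: accumulate by following the pointers. Each pointer feeds
  // C * M multiply-accumulates into the output pixel's M channels.
  std::vector<int32_t> output(static_cast<size_t>(OH) * OW * M, 0);
  const size_t taps = static_cast<size_t>(KH) * KW;
  const size_t pixels = static_cast<size_t>(OH) * OW;
  for (size_t p = 0; p < pixels; ++p) {
    for (size_t t = 0; t < taps; ++t) {
      const int8_t* row = indirection[p * taps + t];
      const int8_t* w = &weights[t * C * M];
      for (int c = 0; c < C; ++c) {
        for (int m = 0; m < M; ++m) {
          output[p * M + m] += int32_t(row[c]) * int32_t(w[c * M + m]);
        }
      }
    }
  }
  return output;
}
```

A real kernel would vectorize the inner loops and reuse the indirection buffer across calls; the sketch only shows the data flow.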
Co-authored-by: Chen Fu <fuchen@microsoft.com>