* Only serialize runtime optimization records container if non-empty.
* Remove runtime optimizations from onnxruntime/core/flatbuffers/schema/README.md as it's not completely implemented yet.
* Disable partial runtime optimization implementation by default.
* schema change
* cc changes
* remove temp debug code
* Adding fbs namespace to session_state_flatbuffers_utils.h
* Add fbs namespace to all ort format utils
When the pattern Sum(Gemm(A, B), C) exists, we can convert it to
Gemm(A, B, C), assuming that the output of the original Gemm is
not used elsewhere and that the change does not break broadcasting.
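
The equivalence and the broadcasting condition can be illustrated with a small pure-Python sketch. It is not the optimizer's code: the helper names are hypothetical, matrices are plain nested lists, alpha and beta are taken as 1, and C is assumed to be a 1-D bias broadcast across rows.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add_row_bias(m, c):
    """Add the 1-D list c to every row of m (broadcast across rows)."""
    return [[v + cv for v, cv in zip(row, c)] for row in m]

def gemm(a, b, c=None):
    """Simplified Gemm: A*B, plus C when given (alpha = beta = 1, no transposes)."""
    out = matmul(a, b)
    return out if c is None else add_row_bias(out, c)

def unidirectional_broadcastable(c_shape, out_shape):
    """True if c_shape can be broadcast to out_shape without changing out_shape.

    Sum broadcasts in both directions, while the C input of Gemm must fit the
    (M, N) output, so the fusion is only safe when this returns True.
    """
    if len(c_shape) > len(out_shape):
        return False
    # Compare trailing dimensions pairwise.
    for c_dim, o_dim in zip(reversed(c_shape), reversed(out_shape)):
        if c_dim != 1 and c_dim != o_dim:
            return False
    return True

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c = [10, 20]

unfused = add_row_bias(gemm(a, b), c)  # Sum(Gemm(A, B), C)
fused = gemm(a, b, c)                  # Gemm(A, B, C)
```

Here `unfused` and `fused` hold the same values. A C of shape (3, 2, 2) would enlarge the result of Sum beyond (2, 2), so `unidirectional_broadcastable((3, 2, 2), (2, 2))` is False and the pattern should be left alone.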
ORT format model runtime optimization implementation is in progress.
This change adds a build.py option to disable the partial runtime optimization implementation, adds CI builds to test it, and disables runtime optimizations in mobile package builds.
* Construct valid graphs for ONNX checker for IR version < 4.
Previously the constructed graph was not guaranteed to have its
initializers be a subset of its inputs, which is required for IR
version < 4. This resulted in spurious failures.
Fixes #9663
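
A minimal sketch of the invariant described above, using plain lists of names rather than ONNX protobuf objects. The function name `inputs_for_old_ir` and the sample tensor names are hypothetical; the sketch only shows one way to make the initializers a subset of the graph inputs when the IR version is below 4.

```python
def inputs_for_old_ir(input_names, initializer_names, ir_version):
    """Return graph input names that satisfy the IR version < 4 rule.

    For IR version < 4, every initializer must also be listed as a graph
    input. For later IR versions the inputs are returned unchanged.
    """
    result = list(input_names)
    if ir_version >= 4:
        return result
    seen = set(result)
    for name in initializer_names:
        if name not in seen:
            # Promote the initializer to a graph input, keeping order stable.
            result.append(name)
            seen.add(name)
    return result

inputs = inputs_for_old_ir(["X"], ["W", "B"], ir_version=3)
```

With IR version 3 the initializers `W` and `B` are appended to the inputs; with IR version 4 or later the input list is left as it was.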
implement DynamicQuantizeLinear in DNNL EP
add debug log in EP for operator coverage
block gpu elementwise ops with 5 or more dims
Signed-off-by: Wang <zhaoyang.wang@intel.com>
* Fix #9671 by running the level 1 rewrite rules first and allowing the transpose optimizer to run multiple times to ensure it completes in level 1.
Removed unnecessary call to GenerateRuleBasedGraphTransformer as there are no level 2 rewrite rules.
* remove default python ep registration. raise an exception if providers are not explicitly set when there are available providers
* temporarily disable exception
* fix python tests
* explicitly set CUDAProvider for python iobinding tests
* explicitly set providers param for InferenceSession()
* onnxrt
* raise ValueError if providers are not explicitly set when creating InferenceSession
* add required providers param
* explicitly set providers
* typo
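
The provider check described in the bullets above might look roughly like the following sketch. It is not the onnxruntime source: `resolve_providers` is a hypothetical helper, and treating a CPU-only build as not needing an explicit list is an assumption made for the example.

```python
DEFAULT_PROVIDER = "CPUExecutionProvider"

def resolve_providers(providers, available):
    """Hypothetical stand-in for the check done when a session is created.

    Raises ValueError when no providers were given although the build offers
    providers other than the default CPU one (assumed condition).
    """
    extra = [p for p in available if p != DEFAULT_PROVIDER]
    if providers is None and extra:
        raise ValueError(
            "providers must be set explicitly; available: %s" % ", ".join(available)
        )
    return [DEFAULT_PROVIDER] if providers is None else list(providers)

available = ["CUDAExecutionProvider", "CPUExecutionProvider"]
chosen = resolve_providers(["CUDAExecutionProvider"], available)

try:
    resolve_providers(None, available)
    raised = False
except ValueError:
    raised = True
```

In onnxruntime itself the list is passed through the `providers` argument, for example `InferenceSession(model_path, providers=["CUDAExecutionProvider"])`.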
* Add options to 1. enable QDQ for a node's output and 2. force QDQ to appear as a pair
* modify description
* modify description
* Revert the logic of variable
* Revert the logic of variable
* Code refactor based on review's suggestions
* Update init
* Refactor code to allow specifying nodes to exclude from output quantization
* rename variable
* Fix bug
* code refactor
* remove the exposure of APIs
* fix bug
* fix bug
* fix bug
* fix bug
* expose one API
Co-authored-by: Ubuntu <onnxruntime@ort-trt-ep-linux-t4.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
Co-authored-by: Chi Lo <Chi.Lo@gmail.com>
* add p50 in test
* Support FusedConv in WebGL
* resolve comments
* add a comment for longToNumber change
Co-authored-by: Yulong Wang <yulongw@microsoft.com>
* explicitly link with libtorch instead of using the cmake var, to avoid introducing an mkl dependency
* use find_lib to get libtorch lib name
* temp fix
* add missing libraries
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Updated MLOperatorAuthorPrivate.h to remove `enum DML_TENSOR_DATA_TYPE;` to avoid warning "C4471: 'DML_TENSOR_DATA_TYPE': a forward declaration of an unscoped enumeration must have an underlying type"
Updated OperatorUtility to avoid compiler errors C2672 and C2783.
- Error C2672: 'TryMapStringToIndex': no matching overloaded function found
- Error C2783: 'std::optional<_Ty> Dml::TryMapStringToIndex(std::string_view,gsl::span<const Dml::NameAndIndex>)': could not deduce template argument for 'T'. note: see declaration of 'Dml::TryMapStringToIndex'. 'TryMapStringToIndex': function declaration must be available as none of the arguments depend on a template parameter
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.