Add abseil and inlined containers typedefs
Introduce TensorShapeVector for shape building.
Use gsl::span<const T> so that interfaces accept different vector-like argument types.
Introduce InlinedShapeVectorT for shape capacity typed instantiations
Refactor cuda slice along with provider shared interfaces
Refactor Concat, Conv, Pad
Build with Conv Einsum and ConvTranspose refactored.
Remove TensorShape::GetDimsAsVector()
Refactor SliceIterator and SliceIteratorBase
Refactor broadcast
Refactor Pads for twice as long
Remove memory planner intermediate shapes vector
Refactor orttraining
Fix passing TensorShapeVector to tests
Remove abseil copy and submodule, use FetchContent_Declare/Fetch
Patch with separate command
Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.
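
The span-based interface change above can be illustrated with a minimal sketch. It assumes a hand-rolled `ConstSpan` as a stand-in for `gsl::span<const T>` so the example compiles without GSL; `ConstSpan` and `ShapeSize` are hypothetical names, not part of this change.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for gsl::span<const T>: a non-owning (pointer, length) view.
template <typename T>
class ConstSpan {
 public:
  // Implicitly accepts any contiguous container exposing data() and size().
  template <typename Container>
  ConstSpan(const Container& c) : data_(c.data()), size_(c.size()) {}

  const T* begin() const { return data_; }
  const T* end() const { return data_ + size_; }
  std::size_t size() const { return size_; }

 private:
  const T* data_;
  std::size_t size_;
};

// Hypothetical helper: element count implied by a shape.
// One signature serves std::vector, std::array, and other vector-like types.
inline int64_t ShapeSize(ConstSpan<int64_t> dims) {
  int64_t n = 1;
  for (int64_t d : dims) n *= d;
  return n;
}
```

Because the view does not own its data, the caller's container must outlive the call; that holds for the usual pass-by-argument pattern.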
* add new field constant_initializers in metadef and remove constant initializers from trt node inputs
* remove redundancy
* use GetConstantInitializer() to get constant initializers
* add ORT_ENFORCE check
Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>
* squashed commit for standalone tvm execution provider
* critical fix for correct python build with stvm ep
* get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG
* updates and fixes
* update parsing of stvm provider options
* add support of external data for onnx model
* add conditional dump of subgraphs
* remove unused code
* get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API
* support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type)
* add fp16
* add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options
* fix license text in header. fix log format
* small fixes
* fix issues from flake8
* remove model proto construction from GetCapability
* reserve memory for vector of DLTensors
* add simple tutorial for STVM EP
* STVM docs
* jroesch/tvm -> apache/tvm
* remove dead code, unnecessary logs and comments
* fix in readme
* improve tutorial notebook
* tvm update
* update STVM_EP.md
* fix default value
* update STVM_EP.md
* some TODOs for the future development
* shorten long lines
* add hyperlink to STVM_EP.md
* fix Linux CI error
* fix error in csharp test
Co-authored-by: Jared Roesch <jroesch@octoml.ai>
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>
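
The priority rule for the tuning log (provider options first, then `AUTOTVM_TUNING_LOG`) could look roughly like the sketch below. It assumes `AUTOTVM_TUNING_LOG` is an environment variable; the option key `tuning_file_path` and the function `ResolveTuningLog` are hypothetical names chosen for illustration.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical resolver: a non-empty provider option wins over the
// environment value; an empty string means no tuning log was given.
inline std::string ResolveTuningLog(
    const std::unordered_map<std::string, std::string>& options,
    const char* env_value) {
  auto it = options.find("tuning_file_path");  // hypothetical option key
  if (it != options.end() && !it->second.empty()) {
    return it->second;
  }
  return env_value != nullptr ? std::string(env_value) : std::string();
}
```

A caller would pass `std::getenv("AUTOTVM_TUNING_LOG")` (from `<cstdlib>`) as the second argument; taking the value as a parameter keeps the function easy to test.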
* Added checks for Hetero/Multi
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Remote Context Plugin
* changes for IO Buffer plugin
* erroneous couts added
* erroneous entry rectified
* Set the Openvino OP Buffer also as output
* Enable AUTO plugin in OpenVINO EP
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Remote Context Plugin
* changes for IO Buffer plugin
* erroneous couts added
* erroneous entry rectified
* Added checks for Hetero/Multi
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Set the Openvino OP Buffer also as output
* Enable AUTO plugin in OpenVINO EP
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Please commit error message and rectification of param.context
* Alignment fixed
Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com>
* Changed the string to OpenVINO_GPU
* Changed OpenVINO to OpenVINO_CPU
* Onnxruntime updated API for memory location
* Removing Duplicate LOG Error
* Tensor.h removed DeviceType function. Updated comment
* API Comments updated
* Removing changes to Provider Info
* Erroneous commit
* Removing Extra logs
* Merge CMAKE
* Do not copy from a local location
* Duplicate Entry
* Remove extra line
Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>
* schema change
* cc changes
* remove temp debug code
* Adding fbs namespace to session_state_flatbuffers_utils.h
* Add fbs namespace to all ort format utils
* Construct valid graphs for ONNX checker for IR version < 4.
Previously the constructed graph was not guaranteed to have its
initializers be a subset of its inputs, which is required for IR
version < 4. This resulted in spurious failures.
Fixes #9663
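
The IR version < 4 constraint (every initializer must also be a graph input) can be sketched as follows. `SimpleGraph` is a hypothetical, simplified structure rather than the ONNX protobuf types, and appending the missing names is only one way to satisfy the rule, not necessarily what this change does.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical, simplified graph description (names only).
struct SimpleGraph {
  std::vector<std::string> inputs;
  std::vector<std::string> initializers;
};

// True when every initializer name also appears among the graph inputs.
inline bool InitializersAreInputs(const SimpleGraph& g) {
  for (const auto& name : g.initializers) {
    if (std::find(g.inputs.begin(), g.inputs.end(), name) == g.inputs.end()) {
      return false;
    }
  }
  return true;
}

// Appends any initializer that is not yet listed as an input.
inline void AddMissingInitializerInputs(SimpleGraph& g) {
  for (const auto& name : g.initializers) {
    if (std::find(g.inputs.begin(), g.inputs.end(), name) == g.inputs.end()) {
      g.inputs.push_back(name);
    }
  }
}
```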
Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.
* re-hipify all rocm EP sources
* fix all other files affected by re-hipify
* add cuda_provider_factory.h to amd_hipify.py
* do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration
* Fix ReduceConsts template specialization introduced in #9101.
Fixes the error when building for ROCm 4.3.1:
error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0)
* fix flake8 error in amd_hipify.py
* speed up hipify with concurrent.futures
* flake8 fix in amd_hipify.py
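
For context on the "too many template headers" error quoted above: a diagnostic of this form typically appears when a member of an explicitly specialized class template is defined with an extra `template <>` prefix. The sketch below shows the accepted form; `Consts` and `Half` are hypothetical stand-ins for `ReduceConsts` and `__half`, and the actual fix in this change may differ.

```cpp
// Primary template with a static constant defined out of class.
template <typename T>
struct Consts {
  static const T One;
};

template <typename T>
const T Consts<T>::One = T(1);

// Hypothetical stand-in for a half-precision type.
struct Half {
  float v;
};

// Explicit specialization of the whole class for Half.
template <>
struct Consts<Half> {
  static const Half One;
};

// No `template <>` here: Consts<Half> is already a concrete class, so its
// member is defined like a member of an ordinary class. Prefixing this
// definition with `template <>` is what is rejected.
const Half Consts<Half>::One = Half{1.0f};
```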
* Remove unused NodeArgs
* Handle case where a node arg comes from an initializer in initializer_names_to_preserve
* Fix CI failure
* update test
* Fix outer scope node args failure
* Use NodeArg* as the key of the std::set instead of string
* Minor updates
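
The switch from string keys to `NodeArg*` keys can be illustrated with a small sketch. The commit does not state its motivation; one observable difference is that pointer keys distinguish two args that share a name, for example in different scopes. `NodeArg` below is a hypothetical stand-in with only a name field.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for the real NodeArg type.
struct NodeArg {
  std::string name;
};

// Counts distinct args when the set is keyed by name.
inline std::size_t CountByName(const std::vector<const NodeArg*>& args) {
  std::set<std::string> seen;
  for (const NodeArg* arg : args) seen.insert(arg->name);
  return seen.size();
}

// Counts distinct args when the set is keyed by pointer identity.
inline std::size_t CountByPointer(const std::vector<const NodeArg*>& args) {
  std::set<const NodeArg*> seen(args.begin(), args.end());
  return seen.size();
}
```

Pointer keys also avoid copying and comparing strings on every lookup, at the cost of requiring the pointed-to objects to outlive the set.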
* implement cuda provider
* define profiler common
* call start after register
* add memcpy event
* add cuda correlation
* format code
* add cupti to test path
* switch to CUpti_ActivityKernel3
* reset cupti path
* fix test case
* fix trt pipeline
* add namespace
* format code
* exclude training from testing
* remove mutex