* Share thread pools between devices
* make tests reuse device
* Change cpu thread pool options for dml sessions to use 1 thread with no spinning
* fix test failure
* Update missing type constraints for dft
* Add comment and rename inference session parameter
* add missing default that caused inconsistent test behavior
Co-authored-by: Sheil Kumar <sheilk@microsoft.com>
This patch uses vector intrinsics to optimize the MlasQLinearAddKernelHelper
function for POWER processors.
Co-authored-by: Rajalakshmi Srinivasaraghavan <rajis@linux.ibm.com>
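For readers unfamiliar with the kernel, the following is a minimal scalar sketch of the arithmetic a quantized linear add performs, which is the kind of per-element loop that vector intrinsics speed up. It is not the MLAS code: the function name `qlinear_add`, the parameter names, the uint8 range, and the use of Python's `round` are assumptions made for illustration.

```python
def qlinear_add(a, a_scale, a_zp, b, b_scale, b_zp, c_scale, c_zp,
                qmin=0, qmax=255):
    """Scalar reference sketch: dequantize both inputs, add, requantize, saturate.

    All names here are illustrative; the real kernel works on raw buffers
    and processes several elements per vector instruction.
    """
    out = []
    for qa, qb in zip(a, b):
        # Dequantize each operand to a real value and add.
        real = a_scale * (qa - a_zp) + b_scale * (qb - b_zp)
        # Requantize to the output scale and zero point.
        q = round(real / c_scale) + c_zp
        # Saturate to the quantized range (uint8 assumed by default).
        out.append(max(qmin, min(qmax, q)))
    return out


result = qlinear_add([10, 20], 0.5, 0, [4, 6], 0.25, 2, 0.5, 1)
```

A vectorized version applies the same three steps to a whole register of elements at once, which is where the speedup on POWER would come from.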
**Description**: Extract arg value from torch Value
**Motivation and Context**
Input to gelu is `torch._C.Value` type values. This caused the `if approximate == "none"` check to always fail, preventing the optimized `com.microsoft::Gelu` op from being used.
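The failure mode can be sketched without PyTorch. `FakeValue` below is a hypothetical stand-in for `torch._C.Value`, and `to_constant` and `extract_arg` are made-up names for illustration; the actual patch may extract the argument differently.

```python
class FakeValue:
    """Hypothetical stand-in for torch._C.Value: wraps a constant and
    defines no equality against plain Python strings."""

    def __init__(self, const):
        self._const = const

    def to_constant(self):  # hypothetical accessor, not a torch API
        return self._const


def extract_arg(value):
    """Return the wrapped Python constant if `value` is a wrapper, else `value`."""
    return value.to_constant() if isinstance(value, FakeValue) else value


approximate = FakeValue("none")

# Comparing the wrapper to a str falls back to identity, so it is always False.
direct_check = approximate == "none"

# Comparing the extracted constant behaves as intended.
extracted_check = extract_arg(approximate) == "none"
```

With the direct comparison the optimized branch is never taken; after extraction the check succeeds for the default argument.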
* Checkpoint API Implementation
* fix build issues
* fix undefined reference for ParseData of type string.
* refinements
* resolve some comments
* expose python api
* make save and load test pass
* some clean up
* make optimizer save/load test pass
* make custom property save/load test pass
* formatting
* fix comments - fix wave - code placement, remove legacy ckpt logic dependency, remove external data support
* fix comment - wave 2 - Remove ParseData/ParseStringData, Use UnpackTensor, Simplify CheckpointProperty usage
* fix comment - wave 3 - rename all api_test namespace to api
* fix comment - wave 4 - load/save trainable/nontrainable param separately.
* Rename Load/SaveORTCheckpoint
* rename API && remove CheckpointUtils. api::LoadCheckpoint/SaveCheckpoint are the exposed interfaces.
* revert unnecessary format change for onnxruntime/core/framework/tensorprotoutils.h/cc
* formatting
* re-org the class folders for better dependency management
* save_checkpoint accepting TensorProto as inputs
* More clean up
* clean up the naming
* refactor type constraints on custom property a bit
* fix comment - file read/write && report error when file read/write failed
* extract LoopDir to FilterFilesFromDirectory
* fix build
* initial implementation of nnapi depthtospace support
* modify depthtospace output tensor shape and enable test pass
* minor update
* minor update
* modify input output layout order and hack nnapi instance to use nchw flag for optest
* address pr comments
* add depthtospace to layout logic
* format length and revert UT log level
* add nchw and android feature level check in opsupportchecker
* minor fix
* update
* update
* fix
* minor update
* Add disentangled attention TRT plugin as contrib op
* update plugin name & remove null character
* update onnx-tensorrt submodule with my beta version
* use suggested plugin name & simpler shape propagation
* update onnx-tensorrt gitsubmodule to temporary fork
* update onnx-tensorrt to temporary commit
* redirect submodule back to latest 8.2-GA release of onnx-tensorrt repo
Co-authored-by: HHH-ComputeLab <haohangh@nvidia.com>
* use the lightweight compile api as default; use dnnl ep for testing
* apply to tensorrt ep
* fix the missing files
* fix build
* fix the copy issue on linux
* migrate migraphx and openvino ep
* fix openvino build break
* fix linux build
* fix unused parameter
* fix coreml build
* use graph view's filtered initializers
* fix openvino break
* fix tvm compile api
* fix tvm / rknpu / vitisai ep build
* add IsInitializedTensor in graph_viewer; fix nuphar build
* use serializer directly as tvm ep is still static lib
* fix the type mismatch
* fix the type mismatch
* fix merge conflict
* add a comment
* fix minimal build
* fix the DML EP's legacy approach
* save type/shape in dnnl IR
* fix linux break
* fix tvm failure
* dnnl ep: move initializer referenced out of dnnl subgraph
* Revert "add IsInitializedTensor in graph_viewer; fix nuphar build"
This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587.
* add IsInitializedTensor to graph viewer
* add the legacy code for nuphar build to temporarily make nuphar build work
* ignore internal test for nuphar
* remove the out of date tests
* keep the legacy API in EP for a while
* turn serializer into a static function
* update comments
* fix tvm build
* Update include/onnxruntime/core/framework/execution_provider.h
Co-authored-by: Pranav Sharma <prs@microsoft.com>
* Update include/onnxruntime/core/framework/execution_provider.h
Co-authored-by: Pranav Sharma <prs@microsoft.com>
* Update onnxruntime/core/framework/execution_provider.cc
Co-authored-by: Pranav Sharma <prs@microsoft.com>
* update comments; add warning message for legacy compile call
* add a flag to control out of scope arg in serialization
* fix trt build; improve the test
* resolve merge errors
* fix a typo
Co-authored-by: Cheng Tang <chenta@microsoft.com>
Co-authored-by: Pranav Sharma <prs@microsoft.com>
* update TVM
* small fixes
* update TVM with new set_input and NDArray API
* use set_input instead of set_one_input
Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>
Description:
Add the extra param to match gelu in PyTorch in the contrib symbolic function
Motivation and Context
The symbolic function in /onnxruntime/python/tools/pytorch_export_contrib_ops.py is missing the recently added `approximate` parameter. We add this parameter and use the exporter-defined gelu if `approximate` is "tanh".
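The dispatch described above can be sketched as a small selection function. The name `select_gelu_op` and the returned labels are assumptions for illustration, and routing every value other than "none" to the exporter-defined gelu is also an assumption; the real symbolic function emits graph nodes rather than strings.

```python
def select_gelu_op(approximate="none"):
    """Pick which gelu lowering to use (illustrative sketch only).

    "none" -> the contrib op; anything else (e.g. "tanh") -> the
    exporter-defined gelu. The fallback for unknown values is an assumption.
    """
    if approximate == "none":
        return "com.microsoft::Gelu"
    return "exporter-defined gelu"


default_choice = select_gelu_op()
tanh_choice = select_gelu_op("tanh")
```

Giving the parameter a default keeps existing call sites that pass only the input tensor working.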
* support ort device tensor in ort module inference
* fallback aten equal to cpu; add ortmodule inference test case
* fix python format
Co-authored-by: Cheng Tang <chenta@microsoft.com>
* draft kernel creation
* setup eager context
* call into kernel in eager mode
* redefine test case
* refactor eager context
* add comment
* remove header
* rename argument
* redefine API definition with types
* list outputs as argument
* switch to int to represent length
* fix compile err
* create attribute API
* add test case for topk
* remove bool from c api
* add gru test case
* remove var
* fix compile warnings
* rename status
* fix compile err
* exclude sparse tensor
* fix comments
* fix comments
* fix build err
* rename file and move location
* format code
* move file to session folder
* fix comments
Co-authored-by: Randy <Randy@randysmac.attlocal.net>
* Move some of the transpose kernel code to onnxruntime_framework.lib
* Fix C4244 warnings in the transpose code
* Rename IsMovingSingleAxis to IsTransposeMovingSingleAxis
This reverts commit 4983d6e5d6. We can't destroy OrtEnv through Python's atexit function, because at that time there might be many other ORT Python objects still alive.
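The lifetime problem can be illustrated with a minimal sketch. `Env` and `Session` are hypothetical stand-ins, not ORT classes, and the explicit `env.destroy()` call plays the role of what an `atexit` hook would do while other objects still hold a reference to the environment.

```python
class Env:
    """Hypothetical stand-in for a process-wide environment such as OrtEnv."""

    def __init__(self):
        self.alive = True

    def destroy(self):
        self.alive = False


class Session:
    """Hypothetical object whose operations require a live environment."""

    def __init__(self, env):
        self._env = env

    def run(self):
        if not self._env.alive:
            raise RuntimeError("environment destroyed while session is still alive")
        return "ok"


env = Env()
session = Session(env)
before = session.run()

# Simulates the atexit hook: the environment is torn down first,
# but the session object has not been collected yet.
env.destroy()

try:
    after = session.run()
except RuntimeError as exc:
    after = str(exc)
```

In native code the equivalent of the `RuntimeError` would more likely be a use-after-free, which is why tearing the environment down from an exit hook is unsafe.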