onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-17 01:44:45 +00:00

Author	SHA1	Message	Date
Baiju Meswani	ba7b83ff3c	Remove onnxruntime_PYBIND_EXPORT_OPSCHEMA definition from onnxruntime (#15776 )	2023-05-03 13:08:35 -07:00
Chunye Wang@AMD	d35850c142	[VitisAI] Update VitisAI EP to be compatible with VitisAI 3.5 (#15673) ### Description Originally, the VitisAI EP only worked with old versions of the VitisAI release. ### Motivation and Context Update the VitisAI EP so that it works with the current VitisAI 3.5 and later versions of VitisAI. We try our best to make it forward compatible. --------- Co-authored-by: Wang Chunye <chunywan@xilinx.com> Co-authored-by: mingyue <mingyue@amd.com> Co-authored-by: mingyueliuh <131847423+mingyueliuh@users.noreply.github.com> Co-authored-by: liumingyue <mingyue@xilinx.com> Co-authored-by: moore-ch <129165652+moore-ch@users.noreply.github.com> Co-authored-by: shoucair <shoucai.ren@amd.com> Co-authored-by: zz002 <zhenze.wang@amd.com> Co-authored-by: BoarQing <yuz75@Pitt.edu> Co-authored-by: Yueqing Zhang <yueqingz@amd.com> Co-authored-by: Scott McKay <Scott.McKay@microsoft.com>	2023-05-01 08:28:26 -07:00
sfatimar	ebaafac3f5	Openvino ep ort 5.0 (#15626) ### Description The PR adds VPU support to the OpenVINO Execution Provider. Bug fixes for GPU and CPU. Changes to the OpenVINO backend in the Serialized Model API for faster first-inference latency. Deprecation of HDDL-VADM and MYRIAD; removed code. Support for OpenVINO 2023.0. Dynamic shapes support for iGPU. ### Motivation and Context - VPU is an upcoming hardware that can provide AI acceleration for client systems through OpenVINO --------- Signed-off-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Suryaprakash Shanmugam <suryaprakash.shanmugam@intel.com> Co-authored-by: MaajidKhan <n.maajid.khan@intel.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com>	2023-04-25 20:59:42 -07:00
Justin Chu	cf19c3697d	Run clang-format in CI (#15524) ### Description Run clang-format in CI. Formatted all c/c++, objective-c/c++ files. Excluded ``` 'onnxruntime/core/mlas/', 'onnxruntime/contrib_ops/cuda/bert/tensorrt_fused_multihead_attention/', ``` because they contain assembly or are data heavy ### Motivation and Context Coding style consistency	2023-04-18 09:26:58 -07:00
Dmitri Smirnov	0d7855ea5a	Re-work global objects dependencies in pybind layer. (#14941) ### Description Re-work handling of static objects in pybind. Make sure we ref-count Environment from Sessions. The following has been done: - Make global objects function static. This ensures that the objects are constructed on demand. The first object constructed is destructed last. This is platform independent. - Make global objects ownership shared as suggested by pybind since they are not surfaced at Python level, and they cannot be referred to by dependent python objects. Verified that all python objects are GCed before globals are destroyed. This takes care of inference session dependency on environment and its default logger and this is also platform independent. - Utilize pybind atexit mechanism to clear execution providers and unload CUDA libraries (as suggested by https://github.com/microsoft/onnxruntime/pull/14903). Since this is registered for module exit, it takes place before any other globals are destroyed and clears shared objects state or even unloads the libraries. This should also work in a platform independent way. ### Motivation and Context - Global object destruction order is managed manually and that becomes a source of trouble. We want to make it deterministic and platform independent. - Frequent hangs in Python layer due to the static objects' destruction order. Some of the Python session objects are being garbage collected after main exits and they require ORT environment to be alive. (Use after free)	2023-03-10 13:55:31 -08:00
Erick Muñoz	d1533c27eb	[oneDNN] Improved thread handling (#13618) * Added the OrtDnnlProviderOptions structure to expose configuration options to the user * The number of threads can be defined by the user with the -i flag on the perftest * Number of threads can also be configured via the OMP_NUM_THREADS environment variable * The number of threads defined in the OrtDnnlProviderOptions is prioritized over the environment variable ### Description Avoids thread oversubscription caused by OpenMP allocating the maximum number of threads possible for oneDNN EP. Added support for the OrtDnnlProviderOptions; this allows for more EP customization and a user-defined number of threads. ### Motivation and Context - Improves performance and allows the user to fine-tune the number of threads	2023-01-31 14:37:13 -08:00
Adrian Lizarraga	68794d0ac1	Improve custom op library handle cleanup (#14099 ) ### Description - Adds a new C API `OrtApi::RegisterCustomOpsLibrary_V2` that manages the lifetime of dynamic library handles (i.e., calls `dlclose` or `FreeLibrary`). - Deprecates C API `OrtApi::RegisterCustomOpsLibrary`. - Adds C++ API wrapper for convenient registering of custom op libraries. - `PySessionOptions` is now an alias of `OrtSessionOptions` ### Motivation and Context The current API for registering custom op libraries loads dynamic libraries but requires users to handle the release of the corresponding library handles. Additionally, the user has to make sure to release the library handle _after_ the session has been destroyed (or the program segfaults). The new API automatically cleans up the library and allows the user to write more straightforward code.	2023-01-04 17:56:29 -08:00
Adam Louly	fb4707f76d	add cuda support to python bindings (#13700 ) ### Description Add cuda support to the on device training python bindings. ### Motivation and Context Now users can set the execution provider (cpu or cuda) when using python bindings for on device training apis.	2022-12-08 16:03:53 -08:00
cloudhan	9e649d1ac4	Allow CUDA EP to enable or disable TunableOp via session options and environment variable (#13601) This ports #13116 from ROCm EP to CUDA EP	2022-11-15 14:43:54 +08:00
cloudhan	fc12abf6b1	Enable/Disable tunable GEMM by using tunable switch in provider options and env var (#13116) Related PRs #12853 This allows the user to enable/disable tunable GEMM on demand.	2022-10-19 22:35:08 -07:00
wangxiyuan	952c99304a	Add CANN EP (#12416) Description: This PR adds Ascend CANN execution provider support. Motivation and Context - Why is this change required? What problem does it solve? As described in the issue, CANN is the API layer for Ascend processors. Adding the CANN EP allows users to run ONNX models on Ascend hardware via onnxruntime. Detailed changes: 1. Added the CANN EP framework. 2. Added the basic operators to support ResNet and VGG models. 3. Added C/C++ and Python API support. - If it fixes an open issue, please link to the issue here. https://github.com/microsoft/onnxruntime/issues/11477 Author: lijiawei <lijiawei19@huawei.com> wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: FFrog <ljw1101.vip@gmail.com>	2022-09-22 14:53:40 -07:00
sfatimar	cccbe90764	Openvino ep 2022.2 v4.2 (#13023) These changes align the OV 2022.2 release with ORT. Changes: CPU FP16 support, dGPU support, RHEL Dockerfile, Ubuntu 20 Dockerfile. Motivation and Context - This change is required to ensure the ORT-OpenVINO Execution Provider is aligned with the latest changes. Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: shamaksx <shamax.kshirsagar@intel.com> Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com> Co-authored-by: pratiksha <mohsinx.mohammad@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: nmaajidk <n.maajid.khan@intel.com> Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com> Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>	2022-09-22 12:31:40 -07:00
RandySheriffH	d3b684cd9e	Drop nuphar (#11555 ) * drop nuphar code and configs * refactor test case * format python * remove nuphar from training test * remove commented nuphar logics * restore llvm setting * drop nuphar ci * fix compile err * fix compile err Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-07 15:11:18 -07:00
Scott McKay	d64f23fec0	EP factory creation cleanup and enhancements. (#11798 ) * Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places. Convert append EP for SNPE to be generic, and also use for XNNPACK. Add XNNPACK to C# API * Don't need stub for MIGraphX as it's using provider bridge. * Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries. * Only use EPs that require the layout transform if the opset is supported by the layout transformer. * Update wasm registration of xnnpack.	2022-06-16 07:01:41 +10:00
Xavier Dupré	a805a49363	Move OrtValueVector from onnxruntime-training to onnxruntime (#11176) * Move OrtValueVector from onnxruntime-training to onnxruntime * disable dlpack on onnxruntime * disable dlpack * dlpack * opaque included in any cc file of the python binding * fix type issue * fix incomplete name * remove len() * remove unused parameter * black * black * black * remove unused import * add unit test to check the output type * black * lint * lint * lint * fix method name * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/python/onnxruntime_pybind_ortvalue.cc Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * Update onnxruntime/test/python/onnxruntime_test_python_sparse_matmul.py Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * check return type of C API * lint * lint * fix missing ; * fix type issue * fix merge issue Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-06-15 09:36:28 +02:00
Xavier Dupré	3f42665a40	Improve transfer time from ort to torch (#9610) * Improve transfer time from ort to torch * Use static_cast * fix call to Python API for python <= 3.8 * investigation * fix ref counts * disable import if no training * one function to convert multiple ortvalues * add proto_type * enforce dlpack->deleter to be not null * fix _ortvalues_to_torch_tensor for eager mode * rename proto_type into element_type in the Python API * conversion from ort to torch 2x faster * fix conversion of list of OrtValue * replace has_bool_tensor by bool_tensor_indices * introduce _ortvalues_to_torch_tensor_list * use _ortvalues_to_torch_tensor_list for cache * fix ambiguity between c and python classes Co-authored-by: xavier dupré <xavier.dupre@gmail.com> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-04-06 09:12:58 +02:00
Valery Chernov	625a1f7673	[TVM EP] code refactor (#10655 ) * rename info to options for TVM EP * transfer options processing from TVMExecutionProvider to TVMEPOptions * transfer TVMRunner to separated files * implement TVMCompiler class * replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider * correct logging of TVM EP options * RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented * add prepareComputeInfo method * remove update_output_shapes flag * embed all TVM EP dependences to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner * refactor compileModel method * small cleaning * separate TVM EP options data store and processing * replace TvmTensorShape by InlinedVector with max_size 5 * correct indentation * update TVM hash Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-16 13:55:04 +01:00
Valery Chernov	1cdc23aba4	[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260 ) * update java API for STVM EP. Issue is from PR#10019 * use_stvm -> use_tvm * rename stvm worktree * STVMAllocator -> TVMAllocator * StvmExecutionProviderInfo -> TvmExecutionProviderInfo * stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict * STVMRunner -> TVMRunner * StvmExecutionProvider -> TvmExecutionProvider * tvm::env_vars * StvmProviderFactory -> TvmProviderFactory * rename factory funcs * StvmCPUDataTransfer -> TvmCPUDataTransfer * small clean * STVMFuncState -> TVMFuncState * USE_TVM -> NUPHAR_USE_TVM * USE_STVM -> USE_TVM * python API: providers.stvm -> providers.tvm. clean TVM_EP.md * clean build scripts #1 * clean build scripts, java frontend and others #2 * once more clean #3 * fix build of nuphar tvm test * final transfer stvm namespace to onnxruntime::tvm * rename stvm->tvm * NUPHAR_USE_TVM -> USE_NUPHAR_TVM * small fixes for correct CI tests * clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files * update CUDA support for TVM EP * roll back CudaNN home check * ERROR for not positive input shape dimension instead of WARNING * update documentation for CUDA * small corrections after review * update GPU description * update GPU description * misprints were fixed * cleaned up error msgs Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>	2022-02-15 10:21:02 +01:00
Chi Lo	0f5d0a091a	Make user capable of adding new field in OrtTensorRTProviderOptionsV2 as new provider option (#10450 ) * modify code for add additional field in OrtTensorRTProviderOptionsV2 * add include file * fix typo * fix bug * add comment * fix code * revert change	2022-02-05 11:15:12 -08:00
Shucai Xiao	ce103ace93	Amdmigraphx fix build error (#9272 ) * fix build error * rename a missing api for the MIGraphX EP	2022-01-10 15:18:43 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
Valery Chernov	b327e89efa	Standalone TVM Executor Provider (#10019) * squashed commit for standalone tvm execution provider * critical fix for correct python build with stvm ep * get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG * updates and fixes * update parsing of stvm provider options * add support of external data for onnx model * add conditional dump of subgraphs * remove unused code * get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API * support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type) * add fp16 * add functionality of conversion of model layout to NHWC if needed. Necessary parameter was added to STVM provider options * fix license text in header. fix log format * small fixes * fix issues from flake8 * remove model proto construction from GetCapability * reserve memory for vector of DLTensors * add simple tutorial for STVM EP * STVM docs * jroesch/tvm -> apache/tvm * remove dead code, unnecessary logs and comments * fix in readme * improve tutorial notebook * tvm update * update STVM_EP.md * fix default value * update STVM_EP.md * some TODOs for the future development * shorten long lines * add hyperlink to STVM_EP.md * fix Linux CI error * fix error in csharp test Co-authored-by: Jared Roesch <jroesch@octoml.ai> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2021-12-15 16:59:20 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00
Dmitri Smirnov	a7f649db7c	Enable proper override using MIMalloc (#9944 ) Redirect memory allocations to MiMalloc and advance its version to v2.0.3 Refactor for a universal ifdef	2021-12-07 17:56:58 -08:00
Scott McKay	912e50f61c	Add CI minimal build with all options disabled. Fix python binding code if sparse tensors are disabled. (#9898 ) * Add 2 builds to validate the cmake defines for excluding optional components work in both full and minimal builds. * Create empty config for no-ops build * Create empty config for no-ops build - attempt #2 * Create empty config for no-ops build - attempt #3 * Update python binding code to work when sparse tensors are disabled.	2021-12-03 06:56:51 +10:00
Scott McKay	1aa21df149	Fix issue with debug VS2022 build when python bindings are enabled (#9794 ) * Add intermediate header between the ORT code and pybind11 to workaround an issue with VS2022 debug builds by making sure corecrt.h is included first. This avoids the _STL_ASSERT macro being defined in an incompatible way for a debug build by pybind including the python headers with _DEBUG temporarily undefined . See #9735 for details.	2021-11-18 16:58:02 +10:00
sfatimar	1d03baa8cc	Openvino ep 2021.4 v3.3 (#9588) * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erroneous couts added * erroneous entry rectified * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erroneous couts added * erroneous entry rectified * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Please commit error message and rectification of param.context * Alignment fixed Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Changed the string to OpenVINO_GPU * Changed OpenVINO to OpenVINO_CPU * Onnxruntime updated API for memory location * Removing Duplicate LOG Error * Tensor.h removed DeviceType function. Updated comment * API Comments updated * Removing changes to Provider Indo * Erroneous commit * Removing Extra logs * Merge CMAKE * Not copy from a local location * Duplicate Entry * Remove extra line Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>	2021-11-15 13:41:12 -08:00
Dmitri Smirnov	4e76360261	Prevent PySparseTensor from being garbage collected if we have an outstanding OrtValue (#9540) Prevent PySparseTensor from being garbage collected if we have an outstanding OrtValue Improve comments.	2021-10-27 11:28:37 -07:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877 ) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Tang, Cheng	ae7f2d824d	Share the execution provider instance for training (#8719) * separate the training python module; share the execution provider instance * fix build break * fix cuda test crash; reorg the python module code base * use correct env * use provider customized hash func * fix build break * fix rocm break * use const ref in argument * rename the file * move hash func to training module	2021-08-27 16:23:35 -07:00
Tang, Cheng	de2a53e46d	[eager mode] fix build and support customize shared provider entry point (#8680) * fix build break * support customizing the name of the shared provider lib's entry point * fix non training build * check error code * check return code	2021-08-11 15:10:35 -07:00
Rachel Guo	0cf2ed029b	Add python binding for CoreML EP (#8472 ) * add pybind binding for coreml ep * update merged files * address comments * format * remove lines for non-macOS platform Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2021-07-29 10:06:47 -07:00
Edward Chen	b4baac888c	[NNAPI EP] Make partitioning stop ops configurable from Python API. (#8484 )	2021-07-27 08:16:47 -07:00
Dmitri Smirnov	950fe5e28b	Implement SparseTensor and infrastructure support and advance ONNX commit (#8038) SparseTensor support Implement Builder pattern Fix support for 1-D and 2-D COO indices Implement and test CSR support. Handle shape inference for SparseTensors Implement conversion for COO, CSR and tests. Address the case where constant sparse initializer is the output. Implement test infra for SparseTensors Implement SparseDenseMatMul for Csr and COO and tested it. Add hash for SparseToDenseMatMul Finish shared provider refactor Refactor GetOrCreate to Create Working on py interface Expose OrtDevice and use it in allocate_numpy Adjust Sparse interfaces, add support for string SparseTensor. Add tests. Add and test to_cuda() Add accessors to format specific indices Test values and indices views, read-only flag, after GC access Add sparse related methods to OrtValue Re-work SparseTensor wrapper, add OrtValue methods Rework numpy_array_to_cuda/to_cpu Add run_with_ort_values Add models and test sparse_mat_mul with run_with_ort_values Refactor sparse tensor to use a single buffer Ifdef x86 Eigen CSR sparse matmul implementation Exclude broken test, check for string type when copying cross device Split pybind schema, regenerate docs, add exclusion Conditionally exclude schema module Update docs fix cuda build Add test to a filter and regenerate JS docs Add conversion and test string support for sparse tensors Exclude conversion utils from minimal build Add CUDA Memcpy and adjust provider interfaces	2021-07-22 15:24:36 -07:00
pengwa	5454af4b95	decouple the shared python dependency (#8294) * remove warning message for non-training build * move to/from dlpack for onnxruntime_python back into python project	2021-07-09 11:47:11 +08:00
Ryan Hill	49938cce77	Fix Python Cuda loading issues (#7939 )	2021-06-25 02:26:50 -07:00
pengwa	9e4dc08483	training with custom autograd Functions (#7513 ) * Register Torch Custom autograd.Function * Add flag to supress pybind11 warning * Avoid unnecessary include in cmake * Add missing reference * Add getter for registerred functions * Format for making subsquent changes cleaner * Fix interop feature build failure * Forward pass, run PyOP on CPU EP * clean up the code * Fix build * Define new ops * refactor pyop - extract PyOpLibProxy class * Hacks to run example * implement the kernel compute func * add back PyOP for comparision experiments * debug info - thread id * refine the kernels * Polish code (cherry picked from commit `4ed606f9a0`) * Fix a the Tensor address mismatch in C++ side * PythonOpGrad compute * add distributed test case * refine test cases * get dist.get_rank() in Autograd forward pass * Add CUDA kernels * Store float, int, and tuple of them as PythonOp's attributes * Populate local changes * Fix bugs * PythonOp/PythonOpGrad CUDA kernels * Support non-tensor inputs * Single GPU FP16 Run Pass (cherry picked from commit e539989e91e18ee997900292d3493b97d3eafa8a) * Fix segement * add basic test cases * Save progress * fix gradient builder for a Add op who have same inputs * add test cases for auto grad fallback feature * fix ref cnt issue. 
add thread id for debugging * POC: remove interface class * Remove interface classes * Clean a bit * Coarse-grained clean up after rebase master * reset pyop and language_interop_ops to latest master * Fix missing part during merge * re-structure torch related language interop files * Fix build * Fix tests and build * Fix build and basic unit tests * Fix most of uts * remove unnecessary import * clean up and fix build when enabling language_interop_ops * Fix single-GPU UTs * Move runner register into ORT package * Update dist UTs to new style * Also fix distributed UTs and leaf gradient problem * Static generation for constant args * Move arg_positions_ to static field * Rename some functions * Move arg ceration into a function * Clean output logic in PythonOp * Move PythonOp's ctor * Revise PythonOpGrad * Fix "ORT only supports contiguous tensor for now" for inputs * Fix evaulation mode error, add test & clean up * clean up codes * Fix issues introduced by recent master change (enabled symbolic shape infer) * automatically register forward/backward function pointers && clean up * Fix multi-output case * Add a test back * fix build and clean up * RAII for function params PyObject * Use new exporter * Clean full name in new exporter * Fix UTs * Format a file * Add "inplace" back Remove a legacy comment * Refine TorchProxy 1. Make TorchProxy a formal singleton class. 2. Remove unused Scope class. 3. Simplify the call to Forward and Backward. The two functions now automatically acquire and release GIL state, so user doesn't need any GIL-related calls. * Format * Add lock to avoid racing condition when registering Python objs * Fix Python call param ref issues && Add RefcountTracker for debug build && Clean up * clean up print * Resolve part of comments && clean up * Fix a potential bug * track pyobject consistently * move kernels to cpu provider as base class * Refactor - 1. Extract PythonOpBase/PythonOpGradBase 2. Implement CPU kernels 3. 
Test coverage for CPU kernels * Refine register code * Add a missing macro * Release python call result objects with PythonObjectPtr && Add UnRegisterContext && Track PyObject for Debugging && Clena up * Fix random segfault issue - relasing a wrong ctx pointer for inplace cases * put ref count in debug macro * Move GIL out * Refine tests * Fix memory leak issue && forward output lifecycle issue: 1. Unregister the OrtValue PythonObject. Currently, the OrtValue shared same buffer with PythonOp/PythonOpGrad's output. So after those kernels outputs are released, the "leaked" OrtValue caused the shared buffer cannot be released. 2. According PyTorch forward+backward execution. The forward outputs (e.g. torch tensors) maintains the context/saved variables/dirty inputs, etc, which are used for backward execution, so its life should be after the backward runs. This change added such a depencencies between PythonOpGrad on PythonOp. * Move dlpack->ortvalue into C++ to avoid temp object registration * Fix the over released Py_False/Py_True && refine tests * Clean up unused functions * Always assume the first forward output is context so we don't need to test unused cases. * Fix a memory leak * move-copy unique_ptr & avoid C-style casting * Use inplace attribute to determine if input tensors are copied * Move DlpackCapsuleDestructor's to a common place * Thread-safe TorchProxy * Use OrtValue instead of OrtValue* * Only keep checks for Debug build * Wrap some long line per comment * onnx_export_type --> kwargs * Use requires_grads to create PythonOpGrad's inputs * add missing files during master merge * Fix build issue after merge * Address two comments. 1. Internalize DlpackCapsuleDestructor 2. Change "(" to "]" for describing closed interval. * Address some comments. 1. "override" -> "overwrite" to avoid using reserved keyword. 2. Call DLPack's helper to create OrtValue for avoiding repeated code. * Address comments. 1. 
Pass std::mutex to registeration helpers so their callers don't have to lock the mutex expclicitly. 2. Rename "func_context_pool_mutex_" to "mutex_". This mutex is the global mutex for OrtTorchFunctionPool. * Add bridging code to make cuda kernels work with merged master * put debue macro check within RefCountTracker && use default logger for debug info && remove useless ortvalue_ptr interface && typos && revert unncessary blank line changes * fix some comments * Resolve more comments * Capitalize a word * use unique_ptr instead of ObjectPointer for PyObject management && add converntion * Support symbolic shape * Remove unused variable * fix build * Enable function registration for training only && rectify ToDlpack/FromDlpack merge with master. * Don't add context for non-PythonOp opeartors (for example AtenOp) * Fix build error * Polish frontend part. 1. Avoid adding kwargs to ORTModule's ctor 2. Use onnx_export_type rather than kwargs for type safty 3. Fix some build bugs. * Resolve simpler comments * Resolve export related comments * sync master && fix tests && fix non-training build error * Fix build errors * add target link lib * windows build error * Fix orttraining-linux-ci build * disable autograd test && clean up * fix linux orttraining ci build * try fixing win build error * Revise append calls in runner * Enable custom function using a function * Rename to avoid using reservied keyword * Use list comprehension * Set ORT random seed in tests * Remove print code and fix ctx shape * [] -> list() * Move autograd.Function and nn.Module into corresponding functions * Move test helpers * Polish dist test a bit. Tried move helpers to helper file but it causes a deadlock. 
* trying fix undefined reference * Context is not managed by global pool * Polish dist test * Polish dist test * Add enable_custom_autograd_function * Remove enable_custom_autograd_function from ctors * Add doc strings * Shorter code * Address comments * Add one empty line * revert a minor and not needed change * Address comments * Back to reference * Fix windows builds * Fix windows debug build fail to find "'python39_d.lib'" * fix mac build error * revert _to_contiguous change * add debugging tag for orttraining-cpu-ci * Fix the wrong PYTHON_LIBRARIES which is affected by PYTHON_LIBRARY given in build command * add debugging info * Fix the build in this case: PYTHON_LIBDIR: /opt/_internal/cpython-3.7.10/lib, PYTHON_EXECUTABLE: /opt/python/cp37-cp37m/bin/python3, PYTHON_MULTIARCH: x86_64-linux-gnu PYTHON_LIBRARY_PATH python3.7m * fix build error due to python lib not found * Fixes 1. Release PyObject's 2. Not useing deepcopy because we assume autograd.Function's non-tensor inputs are static (constants) so there should be no side effect after calling any autograd.Function multiple times. * Revert dtoc for decreasing refcnt * add debugging log * add debugging tag * Fix a small leak * Remove ONNX_FALLTHROUGH flag * debug tag * debug tag * fix builds * remove debug tag * fix build * fix builds * fix build * install python3 in centos, in case there is no libpython3.xm.so * build python so for redhat * add training cpu specific docker, build python so inside * revert build-cpython change * try fixing numpy include issue * install_deps after re-installing cpython * fix build && remove debug tag * install openssl before cpython * let's say: builds pass! * add build flag for torch iterop, only enable it when training+Python is enabled * skip ComputeBroadcastBackwardAxesDynamic for the shared inputs * fix build * add debug info for padgrad test * Fix builds * Split dlpack_converter into C++ and Python interfaces respecitively. Then different build use them as needed. 
* clean up the changes * fix addsubgradient builder * Fix builds * clean up * clean up * Address some comments. 1. Use pointer wrapper to avoid calling Py_DECREF 2. Remove unregister_* functions 3. Allow repeated registration by skipping those with existing keys 4. Unregister context in PythonOpGrad * Fix over-released Py_Boolean Co-authored-by: Wei-Sheng Chin <wschin@outlook.com>	2021-06-07 13:01:21 -07:00
Dmitri Smirnov	d1f0251e39	Python bindings fix ups in preparation to Sparse Tensor introduction (#7817 ) * Fix up constness in pybindings Fix up return argument treatments. Specifically, for all functions that return pointers or references to the members of other pybind registered classes, we want not to copy them, but internally bump up a reference to the hosting class so they do not disappear before the reference to the returned members is re-claimed. This policy is applied by default to def_property and def_readwrite but not to def_readonly and other def methods. See https://pybind11-jagerman.readthedocs.io/en/stable/advanced.html#return-value-policies https://pybind11.readthedocs.io/en/stable/advanced/functions.html#return-value-policies Move OrtValue binding to a separate file Move IOBinding into separate file.	2021-05-26 09:47:41 -07:00
baijumeswani	a6ca9f0a40	Use list comprehensions instead of list appends where possible (#7753 ) * Use list comprehensions instead of list appends where possible * Add OrtValueVector class as an opaque object in pybind * Add dlpack methods to the OrtValueVector pybind class	2021-05-21 10:28:09 -07:00
Changming Sun	1012535dab	Change onnxruntime::make_unique to std::make_unique (#7502 ) 1. Change onnxruntime::make_unique to std::make_unique 2. Add "-std=c++14" to ROCM EP's build flags.	2021-04-29 17:04:53 -07:00
Scott McKay	9297527b7a	Enable NHWC transformer when generating ORT format model (#7126 ) * Allow specific optimizers to be disabled. - replace unused ability to specify just the optimizers to run - never used so not needed Allow the disabled list to be specified via the python bindings - expected usage is internal, so using kwargs for that so as not to pollute the documentation with stuff no user is likely to need Update the ORT format model conversion script to disable NCHWc transformer when level is 'all' - currently there aren't any known use cases where we'd want the NCHWc transformations to run as they create a device specific model and aren't used on ARM - the ORT format model is not expected to be generated on the target device (e.g. generate on Windows/Linux/macOS to deploy to Android/iOS so there's a good chance we'd generate a useless/invalid model - default to 'all' as ARM and MLAS prefer NHWC and the NHWC transformer runs at that level * Add matching changes to optimizer generation in training code	2021-03-29 18:39:48 +10:00
Scott McKay	02c7873b0e	Update ORT model conversion script to support custom ops (#6701 ) * Add support for custom ops library to the ORT model conversion script Simplify model conversion now that we read ops from the ORT format model. Enable custom ops in the python bindings if custom ops are turned on in a minimal build. * Add test of model conversion involving custom ops.	2021-02-17 12:52:39 +10:00
Edward Chen	d761571afc	Deprecate Python global configuration functions [Part 2] (#6171 ) Update Python API to allow more flexibility for setting providers and provider options. The providers argument (InferenceSession/TrainingSession constructors, InferenceSession.set_providers()) now also accepts a tuple of (name, options dict). Fix get_available_providers() API (and the corresponding function in the C API) to return the providers in default priority order. Now it can be used as a starting point for the providers argument and maintain the default priority order. Convert some usages of the deprecated global configuration functions to use EP-specific options instead. Update some EP-specific option parsing to fail on unknown options. Other clean up.	2021-01-07 10:10:55 -08:00
ashbhandare	b1a75d0e98	Enable passing initial optimizer state while creating training session (#5869 ) * Support to pass initial optimizer states to optimizer graph builder * Changes for passing init optim state to training session config * Pass optimizer state through cpp and python frontend * Cleanup * Review comments * Fix windows and mac CI * Review comments * review comments * Review comments * Frontend review changes * Fix CI	2020-12-08 21:20:51 -05:00
Hariharan Seshadri	b9f90e297e	Support sharing of initializers between session via the Python API (#5407 )	2020-10-09 20:26:28 -07:00
Scott McKay	28445c88f9	Changes to enable saving and loading an ORT format model (#4995 ) * Changes to enable saving and loading an ORT format model via the public APIs. Cleanup session.py to try and make slightly more understandable. More refactoring is needed here. Couple of bug fixes * Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info. * Address PR comments - tweak SessionOptions config to avoid double lookup - merge duplicated functionality in python binding around registering an EP with optional options Fix a couple of build issues. * Update C API to be consistent with python API - only load model in InferenceSession ctor if required - support loading ORT model in minimal build * Fix nodejs test. We get an invalid path error from LoadInterOp first now * Another attempt at fixing nodejs test. Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent. The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed. * Fix couple of build issues. * Disable test temporarily so PR can be checked in. Will fix in separate PR that adds final pieces for minimal build as the test is required there. * Give up on nodejs test and make the match simpler. Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion. * Fix call to Session.__init__ in TrainingSession. Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.	2020-09-03 09:10:48 -07:00
Hariharan Seshadri	d30dd41c0e	Remove public default ctor in PyInferenceSession and replace it with a protected ctor (#4990 )	2020-09-01 17:10:36 -07:00
Hariharan Seshadri	7045910d10	Support RegisterCustomOpsLibrary via the Python API (#4764 )	2020-08-28 13:24:29 -07:00
liqunfu	c3c4ce5ceb	refactor prototypes into headers (#4337 ) * refactor prototypes into headers	2020-06-26 12:02:14 -07:00
Xueyun Zhu	9eb792a5b3	move env to .cc file	2020-03-25 16:57:05 +00:00

52 commits