onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-08 00:23:03 +00:00

Author	SHA1	Message	Date
Dale Phurrough	68db1b62a8	add noexcept to `InitApi()` and `GetApi()` (#13869 ) ### Description * add noexcept to `InitApi()` and `GetApi()` ### Motivation and Context * fixes microsoft/onnxruntime#12581	2023-02-15 16:49:16 -08:00
cao lei	50fa151298	remove device_id parameter out of ExecutionProvider::GetAllocator() (#14580 ) ### Description Remove the parameter device_id out of ExecutionProvider::GetAllocator() function ### Motivation and Context The parameter device_id is not necessary. We can fully rely on the second parameter OrtMemType mem_type to determine the device_id when getting allocator from executionProvider.	2023-02-13 10:01:07 -08:00
cloudhan	9bd022b8be	Add TuningContext for TunableOp (#14557 ) This makes the the TunableOp tuning results state free and will allow us to dump and load offline tuning results.	2023-02-10 14:27:43 +08:00
Maximilian Müller	e9ab56fa64	Adding RunOptions synchronization behaviour to C/C++ API (#14088 ) ### Description This is exposing the already existent interface of asynchronous work of all CUDA base EP's (CUDA + TensorRT). ### Motivation and Context This is something requested in #12216. It will enable users to build an efficient data pipeline with ONNXRuntime and CUDA pre-/post-processing. PCI traffic to the CUDA device can be run during inference as soon as the postprocessing consumed the input buffer and it can be overwritten. To do this work has to be submitted async to the device. Please see below screenshots showing the illustration of this using NSight Systems. Async: <img width="1401" alt="image" src="https://user-images.githubusercontent.com/44298237/209894303-706460ed-cbdb-4be2-a2e4-0c111ec875dd.png"> Synchronous: <img width="1302" alt="image" src="https://user-images.githubusercontent.com/44298237/209894630-1ce40925-bbd5-470d-b888-46553ab75fb9.png"> Note the gap in between the 2 inference runs due to issuing PCI traffic in between and to the CPU overhead the active synchronization has. --------- Co-authored-by: Chi Lo <chi.lo@microsoft.com>	2023-02-07 19:59:28 -08:00
Nat Kershaw (MSFT)	638f21b969	Upgrade doxygen to fix C API docs build issue (#13950 )	2023-02-03 09:43:29 -08:00
Baiju Meswani	3d8fa4d77b	GetTrainingApi to not print to stderr when not an ort training build (#14515 )	2023-02-02 13:28:32 -08:00
Dmitri Smirnov	61e7636e61	Re-work GetAvailableProviders API (#14486 ) ### Description Re-work `OrtApi::GetAvailableProviders` in a way that the data is returned in a single allocation. Fix exception safety issues and fix `Release` function. Remove warning suppressions. Fix exception safety issue in C++ API. Fix exception safety issue in C# API. Move EP name length enforcement to the implementation. ### Motivation and Context The original motivation comes from https://github.com/microsoft/onnxruntime/issues/14378. However, the API is already implemented. Cc: @prabhat00155	2023-02-01 14:38:04 -08:00
Erick Muñoz	d1533c27eb	[oneDNN] Improved thread handling (#13618 ) * Added the OrtDnnlProviderOptions structure to expose configuration options to the user * The number of threads can be defined by the user with the -i flag on the perftest * Number of threads can also be configured via the OMP_NUM_THREADS environment variable * The number of threads defined in the OrtDnnlProviderOptions is prioritized over the environment variable ### Description Avoids thread oversubscription caused by OpenMP allocating the maximum number of threads possible for oneDNN EP. Added support for the OrtDnnlProviderOptions, this will allow for more EP customization capabilities, and allows for user defined number of threads. ### Motivation and Context - Improves performances and allows for user to fine tune the number of threads	2023-01-31 14:37:13 -08:00
sfatimar	77b455b969	Ort openvino 4.3 cli (#14341 ) ### Description Introduce cache_dir CLI for graph serialisation. Replace existing use_compile_network and blob_dump_path cli options for openvino with a single command line option "cache_dir" specifying the path that needs to be passed for blob dump/load improving the developer experience. ### Motivation and Context? We were having two values to set cache dir which was unnecessary Co-authored-by: Preetha <preetha.veeramalai@intel.com>	2023-01-23 14:17:52 -08:00
Adrian Lizarraga	de17d53c50	Custom Op runtime wrapper (#13427 ) ### Description Adds the below C APIs to support custom ops that wrap an entire model to be inferenced with an external runtime. The current SNPE EP is an example of an EP that could be ported to use a custom op wrapper. Ex: The custom op stores the serialized SNPE DLC binary as a string attribute. The SNPE model is built when the kernel is created. The model is inferenced with SNPE APIs on call to the kernel's compute method. #### C APIs \| API \| Description \| Why \| \| --- \| --- \| --- \| \| `KernelInfo_GetInputCount` \| Gets number of inputs from `OrtKernelInfo`. \| Query I/O characteristics during kernel creation<sup>1</sup> \| \| `KernelInfo_GetOutputCount` \| Gets number of outputs from `OrtKernelInfo`. \| Query I/O characteristics during kernel creation<sup>1</sup> \| \| `KernelInfo_GetInputName` \| Gets an input's name. \| Query I/O characteristics during kernel creation<sup>1</sup> \| \| `KernelInfo_GetOutputName` \| Gets an output's name. \| Query I/O characteristics during kernel creation<sup>1</sup> \| \| `KernelInfo_GetInputTypeInfo` \| Gets the type/shape information for an input. \| Query I/O characteristics during kernel creation<sup>1</sup> \| \| `KernelInfo_GetOutputTypeInfo` \| Gets the type/shape information for an output. \| Query I/O characteristics during kernel creation<sup>1</sup> \| \| `KernelInfoGetAttribute_tensor` \| Get a OrtValue tensor stored as an attribute in the graph node \| Extract serialized models, weights, etc. \| \| `GetSessionConfigEntry` \| Get a session configuration value \| Need to be able to get session-time configurations from within custom op \| \| `HasSessionConfigEntry` \| Check if session configuration entry exists. \| Need to be able to get session-time configurations from within custom op \| #### Why so many KernelInfo APIs?<sup>1</sup> Similar APIs currently exist for `OrtKernelContext`, but not `OrtKernelInfo`. Note that `OrtKernelContext` is passed to the custom op on call to its kernel's compute() function. However, `OrtKernelInfo` is available on kernel creation, which occurs when the session is created. Having these APIs available from `OrtKernelInfo` allows an operator to trade-off computation time for session-creation time, and vice versa. Operators that must build expensive state may prefer to do it during session creation time instead of compute-time. SNPE is an example of an EP that needs to be able to query `KernelInfo` for the name, type, and shape of inputs and outputs in order to build the model from the serialized DLC data. This is an expensive operation. Other providers (e.g., OpenVINO) are able to query i/o info from the serialized model, so they do not strictly need these APIs. However, the APIs can still be used to validate the expected I/O characteristics. Additionally, several of our CPU contrib ops currently use the same internal version of these KernelInfo APIs (Ex: [qlinear_softmax](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/contrib_ops/cpu/quantization/qlinear_softmax.cc#L71)). If custom ops are also meant to be a test bed for future ops, then all custom ops (not just runtime wrappers) would benefit from the addition of these public KernelInfo APIs (IMO). #### Example of usage in a custom OP From `onnxruntime/test/testdata/custom_op_openvino_wrapper_library/openvino_wrapper.h` ```c++ struct CustomOpOpenVINO : Ort::CustomOpBase<CustomOpOpenVINO, KernelOpenVINO> { explicit CustomOpOpenVINO(Ort::ConstSessionOptions session_options); CustomOpOpenVINO(const CustomOpOpenVINO&) = delete; CustomOpOpenVINO& operator=(const CustomOpOpenVINO&) = delete; void* CreateKernel(const OrtApi& api, const OrtKernelInfo* info) const; constexpr const char* GetName() const noexcept { return "OpenVINO_Wrapper"; } constexpr const char* GetExecutionProviderType() const noexcept { return "CPUExecutionProvider"; } // IMPORTANT: In order to wrap a generic runtime-specific model, the custom operator // must have a non-homogeneous variadic input and output. constexpr size_t GetInputTypeCount() const noexcept { return 1; } constexpr size_t GetOutputTypeCount() const noexcept { return 1; } constexpr ONNXTensorElementDataType GetInputType(size_t /* index /) const noexcept { return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED; } constexpr ONNXTensorElementDataType GetOutputType(size_t / index /) const noexcept { return ONNX_TENSOR_ELEMENT_DATA_TYPE_UNDEFINED; } constexpr OrtCustomOpInputOutputCharacteristic GetInputCharacteristic(size_t / index /) const noexcept { return INPUT_OUTPUT_VARIADIC; } constexpr OrtCustomOpInputOutputCharacteristic GetOutputCharacteristic(size_t / index */) const noexcept { return INPUT_OUTPUT_VARIADIC; } constexpr bool GetVariadicInputHomogeneity() const noexcept { return false; // heterogenous } constexpr bool GetVariadicOutputHomogeneity() const noexcept { return false; // heterogeneous } std::vector<std::string> GetSessionConfigKeys() const { return {"device_type"}; } private: std::unordered_map<std::string, std::string> session_configs_; }; ``` #### How to create a session: ```c++ Ort::Env env; Ort::SessionOptions session_opts; Ort::CustomOpConfigs custom_op_configs; // Create local session config entries for the custom op. custom_op_configs.AddConfig("OpenVINO_Wrapper", "device_type", "CPU"); // Register custom op library and pass in the custom op configs (optional). session_opts.RegisterCustomOpsLibrary(lib_name, custom_op_configs); Ort::Session session(env, model_path.data(), session_opts); ``` ### Motivation and Context Allows creation of simple "wrapper" EPs outside of the main ORT code base.	2023-01-18 09:09:32 -08:00
Jian Chen	d95249f516	Removing Double QDQ from Graphs (#14024 ) ### Description When there are 2 QDQ pair back to back, we want to delete the 1 Q and 1 DQ nodes. ex: Q->DQ->Q->DQ =====> Q->DQ ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. -->	2023-01-16 19:06:57 -08:00
Scott McKay	b9ecd428c1	Add ability to register custom ops by specifying a function name (#14177 ) ### Description <!-- Describe your changes. --> Use dlsym/GetProcAddress to lookup a custom ops registration function by name and call it. This will be better on mobile platforms where the custom ops library is linked against, and there isn't necessarily a filesystem that a library path can be loaded from. Alternative is to wire up passing in the address of the function, but that has multiple complications which differ by platform. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Enable using ort and ort-ext packages on mobile platforms. Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2023-01-12 15:11:34 +10:00
RandySheriffH	83ad562826	Rename CloudEP to AzureEP (#14175 ) Rename CloudEP to AzureEP. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-11 12:25:04 -08:00
Ashwini Khade	d92c663f28	Create dedicated build for training api (#14136 ) ### Description Enable creating dedicated build for on device training. With this PR we can build a lean binary for on device training using flag --enable_training_apis. This binary includes only the essentials like training ops, optimizers etc and NOT features like Aten fallback, strided tensors, gradient builders etc . This binary also removes all the deprecated components like training::TrainingSession and OrtTrainer etc ### Motivation and Context This enables our partners to create a lean binary for on device training.	2023-01-10 20:58:04 -08:00
we1559	c65a03699a	add ThreadingOptions, wraps OrtThreadingOptions (#13711 ) …threadpools' options of The Env. ### Description <!-- Describe your changes. --> add a c++ class ThreadingOptions, wraps OrtThreadingOptions as I described in issue #13710 ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> close #13710 Co-authored-by: zengxiangneng <zengxiangneng@360.cn>	2023-01-06 11:21:10 -08:00
Abhishek Udupa	d460c01b8c	Fix skew between GPU/CPU timestamps in ORT profiler (#14004 ) ### Description This PR fixes the skew between GPU/CPU timestamps with a more reliable algorithm. ### Motivation and Context An earlier implementation attempted to guess the right correction to apply, but this led to misleading profile outputs. This PR fixes this problem by utilizing a more reliable technique to normalize GPU timestamps. Attached are sample profile outputs and visualization screen-grabs from a run of a transformer-based model before and after the fix. Before Fix: ![profile_visualization_cuda_without_fix](https://user-images.githubusercontent.com/17418420/208197234-7390d8e3-4354-4e67-93cf-958c319146ee.png) After Fix: ![profile_visualization_cuda_with_fix](https://user-images.githubusercontent.com/17418420/208197230-3e108b82-8dfa-476b-9277-7895639a3785.png) Profiler outputs that are rendered in the visualizations above: [sample_outputs.zip](https://github.com/microsoft/onnxruntime/files/10249689/sample_outputs.zip) Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>	2023-01-05 11:07:26 -08:00
Adrian Lizarraga	68794d0ac1	Improve custom op library handle cleanup (#14099 ) ### Description - Adds a new C API `OrtApi::RegisterCustomOpsLibrary_V2` that manages the lifetime of dynamic library handles (i.e., calls `dlclose` or `FreeLibrary`). - Deprecates C API `OrtApi::RegisterCustomOpsLibrary`. - Adds C++ API wrapper for convenient registering of custom op libraries. - `PySessionOptions` is now an alias of `OrtSessionOptions` ### Motivation and Context The current API for registering custom op libraries loads dynamic libraries but requires users to handle the release of the corresponding library handles. Additionally, the user has to make sure to release the library handle _after_ the session has been destroyed (or the program segfaults). The new API automatically cleans up the library and allows the user to write more straightforward code.	2023-01-04 17:56:29 -08:00
cao lei	b29a1c7348	Address follow-up comments on multistream pr #13495 (#13992 ) ### Description This PR is to address follow-up comments for the multi-stream pr https://github.com/microsoft/onnxruntime/pull/13495 Changes including: - Make StreamAwareArena transparent to minimal build - Make DeviceStreamCollection transparent to minimal build - Replace ORT_MUST_USE_RESULT with [[nodiscard]] - Remove unnecessary shared_ptr ### Motivation and Context This PR is to address follow-up comments for the multi-stream pr https://github.com/microsoft/onnxruntime/pull/13495 Co-authored-by: Lei Cao <leca@microsoft.com>	2023-01-03 16:33:36 -08:00
RandySheriffH	587e891cae	CloudEP (#13855 ) Implement CloudEP for hybrid inferencing. The PR introduces zero new API, customers could configure session and run options to do inferencing with Azure [triton endpoint.](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-deploy-with-triton?tabs=azure-cli%2Cendpoint) Sample configuration in python be like: ``` sess_opt.add_session_config_entry('cloud.endpoint_type', 'triton'); sess_opt.add_session_config_entry('cloud.uri', 'https://cloud.com'); sess_opt.add_session_config_entry('cloud.model_name', 'detection2'); sess_opt.add_session_config_entry('cloud.model_version', '7'); // optional, default 1 sess_opt.add_session_config_entry('cloud.verbose', '1'); // optional, default '0', meaning no verbose ... run_opt.add_run_config_entry('use_cloud', '1') # 0 for local inferencing, 1 for cloud endpoint. run_opt.add_run_config_entry('cloud.auth_key', '...') ... sess.run(None, {'input':input_}, run_opt) ``` Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2023-01-03 10:03:15 -08:00
Adrian Lizarraga	3bbcc2799f	Support for custom op variadic inputs/outputs (#13946 ) ### Description Adds support for variadic inputs and outputs to custom operators. ### Motivation and Context Needed for custom ops that wrap external runtimes/models and maybe TensorRT plugins.	2022-12-23 11:41:15 -08:00
Changming Sun	fc2a6db573	Update absl to the latest release (#13990 ) ### Description Update absl to a new version ### Motivation and Context The new version contains fixes that are needed for Nvidia GPU build. Once we update it to that version, we don't need to maintain our private patches for Nvidia GPU build.	2022-12-19 14:25:13 -08:00
FFFrog	6705915af8	[CANN] Add the ability to run graph (#13728 ) ### Description Add the ability to run graph ### Motivation and Context A brief description is as follows: 1) If the whole graph is supported, then will be processed by the graph engine, directly. 2) If the whole graph is not supported, the whole graph will be divided into subgraphs and single operators; The sub-graphs will be run on graph engine, and the single operators will fallback to the traditional mode.	2022-12-16 06:57:40 -08:00
Abhishek Udupa	c882601425	Add noexcept annotation to address prefast warnings (#13965 ) ### Description Add noexcept annotations to move constructors and assignment ops to address prefast warnings. (see https://dev.azure.com/aiinfra/ONNX%20Runtime/_workitems/edit/11012/) Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>	2022-12-15 09:44:22 -08:00
Tang, Cheng	a81faee41e	Multi-stream execution support (#13495 ) Description: This PR including following works: 1. provide stream and related synchronization abstractions in onnxruntime. 2. enhance onnxruntime's execution planner / executor / memory arena to support execute multiple streams in parallel. 3. deprecate the parallel executor for cpu. 4. deprecate the Fence mechanism. 5. update the cuda / tensorrt EP to support the stream mechanism, support running different request in different cuda stream. Motivation and Context - Why is this change required? currently, the execution plan is just a linear list of those primitives, ort will execute them step by step. For any given graph, ORT will serialize it to a fixed execution order. This sequential execution design simplifies most scenarios, but it has the following limitations: 1. it is difficult to enable inter-node parallelization, we have a half-baked parallel executor but it is very difficult to make it work with GPU. 2. The fence mechanism can work with single gpu stream + cpu thread case, but when extend to multiple stream, it is difficult to manage the cross GPU stream synchronizations. 3. our cuda EP rely on the BFCArena to make the memory management work with the GPU async kernels, but current BFCArena is not aware of the streams, so it doesn't behavior correctly when run with multiple streams. This PR enhance our existing execution plan and executor to support multiple stream execution. we use an unified algorithm to mange both single stream and multiple stream scenarios. This PR mainly focus on the infrastructure support for multiple stream execution, that is said, given a valid stream assignment, onnxruntime can execute it correctly. How to generate a good stream assignment for a given model will be in the future PR. Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: RandySheriffH <48490400+RandySheriffH@users.noreply.github.com> Co-authored-by: Randy Shuai <rashuai@microsoft.com> Co-authored-by: cao lei <jslhcl@gmail.com> Co-authored-by: Lei Cao <leca@microsoft.com>	2022-12-15 07:39:29 -08:00
Jeff Daily	c9edc01c0b	[ROCm] float16.h should use __HIP__ not USE_ROCM (#13684 ) The float16.h header is shared between the CPU and ROCm EPs. The USE_ROCM macro is defined universally, but for the float16.h header we only wish to detect the hip-clang compiler. Otherwise, the CPU EP fails to build because of -Werror -Wuninitialized caused by the USE_ROCM code additions, and the CPU EP should be using a different code path.	2022-12-13 15:34:42 -08:00
RandySheriffH	75584c5fa8	Enabling thread pool to be numa-aware (#13778 ) The PR enables ort thread pool to be numa-aware, so that threads could be evenly created and distributed among numa nodes. In addition, to facilitate performance tuning, the PR opens a new API allowing customers to attach threads to certain logical processors. Please check the API [definition](https://github.com/microsoft/onnxruntime/pull/13778/files#diff-5845a5c76fb64abdc8f0cffe21b37f8da1712674eb3abc4cd87190891be1bd48) for details. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-12-12 10:33:55 -08:00
JiCheng	22fa62152a	Pass SessionOptions to XnnpackProviderFactoryCreator. (#13318 ) ### Description To pass session_options to Xnnpack EP via `XnnpackProviderFactoryCreator` for Initializing xnnpack's threadpool. If you want to use different threadpool size or even disable xnnpack's threadpool, just setting intra_threadpool to 1 by xnnpack EP's provider_options. ### Motivation and Context Co-authored-by: Guangyun Han <guangyunhan@microsoft.com> Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-12-10 14:23:46 +08:00
Abhishek Udupa	83c59d2594	Session-aware and thread-safe CUDA profiler (#13706 ) ### Description The existing CUDA profiler is neither session-aware, nor thread-safe. This PR ensures both. ### Motivation and Context [PR 13549](https://github.com/microsoft/onnxruntime/pull/13549) brought thread-safety and session-awareness to the ROCm profiler. This PR brings the same goodness to the CUDA profiler as well. Sample outputs of a profiling run from the StableDiffusion model (this model was chosen because it requires orchestration of multiple sessions, and verifies that the profilers are now indeed session-aware) on both CUDA and ROCm EPs are attached, along with a script that checks that the trace files generated by the profile are well-formed. Update 11/29: Updated the profile outputs. The older profile outputs exhibited an issue where some timestamps were wildly out of range, leading to problems visualizing the traces. The bug has been fixed and the profile outputs have been updated, along with an update to the check script to ensure that timestamps are monotonically increasing. [sd_profile_outputs_cuda.tar.gz](https://github.com/microsoft/onnxruntime/files/10118088/sd_profile_outputs_cuda.tar.gz) [sd_profile_outputs_rocm.tar.gz](https://github.com/microsoft/onnxruntime/files/10118089/sd_profile_outputs_rocm.tar.gz) [check_profile_output_well_formedness.zip](https://github.com/microsoft/onnxruntime/files/10118090/check_profile_output_well_formedness.zip) Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>	2022-12-09 13:22:12 -08:00
Ashwini Khade	983877c712	Decouple strided tensor support from ENABLE_TRAINING (#13829 ) ### Description Decouple strided tensor support from ENABLE_TRAINING ### Motivation and Context This is step 1 for creating a dedicated build for on device training. Intention is 1. We can set ENABLE_STRIDED_TENSORS in cmake when either ENABLE_TRAINING or ENABLE_TRAINING_ON_DEVICE is selected, this way we dont have to use if defined(ENABLE_TRAINING) \|\| defined(ENABLE_TRAINING_ON_DEVICE ) everywhere in the code. 2. This also paves the way to easily enable strided tensor support for inference in future (if required).	2022-12-07 09:22:21 -08:00
stevenlix	ce0025d3f2	Fallback Pow op in layer norm to FP32 in TRT to avoid overflow (#13639 ) Accuracy loss is observed when transformer models such as BERT, DeBERTa, ViT are running in TRT FP16 mode. The cause is that overflow happens at Pow op in layer norm. This PR provides the option to force Pow to run in TRT FP32 precision if overflow occurs. Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2022-11-29 13:37:31 -08:00
Changming Sun	87e6a26c5d	Enforce Prefast check in Windows CPU CI pipeline (#13735 ) Right now we fix the warnings in an ad-hoc way. We run static analysis in nightly builds, then create work items for the finding it found. Our CI build pipelines run the same scan but do not break the build. So, this PR will fix the remaining findings in the CPU EP(including the training part) and enforce the check. Later on we can continue to expand the scope. We still have some warnings left in the JNI part. I will try to address them later in the next month.	2022-11-23 09:25:02 -08:00
cloudhan	9e649d1ac4	Allow CUDA EP enable or disable TunableOp via session options and environment variable (#13601 ) This ports #13116 from ROCm EP to CUDA EP	2022-11-15 14:43:54 +08:00
Abhishek Udupa	9954454c65	Make the ROCM profiler thread-safe, session-aware and preserve logical ordering between CPU and GPU events (#13549 ) ### Description The existing ROCM profiler has a few shortcomings, which this PR fixes. ### Motivation and Context The existing ROCM profiler: 1. Is not thread-safe 2. Is not session-aware: i.e., if multiple inference sessions enable profiling, then events (esp GPU events) get mixed up between the sessions 3. Has some issues with respect to coding standards. This PR addresses all of the above by cleanly re-implementing parts of the ROCM profiler as required. Attached are 4 profile outputs from a multi-session run of the StableDiffusion model, as well as a quick-and-dirty script that checks the profile outputs for the invariants claimed. [sd_profile_outputs.tar.gz](https://github.com/microsoft/onnxruntime/files/9924608/sd_profile_outputs.tar.gz) [check_profile_output_wellformedness.zip](https://github.com/microsoft/onnxruntime/files/9924614/check_profile_output_wellformedness.zip) Co-authored-by: Abhishek Udupa <abhishek.udupa@microsoft.com>	2022-11-10 10:25:41 -08:00
Edward Chen	215732f74b	Ignore saved runtime optimizations when updating ORT format model <v5. (#13393 ) The old runtime optimization format is not readily convertible to the new one without extra information for translating kernel def hashes. Ignore such saved runtime optimizations and output a warning for now.	2022-11-08 13:36:46 -08:00
yf711	8b9065a396	Add getter/setter of C# OrtEnv log level (#13402 ) ### Description * Add getter/setter to access and update C# OrtEnv log level * Add C API about updating ort env with custom log level to support the setter above (Following [pybind implementation](`952c99304a/onnxruntime/python/onnxruntime_pybind_state.cc (L923-L924)`)) * Add test case to verify getter & setter ### Motivation and Context * For C++/Python, the log level can be adjusted via OrtEnv, and this feature is missing in C# binding	2022-11-04 21:46:00 -07:00
pengwa	a3e7da60e7	Trade subgraph recompute for memory (#12852 ) Description: Subgraph-level recompute This PR adds an optional capability trading additional re-computation for better memory efficiency. Specifically, a pre-defined operator list used to iterate the Graph to find some subgraphs for recompute, to reduce some stashed activations whose lifetime across forward and backward pass. When training with ORTModule, by default, the graph transformer will scan the execution graph to find all eligible subgraph to recompute, along with sizes that can save. An example looks like below. If we want to enable some of them to recompute, we can define env variable this way: `export ORTMODULE_ENABLE_MEMORY_ALLEVIATION="Mul+FusedMatMul+Cast+Unsqueeze+Unsqueeze+Cast+Sub+Mul+Add+BiasSoftmaxDropout+Cast+:1:-1,BiasGelu+:1:-1,BitmaskDropout+Cast+:1:-1,FusedMatMul+:1:-1,Cast+:1:-1,Mul+Add+:1:-1,Mul+Sub+:1:-1"` ``` [1,0]<stderr>:2,022-10-12 14:47:39.302,954,530 [W:onnxruntime:, memory_alleviation.cc:595 PrintSummary] [1,0]<stderr>:MemoryAlleviation Summary: [1,0]<stderr>: User config: [1,0]<stderr>: Mul+FusedMatMul+Cast+Unsqueeze+Unsqueeze+Cast+Sub+Mul+Add+BiasSoftmaxDropout+Cast+:1,BiasGelu+:1,BitmaskDropout+Cast+:1,FusedMatMul+:1,Cast+:1,Mul+Add+:1,Mul+Sub+:1 [1,0]<stderr>: ================================= [1,0]<stderr>: Subgraph: BitmaskDropout+ [1,0]<stderr>: AlleviationType: Disabled [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x 1,024 x Frequency:1 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: BiasGelu+ [1,0]<stderr>: AlleviationType: Recompute [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x input_ids_dim1 x 4,096 x Frequency:24 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: Reshape[1,0]<stderr>:+ [1,0]<stderr>: AlleviationType: Disabled [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:labels_dim0 x Frequency:1 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: Unsqueeze+Unsqueeze+Cast+Sub+Mul+Mul+FusedMatMul+Cast+Add+BiasSoftmaxDropout+Cast+ [1,0]<stderr>: AlleviationType: Disabled [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x 16 x input_ids_dim1 x input_ids_dim1 x Frequency:23 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: Mul+FusedMatMul+Cast+Unsqueeze+Unsqueeze+Cast+Sub+Mul+Add+BiasSoftmaxDropout+Cast+ [1,0]<stderr>: AlleviationType: Recompute [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x 16 x input_ids_dim1 x input_ids_dim1 x Frequency:1 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: Mul+Add+ [1,0]<stderr>: AlleviationType: Recompute [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x 16 x input_ids_dim1 x 1 x Frequency:24 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: FusedMatMul+Cast+Add+Reshape+Cast+ [1,0]<stderr>: AlleviationType: Disabled [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x 16 x input_ids_dim1 x 2 x 4 x Frequency:24 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: Mul+Sub+ [1,0]<stderr>: AlleviationType: Recompute [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x 16 x input_ids_dim1 x 1 x Frequency:24 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: Cast+ [1,0]<stderr>: AlleviationType: Recompute [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:1,024 x 1,024 x Frequency:97 [1,0]<stderr>: PatternShape:3 x 1,024 x Frequency:1 [1,0]<stderr>: PatternShape:8 x 64 x Frequency:24 [1,0]<stderr>: PatternShape:1,024 x 4,096 x Frequency:24 [1,0]<stderr>: PatternShape:4,096 x Frequency:24 [1,0]<stderr>: PatternShape:4,096 x 1,024 x Frequency:24 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: Subgraph: FusedMatMul+ [1,0]<stderr>: AlleviationType: Recompute [1,0]<stderr>: Patterns: [1,0]<stderr>: PatternShape:input_ids_dim0 x input_ids_dim1 x 4,096 x Frequency:24 [1,0]<stderr>: -------------------------------- [1,0]<stderr>: ================================= ``` "Type config:" whether recompute is enabled by users. 0 - disable, 1- enable. "Subgraph" means what kind of subgraph will be recomputed, in this case, it is a single node "Gelu", and it will be "Recompute". "Shape && Frequency" means, for this recompute, one tensor of size (batch size, 500) will be saved because it will be recomputed. Baseline On a 1P model (DEBERTA V2), sequence length 256, training with 16 A100 GPUs. With latest main branch, we can run batch size 16, and the maximum batch size < 32. So 16 is usually chosen by data scientists. 65% of 40GB memory is used during training. The SamplesPerSec=479.2543353561354. ![image](https://user-images.githubusercontent.com/10530022/188320941-13dde5e7-c32b-4399-a64b-6803fbb9dcda.png) With this PR Gelu is recomputed for saving memory peak, batch size 32 can be run. The 97% of 40GB A100 is used, the SamplesPerSec=562.041593991271 (1.17X of baseline). ![image](https://user-images.githubusercontent.com/10530022/188321081-f64811bf-9637-4873-8095-349de8d498cc.png) Motivation and Context - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here.	2022-11-03 13:49:41 +08:00
Wei-Sheng Chin	b5904c40dd	Enable ORT in TorchDynamo (#13259 ) This PR enables ORT to execute graphs captured by TorchDynamo. Major compilation code is in `OrtBackend.compile` in ort_backend.py. `register_backend.py` is for plugging `OrtBackend` into TorchDynamo as a compiler.	2022-11-01 11:19:29 -07:00
Adrian Lizarraga	9d867a07c0	Fix regression in CustomOpApi::GetTensorData (#13450 ) - Reverts change to CustomOpApi::GetTensorData introduced by commit `5dae0c477d`, which causes infinite recursion. - Moves EndsProfilingAllocated to non-const session implementation (C++ API header).	2022-10-31 12:20:49 -07:00
Edward Chen	2ecd1d6622	Switch GSL to MS GSL 4.0.0 (#13416 )	2022-10-29 04:15:20 -07:00
Fei Hu	943e156f4c	Allow custom ops to set input memory type (#10879 )	2022-10-28 21:45:26 -07:00
cloudhan	fc12abf6b1	Enable/Disbale tunable GEMM by using tunable switch in provider options and env var (#13116 ) Related PRs #12853 This allows the user enable/disbale tunable GEMM on demand.	2022-10-19 22:35:08 -07:00
Scott McKay	565da71275	Make 'env' argument to Session const (#13362 ) ### Description <!-- Describe your changes. --> The Env argument does not need to be mutable to call the underlying C API. Update the Ort::Session ctor to have a const Env. All other changes are from clang-format running. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Cleanup	2022-10-19 14:23:24 +10:00
Dmitri Smirnov	f5e3165cc3	Fix move Base::operator= (#13355 ) ### Description Base::operator= move is broken, loses a valid ptr. ### Motivation and Context Address https://github.com/microsoft/onnxruntime/pull/13215#discussion_r997814275	2022-10-18 13:07:40 -07:00
Dmitri Smirnov	4a63cd0290	Improve thread pool creation failure handling. (#13313 ) ### Description Detect and report thread creation failure on Windows. Do not throw out of constructor after the thread is created, the thread handle is lost and cannot be joined, resulting in a deadlock. Make setting a thread priority on Linux consistent with windows. Set thread priority in the thread itself. Log failure properly, but do not exit the thread. ### Motivation and Context Address issues https://github.com/microsoft/onnxruntime/issues/13291 And https://github.com/microsoft/onnxruntime/issues/13285#issuecomment-1278063223	2022-10-15 17:57:19 -07:00
Dmitri Smirnov	f0fbff6dd4	Adjust docs to comply with Doxygen requirements (#13302 ) ### Description Fix up param names in docs ### Motivation and Context Make pipelines pass	2022-10-12 18:07:18 -07:00
cloudhan	1e55949a70	Fix unsound hipify in ROCm EP (#13269 ) Some cuda related things is still left in the rocm ep statically hipified code. Eliminate them to avoid confusion.	2022-10-12 08:32:42 +08:00
cloudhan	2cf5d04e3d	Fix clang-tidy(cppcoreguidelines-pro-bounds-array-to-pointer-decay) (#13241 ) clang-tidy says "Do not implicitly decay an array into a pointer; consider using gsl::array_view or an explicit cast instead" It is a false positive scattering around all our codebase when using helper macros. It is becuase for function with 4 char name, say `main`, the type of __FUNCTION__ and __PRETTY_FUNCTION__ is `char [5]`.	2022-10-11 13:16:48 +08:00
Dmitri Smirnov	5dae0c477d	Deprecate CustomApi and refactor public API for better safety and consistency (#13215 ) ### Description Deprecate CustomOpApi and refactor dependencies for exception safety and eliminate memory leaks. Refactor API classes for clear ownership and semantics. Introduce `InitProviderOrtApi()` ### Motivation and Context Make public API better and safer. Special note about `Ort::Unowned`. The class suffers from the following problems: 1. It is not able to hold const pointers to the underlying C objects. This forces users to `const_cast` and circumvent constness of the returned object. The user is now able to call mutating interfaces on the object which violates invariants and may be a thread-safety issue. It also enables to take ownership of the pointer and destroy it unintentionally (see examples below). 2. The objects that are unowned cannot be copied and that makes coding inconvenient and at times unsafe. 3. It directly inherits from the type it `unowns`. All of the above creates great conditions for inadvertent unowned object mutations and destructions. Consider the following examples of object slicing, one of them is from a real customer issue and the other one I accidentally coded myself (and I am supposed to know how this works). None of the below can be solved by aftermarket patches and can be hard to diagnose. #### Example 1 slicing of argument ```cpp void SlicingOnArgument(Ort::Value& value) { // This will take possession of the input and if the argument // is Ort::Unowned<Ort::Value> it would again double free the ptr // regardless if it was const or not since we cast it away. Ort::Value output_values[] = {std::move(value)}; } void main() { const OrtValue* ptr = nullptr; // some value does not matter Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue>(ptr)}; // onowned is destroyed when the call returns. SlicingOnArgument(unowned); } ``` #### Example 2 slicing of return value ```cpp // The return will be sliced to Ort::Value that would own and relase (double free the ptr) Ort::Value SlicingOnReturn() { const OrtValue ptr = nullptr; // some value does not matter Ort::Unowned<Ort::Value> unowned{const_cast<OrtValue*>(ptr)}; return unowned; } ```	2022-10-06 14:57:37 -07:00
Edward Chen	5c89c37f7f	Consolidate enabled/default kernel def type constraints (#13034 ) Consolidate enabled/default kernel def type constraint types into enabled.	2022-09-27 14:04:15 -07:00
RandySheriffH	a83a9ed6b0	Remove miscellaneous nuphar configs (#13070 ) Remove a handful of nuphar related configurations after deprecation. Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-09-26 13:41:28 -07:00
Edward Chen	5f611b63a1	Make classes IKernelTypeStrResolver and IKernelLookup have protected destructors. (#13059 )	2022-09-23 09:16:45 -07:00
wangxiyuan	952c99304a	Add CANN EP (#12416 ) Description: This PR adds Ascend CANN execution provider support. Motivation and Context - Why is this change required? What problem does it solve? As the info shown in the issue. CANN is the API layer for Ascend processor. Add CANN EP can allow user run onnx model on Ascend hardware via onnxruntime The detail change: 1. Added CANN EP framework. 2. Added the basic operators to support ResNet and VGG model. 3. Added C/C++、Python API support - If it fixes an open issue, please link to the issue here. https://github.com/microsoft/onnxruntime/issues/11477 Author: lijiawei <lijiawei19@huawei.com> wangxiyuan <wangxiyuan1007@gmail.com> Co-authored-by: FFrog <ljw1101.vip@gmail.com>	2022-09-22 14:53:40 -07:00
sfatimar	cccbe90764	Openvino ep 2022.2 v4.2 (#13023 ) This changes are to align OV 2022.2 Release with ORT . Changes CPU FP16 Support, dGPU Support, RHEL Dockerfile, Ubuntu 20 Dockerfile Motivation and Context - This change is required to ensure ORT-OpenVINO Execution Provider is aligned with latest changes. - If it fixes an open issue, please link to the issue here. Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: shamaksx <shamax.kshirsagar@intel.com> Co-authored-by: pratiksha <pratikshax.bapusaheb.vanse@intel.com> Co-authored-by: pratiksha <mohsinx.mohammad@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: nmaajidk <n.maajid.khan@intel.com> Co-authored-by: Mateusz Tabaka <mateusz.tabaka@intel.com> Co-authored-by: intel <intel@iotgecsp-nuc04.iind.intel.com>	2022-09-22 12:31:40 -07:00
Edward Chen	454f77cd94	Update kernel matching logic: decouple from op schemas and remove kernel def hashes (#12791 ) # Motivation Currently, ORT minimal builds use kernel def hashes to map from nodes to kernels to execute when loading the model. As the kernel def hashes must be known ahead of time, this works for statically registered kernels. This works well for the CPU EP. For this approach to work, the kernel def hashes must also be known at ORT format model conversion time, which means the EP with statically registered kernels must also be enabled then. This is not an issue for the always-available CPU EP. However, we do not want to require that any EP which statically registers kernels is always available too. Consequently, we explore another approach to match nodes to kernels that does not rely on kernel def hashes. An added benefit of this is the possibility of moving away from kernel def hashes completely, which would eliminate the maintenance burden of keeping the hashes stable. # Approach In a full build, ORT uses some information from the ONNX op schema to match a node to a kernel. We want to avoid including the ONNX op schema in a minimal build to reduce binary size. Essentially, we take the necessary information from the ONNX op schema and make it available in a minimal build. We decouple the ONNX op schema from the kernel matching logic. The kernel matching logic instead relies on per-op information which can either be obtained from the ONNX op schema or another source. This per-op information must be available in a minimal build when there are no ONNX op schemas. We put it in the ORT format model. Existing uses of kernel def hashes to look up kernels are replaced with the updated kernel matching logic. We no longer store kernel def hashes in the ORT format model’s session state and runtime optimization representations. We no longer keep the logic to generate and ensure stability of kernel def hashes.	2022-09-20 14:24:59 -07:00
Cheng	f26054deca	[XNNPACK] Support running in multi-thread with seperate pthreadpool (#11762 ) Description: Describe your changes. XNNPACK takes pthreadpool as its internal threadpool implemtation, it couples calculation and parallelization. Thus it's impossible to leverage ORT's threadpool (EIGEN/OPENMP based). So we enabled pthreadpool in XNNPACK EP in this PR. Case 1: Pthreadpool coexist with ORT-threadpool simply Expriments setup hardware:RedMi8A with 8 cores, ARMv7 The two threadpool has the same pool size form 1 to 8. Two models: mobilenet_v2 and mobilenet_egetppu. we can see the picture below and draw a conclusion, latency are even higher from 5 threads or more. ![image](https://user-images.githubusercontent.com/9417365/190550127-2304adfe-97ac-4aeb-91a0-4606b5305a82.png) Case 2: For the reason of performance regression with 5 more threads, ORT-threads are spinning on CPU and diddn't realease it after computation finished. It's equivalent of creating 5x2 threads for parallelization while we have only 8 cpu cores. So I mannuly disabled spinning after ort-threadpool finished and enabled it when enter ort-threadpool. The result is quite normal now. ![image](https://user-images.githubusercontent.com/9417365/190675230-0d85dd02-01f0-4255-967d-e3dbb2a1fe52.png) Case 3: Even we achieved a reasonable results with disabling spinning, Will ORT-threadpool still impact performance of pthreadpool? we have expriment setting up as: Setting ORT-threadpool size (intra_thread_num) as 1, and only pthreadpool created. Attention that, almost a third of ops are running by CPU EP. we are surprisingly find that disabling ort-threadpool is even better in performance than creating two threadpool. ![image](https://user-images.githubusercontent.com/9417365/190556480-d6507396-d777-44fc-94e1-938d2b9bb7d7.png) Case 4: Use a unified threadpool between CPU ep and XNNPACK ep. It's the fastest among all. But if we take the similar workload partition strategy as ORT-threadpool, it could be faster. ![image](https://user-images.githubusercontent.com/9417365/190674908-a68fd20f-bdf4-41f9-bf0a-76b304cda490.png) Motivation and Context - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. Co-authored-by: Jicheng Wen <jicwen@microsoft.com>	2022-09-20 16:02:15 +08:00
Tang, Cheng	739b5675c8	remove legacy compile api (#12932 ) Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-09-15 13:18:40 -07:00
Dmitri Smirnov	bc2df1bf95	Remove previously deprecated API (#12935 ) Remove previously deprecated API Format JS code, address review comments NPM Formatting	2022-09-14 10:58:03 -07:00
Scott McKay	1016c33519	Fix prefast warning in upsample.cc. (#12938 ) * Fix prefast warning. * Fix some other static analysis warnings.	2022-09-14 08:14:33 +10:00
Cheng	8cedafe250	[xnnpack] Have `Initializer` in Mobile related EPs in Minimal_build and creating EP specific dynamic-schema (#12555 ) * Remove the dependence of Qlinearsoftmax schema * refactor initializerview && create shared schema * Dynamic Create EP specific schema * Have Initializer in minimal_build * address comments * remove CancelFuseSubGraph	2022-09-06 14:32:15 +08:00
ashbhandare	27dde0b51f	Csharp bindings for on-device training APIs (#12404 )	2022-09-02 13:13:48 -07:00
Yulong Wang	82a28cc2c3	upgrade emsdk to 3.1.19 (#12690 ) * upgrade emsdk to 3.1.19 * fix build break * ignore '-Wunused-but-set-variable' in eigen * add malloc and free in exported functions * EXPORTED_FUNCTIONS	2022-08-30 13:42:45 -07:00
Baiju Meswani	b83ea3c2ff	Address prefast static analysis warnings (#12756 )	2022-08-29 10:09:32 -07:00
mwootton	817dc94345	Add first pass of rocm kernel profiler (#10911 ) * Add first pass of rocm kernel profiler * Clean up rocm_profiler. Format args. Demangle kernel names. Add Api EventRecords * Remove debug output * Temporarily disable profiling unit test 'api record check' for cupti * Fix compile error for non-gpu builds * Use common file for demangle and pid/tid. Namespace ThreadUtil. Fix gpu buffer clearing. * Merge demangle into profiler_common * Merge demangle into profiler_common part 2 * Style cleanup * Resolve linking issues via ProviderHost interface * Demangle cuda kernel names * Clean up comments * Fix formatting * Fix anal retentive formatting	2022-08-26 19:38:03 -07:00
edgchen1	c270ea148a	Move 'using common::Status;' from common.h to status.h.	2022-08-26 15:05:53 -07:00
Yulong Wang	c144acc534	Replace 'master' branch ref to 'main' in the code (#12547 )	2022-08-22 10:48:12 -07:00
Dmitri Smirnov	9481893b58	Replace to lock_guard as lighter class for locking (#12616 ) Replace to lock_guard as lighter class	2022-08-17 11:08:31 -07:00
Haoming Chen	8a038b9b0c	Fix a build error (#12600 ) LLVM compiler complains the std::hash<const char> and suggests std::hash<const void>. But the intention is to hash the name string instead of the pointer. So use std::hash<std::string> to be explicit.	2022-08-17 10:49:54 -07:00
Scott McKay	0b0c51e028	Support direct usage of ORT format model flatbuffer for initializers (#12465 ) * Add ability to use ORT format model flatbuffer directly for intiializers by leveraging the TensorProto external data infrastructure. Requires user to provide ORT format model bytes when creating the session, and set both `session.use_ort_model_bytes_directly` and `session.use_ort_model_bytes_for_initializers` to 1 in SessionOptions config entries (AddSessionConfigEntry in C API).	2022-08-12 18:31:43 +10:00
Changming Sun	ac7538b909	Remove CUDA 10.2 support (#12541 )	2022-08-10 22:46:41 -07:00
Dmitri Smirnov	c10704a501	Use alignas instead of naive padding to avoid false cache sharing (#12514 ) PerThread and ChildThreadStat alignas	2022-08-10 11:23:20 -07:00
Cheng	64e991a9fc	[Qlinearsoftmax] contrib cpu (#12177 ) * [Qlinearsoftmax] contrib cpu * int8 implementation * contrib operator md * qdq transformer test * new attribute: opset * doc * quantized tool * remove template to reduce Binary size * doc of contribe operators * enforce x_shape is valid * fix reduce_size if input-shape is dynamic * add UT * register one op for reducing binarysize * kernel hash update * docs/ContribOperators.md	2022-08-10 10:52:02 +08:00
Hector Li	730240d2a5	remove the link the comments (#12510 )	2022-08-08 15:20:40 -07:00
Scott McKay	8d830adf24	Rework parts of Graph::Resolve to reduce memory usage (#12176 ) * Rework some aspects of Graph::Resolve to reduce memory usage.	2022-08-05 13:20:25 +10:00
Dmitri Smirnov	a4ef0e7f7b	Remove dynamic allocation for ThreadPool ParallelSection (#12429 ) Use InlinedVector in a TP Store per thread parallel section in std::optional and avoid memory allocation	2022-08-04 09:46:16 -07:00
Ryan Hill	52d4699788	Minor doc fixes (#12388 )	2022-08-03 19:47:36 -07:00
Hariharan Seshadri	d5a1c01b38	Add C++ Session ctor taking model bytes and OrtPrepackedWeightsContainer (#12333 )	2022-07-29 12:32:43 -07:00
Yateng Hong	c579497134	Fix TRT custom op issue (#12283 ) * Pass schema registry on CreateModel. * Fix ORT_MINIMAL_BUILD. * Fix build issue.	2022-07-29 03:39:56 -07:00
Ryan Hill	3e014a5e5d	Fix C header to stop people accidentally copying the OrtApi by value (#12297 ) * Fix C header to stop people accidentally copying the OrtApi by value * Remove api_ from KernelTwo	2022-07-25 19:19:40 -07:00
Dmitri Smirnov	3bf614fd47	Eliminate memory allocations per recent profiling (#12225 ) * Alloc begin FeedsFetches refactoring Refactor Tensor class Fix buffer deletor Remove new/delete deleted Adjust alloc move Fix up xnnpack provider Clarifying the comment on Create()	2022-07-25 14:14:38 -07:00
Ashwini Khade	ceb76429db	Merge pull request #12056 from microsoft/bmeswani/merge-training_dev/on_device_poc Merge On-Device-Training Offline Tooling and C/C++ APIs	2022-07-21 15:09:48 -07:00
Baiju Meswani	cbf08c7a7b	Make GetTrainingApi as a part of the OrtApis, add Training API documentation and address other pull request review comments	2022-07-21 18:11:48 +00:00
Dmitri Smirnov	4f106d2b3b	Eliminate unnecessary status lock acquisition in TP (#12196 ) Eliminate unnecessary status lock acquisition in the Thread Pool	2022-07-19 14:16:12 -07:00
Chen Fu	040c2f4517	x86/64 U8S8 Gemm Precision Fix (#12088 ) Add a graph optimization that convert u8s8 matrix multiplication to u8u8 if needed In x86/64 platforms, specifically SSE4.1, AVX2 and AVX512 CPUs provide better performance computing u8s8 matrix multiplications. Unfortunately, the higher performance comes with value overflow problems, as described in: https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html In this change we added a session option "session.x64quantprecision" (default off). For operators that calls u8s8 matrix multiplications, e.g. QAttention, we convert them to u8u8 when the following conditions are all satisfied: 1. Current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support 2. Session option "session.x64quantprecision" is on. 3. Constant weight tensor contains values outside of [-64, 63] range Note that when weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.	2022-07-13 10:12:25 -07:00
Baiju Meswani	a457ddc41d	Merge branch 'master' of https://github.com/microsoft/onnxruntime into bmeswani/merge_pr	2022-06-30 21:53:07 +00:00
Baiju Meswani	6e8edfff0c	Separate training apis from shared core apis (#12027 )	2022-06-29 14:12:29 -07:00
RandySheriffH	d5fcb432fa	Generalize native op creation (#11539 ) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo	2022-06-27 21:12:15 -07:00
Baiju Meswani	d25cf4df26	Merge branch 'master' into training_dev/on_device_poc	2022-06-24 20:18:19 +00:00
Dmitri Smirnov	088bc7494b	Deprecate APIs returning raw ptrs and provide replacements (#11922 ) Provider better documentation	2022-06-24 09:50:04 -07:00
G. Ramalingam	b1411c8357	Restructure function inliner (#11731 ) * Add nested function call tests * Add overload for Specialize * Pass symboltable to onnx shape inference * Avoid renaming empty names * Enable sequence_map tests which failed before this change	2022-06-24 09:21:31 -07:00
Dmitri Smirnov	607b7df060	Allow saving on CPU usage for infrequent inference requests by reducing thread spinning (#11841 ) Introduce Start/Stop threadpool spinning switch Add a session config option to force spinning stop at the end of the Run()	2022-06-23 10:04:37 -07:00
sfatimar	61a74f2f4d	Mohsin/enable dynamic shapes (#11867 ) * Add pypi build changes to latest Master * Add ORT training part of OV build * Disabling SqueezeOpTest.BadAxes * Add ONNXruntime branch ARG to Docker build * Changes to include file details versions * Commit File Version Updates * Change naming for linux build * Add fix for pylint format errors * Fix pylint warnings. * Enable Dynamic Shapes for OV_API_20 * Update requirements.txt whl version- internal_ci fix * Update backend_manager.cc MYRIAD Fix * Update wheel version in requirements.txt * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update setup.py * Fix pylint warnings * Fix pylint warnings 2 * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc * Update backend_manager.cc Co-authored-by: Preetha Veeramalai <preetha.veeramalai@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: Sahar Fatima <sfatima.3001@gmail.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com>	2022-06-21 08:03:58 -07:00
Dmitri Smirnov	267a424e52	Retry Rework execution frame to reduce memory allocations (#11897 ) * Revert "Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888)" This reverts commit `d2cbae3a04`. * Revert prepacked_weights to avoid indirect inclusion in CUDA and TRT code that breaks the build.	2022-06-20 10:29:43 -07:00
Edward Chen	a93fe7824a	Update EP compile API deprecation warning message. (#11808 ) Minor wording update to warning message to clarify that the function style Compile API is deprecated now and will be removed soon. Also updated some code comments.	2022-06-17 12:49:24 -07:00
Yi Zhang	d2cbae3a04	Revert "Refactor ExecutionFrame and SessionState to reduce memory all… (#11888 ) Revert "Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804)" This reverts commit `2ecba6fd25`.	2022-06-17 17:07:21 +08:00
stevenlix	bd65acd08d	Share execution context memory between TensorRT subgraphs (#11859 ) * share trt context memory * update parser to 8.4-EA * update parser to 8.4-GA * add context memory sharing enable option * update parser to 8.2-GA * fix format issue * reverse orders * fix format * fix format * fix issues	2022-06-16 22:42:40 -07:00
Dmitri Smirnov	2ecba6fd25	Refactor ExecutionFrame and SessionState to reduce memory allocations and improve data locality (#11804 ) Refactor ExecutionFrame and SessionState for better data locality and less memory allocations.	2022-06-16 16:50:48 -07:00
Scott McKay	d64f23fec0	EP factory creation cleanup and enhancements. (#11798 ) * Rework the EP factory creation setup so we're not cut-and-pasting function declarations in multiple places. Convert append EP for SNPE to be generic, and also use for XNNPACK. Add XNNPACK to C# API * Don't need stub for MIGraphX as it's using provider bridge. * Remove old 'create' functions that aren't applicable now that the EPs are built as separate libraries. * Only use EPs that require the layout transform if the opset is supported by the layout transformer. * Update wasm registration of xnnpack.	2022-06-16 07:01:41 +10:00
Ashwini Khade	f63e28c92f	C API version 0.001 (#11758 ) * C API version 0.001 * fix linker issues * fixes for save checkpoint api * plus fixes based on tests * plus test_runner and other changes * Plus cosmetic updates * remove unnecessary headers * plus some updates * plus more changes Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-06-15 11:13:35 -07:00
Chen Fu	d936751aad	QlinearConv threading adjustments (#11228 ) * Reserve the first core for the main thread Currently in "auto affinity" mode the worker threads are affinized to cores 0..(N-1), leaving the very last core for the main thread. This patch preserves core #0 for the main thread, and affinizes the worker threads to cores 1..N. * Avoid unneeded spin_pause in thread pool's worker threads Remove unneeded PAUSE instruction (0.1-0.2 usec latency) after a worker thread finds a task to execute. * MLAS/x86: optimize QLinearConv on hybrid CPUs Existing 4x task granularity for task partitioning on hybrid CPUs is not sufficient to compensate the difference of VNNI instructions throughput between performance and efficient cores. This patch... * Increases granularity for QLinearConv by 2x, to have 2x more tasks with 2x smaller output count * Limits QLinearConv task count from above, to avoid output count per task getting smaller than kernel's capability * Remove hardcoded task count for QLineConv as it limited scaling on 16+ cores CPUs * MLAS/x86: optimize QLinearConv on hybrid CPUs Existing 4x task granularity for task partitioning on hybrid CPUs is not sufficient to compensate the difference of VNNI instructions throughput between performance and efficient cores. This patch... * Increases granularity for QLinearConv by 2x, to have 2x more tasks with 2x smaller output count * Limits QLinearConv task count from above, to avoid output count per task getting smaller than kernel's capability * Remove hardcoded task count for QLineConv as it limited scaling on 16+ cores CP * Addressing comments * combining x86 ARM branches in qlinearconv threaded job partition * revert first core assignment Co-authored-by: Saurabh <saurabh.tangri@intel.com> Co-authored-by: Chen Fu <fuchen@microsoft.com>	2022-06-14 14:42:12 -07:00
Vincent Wang	5ecfaef042	ATen Fallback for Inference (#11597 ) * aten op for inference * fix build error * more some code to training only * remove domain from operator name * move aten_op_executor ext out from ortmodule * add pipeline * add exec mode * fix script * fix ut script * fix test pipeline * failure test * rollback * bugfix * resolve comments * enable aten for python build only * fix win build * use target_compile_definitions * support io binding * turn off aten by default * fix ut Co-authored-by: Vincent Wang <weicwang@microsoft.com> Co-authored-by: zhijxu <zhijxu@microsoft.com>	2022-06-09 16:07:30 +08:00
Scott McKay	927bac0f86	Rework allocator sharing to work for multiple devices. (#11700 ) * Rework allocator sharing to work for multiple devices. * Update SessionState to not use allocator name in matching for consistency with IExecutionProvider. The name doesn't have any clear meaning (e.g. we use the same name for the per-thread allocator in the CUDA EP as the shared allocate there and in the TRT EP). * NOTE: this means we will have one allocator per OrtMemType+OrtDevice. * Reverse order when doing allocator setup in SessionState. This will result in the CPU and CUDA EPs allocators being preferred (they are the most configurable), and also means the per-thread CUDA allocator for default GPU memory will be used even when TRT is enabled. * NOTE: Combined with the change to remove the allocator name from the key this will mean that if CUDA and TRT or ROCM and MIGraphX are both enabled the CUDA/ROCM per-thread allocator will be used to allocate GPU memory. * Use InsertAllocator instead of TryInsertAllocator. Each EP should be registered once, and we should only enter RegisterAllocator once, so the 'try' should not be required and would indicate an unexpected setup was involved. i.e. better to fail and figure out if we need to support that setup. * Add some clarifying comments around how replace allocator works. * Add unit testing for setup where EP has local allocator that may get out of sync with values in the IExecutionProvider base class. * Fix invalid check of whether data is on CPU to use device info instead of allocator name.	2022-06-09 17:38:38 +10:00
Changming Sun	3c1dd9514d	Revert "fixed point based requantization on arm64 (#11540 )" (#11732 ) This reverts commit `1f2c926`. Because it makes our packaging pipeline crash Error message: [ RUN ] QLinearConvTest.Conv3D_S8S8_Depthwise Test #1: onnxruntime_test_all ...................Subprocess killed***Exception: 838.24 sec We haven't successfully reproduced the bug on a real ARM64 hardware. Currently we only saw it showed up with qemu. More investigations are on-going.	2022-06-03 19:12:25 -07:00
Hector Li	95a16c1ffe	Snpe ep (#11665 ) * Initiate Ort SNPE EP * fix snpe ep windows build which is caused by the utility method (ToUTF8String) name change on master * correct the source path for libonnxruntime.so while building for andorid package * add AdditionalDependencies for amr64 * On MS-Windows, the patchfile must be a text file, i.e. CR-LF must be used as line endings. A file with LF may give the error: "Assertion failed, hunk, file patch.c, line 343," unless the option '--binary' is given. * fix build failure if snpe is not enabled * update doc for contrib op * separate out snpe ep settings to onnxruntime_snpe_provider.cmake * renaming according review comments * update according review comments	2022-06-03 14:10:02 -07:00
Scott McKay	4445dd6bc1	XNNPACK EP (#11445 ) * Implement XNNPACK support via an EP. * Layout transform uses the GraphPartitioner infrastructure. * Node fusion is supported. * Conv and MaxPool implementations were ported from Changming's PR. * Added optional mutex in InferenceSession::Run as we only want to allow sequential calls if xnnpack is enabled	2022-06-03 20:22:34 +10:00
Yufeng Li	1f2c92673b	fixed point based requantization on arm64 (#11540 ) * fixed point based requantization on arm64 * reverse MlasConvSymDepthwiseKernel u8s8 and s8s8 order	2022-06-02 12:34:17 -07:00
Edward Chen	738d9b153c	Consolidate several types into onnxruntime::ArgType. (#11430 )	2022-05-09 14:44:28 -07:00
Tang, Cheng	3f3c5fcd68	Unify the Compile API for mobile build and normal build (#10632 ) * use the lightweight compile api as default; use dnnl ep for testing * apply to tensorrt ep * fix the missing files * fix build * fix the copy issue on linux * migrate migraphx and openvino ep * fix openvino build break * fix linux build * fix unused parameter * fix coreml build * use graph view's filtered initializers * fix openvino break * fix tvm compile api * fix tvm / rknpu / vitisai ep build * add IsInitializedTensor in graph_viewer; fix nuphar build * use serializer directly as tvm ep is still static lib * fix the type mismatch * fix the type mismatch * fix merge conflict * add a comment * fix minimal build * fix the DML EP's legacy approach * save type/shape in dnnl IR * fix linux break * fix tvm failure * dnnl ep: move initializer referenced out of dnnl subgraph * Revert "add IsInitializedTensor in graph_viewer; fix nuphar build" This reverts commit 1cc3c7f08c16fee4fe3309a67209eb769d479587. * add IsInitializedTensor to graph viewer * add the legacy code for nuphar build to temporarily make nuphar build work * ignore internal test for nuphar * remove the out of date tests * keep the legacy API in EP for a while * turn serializer into a static function * update comments * fix tvm build * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update include/onnxruntime/core/framework/execution_provider.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update onnxruntime/core/framework/execution_provider.cc Co-authored-by: Pranav Sharma <prs@microsoft.com> * updatee comments; add warning message for legacy compil call * add a flag to control out of scope arg in serialization * fix trt build; improve the test * resolve merege errors * fix a typo Co-authored-by: Cheng Tang <chenta@microsoft.com> Co-authored-by: Cheng Tang <chenta@microsoft.com@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net> Co-authored-by: Pranav Sharma <prs@microsoft.com>	2022-05-05 08:30:07 -07:00
Changming Sun	963e1ace4e	Fix SAL annotations for custom op (#11432 ) Fix SAL annotations for custom op. For example, "_In_" only applies to pointers, not integers.	2022-05-04 10:47:28 -07:00
RandySheriffH	8d69b9398b	APIs for custom op to invoke ort operator directly (#10713 ) * draft kernel creation * setup eager context * call into kernel in eager mode * redefine test case * refact eager context * add comment * remove header * rename argument * redefine API definition with types * list outputs as argument * switch to int to represent length * fix compile err * create attribute API * add test case for topk * remove bool from c api * add gru test case * remove var * fix compile warnings * rename status * fix compile err * exclude sparse tensor * fix comments * fix comments * fix build err * rename file and move location * format code * move file to session folder * fix comments Co-authored-by: Randy <Randy@randysmac.attlocal.net>	2022-05-03 14:16:30 -07:00
Changming Sun	5023f6750b	Revert "Call pluggable EP's shutdown function in Environment::~Environment() (#11120 )" (#11393 ) This reverts commit `4983d6e5d6`. We can't destroy OrtEnv through python's atexit function, because at that time there might be many other ORT python objects alive.	2022-05-02 14:38:31 -07:00
Tang, Cheng	4b875e3543	Re-implment the function support in onnxruntime (#11167 ) * initial fix * refactor the function handle * update the implementation * fix linux build break * fix training build * fix minmal build * fix gradient checker * deprecate the local function members in graph. host it in model * fix changming's comments * fix comments about inlined containers * fix a missed inlined container * fix training build * avoid const for std string_view Co-authored-by: Cheng Tang <chenta@microsoft.com>	2022-04-29 10:15:58 -07:00
Vincent Wang	1c64351e09	Create Tensor with Strides (#11294 ) * create tensor with strides * resolve comments * refactor Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2022-04-28 16:49:37 +08:00
Dmitri Smirnov	a7d0158c24	Introduce a way to disable Abseil library (#11353 ) Introduce a way to disable Abseil library. Use cmake extra args, no new build switch.	2022-04-27 08:57:52 -07:00
Edward Chen	4d0214f851	Move Contains() helper function to a higher common.h. (#11289 )	2022-04-21 09:31:48 -07:00
Gary Miguel	7aa4af238a	Add strict_shape_type_inference config option (#11081 ) Prior to this, certain shape and type errors were surfaced only when the model was using the latest known op set version. Providing users an explicit option allows for better testing of code that produces models, which includes unit tests within this repo and other repos such as the TF-ONNX and PT-ONNX converters. Remove the previous behavior which seems quite counter-intuitive: an otherwise identical model with a later op set version should be treated identically in this regard. The option defaults to false to avoid causing errors for users that rely on the previous permissive behavior. Turned on the strict enforcement by default in OpTester, which revealed a few disagreements between ORT and ONNX on what the correct output shape should be. Fix shape inference bug in ReduceSumTraining with noop_with_empty_axes=1 which was revealed. Fix TensorOpTest.Unsqueeze_scalar, which was testing negative axes on an op set version where the op did not actually support negative axes. Fixes #9506.	2022-04-21 08:32:40 -07:00
Edward Chen	4854a09340	Consolidate utils::ToTensorProtoElementType, TypeToDataType, and data_types_internal::ToTensorDataType. (#9824 )	2022-04-20 12:45:53 -07:00
Ahmad Zakaria	63ff391b16	add AppendExecutionProvider_CUDA_V2 to the C++ api (#11153 )	2022-04-14 17:33:27 -07:00
Vincent Wang	9707181257	fix build error (#11199 )	2022-04-13 13:09:19 +08:00
Dmitri Smirnov	12c687f594	Rework initializer.cc to eliminate code duplication (#11131 ) Rework initializer.cc to eliminate code duplication and add type enforcement. Address review comments. Add literal operators for MLFloat16 abd BFloat16 and tests.	2022-04-08 09:42:31 -07:00
Justin Stoecker	7609694464	Enable building with a GDK (#11126 )	2022-04-07 15:06:31 -07:00
Changming Sun	4983d6e5d6	Call pluggable EP's shutdown function in Environment::~Environment() (#11120 ) I disabled some tests temporarily. I will move them to a separated executable file in another PR. In the future, I want to combine onnxruntime::Environment and OrtEnv classes. Now we have 3 env classes, it is too confusing: 1. onnxruntime::Env 2. onnxruntime::Environment 3. OrtEnv Our python binding uses onnxruntime::Environment, while all other language bindings use OrtEnv. So python doesn't unload EPs but the others do. It's better to make them consistent. Please note even I added the call, currently the unload function still is a no-op on Linux. So, currently on Windows we must unload the EPs while on Linux we must not do it.	2022-04-07 14:11:29 -07:00
Dmitri Smirnov	2700261f7c	Provide an API to supply external initializers data from user buffers (#11109 ) Imlpement AddExternalInitializers	2022-04-07 12:21:53 -07:00
Maajid khan	81fa28bc56	OpenVINO-EP v4.0 Release PR with OpenVINO 2022.1 (#11025 ) * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Modification to include new api 2.0 changes in the code * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Log comments updated * Changes to enable 2.0 api * Enabling ov-ep for 2022.1 Release ->Added ov-ep 2022.1 flow ->Validated CPU Unit tests with OV Master using onnxruntime_test_all unit tests. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix for output mismatch b/w OpenVINO and ONNX Refer: https://jira.devtools.intel.com/browse/CVS-60310 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enabling Adobe ops ->Enable Resize op for iGPU ->Enable Add op for iGPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing irrelevant conditions ->Removing some conditions from GetCapability() which are now not required. (Removed conditions for OV version support less than 2021.2) Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable upsample op Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Enable Adobe proxy-e model Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Removing any extra conditions for Opset13 ops * Opset13 changes Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Exception handling for devices * Added comments * Implement GPU Throttling feature Added GPU Throttling feature for iGPU's. when user enables it as a runtime option, it helps in reducing overall CPU usage of the application Added changes to exercise this option using onnxruntime_perf_test application. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Renaming the runtime config option Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added the user to video and users group * Handling_GPU.0_GPU.1 * Handling special conditions ->Handling corner cases for device_type checks Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added opset13 changes ->Enabled Few ops ->Added Debug info for case 3b in getcapability() Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix build issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes issues Fixes compiler warnings c4458 on windows. Fixes the bug in device_type check logic Adds print info for enable_opencl_throttling option in onnxruntime_perf_test Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> commit to make openvino_2021.4 compatible * Fixed IO Buffer Optimization * Fix output names issue * Fix 2021.3 branch * Bug Fix for Multiple inputs/outputs - Assigns the right output_name and input_name for the graph when returned by CompiledModel::inputs() OV function. - Also takex care of output mismatch issue b/w openvino output and onnx output Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add comments for the changes made Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * IO Buffer Changes * Commit for Disabling GPU Throttling for 2021.4 * Updated branch * Fix windows build ->Fixed windows build in debug mode ->Disabled scatternd3_tensor_int64 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed CPP Unit tests for CPU -Fixed shrink, MVN, ReduceL2, Maxpool, upsample, scatter, slice, reshape, unsqueeze. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed first set of GPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed additional failing tests on GPU ->Added conditions to disable certain ops under certain conditions ->Disabled certain tests ->Added some op supports for no_dimension supported Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added Expand op support for CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added condition for squeeze op ->Shape can't have empty axes attribute Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Add support for LessOrEqual op function Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * OV Interface wait for replaced by indefinite wait call * use names from ONNX model to access OV tensors This chnage is to use the input/output names retrieved from original onnx model to access OV tensors and to check if there's any input or output names mismatch b/w ONNX naming and OV naming. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixes Myriad unit tests and other issues ->Fixes Myriad CPP unit tests ->Fixes output mismatch issue with models with sub graph partitioning Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fix segfault issue ->Fixed case 3b condition in get_capability() which was causing the segfault issue Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed build isuse with ov 2021.4 with I/O buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disables performance counters for I/O Buffer Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed inputs/outputs mismatch for HDDL with 2022.1 Signed-off-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com> * Fix to enable GPU FP16 * Enabled mlperf_ssd_mobilenet_300 model fully on CPU Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Added ov version specific dll packaging for nuget * Fixed conditions for few ops Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Dockerfile updates * Updated License Info -Updated the copyrights License Info -modified FP16 transformations with OV 2022.1 Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling mlperf_ssd_mobilenet_300 model ->Disabled this model for openvino. The test is failing in Internal_CI pipelines. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Disabling failing python CPU Tests Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed flake8 python errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: hdgx <harinix.d.g@intel.com> Co-authored-by: mayavijx <mayax.vijayan@intel.com> Co-authored-by: sfatimar <sahar.fatima@intel.com> Co-authored-by: mohsinmx <mohsinx.mohammad@intel.com> Co-authored-by: Mohammad Amir Aqeel <mohammadx.amir.aqeel@intel.com>	2022-04-06 13:30:33 -07:00
Vincent Wang	3b6cee8059	[CUDA] Optimize Conv and ConvGrad for Training (#10999 ) * Optimize Conv and ConvGrad for Training * add provider option to control * fix typo	2022-03-29 07:31:36 +08:00
Chi Lo	8ba52b0a05	Bump master version to 1.12 (#10797 ) * bump master version to 1.11 * bump master version to 1.12	2022-03-28 12:30:11 -07:00
Scott McKay	47c09e6701	Clarify usage of kOnnxDomainAlias. (#10962 ) * Clarify usage of kOnnxDomainAlias.	2022-03-25 09:52:59 +10:00
Leandro Gracia Gil	1cc2cfb7b8	Move #ifndef ORT_CXX_API_THROW to the no exceptions case. (#10937 ) This is related to https://github.com/microsoft/onnxruntime/issues/10564 which introduced a fix in the wrong case where exceptions are enabled.	2022-03-21 11:12:56 -07:00
Valery Chernov	625a1f7673	[TVM EP] code refactor (#10655 ) * rename info to options for TVM EP * transfer options processing from TVMExecutionProvider to TVMEPOptions * transfer TVMRunner to separated files * implement TVMCompiler class * replace CompileFunc by TVMCompiler object. update TVMRunner. now it does not depend on TvmExecutionProvider * correct logging of TVM EP options * RunnerImpl, GERunnerImpl and VMRunnerImpl were implemented * add prepareComputeInfo method * remove update_output_shapes flag * embed all TVM EP dependences to tvm namespace. transfer model compilation from TVMRunner. connect TVMRunnerImpl to TVMRunner * refactor compileModel method * small cleaning * separate TVM EP options data store and processing * replace TvmTensorShape by InlinedVector with max_size 5 * correct indentation * update TVM hash Co-authored-by: Valery Chernov <valery.chernov@deelvin.com>	2022-03-16 13:55:04 +01:00
Edward Chen	f468ea40e5	Refactor Node::AddAttribute() (#10869 )	2022-03-16 14:53:00 +10:00
Edward Chen	e53422c6d0	Update convert_onnx_models_to_ort.py to support runtime optimizations. (#10765 ) Add runtime optimization support to ONNX -> ORT format conversion script. Replace `--optimization_level`, `--use_nnapi`, and `--use_coreml` with a new `--optimization_style` option.	2022-03-14 16:50:41 -07:00
Hariharan Seshadri	e80ff63274	Fix bug in MemcpyToHost (#10816 )	2022-03-10 07:02:27 -08:00
Edward Chen	c147c9dda6	Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD. (#10778 ) Remove ORT_ENABLE_RUNTIME_OPTIMIZATION_IN_MINIMAL_BUILD as it is now implied by ORT_EXTENDED_MINIMAL_BUILD. Remove related CMake option.	2022-03-08 16:18:49 -08:00
Vincent Wang	4a38f9e31d	enable strided tensor for training only (#10748 )	2022-03-08 08:31:28 +08:00
Fei Hu	60acfd3dd8	Support CUDA Graph in the CUDA EP (#9978 )	2022-03-06 20:47:31 -08:00
Scott McKay	e337f5faf3	Enable QDQ cleanup and NHWC optimizers in an extended minimal build. (#10729 ) * Enable QDQ cleanup and NHWC optimizers in an extended minimal build.	2022-03-04 15:45:42 +10:00
Rachel Guo	a9dc50ba8b	Add option to force QDQIsInt8Allowed to return true when exporting to ORT format (#10719 ) * wip * save * minor update * fix * fix * Revert "fix" This reverts commit `a76f364b2d`. * revert * revert * revert submodule removal * address pr comments * minor fix * address cr comments * fix format Co-authored-by: rachguo <rachguo@rachguos-Mini.attlocal.net>	2022-03-02 23:26:14 -08:00
Yulong Wang	f4b2d3af2b	Upgrade emsdk to 3.1.3 (#10577 )	2022-02-28 23:52:41 -08:00
Vincent Wang	9a22b5d253	Strided Tensor Support for Eager Mode (#10578 ) * strided tensor for eager mode * fix build and resolve comments * fix win x86 build	2022-03-01 14:25:31 +08:00
Dmitri Smirnov	e23a224518	Fix CUDA 10.2 compile error due to inlined_containers.h inclusion (#10702 ) Fix CUDA 10.2 compile error due to inlined_containers.h inclusion into a common CUDA header. Use NumberOfNodes() to reserve space in a hash table Prefer separate call to reserve() rather than passing in the hash table constructor. They have somewhat different meaning.	2022-02-28 19:56:44 -08:00
cloudhan	3243c9579f	Fix VLOG?_DEFAULT macros usability. (#10568 ) * Add `set_default_logger_verbosity` api. * fix docs * make flake8 happy	2022-03-01 13:16:26 +10:00
Scott McKay	1f6d8248da	Add optional optimizer to remove leftover Q->DQ pairs after all other QDQ processing has completed (#10659 ) Add an optimizer that can remove leftover Q->DQ pairs. Depending on the model this may help with performance and/or improve accuracy. Optional as it could make things worse so user needs to be aware of this and test what works best for their scenario. Enable with SessionOptions config param `session.enable_quant_qdq_cleanup`	2022-03-01 08:05:02 +10:00
Thiago Crepaldi	e788cc2a23	Convert com.microsoft::ATen into org.pytorch.aten::ATen onnx op (#10060 ) Signed-off-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2022-02-28 14:14:45 -05:00
Ryan Hill	eb116595d4	Add ability to customize ORT_CXX_API_THROW (#10688 )	2022-02-28 00:15:10 -08:00
Dmitri Smirnov	b30e0e2283	Remove inline_containers include from tensor_shape (#10682 ) Hide Inlined Hash set and maps guts behind template forward declarations. Currently CUDA 10.2 compiler can not compile abseil but provider interfaces use those types in their signatures. InlinedVector seems to be fine. Introduce core/common/inlined_containers_fwd.h header	2022-02-26 20:07:18 -08:00
Dmitri Smirnov	2679711bee	Refactor transformers and other code to reduce memory allocation calls (#10523 ) Work on minimizing memory management calls by reducing number of allocations and copies. Replace std::unordered_set to InlinedHashSet and add usage of InlinedVector. Employ std::move() to minimize copying and memory allocations. Remove copying of the const shared data into each of the PropagateCast transformer instances. Move inlined_containers.h header to include/common Adjust AsSpan imlementation for C++ < 17	2022-02-24 16:17:14 -08:00
RandySheriffH	e056fbaa51	Add restrictions for hybrid cpus for thread pool task distribution (#10393 ) * add restrictions for hybrid cpus * add unit test to mock hybrid cpu * attach hybrid flag * add mocking interface to CpuInfo * make is_hybrid * make mock function const * add force_hybrid for thread pool * remove header	2022-02-17 14:34:09 -08:00
Ashwini Khade	f436d3437e	Add layout transformer for NNAPI (#10371 ) * Add layout transformer for NNAPI * plus merge fixes * plus some more merge fixes * test fixes * comments + cleanup * plus updates * post merge changes * enable layout transformer in extended minimal build * plus more comments * more tests + fix CI * plus updates per review * more updates per review * fix file name * fix qdq tests * plus more updates * plus updates * typo fix * fix qdq selection in 2nd optimization pass * fix typo * fix a test * update dependency structure for layout transformer * plus updates * more updates * plus change * more updates to fix linker error in minimal build * remove unnecessary headers	2022-02-15 20:25:29 -08:00
Vincent Wang	ceb1e2b1a6	[ROCm] Bugfix of BFloat16-float conversion and Add FastGelu Kernel for AMD (#10557 ) * bf16 bugfix on amd * enable fastgelu ut on amd	2022-02-16 11:11:08 +08:00
Valery Chernov	1cdc23aba4	[TVM EP] Rename Standalone TVM (STVM) Execution Provider to TVM EP (#10260 ) * update java API for STVM EP. Issue is from PR#10019 * use_stvm -> use_tvm * rename stvm worktree * STVMAllocator -> TVMAllocator * StvmExecutionProviderInfo -> TvmExecutionProviderInfo * stvm -> tvm for cpu_targets. resolve onnxruntime::tvm and origin tvm namespaces conflict * STVMRunner -> TVMRunner * StvmExecutionProvider -> TvmExecutionProvider * tvm::env_vars * StvmProviderFactory -> TvmProviderFactory * rename factory funcs * StvmCPUDataTransfer -> TvmCPUDataTransfer * small clean * STVMFuncState -> TVMFuncState * USE_TVM -> NUPHAR_USE_TVM * USE_STVM -> USE_TVM * python API: providers.stvm -> providers.tvm. clean TVM_EP.md * clean build scripts #1 * clean build scripts, java frontend and others #2 * once more clean #3 * fix build of nuphar tvm test * final transfer stvm namespace to onnxruntime::tvm * rename stvm->tvm * NUPHAR_USE_TVM -> USE_NUPHAR_TVM * small fixes for correct CI tests * clean after rebase. Last renaming stvm to tvm, separate TVM and Nuphar in cmake and build files * update CUDA support for TVM EP * roll back CudaNN home check * ERROR for not positive input shape dimension instead of WARNING * update documentation for CUDA * small corrections after review * update GPU description * update GPU description * misprints were fixed * cleaned up error msgs Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru> Co-authored-by: Thierry Moreau <tmoreau@octoml.ai>	2022-02-15 10:21:02 +01:00
Chi Lo	0f5d0a091a	Make user capable of adding new field in OrtTensorRTProviderOptionsV2 as new provider option (#10450 ) * modify code for add additional field in OrtTensorRTProviderOptionsV2 * add include file * fix typo * fix bug * add comment * fix code * revert change	2022-02-05 11:15:12 -08:00
Edward Chen	c43c1691ad	Enable transpose optimizer in minimal extended build (#10349 ) Enable transpose optimizer and infrastructure it depends on in a minimal extended build.	2022-01-31 09:41:04 -08:00
Dwayne Robinson	b02f4ece5e	Remove cbegin and cend calls which do not exist in std::span or gsl::span (#10426 )	2022-01-28 14:25:12 -08:00
Edward Chen	0e951d7d6b	Add some more documentation for the C/C++ API tensor creation functions. (#10394 )	2022-01-27 13:19:11 -08:00
Changming Sun	ec4362f8f3	Enable more static analysis warnings and enable the analyzer for training cpu (#10176 )	2022-01-27 11:17:20 -08:00
Dmitri Smirnov	3367ddc5ba	Add abseil cgmanifest declaration. Update coding standards. (#10374 ) Add abseil cgmanifest declaration. Update coding standards for InlinedContainers Adjust coding guidelines. Add default N calculation for InlinedVector<T, N> for general use. Rename T from InlinedShapeVectorT. Fix Eager build Add LLVM Copyright with modified derived code notice.	2022-01-27 08:32:05 -08:00
Weixing Zhang	ea9c8a7cdc	support MIGraphXEP to work with ROCMEP for inference on AMD GPU (#10368 ) Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Support MIGraphXEP to work with ROCMEP for inference on AMD GPU	2022-01-26 15:52:56 -08:00
Edward Chen	df16c605e8	Add "available since" message for C API additions since v1.10.0. (#10348 )	2022-01-25 10:15:34 -08:00
Edward Chen	4b87d2c172	Fix dockerfiles/Dockerfile.arm32v7 build. (#10360 ) Install CMake, ignore some Eigen warnings.	2022-01-24 19:06:09 -08:00
Dmitri Smirnov	7e092a7e3f	Reduce number of memory allocations based on a customer profiling case (#10193 ) Add abseil and inlined containers typedefs Introduce TensorShapeVector for shape building. Use gsl::span<const T> to make interfaces accept different types of vector like args. Introduce InineShapeVectorT for shape capacity typed instantiations Refactor cuda slice along with provider shared interfaces Refactor Concat, Conv, Pad Build with Conv Einsum and ConvTranspose refactored. Remove TesnorShape::GetDimsAsVector() Refactor SliceIterator and SliceIteratorBase Refactor broadcast Refactor Pads for twice as long Remove memory planner intermediate shapes vector Refactor orttraining Fix passing TenshroShapeVector to tests Remove abseil copy and submodule, use FetchContent_Declare/Fetch Path with separate command Make RocmAsyncBuffer accept anything convertible to span. Adjust Linux GPU pipeline.	2022-01-24 10:40:46 -08:00
Vincent Wang	44e2db9397	CUDA BFloat16 Refactor (#10085 )	2022-01-14 19:38:56 +08:00
RandySheriffH	79d2a0d185	Dynamic cost model to mitigate high E2E perf variance (#9833 ) * commit dyamic block size * summarize granularity * add configure * add test case * call std stoi * add comments * fix typo * rename var * update comment * reset default * better comments * extend LoopCounter for dynamic blocking * fix comments and add more UT * update comments * swtich type to std::ptrdiff_t * format code with better indention * cast ptrdiff_t * fix typo	2022-01-11 17:26:41 -08:00
Shucai Xiao	ce103ace93	Amdmigraphx fix build error (#9272 ) * fix build error * rename a missing api for the MIGraphX EP	2022-01-10 15:18:43 -08:00
Dwayne Robinson	1f5b073508	Minor DirectML EP provider factory comments (#9965 )	2022-01-10 02:06:31 -08:00
Nat Kershaw (MSFT)	d52d3c0052	Update C/C++ API docs automation to create a PR (instead of push to publish branch) (#10093 )	2022-01-07 16:16:47 -08:00
Hariharan Seshadri	0552a47ec2	Enable CUDA provider option configuration for C# (#10188 )	2022-01-06 11:03:14 -08:00
Edward Chen	792db33f01	Enable loading of ORT format model graph runtime optimizations (#9901 ) Initial implementation of load/replay of runtime optimizations in an ORT format model.	2022-01-04 12:09:07 -08:00
stevenlix	05d20343ee	Remove duplicated constant initializer copies for TensorRT nodes (#10105 ) * add new field constant_initializers in metadef and remove constant initializers from trt node inputs * remove redundancy * use GetConstantInitializer() to get constant initializers * add ORT_ENFORCE check Co-authored-by: Ubuntu <azureuser@orteplinuxdev.bxgbzpva45kedp3rhbsbit4phb.jx.internal.cloudapp.net>	2021-12-22 12:19:56 -08:00
Changming Sun	4e9e01cb3c	Fix SDL warnings in CPU EP (#9975 )	2021-12-19 20:54:29 -08:00
Edward Chen	3466ee45a3	Add hash value typedef. (#9710 ) Add a typedef for the various hash value variables. Use of a typedef conveys some additional meaning.	2021-12-15 19:07:17 -08:00
Valery Chernov	b327e89efa	Standalone TVM Executor Provider (#10019 ) * squashed commit for standalone tvm execution provider * critical fix for correct python build with stvm ep * get tuning log file from ep options. It has priority over AUTOTVM_TUNING_LOG * updates and fixes * update parsing of stvm provider options * add support of external data for onnx model * add conditional dump of subgraphs * remove unused code * get input tensor shapes through provider options. get output shapes for fixed input ones by TVM API * support AUTO_TVM tuning log file inside ORT. Selector for Ansor and Auto_TVM is provider option (tuning_type) * add fp16 * add functionality of conversion of model layout to NHWC if need. Necessary parameter was added to STVM provider options * fix license text in header. fix log format * small fixes * fix issues from flake8 * remove model proto construction from GetCapability * reserve memory for vector of DLTensors * add simple tutorial for STVM EP * STVM docs * jroesch/tvm -> apache/tvm * remove dead code, unneccessary logs and comments * fix in readme * improve tutorial notebook * tvm update * update STVM_EP.md * fix default value * update STVM_EP.md * some TODOs for the future development * shorten long lines * add hyperlink to STVM_EP.md * fix Linux CI error * fix error in csharp test Co-authored-by: Jared Roesch <jroesch@octoml.ai> Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2021-12-15 16:59:20 -08:00
Changming Sun	20f8a06f1f	Remove OpenMP code (#10032 )	2021-12-15 00:58:42 -08:00
Changming Sun	9d9ebd3b85	Fix some static analysis warnings in the core framework (#10033 )	2021-12-14 14:41:42 -08:00
Ryan Hill	343a76945b	Fix some documentation errors plus ones generating doxygen warnings (#9993 )	2021-12-09 17:42:34 -08:00
Dmitri Smirnov	a7f649db7c	Enable proper override using MIMalloc (#9944 ) Redirect memory allocations to MiMalloc and advance its version to v2.0.3 Refactor for a universal ifdef	2021-12-07 17:56:58 -08:00
Ryan Lai	57a6f7c205	Various fixes to fix WindowsAI RI build. (#9877 ) * WAI RI fixes * span changes * Spaces * Additional warnings to fix * Fix redundant commment	2021-11-29 21:33:15 -08:00
jingyanwangms	bf5e9a5044	bumping up ORT_API_VERSION to 10 (#9838 ) Co-authored-by: Jingyan Wang <jingywa@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-11-22 20:27:45 -08:00
Edward Chen	bcc6ab29f6	Trim DataTypeImpl binary size (#9813 ) * De-virtualize DataTypeImpl::AsXType() functions. * Refactor helpers.	2021-11-22 12:06:24 -08:00
Dmitri Smirnov	567749b2dc	Expose IOBinding SynchronizeInputs/Outputs via C/C++/C# And Python APIs (#9823 ) Add C/C++ APIs for SynchronizeBoundInputs/Outputs Add python bindings Expose SynchronizeBoundInputs/Outputs to C# API	2021-11-22 09:45:31 -08:00
Wei-Sheng Chin	e520bb5145	Improve print functions for NodeArg, Node, and Graph (#9801 )	2021-11-19 09:48:27 -08:00
Scott McKay	dc1724b0e2	Reduce DataTypeImpl binary size (#9783 ) * Reduce the number of virtual methods in DataTypeImpl to reduce binary size. Refactor some helpers to reduce the amount of templatized code.	2021-11-19 10:29:12 +10:00
Dmitri Smirnov	6284cbe833	Add TensorShape noexcept for move ops and fix some warnings (#9802 ) Add TensorShape noexcept for move ops and fix some warnings	2021-11-18 15:27:24 -08:00
Hariharan Seshadri	e23892ddbe	Support disabling support for the optional type in ORT builds (#9745 )	2021-11-17 19:13:28 -08:00
satyajandhyala	421e4c03ce	Update default cast propagation strategy from None to FloodFill (#9713 ) * Changed the default cast propagation strategy from None to FloodFill.	2021-11-16 13:15:57 -08:00
Edward Chen	9acbfeba09	Address some code scan issues. (#9752 )	2021-11-16 10:24:46 -08:00
sfatimar	1d03baa8cc	Openvino ep 2021.4 v3.3 (#9588 ) * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erronous couts added * erronous entry rectified * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Remote Context Plugin * changes for IO Buffer plugin * erronous couts added * erronous entry rectified * Added checks for Hetero/Multi Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Set the Openvino OP Buffer also as output * Enable AUTO plugin in OpenVINO EP Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Please commit error message and rectification of param.context * Alignment fixed Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Changed the string to OpenVINO_GPU * hanged OpenVINO to to OpenVINO_CPU * Onnxruntime updated API for memory location * Removing Duplicate LOG Error * Tensor.h removed DeviceType function. Updated comment * API Comments updated * Removing changes to Provider Indo * Erronous commit * Removing Extra logs * Merge CMAKE * Not copy from a local location * Duplicate Entry * Remove extra line Co-authored-by: MaajidKhan <n.maajidkhan@gmail.com>	2021-11-15 13:41:12 -08:00
Sheil Kumar	3d0bd2596f	Enable creating OrtValues from ID3D12Resources from the onnxruntime C-API (#9686 ) * Add onnxruntime-windows api. * minor fixes * add to package headers * Build ort_dml_api for provider extensions. * Cleanup * misc comment * remove winml specific comments * use dml check in onnxruntime * Update include/onnxruntime/core/providers/dml/dml_provider_factory.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update include/onnxruntime/core/session/onnxruntime_c_api.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update include/onnxruntime/core/providers/dml/dml_provider_factory.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update include/onnxruntime/core/providers/dml/dml_provider_factory.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update onnxruntime/core/session/onnxruntime_c_api.cc Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update onnxruntime/core/session/ort_apis.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update winml/test/adapter/AdapterSessionTest.cpp Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update onnxruntime/core/session/onnxruntime_c_api.cc Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update winml/adapter/winml_adapter_c_api.cpp Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update include/onnxruntime/core/session/onnxruntime_c_api.h Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update onnxruntime/core/session/onnxruntime_c_api.cc Co-authored-by: Pranav Sharma <prs@microsoft.com> * Update winml/adapter/winml_adapter_c_api.cpp * PR feedback * Update include/onnxruntime/core/providers/dml/dml_provider_factory.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update include/onnxruntime/core/providers/dml/dml_provider_factory.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * Update include/onnxruntime/core/providers/dml/dml_provider_factory.h Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> * PR feedback * merge resolution and unreference param * (naming) Remove Dml prefix * maybe unused version * move DML code into DML path. CIs failing because DML is not available when --use_dml is not on * fix warning causing local build failures after merging * Change getvaluememoryinfo to gettensormemoryinfo * minor breaks * fix comment paste * fix comment Co-authored-by: Sheil Kumar <sheilk@microsoft.com> Co-authored-by: Dwayne Robinson <dwayner@microsoft.com> Co-authored-by: Pranav Sharma <prs@microsoft.com>	2021-11-13 03:34:54 -08:00
RandySheriffH	21eb747a0f	Custom thread creation and join hooks (#9426 )	2021-11-12 19:10:31 -08:00
Guoyu Wang	5ad6dbb314	Remove experimental from ORT format namespace (#9729 ) * schema change * cc channges * remove temp debug code * Adding fbs namespace to session_state_flatbuffers_utils.h * Add fbs namepsace to all ort format utils	2021-11-11 19:46:30 -08:00
Gary Miguel	93e239747f	Construct valid graphs for ONNX checker for IR version < 4. (#9665 ) * Construct valid graphs for ONNX checker for IR version < 4. Previously the constructed graph was not guaranteed to have its initializers be a subset of its inputs, which is required for IR version < 4. This resulted in spurious failures. Fixes #9663	2021-11-12 09:13:28 +10:00
Guoyu Wang	a70ae24475	Add QDQ::Selector::Select to use const GraphViewer instead of mutable Graph (#9621 ) * Move qdq selector to use const GraphViewer * minor update * Move qdq logic from NodeSelector to QDQ Selectors * Fix build break * Move selector result to NodesToOptimizeIndexes * fix build break * address CR comments * move indexes -> indices * Pass graph_viewer to avoid recreating many times * Update after merge master * update graph viewer remarks * update comments * Add ut for new qdq selector logic * Increase minimal binary size limit * UT minor update * Address CR comments	2021-11-08 21:36:29 -08:00
Hariharan Seshadri	65590b049c	Expose an API to query the CUDA compute stream to launch a custom kernel (#9141 )	2021-11-08 21:10:34 -08:00
Ryan Hill	24e35fba32	Change TensorShape to typically not allocate heap memory (#9542 )	2021-11-08 10:29:54 -08:00
Hariharan Seshadri	bbeceb7541	Support optional type in ORT (#8339 )	2021-11-04 15:01:42 -07:00
Edward Chen	ddb4c05852	Save graph runtime optimizations for minimal build (#9508 ) Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.	2021-11-04 10:49:46 -07:00
Edward Chen	c315d1b3cd	Always enable ORT format model loading. (#9586 )	2021-11-01 10:00:08 +10:00
TomWildenhain-Microsoft	e8268c9a18	Add Transpose Optimizer and modify nhwc optimizer to use it. (#9284 ) * Add Transpose Optimizer and modify nhwc optimizer to use it. * Fix casts * Fix casts2 * Fix move * Add tests * Add headers * Fixes and tests * Remove explicit template instantiation * Fix build warning * Name unit tests * Code review fixes * Add some comments * Fix some casts * Make optimization slightly less agressive * Some unit test fixes * Update Attention pattern to work with transpose optimizer * Update attention fuser * Fix attention fusion python script * Improve transpose optimizer documentation * Create OptimizerCtx struct * Disable Slice handler for testing * Implement Slice int32 * Only push transposes leading up to other transposes * Improve optimization heuristic * Add exemption for MaxPool * Document transpose optimizer api.h * Revert fusion tests to master * Remove temp files * Replace typedef with using * Trim trailing whitespace * Move class declarations from api_impl.h to api_impl.cc * Remove copy constructors and move allocator * Alphabetize headers * Add override keyword * Comments for nhwc_transformer * Rename OrtGraph to ApiGraph, etc. * Wrap line * Remove extra qualifier on ApiGraph * Refector attention fusion * Remove c-style casts from api_impl.cc * Improve documentation * Avoid printing vector in ORT_ENSURES * Revert attention fusion refactor * Remove duplicate cost heuristics and improve documentation * Fix size_t casts * Fixes from Scott's review * Unrevert attention refactor and more updates from Scott's review * Revert api_impl.cc ValueInfo change * only optimize first transpose input * Unrevert api_impl.cc changes * Make vector call reserve * transpose_optimizer.cc update from Scott's comments * Rename api::Graph to api::GraphRef etc. * Consider domains 'onnx.ai' and '' equal * Replace AddInput with SetInput * Improve tests * quantization and heuristic tests * Comments for tests * Replace const string_view with string_view and update tests * Fixes requested by Edward * Fix std::string to string_view conversion * Add <string> to includes * Fix bug for broadcasting ops with unknown rank. Slight safety improvements * Changes requested by Edward * Fix formatting * Improve description of cost metric	2021-10-27 22:10:39 -07:00
Ginés Hidalgo	9639eded4b	Missing `#pragma once` in dml_provider_factory.h (#9457 )	2021-10-27 02:49:52 -07:00
Ginés Hidalgo	a79d375d24	Added fixes for Clang on Win64	2021-10-22 16:59:09 -07:00
Changming Sun	d83adaaf9f	Remove optional-lite (#9424 )	2021-10-22 16:45:45 -07:00
Sherlock	ff23b9ff55	Avoid cudaStreamSync at the end of Forward/Backward (#9470 ) * Skip cudaStreamSynchronize at the end of fw * skip sync stream for end of backward	2021-10-21 11:28:25 -07:00
Changming Sun	406f1629c1	Remove Featurizers code (#9300 )	2021-10-20 10:20:35 -07:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877 ) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Edward Chen	79e736ed25	Make onnxruntime::Status nodiscard (#9279 ) Mark onnxruntime::Status class with [[nodiscard]] attribute. Fix existing warnings.	2021-10-08 17:10:31 -07:00
Guoyu Wang	60bbdf1403	Remove unused NodeArgs in Graph::Resolve (#9213 ) * Remove unused NodeArgs * Handle case where a node arg from an initializer from initializer_names_to_preserve * Fix CI failure * update test * Fix outer scope node args failure * Use NodeArg* as the key of the std::set instead of string * Minor updates	2021-10-01 11:44:26 -07:00
RandySheriffH	058108bef9	Execution Provider Profiler (#8406 ) * implement cuda provider * define profiler common * call start after register * add memcpy event * add cuda correlation * format code * add cupti to test path * switch to CUpti_ActivityKernel3 * reset cupti path * fix test case * fix trt pipeline * add namespace * format code * exclude training from testing * remove mutex	2021-09-28 13:59:52 -07:00
Hariharan Seshadri	f7dedc9002	Fix default initialization value in C API header (#9126 ) * fix default initialization value in C API header * Fix conflicts * Nits	2021-09-20 20:58:13 -07:00
Ryan Hill	6ae5f7a244	C API Docs - Add build instructions (#9106 ) * Update Doxyfile, add build instructions to header * Update paths in README.md	2021-09-17 18:40:27 -07:00
Ryan Hill	b876e5675b	C API Enum Name Fixes (#9092 )	2021-09-17 15:11:26 -07:00
Ryan Hill	280e79463a	FIll in more documentation (#9088 ) Fix plural values with %s Fix more symbol links Add custom header for web metrics	2021-09-16 17:08:27 -07:00
Ryan Hill	26509465f0	Add default C++ initialization to OrtCUDAProviderOptions (#9064 ) * Add default C++ initialization to OrtCUDAProviderOptions	2021-09-16 15:03:58 -07:00
Guoyu Wang	bee5c26580	Add CPU_ONLY runtime option to NNAPI EP (#9066 ) * Add NNAPI cpu only option * update java * Update comments	2021-09-15 15:50:18 -07:00
Edward Chen	e574be4a53	[C API Docs] Add docs for run options tag/log level accessors/modifiers. (#9045 ) Add documentation for these C API functions: RunOptionsGetRunLogSeverityLevel RunOptionsGetRunLogVerbosityLevel RunOptionsGetRunTag RunOptionsSetRunLogSeverityLevel RunOptionsSetRunLogVerbosityLevel RunOptionsSetRunTag Update some existing documentation.	2021-09-14 08:53:35 -07:00
satyajandhyala	ce7b12bf5d	Added new fp16 allow/safe opcodes in PropagateCastOps (#8964 ) * Removed RemoveInputOutputUpDownCasts strategy in PropagatCastOps. * Added Expand, Squeeze and Unsqueeze ops to fp16 allow ops * Added onnx models for squeeze/unsqueeze tests.	2021-09-10 11:53:26 -07:00
Ryan Hill	2439ced3ec	API Documentation (#8948 ) * Make help information compile properly	2021-09-09 22:04:51 -07:00
Ashwini Khade	ec63d10303	add model local function support (#8540 ) * updates for picking pnnx commit * add tests filter to c# tests * plus test fixes * fix versioning for contrib ops * fix tests * test filter for optional ops * more versioning related updates * fix test * fix layernorm spec * more updates * update docs * add more test filters * more filters * update binary size threshold * update docs * draft - enable model local function * enable model local functions in ORT * update to latest rel onnx commit * plus tests * plus more updates * plus updates * test updates * Fix for nested functions + shape inference * plus bug fix and updates per review * plus fixes per review * plus test updates * plus updates per review * plus fixes * fix a test	2021-09-08 11:47:01 -07:00
Vincent Wang	c343f7cb43	Add Algorithm Search for ConvGrad (#8613 ) * algo search for conv grad * global cache, bigger workspace size * fix build error * refactor * refactor * resolve comments * fix rocm * change lock places * rename variable * remove setting for inference * resolve comments	2021-09-03 11:25:17 +08:00
Hariharan Seshadri	acd9db7fad	Fix location planning for initializers used only in nested subgraphs (#8642 )	2021-09-01 00:02:08 -07:00
Tang, Cheng	4dc0ddf606	support register external ep lib information (#8897 ) * support register external ep lib inforation; make eager mode share the same ep pools with training workloads * fix inference code * fix build break * fix the message	2021-08-31 20:51:22 -07:00
Tang, Cheng	ae7f2d824d	Share the execution provider instance for training (#8719 ) * seperate the training python module; share the execution proivder instance * fix build break * fix cuda test crash; reorg the python module code base * se correct env * use provider customized hash func * fixbuild break * fix rocm break * use const ref in argument * rename the file * move hash func to trainiing module	2021-08-27 16:23:35 -07:00
Scott McKay	0034ad72e6	Minimize changes to fix missing symbols used from C# (#8867 ) * Revert "Cleanup C# bindings to add EP (#8810)" This reverts commit `b21ea00020`. * Add back in a minimal set of changes. Provide stubs in for a limited set of things - things called from C# using a static lib of ORT built for mac/ios - things in OrtApis that are not included in the build by default - things in OrtApis that are excluded in a minimal build * Cleanup order or EPs in test * Fix unused function in ROCM build	2021-08-28 07:10:14 +10:00
Edward Chen	7e53a1df6f	Enable selector action transformer infrastructure in minimal build. (#8804 )	2021-08-27 17:16:05 +10:00
Rachel Guo	1886f1a737	Make SparseTensor infrastructure optional (#8802 ) Add cmake parameter and #ifdefs to allow for disabling sparse tensor support. This comes with a significant binary size cost so we want to be able to exclude it in a minimal build.	2021-08-27 17:12:26 +10:00
Scott McKay	b21ea00020	Cleanup C# bindings to add EP (#8810 ) Fix C# add EP bindings. Add stubs to ORT so that if EP is not included in the build we return a graceful error message. Move declaration of stubs into C API and out for EP so they're in one place and are easier to use (no extra header required in the C/C++ world and consistent with the CUDA EP setup). Fix inconsistency in ROCM EP. Cleanup a few other things.	2021-08-26 13:59:40 +10:00
Hariharan Seshadri	cee79526fd	Add opset 15 kernels for Pow, BatchNorm, and Shape (#8442 )	2021-08-25 12:04:20 -07:00
Changming Sun	4bfff45859	Downgrade Eigen (#8817 )	2021-08-23 18:06:23 -07:00
Dmitri Smirnov	8713d76dd1	Introduce C and C++ APIs for Sparse Tensors (#8621 ) Add IsSparseTensor Add CreateSparseTensor Add utilities and test fully sparse instantiation Fully sparse blocksparse Add test and docs for fully sparse tensor instantiation Rework creation API Use API Non string API Retrofit of existing String API Add tests Add documentation Address build issues (Winml pending) Add inference test Bump binary size Add ifdef DISABLE CONTRIB	2021-08-16 16:33:47 -07:00
Changming Sun	436ac6dd5f	Rename ml_value.h to ort_value.h (#8726 )	2021-08-13 07:04:56 -07:00
Dmitri Smirnov	1a8adb96fe	Reduce templatization of C API and refactor for InitOrtValue (#8700 ) Refactor for OrtInit Simplify C API Add ort_provider bridge interfaces	2021-08-12 16:51:18 -07:00
Edward Chen	89601ee6b3	[EP Partitioning Utils] Add check for assigned node. (#8473 ) Adds a check that a node is not already assigned to an EP before adding it to an EP partition.	2021-08-12 16:08:25 -07:00
Hariharan Seshadri	e791faeca5	Fix bug in CPU force fallback logic (#8597 )	2021-08-05 21:36:28 -07:00
Tim Harris	56441dcd88	Limit work items to available threads, upgrade checks from assert to ORT_ENFORCE (#8495 )	2021-07-27 19:25:12 -07:00
Guoyu Wang	4c939e1cb7	Add an option to use the input model bytes (ORT format only) directly without copy at session creation (#8502 ) * Do not copy the model_data when session is started by CreateSessionFromArray * Add config option for disabling copy model bytes * Add one additional test * Address CR comments	2021-07-27 09:11:42 -07:00
Vincent Wang	619a8782a5	Improve AddValueInfo (#8451 ) * change AddValueInfo * fix after merge master	2021-07-23 16:39:55 +08:00
Dmitri Smirnov	950fe5e28b	Implement SparseTensor and infrastructure suppport and advance ONNX commit (#8038 ) SparseTensor support Implement Builder pattern Fix support for 1-D and 2-D COO indices Implement and test CSR support. Handle shape inference for SparseTensors Implement conversion for COO, CSR and tests. Address the case where constant sparse initializer is the output. Implement test infra for SparseTensors Implement SparseDenseMatMul for Csr and COO and tested it. Add hash for SparseToDenseMatMul Finish shared provider refactor Refactor GetOrCreate to Create Working on py interface Expose OrtDevice and use it in allocate_numpy Adjust Sparse interfaces, add support for string SparseTensor. Add tests. Add and test to_cuda() Add accessors to format specific indices Test values and indices views, read-only flag, after GC access Add sparse related methods to OrtValue Re-work SparseTensor wrapper, add OrtValue methods Rework numpy_array_to_cuda/to_cpu Add run_with_ort_values Add models and test sparse_mat_mul with run_with_ort_values Refactor sparse tensor to use a single buffer Ifdef x86 Eigen CSR sparse matmul implementation Exclude broken test, check for string type when copying cross device Split pybind schema, regenerate docs, add exclusion Conditionally exclude schema module Update docs fix cuda build Add test to a filter and renerate JS docs Add conversion and test string support for sparse tensors Exclude conversion utils from minimal build Add CUDA Memcpy and adjust provider interfaces	2021-07-22 15:24:36 -07:00
Hariharan Seshadri	3360024a0b	Support plugging in custom user-defined allocators for sharing between sessions (#8059 )	2021-07-22 10:17:35 -07:00
Edward Chen	989491c333	[NNAPI EP] Make partitioning stop ops configurable. (#8444 ) Enable NNAPI EP partitioning stop ops to be overridden by a session configuration option.	2021-07-22 09:21:42 -07:00
Edward Chen	695536a7ac	Make some common macros safer to use. (#8445 )	2021-07-21 12:14:36 -07:00
Ryan Hill	cc9f793b48	Move one function from cuda_provider_factory.h (#8407 )	2021-07-19 17:55:59 -07:00
satyajandhyala	84bc20fe9d	Enable cast propagation with level one by default. (#8286 )	2021-07-08 14:38:09 -07:00
RandySheriffH	f40df30219	Replace functions with secured version for OSX compliance (#7586 ) * replace strlen with strnlen * replace vsnprintf with vsnprintf_l * add macro * switch to std numeric::limits * apply uint16 max * fix build err * fix mac build * define MAX_STR_LEN * define MAX_STR_LEN * fix typo * trim empty lines * apply constexpr * fix typo * add namespace * fix build err * rename global constant Co-authored-by: Randy <Randy@randysmac.attlocal.net> Co-authored-by: Randy Shuai <rashuai@microsoft.com> Co-authored-by: Randy <Randy@randysmac.local>	2021-07-08 11:02:36 -07:00
Zuwei Zhao	b46310b349	Integrate onnxruntime-extensions into onnxruntime. (#8143 ) Co-authored-by: Zuwei Zhao <zuzhao@microsoft.com>	2021-07-01 09:34:03 -07:00
Scott McKay	4993680e56	Graph::GetNodeProvidesGraphOutput -> NodeProducesGraphOutput (#8243 ) 'GetNode' is a little confusing as it returns a bool. Update a couple more places where GetNodeOutputsInGraphOutputs was being used unnecessarily.	2021-06-30 20:43:33 +10:00
Scott McKay	b3479367cf	Add helper to check if node provides a graph output. (#8186 ) * Add helper to check if node provides a graph output. The current approach unnecessarily creates a vector when most of the optimizers only care about a true/false response. * Undo accidental change * Fix a couple of issues due to copying from larger set of changes.	2021-06-30 12:15:42 +10:00
Changming Sun	c716b56f26	Update C++ Standard from 14 to 17 (#8041 ) Switched the code to C++17. To build ONNX Runtime on old distros like CentOS 7, you need to install a newer GCC from additionary repos. If you build onnxruntime with the newer GCC, typically the result binary can't be distributed to other places because it depends on the new GCC's runtime libraries, something that the stock OS doesn't have. But on RHEL/CentOS, it can be better. We use Red Hat devtoolset 8/9/10 with CentOS7 building our code. The new library features(like std::filesystem) that not exists in the old C++ runtime will be statically linked into the applications with some restrictions: 1. GCC has dual ABI, but we can only use the old one. It means std::string is still copy-on-write and std::list::size() is still O(n). Also, if you build onnxruntime on CentOS 7 and link it with some binaries that were built on CentOS 8 or Ubuntu with the new ABI and export C++ symbols directly(instead of using a C API), the it won't work. 2. We still can't use std::optional. It is a limitation coming from macOS. We will solve it when we got macOS 11 build machines. It won't be too long. 3. Please avoid to use C++17 in CUDA files(.cu). Also, the .h files that they include(like core/framework/float16.h). This is Because CUDA 10.2 doesn't support C++17. You are welcome to use the new features in any *.cc files.	2021-06-25 14:08:01 -07:00
Chi Lo	91075255a7	Enable TRT provider option configuration for C# (updated version) (#7808 ) * prepare for C# to configure provider options * add c# code * revert modification * Add update provider info configuration in trt ep side * fix bugs * fix bug for compiler error C2259 * Add c# test * fix bug * fix bug * Properly deal with string * Add c# api for accepting trt provider options * fix bug * Modify C# test * add shared lib test * Add get provider options functionality * clean up * clean up * fix bug * fix bugs for CI * Fix bugs for CI and documentation * Move TRT EP provider options related functions out of C API * revert * fix bug * refactor * add check for provider options string * code refactor * fix CI bug * Fix CI bugs * clean up * fix bug * Fix bug for Post Analysis * fix accidental bug * Add API_IMPL_BEGIN/API_IMPL_END * clean up * code refactor * code refactor * fix CI fail * fix bug * use string append * Change the code to better handle strncpy and string append	2021-06-25 03:21:22 -07:00
Negin Raoof	80b7b134bf	Adding optional ops in contrib ops (#7946 ) * Added optional const spec	2021-06-24 13:16:31 -07:00
Guoyu Wang	f6292d9b38	[Android] Output error message to android log instead of stderr (#8114 ) * Output error message to android log instead of stderr * Address CR comments, move macro to a helper function * Address CR comments * Fix ort minimal build break	2021-06-22 17:50:06 -07:00
Tang, Cheng	059d705988	support pass in custom op registry for eager mode (#8087 ) * support pass in custom op registry for eager mode * fix the comments	2021-06-20 13:38:09 -07:00
Nat Kershaw (MSFT)	0237225117	Add @file annotation to support doxygen generation of C API docs (#7458 )	2021-06-10 16:10:32 -07:00
Edward Chen	ab973dce33	[Objective-C API] Enable CoreML EP (#7914 ) Enable CoreML EP in Objective-C API.	2021-06-03 18:59:10 -07:00

... 3 4 5 6 7 ...

957 commits