onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-12 17:57:38 +00:00

Author	SHA1	Message	Date
Sherlock	e71668f92c	Expose recompute configs to the frontend (#5318 ) * Expose recompute configs to the frontend * Add frontend test * Ensure recompute graph transformer is only applied once Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-02 09:49:47 -07:00
Guoyu Wang	3a3f26f38e	Move ort flatbuffers helper functions and value info r/w functions into separated lib (#5276 ) * Move fbs include from header to cc * add initial cmake for flatbuffers * Move most flatbuffers util to ort_flatbuffers * move code around * fix * move test/perf runner to use flatbuffer directly instead of model * minor update * Fix build break * Clean up includes and foward decl * Fix traning CI build breaks * Addressed PR comment, replaced some include with forward decls * Remove ORT_MUST_USE_RESULT temporarily	2020-09-25 05:36:29 -07:00
Sherlock	b03fb82ab7	Transformer layer-wise Recompute (#4526 ) * Build Recomputation Graph * Make topological sort to run FW nodes first * Pattern match start and end of transformer layer * Topological sort with Priority * Add logger to Gradient Graph Builder * Use Logger * Introduce Execution Order	2020-09-24 19:56:32 -07:00
Josh Bradley	4ed31ca214	Combine custom logger global threadpools (#4857 ) * add custom logger and global threadpools to C and C++ API * code cleanup and formatting * reformat code * tidy up some more code formatting * remove comment * fix API break from merging from master * renamed API function to CreateEnvWithCustomLoggerAndGlobalThreadPools * rename log variable and apply clang-format	2020-09-24 00:50:26 -07:00
Sherlock	038192bdb2	Place shape related compute nodes in CPU (#4940 ) * Place shape related nodes in CPU * visit candidates by topological order * Make CPU node placement a utility function * skip placing on CPU if the data typs is float16 or bfloat16	2020-09-21 17:10:39 -07:00
Pranav Sharma	974b9bfc09	Allow sharing of initializers between sessions. (#5092 ) * Allow sharing of initializers between sessions. * Allow sharing of initializers between sessions (2). * Add test for C# * Add test for C#; address PR comments * Address PR comments Moved AddInitializer logic to internal session options Added tests for owned buffer Clarified documentation Fix bug where memory info and not device was getting compared * Fix test * Fix training build * Add ver 5 end marker and ver 6 starter, add scenario and usage examples.	2020-09-21 14:09:37 -07:00
Pranav Sharma	d535894297	Add API to allow configuration of the global thread pools. (#5199)	2020-09-17 09:19:18 -07:00
Dmitri Smirnov	e6f85f338e	Refactor TensorAt, prepare for release (#5180 ) * Refactor TensorAt locations* must be const and int64_t since our dims are int64_t Remove unnecessary copy of locations. Remove unnecesary casting and C-casting. Simplify implementation. Add a check for string type. Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it covers more ground and eliminate redundant copy. Eliminate inner loop, compute strides first.	2020-09-16 10:20:45 -07:00
Chun-Wei Chen	7f3aa3a163	Add GetStartTime() for profiler to get private profiling_start_time_ (#4994 ) * add GetStartTime() for profiler * add function in inference_session * remove qualified name * add the api in cxx_api.h * rename starttime to StartTimeNs, expost profiling object * rename GetProfilingStartTime * move Ortapis to the right place * move to the end * add const for session * const the right place * use const auto instead of const auto* for session * remove const for auto getstarttime * remove const for auto getstarttime add unit tests * nit: update test name and add comments	2020-09-16 00:17:04 -07:00
S. Manohar Karlapalem	f7edf0aa57	[OpenVINO-EP] Enable EP config options for VPU hardware (#5119 ) * Added config flags for VPU Fast Recompile * clean-up ifdefs * Add VPU Fast compile config option Adds an option that enables Fast compilation of models to VPU hardware specific format. * Add config option to choose specific device id for inference Inference of all subgraphs will be scheduled only on this device even if other devices of the same type are available. * Add Python API to list available device IDs * code cleanup * Add second C/C++ API with settings string parameter Adds an additional C/C++ API that allows passing multiple key-value pairs for settings as a single string. Multiple settings are delimited by '\n' while the key and value within a setting are delimited by '\|'. * Append 'Ex' to the extended C/C++ API * Use set_providers Py API to set config options. Uses Session.set_providers Python API to set EP runtime config options as key/val pairs Deprecated older module function definitions for config settings. Updates documentation. * avoid globals for py config options where possible Co-authored-by: intel <you@example.com>	2020-09-14 15:46:14 -07:00
Scott McKay	323a1ba8a4	Add option to exclude support for loading ORT format models in full build. (#5129 ) * Add ability to exclude support for loading ORT format models. Disable support for ORT format models in packages	2020-09-12 12:21:30 +10:00
Scott McKay	59ee8ffb17	Remove SparseTensor support from minimal build. (#5114 ) * Remove SparseTensor support from minimal build. Currently the only valid usage of a SparseTensor is as an attribute of a Constant node. That would have been lifted to a dense tensor initializer when loading the onnx model, so would not exist when saving the ORT format model. Due to that there can be no SparseTensors in an ORT format model. Co-authored-by: gwang <wanggy@outlook.com>	2020-09-11 17:56:54 +10:00
Ryan Hill	3207de276c	Remove IDeviceAllocator class as it doesn't extend IAllocator in any way. (#5067)	2020-09-10 00:46:35 -07:00
Guoyu Wang	433061531e	Enable onnx_test_runner for ort format (#5100 ) * Enable onnx_test_runner using ort format, for ort minimal build only Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-09-10 17:15:19 +10:00
Scott McKay	796ddeb2cb	Remove serialization of outer scope value info in ORT format model (#5077 ) * Remove serialization of outer scope node arg info in ORT format model. We don't currently need it in a minimal build as only SessionState calls Graph::IsConstantInitializer and it doesn't search outer scope. If we do need it in the future the information can be calculated at runtime (small binary size cost to do so). Motivation: ORT format model was 32% bigger for a BERT model with multiple levels of subgraph and a lot of nodes due to this. Size is about 5% larger of the original ONNX model with the change. ORT format has type/shape info for all nodes, and this model has 2000 nodes so this seems reasonable. Added example code to dump ORT format model to json. Fixed misc bug in python test script around handling float and non-float expected output.	2020-09-08 17:43:42 +10:00
Pranav Sharma	2c1410afe7	Remove usage of macros for constants in public header. (#5061 ) * Remove usage of macros for constants * Fix linkage issue	2020-09-05 01:27:20 -07:00
Vincent Wang	84de14a833	Register OpSet13 CUDA Kernels for BERT/UniLMv2 (#4856 ) * opset13 cuda kernels for BERT. * add opset13 SoftmaxCrossEntropyLoss. * opset13 size. * fix argmax/min for ut. * fix ut failure for argmax/min. * OrtMemTypeCPUInput Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-09-05 08:09:52 +08:00
Scott McKay	b5c2932ae8	Last major set of ORT format model changes (#5056 ) * Add minimal build option to build.py Group some of the build settings so binary size reduction options are all together Make some cmake variable naming more consistent Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used. Add initial doco and ONNX to ORT model conversion script Misc cleanups of minimal build breaks.	2020-09-05 07:59:01 +10:00
Du Li	6134994db9	Parallelizing elementwise kernels (#4577 ) * Parallelizing unary elementarywise ops. * Parallelizing binary elementwise ops. * Accommodating PR comments.	2020-09-04 14:45:43 -07:00
Xiang Zhang	0dad79b495	Add SetLanguageProjection C Api and use it in four projections (#5023 ) * Add SetLanguageProjection C Api and use it in four projections * static cast enum languageprojection to uint32_t * resolve comments * fix typo and line added unintentionally * revert unecessary change * reorder c# api * add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis	2020-09-04 14:26:39 -07:00
Ryan Hill	d792af776d	Remove Cuda dependency from TensorRT shared provider (#5014)	2020-09-04 11:35:02 -07:00
Scott McKay	28445c88f9	Changes to enable saving and loading an ORT format model (#4995 ) * Changes to enable saving and loading an ORT format model via the public APIs. Cleanup session.py to try and make slightly more understandable. More refactoring is needed here. Couple of bug fixes * Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info. * Address PR comments - tweak SessionOptions config to avoid double lookup - merge duplicated functionality in python binding around registering an EP with optional options Fix a couple of build issues. * Update C API to be consistent with python API - only load model in InferenceSession ctor if required - support loading ORT model in minimal build * Fix nodejs test. We get an invalid path error from LoadInterOp first now * Another attempt at fixing nodejs test. Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent. The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed. * Fix couple of build issues. * Disable test temporarily so PR can be checked in. Will fix in separate PR that adds final pieces for minimal build as the test is required there. * Give up on nodejs test and make the match simpler. Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion. * Fix call to Session.__init__ in TrainingSession. Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.	2020-09-03 09:10:48 -07:00
Tim Harris	bbb9d92a5f	Remove SchedulingParams variants of ThreadPool::TryParallelFor (#5050)	2020-09-03 09:04:31 -07:00
gwang-msft	64237d999c	Add Cmake config for onnxruntime_NO_EXCEPTIONS (#4975 ) * additional noexception setting, added compile options * more no exception changes * addressed PR comments * Fix build issue when MSVC static library is used. * Clarify comment * add fatal message for onnxruntime_NO_EXCEPTIONS enabled without onnxruntime_MINIMAL_BUILD Co-authored-by: Scott McKay <skottmckay@gmail.com>	2020-09-01 10:17:50 -07:00
Pranav Sharma	ad1701dfb1	Rename DeviceAllocatorRegistrationInfo to a more generic name; Use OrtArenaCfg for arena members; Remove unused OrtMemType; Simplify CreateAllocator interface. (#4970 ) * Rename DeviceAllocatorRegistrationInfo to a more generic name; Remove OrtMemType; Simplify CreateAllocator interface. * - fix builds - fixed mixed aggregation + constructor calls (which were coded before this PR) - changed default value of max_mem in API header - added some validation of values for for arena_extend_strategy * fix tensorrt and cuda tests	2020-09-01 09:25:32 -07:00
gwang-msft	7ca8388dc9	[ORT Mobile] file format schema and file I/O code (#4973 ) * ort mobile file format schema and [de]serializing code	2020-09-01 11:51:31 +10:00
Ashwini Khade	8679a7244e	Enable rejecting models based on onnx opset (#4912 ) * enable rejecting models based on onnx opset * enable unreleased opsets in linux and mac CI * test fixes and more updates * enable unreleased opsets in CI builds * enable released opsets in linux cis * try fix windows ci yml * yml fixes * update yml * yml updates post master merge * review comments * bug fix	2020-08-31 13:35:36 -07:00
gwang-msft	ea5732319e	Add option ORT_NO_EXCEPTIONS to disable most exception/throw in /onnxruntime/ (#4894 ) * init no exception changes * initial test * disable exceptions * more throw handling * minor update * fix linux build break * fix windows/nuphar build break * address cr comments, move #ifdef to ORT_CATCH * address cr comments, move #ifdef to ORT_CATCH * handle return statement in ORT_CATCH * linux build break fix * addressed cr comments, remove ort_catch_end * addressed cr comments, remove ort_catch_end * move mlas to a separated ifdef flag * merge master, move some new code in master to no_exc Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-08-28 23:03:51 -07:00
Scott McKay	08eb15068c	Exclude the Map types from the build if ML ops are disabled. (#4908 ) * Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.	2020-08-27 17:48:12 +10:00
Dudeldu	3d63d8d4f1	Extend C++ API for Map/Sequence Type Info (#3517 ) (#4781 ) * Extend C++ API for Map/Sequence Type Info (#3517) Expose functionality to view type information about sequences/maps to C++ API. - Add functions - `TypeInfo::GetSequenceTypeInfo` - `SequenceTypeInfo::GetSequenceElementType` - `TypeInfo::GetMapTypeInfo` - `MapTypeInfo::GetMapValueType` - `MapTypeInfo::GetMapKeyType` - Add structs - `SequenceTypeInfo` - `MapTypeInfo` Co-authored-by: Dudeldu <mustermann.informatik@gmail.com> Co-authored-by: Jonas-Heinrich <Jonas@JonasHeinrich.com> * Extend tests to cover new type info functionality for sequences and maps - two new test case in test_nontensor_types for maps and sequences Co-authored-by: Jonas-Heinrich <Jonas@JonasHeinrich.com>	2020-08-25 12:03:23 -07:00
Scott McKay	14c691030f	Fix build break from removing custom ORT onnx protobuf (#4904 ) Exclude parsing of json config in model (also excludes json parsing library)	2020-08-25 18:10:42 +10:00
Changming Sun	26546f81fe	Remove the private ONNX protobuf definition file (#4878)	2020-08-24 12:40:33 -07:00
Scott McKay	728e886bba	Add kernel def hash logic for minimal build (#4891 ) * Add hash based lookup of kernels	2020-08-23 14:39:07 +10:00
Scott McKay	db7669b225	Reduce ONNX dependency in minimal build (#4890 ) * Next round of changes. Remove inclusion of ONNX schema header Exclude custom registry related things Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.	2020-08-23 07:02:13 +10:00
Pranav Sharma	29dcfb24ab	Allow multiple sessions to share an allocator, optimize constant folding memory usage, expose arena configs. (#4813 ) * Add support for sharing allocators * Incremental update * Address some PR comments, add unit tests, add documentation. * Address PR comments, add tests and some documentation. * Fix build and test issues * Remove RegisterAllocator API restoring the OrtAllocator interface changes. Changed docs to reflect this. Also fixed the orttraining segfault. The segfault was because in the case of training session, the CPU exec prov is not available at the time the transformers are applied. Changed it to create a new one.	2020-08-22 10:03:17 -07:00
RandySheriffH	3fa73a5b6a	ReduceBinarySize (#4747 ) * cancel night build on pyop * add rewriter to rewrite cpu provider * skip BuildKernelCreateInfo<void> * refactor variable name and comment * include ops from csv file * process multiple eps * add default function to cuda provider * rename function and add license header * fix import * add doc * fix typo * deal with empty kernel entry in cuda * rename the rewriter file * add comment into provider file * add comment and rename function * log warnings * refactor extracting logic * add entry for script to run solo * add better example * avoid onnx importing * fix flake8 alerts * minor fixes to better comments and doc * add entries for all domains * add void entry into contrib providers * format cuda_contrib_kernels.cc * format cpu_contrib_kernels.cc * add all providers * add default entry to all providers * include op_kernel header * cancelling change in providers beyond cpu/cuda * rename file and switch file format to domain;opset;op1,op2... * update doc * restore non-regular ending grammar in cuda_contrib_kernels.cc * add ort_root as input argument of script * enable test in ci * update doc * update doc * revert change on linux gnu ci * switch to set to host ops * simplify trimming logic * add domain map to track current model * allow ort_root to take relative path	2020-08-21 19:50:13 -07:00
Scott McKay	e00ad83f2b	Initial changes to disable code in a minimal build (#4872 ) * Initial set of changes to start disabling code in the minimal build. Breaking changes into multiple PRs so they're more easily reviewed. Focus on InferenceSession, Model and Graph here. SessionState will be next. Needs to be integrated with de/serialization code before being testable so changes are all off by default. Changes are limited to - #ifdef'ing out code - moving some things around so there are fewer #ifdef statements - moving definition of some one-line methods into the header so we don't need to #ifdef out in a .cc as well - exclude some things in the cmake setup * Update session state and a few other places. The core code builds if ORT_MINIMAL_BUILD is specified.	2020-08-22 07:14:53 +10:00
Scott McKay	ef19916d07	Add Node::SinceVersion() (#4874 ) * Add Node::SinceVersion() so that the value is known when loading a graph from the ORT format (OpSchema is not available). * Fix build warning from returning 'const int'	2020-08-21 16:48:52 +10:00
gwang-msft	fff0b41fcb	Nuget build break fix (#4854 ) * rename new header file to fix build break * update code to use the new header file name	2020-08-19 13:51:33 +10:00
Vincent Wang	5eaac31faa	support opset13 on transformers. (#4837 ) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-08-19 11:13:37 +08:00
gwang-msft	dee7596724	Add a generic collection of session configurations to the SessionOptions (#4718 ) * adding generic configurations for session options * fix a build break on linux * fix training ci build break * fix training ci build break * addressed CR comments * fix traning ci build break * move config_key from enum to string * add c# api * add python api * fix build break * move prepacking from 2 new api entries to session options configs * fix traning ci build break * add python test, update some comments, move const key definition to avoid build break * addressed comments * move definitions of keys to common.h * move api to version 5 * remove accidental change in build.py * remove pragma to avoid build break * addressed CR comments * fix the python build break, and move location of config keys definition * small typo changes	2020-08-18 13:40:40 -07:00
Hariharan Seshadri	ea3b4e1f8d	Fix bug in DispatchOnTensorType macro (#4808)	2020-08-17 01:16:01 -07:00
Bogdan Bugaev	8ba6b6a21e	Support usage of C API with C++ standards older than C++11 (#4257 ) * Use throw() in C API if noexcept is not supported	2020-08-15 11:39:28 -07:00
Maxim Kalinin	ec36c793e8	Eliminate redundant subexpressions (#3047 ) * Eliminate redundant subexpressions Apply local value numbering to merge graph nodes that will always evaluate to the same value. * Rename cpp->cc * Handle optional arguments * Add test models * Add more tests with optional arguments * Fix processing of subgraphs Also, be resilient to possible mixture of optional and variadic parameters * Fix random operators * Address PR comments * Minor changes and a test * Move CSE before constant folding * Random* operators are always non-deterministic Even when seed is provided. * Fix a CSE test * Reuse the list of non-deterministic operators with constant folding pass * Address PR comments * Fix formatting * Address PR comment * Minor cleanup / comments * Fix build failure in Linux * Reuse existing optimizer/utils file. Also, check for graph outputs when removing a node. * Add a test * Fix compiler warnings * Fix build in older compilers * More compatibility with old STL versions	2020-08-14 01:13:05 -07:00
Scott McKay	8fb743f767	Refactor Cast to reduce binary size. (#4765 ) * Refactor Cast to reduce binary size. 82.5 -> 60.8KB on Windows * Address PR comments. Fix build issue.	2020-08-13 20:43:22 +10:00
Tim Harris	9cec98ec1b	Honor allow_spinning at barrier at end of parallel sections (#4767 ) This commit means that when the thread pool is configured to spin, then we spin at the barrier at the end of parallel sections in the main thread, in addition to having workers spin waiting for work. The change updates Barrier.h to take an additional boolean to select spin/block, and passes this in based on the thread pool configuration. It adds an additional test case for barriers, although no problems were identified by the test case.	2020-08-13 09:40:40 +01:00
Josh Bradley	b7254551f0	Add new api function At() (#4457 ) * add modern standards to function arguments * add first version of At for better tensor element access	2020-08-11 18:34:03 -07:00
Ryan Hill	ac725b53f6	Convert TensorRT provider into a shared library (#4721 ) Lots of changes to shared library interfaces, new lighter weight design.	2020-08-10 21:17:16 -07:00
Dmitri Smirnov	3530ce541c	Expose IOBinding features via C/C++/C# language bindings. (#4646 ) Expose I/O Binding in C/C++/C# Expose OrtAllocator, OrtMemoryAllocation, OrtMemoryInfo and OrtIoBinding	2020-08-10 13:33:49 -07:00
Yufeng Li	b22091dc91	Add the framework to support prepack (#4413 ) * add support of prepack * add support for QAttention and DynamicQuantizeMatMul * add an use_prepacking option * add use_prepacking in c_sharp api	2020-08-07 09:39:19 -07:00
Sherlock	eb0f57f0e4	Localized Recompute for Gelu and AttentionDropout (#4402 ) * Gelu Activation Recompute Draft * Prototype for localized recompute * Introduce localized_recompute rewriter * Command line args for enabling recompute * Add logger to Gradient Graph Builder * use const when possible	2020-08-04 21:48:15 -07:00
edgchen1	9d7284fc3b	Enable MatMul + Scale fusion (#4669 ) Update TransposeMatMul to support scaling of the matrix product by a constant scalar value (analogous to the GEMM alpha parameter). Rename TransposeMatMul to TransposeScaleMatMul. Fuse MatMul with surrounding Mul/Div with constant scalar into TransposeScaleMatMul.	2020-08-04 16:27:22 -07:00
Tim Harris	4bd9e8d05c	Stress-test and fix thread pool when work queues are full (#4690 ) While investigating an unrelated issue, I noticed that the thread pool may drop tasks when a burst of 1024+ tasks is submitted by a thread from inside the pool. Today, in general, we execute work synchronously in this case. However, there is a bug where work submitted by a thread already inside the pool will be discarded instead of executed. Currently the only scenario where I can see this occurring is when the parallel executor is used with a model in which such a large number of nodes become eligible to run all at once. This PR fixes the underlying issue and adds a test case for burst-submission of work.	2020-08-04 10:19:49 +01:00
Wei-Sheng Chin	e9d20e9dba	Revise Send and Recv (#4547 ) * Add ability to retrieve inferred shapes when executing a kernel. This ability helps Recv to know its output shapes without doing actual cummunication. Of course, if the output shapes cannot be inferred, Recv still needs to do communication to get shapes from Send. * Avoid communicating shape information when it can be inferred statically * Replace unordered_map with thread-safe wrapper. We don't want to have racing condition and undefined behavior when using parallel executor.y * Remove cout * Add missing file * Address comments * Check dim_value. -1 means missing * lock properly * Address comments (remove thread-safe map) * Remove poc header * Replace Stream with DeferredReleaseCPUPtr	2020-07-30 23:02:45 -07:00
Xiang Zhang	d73e01e5b9	remove ENABLE_TELEMETRY macro (#4633)	2020-07-27 20:06:11 -07:00
Alisha Sonawalla	1e67fff93c	Add GetStringTensorElement, GetStringTensorElementLength and FillStringTensorElement API (#4374 ) Add new string tensor APIs and unit tests	2020-07-24 21:35:46 -07:00
Chi Lo	affdeb53c2	Add Python API for specifying device options. (#4205 ) * Add python API for specifying CUDA device id * Modification for providing session based python api for specifying device id * When include header file pybind11/stl.h, conversion between c++ containers and Python list, vector and dict data structure are automatically enabled. https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html# Therefore, refactor the code for better leverage this advantage. * Make struct CudaDeviceOptions as default cuda device options * Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts) But still stay consistent with existing sess.set_providers(list_of_provider) * Add cuda provider option default setting * Add support for setting cuda cuda_mem_limit and arena_extend_strategy. Also resolved the merge conflict on session.py * Use python ctypes to call cuda library to help python unittest * Refine the code with reviewer's suggestions * Add the capability of getting execution provider's configuration - Once we introduced the capability to set execution provider's configuration, it makes sense to add capability of getting ep's configuration. * Modify the code with reviewer's suggestions. * Using stoull() and stoul() depends on 32/64-bits architecture. * Rewrite the testcases for testing setting CUDA device id Note: We need to make sure every ORT process be run on one CUDA device at a time. * Make sure old session object is destroyed by python gc before new session object is being created * Move testcases to original onnxruntime_test_python.py * Fix bugs to pass CI build * Make it pass CI build (cont.) * Make it pass CI build (cont.)	2020-07-21 07:28:13 -07:00
Tracy Sharpe	08235e1662	add Output() overloads (#4546)	2020-07-19 15:21:12 -07:00
Yulong Wang	0229a6a929	[C++ API] add SessionOptions::SetLogSeverityLevel() (#4545)	2020-07-17 21:14:41 -07:00
Yulong Wang	fdc5c308c4	introduce macro ORT_API_MANUAL_INIT in C++ API (#4536 ) * introduce macro ORT_API_MANUAL_INIT in C++ API * resolve comments	2020-07-17 13:23:30 -07:00
Tiago Koji Castro Shibata	2189c77e5b	static_typename (#4520 ) * Use static_typename * Disable RTTI outside of Release * Fix unused var * Add test types * PR feedback	2020-07-16 16:31:02 -07:00
Xueyun Zhu	7d96960ec8	support pipeline partition with shared initializer (#4321 ) * support bert partition with shared initializer * address feedback * address feedback * address feedback * add more test * remove bert-tiny model * address feedback * address function comment * move CreateNodeArg to graph_utils * rename function name * rename function name * fix windows build * fix windows type conversion warning * add function comment	2020-07-14 17:21:40 -07:00
Tim Harris	a95ae164f7	Create N-1 threads in intra-op pool, given main thread now active (#4493 ) Create N-1 threads in a thread pool when configured with intra-op parallelism of N. This ensures we have N active threads, given that the main thread also runs work. To avoid ambiguity on the value returned, rename ThreadPool::NumThreads method to ThreadPool::DegreeOfParallelism, and make corresponding updates in MLAS and operators.	2020-07-14 09:48:50 +01:00
edgchen1	6c7da5e9d3	Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418 ) For the special case where all variadic inputs of a kernel are the same shape (i.e. no broadcasting is required) and there are few enough of them, we perform the entire computation in a single kernel. The general implementation (which was previously used for this special case) handles broadcasting by repeatedly invoking a binary kernel on successive inputs.	2020-07-10 10:20:23 -07:00
Josh Bradley	ca5af9d622	Add modern C++ standards for Ort::Value (#4367 ) * add modern standards to function arguments * code cleanup * fix code formatting * add element access convenience function * change template type name to match rest of code * remove new At() convenience function * add better documentation message	2020-07-09 00:35:41 -07:00
Tixxx	b156ae4448	Support training_mode flag in eval (#4324 ) * add training_mode feed for evaluation to support opset12	2020-07-08 10:38:54 -07:00
Ashwini Khade	ef602835b0	update getfunctionbody (#4396)	2020-07-02 09:00:37 -07:00
liqunfu	5dcb9b4858	Liqun/backprop deterministic graph (#4315 ) make gradient graph deterministic add to session option use_deterministic_compute.	2020-07-01 12:39:10 -07:00
Ashwini Khade	0404763f23	Update function body initialization for ONNX functions (#4332 ) * Update function body initialization * minor fix * changes per review comments * minor fix * format fix * add function initialization in mixed precision transformer * more updates * more fixes	2020-06-30 14:30:59 -07:00
Scott McKay	274e6b4153	Cleanup SessionState. Move allocator lookup to SessionState. (#4194 ) * Move allocators to SessionState so they're decoupled from ExecutionProviders - when looking up an allocator it's based on OrtMemoryInfo not the EP so SessionState is a more natural place for that infromation to be stored - add device based lookup - simplifies logic for copying feeds/fetches across devices Cleanup SessionState and SessionStateInitializer - provide more things to SessionState at construction time so we don't construct and instance and immediately after call a bunch of setters - simplify SessionStateInitializer - reduced down to FinalizeSessionState method	2020-06-28 14:55:42 +10:00
Josh Bradley	990b43ddf2	Add modern C++ standards to the C++ API (#4217 ) As a zero-cost wrapper around the C API, the current state of the C++ API is still pretty low-level and requires programmers to use C-style standards to interact with ONNX.	2020-06-25 22:28:00 -07:00
Tim Harris	3fc68cb150	Remove non-trivially-destructible thread-local from thread pool state, blocking ARM64 builds (#4336 ) - Move thread hint vectors from thread-local struct - Add static_assert that the per-thread state in the thread pool is trivially-destructible - Rename "thread_data" to "worker_data" (only allocated for workers in the pool, not threads calling into the pool)	2020-06-25 19:04:31 +01:00
Prabhat	151ef1c8a5	Add C++ wrapper for GetAvailableProviders() C API (#4313)	2020-06-25 13:11:55 +05:30
Tim Harris	9e3b5c62fb	Use OpenMP-like synchronization patterns in Eigen thread pool (#4236 ) Updates the thread pool implementation to make work distribution over the Eigen thread pool more closely resemble techniques used in OpenMP. In particular: (1) A thread entering a parallel loop works on the iterations itself, rather than requiring a thread switch to/from a thread in the pool, if called from outside the thread pool. (2) To support this, work items pushed to the thread pool run a loop to claim iterations from a shared counter via atomic-fetch-and-add, as opposed to having work items themselves represent individual batches of iterations. This means that any thread working on the loop can execute any batch of iterations, including having the main thread run through all of the batches itself if the loop turns out to be short-running. (3) As with OpenMP active scheduling, the worker loop spins waiting for work prior to blocking. This avoids OS blocking / wake-up paths in workloads with series of short-running parallel sections.	2020-06-22 10:04:53 +01:00
Prabhat	57fabfba7a	Added GetAvailableProviders() to C API (#4247 ) * Added GetAvailableProviders to C API * Fix API version and Windows build error * Changed function name * Changed ORT_API_VERSION to 4 * Moved all_providers array to constants.h * Move check for providers to constants.h * Changed name of array to avoid warning * Address review comment * Added unit test	2020-06-22 10:10:25 +08:00
Scott McKay	175983c082	Move memory info into IAllocator (#2850 ) - Update IAllocator setup to move the OrtMemoryInfo to the base class instead of requiring derived classes to have that as a member and override a virtual method to return it. - Cleanup CreateAllocator setup to take an argument as to whether to wrap the device allocator in an arena allocator. The choice to do that isn't a property of the underlying device allocator. - Minor cleanups in the various EPs to adjust to the change to IAllocator and CreateAllocator, and to use the create_arena flag consistently when available.	2020-06-22 11:18:52 +10:00
Wei-Sheng Chin	de9da123cf	Enable static memory planning for pipeline. (#4204 ) * Enable static memory planning for pipeline. 1. We fix a bug when resolving symbolic shape for scalars. 2. We pass the original inputs to all pipeline stages so that the symbolic shapes can be resolved. * Further Improvements 1. Address comments. 2. Further reduce activation size by ~50% when pipeline is on. This is done by removing all but one gradient tensor from the last RecordEvent in the backward pass. * Address a comment * Fix Windows build	2020-06-12 21:43:50 -07:00
Xueyun Zhu	65a682354b	enable pipeline to run with mixed precision (#4113 ) * enable pipeline to run with mixed precision * address feedback * address feedback * test log * pipe information if test fails * ci failure	2020-06-10 22:16:24 -07:00
suffiank	7f5339505e	Discover trainable parameters using reverse DFS from loss node (#4116 ) Discover trainable parameters using reverse DFS from loss node, omitting recursion along untrainable inputs. Co-authored-by: suffian khan <sukha@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: suffian khan <sukha@microsoft.com>	2020-06-08 14:16:10 -07:00
Scott McKay	9790e19424	Handle mem pattern allocation failure better. Make BFCArena behavior more consistent (#4062 ) * Fixes from investigating issue running BERT-Squad model with larger batch sizes. When the batch size gets large enough the initial run will be successful (no memory pattern in use) but the second will fail to allocate the memory pattern block. The cause of this failure is that we still have the smaller blocks from the first run allocated, as BFCArena has no logic to free those. This essentially results in 2x the memory being required to run the model. There was inconsistency in BFCArena::Extend which on one path threw an exception if it couldn't do the allocation, and on another just returned false (resulting in Alloc returning a nullptr). Make the behavior consistent by always throwing if BFCArena fails to find a buffer to return. There are a huge number of places in the code where we assume Alloc returns a valid pointer so throwing will result in more correct behavior as a whole. It's also consistent with what happens when CUDA or the standard library fails to allocate memory. Next, update ExecutionFrame to check for this failure and not insert a memory block entry if it happens. With the existing code if BFCArena Alloc returned a nullptr we happily inserted that in the blocks, delaying detection of the failure to when we attempted to use the block in AllocateMLValueTensorSelfOwnBufferHelper. Finally update AllocateMLValueTensorSelfOwnBufferHelper to expect a location may not have a block. A log message will be provided when the block allocation fails so it's not necessary to have more on each individual allocation that would have used the block. Falls through to default behavior of doing a normal allocation.	2020-06-05 18:54:01 +10:00
Andrews548	62b44527e5	Add ArmNN Execution Provider (#3714 ) * Add ArmNN Execution Provider Add a new execution provider targeting Arm architecture based on ArmNN. Validated on NXP i.MX8QM CPU with ResNet50, MobileNetv2 and VGG models. reviewed-by: mike.caraman@nxp.com * Minor fixes - renamed onnxruntime_ARMNN_RELU_USECPU to onnxruntime_ARMNN_RELU_USE_CPU - fixed acl typo * remove extra includes. added exception for ArmNN in test * fix indentation * Separated the activation implementation from the cpu and fixed the blockage from the endif Co-authored-by: Andrei-Alexandru <andrei-alexandru.avram@nxp.com>	2020-06-03 22:57:51 +05:30
Xueyun Zhu	633008b5ef	Add pipeline online partition logic for pipeline (#3996 ) * online partition * fix when multiple consumer nodes is in cut info * fix windows build * address feedback * adding test * feedback * address feedback * add parser for cut edge * windows build	2020-05-26 17:44:09 -07:00
Paul Fultz II	7759136610	Add amd migraphx execution provider to onnx runtime (#2929 ) * Add amd migraphx execution provider to onnx runtime * rename MiGraphX to MIGraphX * remove unnecessary changes in migraphx_execution_provider.cc * add migraphx EP to tests * add input requests of the batchnorm operator * add to support an onnx operator PRelu * update migraphx dockerfile and removed one unused line * sync submodules with master branch * fixed a small bug * fix various bugs to run msft real models correctly * some code cleanup * fix python file format * fixed a code style issue * add default provider for migraphx execution provider Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>	2020-05-27 04:24:59 +08:00
edelaye	64b5f7edf6	Initial release of Vitis-AI Execution Provider (#3771 ) * Initial release of Vitis-AI Execution Provider * Add documentation, fix for onnxruntime::Model changes and use stringstream instead of file dump for model passing * - Add Vitis-AI docker file - Add online quantization flow Vitis-AI execution provider - Fix remarks * - Add fatal error build message for Vitis-AI cmake build on Windows - Fix pep8 issue in build.py - Add Vitis-AI execution provider example in docs Co-authored-by: Elliott Delaye <elliott@xilinx.com> Co-authored-by: Jorn Tuyls <jornt@xilinx.com> Co-authored-by: Jorn Tuyls <jtuyls@users.noreply.github.com>	2020-05-19 05:32:32 -07:00
Vincent Wang	3c24841569	Fold Shape Node During Constant Folding (#3748 ) * Fold Shape node in constant folding. * bugfix * Fix test failure. * Bugfix for C++ frontend. * Bugfix for C++ frontend. Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-05-09 20:15:03 +08:00
Sheil Kumar	cf6a1c1715	Fix Windows Inbox build failing on 1) building raw api tests and 2) referencing _winml namespace in onnxruntime.dll (#3872 ) * add build inbox flag * remove raw tests and wstring for utf filenames * enable raw tests * use ToWideString * create new utf8 helper * update string helper to utf8 Co-authored-by: Sheil Kumar <sheilk@microsoft.com>	2020-05-08 15:59:16 -07:00
Ryan Hill	d5ec353e58	Ryanunderhill/mkldnn dll (#3314 ) First version of allowing providers to work as DLLs, only implemented for DNNL so far. More improvements to come next!	2020-05-06 00:57:09 -07:00
airockchip	edaf8a542c	Initial PR for RKNPU execution provider (#3609 ) * Initial RKNPU execution provider * Init * Support Ops: Conv, Relu, Clip, LeakyRelu, MaxPool, AveragePool, GlobalAveragePool, Concat, Softmax, BatchNormalization, Gemm, Add, Mul, Sub, Reshape, Squeeze, Unsqueeze, Flatten, Transpose, QLinearConv, DequantizeLinear * Add rknpu unittest * Update BUILD.md and Add RKNPU-ExecutionProvider.md * misc code update * fix CLIP accuracy issue. * fix "Error: Duplicate definition of name". * move rknpu_ddk out of onnxruntime submodule. * remove temporary code. * add rknpu namespace. * update misc of node_attr_helper * add const & comment for onnx_converter * add const & comment for shaper * unify variable name Co-authored-by: dkm <dkm@rock-chips.com> Co-authored-by: George Wu <jywu@microsoft.com>	2020-05-05 20:36:47 -07:00
Changming Sun	bd78364411	Parallel all the activations ops (#3722 ) 1. Parallel all the activations ops. 2. Parallel the performance critical path of the LRN op, which makes the ONNX model zoo googlenet model run 60% faster (latency reduced from 21ms to 13ms). 3. Make the Gemm-Activation fusion support all the activations ops. Before this change, it only supported LeakyRelu/Relu/Sigmoid/Tanh. 4. Delete onnxruntime/test/framework/op_kernel_test.cc because the file is almost empty. 5. Remove the logging in KernelRegistry::TryFindKernel, return Status with error message instead.	2020-05-05 01:18:17 -07:00
Scott McKay	15eca74d15	Make ThreadPool::PartitionWork a bit more user friendly. Update a few places to use PartitionWork. (#3795 )	2020-05-02 17:09:55 +10:00
Changming Sun	edd5855fb7	Remove eigen device from thread pool	2020-05-01 02:21:57 -07:00
Pranav Sharma	e42e0d4787	Update documentation + Update mlas threading lib to use the new TrySimpleParallelFor. (#3779 )	2020-05-01 00:23:06 -07:00
Scott McKay	3421ec1110	Add Threadpool::TrySimpleParallelFor (#3759 ) * Add TrySimpleParallelFor so that there's a path with OpenMP awareness for SimpleParallelFor. Makes it consistent with [Try]BatchParallelFor and [Try]ParallelFor. Update TopK to check for the number of threads better, and to use TrySimpleParallelFor. * Update doco to mention TrySimpleParallelFor	2020-04-30 20:03:33 +10:00
Tixxx	0638565fe0	Fix evaluation issues (#3538 ) * allow switching between eval and training modes dynamically Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>	2020-04-28 21:03:37 -07:00
Jeff Bloomfield	1a11ba8a7e	Merge remote-tracking branch 'upstream/master' into jeffbloo/MergeDmlDev	2020-04-28 00:45:22 -07:00
Wei-Sheng Chin	7627e6bcc2	Improve node and node argument name generation (#3649 )	2020-04-27 13:57:24 -07:00
Jeff Bloomfield	f1c19f8495	merge master	2020-04-25 19:04:58 -07:00
Ethan Tao	e9f1e7e797	resolve conflicts	2020-04-24 15:15:36 -07:00
edgchen1	4aa033b99e	Addressing review comments (#3690 ) - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414359326 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414359463 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414360023 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414361667 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414368707 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414371480 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414379362 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414374516 - https://github.com/microsoft/onnxruntime/pull/3681#discussion_r414801087	2020-04-24 14:57:18 -07:00
Tiago Koji Castro Shibata	f48b9e2ea7	Add adapter session tests (#3522 ) * Start adapter tests * Fix more adapter session CMake * Implement adapter session tests * Fix adapter test breaks * Test fixes, profiling test * Fix adapter w/ DML tests * Cleanup * Fix WinML adapter profiling test * Fix memory leaks * Remove FIXME	2020-04-24 14:39:54 -07:00

410 commits