onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-01 03:45:06 +00:00

Author	SHA1	Message	Date
Dmitri Smirnov	3433576fd3	Support for Sparse Initializers (#5540 ) Introduce sparse_initializers support. Convert them to dense on model load and prune graph_proto_ so they don't consume space. Convert back to sparse on ORT Format model save. Implement serializing sparse initializers to OrtFormat. Fix Model::ToProto() to return original sparse initializers Set a flag that graph_sync is needed when loading a simple ORT Format model. otherwise nothing is resolved. Add ORT Format history to README.md ifdef MINIMAL build for DenseToSparseTensorInitializer Allow duplicate initializers to support existing models. Issue a warning instead of aborting. * Revert "Remove SparseTensor support from minimal build. (#5114)" This reverts commit `59ee8ffb17`. Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-10-27 10:32:06 -07:00
Yufeng Li	30cdc74bc0	Enable prepacking in subgraph (#5433 ) Prepacking in subgraph is not supported currently. We see more and more models with subgraph, which has MatMul, MatMulInteger and other ops. Prepacking can speed up those models significantly.	2020-10-26 22:22:31 -07:00
Du Li	860cb22260	Bug fix for C API (#5520 ) * remove if_def from C api * Fix CI issues. * revert change for symbols.txt	2020-10-24 13:37:58 -07:00
Ryan Hill	82c7a9756e	Fix shared provider unload crash (#5553 )	2020-10-21 13:01:21 -07:00
Changming Sun	280cdf31f5	Revert "Fix shared provider unload crash (#5523 )" (#5547 ) This reverts commit `610676293e`. Because Linux DNNL pipeline is failing.	2020-10-20 08:01:28 -07:00
Ryan Hill	610676293e	Fix shared provider unload crash (#5523 ) * Change shared providers so that they are shutdown before shared library unload * Move UnloadSharedProviders declaration into a shared header to avoid bugs.	2020-10-19 18:08:38 -07:00
Sunghoon	645d978589	Sunghcho/denormals (#5391 ) * Add session option and global thread pool option to set denormal as zero. * Revert unnecessary changes. * Add cpuinfo submodule * Add more comments * Remove cpuinfo submodule dependency and check only SSE3 support for ftz and daz inspired by Tensorflow * Preserve API order in C api * Clean up and utilize SSE3 detection logic from existing cpuid_info.h * Keep the same order with header file * Fix build issue with Linux pipeline, which has old g++ compiler * Fix broken build on Linux and remove a duplicated unit test * Remove reformatting at eigen thread pool * Remove flatbuffers which is not intentionally added * Revert "Remove flatbuffers which is not intentionally added" This reverts commit 9f509a9aaaa3c7832d88854c82fd26b234770b7f. * Remove flatbuffers which is not intentionally added * Resolve comments - Put details on APIs - Add a log for ftz/daz initialization - Add clang - Fix typo * Remove unnecessary header include * Resolve comments	2020-10-15 12:47:42 -07:00
Chun-Wei Chen	2b6b3a2ee6	Add GetProfilingStartTimeNs() to Python/C# APIs (#5280 ) * add Python API for getProfilingStartTime * debug for using Python API * add in C# api * use uint instead of uint64_t to prevent warning * typo for GetProfilingStartTimeNs * remove const * Update onnxruntime/python/session.py Co-authored-by: Pranav Sharma <emailpranav@gmail.com> * remove unnecessary return * Add Python unit test * Add C# unit test and refactor Python test * use ulong in C# for uint64_t in C++ * remove time.monotonic_ns * syntax: remove public for inner function * correct the API's order * getprofilingstarttime after run * Correct the right order in NativeMethod.cs * update order * nit: remove spaces * Update csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com> * use the updated function * add comment about the precision * add more comments * add session.py back * fix flake8 * remove session.py * Add comments in C, C#, Python APIs about precision Co-authored-by: Pranav Sharma <emailpranav@gmail.com> Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>	2020-10-14 05:32:43 -07:00
Xiang Zhang	b12824fa7a	add telemetry event for nodejs binding (#5463 )	2020-10-12 22:53:01 -07:00
KeDengMS	c444b9d76a	Add CUDA option to run copy in default stream (#5445 ) * Add CUDA option to run copy in default stream This change fixes #4829. Thanks @maherzog for providing the repro! The bug is caused by memory reuse in the BFC arena, where the copy and compute streams in CUDA have a race condition. The BFC arena is an arena allocator on top of cudaMalloc/Free that reduces the cost of syncing CPU and GPU on alloc/free. This means that when the CPU allocs/frees memory, the GPU might not have finished its previous work on that memory, so the CPU and GPU can run asynchronously. This is OK if there's only one stream, where the execution order on CPU and GPU is consistent. For example, if we have two kernels A and B and the CPU runs allocA->computeA->freeA->allocB->computeB->freeB, A and B can share the same memory, since computeA and computeB will not race as long as they run in the same GPU compute stream. However, if the CPU runs allocA->copyA->freeA->allocB->computeB->freeB, copyA could execute on the GPU after computeB if copy and compute happen in different GPU streams. This change makes copy run in the default compute stream, while adding an option to fall back to the previous behavior if there's a perf hit. This is a short-term fix until the BFC arena supports multiple streams. Users may use the following options to revert to the previous behavior: C API: struct OrtCUDAProviderOptions cudaProviderOpt; cudaProviderOpt.do_copy_in_default_stream = false; C++ API: CUDAExecutionProviderInfo cudaEPInfo; cudaEPInfo.do_copy_in_default_stream = false; C# API: pending... Python: import onnxruntime onnxruntime.capi._pybind_state.set_do_copy_in_default_stream(False) * Confirmed the test fails in CI when doing copy in a separate stream. Revert the test to get CI to pass for now * Fix Windows test * Address CR	2020-10-12 22:12:05 -07:00
Scott McKay	a92ccbe1bc	Various armv7 related fixes (#5394 ) * - Link with libatomic if needed - Install pip differently so it doesn't clash with the system pip which may involve a wrapper script - Remove ability to specify offset when Tensor allocates the data. The data prior to offset isn't accessible by anything. - Fix use of offset in TensorOpTest to work on armv7 where it must be aligned to the type it points to. - Fix ActivationOpNoInfTest.Softsign to allow for armv7 behavior - Fix ReductionOpTest.ReduceMean_keepdims to allow for armv7 floating point inaccuracy Address PR comments	2020-10-09 22:34:32 +10:00
Du Li	323c4dfe02	Adding an option for cudnn conv algorithms. (#5159 ) * adding cudnn conv algorithm selection options. * adding cudnn conv algorithm selection options. * export the api * adding the perf test option. * accommodating pr comments. * Move OrtSessionOptionsAppendExecutionProvider_CUDA to onnxruntime_c_api.h * Accommodating PR comments.	2020-10-05 16:53:52 -07:00
Sherlock	e71668f92c	Expose recompute configs to the frontend (#5318 ) * Expose recompute configs to the frontend * Add frontend test * Ensure recompute graph transformer is only applied once Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-02 09:49:47 -07:00
Guoyu Wang	3a3f26f38e	Move ort flatbuffers helper functions and value info r/w functions into separated lib (#5276 ) * Move fbs include from header to cc * add initial cmake for flatbuffers * Move most flatbuffers util to ort_flatbuffers * move code around * fix * move test/perf runner to use flatbuffer directly instead of model * minor update * Fix build break * Clean up includes and forward decl * Fix training CI build breaks * Addressed PR comment, replaced some include with forward decls * Remove ORT_MUST_USE_RESULT temporarily	2020-09-25 05:36:29 -07:00
Sherlock	b03fb82ab7	Transformer layer-wise Recompute (#4526 ) * Build Recomputation Graph * Make topological sort to run FW nodes first * Pattern match start and end of transformer layer * Topological sort with Priority * Add logger to Gradient Graph Builder * Use Logger * Introduce Execution Order	2020-09-24 19:56:32 -07:00
Josh Bradley	4ed31ca214	Combine custom logger global threadpools (#4857 ) * add custom logger and global threadpools to C and C++ API * code cleanup and formatting * reformat code * tidy up some more code formatting * remove comment * fix API break from merging from master * renamed API function to CreateEnvWithCustomLoggerAndGlobalThreadPools * rename log variable and apply clang-format	2020-09-24 00:50:26 -07:00
Sherlock	038192bdb2	Place shape related compute nodes in CPU (#4940 ) * Place shape related nodes in CPU * visit candidates by topological order * Make CPU node placement a utility function * skip placing on CPU if the data type is float16 or bfloat16	2020-09-21 17:10:39 -07:00
Pranav Sharma	974b9bfc09	Allow sharing of initializers between sessions. (#5092 ) * Allow sharing of initializers between sessions. * Allow sharing of initializers between sessions (2). * Add test for C# * Add test for C#; address PR comments * Address PR comments Moved AddInitializer logic to internal session options Added tests for owned buffer Clarified documentation Fix bug where memory info and not device was getting compared * Fix test * Fix training build * Add ver 5 end marker and ver 6 starter, add scenario and usage examples.	2020-09-21 14:09:37 -07:00
Pranav Sharma	d535894297	Add API to allow configuration of the global thread pools. (#5199 )	2020-09-17 09:19:18 -07:00
Dmitri Smirnov	e6f85f338e	Refactor TensorAt, prepare for release (#5180 ) * Refactor TensorAt locations* must be const and int64_t since our dims are int64_t Remove unnecessary copy of locations. Remove unnecessary casting and C-casting. Simplify implementation. Add a check for string type. Make CXX api return T& to fully expose C API in C++, const std::vector& by value as it covers more ground and eliminate redundant copy. Eliminate inner loop, compute strides first.	2020-09-16 10:20:45 -07:00
Chun-Wei Chen	7f3aa3a163	Add GetStartTime() for profiler to get private profiling_start_time_ (#4994 ) * add GetStartTime() for profiler * add function in inference_session * remove qualified name * add the api in cxx_api.h * rename starttime to StartTimeNs, expose profiling object * rename GetProfilingStartTime * move Ortapis to the right place * move to the end * add const for session * const the right place * use const auto instead of const auto* for session * remove const for auto getstarttime * remove const for auto getstarttime add unit tests * nit: update test name and add comments	2020-09-16 00:17:04 -07:00
S. Manohar Karlapalem	f7edf0aa57	[OpenVINO-EP] Enable EP config options for VPU hardware (#5119 ) * Added config flags for VPU Fast Recompile * clean-up ifdefs * Add VPU Fast compile config option Adds an option that enables Fast compilation of models to VPU hardware specific format. * Add config option to choose specific device id for inference Inference of all subgraphs will be scheduled only on this device even if other devices of the same type are available. * Add Python API to list available device IDs * code cleanup * Add second C/C++ API with settings string parameter Adds an additional C/C++ API that allows passing multiple key-value pairs for settings as a single string. Multiple settings are delimited by '\n' while the key and value within a setting are delimited by '\|'. * Append 'Ex' to the extended C/C++ API * Use set_providers Py API to set config options. Uses Session.set_providers Python API to set EP runtime config options as key/val pairs Deprecated older module function definitions for config settings. Updates documentation. * avoid globals for py config options where possible Co-authored-by: intel <you@example.com>	2020-09-14 15:46:14 -07:00
Scott McKay	323a1ba8a4	Add option to exclude support for loading ORT format models in full build. (#5129 ) * Add ability to exclude support for loading ORT format models. Disable support for ORT format models in packages	2020-09-12 12:21:30 +10:00
Scott McKay	59ee8ffb17	Remove SparseTensor support from minimal build. (#5114 ) * Remove SparseTensor support from minimal build. Currently the only valid usage of a SparseTensor is as an attribute of a Constant node. That would have been lifted to a dense tensor initializer when loading the onnx model, so would not exist when saving the ORT format model. Due to that there can be no SparseTensors in an ORT format model. Co-authored-by: gwang <wanggy@outlook.com>	2020-09-11 17:56:54 +10:00
Ryan Hill	3207de276c	Remove IDeviceAllocator class as it doesn't extend IAllocator in any way. (#5067 )	2020-09-10 00:46:35 -07:00
Guoyu Wang	433061531e	Enable onnx_test_runner for ort format (#5100 ) * Enable onnx_test_runner using ort format, for ort minimal build only Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-09-10 17:15:19 +10:00
Scott McKay	796ddeb2cb	Remove serialization of outer scope value info in ORT format model (#5077 ) * Remove serialization of outer scope node arg info in ORT format model. We don't currently need it in a minimal build as only SessionState calls Graph::IsConstantInitializer and it doesn't search outer scope. If we do need it in the future the information can be calculated at runtime (small binary size cost to do so). Motivation: ORT format model was 32% bigger for a BERT model with multiple levels of subgraph and a lot of nodes due to this. Size is about 5% larger of the original ONNX model with the change. ORT format has type/shape info for all nodes, and this model has 2000 nodes so this seems reasonable. Added example code to dump ORT format model to json. Fixed misc bug in python test script around handling float and non-float expected output.	2020-09-08 17:43:42 +10:00
Pranav Sharma	2c1410afe7	Remove usage of macros for constants in public header. (#5061 ) * Remove usage of macros for constants * Fix linkage issue	2020-09-05 01:27:20 -07:00
Vincent Wang	84de14a833	Register OpSet13 CUDA Kernels for BERT/UniLMv2 (#4856 ) * opset13 cuda kernels for BERT. * add opset13 SoftmaxCrossEntropyLoss. * opset13 size. * fix argmax/min for ut. * fix ut failure for argmax/min. * OrtMemTypeCPUInput Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-09-05 08:09:52 +08:00
Scott McKay	b5c2932ae8	Last major set of ORT format model changes (#5056 ) * Add minimal build option to build.py Group some of the build settings so binary size reduction options are all together Make some cmake variable naming more consistent Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used. Add initial doco and ONNX to ORT model conversion script Misc cleanups of minimal build breaks.	2020-09-05 07:59:01 +10:00
Du Li	6134994db9	Parallelizing elementwise kernels (#4577 ) * Parallelizing unary elementwise ops. * Parallelizing binary elementwise ops. * Accommodating PR comments.	2020-09-04 14:45:43 -07:00
Xiang Zhang	0dad79b495	Add SetLanguageProjection C Api and use it in four projections (#5023 ) * Add SetLanguageProjection C Api and use it in four projections * static cast enum languageprojection to uint32_t * resolve comments * fix typo and line added unintentionally * revert unnecessary change * reorder c# api * add TensorAt and CreateAndRegisterAllocator in Csharp to keep the same order as C apis	2020-09-04 14:26:39 -07:00
Ryan Hill	d792af776d	Remove Cuda dependency from TensorRT shared provider (#5014 )	2020-09-04 11:35:02 -07:00
Scott McKay	28445c88f9	Changes to enable saving and loading an ORT format model (#4995 ) * Changes to enable saving and loading an ORT format model via the public APIs. Cleanup session.py to try and make slightly more understandable. More refactoring is needed here. Couple of bug fixes * Fix bug in handling NodeArg serialization for optional inputs which has a name and no type info. * Address PR comments - tweak SessionOptions config to avoid double lookup - merge duplicated functionality in python binding around registering an EP with optional options Fix a couple of build issues. * Update C API to be consistent with python API - only load model in InferenceSession ctor if required - support loading ORT model in minimal build * Fix nodejs test. We get an invalid path error from LoadInterOp first now * Another attempt at fixing nodejs test. Error message depends on whether ENABLE_LANGUAGE_INTEROP_OPS is defined. Make the output consistent. The interop implementation looks suspicious given it appears to be internal code that is going via the public api. TBD if that should be fixed. * Fix couple of build issues. * Disable test temporarily so PR can be checked in. Will fix in separate PR that adds final pieces for minimal build as the test is required there. * Give up on nodejs test and make the match simpler. Fix init call in TrainingSession python to not pass through sess. it wasn't being used in Session anyway so passing it through just adds confusion. * Fix call to Session.__init__ in TrainingSession. Session now initializes Session._sess to None to make it clearer where the 'ownership' of that member is, and that needs to happen before TrainingSession sets it.	2020-09-03 09:10:48 -07:00
Tim Harris	bbb9d92a5f	Remove SchedulingParams variants of ThreadPool::TryParallelFor (#5050 )	2020-09-03 09:04:31 -07:00
gwang-msft	64237d999c	Add Cmake config for onnxruntime_NO_EXCEPTIONS (#4975 ) * additional noexception setting, added compile options * more no exception changes * addressed PR comments * Fix build issue when MSVC static library is used. * Clarify comment * add fatal message for onnxruntime_NO_EXCEPTIONS enabled without onnxruntime_MINIMAL_BUILD Co-authored-by: Scott McKay <skottmckay@gmail.com>	2020-09-01 10:17:50 -07:00
Pranav Sharma	ad1701dfb1	Rename DeviceAllocatorRegistrationInfo to a more generic name; Use OrtArenaCfg for arena members; Remove unused OrtMemType; Simplify CreateAllocator interface. (#4970 ) * Rename DeviceAllocatorRegistrationInfo to a more generic name; Remove OrtMemType; Simplify CreateAllocator interface. * - fix builds - fixed mixed aggregation + constructor calls (which were coded before this PR) - changed default value of max_mem in API header - added some validation of values for for arena_extend_strategy * fix tensorrt and cuda tests	2020-09-01 09:25:32 -07:00
gwang-msft	7ca8388dc9	[ORT Mobile] file format schema and file I/O code (#4973 ) * ort mobile file format schema and [de]serializing code	2020-09-01 11:51:31 +10:00
Ashwini Khade	8679a7244e	Enable rejecting models based on onnx opset (#4912 ) * enable rejecting models based on onnx opset * enable unreleased opsets in linux and mac CI * test fixes and more updates * enable unreleased opsets in CI builds * enable released opsets in linux cis * try fix windows ci yml * yml fixes * update yml * yml updates post master merge * review comments * bug fix	2020-08-31 13:35:36 -07:00
gwang-msft	ea5732319e	Add option ORT_NO_EXCEPTIONS to disable most exception/throw in /onnxruntime/ (#4894 ) * init no exception changes * initial test * disable exceptions * more throw handling * minor update * fix linux build break * fix windows/nuphar build break * address cr comments, move #ifdef to ORT_CATCH * address cr comments, move #ifdef to ORT_CATCH * handle return statement in ORT_CATCH * linux build break fix * addressed cr comments, remove ort_catch_end * addressed cr comments, remove ort_catch_end * move mlas to a separated ifdef flag * merge master, move some new code in master to no_exc Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-08-28 23:03:51 -07:00
Scott McKay	08eb15068c	Exclude the Map types from the build if ML ops are disabled. (#4908 ) * Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.	2020-08-27 17:48:12 +10:00
Dudeldu	3d63d8d4f1	Extend C++ API for Map/Sequence Type Info (#3517 ) (#4781 ) * Extend C++ API for Map/Sequence Type Info (#3517) Expose functionality to view type information about sequences/maps to C++ API. - Add functions - `TypeInfo::GetSequenceTypeInfo` - `SequenceTypeInfo::GetSequenceElementType` - `TypeInfo::GetMapTypeInfo` - `MapTypeInfo::GetMapValueType` - `MapTypeInfo::GetMapKeyType` - Add structs - `SequenceTypeInfo` - `MapTypeInfo` Co-authored-by: Dudeldu <mustermann.informatik@gmail.com> Co-authored-by: Jonas-Heinrich <Jonas@JonasHeinrich.com> * Extend tests to cover new type info functionality for sequences and maps - two new test case in test_nontensor_types for maps and sequences Co-authored-by: Jonas-Heinrich <Jonas@JonasHeinrich.com>	2020-08-25 12:03:23 -07:00
Scott McKay	14c691030f	Fix build break from removing custom ORT onnx protobuf (#4904 ) Exclude parsing of json config in model (also excludes json parsing library)	2020-08-25 18:10:42 +10:00
Changming Sun	26546f81fe	Remove the private ONNX protobuf definition file (#4878 )	2020-08-24 12:40:33 -07:00
Scott McKay	728e886bba	Add kernel def hash logic for minimal build (#4891 ) * Add hash based lookup of kernels	2020-08-23 14:39:07 +10:00
Scott McKay	db7669b225	Reduce ONNX dependency in minimal build (#4890 ) * Next round of changes. Remove inclusion of ONNX schema header Exclude custom registry related things Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.	2020-08-23 07:02:13 +10:00
Pranav Sharma	29dcfb24ab	Allow multiple sessions to share an allocator, optimize constant folding memory usage, expose arena configs. (#4813 ) * Add support for sharing allocators * Incremental update * Address some PR comments, add unit tests, add documentation. * Address PR comments, add tests and some documentation. * Fix build and test issues * Remove RegisterAllocator API restoring the OrtAllocator interface changes. Changed docs to reflect this. Also fixed the orttraining segfault. The segfault was because in the case of training session, the CPU exec prov is not available at the time the transformers are applied. Changed it to create a new one.	2020-08-22 10:03:17 -07:00
RandySheriffH	3fa73a5b6a	ReduceBinarySize (#4747 ) * cancel night build on pyop * add rewriter to rewrite cpu provider * skip BuildKernelCreateInfo<void> * refactor variable name and comment * include ops from csv file * process multiple eps * add default function to cuda provider * rename function and add license header * fix import * add doc * fix typo * deal with empty kernel entry in cuda * rename the rewriter file * add comment into provider file * add comment and rename function * log warnings * refactor extracting logic * add entry for script to run solo * add better example * avoid onnx importing * fix flake8 alerts * minor fixes to better comments and doc * add entries for all domains * add void entry into contrib providers * format cuda_contrib_kernels.cc * format cpu_contrib_kernels.cc * add all providers * add default entry to all providers * include op_kernel header * cancelling change in providers beyond cpu/cuda * rename file and switch file format to domain;opset;op1,op2... * update doc * restore non-regular ending grammar in cuda_contrib_kernels.cc * add ort_root as input argument of script * enable test in ci * update doc * update doc * revert change on linux gnu ci * switch to set to host ops * simplify trimming logic * add domain map to track current model * allow ort_root to take relative path	2020-08-21 19:50:13 -07:00
Scott McKay	e00ad83f2b	Initial changes to disable code in a minimal build (#4872 ) * Initial set of changes to start disabling code in the minimal build. Breaking changes into multiple PRs so they're more easily reviewed. Focus on InferenceSession, Model and Graph here. SessionState will be next. Needs to be integrated with de/serialization code before being testable so changes are all off by default. Changes are limited to - #ifdef'ing out code - moving some things around so there are fewer #ifdef statements - moving definition of some one-line methods into the header so we don't need to #ifdef out in a .cc as well - exclude some things in the cmake setup * Update session state and a few other places. The core code builds if ORT_MINIMAL_BUILD is specified.	2020-08-22 07:14:53 +10:00
Scott McKay	ef19916d07	Add Node::SinceVersion() (#4874 ) * Add Node::SinceVersion() so that the value is known when loading a graph from the ORT format (OpSchema is not available). * Fix build warning from returning 'const int'	2020-08-21 16:48:52 +10:00


372 commits