* Fix issue: https://github.com/microsoft/onnxruntime/issues/6094
Root cause: we didn't expose the OrtMemoryInfo for TRT, which breaks IOBinding when users try to use it with the TensorRT EP.
Short-term fix: add the OrtMemoryInfo for TRT. Long term, the allocators for CUDA and TRT should be unified.
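For context, a minimal sketch of the IOBinding usage this unblocks, written against the C++ API; `session` and `input_tensor` are assumed to exist already, and the "Cuda" allocator name is illustrative rather than the exact name the TRT EP registers:

  // Binding a device-resident output requires the EP to expose an OrtMemoryInfo.
  Ort::MemoryInfo mem_info("Cuda", OrtDeviceAllocator, /*device_id*/ 0, OrtMemTypeDefault);
  Ort::IoBinding binding(session);
  binding.BindInput("input", input_tensor);  // input_tensor: an existing Ort::Value
  binding.BindOutput("output", mem_info);    // let ORT allocate the output on device
  session.Run(Ort::RunOptions{nullptr}, binding);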
* Create a helper for generating unique ids that can be used by an EP that creates compiled nodes and needs ids to be deterministic for a model when used in multiple sessions.
Added to IExecutionProvider as this can potentially be used by all compiling EPs and is more robust than a simplistic counter (although the EP implementer is free to choose either approach).
* Restructure the helper so it can be called across the EP bridge.
Add ability to call id generation helper from EP bridge
- convert DNNL EP to use helper to validate
Address issue where a new Model may be loaded into the same address as a previous one.
- hash the bytes in the Graph instance (1728 bytes currently) to use as the key to the full hash for the model
Add lock around id generation to ensure no issues if multiple sessions partition graphs at exactly the same time.
- Extremely unlikely but would be hard to debug and the locking cost is not an issue as it's only incurred during graph partitioning and not execution.
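Roughly, the scheme described above boils down to the following simplified sketch (HashBytes and GenerateDeterministicId are hypothetical stand-ins, not the actual helper):

  #include <cstdint>
  #include <mutex>
  #include <unordered_map>

  static std::mutex g_mutex;                          // guards concurrent graph partitioning
  static std::unordered_map<uint64_t, int> g_counts;  // model hash -> next id

  int GenerateDeterministicId(const Graph& graph) {
    // Key on the bytes of the Graph instance rather than its address, so a new
    // Model loaded at a recycled address does not collide with the old one.
    uint64_t model_hash = HashBytes(&graph, sizeof(Graph));
    std::lock_guard<std::mutex> lock(g_mutex);
    return g_counts[model_hash]++;  // per-model counter, stable given identical partitioning
  }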
* Remove Provider_IExecutionProvider and make the internal IExecutionProvider usable by shared providers
* Change Provider_IExecutionProviderFactory to be the core version.
* Introduce VariadicAlias, remove hardcoded alias limits
* Include optional-lite in winml build
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* allow custom op taking varied types
* refactor test case
* add test model
* refactor test case
* enable copy elision
* update test case
* fix issue in ToString function
* fix filtered subgraph initializer issue
* minor fix
* Include implicit inputs of nodes when checking whether they are initializers
* Add test case
* minor update
* Address PR comments
* Fix some code errors
* Exclude some training specific code around the allocation planning and initializer handling from the minimal build.
Simplify the code around tracking start/end usage of a value.
* Remove nGraph Execution Provider
Pursuant to nGraph deprecation notice: https://github.com/microsoft/onnxruntime/blob/master/docs/execution_providers/nGraph-ExecutionProvider.md#deprecation-notice
**Deprecation Notice**
| Milestone | Date |
| --- | --- |
| Deprecation Begins | June 1, 2020 |
| Removal Date | December 1, 2020 |
Starting with the OpenVINO™ toolkit 2020.2 release, all of the features
previously available through nGraph have been merged into the OpenVINO™
toolkit. As a result, all the features previously available through
ONNX RT Execution Provider for nGraph have been merged with ONNX RT
Execution Provider for OpenVINO™ toolkit.
Therefore, ONNX RT Execution Provider for **nGraph** will be deprecated
starting June 1, 2020 and will be completely removed on December 1,
2020. Users are recommended to migrate to the ONNX RT Execution Provider
for OpenVINO™ toolkit as the unified solution for all AI inferencing on
Intel® hardware.
* Remove nGraph license info from ThirdPartyNotices.txt
* Use simple Test.Run() for tests without EP exclusions
To be consistent with rest of test code.
* Remove nGraph EP functions from Java code
This is a small perf / clean-up change. It removes the Env::Task abstraction which wraps a single std::function field, and adds at least one virtual method call overhead when creating a Task and when executing it. The POSIX and Windows implementations are now identical.
This PR updates the ThreadPool API to support multi-loop parallel sections. As with the OpenMP "parallel" construct, this allows per-loop work to be amortized over a series of loops. For ORT, it also promotes locality between successive loops in the sense that iteration X of one loop will tend to run on the same worker thread as iteration X of preceding loops.
The change was developed while optimizing the implementation of a model that performed better with OpenMP. Profiling indicated that OpenMP was providing lower loop entry/exit costs and that, via OpenMP's static scheduling, it was leading to a lower L2 miss rate in the series of parallel loops used in GRU.
The main changes are:
- Addition of ThreadPool::ParallelSection and underlying support in the modified Eigen thread pool.
- In EigenNonBlockingThreadPool.h, refactoring the RunInParallel method to support two variants: one that takes an existing parallel section object created by the caller, and another (used by default) that creates its own parallel section.
- Simplify ThreadPool::LoopCounter (used by worker threads to claim loop iterations), basing it on an ID supplied by the underlying Eigen thread pool for affinity in a series of loops.
- Fix a possible perf issue where a loop with iterations scheduled in batches would have more threads than batches available.
- Use of parallel sections in the GRU operator.
- Additional test cases in threadpool_test.h.
- Additional comments at the top of threadpool.h and EigenNonBlockingThreadPool.h.
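A rough usage sketch of the new construct (method names approximate, see threadpool.h for the real API; ProcessRange is a hypothetical per-step worker):

  // Amortize worker dispatch over a series of loops, as in the GRU operator.
  onnxruntime::concurrency::ThreadPool::ParallelSection ps(tp);
  for (int step = 0; step < seq_len; ++step) {
    onnxruntime::concurrency::ThreadPool::TryParallelFor(
        tp, total_units, cost_per_unit,
        [&](std::ptrdiff_t begin, std::ptrdiff_t end) {
          // iteration X tends to run on the same worker thread in each step,
          // promoting cache locality between successive loops
          ProcessRange(step, begin, end);
        });
  }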
Add tag types Ort::Float16_t and Ort::BFloat16_t, structs
that wrap uint16_t values for float16 and bfloat16.
These will serve as type-dispatching types for the C++ API.
They have the size of uint16_t, and arrays of these types can be
used to create Tensors of the corresponding types.
Make documentation Doxygen compliant.
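For illustration, a sketch of creating a fp16 tensor with the new tag type (assuming the templated CreateTensor overload resolves for Ort::Float16_t):

  std::vector<Ort::Float16_t> data(6);  // raw IEEE fp16 bit patterns held as uint16_t
  std::array<int64_t, 2> shape{2, 3};
  auto mem_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
  Ort::Value tensor = Ort::Value::CreateTensor<Ort::Float16_t>(
      mem_info, data.data(), data.size(), shape.data(), shape.size());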
* Add options for nnapi ep
* Add nnapi flags test
* add comments
* Add flag comments
* Make the flags bitset const
* Fix build break
* Add stub changes to java and c# api
* Fix java related build break
* Fix java build break
* Switch to bit flags instead of bitset
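The bit-flag style follows the usual C pattern, roughly as below (flag names are illustrative and may not match the final header exactly):

  enum NNAPIFlags {  // sketch; see the NNAPI provider factory header for real values
    NNAPI_FLAG_USE_NONE = 0x000,
    NNAPI_FLAG_USE_FP16 = 0x001,
    NNAPI_FLAG_USE_NCHW = 0x002,
  };
  uint32_t nnapi_flags = NNAPI_FLAG_USE_FP16 | NNAPI_FLAG_USE_NCHW;  // combine with |
  bool use_fp16 = (nnapi_flags & NNAPI_FLAG_USE_FP16) != 0;          // test with &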
The ROCm EP is designed and implemented on top of the AMD GPU software stack named ROCm. Here is the link for details about ROCm: https://rocmdocs.amd.com/en/latest/
ROCm EP was created based on the following things:
1. AMD GPU programming language: HIP
2. AMD GPU HIP language runtime: amdhip64
3. BLAS: rocBLAS, hipBLAS
4. DNN: MIOpen
5. Collective Communication library: RCCL
6. CUB: hipCUB
7. …
Current status:
BERT-L and GPT-2 training can be run on AMD GPUs with data parallelism.
Next:
1. Make more GPU code sharable between the ROCm EP and CUDA EP, since the HIP language and HIP runtime API are very close to CUDA (see the sketch below).
2. Continue improving the implementation.
3. Continue GPU kernel optimization.
4. Support model parallelism on ROCm EP.
……
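To illustrate how close the two APIs are, the HIP equivalent of a CUDA call is mostly a mechanical rename (which tools like hipify automate); dst, src, bytes and the streams are placeholders:

  // CUDA
  cudaError_t s1 = cudaMemcpyAsync(dst, src, bytes, cudaMemcpyHostToDevice, cuda_stream);
  // HIP equivalent: same shape, hip* names
  hipError_t s2 = hipMemcpyAsync(dst, src, bytes, hipMemcpyHostToDevice, hip_stream);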
The ROCm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big (~180 files), it was suggested to split it into two parts: one for the ROCm kernels and one for everything else.
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
Co-authored-by: sabreshao <sabre.shao@amd.com>
Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com>
Co-authored-by: Suffian Khan <sukha@microsoft.com>
Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>
* Introduce OpKernelInfo GetAttrAsSpan() for float and int attribute proto arrays,
and GetAttrsStringRefs() to return a vector of string references.
These new APIs let kernels avoid copying attribute arrays, which saves memory
especially when the arrays are large, by referring directly to the data held
in the AttributeProto.
Modify TfIdfVectorizer to take advantage of the new API.
Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
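A rough sketch of the intended usage in a kernel constructor (signatures approximate; MyKernel is a hypothetical kernel, and ngram_counts is one of the TfIdfVectorizer attributes):

  MyKernel::MyKernel(const OpKernelInfo& info) : OpKernel(info) {
    // the span aliases the data inside the node's AttributeProto; no copy is
    // made, and it remains valid for the kernel's lifetime
    gsl::span<const int64_t> ngram_counts = info.GetAttrAsSpan<int64_t>("ngram_counts");
  }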
* Add test for FastGelu + GeluRecompute.
* Fix GeluRecompute for 2 inputs case.
* Fix test for BiasGelu + GeluRecompute.
* Copy all inputs to Gelu, not just 2.
* Move GeluRecompute test to training-specific file.
Description: This change makes three changes to the ThreadPool class to clean up issues identified during performance analysis and optimization. (1) It uses _mm_pause intrinsics in spin loops (sketched below), helping avoid consuming pipeline resources while waiting. (2) It re-organizes the spin-then-steal loop for work distribution to start out spinning as intended, rather than starting out trying to steal. (3) It updates the ThreadPool class's API to be consistent in the use of static methods for public functions. The PR includes minor doc updates and corresponding changes to test cases.
Motivation and Context
The change helps ensure consistency in behavior between the OpenMP and Eigen-based implementations. Unlike the instance methods, the static methods abstract over the different ways in which threading can be implemented; they will map onto the OpenMP or Eigen-based implementations when threading is used. When threading is not used they will run work sequentially.
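For reference, the spin-wait idiom behind change (1) looks roughly like this simplified sketch (not the actual ORT code):

  #include <atomic>
  #include <thread>
  #include <emmintrin.h>  // _mm_pause

  void SpinUntil(std::atomic<bool>& ready) {
    int spins = 0;
    while (!ready.load(std::memory_order_acquire)) {
      _mm_pause();           // hint to the CPU that this is a busy-wait
      if (++spins > 4000) {  // stop burning cycles if the wait drags on
        std::this_thread::yield();
        spins = 0;
      }
    }
  }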
Introduce sparse_initializers support.
Convert them to dense on model load and prune graph_proto_
so they don't consume space. Convert back to sparse on ORT format model save.
Implement serializing sparse initializers to the ORT format.
Fix Model::ToProto() to return the original sparse initializers.
Set a flag that graph_sync is needed when loading a simple ORT format model;
otherwise nothing is resolved.
Add ORT format history to README.md.
#ifdef out DenseToSparseTensorInitializer in the MINIMAL build.
Allow duplicate initializers to support existing models,
issuing a warning instead of aborting.
* Revert "Remove SparseTensor support from minimal build. (#5114)"
This reverts commit 59ee8ffb17.
Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>
Prepacking in subgraphs is not supported currently. We see more and more models with subgraphs containing MatMul, MatMulInteger and other ops; prepacking can speed those models up significantly.
* Change shared providers so that they are shutdown before shared library unload
* Move UnloadSharedProviders declaration into a shared header to avoid bugs.
* Add session option and global thread pool option to set denormal as zero.
* Revert unnecessary changes.
* Add cpuinfo submodule
* Add more comments
* Remove the cpuinfo submodule dependency and check only for SSE3 support for FTZ/DAZ (sketched below), inspired by TensorFlow
* Preserve API order in C api
* Clean up and utilize the SSE3 detection logic from the existing cpuid_info.h
* Keep the same order with header file
* Fix build issue with the Linux pipeline, which has an old g++ compiler
* Fix broken build on Linux and remove a duplicated unit test
* Remove reformatting at eigen thread pool
* Remove flatbuffers which is not intentionally added
* Revert "Remove flatbuffers which is not intentionally added"
This reverts commit 9f509a9aaaa3c7832d88854c82fd26b234770b7f.
* Remove flatbuffers which is not intentionally added
* Resolve comments
- Put details on APIs
- Add a log for ftz/daz initialization
- Add clang
- Fix typo
* Remove unnecessary header include
* Resolve comments
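The mechanism behind the option is the MXCSR FTZ/DAZ bits; what "set denormal as zero" does per thread is roughly this simplified sketch:

  #include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE (FTZ)
  #include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE (DAZ, requires SSE3)

  void SetDenormalAsZero(bool on) {
    // FTZ: denormal results flush to zero; DAZ: denormal inputs are read as zero
    _MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON : _MM_FLUSH_ZERO_OFF);
    _MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON : _MM_DENORMALS_ZERO_OFF);
  }

Since MXCSR is per-thread state, the session and global thread pool options have to apply this on each thread-pool thread.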
* add Python API for getProfilingStartTime
* debug for using Python API
* add in C# api
* use uint instead of uint64_t to prevent warning
* typo for GetProfilingStartTimeNs
* remove const
* Update onnxruntime/python/session.py
Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
* remove unnecessary return
* Add Python unit test
* Add C# unit test and refactor Python test
* use ulong in C# for uint64_t in C++
* remove time.monotonic_ns
* syntax: remove public for inner function
* correct the API's order
* get profiling start time after run
* Correct the right order in NativeMethod.cs
* update order
* nit: remove spaces
* Update csharp/src/Microsoft.ML.OnnxRuntime/InferenceSession.cs
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
* use the updated function
* add comment about the precision
* add more comments
* add session.py back
* fix flake8
* remove session.py
* Add comments in C, C#, Python APIs about precision
Co-authored-by: Pranav Sharma <emailpranav@gmail.com>
Co-authored-by: Guoyu Wang <62914304+gwang-msft@users.noreply.github.com>
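Usage sketch via the C++ API (the Python and C# bindings expose the same value; availability of the method on the C++ wrapper may vary by version, and the profile prefix and model path are placeholders):

  Ort::Env env;
  Ort::SessionOptions so;
  so.EnableProfiling(ORT_TSTR("ort_profile"));
  Ort::Session session(env, ORT_TSTR("model.onnx"), so);
  uint64_t start_ns = session.GetProfilingStartTimeNs();  // nanoseconds, monotonic clock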
* Add CUDA option to run copy in default stream
This change fixes #4829. Thanks @maherzog for providing the repro!
The bug is caused by memory reuse in BFC arena, where copy and
compute stream in CUDA has a racing condition.
BFC arena is an arena allocator on top of cudaMalloc/cudaFree that
reduces the cost of syncing CPU and GPU on alloc/free. It means that
when the CPU allocates or frees memory, the GPU might not have finished
its previous work on that memory, so the CPU and GPU can run asynchronously.
This is fine if there's only one stream, where the execution order
on CPU and GPU is consistent. For example, if we have two kernels
A and B and the CPU runs allocA->computeA->freeA->allocB->computeB->freeB,
A and B can share the same memory, since computeA and computeB
cannot race as long as they run in the same GPU compute stream.
However, if the CPU runs allocA->copyA->freeA->allocB->computeB->freeB,
the GPU could execute copyA after computeB
if the copy and compute happen in different GPU streams.
This change makes copies run in the default compute stream, and adds
an option to fall back to the previous behavior if there's a perf hit. This
is a short-term fix until the BFC arena supports multiple streams.
Users may use the following options to revert to the previous behavior:
C API:
struct OrtCUDAProviderOptions cudaProviderOpt;
cudaProviderOpt.do_copy_in_default_stream = false;
C++ API:
CUDAExecutionProviderInfo cudaEPInfo;
cudaEPInfo.do_copy_in_default_stream = false;
C# API:
pending...
Python:
import onnxruntime
onnxruntime.capi._pybind_state.set_do_copy_in_default_stream(False)
* Confirmed the test fails in CI when doing the copy in a separate stream;
reverted the test to get CI to pass for now
* Fix Windows test
* Address CR
* - Link with libatomic if needed
- Install pip differently so it doesn't clash with the system pip, which may involve a wrapper script
- Remove ability to specify offset when Tensor allocates the data. The data prior to offset isn't accessible by anything.
- Fix use of offset in TensorOpTest to work on armv7 where it must be aligned to the type it points to.
- Fix ActivationOpNoInfTest.Softsign to allow for armv7 behavior
- Fix ReductionOpTest.ReduceMean_*keepdims to allow for armv7 floating point inaccuracy
* Address PR comments
* Expose recompute configs to the frontend
* Add frontend test
* Ensure recompute graph transformer is only applied once
Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>