onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-06-24 02:47:54 +00:00

Author	SHA1	Message	Date
Sherlock	eb5c1f0fcc	Unify activation and initializer alignment value (#6109 ) * Unify activation and initializer alignment value * Fix VerifyInputTensorsAllocatedContiguously	2020-12-14 13:13:41 -08:00
Sherlock	a53f4dd379	Introduce VariadicAlias, remove hardcoded alias limits (#6106 ) * Introduce VariadicAlias, remove hardcoded alias limits * Include optional-lite in winml build Co-authored-by: Sherlock Huang <bahuang@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-12-11 10:47:08 -08:00
RandySheriffH	404982ded5	Enable varied input type for custom op (#6066 ) * allow custom op taking varied types * refactor test case * add test model * refactor test case * enable copy elision * update test case * fix issue in ToString function	2020-12-09 15:10:42 -08:00
Hariharan Seshadri	d46dbeafd3	Expose knobs to create and share (CPU) allocators across sessions in C# and Python (#5634 )	2020-11-21 14:12:33 -08:00
Scott McKay	00412a76e9	Exclude some training specific code from the minimal build. Cleanup some related aspects of allocation planner. (#5861 ) * Exclude some training specific code around the allocation planning and initializer handling from the minimal build. Simplify the code around tracking start/end usage of a value.	2020-11-20 20:25:46 +10:00
Scott McKay	7b76b57fc8	Support EPs that compile nodes in a minimal build. (#5776 ) * Support EPs that compile nodes in a minimal build. This enables NNAPI being used.	2020-11-17 13:52:22 +10:00
Hariharan Seshadri	b92fc66ea1	Support opset-13 specs of controlflow ops (Loop, If) (#5665 )	2020-11-11 23:44:14 -08:00
edgchen1	2acdc3cd82	Move GetUseDeterministicCompute() to OpKernelContext to avoid need to downcast to OpKernelContextInternal. (#5729 )	2020-11-09 11:37:06 -08:00
M. Zeeshan Siddiqui	9af0d48524	Memory planner and pattern generation enhancements. (#4443 ) * static allocation. * chanegs. * contigious dynamic allocation. * contigious dynamic allocation. * fix bugs. * fix bug. * build errors. * PR feedback. * PR feedback. * Update Graph builder for nccl_allreduce, mps. * misc. * fix windows build break. * changes. * fine-grained memory-time scheduling. * merge. * fix misc stuff. * fix windows build. * fix windows build. * fix merge bug. * merge conflicts. * revert onnx-tensorrt submodule commit. * fix submodule commit. * misc. * merge conflicts. * Revert "merge conflicts." This reverts commit `319a071a6e`. * merge conflict. * merge conflict. * merge conflicts. * fixes. * PR feedback. * build break. * build break. * Add asserts. * Add asserts. * asserts. * asserts. * asserts. * asserts. * asserts. * fixes. * fixes. Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: root <root@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-01 23:05:46 -08:00
Weixing Zhang	aec4cb489e	ROCm EP for AMD GPU (#5480 ) The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/ ROCm EP was created based on the following things: 1. AMD GPU programming language: HIP 2. AMD GPU HIP language runtime: amdhip64 3. BLAS: rocBLAS, hipBLAS 4. DNN: miOpen 5. Collective Communication library: RCCL 6. cub: hipCub 7. … Current status: BERT-L and GPT2 training can be ran on AMD GPU with data parallel. Next: 1. Make more GPU code be sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA. 2. Continue improving the implementation. 3. Continue GPU kernel optimization. 4. Support model parallelism on ROCm EP. …… The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big(~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels. Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: sabreshao <sabre.shao@amd.com> Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-10-29 17:13:04 -07:00
Dmitri Smirnov	742ffb860c	Allow Kernels refer to some attribute data directly in the protobuf (#5624 ) * Introduce OpKernelInfo GetAttrAsSpan() for floats and ints attribute proto arrays and GetAttrsStringRefs() to return a vector of string references. These new APIs allow kernels not copy attribute arrays especially if they are large and save on memory. but refer directly to data that is in AttributeProto. Modify TfIdfVectorizer to take advantage of the new API. Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-10-29 16:12:54 -07:00
Ryan Hill	e90b6f06d1	Factor out IAllocator so that it can be shared with shared providers (#5567 ) * Factor out IAllocator so shared providers can use it directly.	2020-10-27 17:28:17 -07:00
Dmitri Smirnov	3433576fd3	Support for Sparse Initializers (#5540 ) Introduce sparse_initializers support. Convert them to dense on model load and prune graph_proto_ so they don't consume space. Convert back to sparse on ORT Format model save. Implement serializing sparse initializers to OrtFormat. Fix Model::ToProto() to return original sparse initializers Set a flag that graph_sync is needed when loading a simple ORT Format model. otherwise nothing is resolved. Add ORT Format history to README.md ifdef MINIMAL build for DenseToSparseTensorInitializer Allow duplicate initializers to support existing models. Issue a warning instead of aborting. * Revert "Remove SparseTensor support from minimal build. (#5114)" This reverts commit `59ee8ffb17`. Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-10-27 10:32:06 -07:00
Ryan Hill	82c7a9756e	Fix shared provider unload crash (#5553 )	2020-10-21 13:01:21 -07:00
Changming Sun	280cdf31f5	Revert "Fix shared provider unload crash (#5523 )" (#5547 ) This reverts commit `610676293e`. Because Linux DNNL pipeline is failing.	2020-10-20 08:01:28 -07:00
Ryan Hill	610676293e	Fix shared provider unload crash (#5523 ) * Change shared providers so that they are shutdown before shared library unload * Move UnloadSharedProviders declaration into a shared header to avoid bugs.	2020-10-19 18:08:38 -07:00
Scott McKay	a92ccbe1bc	Various armv7 related fixes (#5394 ) * - Link with libatomic if needed - Install pip differently so it doesn't clash with the system pip which may involve a wrapper script - Remove ability to specify offset when Tensor allocates the data. The data prior to offset isn't accessible by anything. - Fix use of offset in TensorOpTest to work on armv7 where it must be aligned to the type it points to. - Fix ActivationOpNoInfTest.Softsign to allow for armv7 behavior - Fix ReductionOpTest.ReduceMean_keepdims to allow for armv7 floating point inaccuracy Address PR comments	2020-10-09 22:34:32 +10:00
Pranav Sharma	974b9bfc09	Allow sharing of initializers between sessions. (#5092 ) * Allow sharing of initializers between sessions. * Allow sharing of initializers between sessions (2). * Add test for C# * Add test for C#; address PR comments * Address PR comments Moved AddInitializer logic to internal session options Added tests for owned buffer Clarified documentation Fix bug where memory info and not device was getting compared * Fix test * Fix training build * Add ver 5 end marker and ver 6 starter, add scenario and usage examples.	2020-09-21 14:09:37 -07:00
Scott McKay	59ee8ffb17	Remove SparseTensor support from minimal build. (#5114 ) * Remove SparseTensor support from minimal build. Currently the only valid usage of a SparseTensor is as an attribute of a Constant node. That would have been lifted to a dense tensor initializer when loading the onnx model, so would not exist when saving the ORT format model. Due to that there can be no SparseTensors in an ORT format model. Co-authored-by: gwang <wanggy@outlook.com>	2020-09-11 17:56:54 +10:00
Ryan Hill	3207de276c	Remove IDeviceAllocator class as it doesn't extend IAllocator in any way. (#5067 )	2020-09-10 00:46:35 -07:00
Vincent Wang	84de14a833	Register OpSet13 CUDA Kernels for BERT/UniLMv2 (#4856 ) * opset13 cuda kernels for BERT. * add opset13 SoftmaxCrossEntropyLoss. * opset13 size. * fix argmax/min for ut. * fix ut failure for argmax/min. * OrtMemTypeCPUInput Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-09-05 08:09:52 +08:00
Scott McKay	b5c2932ae8	Last major set of ORT format model changes (#5056 ) * Add minimal build option to build.py Group some of the build settings so binary size reduction options are all together Make some cmake variable naming more consistent Replace usage of std::hash with murmurhash3 for kernel. std::hash is implementation dependent so can't be used. Add initial doco and ONNX to ORT model conversion script Misc cleanups of minimal build breaks.	2020-09-05 07:59:01 +10:00
Ryan Hill	d792af776d	Remove Cuda dependency from TensorRT shared provider (#5014 )	2020-09-04 11:35:02 -07:00
gwang-msft	7ca8388dc9	[ORT Mobile] file format schema and file I/O code (#4973 ) * ort mobile file format schema and [de]serializing code	2020-09-01 11:51:31 +10:00
gwang-msft	ea5732319e	Add option ORT_NO_EXCEPTIONS to disable most exception/throw in /onnxruntime/ (#4894 ) * init no exception changes * initial test * disable exceptions * more throw handling * minor update * fix linux build break * fix windows/nuphar build break * address cr comments, move #ifdef to ORT_CATCH * address cr comments, move #ifdef to ORT_CATCH * handle return statement in ORT_CATCH * linux build break fix * addressed cr comments, remove ort_catch_end * addressed cr comments, remove ort_catch_end * move mlas to a separated ifdef flag * merge master, move some new code in master to no_exc Co-authored-by: gwang0000 <62914304+gwang0000@users.noreply.github.com>	2020-08-28 23:03:51 -07:00
Scott McKay	08eb15068c	Exclude the Map types from the build if ML ops are disabled. (#4908 ) * Exclude the Map types from the build if ML ops are disabled. They're the only ops that use Map.	2020-08-27 17:48:12 +10:00
Scott McKay	728e886bba	Add kernel def hash logic for minimal build (#4891 ) * Add hash based lookup of kernels	2020-08-23 14:39:07 +10:00
Scott McKay	db7669b225	Reduce ONNX dependency in minimal build (#4890 ) * Next round of changes. Remove inclusion of ONNX schema header Exclude custom registry related things Move IsConstantInitializer from graph_utils to Graph as it's needed in a minimal build and graph_utils is excluded.	2020-08-23 07:02:13 +10:00
Pranav Sharma	29dcfb24ab	Allow multiple sessions to share an allocator, optimize constant folding memory usage, expose arena configs. (#4813 ) * Add support for sharing allocators * Incremental update * Address some PR comments, add unit tests, add documentation. * Address PR comments, add tests and some documentation. * Fix build and test issues * Remove RegisterAllocator API restoring the OrtAllocator interface changes. Changed docs to reflect this. Also fixed the orttraining segfault. The segfault was because in the case of training session, the CPU exec prov is not available at the time the transformers are applied. Changed it to create a new one.	2020-08-22 10:03:17 -07:00
RandySheriffH	3fa73a5b6a	ReduceBinarySize (#4747 ) * cancel night build on pyop * add rewriter to rewrite cpu provider * skip BuildKernelCreateInfo<void> * refactor variable name and comment * include ops from csv file * process multiple eps * add default function to cuda provider * rename function and add license header * fix import * add doc * fix typo * deal with empty kernel entry in cuda * rename the rewriter file * add comment into provider file * add comment and rename function * log warnings * refactor extracting logic * add entry for script to run solo * add better example * avoid onnx importing * fix flake8 alerts * minor fixes to better comments and doc * add entries for all domains * add void entry into contrib providers * format cuda_contrib_kernels.cc * format cpu_contrib_kernels.cc * add all providers * add default entry to all providers * include op_kernel header * cancelling change in providers beyond cpu/cuda * rename file and switch file format to domain;opset;op1,op2... * update doc * restore non-regular ending grammar in cuda_contrib_kernels.cc * add ort_root as input argument of script * enable test in ci * update doc * update doc * revert change on linux gnu ci * switch to set to host ops * simplify trimming logic * add domain map to track current model * allow ort_root to take relative path	2020-08-21 19:50:13 -07:00
Vincent Wang	5eaac31faa	support opset13 on transformers. (#4837 ) Co-authored-by: Vincent Wang <weicwang@microsoft.com>	2020-08-19 11:13:37 +08:00
Hariharan Seshadri	ea3b4e1f8d	Fix bug in DispatchOnTensorType macro (#4808 )	2020-08-17 01:16:01 -07:00
Scott McKay	8fb743f767	Refactor Cast to reduce binary size. (#4765 ) * Refactor Cast to reduce binary size. 82.5 -> 60.8KB on Windows * Address PR comments. Fix build issue.	2020-08-13 20:43:22 +10:00
Ryan Hill	ac725b53f6	Convert TensorRT provider into a shared library (#4721 ) Lots of changes to shared library interfaces, new lighter weight design.	2020-08-10 21:17:16 -07:00
Yufeng Li	b22091dc91	Add the framework to support prepack (#4413 ) * add support of prepack * add support for QAttention and DynamicQuantizeMatMul * add an use_prepacking option * add use_prepacking in c_sharp api	2020-08-07 09:39:19 -07:00
Wei-Sheng Chin	e9d20e9dba	Revise Send and Recv (#4547 ) * Add ability to retrieve inferred shapes when executing a kernel. This ability helps Recv to know its output shapes without doing actual cummunication. Of course, if the output shapes cannot be inferred, Recv still needs to do communication to get shapes from Send. * Avoid communicating shape information when it can be inferred statically * Replace unordered_map with thread-safe wrapper. We don't want to have racing condition and undefined behavior when using parallel executor.y * Remove cout * Add missing file * Address comments * Check dim_value. -1 means missing * lock properly * Address comments (remove thread-safe map) * Remove poc header * Replace Stream with DeferredReleaseCPUPtr	2020-07-30 23:02:45 -07:00
Chi Lo	affdeb53c2	Add Python API for specifying device options. (#4205 ) * Add python API for specifying CUDA device id * Modification for providing session based python api for specifying device id * When include header file pybind11/stl.h, conversion between c++ containers and Python list, vector and dict data structure are automatically enabled. https://pybind11.readthedocs.io/en/stable/advanced/cast/stl.html# Therefore, refactor the code for better leverage this advantage. * Make struct CudaDeviceOptions as default cuda device options * Implement sess.set_providers(list_of_providers, list_of_provider_option_dicts) But still stay consistent with existing sess.set_providers(list_of_provider) * Add cuda provider option default setting * Add support for setting cuda cuda_mem_limit and arena_extend_strategy. Also resolved the merge conflict on session.py * Use python ctypes to call cuda library to help python unittest * Refine the code with reviewer's suggestions * Add the capability of getting execution provider's configuration - Once we introduced the capability to set execution provider's configuration, it makes sense to add capability of getting ep's configuration. * Modify the code with reviewer's suggestions. * Using stoull() and stoul() depends on 32/64-bits architecture. * Rewrite the testcases for testing setting CUDA device id Note: We need to make sure every ORT process be run on one CUDA device at a time. * Make sure old session object is destroyed by python gc before new session object is being created * Move testcases to original onnxruntime_test_python.py * Fix bugs to pass CI build * Make it pass CI build (cont.) * Make it pass CI build (cont.)	2020-07-21 07:28:13 -07:00
Tracy Sharpe	08235e1662	add Output() overloads (#4546 )	2020-07-19 15:21:12 -07:00
Tiago Koji Castro Shibata	2189c77e5b	static_typename (#4520 ) * Use static_typename * Disable RTTI outside of Release * Fix unused var * Add test types * PR feedback	2020-07-16 16:31:02 -07:00
edgchen1	6c7da5e9d3	Optimize CUDA Sum op kernel and refactor CUDA elementwise variadic input op kernels (#4418 ) For the special case where all variadic inputs of a kernel are the same shape (i.e. no broadcasting is required) and there are few enough of them, we perform the entire computation in a single kernel. The general implementation (which was previously used for this special case) handles broadcasting by repeatedly invoking a binary kernel on successive inputs.	2020-07-10 10:20:23 -07:00
Tixxx	b156ae4448	Support training_mode flag in eval (#4324 ) * add training_mode feed for evaluation to support opset12	2020-07-08 10:38:54 -07:00
Scott McKay	274e6b4153	Cleanup SessionState. Move allocator lookup to SessionState. (#4194 ) * Move allocators to SessionState so they're decoupled from ExecutionProviders - when looking up an allocator it's based on OrtMemoryInfo not the EP so SessionState is a more natural place for that infromation to be stored - add device based lookup - simplifies logic for copying feeds/fetches across devices Cleanup SessionState and SessionStateInitializer - provide more things to SessionState at construction time so we don't construct and instance and immediately after call a bunch of setters - simplify SessionStateInitializer - reduced down to FinalizeSessionState method	2020-06-28 14:55:42 +10:00
Scott McKay	175983c082	Move memory info into IAllocator (#2850 ) - Update IAllocator setup to move the OrtMemoryInfo to the base class instead of requiring derived classes to have that as a member and override a virtual method to return it. - Cleanup CreateAllocator setup to take an argument as to whether to wrap the device allocator in an arena allocator. The choice to do that isn't a property of the underlying device allocator. - Minor cleanups in the various EPs to adjust to the change to IAllocator and CreateAllocator, and to use the create_arena flag consistently when available.	2020-06-22 11:18:52 +10:00
Scott McKay	9790e19424	Handle mem pattern allocation failure better. Make BFCArena behavior more consistent (#4062 ) * Fixes from investigating issue running BERT-Squad model with larger batch sizes. When the batch size gets large enough the initial run will be successful (no memory pattern in use) but the second will fail to allocate the memory pattern block. The cause of this failure is that we still have the smaller blocks from the first run allocated, as BFCArena has no logic to free those. This essentially results in 2x the memory being required to run the model. There was inconsistency in BFCArena::Extend which on one path threw an exception if it couldn't do the allocation, and on another just returned false (resulting in Alloc returning a nullptr). Make the behavior consistent by always throwing if BFCArena fails to find a buffer to return. There are a huge number of places in the code where we assume Alloc returns a valid pointer so throwing will result in more correct behavior as a whole. It's also consistent with what happens when CUDA or the standard library fails to allocate memory. Next, update ExecutionFrame to check for this failure and not insert a memory block entry if it happens. With the existing code if BFCArena Alloc returned a nullptr we happily inserted that in the blocks, delaying detection of the failure to when we attempted to use the block in AllocateMLValueTensorSelfOwnBufferHelper. Finally update AllocateMLValueTensorSelfOwnBufferHelper to expect a location may not have a block. A log message will be provided when the block allocation fails so it's not necessary to have more on each individual allocation that would have used the block. Falls through to default behavior of doing a normal allocation.	2020-06-05 18:54:01 +10:00
Paul Fultz II	7759136610	Add amd migraphx execution provider to onnx runtime (#2929 ) * Add amd migraphx execution provider to onnx runtime * rename MiGraphX to MIGraphX * remove unnecessary changes in migraphx_execution_provider.cc * add migraphx EP to tests * add input requests of the batchnorm operator * add to support an onnx operator PRelu * update migrapx dockerfile and removed one unused line * sync submodules with mater branch * fixed a small bug * fix various bugs to run msft real models correctly * some code cleanup * fix python file format * fixed a code style issue * add default provider for migraphx execution provider Co-authored-by: Shucai Xiao <Shucai.Xiao@amd.com>	2020-05-27 04:24:59 +08:00
Ryan Hill	d5ec353e58	Ryanunderhill/mkldnn dll (#3314 ) First version of allowing providers to work as DLLs, only implemented for DNNL so far. More improvements to come next!	2020-05-06 00:57:09 -07:00
Changming Sun	bd78364411	Parallel all the activations ops (#3722 ) 1. Parallel all the activations ops. 2. Parallel the performance critical path of the LRN op, which makes the ONNX model zoo googlenet model runs 60% faster(latency reduced from 21ms to 13ms). 3. Make the Gemm-Activation fusion support with all the activations ops. Before this change, it only supports LeakyRelu/Relu/Sigmoid/Tanh. 4. Delete onnxruntime/test/framework/op_kernel_test.cc because the file is almost empty. 5. Remove the loggings in KernelRegistry::TryFindKernel, return Status with error message instead.	2020-05-05 01:18:17 -07:00
Tixxx	0638565fe0	Fix evaluation issues (#3538 ) * allow switching between eval and training modes dynamically Co-authored-by: Tixxx <root@525204a066204ea794f942530b05ae7f000000.axlncovkyjne5caro2tmz3zryb.xx.internal.cloudapp.net>	2020-04-28 21:03:37 -07:00
Jeff Bloomfield	f1c19f8495	merge master	2020-04-25 19:04:58 -07:00
Edward Chen	daa14b64e3	Merge remote-tracking branch 'origin/master' into edgchen1/merge_from_master	2020-04-21 03:31:32 +00:00

1 2 3 4

151 commits