Updates the thread pool implementation to make work distribution over the Eigen thread pool more closely resemble techniques used in OpenMP. In particular:
(1) A thread entering a parallel loop from outside the thread pool works on the iterations itself, rather than requiring a thread switch to and from a thread in the pool.
(2) To support this, work items pushed to the thread pool run a loop to claim iterations from a shared counter via atomic-fetch-and-add, as opposed to having work items themselves represent individual batches of iterations. This means that any thread working on the loop can execute any batch of iterations, including having the main thread run through all of the batches itself if the loop turns out to be short-running.
(3) As with OpenMP active scheduling, the worker loop spins waiting for work prior to blocking. This avoids OS blocking / wake-up paths in workloads with series of short-running parallel sections.
* Added GetAvailableProviders to C API
* Fix API version and Windows build error
* Changed function name
* Changed ORT_API_VERSION to 4
* Moved all_providers array to constants.h
* Move check for providers to constants.h
* Changed name of array to avoid warning
* Address review comment
* Added unit test
- Update IAllocator setup to move the OrtMemoryInfo to the base class instead of requiring derived classes to have that as a member and override a virtual method to return it.
- Cleanup CreateAllocator setup to take an argument as to whether to wrap the device allocator in an arena allocator. The choice to do that isn't a property of the underlying device allocator.
- Minor cleanups in the various EPs to adjust to the change to IAllocator and CreateAllocator, and to use the create_arena flag consistently when available.
* Add build option to disable traditional ML ops from the binary.
* Fix python tests by splitting tests for ML ops to a separate file. Exclude ML tests from onnx_test_runner and C# tests. Exclude ML op sources.
* Update Edge pkg pipelines with new MLops env variable and fix C# packaging pipeline tests to skip ML ops.
According to profiling in #4267, getting the allocator can account for a large fraction of overhead when accessing a kernel output, due to STL container operations. The allocator isn't used when (i) we're not creating a fence, and (ii) we have a memory pattern and a pre-allocated buffer, so we can avoid this overhead.
* support position_ids input
* support fp16 conversion for gpt2 past state
* output results to csv file
* Remove the unnecessary check that the output of MatMul is on CUDA
* expose ACL/ARMNN providers to python
* add -acl / -armnn to package name when use_acl / use_armnn is specified
* build python wheel for ARMNN EP
* link ACL/ARMNN EPs into onnxruntime_pybind11_state
* Fix wrong argument order in build_python_wheel for wheel_name_suffix
* Fix a bug and add code to profile memory
1. Compile Send/Recv again (currently broken because of
HOROVOD refactor).
2. Add code to print out initializer allocation size and
activation memory size.
* Address comments
* Split memory counts per location
* Fix a metric
Update PyTorch BERT SQuAD notebooks to use onnxruntime-tools and update the usage of intra_op_num_threads.
Rename Python files according to the coding style.
Fix change_input_to_int32.
Update Keras notebook to copy the script from the rel-1.3.0 branch (will update later).
* Add ONNX postpasses
* add flag + add bert test from onnx file
* address PR comments
* fix typo
* fix rebase
* address comments
* Fix test failures
* add new pass for expand for new pt version, add comments
* fix rebase
Co-authored-by: lahaidar <lahaidar@OrtDevTest2v100.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>
* ORT on CUDA 11
1. Separate HOROVOD and MPI
2. Separate NCCL from HOROVOD in CMakeLists.txt
3. Remove dependency on external cub
4. cudnnSetRNNDescriptor is changed in cuDNN 8.0
* polish the code about MPI/NCCL in CMakeLists.txt and build.py
* check CUDA version
* ${MPI_INCLUDE_DIRS} should be PUBLIC
* sm_30 and sm_50 are deprecated in the CUDA 11 Toolkit
* update change based on code review feedback.
* add sm_52
* improve MPI/NCCL build path
Co-authored-by: Weixing Zhang <wezhan@microsoft.com>
* fix links to Engineering Design and API in CONTRIBUTING.md
* fix additional links in CONTRIBUTING.md
* correct the link to the public API in CONTRIBUTING.md
Co-authored-by: Emad El-Haraty <emad.elharaty@limebike.com>
Search/replace of the pattern "const auto foo = tensor.Shape()" to "const auto& foo = tensor.Shape()" to avoid unneeded copies at runtime and reduce code size (8KB drop for onnxruntime.dll). Remove some unnecessary header includes.
* Enable static memory planning for pipeline.
1. We fix a bug when resolving symbolic shape for scalars.
2. We pass the original inputs to all pipeline stages so that
the symbolic shapes can be resolved.
* Further Improvements
1. Address comments.
2. Further reduce activation size by ~50% when pipeline is on.
This is done by removing all but one gradient tensor from the last
RecordEvent in the backward pass.
* Address a comment
* Fix Windows build
Add more variants of MlasGemm that do a u8x8 GEMM with the output type as float. This fuses the common sequence of MatMulInteger + Cast + Mul(OutputScale) + optional Add(BiasVector).