onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-19 19:00:47 +00:00

Author	SHA1	Message	Date
Guoyu Wang	a2b551ff08	Add runtime options for NNAPI EP (#5576) * Add options for nnapi ep * Add nnapi flags test * add comments * Add flag comments * Make the flags bitset const * Fix build break * Add stub changes to java and c# api * Fix java related build break * Fix java build break * Switch to bit flags instead of bitset	2020-11-04 10:08:43 -08:00
Guoyu Wang	2ad7bcb766	NNAPI add opset version check (#5687) * nnapi add opset support	2020-11-04 21:48:00 +10:00
edgchen1	07bd4ef470	Upgrade optional implementation to https://github.com/martinmoene/optional-lite . (#5563)	2020-11-03 15:27:47 -08:00
Changming Sun	67d7e3967d	Disable some model tests	2020-11-03 14:42:45 -08:00
Hector Li	b6eeadf420	Enable OpenVino build on Arm64 platform (#5682)	2020-11-03 13:55:34 -08:00
Scott McKay	c9f44276da	Add ability to filter GraphViewer using IndexedSubGraph. (#5614) * Add ability to filter GraphViewer using IndexedSubGraph. This is to support compiling execution providers in a minimal build.	2020-11-04 07:08:18 +10:00
Changming Sun	357a51c75c	Update python packaging pipeline's docker image (#5680)	2020-11-03 12:01:36 -08:00
Hariharan Seshadri	db9c1308a5	Fix Resize kernel registration (#5677)	2020-11-03 10:43:41 -08:00
edgchen1	28f1e32898	Loosen tolerance of CudaKernelTest.ReduceSum_MidTensor, allow test random seed to be regenerated within a test run. (#5675)	2020-11-03 10:37:00 -08:00
Ye Wang	a028ca41ec	Optimize flaubert (#5651) * optimize flaubert * fix an issue and format * revert non-relevent change * review comments	2020-11-03 09:51:42 -08:00
M. Zeeshan Siddiqui	9b010963b7	Turn off peak memory logging and fix memory pattern generation bug. (#5676) * Turn off peak memory log lines and fix memory pattern generation bug. * Turn off peak memory log lines and fix memory pattern generation bug. Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-03 08:44:15 -08:00
Dmitri Smirnov	5d66cf017c	Register Clip for OpSet13 (#5671) Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-11-03 07:07:28 -08:00
Wei-Sheng Chin	8856c2595b	Sync the two IDs in OrtMemoryInfo when calling ctor (#5663) * Sync the two IDs in OrtMemoryInfo when calling ctor * Also fix the same problem for output	2020-11-02 23:22:47 -08:00
Changming Sun	4936e10e22	Disable some model tests (#5664) These are the new models added by WinML team. But some of our EPs can't pass some of tests.	2020-11-02 22:01:35 -08:00
Tracy Sharpe	182d9c48e4	Merge u8u8/u8s8 QLinearConv implementations (#5662) Combine the u8u8/u8s8 implementations for x86/x64 builds and add special case handling for 1D convolutions.	2020-11-02 21:38:39 -08:00
ashbhandare	c875fe0919	Add option to dump activations on all ranks (#5455) * Add option to dump activations on all ranks * address review comments * review comments * Fix review comment Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-02 18:03:05 -08:00
Changming Sun	87e1063e19	Revert "Update Squeeze, Unsqueeze, Split and ReduceSum kernel for Opset13 (#5488)" (#5668) This reverts commit `db63c5d10f`.	2020-11-02 16:09:22 -08:00
Tianlei Wu	2c02530603	Bert Model Profiling Tool (#5654) * Add profiler tool for BERT models	2020-11-02 13:47:37 -08:00
Jesse Benson	1495f737ca	Use cudaMemsetAsync and add checks on CUDA calls.	2020-11-02 11:25:13 -08:00
ashbhandare	db63c5d10f	Update Squeeze, Unsqueeze, Split and ReduceSum kernel for Opset13 (#5488) * Split change * ReduceSum and Split change * Other op changes, Grad builder, tests, registering required opset 13 ops * Rebase fixes * Fix tests, add some more * Review changes, rebase * Fix windows build Co-authored-by: Aishwarya <aibhanda@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-02 10:51:48 -08:00
Wenbing Li	5b44982971	Change the OrtCustomOp invocation as a constant. (#5506) * Chanage the OrtCustomOp invocation as a constant. * fix build on macos * build fixing	2020-11-02 10:38:07 -08:00
Derek Murray	ff538b8d3a	Minor fixes in BERT Inference notebook (#5637) Add missing commas to the code example.	2020-11-02 09:49:23 -08:00
Ashwini Khade	1cca903680	update onnx commit id (#5594) * update onnx commit id * update onnx commit for docker images * update docker images	2020-11-02 09:46:36 -08:00
M. Zeeshan Siddiqui	f2168cef29	Misc. cleanup. (#5659) Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-02 07:05:28 -08:00
M. Zeeshan Siddiqui	9af0d48524	Memory planner and pattern generation enhancements. (#4443) * static allocation. * chanegs. * contigious dynamic allocation. * contigious dynamic allocation. * fix bugs. * fix bug. * build errors. * PR feedback. * PR feedback. * Update Graph builder for nccl_allreduce, mps. * misc. * fix windows build break. * changes. * fine-grained memory-time scheduling. * merge. * fix misc stuff. * fix windows build. * fix windows build. * fix merge bug. * merge conflicts. * revert onnx-tensorrt submodule commit. * fix submodule commit. * misc. * merge conflicts. * Revert "merge conflicts." This reverts commit `319a071a6e`. * merge conflict. * merge conflict. * merge conflicts. * fixes. * PR feedback. * build break. * build break. * Add asserts. * Add asserts. * asserts. * asserts. * asserts. * asserts. * asserts. * fixes. * fixes. Co-authored-by: Ubuntu <OrtTrainingDev3@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: root <root@OrtTrainingDev3.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-11-01 23:05:46 -08:00
Maajid khan	d98062da0c	[OpenVINO-EP] Hetero support (#5627) * Implement Hetero in UEP * Added security checks to take valid Hetero combinations as device type * Integrating Hetero features * Get the statistics Report in Debug Mode Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Passing right device type for vadm_baackend Added simple fix to pick the right device type when using vadm_backend with Hetero as well. Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Fixed batching logic for 2020.4 and above * Fixed flake8 PEP8 errors Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor Fixes Added Added security checks for device_type passed in for Hetero build during run time code cleanup Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> * Minor changes Added Fixed batch_size bug in vadm_backend code cleanup *Documentation updated for Hetero Signed-off-by: MaajidKhan <n.maajidkhan@gmail.com> Co-authored-by: suryasidd <surya.siddharth.pemmaraju@intel.com>	2020-10-30 22:35:08 -07:00
Changming Sun	d9293f38e6	Revert "Custom Op on GPU (#5620)" This reverts commit `2c63196600`.	2020-10-30 21:23:51 -07:00
Changming Sun	7948a4b0bc	Revert "add header (#5648)" This reverts commit `d7f3baed18`.	2020-10-30 21:23:51 -07:00
KeDengMS	32bf6390ad	Some fixes to symbolic shape inference (#5642) * Some fixes to symbolic shape inference 1. Topological sort before iteration in graph 2. Fix a case in slice: start=100000, end=-100000, step=-1, dim=2 3. Fix Nuphar Gemm test's random seed 4. Slice opset 1 axes is optional	2020-10-30 19:28:47 -07:00
Hariharan Seshadri	7a80a4b526	Support more C# APIs (#5608)	2020-10-30 19:19:50 -07:00
Zhang Lei	17bce6f07e	Implement Im2colNd NHWC and related qlinearconv logic for u8s8. (#5612) Implement Im2colNd NHWC and related qlinearconv logic for u8s8, and training.	2020-10-30 15:28:30 -07:00
RandySheriffH	d7f3baed18	add header (#5648) Co-authored-by: RandySheriffH <rashuai@microsoft.com>	2020-10-30 14:26:10 -07:00
Changming Sun	3e71e8bd7e	Revert "[CUDA EP] remove per-thread allocator (#5415)" (#5647) This reverts commit `b4869926d3` because it broke our multiple GPU test pipeline.	2020-10-30 13:58:33 -07:00
RandySheriffH	2c63196600	Custom Op on GPU (#5620) * add case for cpu custom op on gpu * format doc * restrict GPU custom op on Linux GPU CI only * separate cu file to a independent project * fix typo Co-authored-by: RandySheriffH <rashuai@microsoft.com>	2020-10-30 12:25:44 -07:00
S. Manohar Karlapalem	aa38893afb	[OpenVINO-EP] Add Dockerfile with C# API bindings (#5633) * Update Dockerfile README with C# info * Add OpenVINO EP dockerfile with C# APIs	2020-10-30 11:27:15 -07:00
Weixing Zhang	aec4cb489e	ROCm EP for AMD GPU (#5480) The ROCm EP is designed and implemented based on AMD GPU software stack named ROCm. Here is the link for the details about ROCm: https://rocmdocs.amd.com/en/latest/ ROCm EP was created based on the following things: 1. AMD GPU programming language: HIP 2. AMD GPU HIP language runtime: amdhip64 3. BLAS: rocBLAS, hipBLAS 4. DNN: miOpen 5. Collective Communication library: RCCL 6. cub: hipCub 7. … Current status: BERT-L and GPT2 training can be ran on AMD GPU with data parallel. Next: 1. Make more GPU code be sharable between ROCm EP and CUDA EP since HIP language and HIP runtime API are very close to CUDA. 2. Continue improving the implementation. 3. Continue GPU kernel optimization. 4. Support model parallelism on ROCm EP. …… The rocm kernels have been removed from this commit and will be in a separate PR. Since the original PR was too big(~180 files), it was suggested to split the PR into two parts, one is rocm-kernels, the other is non rocm kernels. Co-authored-by: Weixing Zhang <wezhan@microsoft.com> Co-authored-by: sabreshao <sabre.shao@amd.com> Co-authored-by: anghostcici <11013544+anghostcici@users.noreply.github.com> Co-authored-by: Suffian Khan <sukha@microsoft.com> Co-authored-by: Edward Chen <18449977+edgchen1@users.noreply.github.com>	2020-10-29 17:13:04 -07:00
Dmitri Smirnov	742ffb860c	Allow Kernels refer to some attribute data directly in the protobuf (#5624) * Introduce OpKernelInfo GetAttrAsSpan() for floats and ints attribute proto arrays and GetAttrsStringRefs() to return a vector of string references. These new APIs allow kernels not copy attribute arrays especially if they are large and save on memory. but refer directly to data that is in AttributeProto. Modify TfIdfVectorizer to take advantage of the new API. Signed-off-by: Dmitri Smirnov <dmitrism@microsoft.com>	2020-10-29 16:12:54 -07:00
Vincent Wang	1fa1c51544	bug fix for name of gradient constant (#5626) Co-authored-by: Vincent Wang <weicwang@AiFramework2080ti2.corp.microsoft.com>	2020-10-30 07:08:19 +08:00
KeDengMS	b4869926d3	[CUDA EP] remove per-thread allocator (#5415) Now that we are using legacy default stream, which is shared among all inference threads, there is no need to have per-thread allocator. In the past, the race could happen when two threads running concurrently on GPU: thread1: allocA->copyA->computeA->freeA thread2: allocB->copyB->computeB->freeB Note that freeA/B only means the buffer is ready to be allocated on CPU, while the corresponding operation on GPU is not finished yet. It is possible for thread1/2 use the same buffer, when the alloc/free pair are not interleaved (note that alloc/free is thread-safe) If the GPU commands run in separate per-thread default stream, there's a chance that copyA/computeA are interleaved with copyB/computeB, even when the order in CPU execution is not interleaved. This would cause incorrect results if computeB uses copyA's results. By using one legacy default stream, CPU execution order would match the GPU execution order, so if A and B use the same buffer from alloc, the correpsonding copy/compute won't be interleaved. If the copy/compute is indeed interleaved, then allocA and allocB would return different buffers, thus no racing either.	2020-10-29 11:33:05 -07:00
Sergii Dymchenko	2e1fa3ccb7	Fix GeluRecompute for 2 inputs case. (#5573) * Add test for FastGelu + GeluRecompute. * Fix GeluRecompute for 2 inputs case. * Fix test for BiasGelu + GeluRecompute. * Copy all inputs to Gelu, not just 2. * Move GeluRecompute test to training-specific file.	2020-10-29 00:07:13 -07:00
Dwayne Robinson	b85e7a19ea	isalnum is not defined - include cctype (#5623)	2020-10-28 23:31:34 -07:00
Changming Sun	e6956be40c	Publish no-openmp python packages to test pypi (#5610) Publish no-openmp python packages to test pypi	2020-10-28 19:49:53 -07:00
Tracy Sharpe	b68e98e0b0	optimize QLinearConv depthwise convolutions (#5605)	2020-10-28 16:42:53 -07:00
liqunfu	5129b4d5bc	batch size tests (#5508) * batch size tests Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-28 15:55:40 -07:00
Rohith_Kvsp	50582abe93	Fix IS_ANDROID Issue (#5599) Fixed static IS_ANDROID detection final static IS_ANDROID is causing an Error Unsupport arch:aarch64, so removed IS_ANDROID & replaced with IS_ANDROID with isAndroid().	2020-10-28 14:42:33 -07:00
Ryan Lai	bbfd914d72	Skip new model test additions (#5611)	2020-10-28 13:27:49 -07:00
Juliana Franco	27c6d1eeb2	move variable declaration to avoid unused variable error (#5603) Co-authored-by: Juliana <jufranc@microsoft.com>	2020-10-28 09:23:58 -07:00
George Wu	0dbf3e8893	enable arena for arm64 (#5613)	2020-10-28 08:40:43 -07:00
Tim Harris	5e8952ef89	ThreadPool clean up : mm_pause in loops, correctly spin-then-wait, and adopt static methods consistently in the API (#5590) Description: This change makes three changes to the ThreadPool class to clean up issues identified during performance analysis and optimization. (1) It uses mm_pause intrinsics in spin loops, helping avoid consuming pipeline resources while waiting. (2) It re-organizes the spin-then-steal loop for work distribution to start out spinning as intended, rather than to start out trying to steal. (3) It updates the ThreadPool class's API to be consistent in the use of static methods for public functions. The PR includes minor doc updates and corresponding changes to test cases. Motivation and Context The change helps ensure consistency in behavior between the OpenMP and Eigen-based implementations. Unlike the instance methods, the static methods abstract over the different ways in which threading can be implemented; they will map onto the OpenMP or Eigen-based implementations when threading is used. When threading is not used they will run work sequentially.	2020-10-28 09:49:18 +00:00
liqunfu	92662659ba	Liqun/remove number matching (#5606) replace number matching with relaxed comparison in frontend tests Co-authored-by: liqun <liqun@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2020-10-27 21:27:37 -07:00


3664 commits