onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-16 18:31:27 +00:00

Author	SHA1	Message	Date
Hubert Lu	dbcf54aa41	Add hipified SkipLayerNorm code for ROCmEP (#12107 ) * First attempt for half2 vectorized memory access in SkipLayerNorm * Add some functions for debugging * Clean up the code * Clean up the code * Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp * Add a unit test for a larger input size * Fix some Lint C++ warnings * Use ILP = 4 for the vectorized kernels * Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm * Use conditional operator for input_v * Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel * Clean some comments and rename the layernorm function * Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel * Resolve a Lint C++ warning * Fix SkipLayerNormBatch1_Float16_vec output data * Add hipified code of bert SkipLayerNorm for ROCmEP * Resolve some Lint C++ warnings * Resolve some Lint C++ warnings * Resolve some Lint C++ warnings * Resolve Python formatting issue	2022-07-06 22:13:11 -07:00
Yufeng Li	97b03fedff	check consumers of dq node before swap dq and transpose (#12099 ) * check consumers of dq node before swap dq and transpose * add unit test	2022-07-06 11:11:38 -07:00
ytaous	446f899fed	[ROCm] Temp disable AMD UT (#12105 ) temp disable UT Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-07-06 11:08:26 -07:00
Edward Chen	bd76e21fb3	Add pipeline for building perf test binaries. (#12067 ) Add initial pipeline for building perf test binaries. It only builds Android binaries now but can be expanded later.	2022-07-06 09:42:49 -07:00
Wil Brady	1948b7c726	Add eager support for eq and ne ops. (#12031 ) * Add eager support for aten::eq and aten:ne. * Add generator support for resizing output param.	2022-07-06 12:39:04 -04:00
Edward Chen	07b0469a23	Fix unused function warning for decodeMIDR(). (#12069 ) Changed from static function defined in header to function declared in header and defined in separate .cc file.	2022-07-06 09:18:01 -07:00
Hubert Lu	835ecb264d	Leverage vectorized load/write for SkipLayerNorm (#11803 ) * First attempt for half2 vectorized memory access in SkipLayerNorm * Add some functions for debugging * Clean up the code * Clean up the code * Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp * Add a unit test for a larger input size * Fix some Lint C++ warnings * Use ILP = 4 for the vectorized kernels * Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm * Use conditional operator for input_v * Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel * Clean some comments and rename the layernorm function * Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel * Resolve a Lint C++ warning * Fix SkipLayerNormBatch1_Float16_vec output data	2022-07-05 22:28:15 -07:00
ytaous	7b8f45dd60	[ROCm] Enable build option for autograd (#11945 ) * add autograd build option * disable UTs * disable UTs * UT-step1 * UT-step1 * UT-step2 * UT-step2 * UT-step2 * UT-step2 * UT-step2 * UT-step2 * Fix UTs * increase shm * code clean up Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-07-05 18:11:29 -07:00
Dwayne Robinson	32a8751dc4	DML EP Update to DML 1.9 (#12090 ) * Update to DML 1.9 * Appease obnoxious Python formatting tool	2022-07-05 16:30:54 -07:00
Yufeng Li	3446a3750c	generate quantization parameter for outputs (#12089 )	2022-07-05 14:57:43 -07:00
Wenbing Li	479e71a7a8	enable the extensions custom build for java and android (#11823 )	2022-07-05 10:34:14 -07:00
Scott McKay	c20cbf0c97	Add undocumented attribute to disable generation of Java bindings from the Android AAR. (#12075 ) The generated bindings causes C# build errors that require workaround code. Disabling generation should avoid the need for any workarounds. As the user has the C# ORT package with the C# to C bindings there's no need for binding generation that calls the ORT Java API (which is C# -> Java ->C).	2022-07-05 10:29:32 -07:00
zhangyaobit	ddb6202df7	Add op tuning functionality and an example for vector add. (#12060 ) * Add op tuning functionality and example for vector add. * Add namespace. * Various improvements. * use unique pointer * fix lint errors * Check return error.	2022-07-03 21:12:04 -07:00
Hariharan Seshadri	df712d80ca	Add data type check in ConvAddRelu fusion (#12058 )	2022-07-01 15:31:15 -07:00
Justin Stoecker	57ac3d0a61	Disable DML command list reuse for Xbox (#12063 ) disable cl reuse for xbox	2022-07-01 13:22:35 -07:00
Jameson Miller	ae88f43550	Eager mode: structure for supporting out= operators (#12066 ) * Add utility methods for resize_output * Eager mode: implement abs.out This is an initial hand written implementation of an out= operator to demonstrate how to structure out= methods using resize_out helper methods. This is meant to be used as a reference when we update the code generator to generate implementations for out= operations.	2022-07-01 13:35:12 -04:00
Ye Wang	8dc8d44087	remove --disable_iobinding for trt ep benchmark (#12053 ) Update run_benchmark.sh	2022-07-01 10:33:35 -07:00
Gary Miguel	043816f895	Make C# runtest.sh automatically set latest opset (#12039 ) * Update C# runtest.sh for opset 17 Should have been part of https://github.com/microsoft/onnxruntime/pull/11924 * get appropriate opset version from onnx doc * use absolute rather than relative path * fix typo in var name	2022-07-01 10:12:33 -07:00
Jeff Bloomfield	02b9b12127	Fix DML custom operators which set descriptor heap to command list (#12059 )	2022-07-01 09:49:23 -07:00
Scott McKay	bfe1eca10c	Add targets files for new .net6 frameworks (#12016 ) * Add net6 targets. Remove maccatalyst as we don't have a native build targetting that. * Set platform in macos targets * Add targetFramework entries * Move NativeLib.DllName definition and set using preprocessor values for simplicity. Couldn't get it to build with the preprocessor based setup when it was in a separate file. Update the nuspec generation to set platform version for .net6 targets. TODO: Validate versions. I copied them from the managed nuget package the packaging pipeline generated prior to adding targets. Possibly w could/should lower some of the versions. Hopefully the need to specify a version goes away when the release version of VS2022 supports .net6. * Try android 31.1 as https://github.com/actions/virtual-environments/blob/main/images/win/Windows2022-Readme.md suggests that should be available on the CI machines * Fix patch version mismatch Add some extra debug info in case it helps * Debug nuget location in CI * Add workspace entry back in * Add steps * One more attempt with hardcoded nuget.exe path and original android31.0 version * Better fix - found explicit nuget download and updated version there. * flake8 fixes * Fix black complaints. * Exit Microsoft_ML_OnnxRuntime_CheckPrerequisites for net6 iOS. * Removed outdated comment	2022-07-01 09:13:55 -07:00
Jameson Miller	3e6b8d159a	Eager mode: implement resize_ operation (#12004 ) Add support for PyTorch `resize_` operation. The PyTorch API method is documented here: https://pytorch.org/docs/stable/generated/torch.Tensor.resize_.html Implementation notes: There are some implementation details that might deviate from expectations: - As the Onnxruntime::tensor does not support resize operation, this functionality is supported on the TensorImpl by swapping out the backing tensor if the size changes. - In the ORT model the shape of the TensorImpl is defined by the backing onnxruntime::tensor, so it is not supported to have a TensorImpl with a different shape / size than the backing onnxruntime::tensor. This means when resizing to a smaller TensorImpl, other implementations might keep the same backing storage, ORT will re-allocate a new onnxruntime::tensor and copy over as many of the existing elements that fit. Functionally, you will end up with same output, but the underlying buffer will be re-allocated. A future change could be to allow ORTTensorImpl to have a different size / shape than the onnxrutime::tensor backing it, and then we could improve this behavior. The canonical CPU / CUDA implementations in PyTorch repository: CPU: aten/src/ATen/native/Resize.cpp CUDA: aten/src/ATen/native/cuda/Resize.cpp	2022-06-30 22:14:37 -04:00
RandySheriffH	b858c2f725	Extend lifetime of KernelDef when creating a standalone op (#12057 ) place tmp kernel def as local variable to cover the lifetime of kernel creation	2022-06-30 17:38:59 -07:00
Hariharan Seshadri	2e27a7e330	Skip Constant Folding for ops producing an optional type output (#11839 )	2022-06-30 13:38:35 -07:00
Wil Brady	0fa2041f68	Add eager support for aten:: equal. (#12020 )	2022-06-30 15:46:14 -04:00
Wei-Sheng Chin	0ee0b8cf18	Disable sequence-type tests since C# infra doesn't support well (#12037 )	2022-06-30 09:49:03 -07:00
zhangyaobit	da133ad3d8	Add FastGelu to kernel explorer for profiling. (#11995 ) * Add FastGelu to kernel explorer for profiling. * fix python lint errors * Fix one more python lint error * Delete white space (python lint) * Various improvements. * Update README.md * refactor header files	2022-06-30 07:35:43 -07:00
Wil Brady	fdf12a5c35	Fix windows eager build break by pinning to torch version 1.11.0 (#12033 ) Fix windows and linux eager build to torch 1.11.0.	2022-06-30 07:01:13 -04:00
Vincent Wang	04f7c2deda	FP16_Optimizer Support for more Deepspeed Versions (#12046 ) * fp16_optimizer for more ds versions * change ds version * bugfix * fix bug	2022-06-30 18:36:17 +08:00
Tianlei Wu	ecca6f4d16	Move beamsearch shared initializers from subgraphs to main graph (#12025 ) * move shared initializers to parent graph * add --disable_shared_initializers	2022-06-29 22:43:41 -07:00
zhijxu	9f260fb60f	resolve comments	2022-06-30 11:26:13 +08:00
zhijxu	100aebbd26	resolve comments	2022-06-30 11:26:13 +08:00
zhijxu	2295b24cd5	support optimizer opt for deepspeed 0.5.9	2022-06-30 11:26:13 +08:00
George Wu	102d01b206	update roialign cuda impl to onnx opset16 (#12036 ) * roialign opset16 * fix * fix	2022-06-29 17:32:59 -07:00
Yi-Hong Lyu	c8cd36da01	Resize optimization for all architectures (#11956 ) With this patch, it optimizes Resize when the input X is 4D int8/uint8 tensor and the mode is linear by: * Transforming NCHW Resize to NHWC variant * Using the NHWC Resize kernel without floating-point computation It improves DeepLab V3 with uint8 quantization by 19% on X64. It also improves Resize of DeepLab V3 with int8 quantization by 15%~18% on X64.	2022-06-29 09:19:19 -07:00
Chun-Wei Chen	4eb54ff9a5	Add warning about future computation change for ConvTranspose with auto_pad (#11984 ) * Add warning about future computation change for Convtranspose with auto_pad * improve msg * update TODO to make lint happy * update more contents for warning and add if * valid was not infected * move it into kernel registration * parse auto_pad myself * try to use conv_transpose_attrs_.auto_pad directly	2022-06-29 06:53:31 -07:00
Valery Chernov	8ba8146650	[TVM] handshake mechanism for support of TVMso EP (#11437 ) * infrastructure for handshake mechanism was implemented. sha256 was selected as first hash algorithm * check hash during compile in TVMso EP * add IPP-CRYPTO to external dependencies for TVM EP * made checkHash method constant * removed the public implementation of the SHA-256 algorithm so as not to cause a license conflict * implemented SHA-256 calculation using ipp-crypto library * fix dependency for ipp-crypto * add provider options for hash check * update documentation for added provider options * add hash check condition * fix docs * fix lint * fix ORT_THROW Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2022-06-29 14:57:18 +02:00
dependabot[bot]	c0dd9be7ba	Bump electron from 13.6.6 to 15.5.5 in /js/web (#11884 ) Bumps [electron](https://github.com/electron/electron) from 13.6.6 to 15.5.5. - [Release notes](https://github.com/electron/electron/releases) - [Changelog](https://github.com/electron/electron/blob/main/docs/breaking-changes.md) - [Commits](https://github.com/electron/electron/compare/v13.6.6...v15.5.5) --- updated-dependencies: - dependency-name: electron dependency-type: direct:development ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-28 15:50:44 -07:00
Yosshi999	0702364d7a	[js/web][bugfix] fix negative axes for unsqueeze (#11944 ) [js/web] fix negative axes for unsqueeze	2022-06-28 11:28:35 -07:00
Tianlei Wu	9be2b6046b	convert_beam_search supports large gpt2 model (#11989 ) (1) add --run_shape_inference to make shape inference optional (2) add --vocab_mask to make the input optional (3) add --overwrite in gpt2 convert_to_onnx to allow overwrite existed raw onnx from PyTorch (4) save gpt2 model tensors to one external data file by default (5) group convert_beam_search arguments to multiple groups (6) make --decoder_onnx optional for gpt2 model (7) replace print by logger (8) update shape inference function to support external data. (9) when saving external data, show warning if onnx version < 1.12	2022-06-28 10:02:35 -07:00
sumitsays	4552dd38c6	[DML EP] Pad operator: Handle negative pad counts (#11974 ) * Pad fallback to CPU * Added queryPad in operatorRegistration.cpp * Acknowledged PR comments * Used any_of * used none_of instead of any_of Co-authored-by: Sumit Agarwal <sumitagarwal@microsoft.com>	2022-06-28 00:41:57 -07:00
RandySheriffH	d5fcb432fa	Generalize native op creation (#11539 ) * create op from ep * read input count from context * create holder to host nodes * fix typo * cast type before comparison * throw error on API fail * silence warning from minimal build * switch to unique_ptr with deleter to host nodes * fix typo * fix build err for minimal * fix build err for minimal * add UT for conv * enable test on CUDA * add comment * fix typo * use gsl::span and string view for Node constructor * Added two APIs - CopyKernelInfo and ReleaseKernelInfo * pass gsl::span by value * switch to span<NodeArg* const> to allow for reference to const containers * fix typo * fix reduced build err * fix reduced build err * refactoring node construction logic * rename exceptions * add input and output count as arguments for op creation * refactor static member * use ORT_CATCH instead of catch * cancel try catch * add static value name map * format input definition and set err code * fix comments * fix typo	2022-06-27 21:12:15 -07:00
Dwayne Robinson	fc0143fe68	DML EP ResNet50 opset 15 fails in ONNX checker for FusedBatchNormalization lacking training_mode attribute (#12010 ) FusedBatchNormalization include training_mode attribute	2022-06-27 19:41:34 -07:00
Edward Chen	f045994389	[NNAPI EP] Update NNAPI headers (#11954 ) Update the NNAPI headers to a more recent version (copied from TF Lite v2.9.1).	2022-06-27 18:54:06 -07:00
Edward Chen	466b2d9f3d	[C# Tests] Add support for double tensor output in TestPreTrainedModels. (#12008 ) Add support for double tensor output in TestPreTrainedModels.	2022-06-27 18:49:19 -07:00
Sheil Kumar	7d712c8f8b	Fix WinML Tests are still targetting deprecated (deleted) experimental signal op definitions (#12006 ) * fix winml tests * remove legacy test * switch idft -> dft+inverse attr * upgrade opset 13->17 for signal ops tests	2022-06-27 16:35:50 -07:00
Yulong Wang	bd973bcf1e	[js/rn] upgrade dependencies for e2e test (#11863 ) * [js/rn] upgrade dependencies for e2e test * use JDK11 only for gradle * expand variable	2022-06-27 14:56:49 -07:00
Dwayne Robinson	8cd02508c8	Include opset 15 in Conv+BatchNormalization fusion (#11960 )	2022-06-27 10:59:14 -07:00
dependabot[bot]	68afa2d362	Bump async from 2.6.3 to 2.6.4 in /js/react_native/e2e (#11280 ) Bumps [async](https://github.com/caolan/async) from 2.6.3 to 2.6.4. - [Release notes](https://github.com/caolan/async/releases) - [Changelog](https://github.com/caolan/async/blob/v2.6.4/CHANGELOG.md) - [Commits](https://github.com/caolan/async/compare/v2.6.3...v2.6.4) --- updated-dependencies: - dependency-name: async dependency-type: indirect ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>	2022-06-27 10:30:01 -07:00
George Nash	9583841ef7	Improve performance of BiasGelu on oneDNN execution provider (#11935 ) Improve performance of BiasGelu on OneDNN execution provider This modifies how BiasGelu is handled by the OneDNN execution provider by executing the gelu_erf primitive as a postop of the binary_add primitive. Also fixes extra data copies made when running on GPU. Signed-off-by: George Nash <george.nash@intel.com>	2022-06-27 08:34:11 -07:00
Scott McKay	f72288b453	Fix a couple of typos (#11943 ) Fix couple of typos	2022-06-27 10:32:14 +10:00

6967 commits