onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-07-08 17:17:15 +00:00

Author	SHA1	Message	Date
Vincent Wang	173bcdbc71	[CUDA] Split/Concat Kernel Optimization (#12175 ) * split concat optimization * bugfix * fix ut * deprecate LooseVersion	2022-07-19 08:10:46 +08:00
Yulong Wang	ced7c2deac	[js/web] use windowed Chrome for perf mode (#12157 )	2022-07-18 14:04:27 -07:00
Tianlei Wu	b81b652608	Add --disable_shape_inference option to optimizer.py (#12215 )	2022-07-18 13:52:02 -07:00
Sean Murray	93229949d4	Fix bug where onnxruntime_USE_NCCL flag would default to ON (#12195 ) Fix bug where onnxruntime_USE_NCCL flag would default to ON, causing ORT to not build properly. New functionality: flag is ON when training is enabled and NCCL is not disabled. Flag is OFF otherwise	2022-07-18 12:13:08 -07:00
Tianlei Wu	17b84c78f7	remove identity in transformers model graph fusion (#12194 ) * remove identity in fusion	2022-07-18 09:59:42 -07:00
caoting-dotcom	4d38b84e26	Add file mapping for windows platform. (#12183 ) * Add file mapping for windows platform. * Add unit test for file mapping for windows. Also add an error message for mis-aligned offset * Add unit test for file mapping for windows. Also add an error message for mis-aligned offset * Update data type to avoid warnings * Compatible data type to avoid warnings. Update CreateFileMapping2 condition for winml compiling. * Add type conversion to avoid warnings for X86 release build. Co-authored-by: Ting Cao <ticao@microsoft.com>	2022-07-18 09:24:12 -07:00
leqiao-1	09af4a7fdd	remove wrong placed libs (#12201 )	2022-07-18 09:22:22 -07:00
Alexey Gladyshev	d31db1aa57	[TVM EP][CI] Integrate TVM EP into ORT public CI on Windows (#12161 ) * Integrate TVM EP into ORT public CI on Windows * empty commit for restart pylint * empty commit for restart pylint	2022-07-18 11:12:16 +02:00
msftlincoln	52095fb042	Fix line spacing/break issue, extend existing tests (#12191 ) * fix line length * extend test cases * lint	2022-07-15 19:32:34 -04:00
msftlincoln	a2dc6d32fc	OnnxRuntime Eager: Implement log_softmax with ONNX Ops (#12190 ) * share CHECK_STATUS * log_softmax	2022-07-15 15:03:08 -04:00
msftlincoln	9bca8405aa	bitwise_and ONNX support (#12189 ) * bitwise_and ONNX support * whitespace lint	2022-07-15 12:59:56 -04:00
Wil Brady	89bf6c9b5d	Simple eager training models (#12180 ) * Simple NN using ort, and added or modified ort op support.	2022-07-15 09:18:00 -04:00
msftlincoln	fafb24142f	add comment to explain local scalar dense (#12179 ) * add comment to explain local scalar dense * spacing	2022-07-15 09:03:43 -04:00
Viswanath Boga	05c31a036d	fixing positions for beam search gpt2 (#12156 ) * fixing positions for beam search gpt2 Co-authored-by: Tianlei Wu <tlwu@microsoft.com>	2022-07-14 13:31:59 -07:00
Wil Brady	9ebef91a6f	Update eager Readme.md (#12170 )	2022-07-14 06:05:50 -04:00
PeixuanZuo	7b53b223b8	[UPDATE] update AMD CI pipeline to Rocm5.2 with torch1.11 (#12162 ) * [UPDATE] update ci to rocm5.2 + torch1.11 * [Revert] disable ort module test * [DELETE] delete Rocm5.1.1 ci test result * [UPDATE] update the comments	2022-07-14 16:38:16 +08:00
Vincent Wang	a7eb9fe3ac	Remove Apex Dependency For Deepspeed FP16_Optimizer (#12077 ) * remove apex dependency * fix amd build	2022-07-14 11:15:53 +08:00
Wil Brady	5da1e5d36d	Eager mode: Fix some python warnings. (#12167 )	2022-07-13 20:24:42 -04:00
Maxiwell S. Garcia	51f8456c4d	ppc64le: Optimizing the MlasQLinearMulKernel() to use VSX instructions (#12051 )	2022-07-13 11:11:29 -07:00
Chen Fu	040c2f4517	x86/64 U8S8 Gemm Precision Fix (#12088 ) Add a graph optimization that converts u8s8 matrix multiplication to u8u8 if needed. x86/64 platforms, specifically SSE4.1, AVX2 and AVX512 CPUs, provide better performance computing u8s8 matrix multiplications. Unfortunately, the higher performance comes with value overflow problems, as described in: https://www.intel.com/content/www/us/en/develop/documentation/onednn-developer-guide-and-reference/top/advanced-topics/nuances-of-int8-computations.html In this change we added a session option "session.x64quantprecision" (default off). For operators that call u8s8 matrix multiplications, e.g. QAttention, we convert them to u8u8 when the following conditions are all satisfied: 1. Current CPU is SSE4.1, AVX2 or AVX512 with no VNNI support 2. Session option "session.x64quantprecision" is on. 3. Constant weight tensor contains values outside of [-64, 63] range Note that when weight tensor is not constant, QDQS8ToU8Transformer should already convert it to u8.	2022-07-13 10:12:25 -07:00
Wil Brady	48647bc7d7	Fix NonZero eager impl. (#12143 )	2022-07-13 05:50:33 -04:00
Valery Chernov	3b0aaa9e0e	[TVM EP] support build on Windows (#11851 ) * add description of build ORT+TVM EP on Windows * fix cmake error related to symlink creation on Windows * add llvm config path to build flags for correct build on Windows * update TVM_EP.md for llvm_config build arg * fix warnings skipping during build on Windows * fix using string or wstring for model path to correct build on Windows (MSVC error) * fix error in custom logger for correct build on Windows * implement glob algorithm for Windows * additional build fixes * update TVM with export of VM symbols for dll * description of nasm issue and workaround * update TVM with export of Executable from VM symbols for dll * description of installation of ipp-crypto dependencies on Windows * cmake key for ipp-crypto build * fix wstring for TVMso EP * fix ipp-crypto build * cmake key onnxruntime_TVM_USE_HASH switch off not specific methods, but full hash functionality * fix absolute path to compiled lib * update TVM_EP.md, fix lint warnings * update TVM_EP.md * small fixes after review * switch on handshake functionality for Linux workflow Co-authored-by: Valery Chernov <valery.chernov@deelvin.com> Co-authored-by: KJlaccHoeUM9l <wotpricol@mail.ru>	2022-07-13 10:48:42 +02:00
Scott McKay	75cf5dc2c9	Fix GH issue 12151 by using inverse perms for updating DQ axis attribute (#12158 ) * Fix GH issue 12151. Need to use inverse perms for updating that axis to what is used for transposing the input. This only applies if the DQ node is doing per-axis dequantization.	2022-07-13 18:02:58 +10:00
cloudhan	785f74979b	Rework cmake for kernel_explorer (#12079 ) Improve CMake for deep integration with ORT, so that we can easily hook ORT functions for microbenchmarking purposes.	2022-07-13 15:43:32 +08:00
PeixuanZuo	5579d81fc8	[add] Add operator gemmfastgelu for ROCM (#12101 ) * [ADD] add gemm fast gelu * [UPDATE] refunction matmul_impl * [Update] delete tuning_ in this pr * [FIX] code format * [FIX] compiler warning * [Update] update doc	2022-07-13 15:40:16 +08:00
jingyanwangms	a9d0d3323e	Use updated symbolic_helper.check_training_mode (#11900 ) Co-authored-by: Jingyan Wang, Baiju Meswani	2022-07-12 17:26:06 -07:00
RandySheriffH	178a413ca1	List 3.10 as supported python version and remove 3.6 (#12141 ) list 3.10 as supported python version and remove 3.6 Co-authored-by: Randy Shuai <rashuai@microsoft.com>	2022-07-12 15:28:30 -07:00
Adam Pocock	e0ed9f0f2f	[java] First part of the JNI error handling rewrite (#12013 ) Description: This fixes error handling in the JNI code in OnnxMap, OnnxSequence, OnnxRuntime, RunOptions. SessionOptions and OrtEnvironment are correct as is. The bulk of the work will be in rewriting OnnxTensor, OnnxSparseTensor (after the merge of #10653) and OrtSession, along with the helper methods in OrtJniUtil. I plan to tackle those in separate PRs to reduce the amount of code to review. Motivation and Context - Why is this change required? What problem does it solve? The current native interop code doesn't return control to Java immediately on throwing an exception from an ORT error code, which can cause incorrect interactions with native ORT, and issues with exception propagation on the Java side. - If it fixes an open issue, please link to the issue here. Partial work towards solving #11451.	2022-07-12 15:16:54 -07:00
msftlincoln	a6fd1a3b85	Eager mode generator improvements for multiple onnx operators and extra test cases (#12111 ) * test case for masked_select * isolate variables per onnx_op, include line numbers for ORT errors * format errors * correct masked_select impl, broadcast test * node attrs naming fixed	2022-07-12 16:05:09 -04:00
Edward Chen	6e051016c1	Add Python package to perf test pipeline. (#12135 )	2022-07-12 10:50:24 -07:00
LironKesem	9647a3be40	Add tests for all unary aten ops supported in eager mode (#12087 ) * Add tests for all unary aten ops supported in eager mode * fixing the PR draft * fixing the merge * changing eval to be at compile time * adding requirements for eager * 1.adding function to {ops}_out 2.cleaning the code and adding comments * editing the code according to code review Co-authored-by: root <root@AHA-LIRONKESE-1>	2022-07-12 08:53:19 -04:00
Hariharan Seshadri	73310b2a0f	Fix Reduced Ops build pipeline (#12144 ) Fix ReducedOps build pipeline	2022-07-11 19:02:38 -07:00
Carson Swope	c675c4750a	include coreml_provider_factory.h in macos build instead of coreml_ex… (#12138 ) include coreml_provider_factory.h in macos build instead of coreml_execution_provider.h	2022-07-11 18:27:01 -07:00
Dwayne Robinson	742f843efc	RoiAlign CPU EP add warning for max mode with samples != 1 (#12136 ) * RoiAlign add warning about incorrect max summation when sample size not 1	2022-07-11 17:44:41 -07:00
Wil Brady	f1047e0456	Fix minor python and cpp warnings from previous PR. (#12140 ) Description: In PR 12018 a few fixable python and cpp warnings were introduced that this PR cleans up. Also adding a comment on the intent of test_mul_bool and out testing on test_ones. Motivation and Context: When iterating in Python, use a list instead of a set and don't use reserved words. Fix long line in cpp. Clarify test_mul_bool intent for future developers. fill_ implements torch.ones under the covers but in the previous PR verification on the out param was not added, so adding it here.	2022-07-11 16:18:40 -04:00
Preetha Veeramalai	99a370dd02	Update readme for OVEP (#12122 ) * Add changes for training module in Readme * Update ReadMeOV.rst	2022-07-11 10:54:12 -07:00
Wil Brady	418cfdc766	Update create_ort_attribute to set the tensor dimension and value correctly. Implement eager fill_ (#12018 ) * Update create_ort_attribute to set the tensor dimension and value correctly. * Eager mode support for fill_ and mm.out (mm uses mm.out).	2022-07-11 11:18:04 -04:00
PeixuanZuo	1c39d22f4e	[ADD] Rocm5.2 for Rocm python packaging pipeline (#12129 ) [ADD] rocm5.2	2022-07-11 11:10:45 +08:00
Ashwini Khade	c6732c079b	pin protobuf version to be compatible with onnx (#12132 ) Co-authored-by: Ashwini Khade <askhade@microsoft.com@orttrainingdev10.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-07-08 15:01:27 -07:00
Yulong Wang	d45c1a144e	[js/rn] Support UINT8 type for onnxruntime-react-native on Android (#12112 ) * support uint8 for react native * add test	2022-07-08 14:07:46 -07:00
Wil Brady	c04afae9a9	Add eager ops for unary ops with out. (#12106 )	2022-07-08 12:09:26 -04:00
Jeff Bloomfield	2dd69cc3d9	Prevent unbounded growth of command allocator memory (#12114 ) Prevent unbounded growth of command allocator memory	2022-07-07 19:55:06 -07:00
Yulong Wang	3ce25db7eb	[js/rn] optimize exception message on Android (#12113 )	2022-07-07 13:26:50 -07:00
PeixuanZuo	b50239251d	[FIX] Add required variable for Rocm packaging ci pipeline (#12118 ) [fix] packaging ci compiler error [FIX] pipeline variable [Frevert] fix compiler	2022-07-07 11:36:26 -07:00
zhangyaobit	a9b9c7f69f	Add autotuning support to FastGelu (#12093 ) * Add autotuning for FastGelu (Draft). * Clean up. * delete unused header file * Fix lint errors. * Add missing template parameter. * Improvements. * Fix type. * Fix namespace issue.	2022-07-06 23:17:48 -07:00
Hubert Lu	dbcf54aa41	Add hipified SkipLayerNorm code for ROCmEP (#12107 ) * First attempt for half2 vectorized memory access in SkipLayerNorm * Add some functions for debugging * Clean up the code * Clean up the code * Generalize the vectorized kernels with aligned_vector and remove cudaDeviceProp * Add a unit test for a larger input size * Fix some Lint C++ warnings * Use ILP = 4 for the vectorized kernels * Rewrite the vectorized kernel and templatize ComputeSkipLayerNorm * Use conditional operator for input_v * Refactor LaunchSkipLayerNormKernel and replace the original SkipLayerNormKernelSmall with the vectorized kernel * Clean some comments and rename the layernorm function * Use ComputeSkipLayerNorm to replace LaunchSkipLayerNormKernel * Resolve a Lint C++ warning * Fix SkipLayerNormBatch1_Float16_vec output data * Add hipified code of bert SkipLayerNorm for ROCmEP * Resolve some Lint C++ warnings * Resolve some Lint C++ warnings * Resolve some Lint C++ warnings * Resolve Python formatting issue	2022-07-06 22:13:11 -07:00
Yufeng Li	97b03fedff	check consumers of dq node before swap dq and transpose (#12099 ) * check consumers of dq node before swap dq and transpose * add unit test	2022-07-06 11:11:38 -07:00
ytaous	446f899fed	[ROCm] Temp disable AMD UT (#12105 ) temp disable UT Co-authored-by: Ethan Tao <ettao@microsoft.com@orttrainingdev7.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>	2022-07-06 11:08:26 -07:00
Edward Chen	bd76e21fb3	Add pipeline for building perf test binaries. (#12067 ) Add initial pipeline for building perf test binaries. It only builds Android binaries now but can be expanded later.	2022-07-06 09:42:49 -07:00
Wil Brady	1948b7c726	Add eager support for eq and ne ops. (#12031 ) * Add eager support for aten::eq and aten:ne. * Add generator support for resizing output param.	2022-07-06 12:39:04 -04:00

7012 commits