onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-31 23:27:43 +00:00

Author	SHA1	Message	Date
Ryan Hill	1b953c6423	Fix some code defects (#9810)	2021-11-19 15:48:15 -08:00
Sergii Dymchenko	ba339e667b	Add training performance investigation script (ONNX graph analyzer) (#9791) * Add first version of performance investigation script. * Simplify and update performance investigation script.	2021-11-19 13:27:00 -08:00
Tang, Cheng	fcc167dd47	fix reshape implementation in eager mode (#9741)	2021-11-18 19:26:49 -08:00
satyajandhyala	3af14fc554	Updated SoftmaxGrad and LogSoftmaxGrad to support version 13. (#9733) * Updated SoftmaxGrad_13/LogSoftmaxGrad_13 to support version 13.	2021-11-18 17:39:16 -08:00
Vincent Wang	3654a5d60e	Register Custom Symbolic of torch.einsum for ORTModule (#9590) * register custom symbolic for einsum * bugfix for case needs permute at the end * refactor * refactor equation parser * support new case, use ReduceProd * optimize perf and graph * remove some Gather node * add more ut, fix gemm trans fusion	2021-11-18 10:13:58 +08:00
satyajandhyala	421e4c03ce	Update default cast propagation strategy from None to FloodFill (#9713) * Changed the default cast propagation strategy from None to FloodFill.	2021-11-16 13:15:57 -08:00
Edward Chen	9acbfeba09	Address some code scan issues. (#9752)	2021-11-16 10:24:46 -08:00
Tang, Cheng	99257eb8e3	support build option to include external graph transformers (#9478) * temp code * support external graph transformer from build script * remove debug code * add test case * support register rewrite rule * fix source_group issue if external source does not share any common prefix * fix python code style checker * resolve merge conflict Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-11-15 08:16:20 -08:00
pengwa	6e09fc5152	Implement block wise softmax for reduction dimension > 1024 cases. (#9696) * implement block wise softmax for reduction dimension > 1024 cases. * fix builds * fix * fix amd build * fix amd build * fix win-gpu build * add tests * remove cudnn path/add python tests	2021-11-14 11:47:58 +08:00
Aidan Beggs	f6edf13513	Implement a Gemm/Sum fusion pattern (#9699) When the pattern Sum(Gemm(A, B), C) exists, we can convert it to Gemm(A, B, C), assuming that C the output of the original Gemm is not used elsewhere, and this change does not break broadcasting.	2021-11-11 18:33:13 -08:00
George Wu	1541784f6c	[python api] align api with other language bindings' treatment of explicit provider registrations. enforce use of providers param in python InferenceSession when execution providers other than default CPU are enabled. (#9712) * remove default python ep registration. raise exception if providers are not explicitly set if there are available providers * temporarily disable exception * fix python tests * explicitly set CUDAProvider for python iobinding tests * explicitly set providers param for InferenceSession() * onnxrt * raise ValueError if not explicitly set providers when creating InferenceSession * add required providers param * explicitly set providers * typo	2021-11-10 12:17:53 -08:00
Vincent Wang	adf98feb2c	ATenOp Support for BCEWithLogitsLoss (#9670)	2021-11-10 08:36:57 +08:00
Wei-Sheng Chin	bdc279a7ed	Use the same allocator following PyTorch (#9697) * Use the same allocator following PyTorch * Polish * Fix AMD build	2021-11-09 11:25:16 -08:00
satyajandhyala	229c9a4e1c	Added Trilu CUDA kernel. (#9633) * Added Trilu CUDA kernel. * Added TriluGrad. * Added a training testcase for Trilu. * Added Trilu gradient checker test.	2021-11-09 11:20:17 -08:00
mindest	c579ebfbc3	change a for iteration (#9678) Co-authored-by: Min Lin <linmin@microsoft.com>	2021-11-09 08:33:50 +08:00
Ryan Hill	24e35fba32	Change TensorShape to typically not allocate heap memory (#9542)	2021-11-08 10:29:54 -08:00
Xavier Dupré	7e207ba3be	Use ORTMODULE_ONNX_OPSET_VERSION to modify the opset version in OrtModule (#9529) * Use environment variable to change the ONNX opset in ORTModule * overwrite ONNX_OPSET_VERSION * store envvar in module constant	2021-11-08 17:03:16 +01:00
ashari4	1151c661eb	Add gi overload (#9690)	2021-11-07 16:04:00 -08:00
Edward Chen	ddb4c05852	Save graph runtime optimizations for minimal build (#9508) Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.	2021-11-04 10:49:46 -07:00
baijumeswani	230099e482	Make ORTModule serializable (#9634)	2021-11-03 13:54:05 -07:00
groenenboomj	5c56fa0def	Miopen conv grad (#9574) * Add source for conv_grad * Add sources for ROCm EP. * Transliterate sources for conv_grad for ROCm EP. * Add conv_grad to ROCm EP Add conv_grad to ROCm execution provider. * Update ROCm EP ConvGrad Update ConvGrad for the ROCm EP to match other EP changes and fix a build issue.	2021-10-31 11:19:46 -07:00
Hariharan Seshadri	b5f7bb7d10	Update ONNX (#9462)	2021-10-29 10:33:40 -07:00
Xavier Dupré	9c15c68ed4	Enable fallback when forward fails due to non contiguous tensor (#9369)	2021-10-28 13:04:54 -07:00
Thiago Crepaldi	5d5c03bcdc	Fix opset version change by not using copy of global constant (#9393)	2021-10-27 12:42:06 -04:00
satyajandhyala	f29057c7c0	Added TanhGrad. (#9507) * Added TanhGrad.	2021-10-26 09:10:03 -07:00
pengwa	b125446f9c	Optimize python overhead of APEX amp (#9447) * optimize python overhead of _post_amp_backward * overwrite apex amp's zero_grad for faster implementation * move unscale_fp16_grads_into_fp32_grads into C++ impl * improve the efficiency further, reducing 3.5ms to 1.7ms for unilm. * unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as soon as num_elem reaches the threshold. This helps reduce the CUDA idle time. * refine the logic a bit after validating Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2021-10-26 13:13:49 +08:00
ashbhandare	0270ff7951	Minor import fix (#9538)	2021-10-25 21:29:31 -07:00
Vincent Wang	fb4f7dbbb7	Call ATenOp for ReduceSum on ORTModule (#9471) * call ATenOp for ReduceSum * Enable ReduceSum ATenOp for training only * always load extension	2021-10-26 09:48:57 +08:00
Sherlock	3ed8ade675	Use SafeInt for malloc related computation (#9503) * Use SafeInt for malloc related computation	2021-10-22 16:42:12 -07:00
Wei-Sheng Chin	beddbdec5a	Fix PythonOp exporter (#9318) Register PythonOp exporter with the right symbol.	2021-10-22 10:45:45 -07:00
Wei-Sheng Chin	d2d480a0db	Allow None As Autograd Context (#9315) * Allow none ctx * Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py Co-authored-by: pengwa <pengwa@microsoft.com> * Address a comment Co-authored-by: pengwa <pengwa@microsoft.com>	2021-10-21 20:37:36 -07:00
Jeff Daily	66ceb6926d	rehipify ROCm EP files under orttraining (#9443) * rehipify rocm ep files under orttraining committed to source control * fix flake8 error	2021-10-21 13:36:21 -07:00
Xavier Dupré	5797bd6db3	Remove one unnecessary deepcopy in unflatten_user_output (#9353) * Removes one unnecessary deepcopy	2021-10-21 10:44:27 +02:00
Nick Kreeger	f1123c2fb3	Fix whitespace and style in concat.cc (#9452)	2021-10-20 12:43:46 -05:00
Changming Sun	406f1629c1	Remove Featurizers code (#9300)	2021-10-20 10:20:35 -07:00
baijumeswani	20eaed43e5	Ignore all string inputs to ORTModule AB#1310803 (#9344)	2021-10-19 16:34:47 -07:00
baijumeswani	757bc66720	Set cuda version to be None instead of an empty string (#9435)	2021-10-19 11:10:52 -04:00
baijumeswani	5da4e07daa	Make FusedAdam mathematically equivalent to Transformers AdamW (#9343)	2021-10-18 16:03:18 -07:00
pengwa	f05c285a58	Exception when duplicated autograd.Function name detected (#9351) * Exception when duplicated autograd.Function name detected * reorder a bit for slightly better perf * fix a bug in previous PR :( * correct the error message a bit	2021-10-15 12:23:13 +08:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Abhishek Jindal	23700a15a0	Abjindal/eager windows build (#9326) * removing warnings which are causing errors from torch and changing flags for Windows * adding MKL library resolution and comments * cleaning up the code * fixing onnxruntime_python file for windows build * fix the include order to avoid the python_d.lib issue on win debug build * changes for warnings, typos and other comments * merge conflict * adding fix for mkl library error * Revert "adding fix for mkl library error" This reverts commit `73b87c73c2`. * fix for dll path for windows * typo for dll path Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-10-14 12:54:49 -07:00
Xavier Dupré	22e3f8bf54	Refactor TrainingManager.forward (#9354) * Refactor TrainingManager.forward	2021-10-14 12:54:31 +02:00
pengwa	5ee47e3ffa	legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184) * megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional * add deepspeed zero1 and zero2 - checkoverflow & clip norm * re-structure code and add the copyright * update the document * refine the code after validation	2021-10-14 09:01:23 +08:00
Chandru Ramakrishnan	ba0cca96f0	Hooked up eager logging to ORT default logger. (#9340) * Hooked up eager logging to ORT default logger.	2021-10-13 18:10:32 -04:00
Tang, Cheng	f0bc35c4ba	fix a hardcoded type (#9337)	2021-10-12 13:44:46 -07:00
Tang, Cheng	48737091c0	resolve the provider options before create training session in orttrainer (#9199) * resolve the provider options before create training session in orttrainer * Update orttraining/orttraining/python/orttraining_pybind_common.h Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * support clear the training ep instance pool * fix status error Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-10-12 09:30:45 -07:00
ashbhandare	52c021d1f3	Fix export of aten op for Max and Avg Pool 2D (#9330)	2021-10-12 09:03:14 -07:00
Edward Chen	79e736ed25	Make onnxruntime::Status nodiscard (#9279) Mark onnxruntime::Status class with [[nodiscard]] attribute. Fix existing warnings.	2021-10-08 17:10:31 -07:00
satyajandhyala	29379db432	Added SigmoidGrad schema and kernels. (#9244) * Added SigmoidGrad schema and kernels. * Added test_sigmoid_grad function.	2021-10-08 11:03:28 -07:00
Tang, Cheng	68601fc296	error handling for eager mode's data transfer (#9261)	2021-10-07 17:16:33 -07:00


834 commits