onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-31 23:27:43 +00:00

Author	SHA1	Message	Date
Edward Chen	ddb4c05852	Save graph runtime optimizations for minimal build (#9508 ) Add support for saving graph runtime optimizations in an ORT format model. The idea is to allow some optimizations to be "replayed" at runtime in a minimal build. The replaying part will be in a future change.	2021-11-04 10:49:46 -07:00
baijumeswani	230099e482	Make ORTModule serializable (#9634 )	2021-11-03 13:54:05 -07:00
groenenboomj	5c56fa0def	Miopen conv grad (#9574 ) * Add source for conv_grad * Add sources for ROCm EP. * Transliterate sources for conv_grad for ROCm EP. * Add conv_grad to ROCm EP Add conv_grad to ROCm execution provider. * Update ROCm EP ConvGrad Update ConvGrad for the ROCm EP to match other EP changes and fix a build issue.	2021-10-31 11:19:46 -07:00
Hariharan Seshadri	b5f7bb7d10	Update ONNX (#9462 )	2021-10-29 10:33:40 -07:00
Xavier Dupré	9c15c68ed4	Enable fallback when forward fails due to non contiguous tensor (#9369 )	2021-10-28 13:04:54 -07:00
Thiago Crepaldi	5d5c03bcdc	Fix opset version change by not using copy of global constant (#9393 )	2021-10-27 12:42:06 -04:00
satyajandhyala	f29057c7c0	Added TanhGrad. (#9507 ) * Added TanhGrad.	2021-10-26 09:10:03 -07:00
pengwa	b125446f9c	Optimize python overhead of APEX amp (#9447 ) * optimize python overhead of _post_amp_backward * overwrite apex amp's zero_grad for faster implementation * move unscale_fp16_grads_into_fp32_grads into C++ impl * improve the efficiency further, reducing 3.5ms to 1.7ms for unilm. * unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as long as num_elem reach threshold. This helps reduce the CUDA idle time. * refine the logic a bit after validating Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2021-10-26 13:13:49 +08:00
ashbhandare	0270ff7951	Minor import fix (#9538 )	2021-10-25 21:29:31 -07:00
Vincent Wang	fb4f7dbbb7	Call ATenOp for ReduceSum on ORTModule (#9471 ) * call ATenOp for ReduceSum * Enable ReduceSum ATenOp for training only * always load extension	2021-10-26 09:48:57 +08:00
Sherlock	3ed8ade675	Use SafeInt for malloc related computation (#9503 ) * Use SafeInt for malloc related computation	2021-10-22 16:42:12 -07:00
Wei-Sheng Chin	beddbdec5a	Fix PythonOp exporter (#9318 ) Register PythonOp exporter with the right symbol.	2021-10-22 10:45:45 -07:00
Wei-Sheng Chin	d2d480a0db	Allow None As Autograd Context (#9315 ) * Allow none ctx * Update orttraining/orttraining/test/python/orttraining_test_ortmodule_autograd.py Co-authored-by: pengwa <pengwa@microsoft.com> * Address a comment Co-authored-by: pengwa <pengwa@microsoft.com>	2021-10-21 20:37:36 -07:00
Jeff Daily	66ceb6926d	rehipify ROCm EP files under orttraining (#9443 ) * rehipify rocm ep files under orttraining committed to source control * fix flake8 error	2021-10-21 13:36:21 -07:00
Xavier Dupré	5797bd6db3	Remove one unnecessary deepcopy in unflatten_user_output (#9353 ) * Removes one unnecessary deepcopy	2021-10-21 10:44:27 +02:00
Nick Kreeger	f1123c2fb3	Fix whitespace and style in concat.cc (#9452 )	2021-10-20 12:43:46 -05:00
Changming Sun	406f1629c1	Remove Featurizers code (#9300 )	2021-10-20 10:20:35 -07:00
baijumeswani	20eaed43e5	Ignore all string inputs to ORTModule AB#1310803 (#9344 )	2021-10-19 16:34:47 -07:00
baijumeswani	757bc66720	Set cuda version to be None instead of an empty string (#9435 )	2021-10-19 11:10:52 -04:00
baijumeswani	5da4e07daa	Make FusedAdam mathematically equivalent to Transformers AdamW (#9343 )	2021-10-18 16:03:18 -07:00
pengwa	f05c285a58	Exception when duplicated autograd.Function name detected (#9351 ) * Exception when duplicated autograd.Function name detected * reorder a bit for a bittle bit better perf * fix a bug in previous PR :( * correct the error message a bit	2021-10-15 12:23:13 +08:00
Jeff Daily	c8789d3047	[ROCm] static re-hipify of CUDA EP to ROCm EP, now a shared provider (#8877 ) * re-hipify all rocm EP sources * fix all other files affected by re-hipify * add cuda_provider_factory.h to amd_hipify.py * do not use cudnn_conv_algo_search in ROCm EP, missing reduce min registration * Fix ReduceConsts template specialization introduced in #9101. Fixes the error when building for ROCm 4.3.1: error: too many template headers for onnxruntime::rocm::ReduceConsts<__half>::One (should be 0) * fix flake8 error in amd_hipify.py * speed up hipify with concurrent.futures * flake8 fix in amd_hipify.py	2021-10-14 15:15:51 -07:00
Abhishek Jindal	23700a15a0	Abjindal/eager windows build (#9326 ) * removing warnings which are causing errors from torch and changing flags for Windows * adding MKL library resolution and comments * cleaning up the code * fixing onnxruntime_python file for windows build * fix the include order to aovid the python_d.lib issue on win debug build * changes for warnings, typos and other comments * merge conflict * adding fix for mkl library error * Revert "adding fix for mkl library error" This reverts commit `73b87c73c2`. * fix for dll path for windows * typo for dll path Co-authored-by: Cheng Tang <chenta@microsoft.com>	2021-10-14 12:54:49 -07:00
Xavier Dupré	22e3f8bf54	Refactor TrainingManager.forward (#9354 ) * Refactor TrainingManager.forward	2021-10-14 12:54:31 +02:00
pengwa	5ee47e3ffa	legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184 ) * megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional * add deepspeed zero1 and zero2 - checkoverflow & clip norm * re-structure code and add the copyright * update the document * refine the code after validation	2021-10-14 09:01:23 +08:00
Chandru Ramakrishnan	ba0cca96f0	Hooked up eager logging to ORT default logger. (#9340 ) * Hooked up eager logging to ORT default logger.	2021-10-13 18:10:32 -04:00
Tang, Cheng	f0bc35c4ba	fix a hardcode type (#9337 )	2021-10-12 13:44:46 -07:00
Tang, Cheng	48737091c0	resolve the provider options before create training session in orttrainer (#9199 ) * resolve the provider options before create training session in orttrainer * Update orttraining/orttraining/python/orttraining_pybind_common.h Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com> * support clear the training ep instance pool * fix status error Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-10-12 09:30:45 -07:00
ashbhandare	52c021d1f3	Fix export of aten op for Max and Avg Pool 2D (#9330 )	2021-10-12 09:03:14 -07:00
Edward Chen	79e736ed25	Make onnxruntime::Status nodiscard (#9279 ) Mark onnxruntime::Status class with [[nodiscard]] attribute. Fix existing warnings.	2021-10-08 17:10:31 -07:00
satyajandhyala	29379db432	Added SigmoidGrad schema and kernels. (#9244 ) * Added SigmoidGrad schema and kernels. * Added test_sigmoid_grad function.	2021-10-08 11:03:28 -07:00
Tang, Cheng	68601fc296	error handling for eager mode's data transfer (#9261 )	2021-10-07 17:16:33 -07:00
ytaous	7166586d7e	Enable SkipCheck by default (#9215 ) * Enable SkipCheck by default * fix UTs * fix UT * fix UTs * fix UTs * address comments * fix UT * enable skipchecks * move _SkipCheck back * move _SkipCheck back * move _SkipCheck back * Update orttraining/orttraining/python/training/ortmodule/_inference_manager.py * Update orttraining/orttraining/python/training/ortmodule/_utils.py Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net> Co-authored-by: Thiago Crepaldi <thiago.crepaldi@microsoft.com>	2021-10-07 15:47:14 -07:00
Tang, Cheng	c002dc86a3	set mpi group init flag after add group (#9293 )	2021-10-07 10:09:16 -07:00
Thiago Crepaldi	52d067402a	Fix all-or-nothing fallback for bad ORTModule init (#9277 ) * Fix all-or-nothing fallback for bad ORTModule init * Address comments	2021-10-06 15:12:27 -04:00
baijumeswani	bcdb411c8d	Implement FusedAdam for ORT adapted from DeepSpeed (#9266 )	2021-10-05 20:50:34 -07:00
ashbhandare	35c2102cfa	Fixes for GatherND, Multinomial (#9143 ) * register gathernd kernel, aten multinomial * fix CI, add test * review comments	2021-10-05 14:51:58 -07:00
G. Ramalingam	0b77c9ca7c	Cleanup function definitions of contrib ops (#9265 ) * Simplify function definitions * Simplify fast-gelu function definition * Simplify training function op body definitions Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Eliminate redundant function Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Formatting changes Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Minor formatting changes Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Add comment Signed-off-by: Ganesan Ramalingam <grama@microsoft.com> * Specify int64 type for constant 1 Signed-off-by: Ganesan Ramalingam <grama@microsoft.com>	2021-10-05 11:38:42 -07:00
Thiago Crepaldi	6e2f66ee9c	Allow custom exporter args + bug fix (#9242 )	2021-10-04 11:32:42 -04:00
baijumeswani	45399d5ace	Remove TORCH_WARN to avoid torch string related operations that take up time (#9238 )	2021-10-01 13:56:04 -04:00
Tang, Cheng	be4d887439	Fix ONNX exporter call with latest API for ORTrainer (#9228 ) * update the exporter call with latest api in orttrainer * use official export api instead of the private call	2021-10-01 13:49:55 -04:00
G. Ramalingam	e79be39081	LayerNormGrad function body and LayerNorm inference/body fix (#9160 ) * Add function body for LayerNormGrad * Fix LayerNorm schema for multiple normalization dims	2021-09-30 12:03:08 -07:00
Thiago Crepaldi	ceb51dda4a	Support external torch cpp extensions on ORTModule (#9223 )	2021-09-30 10:37:35 -04:00
satyajandhyala	278928a102	Added a test case for python gradient builder. (#9207 ) * Register Cos operator gradient using ORTModule's register_gradient and compare gradient against PyTorch.	2021-09-29 09:24:12 -07:00
Suffian Khan	6f580f07de	Switch AMD CI pipeline to use environment image from onnxruntimecibuildenvironment (#9206 ) * shift docker image reference for amd ci pipeline * fix service endpoint * reduce perf tolerance	2021-09-28 13:06:16 -07:00
ytaous	d3f859fe30	Dropout Vectorized Kernel (#9157 ) * vectorized kernel * fix build * re-calibrate expected loss * fix build * re-calibrate convergence results * more re-calibrate on loss * divide kernels * adress comments * more calibration * calibration * per comments * enable sync Co-authored-by: Ethan Tao <ettao@OrtTrainingDev4.af05slrtruoetgaxwwjv5nsq5e.px.internal.cloudapp.net>	2021-09-27 17:19:12 -07:00
Wei-Sheng Chin	1b0816859f	Only wrap sub-modules which can be wrapped as ORTModule (#9021 )	2021-09-27 17:18:22 -07:00
baijumeswani	c30cc9190a	Change the agent pool for orttraining-distributed pipeline (#9179 )	2021-09-26 21:26:44 -07:00
baijumeswani	fd91bf91c9	Print full stacktrace exception when exporter fails (#9169 )	2021-09-24 10:24:37 -04:00
Vincent Wang	39dc6ea8a3	Fix to_dlpack Failure on PyTorch-1.10 (#9151 ) * workaround to_dlpack fail in new pt version * add torch code link	2021-09-24 09:48:07 +08:00

816 commits