onnxruntime

mirror of https://github.com/saymrwulf/onnxruntime.git synced 2026-05-20 21:40:57 +00:00

Author	SHA1	Message	Date
pengwa	a0c25e5c2f	Fix segment fault for alltoall (#12701 ) * fix segment fault * formatting	2022-08-30 11:27:14 +08:00
Vincent Wang	04f7c2deda	FP16_Optimizer Support for more Deepspeed Versions (#12046 ) * fp16_optimizer for more ds versions * change ds version * bugfix * fix bug	2022-06-30 18:36:17 +08:00
Justin Chu	fdce4fa6af	Format all python files under onnxruntime with black and isort (#11324 ) Description: Format all python files under onnxruntime with black and isort. After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame. #11315, #11316	2022-04-26 09:35:16 -07:00
pengwa	b125446f9c	Optimize python overhead of APEX amp (#9447 ) * optimize python overhead of _post_amp_backward * overwrite apex amp's zero_grad for faster implementation * move unscale_fp16_grads_into_fp32_grads into C++ impl * improve the efficiency furthur, reducing 3.5ms to 1.7ms for unilm. * unilm 1.7ms to 338us: 1). optimize python list <==> std::vector copy, 2). launch the kernels as long as num_elem reach thresh hold. This help reduce the CUDA idel time. * refine the logic a bit after validating Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>	2021-10-26 13:13:49 +08:00
pengwa	5ee47e3ffa	legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184 ) * megatron-lm FP16_Optimizer Wrap, allow model parallelism aggregation optional * add deepspeed zero1 and zero2 - checkoverflow & clip norm * re-structure code and add the copyright * update the document * refine the code after validation	2021-10-14 09:01:23 +08:00

5 commits