5 commits

Author SHA1 Message Date
pengwa
a0c25e5c2f
Fix segment fault for alltoall (#12701)
* fix segment fault

* formatting
2022-08-30 11:27:14 +08:00
Vincent Wang
04f7c2deda
FP16_Optimizer Support for more Deepspeed Versions (#12046)
* fp16_optimizer for more ds versions

* change ds version

* bugfix

* fix bug
2022-06-30 18:36:17 +08:00
Justin Chu
fdce4fa6af
Format all python files under onnxruntime with black and isort (#11324)
Description: Format all Python files under onnxruntime with black and isort.

After checking in, we can use .git-blame-ignore-revs to ignore the formatting PR in git blame.

#11315, #11316
2022-04-26 09:35:16 -07:00
pengwa
b125446f9c
Optimize python overhead of APEX amp (#9447)
* optimize python overhead of _post_amp_backward

* overwrite apex amp's zero_grad for faster implementation

* move unscale_fp16_grads_into_fp32_grads into C++ impl

* improve the efficiency further, reducing 3.5ms to 1.7ms for unilm.

* unilm 1.7ms to 338us: 1) optimize the Python list <==> std::vector copy, 2) launch the kernels as soon as num_elem reaches the threshold. This helps reduce CUDA idle time.

* refine the logic a bit after validating

Co-authored-by: Baiju Meswani <bmeswani@microsoft.com>
2021-10-26 13:13:49 +08:00
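
The "launch the kernels as soon as num_elem reaches the threshold" idea from the commit above can be illustrated with a minimal sketch. This is not the code from the PR (which moved the logic into C++); the names `launch_in_chunks`, `threshold`, and `launch` are hypothetical, and plain integers stand in for tensor element counts.

```python
from typing import Callable, List, Sequence


def launch_in_chunks(
    sizes: Sequence[int],
    threshold: int,
    launch: Callable[[List[int]], None],
) -> None:
    """Group tensor indices into chunks and launch each chunk early.

    Hypothetical illustration: `sizes[i]` is the element count of tensor i,
    and `launch` stands in for whatever enqueues a kernel for a chunk.
    """
    pending: List[int] = []
    pending_elems = 0
    for idx, num_elem in enumerate(sizes):
        pending.append(idx)
        pending_elems += num_elem
        # Launch as soon as enough work has accumulated, rather than
        # waiting until every tensor has been collected.
        if pending_elems >= threshold:
            launch(pending)
            pending = []
            pending_elems = 0
    # Flush whatever is left below the threshold.
    if pending:
        launch(pending)


launched: List[List[int]] = []
launch_in_chunks([4, 4, 10, 1, 1], threshold=8, launch=launched.append)
print(launched)
```

Launching a chunk early lets the device start working while the host is still collecting the remaining tensors, which is one plausible reading of how the change reduces CUDA idle time.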
pengwa
5ee47e3ffa
legacy_megatron-lm/deepspeed_ZERO1&2 FP16_Optimizer wrapper (#9184)
* megatron-lm FP16_Optimizer wrapper, make model parallelism aggregation optional

* add deepspeed zero1 and zero2 - check overflow & clip norm

* restructure code and add the copyright header

* update the document

* refine the code after validation
2021-10-14 09:01:23 +08:00