pytorch/torch/distributed
Teng Li 5b7951057d Distributed Data Parallel Module Implementation (#8584)
Summary:
This is an initial implementation of the Distributed Data Parallel module for the c10d GLOO and NCCL backends.

Performance testing has been done to verify that both single-GPU-per-process and multi-GPU-per-process configurations are able to overlap communication with backward computation.

The idea is that DDP buckets the parameters and all-reduces the buckets in reverse order. Since all c10d ops are asynchronous, no dedicated communication thread is needed; the all-reduce kernels are simply queued as soon as a bucket's gradients are ready, following a deterministic reduction order.
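
For illustration, below is a minimal sketch of the bucketing and reverse-order async all-reduce pattern described above, written against the public torch.distributed API. It is not the actual c10d DDP implementation: the real module queues each bucket's all-reduce from autograd hooks as soon as that bucket is ready during the backward pass, while this sketch runs after gradients exist. The bucket size, helper names, and gradient copy-back scheme are assumptions for the example.

```python
# Illustrative sketch only; not the c10d DDP implementation.
import torch
import torch.distributed as dist


def build_buckets(parameters, bucket_size_bytes=25 * 1024 * 1024):
    """Group parameters into buckets, preserving the model's parameter order.

    bucket_size_bytes is a hypothetical knob for this sketch.
    """
    buckets, current, current_bytes = [], [], 0
    for p in parameters:
        current.append(p)
        current_bytes += p.numel() * p.element_size()
        if current_bytes >= bucket_size_bytes:
            buckets.append(current)
            current, current_bytes = [], 0
    if current:
        buckets.append(current)
    # Gradients become ready roughly in reverse parameter order during the
    # backward pass, so the buckets are reduced in reverse order as well.
    return list(reversed(buckets))


def allreduce_buckets(buckets, world_size):
    """Queue one async all-reduce per bucket, then wait and average."""
    handles = []
    for bucket in buckets:
        flat = torch.cat([p.grad.view(-1) for p in bucket])
        # async_op=True returns a work handle immediately, so the kernels
        # for later buckets can be queued without blocking.
        handle = dist.all_reduce(flat, op=dist.ReduceOp.SUM, async_op=True)
        handles.append((handle, flat, bucket))
    for handle, flat, bucket in handles:
        handle.wait()
        flat.div_(world_size)
        # Scatter the averaged gradients back into the parameters.
        offset = 0
        for p in bucket:
            n = p.grad.numel()
            p.grad.copy_(flat[offset:offset + n].view_as(p.grad))
            offset += n
```
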

Tested with 8 nodes / 64 GPUs on ResNet-50; the required accuracy was reached within 90 epochs.
Closes https://github.com/pytorch/pytorch/pull/8584

Reviewed By: goldsborough

Differential Revision: D8678696

Pulled By: teng-li

fbshipit-source-id: 440341b804befc6762e92acece2759ba47157cea
2018-06-28 17:25:40 -07:00
c10d Distributed Data Parallel Module Implementation (#8584) 2018-06-28 17:25:40 -07:00
__init__.py
launch.py Use customized python interpreter (#7520) 2018-05-12 13:06:39 -04:00
remote_types.py