pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Ke Wen effc545274 [DDP] Use NCCL allocated memory for gradient bucket (#146589)	2025-02-10 05:23:11 +00:00

So that NVLink SHARP comes with zero-copy on H100+ platforms, for DDP applications. Less SM usage, less memory contention between NCCL kernel and compute kernels.

Added env `DDP_DISABLE_COMM_MEM` as a back-out option:

```
An environment variable to disable comm-optimized memory pool.
Default is 0, which means comm-optimized memory pool is enabled.
Users can set it to 1 in case of seeing regression or OOM (because this comm MemPool may not share space with regular compute MemPool).
```

Differential Revision: [D69297766](https://our.internmc.facebook.com/intern/diff/D69297766)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/146589
Approved by: https://github.com/syed-ahmed, https://github.com/c-p-i-o, https://github.com/fduwjj
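
As a rough illustration of the back-out option described in the commit message, the sketch below sets `DDP_DISABLE_COMM_MEM` from Python. The two helper functions are hypothetical and not part of PyTorch; only the variable name and its documented values (0 = pool enabled, 1 = pool disabled) come from the commit. The commit message does not say when DDP reads the variable, so setting it at process start, before any DDP wrapper is built, is assumed to be the safe choice.

```python
import os

# Name of the back-out switch, taken from the commit message.
ENV_NAME = "DDP_DISABLE_COMM_MEM"


def disable_ddp_comm_mem(env=os.environ):
    """Hypothetical helper: opt out of the comm-optimized memory pool."""
    env[ENV_NAME] = "1"


def comm_mem_disabled(env=os.environ):
    """Hypothetical helper mirroring the documented default.

    Unset or "0" means the comm-optimized pool stays enabled.
    """
    return env.get(ENV_NAME, "0") == "1"


# Assumption: call this at process start, before constructing any DDP module.
disable_ddp_comm_mem()
```

The same effect can presumably be had by exporting `DDP_DISABLE_COMM_MEM=1` in the launching shell, which avoids any question of ordering inside the Python process.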
autograd
c10d	[DDP] Use NCCL allocated memory for gradient bucket (#146589)	2025-02-10 05:23:11 +00:00
rpc	Enable bugprone-unchecked-optional-access (#144226)	2025-01-10 03:16:56 +00:00