mirror of https://github.com/saymrwulf/pytorch.git (synced 2026-05-14 20:57:59 +00:00)
This PR adds an API `FSDPModule.set_reduce_scatter_divide_factor` to allow setting a custom gradient divide factor for reduce-scatter. This can be useful when using parallelisms in combination with FSDP (e.g. expert parallelism), where gradients need to be divided by a custom factor (e.g. an extra `EP` factor).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/129286
Approved by: https://github.com/weifengpy
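The arithmetic behind the custom divide factor can be illustrated without a distributed setup. The sketch below is a toy model, not the PyTorch internals: `reduce_scatter_mean` is a hypothetical stand-in for FSDP's summation-plus-division after reduce-scatter, and the EP reasoning follows the commit message's example of needing an extra `EP` factor.

```python
# Toy sketch of FSDP's gradient normalization after reduce-scatter.
# reduce_scatter_mean is illustrative; real FSDP performs a collective
# reduce-scatter across ranks, then divides by a factor that defaults
# to the data-parallel world size.

def reduce_scatter_mean(per_rank_grads, divide_factor):
    """Sum gradients contributed by each rank, then divide by the
    given factor, mimicking FSDP's post-reduce normalization."""
    return sum(per_rank_grads) / divide_factor

# Plain FSDP over 4 data-parallel ranks: the default divide factor
# is the world size, yielding an average over ranks.
dp_world_size = 4
grads = [2.0] * dp_world_size
plain = reduce_scatter_mean(grads, dp_world_size)  # 8.0 / 4 = 2.0

# With expert parallelism of degree 2, the summed expert gradients
# need an extra EP factor in the divisor (per the PR's example), so
# one would set a custom factor of world_size * EP.
ep = 2
custom = reduce_scatter_mean(grads, dp_world_size * ep)  # 8.0 / 8 = 1.0
```

In real training code, this custom factor would be installed once via the new API, e.g. `module.set_reduce_scatter_divide_factor(dp_world_size * ep)` on a module wrapped with `fully_shard`.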
| Name |
|---|
| .. |
| fsdp |
| __init__.py |
| checkpoint_activation.py |
| contract.py |
| fully_shard.py |
| replicate.py |