Fix docs typos. (#35465)

Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
This commit is contained in:
湛露先生 2025-01-02 18:29:46 +08:00 committed by GitHub
parent 6b1e86fd4d
commit b2b04e86e7
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
2 changed files with 2 additions and 2 deletions

View file

@ -58,7 +58,7 @@ Otherwise, you can choose a size-based wrapping policy where FSDP is applied to
### Checkpointing
Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`]` method.
Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`] method.
```py
# directory containing checkpoints

View file

@ -74,7 +74,7 @@ FSDP 是通过包装网络中的每个层来应用的。通常,包装是以嵌
应该使用 `fsdp_state_dict_type: SHARDED_STATE_DICT` 来保存中间检查点,
因为在排名 0 上保存完整状态字典需要很长时间,通常会导致 `NCCL Timeout` 错误,因为在广播过程中会无限期挂起。
您可以使用 [`~accelerate.Accelerator.load_state`]` 方法加载分片状态字典以恢复训练。
您可以使用 [`~accelerate.Accelerator.load_state`] 方法加载分片状态字典以恢复训练。
```py
# 包含检查点的目录