mirror of
https://github.com/saymrwulf/transformers.git
synced 2026-05-14 20:58:08 +00:00
Fix docs typos. (#35465)
Signed-off-by: zhanluxianshen <zhanluxianshen@163.com>
parent 6b1e86fd4d
commit b2b04e86e7
2 changed files with 2 additions and 2 deletions
@@ -58,7 +58,7 @@ Otherwise, you can choose a size-based wrapping policy where FSDP is applied to
 
 ### Checkpointing
 
-Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`]` method.
+Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT` because saving the full state dict with CPU offloading on rank 0 takes a lot of time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting. You can resume training with the sharded state dicts with the [`~accelerate.Accelerator.load_state`] method.
 
 ```py
 # directory containing checkpoints
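@@ -74,7 +74,7 @@ FSDP is applied by wrapping each layer in the network. The wrapping is usually applied in a
 
 Intermediate checkpoints should be saved with `fsdp_state_dict_type: SHARDED_STATE_DICT`,
 because saving the full state dict on rank 0 takes a long time and often results in `NCCL Timeout` errors due to indefinite hanging during broadcasting.
-You can use the [`~accelerate.Accelerator.load_state`]` method to load the sharded state dicts and resume training.
+You can use the [`~accelerate.Accelerator.load_state`] method to load the sharded state dicts and resume training.
 
 ```py
 # directory containing checkpoints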
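For context on the Checkpointing paragraph touched by this commit, resuming from a sharded checkpoint might look roughly like the sketch below. Only `Accelerator.load_state` comes from the documented text. The helper name `resume_from_sharded_checkpoint` and the directory name `"ckpt"` are hypothetical, and the sketch assumes the checkpoint was written under an FSDP config that sets `fsdp_state_dict_type: SHARDED_STATE_DICT`.

```python
import os


def resume_from_sharded_checkpoint(accelerator, checkpoint_dir):
    """Hypothetical helper (not part of Accelerate): restore training state.

    `accelerator` is assumed to be an `accelerate.Accelerator` created under an
    FSDP config that sets `fsdp_state_dict_type: SHARDED_STATE_DICT`.
    """
    # Fail early with a clear message instead of partway through loading.
    if not os.path.isdir(checkpoint_dir):
        raise FileNotFoundError(f"checkpoint directory not found: {checkpoint_dir}")
    # `load_state` is the method named in the Checkpointing paragraph; it
    # restores the state previously saved to this directory.
    accelerator.load_state(checkpoint_dir)
    return checkpoint_dir


# Usage sketch (assumes `accelerate` is installed and "ckpt" is a hypothetical
# directory written earlier with `accelerator.save_state("ckpt")`):
#
#   from accelerate import Accelerator
#   accelerator = Accelerator()
#   model, optimizer = accelerator.prepare(model, optimizer)
#   resume_from_sharded_checkpoint(accelerator, "ckpt")
```

The directory check is a design choice of this sketch, not something the documentation prescribes: it surfaces a mistyped path before any rank starts loading.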