Yu, Guangye c09205a057 Deprecate device-specific GradScaler autocast API (#126527)
# Motivation

## For `torch.amp.GradScaler`
- `torch.cpu.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cpu", args...)`.
- `torch.cuda.amp.GradScaler(args...)` is completely equivalent to `torch.amp.GradScaler("cuda", args...)`.

So we intend to deprecate them and **strongly recommend** that developers use `torch.amp.GradScaler`.
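For instance, a minimal sketch of the migration in a typical mixed-precision training step (assuming a CUDA device; the model, optimizer, and batch here are placeholders):

```python
import torch

# Deprecated spelling (now emits a deprecation warning):
#   scaler = torch.cuda.amp.GradScaler()
# Device-agnostic replacement recommended by this PR:
scaler = torch.amp.GradScaler("cuda")

model = torch.nn.Linear(8, 8).cuda()                      # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)   # placeholder optimizer
x = torch.randn(4, 8, device="cuda")                      # placeholder batch

with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = model(x).sum()

# The scaling workflow itself is unchanged by the deprecation.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
optimizer.zero_grad()
```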

## For `custom_fwd` and `custom_bwd`
These decorators are a good mechanism for letting a custom autograd function run correctly, with or without autocast effects, even inside an autocast-enabled region, and the mechanism can be shared by other backends, like CPU and XPU.
So we generalize them to be device-agnostic, move them into `torch/amp/autocast_mode.py`, and re-expose them as `torch.amp.custom_fwd` and `torch.amp.custom_bwd`. Meanwhile, we deprecate `torch.cuda.amp.custom_fwd` and `torch.cuda.amp.custom_bwd`.
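A minimal sketch of a custom autograd function using the device-agnostic decorators (mirroring the `MyMM` pattern from the AMP examples docs; note the `device_type` keyword required by the `torch.amp` spelling):

```python
import torch

class MyMM(torch.autograd.Function):
    @staticmethod
    @torch.amp.custom_fwd(device_type="cuda", cast_inputs=torch.float32)
    def forward(ctx, a, b):
        # In an autocast-enabled region, floating-point inputs are cast to
        # float32 and autocast is disabled for the body; with autocast off,
        # the decorator has no effect.
        ctx.save_for_backward(a, b)
        return a.mm(b)

    @staticmethod
    @torch.amp.custom_bwd(device_type="cuda")
    def backward(ctx, grad):
        a, b = ctx.saved_tensors
        return grad.mm(b.t()), a.t().mm(grad)
```

The deprecated `torch.cuda.amp.custom_fwd(cast_inputs=torch.float32)` spelling keeps working but now warns; for other backends, only the `device_type` argument changes (e.g. `"cpu"` or `"xpu"`).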

# Additional Context
Add a UT to cover the deprecation warning (a sketch follows below).
No additional UTs are needed for the functionality of `torch.amp.custom_fwd`/`custom_bwd`; the existing UTs that previously covered `torch.cuda.amp.custom_fwd`/`custom_bwd` already exercise them.
To facilitate review, we split these code changes into two PRs: the first covers `torch.amp.GradScaler`, and the follow-up covers `custom_fwd` and `custom_bwd`.
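A sketch of such a deprecation-warning UT (this assumes the deprecated constructors surface a `FutureWarning`, the usual category for PyTorch deprecations; the test class and method names are illustrative):

```python
import unittest
import torch

class TestGradScalerDeprecation(unittest.TestCase):
    def test_deprecated_constructor_warns(self):
        # Uses the CPU variant so the test does not require a GPU.
        with self.assertWarns(FutureWarning):
            torch.cpu.amp.GradScaler()

if __name__ == "__main__":
    unittest.main()
```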

Pull Request resolved: https://github.com/pytorch/pytorch/pull/126527
Approved by: https://github.com/jgong5, https://github.com/gujinghui, https://github.com/janeyx99, https://github.com/EikanWang
2024-05-25 06:41:34 +00:00