Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00
Summary: Reland of https://github.com/pytorch/pytorch/pull/72578.

**Overview**

Windows CI was failing due to the multi-rank single-GPU case (see [here](https://github.com/pytorch/pytorch/runs/5204906995?check_suite_focus=true)). To address this, I

- added `common_distributed.skip_if_no_gpu` to `test_multiple_param_groups()` to ensure that each rank can safely call `to(self.device)` -- this targets the expected SPSD use case where each rank has its own GPU;
- moved `test_constructor()` back to `TestZeroRedundancyOptimizerSingleRank` to check that the multiple-parameter-group method of construction works even on a single rank.

**Test Plan**

- I checked both tests on CPU, 1 GPU, 2 GPUs, 4 GPUs, and 8 GPUs.
- I added the `ciflow/win` label to run the failing Windows CI test.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/72932
Reviewed By: rohan-varma
Differential Revision: D34281482
Pulled By: awgu
fbshipit-source-id: c4fe604ddd9d2c123c3071249741e6b8a6454b6e
(cherry picked from commit 6bea9bcc6349ff1aad403563206fb170a3af0c70)
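The gating described in the first bullet can be sketched as a decorator. Note this is a hypothetical re-creation of the idea behind `common_distributed.skip_if_no_gpu`, not PyTorch's actual implementation; the `num_gpus` and `world_size` attributes are assumptions standing in for the real test-case state:

```python
import functools
import unittest


def skip_if_no_gpu(fn):
    # Hypothetical sketch: skip the test unless at least one GPU is
    # visible per rank, so every rank can safely call to(self.device).
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        gpus = getattr(self, "num_gpus", 0)     # assumed attribute
        ranks = getattr(self, "world_size", 1)  # assumed attribute
        if gpus < ranks:
            raise unittest.SkipTest(f"need {ranks} GPUs, found {gpus}")
        return fn(self, *args, **kwargs)
    return wrapper


class FakeZeroTest:
    # Stand-in for a distributed test case: 2 ranks but no GPUs,
    # mirroring the multi-rank single-GPU/no-GPU Windows CI setup.
    world_size = 2
    num_gpus = 0

    @skip_if_no_gpu
    def test_multiple_param_groups(self):
        return "ran"
```

With `num_gpus < world_size` the test raises `unittest.SkipTest` instead of failing mid-run, which is the behavior the PR relies on for the Windows multi-rank single-GPU case.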
**Directory listing**

Subdirectories: `_shard`, `algorithms`, `bin`, `elastic`, `fsdp`, `launcher`, `nn/jit`, `optim`, `pipeline/sync`, `rpc`

Files:

- `argparse_util_test.py`
- `test_c10d_common.py`
- `test_c10d_gloo.py`
- `test_c10d_nccl.py`
- `test_c10d_spawn.py`
- `test_c10d_spawn_gloo.py`
- `test_c10d_spawn_nccl.py`
- `test_data_parallel.py`
- `test_distributed_spawn.py`
- `test_launcher.py`
- `test_nccl.py`
- `test_pg_wrapper.py`
- `test_store.py`