pytorch/test/distributed
Shen Li ba1da47e8f Add OnCompletion Hook to ProcessGroup (#106988)
This allows infra/trainers to collect detailed stats about communication
efficiency without knowing anything about the model or the distributed
training paradigm in use. This is helpful because infra/trainer
packages usually prefer to be as model/algorithm-agnostic as possible,
so we cannot assume that infra/trainers have access to all
collectives used by the model authors.

This commit adds an `OnCompletion` hook to `ProcessGroupNCCL` which
will be fired on every work completion event.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/106988
Approved by: https://github.com/kumpera, https://github.com/H-Huang
ghstack dependencies: #107140, #107141, #107160
2023-08-15 04:32:23 +00:00
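The hook pattern the commit describes can be sketched in plain Python. This is a conceptual illustration only, not the real ProcessGroupNCCL API: the `Work`, `ProcessGroup`, `register_on_completion_hook`, and `run_collective` names below are all hypothetical stand-ins, chosen to show how an infra layer could record per-collective timings without knowing which collectives the model issues.

```python
# Conceptual sketch of an "on completion" hook, NOT the real c10d API.
# All class/method names here are illustrative assumptions.
import time
from typing import Callable, List, Tuple


class Work:
    """Stands in for a c10d work object tracking one collective."""

    def __init__(self, op_name: str) -> None:
        self.op_name = op_name
        self._start = time.monotonic()

    def finish(self) -> float:
        # Return the elapsed wall-clock time for this collective.
        return time.monotonic() - self._start


class ProcessGroup:
    """Toy process group that fires registered hooks on work completion."""

    def __init__(self) -> None:
        self._hooks: List[Callable[[str, float], None]] = []

    def register_on_completion_hook(self, hook: Callable[[str, float], None]) -> None:
        # The infra/trainer layer registers its stats collector once,
        # without needing to know which collectives will run.
        self._hooks.append(hook)

    def run_collective(self, op_name: str) -> None:
        work = Work(op_name)
        # ... actual communication would happen here ...
        elapsed = work.finish()
        # Fire every registered hook on completion of this work object.
        for hook in self._hooks:
            hook(op_name, elapsed)


# Usage: the trainer records (op, duration) pairs for every collective.
stats: List[Tuple[str, float]] = []
pg = ProcessGroup()
pg.register_on_completion_hook(lambda op, dt: stats.append((op, dt)))
pg.run_collective("allreduce")
pg.run_collective("broadcast")
print([op for op, _ in stats])
```

The design point mirrors the commit: because the hook fires on every work completion event inside the process group itself, the stats collector stays fully decoupled from the model code that issues the collectives.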
_composable [FSDP][9/N] Introduce CustomPolicy (#104986) 2023-08-03 12:46:36 +00:00
_shard
_spmd [device_mesh][BE] remove allgather from DM (#105614) 2023-07-27 01:33:05 +00:00
_tensor [TP][DTensor Perf] Some perf improvement to reduce DTensor CPU overhead (#106524) 2023-08-14 20:03:19 +00:00
_tools
algorithms
bin
checkpoint [DCP] Modify tensor saving logic in DCP (#106415) 2023-08-09 00:16:10 +00:00
elastic [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
fsdp [PT-D][FSDP] Handle corner case of load with multi-backend PG (#107172) 2023-08-14 23:24:44 +00:00
launcher [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
nn/jit
optim Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743) 2023-08-08 15:27:34 +00:00
pipeline/sync [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
rpc
tensor/parallel Clean up unsed MHA code to avoid confusion (#105956) 2023-07-27 17:10:17 +00:00
argparse_util_test.py
test_c10d_common.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
test_c10d_gloo.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
test_c10d_logger.py
test_c10d_nccl.py Add OnCompletion Hook to ProcessGroup (#106988) 2023-08-15 04:32:23 +00:00
test_c10d_object_collectives.py [c10d] Remove test for init barrier (#103223) 2023-06-08 16:56:40 +00:00
test_c10d_pypg.py
test_c10d_spawn.py [BE] f-stringify torch/ and scripts (#105538) 2023-07-21 19:35:24 +00:00
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py
test_c10d_spawn_ucc.py
test_c10d_ucc.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00
test_collective_utils.py Initial commit of collective_utils (#101037) 2023-06-27 02:15:16 +00:00
test_data_parallel.py Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743) 2023-08-08 15:27:34 +00:00
test_distributed_spawn.py Back out "Revert "[DDP] multiple forward support for static graph (#103487)" (#103873)" (#103938) 2023-06-22 21:55:58 +00:00
test_dynamo_distributed.py Back out "Reland "Make adding buffers more like adding parameters (#104069)" (#106224)" (#106743) 2023-08-08 15:27:34 +00:00
test_fake_pg.py
test_functional_api.py [device_mesh][BE] reduce_scatter fallback to funcol and remove from DM (#105642) 2023-07-27 01:33:05 +00:00
test_inductor_collectives.py [ROCm] enable additional inductor/dynamo UTs (#104624) 2023-07-11 20:44:02 +00:00
test_launcher.py
test_multi_threaded_pg.py [C10D] Improve MTPG autograd test. Fixes #105106 (#105356) 2023-07-20 13:51:21 +00:00
test_nccl.py
test_pg_wrapper.py
test_store.py [BE] Enable ruff's UP rules and autoformat distributed/ (#105433) 2023-07-19 14:27:11 +00:00