pytorch/test/distributed
Wei Feng 2a8e94347f [TP] verify numeric parity on Transfromers for multiple iterations (#132543)
Before setting up the float8 numeric parity test, I have to set up a regular TP numeric parity test, preferably running 10 iterations.

This PR sets a baseline for TP numerics. I can verify fp8 on top of it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/132543
Approved by: https://github.com/tianyu-l
ghstack dependencies: #132350
2024-08-04 06:43:27 +00:00
..
_composable Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
_shard Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
_tensor Revert "Grouped Query Attention (#128898)" 2024-08-02 18:58:46 +00:00
_tools Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
algorithms Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
bin
checkpoint Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
elastic Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
fsdp Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
launcher
nn/jit
optim Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
pipelining [pipelining] Make test_schedule quiet (#132369) 2024-08-02 20:38:17 +00:00
rpc
tensor/parallel [TP] verify numeric parity on Transfromers for multiple iterations (#132543) 2024-08-04 06:43:27 +00:00
argparse_util_test.py
test_c10d_common.py Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
test_c10d_functional_native.py [inductor]Add DtypeView to avoid memory leak and unnecessary kernel generations (#128883) 2024-07-23 17:31:39 +00:00
test_c10d_gloo.py Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
test_c10d_logger.py
test_c10d_nccl.py Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
test_c10d_object_collectives.py
test_c10d_ops_nccl.py
test_c10d_pypg.py
test_c10d_spawn.py
test_c10d_spawn_gloo.py
test_c10d_spawn_nccl.py
test_c10d_spawn_ucc.py
test_c10d_ucc.py Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
test_collective_utils.py
test_compute_comm_reordering.py [Traceable FSDP2][Inductor] Create grouped nodes for FSDP2 all-gather code block and reduce-scatter code block (after Buffer/Operation split) (#131510) 2024-07-27 08:39:58 +00:00
test_control_collectives.py
test_data_parallel.py Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
test_device_mesh.py [DeviceMesh] Remove _parent_mesh as an attribute from DeviceMesh and remove it from DeviceMesh's hash (#131636) 2024-07-25 22:47:22 +00:00
test_distributed_spawn.py
test_dynamo_distributed.py Ensure compiler collective is called even when no graph is compiled (#132163) 2024-08-02 16:31:54 +00:00
test_fake_pg.py
test_functional_api.py Only make wait_tensor as a side_effect op (#132341) 2024-08-02 01:24:40 +00:00
test_inductor_collectives.py
test_launcher.py
test_multi_threaded_pg.py
test_nccl.py
test_pg_wrapper.py
test_store.py Add None return type to init -- tests rest (#132376) 2024-08-01 15:44:51 +00:00
test_symmetric_memory.py [micro_pipeline_tp] implement the pass for fused_scaled_matmul_reduce_scatter (#131951) 2024-07-30 23:02:49 +00:00