pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Oguz Ulgen	920f0426ae	Add None return type to init -- tests rest (#132376 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/132376 Approved by: https://github.com/jamesjwu ghstack dependencies: #132335, #132351, #132352	2024-08-01 15:44:51 +00:00
Anshul Sinha	407c87a32c	[debug][dtensor] fixed updating current module (#130995 ) Summary Fixed issue with updating the current module when transitioning between child module to parent module and in the backward pass. The first issue is caused because the prehook is not called again when we go back to the parent module and that the hook being used was a register_module_forward_hook, which runs before the register_module_hook used in redistribute, causing the collective call to be assigned to the incorrect module. In order to do this, I updated the current module to be the parent module in a register_forward_hook in the module tracker. The second issue was caused by the parent set in the module tracker I inherit from being incorrect. I fixed this issue by saving the parents of each module and using them in collective counter instead of the incorrect set. I have updated the example in module_operation_tracing to reflect the correct output. In addition, I changed the test cases that used the incompatible old CommDebugMode. Test Case 1. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e MLP_operation_tracing 2. pytest test/distributed/_tensor/debug/test_comm_mode_features.py -s -k test_transformer_module_tracing 3. python test/distributed/_composable/fsdp/test_fully_shard_training.py -k TestFullyShardGradientAccumulation.test_gradient_accumulation 4. python test/distributed/_tensor/test_math_ops.py -k DistMathOpsTest.test_layer_norm_bwd Pull Request resolved: https://github.com/pytorch/pytorch/pull/130995 Approved by: https://github.com/XilunWu ghstack dependencies: #130410	2024-07-20 20:57:29 +00:00
Xuehai Pan	db3290846e	[BE][Easy][10/19] enforce style for empty lines in import segments in `test/d*/` (#129761 ) See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501. Most changes are auto-generated by linter. You can review these PRs via: ```bash git diff --ignore-all-space --ignore-blank-lines HEAD~1 ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/129761 Approved by: https://github.com/fegin	2024-07-17 16:57:39 +00:00
Anshul Sinha	f059201e0d	[dtensor][debug] added deviceMesh for relevant operations and module parameter sharding and module fqn (#130072 ) Summary In order to give users more information, I have added the deviceMesh for operations with DTensor inputs, and module parameter sharding and FQN. These changes have only been placed in operation tracing log. In the future, I plan to just have one logging function with an argument to show how detailed a user wants the log to be, and will get rid of the module tracing log function. This information has also been added to the JSON dump and can be seen in the browser visual. I have also edited the test case file as the module_depth dictionary has been replaced with module_helper_dict and have edited the example output for the MLP operation tracing which can be seen below: Test Plan 1. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e MLP_json_dump 2. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e transformer_json_dump 3. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e MLP_operation_tracing 4. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e transformer_operation_tracing 5. pytest test/distributed/_tensor/debug/test_comm_mode_features.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/130072 Approved by: https://github.com/XilunWu ghstack dependencies: #129994	2024-07-08 20:12:52 +00:00
Anshul Sinha	0b9995c1ce	[dtensor][debug] Added forward and backward differentiation for module level tracing (#129602 ) Summary Currently, comm_mode only allowed users to differentiate between forward and backward passes at the operational level. I modified the code so that users can now see the collective counts for the passes at a module level. I decided to slightly change how the output was formatted making it easier to differentiate between a collective count and an operation. I have designed the operational trace table function so that in the future, a user can use command line arguments in order to determine the level of information they want to display instead of having two similar functions. Finally, I have updated the new output and test cases for comm_mode example and test files. The expected output for the first 3 examples are shown below: <img width="320" alt="Screenshot 2024-06-26 at 2 30 25 PM" src="https://github.com/pytorch/pytorch/assets/50644008/b8e88075-a07f-4e84-b728-a08959df3661"> <img width="497" alt="Screenshot 2024-06-26 at 2 29 15 PM" src="https://github.com/pytorch/pytorch/assets/50644008/5ef4bea7-1355-4089-bfb0-c7e3f588ac77"> <img width="615" alt="Screenshot 2024-06-26 at 2 31 05 PM" src="https://github.com/pytorch/pytorch/assets/50644008/feacae51-76f7-403b-b6cd-dd15e981770e"> Test Plan 1. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e MLP_module_tracing 2. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e transformer_module_tracing 3. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e MLP_operation_tracing 4. torchrun --standalone --nnodes=1 --nproc-per-node=4 torch/distributed/_tensor/examples/comm_mode_features_example.py -e transformer_operation_tracing 5. pytest test/distributed/_tensor/debug/test_comm_mode_features.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/129602 Approved by: https://github.com/XilunWu, https://github.com/wz337	2024-07-04 06:00:58 +00:00
Anshul Sinha	bd3a11776f	[dtensor][test] test case suite for comm_mode features (#128729 ) Summary Currently, there is only an example file for comm_mode and its features. I have created test cases that mirror the examples while the more complicated test cases also ensure that comm_mode resets all variables when used multiple times in the same function. This test case suite will also help developers ensure that new code they add to comm_mode does not affect correctness of old features. #128536 Test Plan pytest test/distributed/_tensor/debug/test_comm_mode_features.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/128729 Approved by: https://github.com/XilunWu	2024-06-26 05:25:57 +00:00
Chien-Chin Huang	60baeee59f	[BE] Skip the test if CUDA is not available (#128885 ) As title Differential Revision: [D58690210](https://our.internmc.facebook.com/intern/diff/D58690210/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/128885 Approved by: https://github.com/wz337	2024-06-18 07:02:44 +00:00
Anshul Sinha	e76b28c765	[dtensor][debug] added c10d alltoall_ and alltoall_base_ to CommDebugMode (#127360 ) Summary Added c10d alltoall_ and alltoall_base tracing to CommDebugMode and edited test case in test_comm_mode to include added features. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127360 Approved by: https://github.com/wz337, https://github.com/XilunWu, https://github.com/yifuwang ghstack dependencies: #127358	2024-06-04 18:29:48 +00:00
Anshul Sinha	01e6d1cae4	[dtensor][debug] added c10d reduce_scatter_ and reduce_scatter_tensor_coalesced tracing_ to CommDebugMode (#127358 ) Summary Added c10d reduce_scatter_ and reduce_scatter_tensor_coalesced tracing to CommDebugMode and edited test case in test_comm_mode to include added features. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127358 Approved by: https://github.com/wz337, https://github.com/XilunWu, https://github.com/yifuwang	2024-06-04 18:29:48 +00:00
Anshul Sinha	15cc9f2e7e	[dtensor][be] added checksAssert function and refactored test cases (#127356 ) Summary Added c10d checksAsserts functions to reduce written lines of code and refactored test cases. Merged one test case into another. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127356 Approved by: https://github.com/XilunWu ghstack dependencies: #127025, #127029, #127040, #127134, #127334	2024-05-30 03:48:17 +00:00
Anshul Sinha	998f38814c	[dtensor][debug] added c10d allgather, allgather_coalesced, and allgather_into_tensor_coalesced tracing to CommDebugMode (#127334 ) Summary Added c10d allgather, allgather_coalesced, and allgather_into_tensor_coalesced tracing to CommDebugMode and edited test case in test_comm_mode to include added features. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127334 Approved by: https://github.com/XilunWu, https://github.com/yifuwang ghstack dependencies: #127025, #127029, #127040, #127134	2024-05-30 03:48:17 +00:00
Wanchao Liang	2c9a420da3	[dtensor] move some modules to private namespace (#127339 ) as titled, moving some modules that are mainly for DTensor private usage to be a private module. Pull Request resolved: https://github.com/pytorch/pytorch/pull/127339 Approved by: https://github.com/awgu ghstack dependencies: #127338	2024-05-29 05:18:47 +00:00
Anshul Sinha	6b24155827	[dtensor][debug] added c10d gather, reduce, scatter tracing to CommDebugMode (#127134 ) Summary Added c10d gather, reduce, and scatter tracing to CommDebugMode and edited test case in test_comm_mode to include added features. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127134 Approved by: https://github.com/XilunWu ghstack dependencies: #127025, #127029, #127040	2024-05-28 22:48:07 +00:00
Anshul Sinha	83617017e0	[dtensor][debug] add c10d allreduce_coalesced_ tracing to CommDebugMode (#127040 ) Summary Added c10d all_reduce_coalesced tracing to CommDebugMode and added test case to test_comm_mode.py. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127040 Approved by: https://github.com/XilunWu ghstack dependencies: #127025, #127029	2024-05-24 22:25:44 +00:00
Anshul Sinha	27594be3ed	[dtensor][be] remove repeated test in test_comm_mode.py (#127029 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/127029 Approved by: https://github.com/XilunWu ghstack dependencies: #127025	2024-05-24 01:42:13 +00:00
Anshul Sinha	89c638f9a5	[dtensor][debug] add all_reduce_coalesced tracing to CommDebugMode (#127025 ) Summary Added all_reduce_coalesced tracing to CommDebugMode and added test case to test_comm_mode test suite. Test Plan pytest test/distributed/_tensor/debug/test_comm_mode.py Pull Request resolved: https://github.com/pytorch/pytorch/pull/127025 Approved by: https://github.com/XilunWu	2024-05-24 01:42:13 +00:00
Wanchao Liang	ff061baa94	[comm_mode] adding some initial c10d ops to CommDebugMode (#125475 ) looks like we can make it work :) Pull Request resolved: https://github.com/pytorch/pytorch/pull/125475 Approved by: https://github.com/awgu	2024-05-04 04:20:46 +00:00
Brian Hirsh	9e0631cc8a	get CommsDebugMode to work with DTensor (#118769 ) Tested with Wanchao's repro: ``` from typing import Tuple, List, Dict, cast import torch import torch.nn as nn from torch.distributed.device_mesh import init_device_mesh from torch.distributed._tensor import distribute_tensor, DTensor, Shard, Placement, Replicate mesh = init_device_mesh(device_type="cuda", mesh_shape=(2,)) x = torch.randn(4, 8, requires_grad=True) y = torch.randn(4, 32, requires_grad=True) x_dtensor = DTensor.from_local(x, mesh, [Shard(0)], run_check=False) y_dtensor = DTensor.from_local(y, mesh, [Shard(0)], run_check=False) from torch.distributed._tensor.debug import CommDebugMode comm_mode = CommDebugMode() with comm_mode: z = torch.mm(x_dtensor, y_dtensor) print(comm_mode.get_comm_counts()) ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/118769 Approved by: https://github.com/wanchaol	2024-02-29 01:11:05 +00:00
Wanchao Liang	624f202522	[dtensor] add CommDebugMode for debugging (#113592 ) This PR adds a CommDebugMode debugging tool to record the number of distributed collectives, utilizing TorchDispatchMode. The idea borrows from the FlopCounterMode, and we can expand this later to make it more feature complete like the FlopCounterMode. This is useful for debugging with DTensor and for testing. In general it fits any complex distributed algorithm where it's non-trivial to understand the algorithm: we can use this tool to understand what happened under the hood. We can later cover c10d collectives directly. Not sure if it would be a good general distributed debug tool yet, so adding to the dtensor package first. Pull Request resolved: https://github.com/pytorch/pytorch/pull/113592 Approved by: https://github.com/wconstab	2023-11-27 02:40:28 +00:00
Justin Chu	232b96b6e2	[BE] Enable ruff's UP rules and autoformat distributed/ (#105433 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/105433 Approved by: https://github.com/albanD	2023-07-19 14:27:11 +00:00
Wanchao Liang	123be4b694	[dtensor] add debug tool to track op coverage (#100124 ) This PR adds a debug tool to track the op coverage needed in DTensor. Note that we specifically target ops after decomp table in inductor Pull Request resolved: https://github.com/pytorch/pytorch/pull/100124 Approved by: https://github.com/XilunWu	2023-05-02 01:45:55 +00:00

21 commits
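
Commit 407c87a32c in the log above describes collectives being attributed to the wrong module when execution returns from a child module to its parent, and fixes this by restoring the parent as the current module and by saving each module's parent. The sketch below is a torch-free illustration of that bookkeeping idea only. `ModuleCollectiveCounter`, its methods, and the `"Global"` root name are hypothetical and are not the CommDebugMode or module tracker API; the real code relies on module hooks rather than a context manager.

```python
from collections import Counter, defaultdict
from contextlib import contextmanager


class ModuleCollectiveCounter:
    """Hypothetical toy tracker: attributes each collective to the current module."""

    def __init__(self):
        # Stack of module names from the root to the module currently running.
        self._stack = ["Global"]
        # module name -> collective name -> number of calls
        self.counts = defaultdict(Counter)
        # module name -> name of its parent, saved when the module is entered
        self.parents = {}

    @property
    def current_module(self):
        return self._stack[-1]

    @contextmanager
    def module(self, name):
        # Stand-in for a forward pre-hook: the child becomes the current module.
        parent = self.current_module
        fqn = name if parent == "Global" else f"{parent}.{name}"
        self.parents[fqn] = parent
        self._stack.append(fqn)
        try:
            yield fqn
        finally:
            # Stand-in for a forward hook: restore the parent on exit, so
            # later collectives are not charged to the finished child.
            self._stack.pop()

    def record(self, collective):
        # Charge the collective to whichever module is current right now.
        self.counts[self.current_module][collective] += 1


tracker = ModuleCollectiveCounter()
with tracker.module("MLP"):
    with tracker.module("net1"):
        tracker.record("all_gather")
    # Back in the parent: this call belongs to "MLP", not "MLP.net1".
    tracker.record("all_reduce")
```

Popping the stack on exit is what keeps the `all_reduce` call from being counted under `MLP.net1`, which is the kind of misattribution the commit message reports.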