From ff3d773dd9b6027e0dc0567d8d85103bc6b68f71 Mon Sep 17 00:00:00 2001
From: fduwjj
Date: Sat, 14 Oct 2023 10:33:36 -0700
Subject: [PATCH] [TP] Add deprecation warnings in the documentation for
 PairwiseParallel, SequenceParallel, and other prepare input/output functions
 (#111176)

As part of the TP UX improvements, we want to keep our API simple (though not
necessarily easy) so that users keep the flexibility to do what they want, and
to avoid an overly generic API that tries to solve everything and ends up too
complicated. We are updating the documentation accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111176
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166
---
 docs/source/distributed.tensor.parallel.rst | 25 +++++++++++++--------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/docs/source/distributed.tensor.parallel.rst b/docs/source/distributed.tensor.parallel.rst
index 2ce0082d591..f848044e8f5 100644
--- a/docs/source/distributed.tensor.parallel.rst
+++ b/docs/source/distributed.tensor.parallel.rst
@@ -6,7 +6,7 @@ Tensor Parallelism - torch.distributed.tensor.parallel
 
 Tensor Parallelism(TP) is built on top of the PyTorch DistributedTensor
 (`DTensor `__)
-and provides several parallelism styles: Rowwise, Colwise and Pairwise Parallelism.
+and provides several parallelism styles: Rowwise and Colwise Parallelism.
 
 .. warning ::
     Tensor Parallelism APIs are experimental and subject to change.
@@ -27,23 +27,32 @@ Tensor Parallelism supports the following parallel styles:
 .. autoclass:: torch.distributed.tensor.parallel.style.ColwiseParallel
   :members:
 
+.. warning::
+    We are deprecating the styles below and will remove them soon:
+
 .. autoclass:: torch.distributed.tensor.parallel.style.PairwiseParallel
   :members:
 
-.. warning ::
-    Sequence Parallelism are still in experimental and no evaluation has been done.
-
 .. autoclass:: torch.distributed.tensor.parallel.style.SequenceParallel
   :members:
 
 Since Tensor Parallelism is built on top of DTensor, we need to specify the
-input and output placement of the module with DTensors so it can expectedly
-interacts with the module before and after. The followings are functions
-used for input/output preparation:
+DTensor layouts of the module's inputs and outputs, so that they can interact
+with the sharded module parameters and with the modules before and after it.
+Users can achieve this by specifying ``input_layouts`` and ``output_layouts``,
+which annotate the inputs as DTensors and redistribute the outputs if needed.
+
+If users only want to annotate the DTensor layouts of a module's inputs/outputs
+without distributing its parameters, the following classes can be used in the
+``parallelize_plan`` of ``parallelize_module``:
 
 .. currentmodule:: torch.distributed.tensor.parallel.style
 
+.. autofunction:: PrepareModuleInput
+.. autofunction:: PrepareModuleOutput
+.. warning::
+    We are deprecating the methods below and will remove them soon:
 .. autofunction:: make_input_replicate_1d
 .. autofunction:: make_input_reshard_replicate
 .. autofunction:: make_input_shard_1d
@@ -53,8 +62,6 @@ used for input/output preparation:
 .. autofunction:: make_output_shard_1d
 .. autofunction:: make_output_tensor
-.. autofunction:: PrepareModuleInput
-.. autofunction:: PrepareModuleOutput
 
 Currently, there are some constraints which makes it hard for
 the ``MultiheadAttention`` module to work out of box for Tensor Parallelism,
 so we recommend users to try ``ColwiseParallel``
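
For additional reviewer context, here is a minimal usage sketch of the
non-deprecated styles documented above together with ``parallelize_module``.
It is illustrative only and not part of this patch: the ``ToyMLP`` module, the
``parallelize_toy_mlp`` helper, the hidden dimension, and the mesh shape are
all hypothetical, and it assumes the default process group has already been
initialized (e.g. launched via torchrun).

    # Minimal sketch (illustrative only; not part of this patch).
    import torch
    import torch.nn as nn
    from torch.distributed._tensor import DeviceMesh
    from torch.distributed.tensor.parallel import (
        ColwiseParallel,
        RowwiseParallel,
        parallelize_module,
    )


    class ToyMLP(nn.Module):
        """Hypothetical two-layer MLP, used only to illustrate the plan below."""

        def __init__(self, dim: int = 1024):
            super().__init__()
            self.up_proj = nn.Linear(dim, 4 * dim)
            self.down_proj = nn.Linear(4 * dim, dim)

        def forward(self, x):
            return self.down_proj(torch.relu(self.up_proj(x)))


    def parallelize_toy_mlp(model: ToyMLP, world_size: int) -> nn.Module:
        # One 1-D device mesh spanning all ranks of the default process group.
        mesh = DeviceMesh("cuda", list(range(world_size)))
        # Shard up_proj column-wise and down_proj row-wise: the intermediate
        # activation stays sharded, and only down_proj's output is all-reduced.
        plan = {
            "up_proj": ColwiseParallel(),
            "down_proj": RowwiseParallel(),
        }
        return parallelize_module(model, mesh, plan)

With this plan, ``up_proj`` is sharded column-wise and ``down_proj`` row-wise,
which is roughly the colwise/rowwise pairing that the deprecated
``PairwiseParallel`` style applied automatically.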