From ff3d773dd9b6027e0dc0567d8d85103bc6b68f71 Mon Sep 17 00:00:00 2001
From: fduwjj
Date: Sat, 14 Oct 2023 10:33:36 -0700
Subject: [PATCH] [TP] Add deprecation warnings in the documentation for
 PairwiseParallel, SequenceParallel, and other prepare input/output functions
 (#111176)

As part of the TP UX improvements, we want to keep our API simple (though not
necessarily easy) so that users keep the flexibility to do what they want, and
to avoid an overly generic API that tries to solve everything and ends up too
complicated. We are updating the documentation accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111176
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166
---
 docs/source/distributed.tensor.parallel.rst | 25 +++++++++++++--------
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/docs/source/distributed.tensor.parallel.rst b/docs/source/distributed.tensor.parallel.rst
index 2ce0082d591..f848044e8f5 100644
--- a/docs/source/distributed.tensor.parallel.rst
+++ b/docs/source/distributed.tensor.parallel.rst
@@ -6,7 +6,7 @@ Tensor Parallelism - torch.distributed.tensor.parallel
 
 Tensor Parallelism(TP) is built on top of the PyTorch DistributedTensor
 (`DTensor `__)
-and provides several parallelism styles: Rowwise, Colwise and Pairwise Parallelism.
+and provides several parallelism styles: Rowwise and Colwise Parallelism.
 
 .. warning ::
     Tensor Parallelism APIs are experimental and subject to change.
@@ -27,23 +27,32 @@ Tensor Parallelism supports the following parallel styles:
 .. autoclass:: torch.distributed.tensor.parallel.style.ColwiseParallel
   :members:
 
+.. warning::
+    We are deprecating the styles below and will remove them soon:
+
 .. autoclass:: torch.distributed.tensor.parallel.style.PairwiseParallel
   :members:
 
-.. warning ::
-    Sequence Parallelism are still in experimental and no evaluation has been done.
-
 .. autoclass:: torch.distributed.tensor.parallel.style.SequenceParallel
   :members:
 
 Since Tensor Parallelism is built on top of DTensor, we need to specify the
-input and output placement of the module with DTensors so it can expectedly
-interacts with the module before and after. The followings are functions
-used for input/output preparation:
+DTensor layouts of the module's inputs and outputs, so that they can interact
+with the sharded module parameters and with the modules before and after it.
+Users can achieve this by specifying ``input_layouts`` and ``output_layouts``,
+which annotate the inputs as DTensors and redistribute the outputs if needed.
+
+If users only want to annotate the DTensor layouts of a module's inputs/outputs
+without distributing its parameters, the following classes can be used in the
+``parallelize_plan`` of ``parallelize_module``:
 
 .. currentmodule:: torch.distributed.tensor.parallel.style
 
+.. autofunction:: PrepareModuleInput
+.. autofunction:: PrepareModuleOutput
+.. warning::
+    We are deprecating the methods below and will remove them soon:
 .. autofunction:: make_input_replicate_1d
 .. autofunction:: make_input_reshard_replicate
 .. autofunction:: make_input_shard_1d
@@ -53,8 +62,6 @@ used for input/output preparation:
 .. autofunction:: make_output_shard_1d
 .. autofunction:: make_output_tensor
-.. autofunction:: PrepareModuleInput
-.. autofunction:: PrepareModuleOutput
 
 Currently, there are some constraints which makes it hard for
 the ``MultiheadAttention`` module to work out of box for Tensor Parallelism,
 so we recommend users to try ``ColwiseParallel``
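
For additional reviewer context, here is a minimal usage sketch of the
non-deprecated styles documented above together with ``parallelize_module``.
It is illustrative only and not part of this patch: the ``ToyMLP`` module, the
``parallelize_toy_mlp`` helper, the hidden dimension, and the mesh shape are
all hypothetical, and it assumes the default process group has already been
initialized (e.g. launched via torchrun).

    # Minimal sketch (illustrative only; not part of this patch).
    import torch
    import torch.nn as nn
    from torch.distributed._tensor import DeviceMesh
    from torch.distributed.tensor.parallel import (
        ColwiseParallel,
        RowwiseParallel,
        parallelize_module,
    )


    class ToyMLP(nn.Module):
        """Hypothetical two-layer MLP, used only to illustrate the plan below."""

        def __init__(self, dim: int = 1024):
            super().__init__()
            self.up_proj = nn.Linear(dim, 4 * dim)
            self.down_proj = nn.Linear(4 * dim, dim)

        def forward(self, x):
            return self.down_proj(torch.relu(self.up_proj(x)))


    def parallelize_toy_mlp(model: ToyMLP, world_size: int) -> nn.Module:
        # One 1-D device mesh spanning all ranks of the default process group.
        mesh = DeviceMesh("cuda", list(range(world_size)))
        # Shard up_proj column-wise and down_proj row-wise: the intermediate
        # activation stays sharded, and only down_proj's output is all-reduced.
        plan = {
            "up_proj": ColwiseParallel(),
            "down_proj": RowwiseParallel(),
        }
        return parallelize_module(model, mesh, plan)

With this plan, ``up_proj`` is sharded column-wise and ``down_proj`` row-wise,
which is roughly the colwise/rowwise pairing that the deprecated
``PairwiseParallel`` style applied automatically.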