[TP] Add deprecation warnings in the documentation for Pairwise parallel, sequence parallel and other prepare input/output functions (#111176)
As part of the TP UX improvements, we want to keep our API simple (not easy) so that users get the flexibility to do what they want, and to avoid an overly generic API that tries to solve everything and becomes too complicated. We are updating the docs accordingly.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/111176
Approved by: https://github.com/wanchaol
ghstack dependencies: #111160, #111166
parent 73d288fdf9
commit ff3d773dd9
1 changed file with 16 additions and 9 deletions
@@ -6,7 +6,7 @@ Tensor Parallelism - torch.distributed.tensor.parallel
 Tensor Parallelism (TP) is built on top of the PyTorch DistributedTensor
 (`DTensor <https://github.com/pytorch/pytorch/blob/main/torch/distributed/_tensor/README.md>`__)
-and provides several parallelism styles: Rowwise, Colwise and Pairwise Parallelism.
+and provides several parallelism styles: Rowwise and Colwise Parallelism.
 
 .. warning ::
     Tensor Parallelism APIs are experimental and subject to change.
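For context, here is a minimal sketch (mine, not part of the patch) of the surviving styles in use: ``ColwiseParallel`` and ``RowwiseParallel`` composed explicitly on a toy MLP, the pairing that the deprecated ``PairwiseParallel`` used to bundle. The module names, mesh setup, and ``torchrun`` launch are illustrative assumptions.

```python
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.distributed._tensor import DeviceMesh
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    RowwiseParallel,
    parallelize_module,
)


class MLP(nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.net1 = nn.Linear(16, 32)
        self.net2 = nn.Linear(32, 16)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net2(torch.relu(self.net1(x)))


# Assumes launch via torchrun so rank/world-size env vars are set.
dist.init_process_group("nccl")
torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())
mesh = DeviceMesh("cuda", list(range(dist.get_world_size())))

# net1 is sharded column-wise, net2 row-wise; chained together they
# reproduce what the deprecated PairwiseParallel wired up implicitly.
model = parallelize_module(
    MLP().cuda(),
    mesh,
    {"net1": ColwiseParallel(), "net2": RowwiseParallel()},
)
out = model(torch.randn(8, 16, device="cuda"))
```

Chaining colwise into rowwise keeps the intermediate activation sharded and performs a single all-reduce at the end, the same communication pattern PairwiseParallel produced.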
@@ -27,23 +27,32 @@ Tensor Parallelism supports the following parallel styles:
 .. autoclass:: torch.distributed.tensor.parallel.style.ColwiseParallel
   :members:
 
+.. warning::
+    We are deprecating the styles below and will remove them soon:
+
 .. autoclass:: torch.distributed.tensor.parallel.style.PairwiseParallel
   :members:
 
 .. warning ::
     Sequence Parallelism is still experimental, and no evaluation has been done.
 
 .. autoclass:: torch.distributed.tensor.parallel.style.SequenceParallel
   :members:
 
 Since Tensor Parallelism is built on top of DTensor, we need to specify the
-input and output placement of the module with DTensors so it can expectedly
-interacts with the module before and after. The followings are functions
-used for input/output preparation:
+DTensor layout of the input and output of the module so it can interact with
+the module parameters and the module afterwards. Users can achieve this by specifying
+the ``input_layouts`` and ``output_layouts``, which annotate inputs as DTensors
+and redistribute the outputs, if needed.
+
+If users only want to annotate the DTensor layouts for inputs/outputs and do not need to
+distribute the module's parameters, the following classes can be used in the ``parallelize_plan``
+of ``parallelize_module``:
 
 
 .. currentmodule:: torch.distributed.tensor.parallel.style
+.. autofunction:: PrepareModuleInput
+.. autofunction:: PrepareModuleOutput
+
+.. warning::
+    We are deprecating the methods below and will remove them soon:
 .. autofunction:: make_input_replicate_1d
 .. autofunction:: make_input_reshard_replicate
 .. autofunction:: make_input_shard_1d
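To make the new prose concrete, here is a small sketch of a ``parallelize_plan`` using the ``input_layouts``/``output_layouts`` annotations it describes. ``Shard`` and ``Replicate`` are the standard DTensor placements; the module names and the exact keyword names on ``PrepareModuleInput`` are assumptions of mine, not taken from the patch.

```python
from torch.distributed._tensor import Replicate, Shard
from torch.distributed.tensor.parallel import (
    ColwiseParallel,
    PrepareModuleInput,
    RowwiseParallel,
)

plan = {
    # Input arrives replicated; keep the colwise output as a DTensor sharded
    # on the last dim so the next rowwise layer can consume it directly.
    "net1": ColwiseParallel(input_layouts=Replicate(), output_layouts=Shard(-1)),
    # Rowwise consumes the shard on the last dim and redistributes the
    # output back to replicated.
    "net2": RowwiseParallel(input_layouts=Shard(-1), output_layouts=Replicate()),
    # Annotation only, no parameter distribution: declare how the block's
    # input arrives and what layout it should be redistributed to.
    # (Keyword names for PrepareModuleInput are an assumption here.)
    "block": PrepareModuleInput(input_layouts=Shard(0), output_layouts=Replicate()),
}
```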
@@ -53,8 +62,6 @@ used for input/output preparation:
 .. autofunction:: make_output_shard_1d
 .. autofunction:: make_output_tensor
 
-.. autofunction:: PrepareModuleInput
-.. autofunction:: PrepareModuleOutput
 
 Currently, there are some constraints which make it hard for the ``MultiheadAttention``
 module to work out of the box with Tensor Parallelism, so we recommend users try ``ColwiseParallel``
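The last hunk notes that ``MultiheadAttention`` does not work out of the box and recommends ``ColwiseParallel``. A common workaround, sketched below under assumed submodule names for a hand-written attention block, is to shard the q/k/v projections colwise and the output projection rowwise.

```python
from torch.distributed.tensor.parallel import ColwiseParallel, RowwiseParallel

# Hypothetical submodule names for a custom attention block; the point is
# the pattern (projections colwise, output projection rowwise), not the names.
attention_plan = {
    "q_proj": ColwiseParallel(),
    "k_proj": ColwiseParallel(),
    "v_proj": ColwiseParallel(),
    "out_proj": RowwiseParallel(),
}
# parallelized_attn = parallelize_module(attn_block, mesh, attention_plan)
```

Sharding the projections this way splits attention heads across the mesh, so it assumes the number of heads is divisible by the tensor-parallel degree.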