Improve the clarity of the torch.Tensor.backward doc (#127201)

Improve the clarity of the torch.Tensor.backward doc, particularly w.r.t. the `gradient` argument.
For reference, https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html states:
```
We need to explicitly pass a gradient argument in Q.backward() because it is a vector. gradient is a tensor of the same shape as Q, and it represents the gradient of Q w.r.t. itself
```
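
The quoted behavior can be illustrated with a minimal sketch. The tensor values and variable names below are chosen for illustration only and are not part of this change. Passing a tensor of ones with the same shape as `Q` weights every element of `Q` equally, which should be equivalent to `Q.sum().backward()`. Calling `Q.backward()` with no argument on this non-scalar `Q` is expected to raise a `RuntimeError`.

```python
import torch

# Two leaf tensors that track gradients (illustrative values).
a = torch.tensor([2.0, 3.0], requires_grad=True)
b = torch.tensor([6.0, 4.0], requires_grad=True)

# Q has two elements, so it is non-scalar and backward() needs
# an explicit ``gradient`` of the same shape as Q.
Q = 3 * a**3 - b**2

# A tensor of ones weights each element of Q equally.
Q.backward(gradient=torch.ones_like(Q))

# Compare against the analytic derivatives:
# dQ/da = 9 * a**2 and dQ/db = -2 * b.
grad_a_matches = torch.allclose(a.grad, 9 * a.detach() ** 2)
grad_b_matches = torch.allclose(b.grad, -2 * b.detach())
```

A `gradient` other than all ones scales each element's contribution, so the result accumulated into `.grad` is the vector-Jacobian product of `gradient` with the Jacobian of `Q`.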

@janeyx99 feel free to assign to the corresponding reviewers, thanks
Co-authored-by: Jeffrey Wan <soulitzer@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127201
Approved by: https://github.com/soulitzer
lancerts 2024-05-28 19:25:49 +00:00 committed by PyTorch MergeBot
parent f600faf248
commit 03005bb655

@@ -468,8 +468,8 @@ class Tensor(torch._C.TensorBase):
 The graph is differentiated using the chain rule. If the tensor is
 non-scalar (i.e. its data has more than one element) and requires
-gradient, the function additionally requires specifying ``gradient``.
-It should be a tensor of matching type and location, that contains
+gradient, the function additionally requires specifying a ``gradient``.
+It should be a tensor of matching type and shape, that represents
 the gradient of the differentiated function w.r.t. ``self``.
 This function accumulates gradients in the leaves - you might need to zero
@@ -491,12 +491,9 @@ class Tensor(torch._C.TensorBase):
 See https://github.com/pytorch/pytorch/pull/60521#issuecomment-867061780 for more details.
 Args:
-gradient (Tensor or None): Gradient w.r.t. the
-tensor. If it is a tensor, it will be automatically converted
-to a Tensor that does not require grad unless ``create_graph`` is True.
-None values can be specified for scalar Tensors or ones that
-don't require grad. If a None value would be acceptable then
-this argument is optional.
+gradient (Tensor, optional): The gradient of the function
+being differentiated w.r.t. ``self``.
+This argument can be omitted if ``self`` is a scalar.
 retain_graph (bool, optional): If ``False``, the graph used to compute
 the grads will be freed. Note that in nearly all cases setting
 this option to True is not needed and often can be worked around
@@ -505,10 +502,10 @@ class Tensor(torch._C.TensorBase):
 create_graph (bool, optional): If ``True``, graph of the derivative will
 be constructed, allowing to compute higher order derivative
 products. Defaults to ``False``.
-inputs (sequence of Tensor): Inputs w.r.t. which the gradient will be
-accumulated into ``.grad``. All other Tensors will be ignored. If not
+inputs (sequence of Tensor, optional): Inputs w.r.t. which the gradient will be
+accumulated into ``.grad``. All other tensors will be ignored. If not
 provided, the gradient is accumulated into all the leaf Tensors that were
-used to compute the attr::tensors.
+used to compute the :attr:`tensors`.
 """
 if has_torch_function_unary(self):
 return handle_torch_function(