pytorch/docs/source
Jesse Cai 2da6cae43c [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135)
This PR adds in support for semi-structured sparsity via a tensor
subclass. It currently uses the CUTLASS kernels merged in PR #100881.

In the future we plan to add in cuSPARSELt support (see the other PRs in
the stack), which will give us larger performance gains.

This PR adds two things:
- a Tensor subclass, `SparseSemiStructuredTensor`, that stores the
  sparse tensor in compressed form and overrides `__torch_dispatch__`.
- a conversion function that takes in a dense tensor and a
  semi-structured sparse bool mask and creates an instance of the
  subclass.

**SparseSemiStructuredTensor**

The subclass stores the dense tensor in a contiguous flattened tensor
for future compatibility with cuSPARSELt, which expects this format.
Note that the CUTLASS kernels do not have this limitation, as the
specified values and the metadata are passed separately in
`_structured_sparse_linear`. In the future we can use the cuSPARSELt bindings
[here](https://github.com/pytorch/pytorch/pull/103700) for faster matmul, better dtype coverage, and relaxed shape
constraints.
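As a back-of-the-envelope illustration of why the compressed form is attractive, here is a small pure-Python sketch of the storage math for 2:4 semi-structured sparsity: each contiguous group of 4 elements keeps 2 values, plus a small per-value position index (2 bits per kept value is the commonly cited figure). The function name and exact metadata accounting are illustrative assumptions, not PyTorch API:

```python
def compressed_sizes(rows: int, cols: int, elem_bits: int = 16) -> tuple[int, int]:
    """Rough storage math for 2:4 semi-structured sparsity (illustrative only).

    Each contiguous group of 4 elements keeps 2 values; a 2-bit index per
    kept value records which positions survived.  Returns
    (value_bits, metadata_bits) for the whole matrix.
    """
    n = rows * cols
    kept = n // 2                 # 2 of every 4 elements survive
    value_bits = kept * elem_bits
    metadata_bits = kept * 2      # assumed 2-bit position index per kept value
    return value_bits, metadata_bits
```

For a 128x128 `torch.float16` weight this works out to half the dense value storage plus a small metadata overhead.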

Since we currently don't have a way to go back from the sparse
representation to the dense representation, and we store the weights in
compressed form, we don't have a great way to handle .t().

Instead, we keep track of how often we've called transpose on our
tensor, and if it's an unexpected number we throw an error. When the first
argument is sparse, we expect an even number of calls to transpose,
while when the second argument is sparse, we expect an odd number of
calls. This is because we support second argument sparse matrix
multiplications by using transpose properties.
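The parity bookkeeping described above can be sketched in a few lines of plain Python. This is an illustrative stand-in, not PyTorch's actual implementation; the class and method names are assumptions:

```python
class TransposeTracker:
    """Illustrative sketch of tracking .t() calls on a tensor whose
    compressed layout cannot actually be transposed.

    When the sparse operand is the first matmul argument we expect an
    even number of transposes; when it is the second argument we expect
    an odd number, because second-operand sparse matmuls are rewritten
    via transpose identities.
    """

    def __init__(self):
        self.num_transposes = 0

    def t(self):
        # Record the transpose instead of materializing it.
        self.num_transposes += 1
        return self

    def check(self, sparse_is_first: bool) -> None:
        expected_parity = 0 if sparse_is_first else 1
        if self.num_transposes % 2 != expected_parity:
            raise RuntimeError(
                f"unexpected transpose count {self.num_transposes} for "
                f"{'first' if sparse_is_first else 'second'}-operand sparse matmul"
            )
```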

**to_sparse_semi_structured**

This is a conversion function that turns a dense tensor and a
semi-structured sparse bool mask into an instance of the subclass.
Currently, we must pass in a bool mask, since we can't infer it: the
dense tensor may contain additional zero elements, so `tensor != 0` is
not necessarily 2:4 sparse.

Once we add either a method to derive the mask from the dense tensor or
cuSPARSELt support, we will no longer need to pass in the mask.
cuSPARSELt has its own helper functions to create the metadata mask.
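To make the 2:4 constraint concrete, here is a pure-Python sketch of the property the mask must satisfy: every contiguous group of 4 entries along a row contains exactly 2 nonzeros. The function is a hypothetical helper for illustration; the real conversion operates on bool CUDA tensors:

```python
def is_2_4_sparse(mask: list[list[bool]]) -> bool:
    """Check that every contiguous group of 4 entries in each row has
    exactly 2 True values -- the 2:4 pattern the conversion requires.
    Pure-Python sketch, not a PyTorch API.
    """
    for row in mask:
        if len(row) % 4 != 0:
            return False
        for i in range(0, len(row), 4):
            if sum(row[i:i + 4]) != 2:
                return False
    return True
```

A dense tensor that happens to have three zeros in some group of 4 would fail this check, which is why `tensor != 0` cannot be used as the mask.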

**User Details**

We have implemented support for the following ops for `torch.float16`
and `torch.int8`:
```
torch.addmm(bias, dense, sparse.t())
torch.mm(dense, sparse)
torch.mm(sparse, dense)
aten.linear.default
aten.t.default
aten.t.detach
```

The end user interface to accelerate an `nn.Linear` module with the
subclass would look like this:

```
import torch
from torch import nn
from torch.sparse import to_sparse_semi_structured

mask = torch.Tensor([0, 0, 1, 1]).tile(128, 32).cuda().bool()
linear = nn.Linear(128, 128).half().cuda()

linear.weight = nn.Parameter(to_sparse_semi_structured(linear.weight,
                                                       mask=mask))
```

This also updates tests and the `torch.sparse` module docstring to
reflect these changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/102135
Approved by: https://github.com/albanD
2023-06-27 19:21:06 +00:00
_static torch.compile docs: "Profiling to understand torch.compile performance (#102862) 2023-06-06 22:00:36 +00:00
_templates Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
community Update CODEOWNERS (#103934) 2023-06-26 19:29:29 +00:00
compile Revert "To add brief intro for CPU backend optimization (#103666)" 2023-06-20 18:33:01 +00:00
elastic
notes Add docstring to torch.serialization.register_package (#104046) 2023-06-26 23:28:32 +00:00
rpc
scripts
amp.rst [docs] Warn that GradScaler can scale under 1 (#101569) 2023-05-16 23:56:07 +00:00
autograd.rst [docs] Add missing functions to autograd.rst (#98854) 2023-04-11 20:45:49 +00:00
backends.rst Fix Backend docs search items (#101214) 2023-05-22 14:58:38 +00:00
benchmark_utils.rst
bottleneck.rst
checkpoint.rst add checkpoint support for custom device (#99626) 2023-05-04 00:23:42 +00:00
compiler.rst torch.compiler public namespace (#102182) 2023-06-13 19:52:17 +00:00
complex_numbers.rst Remove CUDA 11.6 note from complex docs (#100118) 2023-04-27 16:26:27 +00:00
conf.py Use the new analytics ID (#103766) 2023-06-16 23:21:08 +00:00
config_mod.rst
cpp_extension.rst
cpp_index.rst
cuda._sanitizer.rst
cuda.rst Add more GPU metric instrumentation (#91717) 2023-02-24 00:38:03 +00:00
cudnn_persistent_rnn.rst
cudnn_rnn_determinism.rst
data.rst missed StackDataset documentation (#101927) 2023-05-22 21:12:16 +00:00
ddp_comm_hooks.rst [DOCS][DDP]Fix the simple of saving and reloading PowerSGD state and hook. (#102721) 2023-06-10 00:15:00 +00:00
deploy.rst
distributed.algorithms.join.rst
distributed.checkpoint.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
distributed.elastic.rst
distributed.optim.rst
distributed.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
distributed.tensor.parallel.rst [TP] Enable more generic attn in Tensor Parallelism (#100508) 2023-05-07 18:15:49 +00:00
distributions.rst
dlpack.rst
docutils.conf
fft.rst
fsdp.rst
func.api.rst
func.batch_norm.rst Fix typo under docs directory (#97202) 2023-03-21 01:24:10 +00:00
func.migrating.rst
func.rst
func.ux_limitations.rst
func.whirlwind_tour.rst
futures.rst
fx.rst [fx] change from #users to num_users in graph printout (#101140) 2023-06-20 21:24:32 +00:00
hub.rst
index.rst torch.compiler public namespace (#102182) 2023-06-13 19:52:17 +00:00
ir.rst
jit.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
jit_builtin_functions.rst
jit_language_reference.rst [BE] [1/3] Rewrite super() calls in caffe2 and benchmarks (#94587) 2023-02-11 18:19:48 +00:00
jit_language_reference_v2.rst Fix typo under docs directory (#97202) 2023-03-21 01:24:10 +00:00
jit_python_reference.rst
jit_unsupported.rst
jit_utils.rst
library.rst
linalg.rst
logging.rst Add graph break logging option instead of config flag (#103202) 2023-06-12 19:52:31 +00:00
masked.rst Fix link in docs (#94686) 2023-02-13 20:42:24 +00:00
math-quantizer-equation.png
mobile_optimizer.rst
model_zoo.rst
monitor.rst
mps.rst [MPS] Add support for MPSProfiler Python bindings (#101002) 2023-05-12 21:55:34 +00:00
multiprocessing.rst
name_inference.rst Add itemsize and nbytes properties to Tensor (#98322) 2023-04-05 12:11:55 +00:00
named_tensor.rst
nested.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
nn.functional.rst
nn.init.rst
nn.rst [easy] Expose documentation for a few global nn.Module hooks (#97185) 2023-03-21 20:09:29 +00:00
onnx.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
onnx_diagnostics.rst [ONNX] Introduce 'diagnostics' to 'dynamo_export' api (#99668) 2023-05-01 19:58:49 +00:00
onnx_supported_aten_ops.rst
optim.rst Optimized EMA implementation (#94820) 2023-04-26 18:02:11 +00:00
package.rst
pipeline.rst docs: Linking ResNeXt PyTorch Hub Pipeline (#98689) 2023-04-11 02:20:26 +00:00
profiler.rst
quantization-accuracy-debugging.rst
quantization-backend-configuration.rst
quantization-support.rst
quantization.rst Quantization oneDNN backend only support VNNI CPU (#103653) 2023-06-19 09:50:07 +00:00
random.rst
rpc.rst Replace master with main in links and docs/conf.py (#100176) 2023-05-02 18:20:32 +00:00
signal.rst
sparse.rst [core][pruning][sparse][feature] SparseSemiStructured tensor subclass (#102135) 2023-06-27 19:21:06 +00:00
special.rst
storage.rst
tensor_attributes.rst
tensor_view.rst
tensorboard.rst
tensors.rst Add itemsize and nbytes properties to Tensor (#98322) 2023-04-05 12:11:55 +00:00
testing.rst
torch.ao.ns._numeric_suite.rst
torch.ao.ns._numeric_suite_fx.rst
torch.overrides.rst Add torch_dispatch and modes to extending.rst note (#102087) 2023-06-22 12:56:35 +00:00
torch.rst Re-land _cycleviz.py: visualize reference cycles holding cuda memory (#104051) 2023-06-23 13:44:58 +00:00
type_info.rst