Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-14 20:57:59 +00:00.
Summary: ... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026.

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes whether an input (of the original graph) will be chunked.
3) When launching a kernel, use `std::vector<ConcatDesc>` to chunk an input tensor on the CPU. This chunk directly takes in an at::Tensor and creates four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to check whether a single chunk is fused by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning, middle, end) and tensors (contiguous, non-contiguous, edge case (splitSize = 1)) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel, plus a correctness test

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```
Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178
Differential Revision: D9382661
Pulled By: zou3519
fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
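The idea behind step 3, describing each chunk directly instead of materializing intermediate Tensors, comes down to stride arithmetic: chunking along a dimension leaves the strides unchanged, shrinks the size along that dimension, and shifts the storage offset. The following is a minimal Python sketch of that arithmetic, not the C++ launch code from the PR. `TensorInfoSketch`, `chunk_descriptors`, and `element_offset` are hypothetical names introduced for illustration. The sketch also assumes that the chunked dimension divides evenly by the number of chunks.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TensorInfoSketch:
    # Hypothetical stand-in for a kernel-argument descriptor:
    # a storage offset (in elements) plus per-dimension sizes and strides.
    offset: int
    sizes: List[int]
    strides: List[int]


def chunk_descriptors(offset: int, sizes: List[int], strides: List[int],
                      dim: int, chunks: int) -> List[TensorInfoSketch]:
    """Describe `chunks` equal pieces of a tensor split along `dim`
    without allocating new tensors: only offsets and sizes change."""
    if sizes[dim] % chunks != 0:
        # Assumption of this sketch: evenly divisible chunks only.
        raise ValueError("sketch assumes sizes[dim] is divisible by chunks")
    chunk_size = sizes[dim] // chunks
    infos = []
    for i in range(chunks):
        new_sizes = list(sizes)
        new_sizes[dim] = chunk_size
        infos.append(TensorInfoSketch(
            # Skip over the i previous chunks along `dim`.
            offset=offset + i * chunk_size * strides[dim],
            sizes=new_sizes,
            strides=list(strides),  # chunking never changes strides
        ))
    return infos


def element_offset(info: TensorInfoSketch, index: List[int]) -> int:
    """Flat storage position of element `index` within a chunk."""
    return info.offset + sum(i * s for i, s in zip(index, info.strides))
```

For a contiguous 4x8 tensor (strides `[8, 1]`) split into 4 chunks along dimension 1, each chunk has sizes `[4, 2]`, the same strides, and offsets 0, 2, 4, and 6. Element `(1, 1)` of the third chunk therefore maps to flat position `4 + 8 + 1 = 13`, which is element `(1, 5)` of the original tensor. Because only the offset and sizes change, the same arithmetic also covers non-contiguous inputs and the `splitSize = 1` edge case mentioned in the tests.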
| Name |
|---|
| bottleneck |
| cpp/api |
| cpp_extensions |
| custom_operator |
| data |
| error_messages |
| expect |
| ffi/src |
| onnx |
| optim |
| common.py |
| common_cuda.py |
| common_nn.py |
| run_test.py |
| test_autograd.py |
| test_c10d.py |
| test_cpp_extensions.py |
| test_cuda.py |
| test_dataloader.py |
| test_distributed.py |
| test_distributed_trap.py |
| test_distributions.py |
| test_indexing.py |
| test_jit.py |
| test_legacy_nn.py |
| test_multiprocessing.py |
| test_nccl.py |
| test_nn.py |
| test_optim.py |
| test_sparse.py |
| test_torch.py |
| test_utils.py |