pytorch/test
Richard Zou f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, use the `std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.
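The idea behind step 3 can be sketched in plain Python (a minimal stand-in; `chunk_descs` and the (offset, length) tuples are illustrative, not the actual ConcatDesc/TensorInfo types): instead of materializing intermediate tensors, precompute lightweight descriptors of each chunk and hand the kernel views into the original buffer.

```python
def chunk_descs(numel, chunks):
    """Describe an even chunking of a flat buffer as (offset, length)
    pairs -- loosely analogous to emitting TensorInfo structs straight
    into the kernel argument list instead of allocating intermediate
    tensor objects."""
    assert numel % chunks == 0, "sketch assumes an evenly divisible size"
    size = numel // chunks
    return [(i * size, size) for i in range(chunks)]

# A 4-way chunk of a 16-element buffer costs four small tuples,
# not four freshly initialized tensors.
print(chunk_descs(16, 4))  # [(0, 4), (4, 4), (8, 4), (12, 4)]
```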

- Expect test and correctness test checking that a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case of
  splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.
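The single-chunk pattern these tests target can be mimicked without PyTorch (a plain-Python stand-in, not the real test code): a chunk whose outputs feed only pointwise ops, which is exactly the shape the fuser can absorb into one kernel.

```python
def chunk2_pointwise(x):
    # Stand-in for `a, b = x.chunk(2); a * b + b` on a flat list:
    # the chunk feeds only pointwise ops, so the graph fuser can
    # pull it into the surrounding FusionGroup.
    half = len(x) // 2
    a, b = x[:half], x[half:2 * half]
    return [ai * bi + bi for ai, bi in zip(a, b)]

print(chunk2_pointwise([1, 2, 3, 4]))  # [6, 12]
```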

cc zdevito apaszke

Benchmark: LSTM forward pass, 1 layer, hidden size and input size 512, sequence length 100, requires_grad=False on all inputs and weights (lower is better).

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
bottleneck
cpp/api Make torch::Tensor -> at::Tensor (#10516) 2018-08-15 21:25:12 -07:00
cpp_extensions Creates CUDAContext (#9435) 2018-07-20 12:56:15 -07:00
custom_operator Build mechanism for custom operators (#10226) 2018-08-16 18:56:17 -07:00
data
error_messages
expect Move at::chunk into the graph fuser (#10178) 2018-08-18 16:10:11 -07:00
ffi/src
onnx ATen layer norm symbolic (#10513) 2018-08-15 08:28:52 -07:00
optim
common.py enable unit tests and other changes (#10266) 2018-08-06 14:54:01 -07:00
common_cuda.py
common_nn.py Add CELU activation to pytorch (#8551) 2018-08-01 07:54:44 -07:00
run_test.py improve use of ROCm libraries, enable more tests, small fixes (#10406) 2018-08-13 11:39:43 -07:00
test_autograd.py improve use of ROCm libraries, enable more tests, small fixes (#10406) 2018-08-13 11:39:43 -07:00
test_c10d.py fixed c10d test (#10557) 2018-08-15 17:22:38 -07:00
test_cpp_extensions.py
test_cuda.py Fix corner case with torch.multinomial (#9960) 2018-08-15 13:25:39 -07:00
test_dataloader.py improve use of ROCm libraries, enable more tests, small fixes (#10406) 2018-08-13 11:39:43 -07:00
test_distributed.py Fix Python lint errors. (#10441) 2018-08-11 21:08:50 -07:00
test_distributed_trap.py
test_distributions.py remove implicit conversion from gpu to cpu (#10553) 2018-08-16 12:10:39 -07:00
test_indexing.py Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077) 2018-07-31 16:43:45 -07:00
test_jit.py Move at::chunk into the graph fuser (#10178) 2018-08-18 16:10:11 -07:00
test_legacy_nn.py Add CTC loss (#9628) 2018-07-31 11:09:48 -07:00
test_multiprocessing.py Correctly share CUDA Parameters. (#10220) 2018-08-10 13:54:56 -07:00
test_nccl.py
test_nn.py Fix dropout fused kernel applied in eval mode (#10621) 2018-08-17 14:54:42 -07:00
test_optim.py improve use of ROCm libraries, enable more tests, small fixes (#10406) 2018-08-13 11:39:43 -07:00
test_sparse.py set coalesced=false at sparse transpose() and removed transpose invariants (#10496) 2018-08-14 21:25:37 -07:00
test_torch.py Fix bincount for empty input (#9757) 2018-08-15 20:55:59 -07:00
test_utils.py improve use of ROCm libraries, enable more tests, small fixes (#10406) 2018-08-13 11:39:43 -07:00