Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863
Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
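As an illustration, here is a minimal sketch of the intended call pattern (the model path and the defaulted trailing argument of the scripted method are hypothetical):
```
#include <ATen/ATen.h>
#include <torch/csrc/jit/mobile/import.h>
#include <torch/csrc/jit/mobile/module.h>

int main() {
  // Hypothetical model whose scripted method is
  //   def forward(self, x, scale: float = 1.0)
  torch::jit::mobile::Module m = torch::jit::_load_for_mobile("model.ptl");

  // With default-argument support the trailing `scale` can be omitted;
  // previously the lite interpreter required every argument explicitly.
  c10::IValue out = m.forward({at::ones({2, 2})});
  return 0;
}
```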
Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation
buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg
Reviewed By: raziel, iseeyuan
Differential Revision: D25152559
fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46213
I didn't update the documentation yet; I will add those changes soon. There are a few other things I didn't do, but I want to clarify whether I should:
1. I didn't expose projections in the C++ API (torch/csrc/api/src/nn/modules/rnn.cpp). Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in the "lstm_cell" and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit: if the cell is used separately, the projection can easily be added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added a check to the quantized_lstm_<data/input> functions that projections are not supported for quantized LSTMs, but I didn't add any checks to the LSTMCell code. Since I disabled projections in the "lstm_cell" function, they should also not be available for quantized models through any API other than quantized_lstm_<data/input>. Please let me know if I'm wrong and I will add checks in other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
5. Currently I added the projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights, so I had to add additional if statements in various places. An alternative would be the "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption would hold. But then I would need to split the loop in the get_parameters function in aten/src/ATen/native/cudnn/RNN.cpp, and in some cases I would still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725
Reviewed By: zou3519
Differential Revision: D25449794
Pulled By: ngimel
fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408
Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118665808
Test Plan:
Wait for GitHub CI since we had C++14-specific issues with
this one in previous PR https://github.com/pytorch/pytorch/pull/48629
Reviewed By: malfet
Differential Revision: D25563207
fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629
Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118568240
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D25135415
fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620
In preparation for storing a bare function pointer (8 bytes)
instead of a std::function (32 bytes).
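For context, a small standalone comparison (sizes are implementation-defined; the 8-vs-32 figures above match a typical 64-bit libstdc++ build):
```
#include <cstdio>
#include <functional>

using RawCallback = void (*)();

int main() {
  // std::function carries a manager pointer, an invoker pointer, and inline
  // storage for small callables, which is why it dwarfs a bare function pointer.
  std::printf("sizeof(std::function<void()>) = %zu\n", sizeof(std::function<void()>));
  std::printf("sizeof(RawCallback)           = %zu\n", sizeof(RawCallback));
  return 0;
}
```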
ghstack-source-id: 118568242
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D25132183
fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49184
Adds BitCasting to NNC. This will enable fast approximation algorithms implemented directly in TensorExpressions
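As a plain C++ illustration (not the NNC API) of why bitcasting matters for fast approximations, here is the classic fast reciprocal square root, which only works because a float's bits can be reinterpreted as an integer and back:
```
#include <cstdint>
#include <cstring>

// Bitcast float -> uint32, tweak the bits, bitcast back, then refine with one
// Newton-Raphson step. A BitCast node lets this kind of pattern be expressed
// directly in a TensorExpression.
float fast_rsqrt(float x) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));   // bitcast: float -> uint32
  bits = 0x5f3759df - (bits >> 1);        // magic-constant initial guess
  float y;
  std::memcpy(&y, &bits, sizeof(y));      // bitcast: uint32 -> float
  return y * (1.5f - 0.5f * x * y * y);   // one Newton-Raphson refinement
}
```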
Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr
Reviewed By: bertmaher
Differential Revision: D25466476
fbshipit-source-id: f063ab29ba7bab2dcce463e499f2d4a16bdc1f0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48607
This change builds on top of
https://github.com/pytorch/pytorch/pull/46865
further exposing the async interface to `torch::jit::Method`.
Added a unit test for the new `run_async`.
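A hedged sketch of the new entry point (the exact `run_async` signature is an assumption inferred from this description; the model path is hypothetical):
```
#include <torch/script.h>

int main() {
  torch::jit::Module m = torch::jit::load("model.pt");
  torch::jit::Method forward = m.get_method("forward");

  std::vector<c10::IValue> inputs{torch::ones({2, 2})};
  // Kick off execution without blocking the caller; assumed to return a
  // c10::ivalue::Future that can be waited on later.
  auto future = forward.run_async(inputs);

  future->wait();
  c10::IValue result = future->value();
  return 0;
}
```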
Test Plan: `buck test caffe2/test/cpp/jit/...`
Reviewed By: dzhulgakov
Differential Revision: D25219726
fbshipit-source-id: 89743c82a0baa1affe0254c1e2dbf873de8e5c76
Summary:
Adds the CompoundTensor, a specialisation of the NNC Tensor which allows arbitrary production statements. This will allow lowering of aten ops into specific NNC IR patterns (which don't need to be functional), letting us shortcut to the optimized form of common patterns.
This is part 1 of trying to clean up the lowering of aten::cat so it is easier to optimize.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48750
Reviewed By: tugsbayasgalan
Differential Revision: D25433517
Pulled By: nickgg
fbshipit-source-id: de13c4719f8f87619ab254e5f324f13b5be1c9da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48264
Preserves the strided representation of NNC Tensor outputs by transforming them into the right layout at the end of the kernel.
Fix for https://github.com/pytorch/pytorch/issues/45604
Test Plan: Imported from OSS
Reviewed By: nikithamalgifb
Differential Revision: D25286213
Pulled By: eellison
fbshipit-source-id: 64d94ac463741e2568a1c9d44174e15ea26e511f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47813
We have some code paths that at kernel invocation seem to handle dynamic sizes, but I'm not sure how well they work, because other parts of our code base assume that tensor shapes are always fully specified. https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/tensorexpr/kernel.cpp#L1572
As with some other PRs in the stack, I think it would be good to remove features that aren't turned on or actively being worked on while they are not used.
I initially did this PR to try to speed up perf. I couldn't observe much of a speedup, so we can decide to keep or drop this PR if we want.
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D25286212
Pulled By: eellison
fbshipit-source-id: 4ae66e0af88d649dd4e592bc78686538c2fdbaeb
Summary: Adds BitCasting to NNC. This will enable fast approximation algorithms implemented directly in TensorExpressions
Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr
Reviewed By: bertmaher
Differential Revision: D25441716
fbshipit-source-id: c97b871697bc5931d09cda4a9cb0a81bb420f4e2
Summary:
Adds a new optimization method to LoopNest which eliminates stores that do not contribute to any output. It's unlikely any of the lowerings of aten operators produce these stores yet, but this creates some wiggle room for transformations in the future.
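For illustration, a hypothetical before/after of the kind of store this pass can drop (buffer names made up; B is a temporary that no output ever reads):
```
// Before:
for (int i = 0; i < 10; ++i) {
  A[i] = i * 2;   // A feeds an output: kept
  B[i] = i * 3;   // B never contributes to an output: dead store
}

// After eliminating dead stores:
for (int i = 0; i < 10; ++i) {
  A[i] = i * 2;
}
```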
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49030
Reviewed By: tugsbayasgalan
Differential Revision: D25434538
Pulled By: nickgg
fbshipit-source-id: fa1ead82e6f7440cc783c6116b23d0b7a5b5db4b
Summary:
Ref https://github.com/pytorch/pytorch/issues/42175
This removes the 4 deprecated spectral functions: `torch.{fft,rfft,ifft,irfft}`. `torch.fft` is also now imported by default.
The actual `at::native` functions are still used in `torch.stft`, so they can't be fully removed yet, but they will be once https://github.com/pytorch/pytorch/issues/47601 has been merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48594
Reviewed By: heitorschueroff
Differential Revision: D25298929
Pulled By: mruberry
fbshipit-source-id: e36737fe8192fcd16f7e6310f8b49de478e63bf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48274
The std::function-ness of it was used only for tests. (std::function is huge at 32 bytes, and not particularly efficient.)
ghstack-source-id: 117498491
Test Plan: CI
Reviewed By: dzhulgakov
Differential Revision: D25102077
fbshipit-source-id: fd941ddf32235a9659a1a17609c27cc5cb446a54
Summary:
I found a number of spelling & grammatical mistakes in the repository. Previously I submitted these fixes individually, but a single-word change was apparently too small for a PR to be merged. Hopefully this new PR has a sufficient number of changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48592
Reviewed By: ejguan
Differential Revision: D25224216
Pulled By: mrshenli
fbshipit-source-id: 2af3db2aee486563efd0dffc4e8f777306a73e44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48322
Disable old fuser internally. I would like to find where we are inadvertently setting the old fuser, but in the meantime I would like to land a diff that I know will 100% cause it not to be run, and verify that it fixes the issue.
Test Plan: sandcastle
Reviewed By: ZolotukhinM
Differential Revision: D25126202
fbshipit-source-id: 5a4d0742f5f829e536f50e7ede1256c94dd05232
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47796
`ThreadLocalDebugInfo::get()` is a hot function. For example, it is called by `DefaultCPUAllocator::allocate()`. Most callers do not even bother to keep the returned `shared_ptr` around, proving that they have no lifetime issues currently. For the rest, it appears that the only way that the returned pointer could become invalid is if they then called a function that swapped out `ThreadLocalDebugInfo` using `ThreadLocalStateGuard`. There are very few such paths, and it doesn't look like any current callers of `ThreadLocalDebugInfo::get()` needed a `shared_ptr` at all.
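A hedged sketch of the call pattern described above (treat the exact signature and the DebugInfoKind value as assumptions): the pointer is consumed immediately and never stored, so per-call shared_ptr refcounting on hot paths like allocate() buys nothing.
```
#include <c10/util/ThreadLocalDebugInfo.h>

void peek_debug_info() {
  // A raw pointer is enough here: nothing outlives the current scope, and no
  // ThreadLocalStateGuard swap can occur while we stay on this code path.
  if (auto* info =
          c10::ThreadLocalDebugInfo::get(c10::DebugInfoKind::PROFILER_STATE)) {
    (void)info;  // inspect the state in place
  }
}
```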
ghstack-source-id: 116979577
Test Plan:
1) reviewers to double-check audit of safety
2) run framework overhead benchmarks
Reviewed By: dzhulgakov
Differential Revision: D24902978
fbshipit-source-id: d684737cc2568534cac7cd3fb8d623b971c2fd28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47023
DeviceType pretty clearly only needs 1 byte. DeviceIndex only needs 1 byte given that machines don't have anywhere near 255 GPUs in them as far as I know.
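An illustrative sketch (not the exact c10 definitions) of the packing this enables:
```
#include <cstdint>

// With both fields narrowed to one byte, a Device fits in two bytes.
enum class DeviceType : std::int8_t { CPU = 0, CUDA = 1 /* ... */ };
using DeviceIndex = std::int8_t;  // machines have nowhere near 255 devices

struct Device {
  DeviceType type;
  DeviceIndex index;
};
static_assert(sizeof(Device) == 2, "two one-byte fields pack into two bytes");
```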
ghstack-source-id: 116901430
Test Plan: Existing tests, added assertion to catch if my assumption about DeviceIndex is incorrect
Reviewed By: dzhulgakov
Differential Revision: D24605460
fbshipit-source-id: 7c9a89027fcf8eebd623b7cdbf6302162c981cd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48160
We no longer use the custom C++ test infra anyway, so move to pure
gtest.
Fixes #45703
ghstack-source-id: 116977283
Test Plan: `buck test //caffe2/test/cpp/tensorexpr`
Reviewed By: navahgar, nickgg
Differential Revision: D25046618
fbshipit-source-id: da34183d87465f410379048148c28e1623618553
Summary:
Fixes an internally reported issue in the tensorexpr fuser when using FP16 on CUDA. The HalfChecker analysis, which determines whether we need to define the Half type, searches the IR for expressions that use Half. If one of the parameters is of type Half but it (or any other Half expr) is not used in the IR, we'll return a false negative. Fix this by adding the parameter list to the HalfChecker.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48068
Reviewed By: ZolotukhinM
Differential Revision: D25009680
Pulled By: nickgg
fbshipit-source-id: 24fddef06821f130db3d3f45d6d041c7f34a6ab0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48063
The TE fuser does not know how to construct a list of Tensors.
Test Plan: new unit test
Reviewed By: eellison
Differential Revision: D25007234
fbshipit-source-id: 1a8ffdf5ffecb39a727357799ed32df8f53150d6
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/5062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45556
User defined classes can be used as constants. This is useful when freezing and removing the module from the graph.
Test Plan: waitforsadcastle
Reviewed By: eellison
Differential Revision: D23994974
fbshipit-source-id: 5b4a5c91158aa7f22df39d71f2658afce1d29317
Summary:
Add support for ReduceOp in the Vectorizer, which allows vectorization of reductions. Only non-reduce axes can be vectorized currently; to vectorize reduce axes we'd need either to automatically pull out the RHS of reductions (better as a separate transform, I think) or special handling of vector reduce in the LLVM codegen (tricky, maybe not useful?).
There was a disabled LLVM test for this case which I reenabled with a bit of massaging, and added a few more.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47924
Reviewed By: bertmaher
Differential Revision: D24963464
Pulled By: nickgg
fbshipit-source-id: 91d91e9e2696555ab5690b154984b1ce48359d51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47952
We don't actually generate a TE kernel so no need to use the
arena-allocation guard.
Test Plan:
```
buck test //caffe2/test/cpp/tensorexpr -- FuserPass
```
Reviewed By: ZolotukhinM
Differential Revision: D24967107
fbshipit-source-id: 302f65b2fcff704079e8b51b942b7b3baff95585
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47884
We need to know output types of everything in a fusion group to ensure
that we generate correctly-typed tensors. We were incorrectly starting a
fusion group with an unknown-typed output.
Test Plan:
New unit tests:
```
buck test //caffe2/test:jit //caffe2/test/cpp/tensorexpr:tensorexpr
```
Reviewed By: eellison
Differential Revision: D24932786
fbshipit-source-id: 83978a951f32c1207bbc3555a7d3bd94fe4e70fb
Summary:
Refactors the ReduceOp node to remove the last remaining deferred functionality: completing the interaction between the accumulator buffer and the body. This fixes two issues with reductions:
1. Nodes inside the interaction could not be visited or modified, meaning we could generate bad code when the interaction was complex.
2. The accumulator load was created at expansion time and so could not be modified in some ways (i.e. vectorization couldn't act on these loads).
This simplifies the reduction logic quite a bit, but there's a bit more involved in the rfactor transform.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47709
Reviewed By: ZolotukhinM
Differential Revision: D24904220
Pulled By: nickgg
fbshipit-source-id: 159e5fd967d2d1f8697cfa96ce1bb5fc44920a40
Summary:
Adds a helper function to the Bounds Inference / Memory Analysis infrastructure which returns the kind of hazard found between two Stmts (e.g. Blocks or Loops). For example:
```
for (int i = 0; i < 10; ++i) {
  A[x] = i * 2;
}
for (int j = 0; j < 10; ++j) {
  B[x] = A[x] / 2;
}
```
The two loops have a `ReadAfterWrite` hazard, while in this example:
```
for (int i = 0; i < 10; ++i) {
  A[x] = i * 2;
}
for (int j = 0; j < 10; ++j) {
  A[x] = B[x] / 2;
}
```
The loops have a `WriteAfterWrite` hazard.
This isn't 100% of what we need for loop fusion; for example, we don't check the strides of the loops to see if they match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47684
Reviewed By: malfet
Differential Revision: D24873587
Pulled By: nickgg
fbshipit-source-id: 991149e5942e769612298ada855687469a219d62
Summary:
Refactors NNC bounds inference to use the dependency analysis added in https://github.com/pytorch/pytorch/issues/46952. This ends up being a pretty good simplification because we no longer need the complicated bound merging code that we used to determine contiguous ranges. There were no usages of that code and the memory dependency analyzer is closer to what we want for those use cases anyway.
Added tests for a few cases uncovered by the existing bounds inference test; much of the coverage for this feature is in tests of its uses: rfactor, computeAt and cacheAccesses.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47450
Reviewed By: heitorschueroff
Differential Revision: D24834458
Pulled By: nickgg
fbshipit-source-id: f93e40b09c0745dcc46c7e34359db594436d04f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47425
Extra files can be exported in a lite interpreter model, but they could not be loaded. This PR adds the capability to load extra files from a lite interpreter model. Because extra_files is a default argument, it should not affect the existing usage of _load_for_mobile. The implementation is a simple assembly of a generic unordered_map, so no additional dependency should be introduced and the size overhead should be small (to be tested).
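A hedged sketch of the intended usage (the extra-file key and model path are hypothetical; pre-populate the map with the names you want read back):
```
#include <string>
#include <torch/csrc/jit/mobile/import.h>

int main() {
  // Ask the loader to extract this entry from the model archive.
  torch::jit::ExtraFilesMap extra_files{{"metadata.json", ""}};

  auto module = torch::jit::_load_for_mobile("model.ptl", c10::nullopt, extra_files);

  // After loading, the map values hold the file contents.
  const std::string& metadata = extra_files.at("metadata.json");
  (void)metadata;
  return 0;
}
```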
Test Plan: Imported from OSS
Reviewed By: kwanmacher
Differential Revision: D24770266
Pulled By: iseeyuan
fbshipit-source-id: 7e8bd301ce734dbbf36ae56c9decb045aeb801ce
Summary:
This diff adds support for `log_softmax` op in NNC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47409
Reviewed By: ejguan
Differential Revision: D24750203
Pulled By: navahgar
fbshipit-source-id: c4dacc7f62f9df65ae467f0d578ea03d3698273d
Summary:
Adds a new piece of infrastructure to the NNC fused-kernel generation compiler, which builds a dependency graph of the reads and writes to memory regions in a kernel.
It can be used to generate dependency graphs, for example for the GEMM benchmark (note this only represents the memory hierarchy, not the compute hierarchy).
Or to answer questions like this:
```
Tensor* c = Compute(...);
Tensor* d = Compute(...);
LoopNest loop({d});
MemDependencyChecker analyzer;
loop.root_stmt()->accept(&analyzer);
if (analyzer.dependsDirectly(loop.getLoopStmtsFor(d)[0], loop.getLoopStmtsFor(c)[0])) {
  // do something, maybe computeInline
}
```
Or this:
```
Tensor* d = Compute(...);
LoopNest loop({d});
MemDependencyChecker analyzer(loop.getInputs(), loop.getOutputs());
loop.root_stmt()->accept(&analyzer);
const Buf* output = d->buf();
for (const Buf* input : loop.getInputs()) {
  if (!analyzer.dependsIndirectly(output, input)) {
    // signal that this input is unused
  }
}
```
This is a monster of a diff, and I apologize. I've tested it as well as possible for now, but it's not hooked up to anything yet so should not affect any current usages of the NNC fuser.
**How it works:**
Similar to the registerizer, the MemDependencyChecker walks the IR aggregating memory accesses into scopes, then merges those scopes into their parent scope and tracks which writes are responsible for the last write to a particular region of memory, adding dependency links where that region is used.
This relies on a bunch of math on symbolic contiguous regions, which I've pulled out into its own file (bounds_overlap.h/cpp). Sometimes this won't be able to infer dependence with 100% accuracy, but it should always be conservative: it may occasionally add false positives, but I'm aware of no false negatives.
The hardest part of the analysis is determining when a Load inside a For loop depends on a Store that is lower in the IR from a previous iteration of the loop. This depends on a whole bunch of factors, including whether or not we should consider loop iteration order. The analyzer comes with configuration of this setting. For example this loop:
```
for (int x = 0; x < 10; ++x) {
  A[x] = B[x] + 1;
}
```
has no inter-loop dependence, since each iteration uses a distinct slice of both A and B. But this one:
```
for (int x = 0; x < 10; ++x) {
  A[0] = A[0] + B[x];
}
```
has a self-loop dependence between the Load and the Store of A. This applies to many cases that are not reductions as well. In this example:
```
for (int x = 0; x < 10; ++x) {
  A[x] = A[x+1] + x;
}
```
Whether or not it has a self-loop dependence depends on whether we assume the execution order is fixed (or whether this loop could later be parallelized). If the read from `A[x+1]` always comes before the write to that same region, then there is no dependence.
The analyzer can correctly handle dynamic shapes, but we may need more test coverage of real world usages of dynamic shapes. I unit test some simple and pathological cases, but coverage could be better.
**Next Steps:**
Since the PR was already so big I didn't actually hook it up anywhere, but I had planned on rewriting bounds inference based on the dependency graph. Will do that next.
There are a few gaps in this code which could be filled in later if we need them:
* Upgrading the bound math to work with write strides, which will reduce false positive dependencies.
* Better handling of Conditions, reducing false positive dependencies when a range is written in both branches of a Cond.
* Support for AtomicAdd node added in Cuda codegen.
**Testing:**
See new unit tests, I've tried to be verbose about what is being tested. I ran the python tests but there shouldn't be any way for this work to affect them yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46952
Reviewed By: ejguan
Differential Revision: D24730346
Pulled By: nickgg
fbshipit-source-id: 654c67c71e9880495afd3ae0efc142e95d5190df
Summary:
By default, TorchScript execution is single threaded and uses the caller's thread pool. For the use case of distributed inference, we want a way to customize this behavior so that the TorchScript interpreter can be executed elsewhere. This diff allows passing an explicit taskLauncher to the TorchScript interpreter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865
Test Plan:
Unit tests pass.
fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac
Reviewed By: houseroad
Differential Revision: D24616102
Pulled By: garroud
fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59