pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-15 21:00:47 +00:00

Author	SHA1	Message	Date
Martin Yuan	23c50a4a50	[PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432 ghstack-source-id: 120976584 torchbind is a convenient way to make custom classes available in both Python and TorchScript. CREATE_OBJECT is used to create an object of a custom class. CREATE_OBJECT was not supported by the lite interpreter. The major reason was that for custom classes defined directly in Python, there's no language parser in the lite interpreter. That is still the case. However, for torchbind classes that are defined in C++, a Python/TorchScript parser is not needed. This diff supports the case of torchbind custom classes. 1. The class type can be resolved at import level. 2. If the class is not a supported torchbind class, an error message is provided at the export stage. A workaround is also suggested. 3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on a supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes. Test Plan: CI Reviewed By: raziel Differential Revision: D26168913 fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427	2021-02-03 21:57:19 -08:00
Mikhail Zolotukhin	c639513378	[TensorExpr] Resubmit: Introduce ExternalCall nodes to TE IR. (#51594 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51594 ExternalCall nodes represent opaque calls to external functions to fill a tensor (buffer) with values. It could be used to include nodes that are otherwise not-representable as TE, or whose TE representation is currently too slow. To make an external function available in NNC as ExternalCall, one needs to implement a "bridge" function that would take raw (void*) pointers to the data along with the arrays containing dimension info. This function would then internally call the desired external function and make sure the results of the call are correctly placed in the provided raw data buffers. The reason the PR was previously reverted was that the LLVM generated calls to bridge functions were breaking unwind tables. This is now fixed by requiring bridge functions to never throw and setting the corresponding attribute in the LLVM generated code. Differential Revision: D26213882 Test Plan: Imported from OSS Reviewed By: pbelevich, ngimel Pulled By: ZolotukhinM fbshipit-source-id: db954d8338e2d750c2bf0a41e88e38bd494f2945	2021-02-03 10:22:54 -08:00
Xu Zhao	4fdebdc0c9	Improve PyTorch profiler flop computation formulas (#51377 ) Summary: Improve the flops computation formula of aten::conv2d operator to support stride, pad, dilation, and groups arguments. This diff also fixes the following issues: - Apply a factor of 2 to aten::mm because output accounts for multiplication and addition. - Fix incorrect names of scalar operators to aten::mul and aten::add. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51377 Test Plan: ```python python test/test_profiler.py ``` Reviewed By: jspark1105 Differential Revision: D26165223 Pulled By: xuzhao9 fbshipit-source-id: 2c5f0155c47af2e6a19332fd6ed73ace47fa072a	2021-02-02 11:49:04 -08:00
Luca Wehrstedt	4f37150f40	Revert D26179083: [TensorExpr] Introduce ExternalCall nodes to TE IR. Test Plan: revert-hammer Differential Revision: D26179083 (`f4fc3e3920`) Original commit changeset: 9e44de098ae9 fbshipit-source-id: d15684e04c65c395b4102d4f98a4488482822d1b	2021-02-02 05:29:41 -08:00
Mikhail Zolotukhin	f4fc3e3920	[TensorExpr] Introduce ExternalCall nodes to TE IR. (#51475 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51475 ExternalCall nodes represent opaque calls to external functions to fill a tensor (buffer) with values. It could be used to include nodes that are otherwise not-representable as TE, or whose TE representation is currently too slow. To make an external function available in NNC as ExternalCall, one needs to implement a "bridge" function that would take raw (void*) pointers to the data along with the arrays containing dimension info. This function would then internally call the desired external function and make sure the results of the call are correctly placed in the provided raw data buffers. Test Plan: Imported from OSS Reviewed By: pbelevich, Chillee Differential Revision: D26179083 Pulled By: ZolotukhinM fbshipit-source-id: 9e44de098ae94d25772cf5e2659d539fa6f3f659	2021-02-02 00:50:46 -08:00
Scott Wolchok	7328710cbc	[PyTorch][codemod] Replace immediately-dereferenced cast calls w/castRaw (#50229 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229 `fastmod -m 'cast(<((at\|c10)::)?\w+Type>\s*)->' 'castRaw${1}->'` Presuming it builds, this is a safe change: the result of `cast()` wasn't being saved anywhere, so we didn't need it, so we can use a raw pointer instead of a new `shared_ptr`. ghstack-source-id: 120769170 Test Plan: CI Reviewed By: SplitInfinity Differential Revision: D25837494 fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada	2021-02-01 23:12:07 -08:00
Frank Seide	87ad77eb4e	T66557700 Support default argument values of a method (#48863 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863 Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`). Test Plan: buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg Reviewed By: iseeyuan Differential Revision: D25896212 fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5	2021-02-01 18:35:13 -08:00
Cheng Chang	109bc1047e	[NNC] Generate C++ code for Allocate and Free (#51070 ) Summary: This is the initial skeleton for C++ codegen, it includes generations for Allocate and Free. Pull Request resolved: https://github.com/pytorch/pytorch/pull/51070 Test Plan: New unit tests are added to `test_cpp_codegen.cpp`. Reviewed By: ZolotukhinM Differential Revision: D26061818 Pulled By: cheng-chang fbshipit-source-id: b5256b2dcee6b2583ba73b6c9684994dbe7cdc1f	2021-02-01 13:06:51 -08:00
anjali411	508bab43e7	Support complex number list in JIT (#51145 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51145 Test Plan: Imported from OSS Reviewed By: SplitInfinity Differential Revision: D26154025 Pulled By: anjali411 fbshipit-source-id: 74645f9b6467757ddb9d75846e778222109848f0	2021-01-31 23:54:14 -08:00
Mikhail Zolotukhin	e975169426	[TensorExpr] Redesign `Tensor` class. (#50995 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995 This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and merges it with recently introduced 'CompoundTensor'. A statement for the tensor is either passed directly to the Tensor constructor (akin to 'CompoundTensor'), or is built immediately in constructor. LoopNest is no longer responsible for constructing statements from tensors - it simply stitches already constructed statements contained in Tensors. This has a side effect that now we cannot construct several loopnests from the same tensors - we need to explicitly clone statements if we want to do that. A special copy constructor was added to LoopNest to make it more convenient (note: this only affects tests, we don't usually create multiple loopnests in other places). Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D26038223 Pulled By: ZolotukhinM fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17	2021-01-27 16:14:22 -08:00
Mikhail Zolotukhin	42aeb68128	[TensorExpr] Move 'initializer' field from 'Tensor' to 'Buf'. (#50993 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50993 This is the first step to make 'Tensor' a thin wrapper over 'Buf' and 'Stmt', which will be finished in subsequent PRs. This change also allows removing 'buf_initializers_' from 'LoopNest', making it "less stateful". Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D26038224 Pulled By: ZolotukhinM fbshipit-source-id: f418816e54c62f291fa45812901487394e9b95b5	2021-01-27 16:10:53 -08:00
Pritam Damania	40eea6d9d1	Support device map for distributed autograd while using TensorPipe. (#44859 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44859 TensorPipe's `set_device_map` option was applied during the forward pass. However, if we ran the backward pass for the graph we would not automatically pick up the reverse device mapping. As a result, users had to specify both forward and backward device mapping which is very tedious to do. In this PR, I've added this functionality such that TensorPipe automatically picks up the reverse device mapping during the backward pass. This is done by storing the appropriate device mapping in the "recv" autograd function for distributed autograd. #Closes: https://github.com/pytorch/pytorch/issues/44170 ghstack-source-id: 119950842 Test Plan: 1) waitforbuildbot 2) Unit test added. Reviewed By: mrshenli Differential Revision: D23751975 fbshipit-source-id: 2717d0ef5bde3db029a6172d98aad95734d52140	2021-01-27 13:01:44 -08:00
Yanli Zhao	c9cae1446f	fix unflatten_dense_tensor when there is empty tensor inside (#50321 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50321 The Quantization team reported that when two empty tensors are replicated among ranks, the two empty tensors start to share storage after resizing. The root cause is that unflatten_dense_tensor unflattened each empty tensor as a view of the flat tensor, so it shared storage with other tensors. This PR avoids unflattening an empty tensor as a view of the flat tensor, so that empty tensors do not share storage with other tensors. Test Plan: unit test Reviewed By: pritamdamania87 Differential Revision: D25859503 fbshipit-source-id: 5b760b31af6ed2b66bb22954cba8d1514f389cca	2021-01-23 12:14:34 -08:00
Richard Barnes	89cafde8a4	Modernize for-loops (#50912 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50912 Test Plan: Sandcastle tests Reviewed By: ansley Differential Revision: D26001948 fbshipit-source-id: 3bfe6a8283a2b1882ed472f836ae1b6e720e519f	2021-01-22 10:53:24 -08:00
Bram Wasti	d60d108280	[nnc] Expose fast tanh/sigmoid (#50736 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50736 Exposes tanh and sigmoid to other backends Test Plan: buck test caffe2/test/cpp/tensorexpr:tensorexpr -- "ATen.fast" Reviewed By: bertmaher Differential Revision: D25884911 fbshipit-source-id: f9a5286450331f60935cfd40bb23f4a4f4c1d087	2021-01-22 09:56:02 -08:00
Meghan Lele	4aea007351	[JIT] Fix archive file extension in examples and docs (#50649 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50649 Summary Tutorials, documentation and comments are not consistent with the file extension they use for JIT archives. This commit modifies certain instances of `.pth` in `torch.jit.save` calls with `.pt`. Test Plan Continuous integration. Fixes This commit fixes #49660. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D25961628 Pulled By: SplitInfinity fbshipit-source-id: a40c97954adc7c255569fcec1f389aa78f026d47	2021-01-20 02:04:46 -08:00
Meghan Lele	8f5ad00e13	[JIT] Print out CU address in `ClassType::repr_str()` (#50194 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50194 Summary `ClassType::repr_str()` prints out only the name of a `ClassType`, which is not always enough to disambiguate it. In some situations, two `ClassTypes` are compared and do not match despite having identical names because they are in separate compilation units. In such cases, the error message can seem nonsensical (e.g. `expected type T but found type T`). This commit modifies `ClassType::repr_str()` so that it prints out the address of the type's compilation unit to make these messages less puzzling (e.g. `expected type T (0x239023) but found type T (0x230223)`). Test Plan This commit adds a unit test, `ClassTypeTest.IdenticalTypesDifferentCus` that reproduces this situation. Fixes This commit fixes #46212. Test Plan: Imported from OSS Reviewed By: tugsbayasgalan Differential Revision: D25933082 Pulled By: SplitInfinity fbshipit-source-id: ec71b6728be816edd6a9c2b2d5075ead98d8bc88	2021-01-19 23:04:30 -08:00
Pritam Damania	8b501dfd98	Fix memory leak in TensorPipeAgent. (#50564 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50564 When an RPC was sent, the associated future was stored in two maps: pendingResponseMessage_ and timeoutMap_. Once the response was received, the entry was only removed from pendingResponseMessage_ and not timeoutMap_. The pollTimedoudRpcs method then eventually removed the entry from timeoutMap_ after the time out duration had passed. Although, in scenarios where there is a large timeout and a large number of RPCs being used, it is very easy for the timeoutMap_ to grow without any bounds. This was discovered in https://github.com/pytorch/pytorch/issues/50522. To fix this issue, I've added some code to cleanup timeoutMap_ as well once we receive a response. ghstack-source-id: 119925182 Test Plan: 1) Unit test added. 2) Tested with repro in https://github.com/pytorch/pytorch/issues/50522 #Closes: https://github.com/pytorch/pytorch/issues/50522 Reviewed By: mrshenli Differential Revision: D25919650 fbshipit-source-id: a0a42647e706d598fce2ca2c92963e540b9d9dbb	2021-01-18 16:34:28 -08:00
Scott Wolchok	4a0d17ba2d	[PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228 `fastmod -m 'expect(<((at\|c10)::)?\w+Type>\s*)->' 'expectRef${1}.'` Presuming it builds, this is a safe change: the result of `expect()` wasn't being saved anywhere, so we didn't need it, so we can take a reference instead of a new `shared_ptr`. ghstack-source-id: 119782961 Test Plan: CI Reviewed By: SplitInfinity Differential Revision: D25837374 fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08	2021-01-13 16:13:55 -08:00
Andres Suarez	8530c65e25	[codemod][fbcode/caffe2] Apply clang-format update fixes Test Plan: Sandcastle and visual inspection. Reviewed By: igorsugak Differential Revision: D25849205 fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0	2021-01-09 14:37:36 -08:00
Thomas Viehmann	ea087e2d92	JIT: guard DifferentiableGraph node (#49433 ) Summary: This adds guarding for DifferentiableGraph nodes in order to not depend on Also bailing out on required gradients for the CUDA fuser. Fixes https://github.com/pytorch/pytorch/issues/49299 I still need to look into a handful of failing tests, but maybe it can be a discussion basis. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433 Reviewed By: ngimel Differential Revision: D25681374 Pulled By: Krovatkin fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296	2021-01-08 20:01:27 -08:00
Shen Li	098751016e	Completely Remove FutureMessage from RPC cpp tests (#50027 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50027 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D25753815 Pulled By: mrshenli fbshipit-source-id: 85b9b03fec52b4175288ac3a401285607744b451	2021-01-07 19:50:50 -08:00
Shen Li	008206decc	Replace FutureMessage with ivalue::Future in RRefContext (#49960 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49960 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D25730530 Pulled By: mrshenli fbshipit-source-id: 5d54572c653592d79c40aed616266c87307a1ad8	2021-01-07 19:50:19 -08:00
Shen Li	25ef605132	Replace FutureMessage with ivalue::Future in distributed/autograd/utils.* (#49927 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49927 Test Plan: Imported from OSS Reviewed By: lw Differential Revision: D25724241 Pulled By: mrshenli fbshipit-source-id: d608e448f5224e41fbb0b5be6b9ac51a587f25b4	2021-01-07 19:50:16 -08:00
Elias Ellison	efe1fc21fc	Dont inlinine intermediates on cpu (#49565 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49565 Test Plan: Imported from OSS Reviewed By: Krovatkin, ZolotukhinM Differential Revision: D25688271 Pulled By: eellison fbshipit-source-id: 9ea7858e2db4fb31292e04440fc72ee04623c688	2021-01-04 15:46:20 -08:00
Edward Yang	8eee8460f8	codegen: Resolve overload ambiguities created by defaulted arguments (#49348 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348 This is a redux of #45666 post refactor, based off of `d534f7d4c5` Credit goes to peterbell10 for the implementation. Fixes #43945. Signed-off-by: Edward Z. Yang <ezyang@fb.com> Test Plan: Imported from OSS Reviewed By: smessmer Differential Revision: D25594004 Pulled By: ezyang fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f	2021-01-04 11:59:16 -08:00
Sebastian Messmer	c7e9abb66a	Making ops c10-full: list of optional tensors (#49138 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138 See for details: https://fb.quip.com/QRtJAin66lPN We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this. ## Backwards Compatibility - This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`. - This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57 - This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`. Another common call pattern was `tensor.index(indices_tensor)`, where previously the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`. Implicitly converting `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would take two implicit conversions, and C++ doesn't allow chaining two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`. 
ghstack-source-id: 119269131 Test Plan:

## Benchmarks (C++ instruction counts):

### Forward

#### Script

```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4});
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```

#### Results

| Op call | before | after | delta | |
|---|---|---|---|---|
| x[0] = 1 | 11566015 | 11566015 | 0 | 0.00% |
| x.index({0}) | 6807019 | 6801019 | -6000 | -0.09% |
| x.index({0, 0}) | 13529019 | 13557019 | 28000 | 0.21% |
| x.index({0, 0, 0}) | 10677004 | 10692004 | 15000 | 0.14% |
| x.index({"..."}) | 5512015 | 5506015 | -6000 | -0.11% |
| x.index({Slice(None, None, None)}) | 6866016 | 6936016 | 70000 | 1.02% |
| x.index({None}) | 8554015 | 8548015 | -6000 | -0.07% |
| x.index({false}) | 22400000 | 22744000 | 344000 | 1.54% |
| x.index({true}) | 27624088 | 27264393 | -359695 | -1.30% |
| x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) | 123472000 | 123463306 | -8694 | -0.01% |

### Autograd

#### Script

```py
from torch.utils.benchmark import Timer

counts = Timer(
    stmt="""
        auto t = {{op call to measure}};
    """,
    setup="""
        using namespace torch::indexing;
        auto x = torch::ones({4, 4, 4}, torch::requires_grad());
    """,
    language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```

Note: the script measures the forward path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.

#### Results

| Op call | before | after | delta | |
|---|---|---|---|---|
| x.index({0}) | 14839019 | 14833019 | -6000 | 0.00% |
| x.index({0, 0}) | 28342019 | 28370019 | 28000 | 0.00% |
| x.index({0, 0, 0}) | 24434004 | 24449004 | 15000 | 0.00% |
| x.index({"..."}) | 12773015 | 12767015 | -6000 | 0.00% |
| x.index({Slice(None, None, None)}) | 14837016 | 14907016 | 70000 | 0.47% |
| x.index({None}) | 15926015 | 15920015 | -6000 | 0.00% |
| x.index({false}) | 36958000 | 37477000 | 519000 | 1.40% |
| x.index({true}) | 41971408 | 42426094 | 454686 | 1.08% |
| x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) | 168184392 | 164545682 | -3638710 | -2.16% |

Reviewed By: bhosmer Differential Revision: D25454632 fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d	2021-01-04 05:04:02 -08:00
Nikitha Malgi	12b73fdbbf	Adding JIT support for cuda streams and events (#48020 ) Summary: ======= This PR addresses the following: * Adds JIT support for CUDA Streams * Adds JIT support for CUDA Events * Adds JIT support for CUDA Stream context manager Testing: ====== python test/test_jit.py -v TestCUDA Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020 Reviewed By: navahgar Differential Revision: D25725749 Pulled By: nikithamalgifb fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c	2020-12-29 20:24:57 -08:00
anjali411	97c17b4772	Fix auto exponent issue for torch.pow (#49809 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49809 Fixes https://github.com/pytorch/xla/issues/2688 #46936 Test Plan: Imported from OSS Reviewed By: nikithamalgifb Differential Revision: D25724176 Pulled By: anjali411 fbshipit-source-id: 16287a1f481e9475679b99d6fb45de840da225be	2020-12-29 17:02:56 -08:00
Joel Schlosser	68d438c9da	Add PixelUnshuffle (#49334 ) Summary: Adds an implementation of `torch.nn.PixelUnshuffle` as the inverse operation of `torch.nn.PixelShuffle`. This addresses https://github.com/pytorch/pytorch/issues/2456 Pull Request resolved: https://github.com/pytorch/pytorch/pull/49334 Test Plan: ``` # Unit tests. python test/test_nn.py TestNN.test_pixel_shuffle_unshuffle # Module test. python test/test_nn.py TestNN.test_PixelUnshuffle # C++ API tests. build/bin/test_api # C++ / python parity tests. python test/test_cpp_api_parity.py # JIT test. python test/test_jit.py TestJitGeneratedFunctional.test_nn_pixel_unshuffle # Override tests. python test/test_overrides.py # Type hint tests. python test/test_type_hints.py ``` Screenshots of rendered docs: <img width="876" alt="Screen Shot 2020-12-18 at 12 19 05 PM" src="https://user-images.githubusercontent.com/75754324/102642255-6b07bb00-412b-11eb-88fa-e53e7e8ba720.png"> <img width="984" alt="Screen Shot 2020-12-18 at 12 19 26 PM" src="https://user-images.githubusercontent.com/75754324/102642276-70fd9c00-412b-11eb-8548-445082a2db02.png"> <img width="932" alt="Screen Shot 2020-12-18 at 12 19 34 PM" src="https://user-images.githubusercontent.com/75754324/102642704-19abfb80-412c-11eb-9546-95bdd1c3cf22.png"> <img width="876" alt="Screen Shot 2020-12-22 at 12 51 36 PM" src="https://user-images.githubusercontent.com/75754324/102918259-986aa680-4454-11eb-99e7-a0b4c8b3e283.png"> <img width="869" alt="Screen Shot 2020-12-22 at 12 51 44 PM" src="https://user-images.githubusercontent.com/75754324/102918274-9ef91e00-4454-11eb-94bb-91b58aff47d3.png"> Reviewed By: mruberry Differential Revision: D25401439 Pulled By: jbschlosser fbshipit-source-id: 209d92ce7295e51699e83616d0c62170a7ce75c8	2020-12-22 20:14:55 -08:00
Mikhail Zolotukhin	a5b27d7a31	[TensorExpr] Move `SimpleIREval` implementation from .h to .cpp. (#49697 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49697 Mostly mechanical move. This refactoring helps to hide unnecessary details from the SimpleIREval interface and make it more similar to a pure 'codegen'. Test Plan: Imported from OSS Reviewed By: nickgg Differential Revision: D25668696 Pulled By: ZolotukhinM fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c	2020-12-21 20:20:15 -08:00
Mikhail Zolotukhin	e1f73ced1e	[TensorExpr] Change `LoopNest::vectorize` to accept `For` instead of `Stmt`. (#49696 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49696 And make it static. Test Plan: Imported from OSS Reviewed By: navahgar, nickgg Differential Revision: D25668695 Pulled By: ZolotukhinM fbshipit-source-id: 8d7fb507d6f3beca70e868d9e0f4c46247311a99	2020-12-21 20:17:20 -08:00
Raghavan Raman	46c9a0e679	Do not use negative values in GCD computation. (#49379 ) Summary: GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49379 Reviewed By: ezyang Differential Revision: D25597115 Pulled By: navahgar fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927	2020-12-21 15:08:43 -08:00
Nick Gibson	db2e9c1e7f	[NNC] Intermediate allocs flattened and dependency support (#49554 ) Summary: Makes two changes in NNC for intermediate buffer allocations: 1. Flattens dimensions of buffers allocated in LoopNest::prepareForCodegen() to match their flattened usages. 2. Adds support for tracking memory dependencies of Alloc/Free to the MemDependencyChecker, which will allow us to check safety of accesses to intermediate buffers (coming in a future diff). I didn't add any new tests as the mem dependency checker tests already cover it pretty well, particularly the GEMM test. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49554 Reviewed By: VitalyFedyunin Differential Revision: D25643133 Pulled By: nickgg fbshipit-source-id: 66be3054eb36f0a4279d0c36562e63aa2dae371c	2020-12-21 10:35:15 -08:00
Dhruv Matani	4a870f6518	[PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385 Currently, the API to export operator lists accepts a `torch::jit::Module` object, and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optimized and exported for mobile. What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that instead. Also updated the logic in `converter`. ### Before this change: 1. Get operator List from Torch Script Model 2. Convert to bytecode mobile model ### After this change: 1. Convert to bytecode mobile model 2. Use this converted mobile model to get the list of operators for each method on the model ghstack-source-id: 118796752 Test Plan: Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also made `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from. Verified that the lists of operators produced before and after this change for an example model (segmentation) are the same. {P147863234} Also verified that the operator lists for the BI-Xray model are different (we have been having problems with missing operators for this one): {P154903132} Reviewed By: iseeyuan Differential Revision: D24690094 fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff	2020-12-18 11:17:57 -08:00
Peng Wu	6568572712	Support integral types for kAbs in SimpleIREvaluator (#49357 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357 This is a follow-up fix for PR #48679, which added support for integer inputs to aten::abs by promoting integers to float and then demoting the result back to integers. This PR supports integer inputs to aten::abs more efficiently in the SimpleIREvaluator by implementing integer inputs for kAbs (renamed from kFabs). - Rename kFabs to kAbs - Add support for integer input to kAbs in SimpleIREvaluator (note that llvm_codegen and cuda_codegen already support integer inputs to kAbs) Test Plan: - `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops` - `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops` Imported from OSS Reviewed By: eellison Differential Revision: D25545791 fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230	2020-12-18 07:57:58 -08:00
Nikita Shulga	020c443fd1	Fix CustomAutogradTest.ReentrantPriority rerun failures (#49581 ) Summary: Clear static variable at the end of the test to ensure test passes after re-runs Pull Request resolved: https://github.com/pytorch/pytorch/pull/49581 Test Plan: `./bin/test_api "--gtest_filter=CustomAutogradTest.ReentrantPriority" --gtest_repeat=50` Before the change all subsequent runs of the test failed with ``` ../test/cpp/api/autograd.cpp:681: Failure Expected equality of these values: order.size() Which is: 310 10 ``` Reviewed By: mrshenli Differential Revision: D25632374 Pulled By: malfet fbshipit-source-id: 4814d22b5dff15e1b38a0187e51070771fd58370	2020-12-18 00:34:06 -08:00
Xu Zhao	573f4aa352	FLOPS Roofline Analysis Feature for PyTorch Profiler. (#46506 ) Summary: FLOPs Roofline Analysis Feature for PyTorch Profiler. Currently, PyTorch Profiler lacks the ability to measure the FLOPs of operators, such as mm and conv. FLOPs are helpful to estimate the computation complexity of the operators. For now, we use input shapes to estimate the number of floating point operations. In the future, we may compute this information by tracking hardware counters. Pull Request resolved: https://github.com/pytorch/pytorch/pull/46506 Test Plan: Run `python test/test_profiler_flops.py -k test_flops`. The test will print a profiler table with "FLOPS" column, like the following:

| Name | Self CPU % | Self CPU | CPU total % | CPU total | CPU time avg | # of Calls | Input Shapes | MFLOPS |
|---|---|---|---|---|---|---|---|---|
| aten::matmul | 0.06% | 57.653us | 82.97% | 79.310ms | 79.310ms | 1 | [[40, 33, 1, 243], [243, 243]] | -- |
| aten::mm | 82.84% | 79.186ms | 82.86% | 79.204ms | 79.204ms | 1 | [[1320, 243], [243, 243]] | 984.323 |
| aten::conv2d | 0.04% | 36.345us | 16.06% | 15.347ms | 15.347ms | 1 | [[40, 16, 18, 260], [33, 16, 18, 18], [33], [ | 44065010.318 |
| aten::convolution | 0.02% | 16.016us | 16.02% | 15.310ms | 15.310ms | 1 | [[40, 16, 18, 260], [33, 16, 18, 18], [33], [ | -- |
| aten::_convolution | 0.07% | 63.855us | 16.00% | 15.294ms | 15.294ms | 1 | [[40, 16, 18, 260], [33, 16, 18, 18], [33], [ | -- |
| aten::mkldnn_convolution | 15.89% | 15.188ms | 15.93% | 15.225ms | 15.225ms | 1 | [[40, 16, 18, 260], [33, 16, 18, 18], [33], [ | -- |
| aten::relu | 0.10% | 98.223us | 0.64% | 612.157us | 306.079us | 2 | [[40, 33, 1, 243]] | -- |
| aten::threshold | 0.49% | 465.416us | 0.54% | 513.934us | 256.967us | 2 | [[40, 33, 1, 243], [], []] | -- |
| aten::add_ | 0.29% | 279.301us | 0.29% | 279.301us | 279.301us | 1 | [[40, 33, 1, 243], [243], []] | -- |
| aten::empty | 0.10% | 99.113us | 0.10% | 99.113us | 24.778us | 4 | [[], [], [], [], [], []] | -- |

Self CPU time total: 95.584ms

Ran 1 test in 0.176s

For now, we only provide FLOPs calculation for aten::conv2d and aten::mm operators. Reviewed By: ezyang Differential Revision: D25214452 Pulled By: xuzhao9 fbshipit-source-id: 0ae841bd8dbdeb032346dc3d9d38e19875aa1da3	2020-12-17 21:19:25 -08:00
Bram Wasti	1047957831	[te][reapply] Add fast log approximation based on sleef (#49575 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49575 This is a fast log implementations benchmark: ``` buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none' ``` Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat Reviewed By: bertmaher Differential Revision: D25627157 fbshipit-source-id: a4920f4f4005ce617d372b375e790ca966275cd9	2020-12-17 17:02:00 -08:00
Edward Yang	ea4ccc730e	Revert D25445815: [te] Add fast log approximation based on sleef Test Plan: revert-hammer Differential Revision: D25445815 (`1329066b69`) Original commit changeset: 20696eacd12a fbshipit-source-id: 38830a6abd16260d60e5dd9a5594e65736a9c782	2020-12-17 15:03:17 -08:00
Bram Wasti	1329066b69	[te] Add fast log approximation based on sleef Summary: This is a fast log implementations benchmark: ``` buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none' ``` Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat Reviewed By: bertmaher Differential Revision: D25445815 fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888	2020-12-17 14:28:34 -08:00
Martin Yuan	2b61e4d84c	Revert D25152559: T66557700 Support default argument values of a method Test Plan: revert-hammer Differential Revision: D25152559 (`6bde0ca6d3`) Original commit changeset: bbf52f1fbdbf fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1	2020-12-17 14:05:49 -08:00
Elias Ellison	50386b9988	[NNC] Add Support For is_nan (#48973 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48973 Test Plan: Imported from OSS Reviewed By: bertmaher Differential Revision: D25413166 Pulled By: eellison fbshipit-source-id: 0c79258345df18c60a862373fa16931228fb92ef	2020-12-16 18:31:01 -08:00
Elias Ellison	9056173acc	[NNC] Dont inline outputs buffers on cpu (#49488 ) Summary: In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but is not needed for correctness for CPU and results in perf slowdown. The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions. Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488 Reviewed By: ezyang Differential Revision: D25596071 Pulled By: eellison fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd	2020-12-16 16:28:25 -08:00
Frank Seide	6bde0ca6d3	T66557700 Support default argument values of a method (#48863 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863 Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`). Test Plan: buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg Reviewed By: raziel, iseeyuan Differential Revision: D25152559 fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0	2020-12-16 15:55:03 -08:00
Igor Gitman	1b6d18aa7c	Adding support for CuDNN-based LSTM with projections (#47725 ) Summary: Fixes https://github.com/pytorch/pytorch/issues/46213 I didn't yet update the documentation, will add those changes soon. A few other things that I didn't do, but want to clarify whether I should.
1. I didn't expose projections in the C++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in the "lstm_cell" and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if the cell is used separately, the projection can easily be added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added a check that projections are not supported for quantized LSTMs to the quantized_lstm_<data/input> functions. But I didn't add any checks to the LSTMCell code. It seems that since I disabled projections in the "lstm_cell" function, they should also not be available for quantized models through any API other than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks in other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add a check for the CuDNN version and disable projections in that case? If so, what would be the best way to do that?
5. Currently I added the projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights, and thus I had to add additional if-s in various places. An alternative would be a "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption would hold. But in that case I would need to split the loop in the get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I would still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725 Reviewed By: zou3519 Differential Revision: D25449794 Pulled By: ngimel fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c	2020-12-16 11:27:02 -08:00
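To make the "w_ih, w_hh, b_ih, b_hh, w_hr" layout from item 5 concrete, here is a small pure-Python sketch that lists the parameter shapes for the first layer of a unidirectional LSTM with an optional projection. The function name is hypothetical, and the shapes follow the ones documented for `torch.nn.LSTM` with `proj_size` in later releases, which is an assumption about how this commit ended up being exposed, not something the commit message states.

```python
def lstm_layer0_param_shapes(input_size, hidden_size, proj_size=0):
    """Return (name, shape) pairs in the order described in the commit.

    Hypothetical helper for illustration: it covers only layer 0 of a
    unidirectional LSTM. With a projection, the recurrent input is the
    projected hidden state, so w_hh has proj_size columns.
    """
    recurrent_size = proj_size if proj_size > 0 else hidden_size
    shapes = [
        ("w_ih", (4 * hidden_size, input_size)),
        ("w_hh", (4 * hidden_size, recurrent_size)),
        ("b_ih", (4 * hidden_size,)),
        ("b_hh", (4 * hidden_size,)),
    ]
    if proj_size > 0:
        # The projection weight comes last, after the biases.
        shapes.append(("w_hr", (proj_size, hidden_size)))
    return shapes


for name, shape in lstm_layer0_param_shapes(10, 20, proj_size=5):
    print(name, shape)
```

With the projection weight appended last, any code that assumes "all weights, then all biases" has to special-case the fifth entry, which is the trade-off the author raises.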
Scott Wolchok	22c6dafd33	[PyTorch] Use plain old function pointer for RecordFunctionCallback (reapply) (#49408 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408 Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback. ghstack-source-id: 118665808 Test Plan: Wait for GitHub CI since we had C++14-specific issues with this one in previous PR https://github.com/pytorch/pytorch/pull/48629 Reviewed By: malfet Differential Revision: D25563207 fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d	2020-12-15 19:16:01 -08:00
Mikhail Zolotukhin	38a59a67f3	[JIT] Support multiple outputs in subgraph matcher. (#48992 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48992 Differential Revision: D25388100 Test Plan: Imported from OSS Reviewed By: heitorschueroff Pulled By: ZolotukhinM fbshipit-source-id: d95713af2220cf4f99ac92f59f8e5b902f2f3822	2020-12-15 13:09:24 -08:00
Mike Ruberry	25bc906281	Revert D25135415: [PyTorch] Use plain old function pointer for RecordFunctionCallback Test Plan: revert-hammer Differential Revision: D25135415 (`7e23ee1598`) Original commit changeset: 5e92dc79da64 fbshipit-source-id: 45b1634a100084c84dca158a1f16ca760fef6988	2020-12-14 21:04:27 -08:00
Scott Wolchok	7e23ee1598	[PyTorch] Use plain old function pointer for RecordFunctionCallback (#48629 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629 Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback. ghstack-source-id: 118568240 Test Plan: CI Reviewed By: dhruvbird Differential Revision: D25135415 fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3	2020-12-14 20:08:16 -08:00

1209 commits