pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
Ilia Cherniavskii	f5c95d5cf1	Source code level attribution in profiler (#43898 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43898 Adding with_source parameter to enable tracking source code (filename and line) in profiler for eager, torchscript and autograd modes Test Plan: python test/test_profiler.py ``` Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls Source Location ----------------------------------- --------------- --------------- --------------- --------------- --------------- --------------- -------------------------------------------- ts_method_1 10.43% 235.364us 36.46% 822.920us 822.920us 1 test/test_profiler.py(70): test_source aten::add 7.52% 169.833us 8.88% 200.439us 200.439us 1 test/test_profiler.py(69): test_source aten::normal_ 6.26% 141.380us 6.26% 141.380us 141.380us 1 test/test_profiler.py(67): test_source aten::add 5.80% 130.830us 8.41% 189.800us 63.267us 3 test/test_profiler.py(72): test_source aten::sum 5.02% 113.340us 8.39% 189.475us 189.475us 1 test/test_profiler.py(64): ts_method_1 aten::add 4.58% 103.346us 6.33% 142.847us 142.847us 1 test/test_profiler.py(62): ts_method_1 aten::mul 4.05% 91.498us 9.62% 217.113us 217.113us 1 test/test_profiler.py(71): test_source aten::add 4.03% 90.880us 5.60% 126.405us 126.405us 1 test/test_profiler.py(58): ts_method_2 aten::empty 3.49% 78.735us 3.49% 78.735us 19.684us 4 test/test_profiler.py(72): test_source ``` Reviewed By: ngimel Differential Revision: D23432664 Pulled By: ilia-cher fbshipit-source-id: 83ad7ebe0c2502494d3b48c4e687802db9c77615	2020-09-30 00:57:35 -07:00
Rohan Varma	27ab9bc0f9	[RPC profiling] Extend RPC profiling to support async function execution over RPC. (#44664 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44664 Closes https://github.com/pytorch/pytorch/issues/39971. This PR adds support for functions decorated with `rpc.functions.async_execution` to be profiled over RPC as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run. To enable this, the PR below this enables calling `disableProfiler()` safely from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only the blocking `processRPC` call as was done previously). Since when the future completes we've kicked off the async function and the future corresponding to it has completed, we are able to capture any RPCs the function would have called and the actual work done on the other node. For example, if the following async function is run on a server over RPC: ``` def slow_add(x, y): time.sleep(1) return torch.add(x, y) rpc.functions.async_execution def slow_async_add(to, x, y): return rpc.rpc_async(to, slow_add, args=(x, y)) ``` we expect to see the original RPC profiled, the nested RPC profiled, and the actual torch.add() work. All of these events should be recorded with the correct node id. 
Here is an example profiling output: ``` ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls Node ID ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- rpc_async#slow_async_add(worker1 -> worker2) 0.00% 0.000us 0 1.012s 1.012s 1 1 aten::empty 7.02% 11.519us 7.02% 11.519us 11.519us 1 1 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3) 0.00% 0.000us 0 1.006s 1.006s 1 2 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty 7.21% 11.843us 7.21% 11.843us 11.843us 1 2 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add 71.94% 118.107us 85.77% 140.802us 140.802us 1 3 rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty 13.82% 22.695us 13.82% 22.695us 22.695us 1 3 ------------------------------------------------------------------------------------------------------------------------- --------------- --------------- --------------- -------- ------- --------------- --------------- --------------- Self CPU time total: 164.164us ``` This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code. 
ghstack-source-id: 112868470 Test Plan: ``` rvarm1@devbig978:fbcode (52dd34f6)$ buck test mode/no-gpu mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_rpc_profiling_async_function --print-passing-details --stress-runs 1 ``` Reviewed By: mrshenli Differential Revision: D23638387 fbshipit-source-id: eedb6d48173a4ecd41d70a9c64048920bd4807c4	2020-09-25 13:19:26 -07:00
Michael Suo	22401b850b	port all JIT tests to gtest (#45264 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45264 Context for why we are porting to gtest in: https://github.com/pytorch/pytorch/pull/45018. This PR completes the process of porting and removes unused files/macros. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D23901392 Pulled By: suo fbshipit-source-id: 89526890e1a49462f3f77718f4ee273c5bc578ba	2020-09-25 11:37:43 -07:00
jjsjann123	99e0a87bbb	[nvFuser] Latency improvements for pointwise + reduction fusion (#45218 ) Summary: A lot of changes are in this update, some highlights: - Added Doxygen config file - Split the fusion IR (higher level TE like IR) from kernel IR (lower level CUDA like IR) - Improved latency with dynamic shape handling for the fusion logic - Prevent recompilation for pointwise + reduction fusions when not needed - Improvements to inner dimension reduction performance - Added input -> kernel + kernel launch parameters cache, added eviction policy - Added reduction fusions with multiple outputs (still single reduction stage) - Fixed code generation bugs for symbolic tiled GEMM example - Added thread predicates to prevent shared memory from being loaded multiple times - Improved sync threads placements with shared memory and removed read before write race - Fixes to FP16 reduction fusions where output would come back as FP32 Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218 Reviewed By: ezyang Differential Revision: D23905183 Pulled By: soumith fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79	2020-09-24 23:17:20 -07:00
Raziel Alvarez Guevara	2b38c09f69	Moves prim ops from C10 back to JIT (#45144 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45144 Moves prim ops from C10 back to JIT. These were originally moved to C10 from JIT in D19237648 (`f362cd510d`) ghstack-source-id: 112775781 Test Plan: buck test //caffe2/test/cpp/jit:jit https://pxl.cl/1l22N buck test adsatlas/gavel/lib/ata_processor/tests:ata_processor_test https://pxl.cl/1lBxD Reviewed By: iseeyuan Differential Revision: D23697598 fbshipit-source-id: 36d1eb8c346e9b161ba6af537a218440a9bafd27	2020-09-24 09:44:20 -07:00
Michael Suo	6d21d5f0b3	gtest-ify JIT tests, through the letter c (#45249 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45249 Reland of https://github.com/pytorch/pytorch/pull/45055 and https://github.com/pytorch/pytorch/pull/45020 See https://github.com/pytorch/pytorch/pull/45018 for context. Test Plan: Imported from OSS Reviewed By: jamesr66a Differential Revision: D23892645 Pulled By: suo fbshipit-source-id: e7fe58d5e1a5a0c44f4e2aec9694145afabde0fd	2020-09-24 00:21:20 -07:00
Michael Suo	e9aa6898ab	Revert D23802296: gtest-ify JIT tests, through the letter c Test Plan: revert-hammer Differential Revision: D23802296 (`d2b045030e`) Original commit changeset: 20c9798a414e fbshipit-source-id: a28d56039ca404fe94ed7572f1febd1673e3e788	2020-09-23 17:42:19 -07:00
Nikita Shulga	89c570ed0a	Revert D23811085: gtestify dce and fuser tests Test Plan: revert-hammer Differential Revision: D23811085 (`246bd9422a`) Original commit changeset: 45008e41f239 fbshipit-source-id: 94c981f565cab9b710fe52a55bbe8dbf9c179c23	2020-09-23 17:27:59 -07:00
Michael Suo	246bd9422a	gtestify dce and fuser tests (#45055 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45055 See https://github.com/pytorch/pytorch/pull/45018 for context. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D23811085 Pulled By: suo fbshipit-source-id: 45008e41f2394d2ba319745b0340392e1b3d3172	2020-09-23 14:33:22 -07:00
Michael Suo	d2b045030e	gtest-ify JIT tests, through the letter c (#45020 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45020 See https://github.com/pytorch/pytorch/pull/45018 for context. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D23802296 Pulled By: suo fbshipit-source-id: 20c9798a414e9ba30869a862012cbdee0613c8b1	2020-09-23 14:28:45 -07:00
Rohan Varma	70d2e4d1f6	[RPC profiling] Allow disableProfiler() to be called from another thread. (#44653 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653 This changes the profiler per a discussion with ilia-cher offline that enables `disableProfiler()` event consolidation logic to be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387 where we defer profiling event collection until executing an async callback that can execute on a different thread, to support RPC async function profiling. This is done by introducing 2 flags `cleanupTLSState` and `consolidate` which control whether we should clean up thread local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default. Added a test in `test_misc.cpp` to test this. ghstack-source-id: 112605620 Reviewed By: mrshenli Differential Revision: D23638499 fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b	2020-09-22 21:16:58 -07:00
Michael Suo	666223df46	[jit] gtestify test_argument_spec.cpp (#45019 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45019 See https://github.com/pytorch/pytorch/pull/45018 for context. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D23802298 Pulled By: suo fbshipit-source-id: 0e36d095d4d81dcd5ebe6d56b3dc469d6d5482d0	2020-09-22 19:44:14 -07:00
Elias Ellison	ae286d81e0	[JIT] improve alias analysis for list constructs (#39111 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111 In our present alias analysis, we consider any Value that enters another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach: - it is not too hard to maintain the aliasDb implementation - it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated - It limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements. The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op. In an example like: ``` def foo(input): x = torch.tensor([1, 2, 3, 4]) y = [x, x] input.add_(1) return torch.cat(y) ``` we will consider x to be written to. Any write to any wildcard element (an element that enters a tuple, an element that is taken from a list) will mark x as written to. This can be limiting for our ability to create a functional subset and fuse graphs - as a result, 4 of the TorchVision classification models could not be functionalized. Test Plan: Imported from OSS Reviewed By: SplitInfinity Differential Revision: D23828003 Pulled By: eellison fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537	2020-09-22 09:38:59 -07:00
Michael Suo	42af2c7923	[jit] gtest-ify test_alias_analysis.cpp (#45018 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45018 Now that https://github.com/pytorch/pytorch/pull/44795 has landed, we can convert the bulk of our cpp tests to use gtest APIs. Eventually we'll want to get rid of our weird harness for cpp tests entirely in favor of using regular gtest everywhere. This PR demonstrates some of the benefits of this approach: 1. You don't need to register your test twice (once to define it, once in tests.h). 2. Consequently, it's easier to have many individual test cases. Failures can be reported independently (rather than having huge functions to test entire modules). 3. Some nicer testing APIs, notably test fixtures. Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D23802297 Pulled By: suo fbshipit-source-id: 774255da7716294ac573747dcd5e106e5fe3ac8f	2020-09-21 12:19:37 -07:00
Michael Suo	374e9373b5	[jit] Pull (most) tests out of libtorch_python (#44795 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795 Today, we build our cpp tests twice, once as a standalone gtest binary, and once linked in `libtorch_python` so we can call them from `test_jit.py`. This is convenient (it means that `test_jit.py` is a single entry point for all our tests), but has a few drawbacks: 1. We can't actually use the gtest APIs, since we don't link gtest into `libtorch_python`. We're stuck with the subset that we want to write polyfills for, and an awkward registration scheme where you have to write a test then include it in `tests.h`. 2. More seriously, we register custom operators and classes in these tests. In a world where we may be linking many `libtorch_python`s, this has a tendency to cause errors with `libtorch`. So now, only tests that explicitly require cooperation with Python are built into `libtorch_python`. The rest are built into `build/bin/test_jit`. There are tests which require that we define custom classes and operators. In these cases, I've built them into separate `.so`s that we call `torch.ops.load_library()` on. Test Plan: Imported from OSS Reviewed By: SplitInfinity, ZolotukhinM Differential Revision: D23735520 Pulled By: suo fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff	2020-09-18 14:04:40 -07:00
Louis Feng	eb75cfb9c0	Back out "Revert D23323486: DPP Async Tracing" plus windows build fix. (#44702 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702 Original commit changeset: c6bd6d277aca This diff caused windows build to fail due to a compiler bug in VS2019 (lambda capture constant int value). This back out works around the issue with explicit capture of const int value. Test Plan: Tested and previously landed. Reviewed By: mruberry Differential Revision: D23703215 fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459	2020-09-16 11:32:11 -07:00
Mike Ruberry	7036e91abd	Revert D23323486: DPP Async Tracing Test Plan: revert-hammer Differential Revision: D23323486 (`71673b31f9`) Original commit changeset: 4b6ca6c0e320 fbshipit-source-id: c6bd6d277aca070bef2de3522c2a60e23b4395ad	2020-09-15 01:19:23 -07:00
Louis Feng	71673b31f9	DPP Async Tracing (#44252 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252 Add tracing to DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end in a different thread. RecordFunction and LibgpumonObserver previously assumed each trace event starts and finishes in the same thread. So they use a thread local context to track enter and exit callbacks. Async events break this assumption. This change attaches the event context to the RecordFunction object so we do not need to use thread local context. Test Plan: Tested with dpp perf test and able to collect trace. {F307824044} Reviewed By: ilia-cher Differential Revision: D23323486 fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b	2020-09-14 18:43:14 -07:00
Martin Yuan	7862827269	[pytorch] Add variadic run_method for lite intepreter (#44337 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337 Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match full jit. ghstack-source-id: 111909068 Test Plan: Added new unit test to test_jit test suite Reviewed By: linbinyu, ann-ss Differential Revision: D23585763 fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e	2020-09-13 13:26:30 -07:00
Ann Shan	a61318a535	[pytorch] Replace mobile run_method with get_method and operator() (#44202 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202 In preparation for changing mobile run_method() to be variadic, this diff: * Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist. * Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects. ghstack-source-id: 111848222 Test Plan: CI, and all the unit tests which currently contain run_method that are being changed. Reviewed By: iseeyuan Differential Revision: D23436351 fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577	2020-09-11 10:23:06 -07:00
Lillian Johnson	b0bcdbb1ab	[JIT] Support partially specified sizes/strides in IRParser (#44113 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44113 Test Plan: Imported from OSS Reviewed By: ZolotukhinM Differential Revision: D23508149 Pulled By: Lilyjjo fbshipit-source-id: b6b2d32109fae599bc5347dae742b67a2e4a0a49	2020-09-09 14:45:51 -07:00
Ann Shan	9b3c72d46e	[pytorch] Make mobile find_method return an optional (#43965 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965 As part of a larger effort to unify the API between the lite interpreter and full JIT: - implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function - add support for overloaded operator() to mobile Method and Function - mobile find_method now returns a c10::optional<Method> (so signature matches full jit) - moves some implementation of Function from module.cpp to function.cpp ghstack-source-id: 111161942 Test Plan: CI Reviewed By: iseeyuan Differential Revision: D23330762 fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d	2020-09-03 14:46:18 -07:00
Nikolay Korovaiko	f91bdbeabd	Enable function calls in TEFuser and SpecializeAutogradZero (#43866 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/43866 Reviewed By: ezyang Differential Revision: D23452798 Pulled By: Krovatkin fbshipit-source-id: 2cff4c905bf1b5d9de56e7869458ffa6fce1f1b5	2020-09-03 14:42:52 -07:00
Elias Ellison	544a56ef69	[JIT] Always map node output in vmap (#43988 ) Summary: Previously when merging a node without a subgraph, we would merge the node's outputs to the corresponding subgraph values, but when merging a node with a subgraph the node's outputs would be absent in the value mapping. This PR makes it so they are included. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43988 Reviewed By: ZolotukhinM Differential Revision: D23462116 Pulled By: eellison fbshipit-source-id: 232c081261e9ae040df0accca34b1b96a5a5af57	2020-09-02 10:30:43 -07:00
generatedunixname89002005287564@sandcastle1415.cln1.facebook.com	1dd658f28f	[Codemod][GleanFbcode] Remove dead includes in caffe2/test (#43953 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43953 Reviewed By: malfet Differential Revision: D23445556 fbshipit-source-id: 89cd6833aa06f35c5d3c99d698abb08cd61ae4ab	2020-09-01 21:48:28 -07:00
Gao, Xiang	5e97f251a8	Enable TF32 support for cuDNN (#40737 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737 Reviewed By: mruberry Differential Revision: D22801525 Pulled By: ngimel fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2	2020-09-01 15:34:24 -07:00
Pritam Damania	f1624b82b5	Preserve python backtrace in autograd engine errors. (#43684 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684 This PR attempts to address #42560 by capturing the appropriate exception_ptr in the autograd engine and passing it over to the Future. As part of this change, there is a significant change to the Future API where we now only accept an exception_ptr as part of setError. For the example in #42560, the exception trace would now look like: ``` > Traceback (most recent call last): > File "test_autograd.py", line 6914, in test_preserve_backtrace > Foo.apply(t).sum().backward() > File "torch/tensor.py", line 214, in backward > torch.autograd.backward(self, gradient, retain_graph, create_graph) > File "torch/autograd/__init__.py", line 127, in backward > allow_unreachable=True) # allow_unreachable flag > File "torch/autograd/function.py", line 87, in apply > return self._forward_cls.backward(self, *args) > File "test_autograd.py", line 6910, in backward > raise ValueError("something") > ValueError: something ``` ghstack-source-id: 111109637 Test Plan: waitforbuildbot Reviewed By: albanD Differential Revision: D23365408 fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5	2020-09-01 01:28:47 -07:00
Nikolay Korovaiko	000739c31a	Function calls for fallback paths (#43274 ) Summary: This PR adds API to package unoptimized/fallback blocks as function calls. It's mainly meant to be used by TensorExpressionsFuser and SpecializeAutogradZero passes as both specialize the original graph but would also like to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274 Reviewed By: malfet Differential Revision: D23406961 Pulled By: Krovatkin fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c	2020-08-28 23:31:02 -07:00
Mikhail Zolotukhin	776c2d495f	[JIT] IRParser: store list attributes as generic ivalue lists. (#43785 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43785 Test Plan: Imported from OSS Reviewed By: eellison Differential Revision: D23400565 Pulled By: ZolotukhinM fbshipit-source-id: e248eb1854c4ec40da9455d4279ea6e47b1f2a16	2020-08-28 13:27:28 -07:00
Martin Yuan	288a2effa0	Operator generator based on templated selective build. (#43456 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43456 Introduce the template OperatorGenerator, which returns an optional Operator. It's null if the templated bool value is null. RegisterOperators() is updated to take the optional Operator. A null will not be registered. With this update the selective operator registration can be done at compile time. Tests are added to show an operator can be registered if it's in a whitelist and it will not be registered if it's not in the whitelist. Test Plan: Imported from OSS Reviewed By: ljk53 Differential Revision: D23283563 Pulled By: iseeyuan fbshipit-source-id: 456e0c72b2f335256be800aeabb797bd83bcf0b3	2020-08-27 07:26:07 -07:00
James Reed	a070c619b9	[FX] Native callables in FX lowering (#43426 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43426 Test Plan: Imported from OSS Reviewed By: zdevito Differential Revision: D23273427 Pulled By: jamesr66a fbshipit-source-id: 3a9d04486c72933d8afd9c181578fe98c3d825b0	2020-08-27 00:00:03 -07:00
Mikhail Zolotukhin	b763666f9f	[JIT] Subgraph utils: add an optional vmap argument to the API to allow retrieving value mappings. (#43235 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43235 This functionality is needed when we want to not lose track of nodes/values as we merge and unmerge them into other nodes. For instance, if we have a side data structure with some meta information about values or nodes, this new functionality would allow to keep that metadata up to date after merging and unmerging nodes. Differential Revision: D23202648 Test Plan: Imported from OSS Reviewed By: eellison Pulled By: ZolotukhinM fbshipit-source-id: 350d21a5d462454166f8a61b51d833551c49fcc9	2020-08-25 18:13:29 -07:00
Ann Shan	7cc1efec13	Add lite SequentialSampler to torch mobile (#43299 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43299 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D23228415 Pulled By: ann-ss fbshipit-source-id: eebe54353a128783f039c7dac0e2dd765a61940d	2020-08-24 09:45:24 -07:00
Nikolay Korovaiko	a97ca93c0e	remove prim::profile and special-casing (#43160 ) Summary: Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/43160 Reviewed By: ZolotukhinM Differential Revision: D23284421 Pulled By: Krovatkin fbshipit-source-id: 35e97aad299509a682ae7e95d7cef53301625309	2020-08-22 23:52:36 -07:00
Zino Benaissa	40c77f926c	Add prim::TypeCheck operation (#43026 ) Summary: TypeCheck is a new operation to check the shape of tensors against expected shapes. TypeCheck is a variadic operation. An example, %t0 : Tensor = ... %t1 : Tensor = ... %2 : FLOAT(20, 20), %3 : FLOAT(30, 30), %1 : bool = prim::TypeCheck(%t1, %t2) prim::If(%1) Fixes #{issue number} Pull Request resolved: https://github.com/pytorch/pytorch/pull/43026 Reviewed By: ZolotukhinM Differential Revision: D23115830 Pulled By: bzinodev fbshipit-source-id: fbf142126002173d2d865cf4b932dea3864466b4	2020-08-21 20:03:24 -07:00
Ann Shan	dd194c1612	add _save_parameters to serialize map (#43163 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43163 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D23175287 Pulled By: ann-ss fbshipit-source-id: ddfd734513c07e8bdbec108f26d1ca1770d098a6	2020-08-18 14:58:04 -07:00
Ann Shan	2e6e295ecc	refactor _save_parameters to _save_data (#43162 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43162 Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D23175286 Pulled By: ann-ss fbshipit-source-id: 6f930b98c367242fd4efbf51cb1d09995f7c4b40	2020-08-18 14:57:03 -07:00
Christian Sarofeen	b3bda94393	[NVFuser] Enable E2E BCast-PWise-Reduction fusions (#43129 ) Summary: Had a bunch of merged commits that shouldn't have been there, reverted them to prevent conflicts. Lots of new features, highlights listed below. Overall: - Enables pointwise fusion, single (but N-D) broadcast -- pointwise fusion, single (but N-D) broadcast -- pointwise -- single (but N-D) reduction fusion. Integration: - Separate "magic scheduler" logic that takes a fusion and generates code generator schedule - Reduction fusion scheduling with heuristics closely matching eager mode (unrolling supported, but no vectorize support) - 2-Stage caching mechanism, one on contiguity, device, type, and operations, the other one is input size->reduction heuristic Code Generation: - More generic support in code generation for computeAt - Full rework of loop nest generation and Indexing to more generically handle broadcast operations - Code generator has automatic kernel launch configuration (including automatic allocation of grid reduction buffers) - Symbolic (runtime) tiling on grid/block dimensions is supported - Simplified index generation based on user-defined input contiguity - Automatic broadcast support (similar to numpy/pytorch semantics) - Support for compile time constant shared memory buffers - Parallelized broadcast support (i.e. block reduction -> block broadcast support) Pull Request resolved: https://github.com/pytorch/pytorch/pull/43129 Reviewed By: mrshenli Differential Revision: D23162207 Pulled By: soumith fbshipit-source-id: 16deee4074c64de877eed7c271d6a359927111b2	2020-08-18 09:10:08 -07:00
Ann Shan	248b6a30f4	add training mode to mobile::Module (#42880 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42880 Enable switching between and checking for training and eval mode for torch::jit::mobile::Module using train(), eval(), and is_training(), like exists for torch::jit::Module. Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D23063006 Pulled By: ann-ss fbshipit-source-id: b79002148c46146b6e961cbef8aaf738bbd53cb2	2020-08-17 00:20:03 -07:00
Elias Ellison	91f3114fc1	[JIT] Represent profiled types as a node attribute (#43035 ) Summary: This changes profiled types from being represented as: `%23 : Float(4:256, 256:1, requires_grad=0, device=cpu) = prim::profile(%0)` -> `%23 : Tensor = prim::profile[profiled_type=Float(4:256, 256:1, requires_grad=0, device=cpu)](%0)` Previously, by representing the profiled type in the IR directly it was very easy for optimizations to accidentally use profiled types without inserting the proper guards that would ensure that the specialized type would be seen. It would be a nice follow up to extend this to prim::Guard as well, however we have short term plans to get rid of prim::Guard. Pull Request resolved: https://github.com/pytorch/pytorch/pull/43035 Reviewed By: ZolotukhinM Differential Revision: D23120226 Pulled By: eellison fbshipit-source-id: c78d7904edf314dd65d1a343f2c3a947cb721b32	2020-08-14 20:17:46 -07:00
taivu	ccd9f3244b	Get, save, and load module information for each operator (#42133 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42133 Test Plan: We save a module with module debugging information as follows. ``` import torch m = torch.jit.load('./detect.pt') # Save module without debug info m._save_for_lite_interpreter('./detect.bc') # Save module with debug info m._save_for_lite_interpreter('./detect.bc', _save_debug_info_in_bytecode=True) ``` Size of the file without module debugging information: 4.508 MB Size of the file with module debugging information: 4.512 MB Reviewed By: kimishpatel Differential Revision: D22803740 Pulled By: taivu1998 fbshipit-source-id: c82ea62498fde36a1cfc5b073e2cea510d3b7edb	2020-08-14 01:25:27 -07:00
Ann Shan	13bc542829	Fix lite trainer unit test submodule registration (#42714 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42714 Change two unit tests for the lite trainer to register two instances/objects of the same submodule type instead of the same submodule object twice. Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D22990736 Pulled By: ann-ss fbshipit-source-id: 2bf56b5cc438b5a5fc3db90d3f30c5c431d3ae77	2020-08-07 18:26:56 -07:00
Ilia Cherniavskii	a53fdaa23f	Remove ProfiledType (#42570 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42570 ProfiledType doesn't do anything and is not used atm, removing Test Plan: CI Reviewed By: ezyang Differential Revision: D22938664 Pulled By: ilia-cher fbshipit-source-id: 037c512938028f44258b702bbcde3f8c144f4aa0	2020-08-06 01:52:08 -07:00
Ann Shan	d707d4bf6d	Implement a light SGD optimizer (#42137 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42137 This PR implements an SGD optimizer class similar to torch::optim::SGD, but it doesn't inherit from torch::optim::Optimizer, for use on mobile devices (or other lightweight use case). Adding Martin's comment for visibility: "SGD may be the only optimizer used in near future. If more client optimizers are needed, refactoring the full optim codes and reusing the existing code would be an option." Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D22846514 Pulled By: ann-ss fbshipit-source-id: f5f46804aa021e7ada7c0cd3f16e24404d10c7eb	2020-08-03 17:27:53 -07:00
Wanchao Liang	a9e7e787f8	[jit] make clone works for interface type (#42121 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42121 This PR changes the Module API to allow registering a module with a module interface type, which lets Module::clone work in the case where a module interface type is shared by two submodules. The interface type will be shared by the new cloned instance in the same compilation unit because it only contains a list of functionSchema, which does not involve any attributes, unlike classType. Fixes https://github.com/pytorch/pytorch/issues/41882 Test Plan: Imported from OSS Reviewed By: suo Differential Revision: D22781205 Pulled By: wanchaol fbshipit-source-id: f97f4b75970f0b434e38b5a1f778eda2c4e5109b	2020-07-31 10:24:27 -07:00
Ann Shan	4b108ca763	refactor save_data as non member function (#42045 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/42045 This PR changes the save_data() member function of torch::jit::mobile::Module, which was introduced in #41403, to be the non-member function torch::jit::mobile::_save_parameters() (taking a mobile Module as its first argument). In addition, this PR: * adds a getter function _ivalue() for the mobile::Module object * renames torch::jit::mobile::_load_mobile_data() to torch::jit::mobile_load_parameters() * refactors the import.h header file into import.h and import_data.h Test Plan: Imported from OSS Reviewed By: kwanmacher, iseeyuan Differential Revision: D22766781 Pulled By: ann-ss fbshipit-source-id: 5cabae31927187753a958feede5e9a28d71d9e92	2020-07-28 21:52:32 -07:00
Yanan Cao	890b52e09f	Reduce instability in runCleanUpPasses by reordering passes. (#41891 ) Summary: Currently constant pooling runs before const propagation, which can create more constants that need pooling. This can get in the way of serialization/deserialization stability because each time a user serializes and deserializes a module, runCleanUpPasses is called on it. Doing so multiple times would lead to a different saved module. This PR moves constant pooling after const propagation, which may slow down const propagation a little, but otherwise side-steps the aforementioned problem. test_constant_insertion in test_jit.py is also updated because after fixing the pass ordering, the number of constants is no longer a constant and it is extremely difficult to get the exact number with the current convoluted test structure. So for now, I changed the test to check only that CSE doesn't change the number of "prim::constant" rather than comparing against a known number. Also left a TODO to improve this test. The ConstantPropagation pass is replaced by ConstantPropagationImmutableTypes because the latter is used in runCleanUpPasses. If not replaced, the former would create new CSE opportunities by folding more constants, which defeats the purpose of the test case. Pull Request resolved: https://github.com/pytorch/pytorch/pull/41891 Reviewed By: colesbury Differential Revision: D22701540 Pulled By: gmagogsfm fbshipit-source-id: 8e60dbdcc54a93dac111d81b8d88fb39387224f5	2020-07-24 11:39:20 -07:00
Ann Shan	dfe7d27d0e	implement lite parameter serializer (#41403 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41403 Test Plan: Imported from OSS Reviewed By: kwanmacher Differential Revision: D22611633 Pulled By: ann-ss fbshipit-source-id: b391e8c96234b2e69f350119a11f688e920c7817	2020-07-23 14:25:44 -07:00
Ann Shan	1039bbf4eb	add named parameters to mobile module (#41376 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/41376 torch::jit::mobile::Module does not currently support accessing parameters via their attribute names, but torch::jit::Module does. This diff adds an equivalent functionality to mobile::Module. Test Plan: Imported from OSS Reviewed By: iseeyuan Differential Revision: D22609142 Pulled By: ann-ss fbshipit-source-id: 1a5272ff336f99a3c0bb6194c6a6384754f47846	2020-07-20 15:57:49 -07:00
Ilia Cherniavskii	e7a09b4d17	RecordFunction in Dispatcher (#37587 ) Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/37587 Lifting RecordFunction up into the dispatcher code Test Plan: Imported from OSS Differential Revision: D21374246 fbshipit-source-id: 19f9c1719e6fd3990e451c5bbd771121e91128f7	2020-07-17 22:20:05 -07:00
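
The pass-ordering argument in commit 890b52e09f (#41891) can be illustrated with a toy model. The sketch below is not TorchScript's implementation: the tuple-based IR and the functions `propagate`, `pool`, `old_order`, and `new_order` are hypothetical stand-ins, written only to show why pooling before propagation is not stable under repeated runs while pooling after propagation is.

```python
# Toy IR (an assumption for illustration): each instruction is (name, op, arg).
#   ("a", "const", 1)         defines a constant
#   ("c", "add", ("a", "b"))  adds two named values

def propagate(graph):
    """Fold every add whose operands are all known constants."""
    consts = {}
    out = []
    for name, op, arg in graph:
        if op == "const":
            consts[name] = arg
            out.append((name, op, arg))
        elif op == "add" and all(a in consts for a in arg):
            value = sum(consts[a] for a in arg)
            consts[name] = value
            out.append((name, "const", value))
        else:
            out.append((name, op, arg))
    return out

def pool(graph):
    """Keep one constant per distinct value and rewrite uses of duplicates."""
    first = {}   # value -> name of the first constant holding it
    rename = {}  # duplicate constant name -> surviving name
    out = []
    for name, op, arg in graph:
        if op == "const":
            if arg in first:
                rename[name] = first[arg]
                continue
            first[arg] = name
            out.append((name, op, arg))
        else:
            out.append((name, op, tuple(rename.get(a, a) for a in arg)))
    return out

def old_order(graph):
    # Pooling first: constants created by folding are left unpooled.
    return propagate(pool(graph))

def new_order(graph):
    # Pooling last: constants created by folding are pooled in the same run.
    return pool(propagate(graph))

graph = [
    ("a", "const", 1),
    ("b", "const", 2),
    ("c", "add", ("a", "b")),  # folds to the constant 3
    ("d", "const", 3),         # duplicate of the folded value
    ("y", "add", ("c", "x")),  # "x" stands for a runtime input
    ("z", "add", ("d", "x")),
]

once_old = old_order(graph)
twice_old = old_order(once_old)
once_new = new_order(graph)
twice_new = new_order(once_new)

print("old order stable:", once_old == twice_old)
print("new order stable:", once_new == twice_new)
```

In this toy, a second run of the old order changes the graph because the constant produced by folding `c` only gets pooled with `d` on the next run, which mirrors the save/load instability the commit message describes.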

461 commits