Commit graph

473 commits

Author SHA1 Message Date
Yanan Cao
00a3add425 [TorchBind] Support using lambda function as TorchBind constructor (#47819)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47819

Reviewed By: wanchaol

Differential Revision: D24910065

Pulled By: gmagogsfm

fbshipit-source-id: ad5b4f67b0367e44fe486d31a060d9ad1e0cf568
2020-11-12 09:29:34 -08:00
Martin Yuan
a1fef453b6 Support extra files in _load_for_mobile (#47425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47425

Extra files can be exported in a lite interpreter model, but they could not previously be loaded. This PR adds the capability to load extra files from a lite interpreter model. Because extra_files is a default argument, it does not affect existing uses of _load_for_mobile. The files are simply assembled into a generic unordered_map; no additional dependency is introduced and the size overhead should be small (to be tested).
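Conceptually, the extra-files feature is just a name-to-bytes map carried alongside the model bytecode. A toy Python sketch of the load-side contract (`MobileModuleArchive` and `load_for_mobile` are hypothetical stand-ins, not the actual C++ API):

```python
# Toy model of the extra-files contract; all names are illustrative.

class MobileModuleArchive:
    def __init__(self, bytecode, extra_files=None):
        self.bytecode = bytecode
        # stored as a plain map, mirroring the generic unordered_map
        self.extra_files = dict(extra_files or {})

def load_for_mobile(archive, extra_files=None):
    """Load a module. If an extra_files map is passed, fill it in place
    with the contents found in the archive. Because extra_files defaults
    to None, existing callers that omit it are unaffected."""
    if extra_files is not None:
        for name in list(extra_files):
            extra_files[name] = archive.extra_files.get(name, b"")
    return archive.bytecode

archive = MobileModuleArchive(b"\x00bytecode", {"metadata.json": b"{}"})
requested = {"metadata.json": b""}   # caller asks for this file by name
load_for_mobile(archive, requested)
print(requested["metadata.json"])    # b'{}'
```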

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D24770266

Pulled By: iseeyuan

fbshipit-source-id: 7e8bd301ce734dbbf36ae56c9decb045aeb801ce
2020-11-06 20:26:54 -08:00
Gaoxiang Liu
735f8cc6c2 [DI] Allow explicit taskLauncher for torchscript interpreter (#46865)
Summary:
By default, TorchScript execution is single threaded and uses the caller's thread pool. For the distributed inference use case, we want a way to customize where the TorchScript interpreter executes. This diff allows an explicit taskLauncher to be passed to the TorchScript interpreter.
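The launcher-injection idea can be sketched in Python; `run_interpreter` and the launcher names below are made up for illustration (the real mechanism is a C++ std::function that receives the interpreter task):

```python
import threading

def default_launcher(task):
    # stand-in for "use the caller's thread": run the task inline
    task()

def run_interpreter(work, task_launcher=default_launcher):
    """Execute `work` via whichever launcher the caller supplies."""
    result = {}
    done = threading.Event()

    def task():
        result["value"] = work()
        done.set()

    task_launcher(task)
    done.wait()
    return result["value"]

def remote_pool_launcher(task):
    # e.g. a distributed-inference server may run interpreter work
    # on its own dedicated thread instead of the caller's pool
    threading.Thread(target=task).start()

print(run_interpreter(lambda: 2 + 2))                        # 4
print(run_interpreter(lambda: 2 + 2, remote_pool_launcher))  # 4
```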

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46865

Test Plan:
Unit tests pass.

fbshipit-source-id: 1d7b003926c0d1f8facc53206efb960cff8897ac

Reviewed By: houseroad

Differential Revision: D24616102

Pulled By: garroud

fbshipit-source-id: 79202b62f92d0b0baf72e4bf7aa3f05e0da91d59
2020-11-04 17:07:55 -08:00
Mikhail Zolotukhin
3161fe6d5a [JIT] SubgraphUtils: add a function for generating a string name for a given graph. (#47253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47253

The function simply goes over all aten nodes in the graph and
concatenates their names, truncating the final name to a given length.
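The naming scheme is simple enough to sketch directly; the separator and truncation length below are assumptions, not the values used by the actual pass:

```python
def generate_name_for_graph(node_kinds, max_len=40):
    """Concatenate the kinds of all aten nodes in a graph,
    then truncate the result to max_len characters."""
    name = "_".join(kind.replace("aten::", "")
                    for kind in node_kinds if kind.startswith("aten::"))
    return name[:max_len]

kinds = ["prim::Constant", "aten::add", "aten::mul", "aten::relu"]
print(generate_name_for_graph(kinds))  # add_mul_relu
```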

Differential Revision: D24698272

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: d6e50194ca5faf0cb61f25af83247b5e40f202e4
2020-11-03 16:36:41 -08:00
Kimish Patel
82b74bd929 Port torch::jit::module's attr method to mobile::module (#47059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47059

This diff adds an attr getter to mobile::module, similar to the TorchScript module's at https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/api/object.h#L75-L83.
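The getter mirrors the `attr(name, or_else)` shape of the full-JIT object API linked above; a pure-Python sketch (illustrative stand-in, not the C++ signature):

```python
class MobileModule:
    def __init__(self, attributes):
        self._attributes = dict(attributes)

    def attr(self, name, or_else=None):
        """Return the named attribute if present, else the fallback."""
        return self._attributes.get(name, or_else)

m = MobileModule({"training": False})
print(m.attr("training", True))  # False
print(m.attr("missing", 42))     # 42
```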

Test Plan: LiteInterpreterTest::CheckAttrAccess

Reviewed By: xta0

Differential Revision: D24604950

fbshipit-source-id: cfac187f47f5115807dc119fe6c203f60dbd5dff
2020-11-02 16:38:12 -08:00
Jeffrey Wan
f5073b0c5a Add inputs argument to autograd.backward() (#46855)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46373

As noted in https://github.com/pytorch/pytorch/issues/46373, there needs to be a flag passed into the engine that indicates whether it was executed through the backward api or grad api. Tentatively named the flag `accumulate_grad` since functionally, backward api accumulates grad into .grad while grad api captures the grad and returns it.
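The behavioral split the flag encodes can be shown with a toy engine (a conceptual sketch only, not the real autograd engine):

```python
class Leaf:
    def __init__(self):
        self.grad = 0.0

def run_engine(leaves, incoming_grads, accumulate_grad):
    """accumulate_grad=True models the backward() API: accumulate into
    .grad. accumulate_grad=False models the grad() API: capture and
    return the gradients instead."""
    if accumulate_grad:
        for leaf, g in zip(leaves, incoming_grads):
            leaf.grad += g
        return None
    return tuple(incoming_grads)

x = Leaf()
run_engine([x], [1.5], accumulate_grad=True)
print(x.grad)                                         # 1.5
print(run_engine([x], [2.0], accumulate_grad=False))  # (2.0,)
print(x.grad)                                         # still 1.5
```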

Moving changes not necessary to the python api (cpp, torchscript) to a new PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46855

Reviewed By: ngimel

Differential Revision: D24649054

Pulled By: soulitzer

fbshipit-source-id: 6925d5a67d583eeb781fc7cfaec807c410e1fc65
2020-11-02 14:32:38 -08:00
Dhruv Matani
9c1a41b724 [RFC] Add OperatorHandle overload to the RecordFunction::before() method (#46401)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46401

Broader context about selective/custom build available at https://fb.quip.com/2oEzAR5MKqbD and https://fb.workplace.com/groups/pytorch.mobile.team/permalink/735794523641956/

Basically, we want to be able to trace full operator names (with overload name). The current observer infra picks up the operator name from the schema, which doesn't include the overload name. To ensure consistency with the existing uses and to accommodate the new use-case, this diff adds a new overload to accept an `OperatorHandle` object, and the code in `before()` eagerly resolves it to an `OperatorName` object (which can be cached in a member variable) as well as a string (view) operator name with the same semantics as before.

Why do we pass in an `OperatorHandle` but then resolve it to an `OperatorName`? This might come across as a strange design choice (and it is), but it is grounded in practicality.

It is not reasonable to cache an `OperatorHandle` object but caching an `OperatorName` object is reasonable since it holds all the data itself.
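The eager resolve-and-cache choice can be sketched as follows (illustrative Python stand-ins for the dispatcher types, not the real API):

```python
class OperatorName:
    # a plain value type: owns all its data, so it is safe to cache
    def __init__(self, name, overload):
        self.name, self.overload = name, overload

    def full_name(self):
        return f"{self.name}.{self.overload}" if self.overload else self.name

class OperatorHandle:
    # stand-in for a dispatcher handle; not safe to hold long-term
    def __init__(self, schema_name, overload_name):
        self._schema = (schema_name, overload_name)

    def operator_name(self):
        return OperatorName(*self._schema)

class RecordFunction:
    def before(self, op_handle):
        # eagerly resolve the handle into a cacheable value type
        self.operator_name = op_handle.operator_name()
        # string name keeps the same semantics as the old overload
        self.name = self.operator_name.full_name()

rf = RecordFunction()
rf.before(OperatorHandle("aten::add", "Tensor"))
print(rf.name)  # aten::add.Tensor
```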

An initial version of this change was trying to test this change in the `xplat` repo, which didn't work. Thanks to ilia-cher for pointing out that the dispatcher observing mechanism is disabled under a compile time flag (macro) for xplat.
ghstack-source-id: 114360747

Test Plan:
`buck test fbcode/caffe2/fb/test:record_function_test` succeeds. Also replicated this test in OSS in the file `test_misc.cpp` where the rest of the `RecordFunction` subsystem is being tested.

Ran benchmark as requested by ilia-cher

{P146511280}

Reviewed By: ilia-cher

Differential Revision: D24315241

fbshipit-source-id: 239f3081e6aa2e26c3021a7dd61f328b723b03d9
2020-10-28 22:38:26 -07:00
Yanan Cao
f9b9430152 Support doc_string for TorchBind custom classes (#46576)
Summary:
With this PR, users can optionally provide a "doc_string" to describe a class or its methods. doc_strings for TorchBind classes and methods are stored as `doc_string` properties on `Function` and `ScriptClass`. These `doc_string` properties are then exposed to the Python layer via PyBind for doc generation.
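The shape of the feature, sketched in plain Python (the real registration happens through the TorchBind C++ API; the names below are illustrative):

```python
class ScriptClass:
    def __init__(self, name, doc_string=""):
        self.name = name
        self.doc_string = doc_string  # optional class-level docs
        self.methods = {}

    def def_method(self, name, fn, doc_string=""):
        fn.doc_string = doc_string    # per-method docs live on the Function
        self.methods[name] = fn
        return self

stack = ScriptClass("Stack", doc_string="A simple LIFO container.")
stack.def_method("push", lambda self, x: None,
                 doc_string="Push an element onto the stack.")
print(stack.doc_string)                  # A simple LIFO container.
print(stack.methods["push"].doc_string)  # Push an element onto the stack.
```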

Fixes https://github.com/pytorch/pytorch/issues/46047

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46576

Reviewed By: wanchaol

Differential Revision: D24440636

Pulled By: gmagogsfm

fbshipit-source-id: bfa9b270a6c2d8bc769a88fad6be939cc6310412
2020-10-24 12:51:35 -07:00
jiej
ac146c4820 [nvFuser] Switching to CudaFusionGuard from BailOut for nvfuser - update 2 (#46452)
Summary:
1. Added CudaFusionGuard as the custom TypeCheck for nvfuser; enabled dynamic shape support with profiling executor;
2. dropped support for legacy fuser;
3. re-enabled nvfuser tests;
4. added registration for profiling record to allow profiling on user specified nodes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46452

Reviewed By: zou3519, anjali411

Differential Revision: D24364642

Pulled By: ngimel

fbshipit-source-id: daf53a9a6b6636e1ede420a3a6d0397d4a8b450b
2020-10-19 15:44:31 -07:00
Elias Ellison
c86655a815 [JIT] Fix Dict bug in constant hashing (#45929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45929

We were checking `and` when we should have been checking `or`.
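Abstractly, the bug class looks like this: two dict constants are unequal when the keys or the values differ, so joining the mismatch checks with `and` wrongly reports equality. A schematic sketch (not the actual IR hashing code):

```python
def dicts_differ_buggy(a, b):
    # buggy: demands BOTH keys and values to mismatch
    return a.keys() != b.keys() and list(a.values()) != list(b.values())

def dicts_differ_fixed(a, b):
    # fixed: ANY mismatch makes the constants unequal
    return a.keys() != b.keys() or list(a.values()) != list(b.values())

d1, d2 = {"x": 1}, {"x": 2}
print(dicts_differ_buggy(d1, d2))  # False -- wrong: the values differ
print(dicts_differ_fixed(d1, d2))  # True  -- correct
```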

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D24148804

Pulled By: eellison

fbshipit-source-id: 9c394ea10ac91a588169d934b1e3208512c71b9d
2020-10-07 17:40:17 -07:00
Ansley Ussery
5072728d88 Fix stride printing/parsing formatting (#45156)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/45156

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D24078695

Pulled By: ansley

fbshipit-source-id: dab993277d43b31105c38d12098c37653747b42a
2020-10-06 15:06:46 -07:00
Thomas Viehmann
3ab88c3903 Enable TorchBind tests on ROCm (#45426)
Summary:
The TorchBind tests didn't work because we somehow missed the rename of caffe2_gpu to torch_... (hip for us) in https://github.com/pytorch/pytorch/issues/20774 (merged 2019-06-13, oops) and still tried to link against it.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45426

Reviewed By: VitalyFedyunin

Differential Revision: D24112439

Pulled By: walterddr

fbshipit-source-id: a66a574e63714728183399c543d2dafbd6c028f7
2020-10-05 09:38:12 -07:00
Ilia Cherniavskii
f5c95d5cf1 Source code level attribution in profiler (#43898)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43898

Adds a with_source parameter to enable tracking source code
(filename and line) in the profiler for eager, TorchScript and autograd
modes

Test Plan:
python test/test_profiler.py
```
Name                                 Self CPU total %  Self CPU total   CPU total %      CPU total        CPU time avg     Number of Calls  Source Location
-----------------------------------  ---------------  ---------------  ---------------  ---------------  ---------------  ---------------  --------------------------------------------
ts_method_1                          10.43%           235.364us        36.46%           822.920us        822.920us        1                test/test_profiler.py(70): test_source
aten::add                            7.52%            169.833us        8.88%            200.439us        200.439us        1                test/test_profiler.py(69): test_source
aten::normal_                        6.26%            141.380us        6.26%            141.380us        141.380us        1                test/test_profiler.py(67): test_source
aten::add                            5.80%            130.830us        8.41%            189.800us        63.267us         3                test/test_profiler.py(72): test_source
aten::sum                            5.02%            113.340us        8.39%            189.475us        189.475us        1                test/test_profiler.py(64): ts_method_1
aten::add                            4.58%            103.346us        6.33%            142.847us        142.847us        1                test/test_profiler.py(62): ts_method_1
aten::mul                            4.05%            91.498us         9.62%            217.113us        217.113us        1                test/test_profiler.py(71): test_source
aten::add                            4.03%            90.880us         5.60%            126.405us        126.405us        1                test/test_profiler.py(58): ts_method_2
aten::empty                          3.49%            78.735us         3.49%            78.735us         19.684us         4                test/test_profiler.py(72): test_source
```

Reviewed By: ngimel

Differential Revision: D23432664

Pulled By: ilia-cher

fbshipit-source-id: 83ad7ebe0c2502494d3b48c4e687802db9c77615
2020-09-30 00:57:35 -07:00
Rohan Varma
27ab9bc0f9 [RPC profiling] Extend RPC profiling to support async function execution over RPC. (#44664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44664

Closes https://github.com/pytorch/pytorch/issues/39971. This PR adds support for functions decorated with `rpc.functions.async_execution` to be profiled over RPC as builtins, jit functions, and blocking python UDFs currently can be. The reasoning for this is to provide complete feature support in terms of RPC profiling and the various types of functions users can run.

To enable this, the PR below this one makes it safe to call `disableProfiler()` from another thread. We use that functionality to defer disabling the profiler on the server until the future corresponding to the RPC request completes (rather than only covering the blocking `processRPC` call, as was done previously). Since by the time that future completes the async function it kicked off has also finished, we are able to capture any RPCs the function made as well as the actual work done on the other node.

For example, if the following async function is run on a server over RPC:

```
import time

import torch
import torch.distributed.rpc as rpc

def slow_add(x, y):
    time.sleep(1)
    return torch.add(x, y)

@rpc.functions.async_execution
def slow_async_add(to, x, y):
    return rpc.rpc_async(to, slow_add, args=(x, y))
```

we expect to see the original RPC profiled, the nested RPC profiled, and the actual torch.add() work. All of these events should be recorded with the correct node id. Here is an example profiling output:

```
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Name                                                                                                                   Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls  Node ID
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
rpc_async#slow_async_add(worker1 -> worker2)                                                                           0.00%             0.000us         0            1.012s     1.012s        1                1
aten::empty                                                                                                            7.02%             11.519us        7.02%        11.519us   11.519us      1                1
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)                         0.00%             0.000us         0            1.006s     1.006s        1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: aten::empty                                                    7.21%             11.843us        7.21%        11.843us   11.843us      1                2
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::add    71.94%            118.107us       85.77%       140.802us  140.802us     1                3
rpc_async#slow_async_add(worker1 -> worker2)#remote_op: rpc_async#slow_add(worker2 -> worker3)#remote_op: aten::empty  13.82%            22.695us        13.82%       22.695us   22.695us      1                3
---------------------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------  -------
Self CPU time total: 164.164us
```

This PR also moves a bunch of the profiling logic to `rpc/utils.cpp` to declutter `request_callback` code.
ghstack-source-id: 112868470

Test Plan:
```
rvarm1@devbig978:fbcode  (52dd34f6)$ buck test mode/no-gpu mode/dev-nosan //caffe2/test/distributed/rpc:process_group_agent -- test_rpc_profiling_async_function --print-passing-details --stress-runs 1
```

Reviewed By: mrshenli

Differential Revision: D23638387

fbshipit-source-id: eedb6d48173a4ecd41d70a9c64048920bd4807c4
2020-09-25 13:19:26 -07:00
Michael Suo
22401b850b port all JIT tests to gtest (#45264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45264

Context for why we are porting to gtest in: https://github.com/pytorch/pytorch/pull/45018.

This PR completes the process of porting and removes unused files/macros.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23901392

Pulled By: suo

fbshipit-source-id: 89526890e1a49462f3f77718f4ee273c5bc578ba
2020-09-25 11:37:43 -07:00
jjsjann123
99e0a87bbb [nvFuser] Latency improvements for pointwise + reduction fusion (#45218)
Summary:
A lot of changes are in this update, some highlights:

- Added Doxygen config file
- Split the fusion IR (higher level TE like IR) from kernel IR (lower level CUDA like IR)
- Improved latency with dynamic shape handling for the fusion logic
- Prevent recompilation for pointwise + reduction fusions when not needed
- Improvements to inner dimension reduction performance
- Added input -> kernel + kernel launch parameters cache, added eviction policy
- Added reduction fusions with multiple outputs (still single reduction stage)
- Fixed code generation bugs for symbolic tiled GEMM example
- Added thread predicates to prevent shared memory from being loaded multiple times
- Improved sync threads placements with shared memory and removed read before write race
- Fixes to FP16 reduction fusions where output would come back as FP32

Pull Request resolved: https://github.com/pytorch/pytorch/pull/45218

Reviewed By: ezyang

Differential Revision: D23905183

Pulled By: soumith

fbshipit-source-id: 12f5ad4cbe03e9a25043bccb89e372f8579e2a79
2020-09-24 23:17:20 -07:00
Raziel Alvarez Guevara
2b38c09f69 Moves prim ops from C10 back to JIT (#45144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45144

Moves prim ops from C10 back to JIT.

These were originally moved to C10 from JIT in D19237648 (f362cd510d)
ghstack-source-id: 112775781

Test Plan:
buck test //caffe2/test/cpp/jit:jit

https://pxl.cl/1l22N

buck test adsatlas/gavel/lib/ata_processor/tests:ata_processor_test

https://pxl.cl/1lBxD

Reviewed By: iseeyuan

Differential Revision: D23697598

fbshipit-source-id: 36d1eb8c346e9b161ba6af537a218440a9bafd27
2020-09-24 09:44:20 -07:00
Michael Suo
6d21d5f0b3 gtest-ify JIT tests, through the letter c (#45249)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45249

Reland of https://github.com/pytorch/pytorch/pull/45055 and
https://github.com/pytorch/pytorch/pull/45020

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D23892645

Pulled By: suo

fbshipit-source-id: e7fe58d5e1a5a0c44f4e2aec9694145afabde0fd
2020-09-24 00:21:20 -07:00
Michael Suo
e9aa6898ab Revert D23802296: gtest-ify JIT tests, through the letter c
Test Plan: revert-hammer

Differential Revision:
D23802296 (d2b045030e)

Original commit changeset: 20c9798a414e

fbshipit-source-id: a28d56039ca404fe94ed7572f1febd1673e3e788
2020-09-23 17:42:19 -07:00
Nikita Shulga
89c570ed0a Revert D23811085: gtestify dce and fuser tests
Test Plan: revert-hammer

Differential Revision:
D23811085 (246bd9422a)

Original commit changeset: 45008e41f239

fbshipit-source-id: 94c981f565cab9b710fe52a55bbe8dbf9c179c23
2020-09-23 17:27:59 -07:00
Michael Suo
246bd9422a gtestify dce and fuser tests (#45055)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45055

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23811085

Pulled By: suo

fbshipit-source-id: 45008e41f2394d2ba319745b0340392e1b3d3172
2020-09-23 14:33:22 -07:00
Michael Suo
d2b045030e gtest-ify JIT tests, through the letter c (#45020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45020

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23802296

Pulled By: suo

fbshipit-source-id: 20c9798a414e9ba30869a862012cbdee0613c8b1
2020-09-23 14:28:45 -07:00
Rohan Varma
70d2e4d1f6 [RPC profiling] Allow disableProfiler() to be called from another thread. (#44653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44653

This changes the profiler, per an offline discussion with ilia-cher, so that the `disableProfiler()` event consolidation logic can be called from different threads (i.e. threads where the profiler was not explicitly enabled). This is needed to support the functionality enabled by D23638387, where we defer profiling event collection until executing an async callback that can run on a different thread, to support RPC async function profiling.

This is done by introducing two flags, `cleanupTLSState` and `consolidate`, which control whether we should clean up thread-local settings (we don't do this when calling `disableProfiler()` on non-main threads) and whether we should consolidate all profiled events. Backwards compatibility is ensured since both options are true by default.

Added a test in `test_misc.cpp` to test this.
ghstack-source-id: 112605620

Reviewed By: mrshenli

Differential Revision: D23638499

fbshipit-source-id: f5bbb0d41ef883c5e5870bc27e086b8b8908f46b
2020-09-22 21:16:58 -07:00
Michael Suo
666223df46 [jit] gtestify test_argument_spec.cpp (#45019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45019

See https://github.com/pytorch/pytorch/pull/45018 for context.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23802298

Pulled By: suo

fbshipit-source-id: 0e36d095d4d81dcd5ebe6d56b3dc469d6d5482d0
2020-09-22 19:44:14 -07:00
Elias Ellison
ae286d81e0 [JIT] improve alias analysis for list constructs (#39111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39111

In our present alias analysis, we consider any Value that enter another container as entering the heap, and thus aliasing all other heap values of the same type. There are a number of advantages to this approach:
- it is not to hard to maintain the aliasDb implementation
- it is much easier from an op schema perspective - there are many composite list ops registered internally and externally that would be tricky to register and get right if we did something more complicated
- It limits the size of the AliasDb, because a container of size 10 only contains a single memory dag element instead of 10 elements.

The downside is that we are unable to handle the simple and extremely common case of a list of tensors being used in an ATen op.

In an example like:

```
def foo(input):
    x = torch.tensor([1, 2, 3, 4])
    y = [x, x]
    input.add_(1)
    return torch.cat(y)
```

we will consider x to be written to. Any write to any wildcard element (an element that enters a tuple, or an element that is taken from a list) will mark x as written to. This limits our ability to create a functional subset and fuse graphs; as a result, 4 TorchVision classification models could not be functionalized.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D23828003

Pulled By: eellison

fbshipit-source-id: 9109fcb6f2ca20ca897cae71683530285da9d537
2020-09-22 09:38:59 -07:00
Michael Suo
42af2c7923 [jit] gtest-ify test_alias_analysis.cpp (#45018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/45018

Now that https://github.com/pytorch/pytorch/pull/44795 has landed, we
can convert the bulk of our cpp tests to use gtest APIs. Eventually
we'll want to get rid of our weird harness for cpp tests entirely in
favor of using regular gtest everywhere. This PR demonstrates some of
the benefits of this approach:
1. You don't need to register your test twice (once to define it, once
in tests.h).
2. Consequently, it's easier to have many individual test cases.
Failures can be reported independently (rather than having huge
functions to test entire modules).
3. Some nicer testing APIs, notably test fixtures.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23802297

Pulled By: suo

fbshipit-source-id: 774255da7716294ac573747dcd5e106e5fe3ac8f
2020-09-21 12:19:37 -07:00
Michael Suo
374e9373b5 [jit] Pull (most) tests out of libtorch_python (#44795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44795

Today, we build our cpp tests twice, once as a standalone gtest binary,
and once linked in `libtorch_python` so we can call them from
`test_jit.py`.

This is convenient (it means that `test_jit.py` is a single entry point
for all our tests), but has a few drawbacks:
1. We can't actually use the gtest APIs, since we don't link gtest into
`libtorch_python`. We're stuck with the subset that we want to write
polyfills for, and an awkward registration scheme (where you have to
write a test, then include it in `tests.h`).
2. More seriously, we register custom operators and classes in these
tests. In a world where we may be linking many `libtorch_python`s, this
has a tendency to cause errors with `libtorch`.

So now, only tests that explicitly require cooperation with Python are
built into `libtorch_python`. The rest are built into
`build/bin/test_jit`.

There are tests which require that we define custom classes and
operators. In these cases, I've built them into separate `.so`s that we
call `torch.ops.load_library()` on.

Test Plan: Imported from OSS

Reviewed By: SplitInfinity, ZolotukhinM

Differential Revision: D23735520

Pulled By: suo

fbshipit-source-id: d146bf4e7eb908afa6f96b394e4d395d63ad72ff
2020-09-18 14:04:40 -07:00
Louis Feng
eb75cfb9c0 Back out "Revert D23323486: DPP Async Tracing" plus windows build fix. (#44702)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44702

Original commit changeset: c6bd6d277aca

The original diff caused the Windows build to fail due to a compiler bug in VS2019 (lambda capture of a constant int value). This backout works around the issue with an explicit capture of the const int value.

Test Plan: Tested and previously landed.

Reviewed By: mruberry

Differential Revision: D23703215

fbshipit-source-id: f9ef23be97540bc9cf78a855295fb8c69f360459
2020-09-16 11:32:11 -07:00
Mike Ruberry
7036e91abd Revert D23323486: DPP Async Tracing
Test Plan: revert-hammer

Differential Revision:
D23323486 (71673b31f9)

Original commit changeset: 4b6ca6c0e320

fbshipit-source-id: c6bd6d277aca070bef2de3522c2a60e23b4395ad
2020-09-15 01:19:23 -07:00
Louis Feng
71673b31f9 DPP Async Tracing (#44252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44252

Add tracing to the DPP client. Because DPP requests are async, we need to be able to start a trace event in one thread and potentially end it in a different thread. RecordFunction and LibgpumonObserver previously assumed each trace event starts and finishes in the same thread, so they used a thread-local context to track enter and exit callbacks. Async events break this assumption. This change attaches the event context to the RecordFunction object so we no longer need the thread-local context.
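The fix can be sketched like this: instead of stashing the observer context in thread-local storage, attach it to the record object itself so the exit callback can run on any thread (illustrative Python, not the C++ RecordFunction API):

```python
import threading

class RecordFunction:
    def __init__(self, name):
        self.name = name
        self.ctx = None  # observer context travels with the record

def on_enter(rec):
    # context lives on the record, not in thread-local storage
    rec.ctx = {"name": rec.name, "enter_thread": threading.get_ident()}

def on_exit(rec):
    rec.ctx["exit_thread"] = threading.get_ident()
    return rec.ctx

rec = RecordFunction("dpp::request")
on_enter(rec)

# the async request completes on a different thread
result = {}
t = threading.Thread(target=lambda: result.update(on_exit(rec)))
t.start(); t.join()
print(result["name"])                                   # dpp::request
print(result["enter_thread"] != result["exit_thread"])  # True
```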

Test Plan:
Tested with dpp perf test and able to collect trace.

{F307824044}

Reviewed By: ilia-cher

Differential Revision: D23323486

fbshipit-source-id: 4b6ca6c0e32028fb38a476cd1f44c17a001fc03b
2020-09-14 18:43:14 -07:00
Martin Yuan
7862827269 [pytorch] Add variadic run_method for lite intepreter (#44337)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44337

Add a new run_method to mobile Module which is variadic (takes any number of arguments) to match full jit.
ghstack-source-id: 111909068
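The variadic forwarding can be sketched in Python (illustrative stand-ins for the C++ mobile Module/Method types):

```python
class Method:
    def __init__(self, fn):
        self._fn = fn

    def __call__(self, *args):
        return self._fn(*args)

class MobileModule:
    def __init__(self, methods):
        self._methods = {name: Method(fn) for name, fn in methods.items()}

    def get_method(self, name):
        return self._methods[name]

    def run_method(self, name, *args):
        # variadic: forwards any number of arguments, matching full JIT
        return self.get_method(name)(*args)

m = MobileModule({"add": lambda a, b: a + b})
print(m.run_method("add", 2, 3))  # 5
```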

Test Plan: Added new unit test to test_jit test suite

Reviewed By: linbinyu, ann-ss

Differential Revision: D23585763

fbshipit-source-id: 007cf852290f03615b78c35aa6f7a21287ccff9e
2020-09-13 13:26:30 -07:00
Ann Shan
a61318a535 [pytorch] Replace mobile run_method with get_method and operator() (#44202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44202

In preparation for changing mobile run_method() to be variadic, this diff:

* Implements get_method() for mobile Module, which is similar to find_method but expects the method to exist.
* Replaces calls to the current nonvariadic implementation of run_method() by calling get_method() and then invoking the operator() overload on Method objects.
ghstack-source-id: 111848222

Test Plan: CI, and all the unit tests which currently contain run_method that are being changed.

Reviewed By: iseeyuan

Differential Revision: D23436351

fbshipit-source-id: 4655ed7182d8b6f111645d69798465879b67a577
2020-09-11 10:23:06 -07:00
Lillian Johnson
b0bcdbb1ab [JIT] Support partially specified sizes/strides in IRParser (#44113)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/44113

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D23508149

Pulled By: Lilyjjo

fbshipit-source-id: b6b2d32109fae599bc5347dae742b67a2e4a0a49
2020-09-09 14:45:51 -07:00
Ann Shan
9b3c72d46e [pytorch] Make mobile find_method return an optional (#43965)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43965

As part of a larger effort to unify the API between the lite interpreter and full JIT:
- implement torch::jit::mobile::Method, a proxy for torch::jit::mobile::Function
- add support for overloaded operator() to mobile Method and Function
- mobile find_method now returns a c10::optional<Method> (so signature matches full jit)
- moves some implementation of Function from module.cpp to function.cpp
ghstack-source-id: 111161942
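The optional-returning lookup can be sketched as follows, with Python's None standing in for an empty c10::optional<Method> (names illustrative, not the C++ API):

```python
class MobileModule:
    def __init__(self, methods):
        self._methods = dict(methods)

    def find_method(self, name):
        # returns None when absent, mirroring c10::optional<Method>
        return self._methods.get(name)

    def get_method(self, name):
        # expects the method to exist; raises otherwise
        method = self.find_method(name)
        if method is None:
            raise RuntimeError(f"Method '{name}' is not defined.")
        return method

m = MobileModule({"forward": lambda x: x * 2})
print(m.find_method("missing"))     # None
print(m.get_method("forward")(21))  # 42
```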

Test Plan: CI

Reviewed By: iseeyuan

Differential Revision: D23330762

fbshipit-source-id: bf0ba0d711d9566c92af31772057ecd35983ee6d
2020-09-03 14:46:18 -07:00
Nikolay Korovaiko
f91bdbeabd Enable function calls in TEFuser and SpecializeAutogradZero (#43866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43866

Reviewed By: ezyang

Differential Revision: D23452798

Pulled By: Krovatkin

fbshipit-source-id: 2cff4c905bf1b5d9de56e7869458ffa6fce1f1b5
2020-09-03 14:42:52 -07:00
Elias Ellison
544a56ef69 [JIT] Always map node output in vmap (#43988)
Summary:
Previously, when merging a node without a subgraph, we would map the node's outputs to the corresponding subgraph values, but when merging a node with a subgraph, the node's outputs would be absent from the value mapping. This PR makes it so they are included.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43988

Reviewed By: ZolotukhinM

Differential Revision: D23462116

Pulled By: eellison

fbshipit-source-id: 232c081261e9ae040df0accca34b1b96a5a5af57
2020-09-02 10:30:43 -07:00
generatedunixname89002005287564@sandcastle1415.cln1.facebook.com
1dd658f28f [Codemod][GleanFbcode] Remove dead includes in caffe2/test (#43953)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43953

Reviewed By: malfet

Differential Revision: D23445556

fbshipit-source-id: 89cd6833aa06f35c5d3c99d698abb08cd61ae4ab
2020-09-01 21:48:28 -07:00
Gao, Xiang
5e97f251a8 Enable TF32 support for cuDNN (#40737)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40737

Reviewed By: mruberry

Differential Revision: D22801525

Pulled By: ngimel

fbshipit-source-id: ac7f7e728b4b3e01925337e8c9996f26a6433fd2
2020-09-01 15:34:24 -07:00
Pritam Damania
f1624b82b5 Preserve python backtrace in autograd engine errors. (#43684)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43684

This PR attempts to address #42560 by capturing the appropriate
exception_ptr in the autograd engine and passing it over to the Future.

As part of this change, there is a significant change the Future API where we
now only accept an exception_ptr as part of setError.
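The narrowed setError contract can be sketched in Python (illustrative, not the C++ ivalue::Future API): the future stores the captured exception object itself, the analogue of a std::exception_ptr, so the original backtrace is preserved when the error is rethrown:

```python
class Future:
    def __init__(self):
        self._error = None
        self._value = None

    def set_error(self, exc):
        # accepts the captured exception object itself (the analogue of
        # std::exception_ptr), preserving the original traceback
        self._error = exc

    def value(self):
        if self._error is not None:
            raise self._error  # rethrow with the captured backtrace
        return self._value

fut = Future()
try:
    raise ValueError("something")
except ValueError as e:
    fut.set_error(e)  # captured with its backtrace intact

try:
    fut.value()
except ValueError as e:
    print(e)  # something
```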

For the example in #42560, the exception trace would now look like:

```
> Traceback (most recent call last):
>   File "test_autograd.py", line 6914, in test_preserve_backtrace
>     Foo.apply(t).sum().backward()
>   File "torch/tensor.py", line 214, in backward
>     torch.autograd.backward(self, gradient, retain_graph, create_graph)
>   File "torch/autograd/__init__.py", line 127, in backward
>     allow_unreachable=True)  # allow_unreachable flag
>   File "torch/autograd/function.py", line 87, in apply
>     return self._forward_cls.backward(self, *args)
>   File "test_autograd.py", line 6910, in backward
>     raise ValueError("something")
> ValueError: something
```
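
The trace above comes from a custom autograd Function that raises in its backward. A minimal sketch of that pattern (the class name `Foo` matches the trace; the rest of the test code is assumed for illustration):

```python
import torch

class Foo(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        # The engine captures this exception (backtrace included)
        # and rethrows it to the caller of backward().
        raise ValueError("something")

t = torch.ones(2, requires_grad=True)
try:
    Foo.apply(t).sum().backward()
except ValueError as e:
    print("caught:", e)  # caught: something
```
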
ghstack-source-id: 111109637

Test Plan: waitforbuildbot

Reviewed By: albanD

Differential Revision: D23365408

fbshipit-source-id: 1470c4776ec8053ea92a6ee1663460a3bae6edc5
2020-09-01 01:28:47 -07:00
Nikolay Korovaiko
000739c31a Function calls for fallback paths (#43274)
Summary:
This PR adds an API to package unoptimized/fallback blocks as function calls. It is mainly meant to be used by the TensorExpressionsFuser and SpecializeAutogradZero passes: both specialize the original graph but also want to provide a fallback path in case the assumptions under which the graph was specialized do not hold for some inputs.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43274

Reviewed By: malfet

Differential Revision: D23406961

Pulled By: Krovatkin

fbshipit-source-id: ef21fc9ad886953461b09418d02c75c58375490c
2020-08-28 23:31:02 -07:00
Mikhail Zolotukhin
776c2d495f [JIT] IRParser: store list attributes as generic ivalue lists. (#43785)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43785

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D23400565

Pulled By: ZolotukhinM

fbshipit-source-id: e248eb1854c4ec40da9455d4279ea6e47b1f2a16
2020-08-28 13:27:28 -07:00
Martin Yuan
288a2effa0 Operator generator based on templated selective build. (#43456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43456

Introduce the template OperatorGenerator, which returns an optional Operator; it is empty if the templated bool value is false.

RegisterOperators() is updated to take the optional Operator. A null will not be registered.

With this update, selective operator registration can be done at compile time. Tests are added to show that an operator is registered if it is in the whitelist and is not registered otherwise.
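
The selection logic can be sketched with a small Python analog (the real mechanism is a C++ template returning an optional Operator; the allowlist, names, and helpers below are made up for illustration):

```python
# Hypothetical allowlist standing in for the compile-time whitelist.
ALLOWLIST = {"aten::add"}

def operator_generator(name, fn):
    """Return an operator entry, or None if `name` is not selected."""
    if name not in ALLOWLIST:
        return None  # plays the role of the null optional Operator
    return (name, fn)

registry = {}

def register_operators(*ops):
    """Like the updated RegisterOperators(): nulls are silently skipped."""
    for op in ops:
        if op is not None:
            registry[op[0]] = op[1]

register_operators(
    operator_generator("aten::add", lambda a, b: a + b),
    operator_generator("aten::mul", lambda a, b: a * b),  # filtered out
)
print(sorted(registry))  # ['aten::add']
```
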

Test Plan: Imported from OSS

Reviewed By: ljk53

Differential Revision: D23283563

Pulled By: iseeyuan

fbshipit-source-id: 456e0c72b2f335256be800aeabb797bd83bcf0b3
2020-08-27 07:26:07 -07:00
James Reed
a070c619b9 [FX] Native callables in FX lowering (#43426)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43426

Test Plan: Imported from OSS

Reviewed By: zdevito

Differential Revision: D23273427

Pulled By: jamesr66a

fbshipit-source-id: 3a9d04486c72933d8afd9c181578fe98c3d825b0
2020-08-27 00:00:03 -07:00
Mikhail Zolotukhin
b763666f9f [JIT] Subgraph utils: add an optional vmap argument to the API to allow retrieving value mappings. (#43235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/43235

This functionality is needed when we want to avoid losing track of nodes/values as we merge and unmerge them into other nodes. For instance, if we have a side data structure with some metadata about values or nodes, this new functionality allows keeping that metadata up to date after merging and unmerging nodes.

Differential Revision: D23202648

Test Plan: Imported from OSS

Reviewed By: eellison

Pulled By: ZolotukhinM

fbshipit-source-id: 350d21a5d462454166f8a61b51d833551c49fcc9
2020-08-25 18:13:29 -07:00
Ann Shan
7cc1efec13 Add lite SequentialSampler to torch mobile (#43299)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43299

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23228415

Pulled By: ann-ss

fbshipit-source-id: eebe54353a128783f039c7dac0e2dd765a61940d
2020-08-24 09:45:24 -07:00
Nikolay Korovaiko
a97ca93c0e remove prim::profile and special-casing (#43160)
Summary:
Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43160

Reviewed By: ZolotukhinM

Differential Revision: D23284421

Pulled By: Krovatkin

fbshipit-source-id: 35e97aad299509a682ae7e95d7cef53301625309
2020-08-22 23:52:36 -07:00
Zino Benaissa
40c77f926c Add prim::TypeCheck operation (#43026)
Summary:
TypeCheck is a new operation that checks the shapes of tensors against expected shapes. TypeCheck is a variadic operation. An example:

 %t0 : Tensor = ...
 %t1 : Tensor = ...
 %2 : FLOAT(20, 20), %3 : FLOAT(30, 30), %1 : bool =
     prim::TypeCheck(%t0, %t1)
 prim::If(%1)
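
The guard semantics can be modeled in a few lines of standalone Python (a toy analog; the real op is implemented inside the JIT interpreter, and the dict-based "tensors" here are a stand-in):

```python
def type_check(tensors, expected_shapes):
    """Toy model of prim::TypeCheck: forward all inputs and append one
    bool saying whether every tensor matched its expected shape."""
    ok = len(tensors) == len(expected_shapes) and all(
        t["shape"] == s for t, s in zip(tensors, expected_shapes)
    )
    return (*tensors, ok)

t0 = {"shape": (20, 20)}
t1 = {"shape": (30, 30)}
*outs, ok = type_check([t0, t1], [(20, 20), (30, 30)])
print(ok)  # True -> prim::If takes the specialized branch
```
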

Fixes #{issue number}

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43026

Reviewed By: ZolotukhinM

Differential Revision: D23115830

Pulled By: bzinodev

fbshipit-source-id: fbf142126002173d2d865cf4b932dea3864466b4
2020-08-21 20:03:24 -07:00
Ann Shan
dd194c1612 add _save_parameters to serialize map (#43163)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43163

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23175287

Pulled By: ann-ss

fbshipit-source-id: ddfd734513c07e8bdbec108f26d1ca1770d098a6
2020-08-18 14:58:04 -07:00
Ann Shan
2e6e295ecc refactor _save_parameters to _save_data (#43162)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/43162

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D23175286

Pulled By: ann-ss

fbshipit-source-id: 6f930b98c367242fd4efbf51cb1d09995f7c4b40
2020-08-18 14:57:03 -07:00
Christian Sarofeen
b3bda94393 [NVFuser] Enable E2E BCast-PWise-Reduction fusions (#43129)
Summary:
Had a bunch of merged commits that shouldn't have been there, reverted them to prevent conflicts. Lots of new features, highlights listed below.

**Overall:**

- Enables pointwise fusion; single (but N-D) broadcast -- pointwise fusion; and single (but N-D) broadcast -- pointwise -- single (but N-D) reduction fusion.

**Integration:**

- Separate "magic scheduler" logic that takes a fusion and generates code generator schedule
- Reduction fusion scheduling with heuristics closely matching eagermode (unrolling supported, but no vectorize support)
- 2-Stage caching mechanism, one on contiguity, device, type, and operations, the other one is input size->reduction heuristic

**Code Generation:**

- More generic support in code generation for computeAt
- Full rework of loop nest generation and Indexing to more generically handle broadcast operations
- Code generator has automatic kernel launch configuration (including automatic allocation of grid reduction buffers)
- Symbolic (runtime) tiling on grid/block dimensions is supported
- Simplified index generation based on user-defined input contiguity
- Automatic broadcast support (similar to numpy/pytorch semantics)
- Support for compile time constant shared memory buffers
- Parallelized broadcast support (i.e. block reduction -> block broadcast support)
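
A sketch of the kind of broadcast -> pointwise -> reduction chain these fusions target, in eager PyTorch (a hypothetical example, not code from this PR):

```python
import torch

def bcast_pwise_reduce(x, bias):
    # (N, C) + (C,) broadcasts bias across rows (broadcast), relu is
    # elementwise (pointwise), and the sum collapses the last dim (reduction).
    return torch.relu(x + bias).sum(dim=-1)

x = torch.randn(4, 8)
bias = torch.randn(8)
out = bcast_pwise_reduce(x, bias)
print(out.shape)  # torch.Size([4])
```
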

Pull Request resolved: https://github.com/pytorch/pytorch/pull/43129

Reviewed By: mrshenli

Differential Revision: D23162207

Pulled By: soumith

fbshipit-source-id: 16deee4074c64de877eed7c271d6a359927111b2
2020-08-18 09:10:08 -07:00