Commit graph

526 commits

Author SHA1 Message Date
Nikitha Malgi
416ba5c48f Merge CUDA Streams and Events (#53902)
Summary:
-----------
- Updates the `current_stream` and `default_stream` APIs to take an `optional[device]` argument
- Adds parsing logic that replaces `torch.cuda.Stream` and `torch.cuda.Event` with `torch.classes.cuda.Stream` and `torch.classes.cuda.Event` for JIT
- Merges the StreamContext manager so that a single implementation serves both eager mode and JIT.
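
A toy model of what a stream context manager does: make a stream "current" on a device for the duration of a block, then restore the previous one. It assumes, for illustration only, that streams are plain strings and that `device=None` means device 0 (in `torch.cuda` it means the current device). None of these names are the real `torch.cuda` API.

```python
import contextlib
from typing import Dict, Iterator, Optional

# Hypothetical per-device table of "current" streams. Real streams are
# objects, not strings; strings keep this model self-contained.
_current: Dict[int, str] = {}


def current_stream(device: Optional[int] = None) -> str:
    """Current stream for `device`; None means device 0 in this model."""
    device = 0 if device is None else device
    return _current.get(device, f"default_stream[{device}]")


@contextlib.contextmanager
def stream_context(stream: Optional[str], device: int = 0) -> Iterator[None]:
    """Make `stream` current on `device` inside the block, then restore it."""
    if stream is None:  # a None stream turns the context into a no-op
        yield
        return
    previous = current_stream(device)
    _current[device] = stream
    try:
        yield
    finally:
        _current[device] = previous
```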

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53902

Test Plan:
------
Run JIT tests:
python test/test_jit.py -v TestCUDA

Run eager tests:
python test/test_cuda.py -v TestCuda

Reviewed By: SplitInfinity

Differential Revision: D27285996

Pulled By: nikithamalgifb

fbshipit-source-id: 45d9fee9a582b5f4c82330f5f99eb88584804270
2021-03-26 14:19:39 -07:00
Pritam Damania
267fc27d39 Ensure torch.futures.wait_all exits early on error. (#53953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53953

Previously, torch.futures.wait_all waited for all specified futures to
complete before returning. As a result, even if an error occurred, it could
still wait for a long time (e.g. on long-running RPCs) before returning the
error to the user.

This PR ensures that `wait_all` returns an error as soon as any future runs
into an error, without waiting for all futures to complete.

I removed the logic in `_invoke_rpc_python_udf` that raised an error in the
unwrap function, because the error should be set on the Future rather than
raised to the user only when `wait()` is called. For example, with `wait_all`
the user never calls `wait()` on the future that errored out, only on a future
further down the chain, so we should propagate these errors via `setError`
instead.
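
The fail-fast behaviour can be sketched with Python's standard `concurrent.futures` instead of `torch.futures`. This is an analogy, not the PR's implementation: `wait(..., return_when=FIRST_EXCEPTION)` returns as soon as any future raises, so a failing future is reported without blocking on a slow one. `wait_all_fail_fast` is a hypothetical name.

```python
import threading
from concurrent.futures import FIRST_EXCEPTION, Future, ThreadPoolExecutor, wait
from typing import List, Sequence


def wait_all_fail_fast(futures: Sequence[Future]) -> List[object]:
    """Return all results in order, or raise as soon as any future fails.

    With FIRST_EXCEPTION, wait() returns when some future raises or, if none
    does, when all futures have completed.
    """
    done, _not_done = wait(futures, return_when=FIRST_EXCEPTION)
    for fut in done:
        exc = fut.exception()
        if exc is not None:
            raise exc  # do not keep waiting on the still-running futures
    return [fut.result() for fut in futures]


def failing_rpc() -> None:
    raise ValueError("remote call failed")


# Usage: one task blocks until released, the other fails immediately.
release = threading.Event()
pool = ThreadPoolExecutor(max_workers=2)
slow = pool.submit(release.wait, 5)
bad = pool.submit(failing_rpc)
try:
    wait_all_fail_fast([slow, bad])
except ValueError as err:
    print("failed fast:", err, "| slow task still running:", not slow.done())
finally:
    release.set()
    pool.shutdown(wait=True)
```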
ghstack-source-id: 124721216

Test Plan:
1) Unit test added.
2) waitforbuildbot

Reviewed By: mrshenli

Differential Revision: D27032362

fbshipit-source-id: c719e2277c27ff3d45f1511d5dc6f1f71a03e3a8
2021-03-25 07:39:14 -07:00
anjali411
f9ca0d87a7 Teach Python TS frontend to parse complex literals (#52881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52881

**This PR adds:**
1. logic to parse complex constants (complex literals of the form `bj`)
2. logic to parse complex lists
3. support for complex constructors: `complex(tensor/int/float/bool, tensor/int/float/bool)`
4. Limited operator support
     - `add`, `sub`, `mul`, `torch.tensor`, `torch.as_tensor`

**Follow-up work:**
1. Add complex support for unary and other registered ops.
2. Support the complex constructor with a string as input (this is supported in Python eager mode).
3. Test emitXYZ for every XYZ in `ir_emitter.cpp` (currently only emitConst and emitValueToTensor are tested), e.g. test loops.
4. ONNX doesn't support complex tensors, so we should error out with a clear, descriptive error message.
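
The constructs listed above (a `complex(float, float)` constructor, a literal of the form `bj`, and `add`/`sub`/`mul`) written as ordinary Python. A function like this is the kind the TorchScript frontend is meant to parse after this PR; we have not verified that this exact function scripts, and here it simply runs eagerly.

```python
def complex_demo(real: float, imag: float) -> complex:
    z = complex(real, imag)  # constructor form: complex(float, float)
    shift = 2j               # complex literal of the form `bj`
    # add, sub and mul are among the operators listed above
    return (z + shift) * (z - shift)


print(complex_demo(1.0, 1.0))  # (4+2j)
```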

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D27245059

Pulled By: anjali411

fbshipit-source-id: af043b5159ae99a9cc8691b5a8401503fa8d6f05
2021-03-24 08:12:17 -07:00
Martin Yuan
524cb0a514 [PyTorch Mobile] Dedup method names in bytecode serialization (#53677)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53677

When serializing bytecode, we serialize it method by method. If a class has multiple instances, the methods of that class may be serialized multiple times.

To reduce the duplication, we cache the qualified names of the methods so that each method is serialized only once.
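
A minimal sketch of the deduplication idea, assuming a simplified model where each instance contributes a list of methods and the qualified name identifies a method. `Method` and `serialize_methods` are hypothetical names, not the serializer's API.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Method:
    qualified_name: str  # e.g. "__torch__.Block.forward"
    bytecode: str        # stand-in for the serialized instructions


def serialize_methods(instances: Iterable[List[Method]]) -> Dict[str, str]:
    """Serialize the methods of every instance, emitting each qualified name once."""
    serialized: Dict[str, str] = {}
    for methods in instances:
        for method in methods:
            if method.qualified_name in serialized:
                continue  # another instance of the same class already emitted it
            serialized[method.qualified_name] = method.bytecode
    return serialized
```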

Test Plan: existing unittests and CI

Reviewed By: dhruvbird, raziel

Differential Revision: D26933945

Pulled By: iseeyuan

fbshipit-source-id: 8a9833949fa18f7103a5a0be19e2028040dc7717
2021-03-16 15:24:47 -07:00
Raziel Alvarez Guevara
c5cd993add Adds a bool is_available() method to the backend contract (#53068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/53068

Adds a ```bool is_available()``` method to the backend contract: it returns ```true``` if ```compile()``` and ```execute()``` can be called; ```false``` otherwise.

It is used to implement the following changes in the ```LoweredModule```:
* ```compile()``` in ```__setstate__``` will run if ```is_available()```, else ```__setstate__``` throws an exception (“Backend not available.”).
* ```compile()``` at ```LoweredModule``` creation will run if ```is_available()```, else a warning is emitted.
* ```execute()``` will only be executed if ```is_available()``` returns true; else throws an exception (“Backend not available.”).
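
The three rules above can be modelled in plain Python. This is a sketch under stated assumptions: `ToyBackend`, `LoweredModuleSketch` and `BackendNotAvailable` are hypothetical stand-ins, and Python's `warnings` module plays the role of the creation-time warning.

```python
import warnings


class BackendNotAvailable(RuntimeError):
    pass


class ToyBackend:
    """Hypothetical backend following the compile/execute/is_available contract."""

    def __init__(self, available: bool) -> None:
        self.available = available

    def is_available(self) -> bool:
        return self.available

    def compile(self, processed: str) -> dict:
        return {"forward": processed}

    def execute(self, handles: dict, x: int) -> int:
        return x + 1  # stand-in for running handles["forward"]


class LoweredModuleSketch:
    def __init__(self, backend: ToyBackend, processed: str) -> None:
        self.backend, self.processed, self.handles = backend, processed, None
        if backend.is_available():
            self.handles = backend.compile(processed)
        else:
            # Compilation and execution may still be possible on-target.
            warnings.warn("Backend not available; compile() skipped at creation.")

    def __setstate__(self, state: dict) -> None:
        # Loading: compile only if the backend is available here.
        self.backend, self.processed = state["backend"], state["processed"]
        if not self.backend.is_available():
            raise BackendNotAvailable("Backend not available.")
        self.handles = self.backend.compile(self.processed)

    def execute(self, x: int) -> int:
        if not self.backend.is_available():
            raise BackendNotAvailable("Backend not available.")
        return self.backend.execute(self.handles, x)
```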

The goal of these changes is to ensure well-defined behaviour for the different combinations of backend availability on-host and on-target.

More specifically, backends may have different capabilities to compile and/or execute the Module, depending on whether this happens on-host (i.e. where the program is being written) or on-target (where the program is being executed).

First of all, we know that "preprocess" always takes place, and only on-host at creation time. So if any compilation is needed or possible on-host, we can assume that all of it could be pushed there.

Overall, we want to ensure the following:

**On host**

| compile | execute | Outcome |
| -- | -- | -- |
| No | No | On module creation, LoweredModule is generated with a warning (since compilation and execution can still take place on-target). On module load, an exception is thrown (since execution is not possible). |
| No | Yes | This configuration should not be possible. It assumes the full compiler is not available: even if some work was done in preprocess, the program cannot be finalized for execution. |
| Yes | No | In this case, the expectation would be for is_available() to return false, and compilation logic to move into preprocess. |
| Yes | Yes | All good. This is the only case in which is_available() should return true. |

**On target**

| compile | execute | Outcome |
| -- | -- | -- |
| No | No | Loading the LoweredModule throws an exception, since execution is not possible. |
| No | Yes | This is effectively another instance of Yes/Yes: compilation per se may not be possible on device, so compile() can be called without issue but is a no-op, and is_available() should therefore return true. Consequently, loading the LoweredModule succeeds if the preprocessed module is ready for execution, and fails with an exception otherwise. |
| Yes | No | This configuration should not be possible. It is listed here for completeness. |
| Yes | Yes | All good. This case and the No/Yes case (which is just another instance of Yes/Yes, because compilation is assumed to have happened on-host) are the cases in which is_available() should return true. |
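
The two tables can be encoded as a single decision function. It is a sketch for checking one's reading of the tables, not code from the PR. `expected_is_available` is a hypothetical name, and the combinations the tables call impossible raise `ValueError`.

```python
def expected_is_available(on_host: bool, can_compile: bool, can_execute: bool) -> bool:
    """What is_available() should return, according to the tables above."""
    if on_host:
        if can_execute and not can_compile:
            raise ValueError("compile=No, execute=Yes should not be possible on host")
        # Yes/No: compilation logic should move into preprocess, so report False.
        return can_compile and can_execute
    if can_compile and not can_execute:
        raise ValueError("compile=Yes, execute=No should not be possible on target")
    # On target, compilation is assumed to have happened on-host,
    # so availability depends only on the ability to execute.
    return can_execute
```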

**Refactoring existing code**
This change also updates the code of other backends (Glow) to implement the is_available() method with the same behaviour as before this change (i.e. always available).

This should not cause backward incompatibilities with already saved models since we're adding a new method to the PyTorchBackendInterface.
Models saved with the old interface that didn't have is_available() will still find the other 2 methods in the bound object (i.e. compile and execute), and the saved LoweredModule logic will be the old one.

**Future**
We plan to use is_available() to implement support for fallback to the PyTorch interpreter.
ghstack-source-id: 123498571

Test Plan: Added C++ (test_backend.cpp) and Python (test_backends.py) tests to validate the exceptions.

Reviewed By: jackm321, spaugh, iseeyuan

Differential Revision: D26615833

fbshipit-source-id: 562e8b11db25784348b5f86bbc4179aedf15e0d3
2021-03-10 00:24:16 -08:00
James Reed
1fe6a6507e [WIP][FX] Fix tracing support for torchbind (#52884)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52884

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D26675801

Pulled By: jamesr66a

fbshipit-source-id: 8e5100bcea17589a53163abf6ab991658e11fa3a
2021-03-05 23:40:16 -08:00
Tugsbayasgalan Manlaibaatar
4008df3507 Add property binding in torchbind (#50670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50670

This PR adds property support to Torchbind. There are two cases in which it needs to work:

**Torchscript**
Inside Torchscript, we don't go through pybind so there is no issue with accessing properties through ClassType.

**Eager Mode**
In Eager Mode, Torchbind creates a ScriptObject, to which we cannot dynamically add (and therefore access) properties after it has been initialized (https://stackoverflow.com/questions/1325673/how-to-add-property-to-a-class-dynamically). Therefore we created a Python wrapper (ScriptObjectWrapper) around ScriptObject, on which we can use the property method to set properties. By doing so, we can look up the wrapped object's properties through the `__getattr__` method of ScriptObjectWrapper. This logic is inspired by https://github.com/pytorch/pytorch/pull/44324
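
A simplified model of the wrapper idea. A class with `__slots__` stands in for the C++-backed ScriptObject (it rejects new attributes). The wrapper's `__getattr__`, which Python calls only when normal lookup fails, maps registered property names to getter methods on the wrapped object. All names here are hypothetical.

```python
from typing import Dict


class NativeScriptObject:
    """Stand-in for a C++-backed object: __slots__ forbids new attributes."""

    __slots__ = ("_count",)

    def __init__(self, count: int) -> None:
        self._count = count

    def get_count(self) -> int:
        return self._count


class ScriptObjectWrapperSketch:
    """Hypothetical wrapper exposing registered properties via __getattr__."""

    def __init__(self, wrapped: object, properties: Dict[str, str]) -> None:
        self._wrapped = wrapped
        self._properties = dict(properties)  # property name -> getter method name

    def __getattr__(self, name: str):
        # Only invoked when normal attribute lookup fails.
        props = self.__dict__.get("_properties", {})
        wrapped = self.__dict__["_wrapped"]
        if name in props:
            return getattr(wrapped, props[name])()
        return getattr(wrapped, name)  # forward everything else
```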

Test Plan:
test cases in test_torchbind.py

Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26632781

fbshipit-source-id: dd690887cfda0c48ff0d104aa240ce0ab09055bc
2021-03-03 14:25:52 -08:00
Nikitha Malgi
ab7f6f3f5b Add default arguments to cuda stream and events (#53025)
Summary:
* **https://github.com/pytorch/pytorch/issues/53025 Add default args for CUDA stream and events**

Tests:
=====
python test/test_jit.py -v TestCUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53025

Reviewed By: H-Huang

Differential Revision: D26734499

Pulled By: nikithamalgifb

fbshipit-source-id: 5311623a501e2e6fb3fc70e39522e3970e401feb
2021-03-02 14:37:24 -08:00
Elias Ellison
6149a26adb Extend subgraph utils to cover merging a node following a subgraph (#52513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52513

Subgraph Utils previously only supported merging a node into a subgraph when the node came before the subgraph; this PR extends the logic to the case where the subgraph comes first.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696697

Pulled By: eellison

fbshipit-source-id: b0595b7d400161b0972321c55718b67103c7bbcd
2021-03-01 21:22:43 -08:00
Elias Ellison
dbbe21dfd7 Remove unused subgraph vmap api (#52512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52512

This API is not used at all and is tricky to maintain. When we last used it, we ran into lifetime issues from using `Value *` as the key. In hindsight, we should have used `value->unique()`, but regardless, it is not being used and should be removed.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696695

Pulled By: eellison

fbshipit-source-id: 97ed92e88ecab0085fabbac46573611666bf2420
2021-03-01 21:22:39 -08:00
Elias Ellison
9a990dafd9 Add a filter to remove mutation (#51923)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51923

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D26696700

Pulled By: eellison

fbshipit-source-id: 9665e9b786f55b6e5b98420eae19de262d46bb96
2021-03-01 21:22:33 -08:00
Martin Yuan
b5ae8e69a7 [Lite Interpreter] Support features from to_backend (#52870)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52870

Add the missing parts to support to_backend modules by lite interpreter.
1. Add ISINSTANCE instruction support, which is used in to_backend for output type check.
2. Bypass lite interpreter's type parser by checking the qualified name. If it starts with "torch.jit", use the same type resolver as nn module (starting with "__torch__").
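
The prefix check in step 2 amounts to a simple routing rule. In this sketch the returned labels are hypothetical and stand for "use the module class resolver" and "use the lite interpreter's own type parser".

```python
def choose_type_resolver(qualified_name: str) -> str:
    """Route a type name to a resolver (labels are illustrative only)."""
    if qualified_name.startswith(("__torch__", "torch.jit")):
        # nn module classes and to_backend types share the class resolver.
        return "module_class_resolver"
    return "lite_type_parser"
```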

Tests
Mobile module is serialized and loaded in ```BackendTest.TestCompiler```. The results are compared to those from original torchscript module.

Test Plan: Imported from OSS

Reviewed By: raziel

Differential Revision: D26715351

Pulled By: iseeyuan

fbshipit-source-id: ad9d74ee81c6aa692ab9e5dd7a9003bae5d4f01f
2021-03-01 17:56:01 -08:00
Martin Yuan
b2520ab3dc Add a demo backend with compiler (#52603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52603

This PR introduces a backend with minimal compilation capability to the to_<backend> flow. The goals are:

- Demonstrate the end-to-end flow of adding a backend -> compilation -> runtime
- Demonstrate how backend compilation errors are surfaced to the user, with the original model's source code information. (C++ only in this PR. Python APIs will be demonstrated in a follow-up PR.)

Changes:

- Compilation

1. A backend with minimal compilation features, "backend_with_compiler_demo", is added.
2. The compilation happens AOT in the ```pre_process``` function registered to this backend.
3. Compiled results are stored in a string blob for each method. They are serialized with the lowered module via the ```__getstate__``` function.
4. Error message with model source code is thrown, for features not handled by the backend compiler.

- Runtime

1. The compiled blob is loaded in the ```__setstate__``` method.
2. The ```compile``` function of the backend passes through the AOT-compiled blob. (TODO: parsing the blob into a format the backend can understand could happen here.)
3. The ```execute``` function of the backend executes the specified method (handle).
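
A toy version of the preprocess → compile → execute split described above. The "model" is a list of op names per method, preprocess rejects ops outside a small supported set with an error modelled on the message shown in the test plan, compile passes the blob through, and execute interprets it. The op set and all function names are hypothetical.

```python
from typing import Dict, List

SUPPORTED_OPS = {"aten::add", "aten::sub"}  # hypothetical capability set


def preprocess(methods: Dict[str, List[str]]) -> Dict[str, str]:
    """AOT step: compile each method's op list into a string blob."""
    blobs = {}
    for name, ops in methods.items():
        for op in ops:
            if op not in SUPPORTED_OPS:
                raise RuntimeError(f"The node of {op} is not supported in this compiler.")
        blobs[name] = ",".join(ops)
    return blobs


def backend_compile(blobs: Dict[str, str]) -> Dict[str, str]:
    # Pass-through, as in the demo; parsing the blob could happen here instead.
    return dict(blobs)


def backend_execute(handles: Dict[str, str], method: str, x: int, h: int) -> int:
    """Interpret the blob for `method`, applying each op to the running result and h."""
    ops = handles[method].split(",") if handles[method] else []
    result = x
    for op in ops:
        result = result + h if op == "aten::add" else result - h
    return result
```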

Test Plan:
- ```BackendTest.TestCompiler```: the C++ end-to-end demonstration on a supported model. After compilation and running, the lowered model produces the same result as the original torchscript model.
- ```BackendTest.TestCompilerNotSupport```: Demonstrates the error message from AOT compilation for a feature in the input module that is not supported. The error message looks like:

```
"The node of aten::mul is not supported in this compiler. Source code:   File "<string>", line 3

    def forward(self, x, h):
        return x * h
               ~~~~~ <--- HERE
```

Reviewed By: raziel

Differential Revision: D26593968

Pulled By: iseeyuan

fbshipit-source-id: 8f264f60a0470e9f07e36fdeccbf17da6c1d7cd7
2021-02-26 11:53:34 -08:00
Richard Barnes
29c4290a8d Use c10::irange for great good (#52153)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/52153

Test Plan: Sandcastle

Reviewed By: ngimel

Differential Revision: D26407087

fbshipit-source-id: ea8ce1c17299cb9d89621e4a39f31edc2faa9fd6
2021-02-24 18:43:50 -08:00
Dhruv Matani
755c60bffc [PyTorch Mobile] Allow loading of all extra files using the extra_file argument (#52635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52635

Currently, the method `_load_for_mobile()` accepts an extra files map named `extra_files` that serves as an in-out parameter, i.e. the caller fills in the keys of this map with the names of the files under the `extra/` folder that they wish to extract, and the method fills in the values with the contents of those files.

In a specific case we have encountered, it is desirable to extract all the extra files so that they can be forwarded in an opaque manner into a `save_for_mobile()` call with the same set of extra files as during load.

This change adds a method `_get_all_archive_file_names()` which returns the names of all files in the `.ptl` archive. The caller can then extract the ones within the `extra/` directory and pass them in to the `extra_files` map argument.
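
A sketch of the caller-side filtering. Two assumptions are involved: the `.ptl` file is a zip archive (as TorchScript archives are), and extra files sit under an `extra/` directory inside the archive's root folder. `zipfile.ZipFile.namelist()` stands in for `_get_all_archive_file_names()`, and `extra_file_names` is a hypothetical helper.

```python
import io
import zipfile
from typing import Dict, List


def extra_file_names(archive_names: List[str]) -> Dict[str, str]:
    """Build an extra_files map (name -> "") for every entry under an extra/ folder."""
    extra: Dict[str, str] = {}
    for path in archive_names:
        parts = path.split("/")
        if "extra" in parts[:-1]:
            idx = parts.index("extra")
            extra["/".join(parts[idx + 1:])] = ""  # value filled in by the loader
    return extra


# Usage with an in-memory archive standing in for a .ptl file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("model/data.pkl", b"")
    zf.writestr("model/extra/metadata.json", b"{}")
print(extra_file_names(zipfile.ZipFile(buf).namelist()))
```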

ghstack-source-id: 122356928

Test Plan: Added additional test + `buck test //xplat/caffe2:test_lite_interpreter`

Reviewed By: iseeyuan

Differential Revision: D26590027

fbshipit-source-id: 4dc30997929e132f319c32cb9435d8a40fe0db5e
2021-02-23 21:57:13 -08:00
Nikita Shulga
cabb1e7a94 Fix wrong TORCH_CHECK usages (#52670)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52670

TORCH_CHECK followed by a string literal is a no-op, and from the text of the messages it's clear that the authors intended those instances to be `TORCH_CHECK(false, "msg")`.

Discovered while trying to figure out whether tensor_offset can be negative in Resize.h.

s/TORCH_CHECK\("/TORCH_CHECK(false, "/
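
A Python analogue of why the original calls were no-ops. In C++ a string literal decays to a non-null pointer, which converts to `true`. Similarly, a non-empty Python string is truthy, so a check whose first argument is the message never fires. `check` is a hypothetical helper, not the real macro.

```python
def check(condition: object, *message: object) -> None:
    """Raise RuntimeError with the joined message if `condition` is falsy."""
    if not condition:
        raise RuntimeError("".join(str(part) for part in message))


check("tensor_offset must be non-negative")  # truthy "condition": never fires
try:
    check(False, "tensor_offset must be non-negative")  # the intended form
except RuntimeError as err:
    print("raised:", err)
```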

Test Plan: Imported from OSS

Reviewed By: walterddr, janeyx99, mruberry

Differential Revision: D26607546

Pulled By: malfet

fbshipit-source-id: 661812da84adb1d1af0284da60c93ec4bf5ef08e
2021-02-23 14:47:51 -08:00
Richard Barnes
783b5c0c9f op_whitelist -> op_allowlist (#52150)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52150

Renames "whitelist" to "allowlist" to conform to company use standards, prevent critical errors raised by linters which detect the old usage, and to move toward more self-descriptive terminology.

Test Plan: Sandcastle

Reviewed By: suo

Differential Revision: D26405520

fbshipit-source-id: 9c3a41591d4e29c0197de9a8f5858c9c29271e26
2021-02-22 12:23:42 -08:00
Elias Ellison
e1d927e552 [JIT] Update freezing api (#52337)
Summary:
Update the freezing API for 1.8, and add a corresponding C++ API. The `optimize` flag hasn't been publicly released yet, so we can change it without breaking BC. I will submit a PR to the release branch as well; there are still a few days left to do that.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52337

Reviewed By: ejguan

Differential Revision: D26491833

Pulled By: eellison

fbshipit-source-id: 6dcd74eb8f76db64ac53183d03dabdd0f101f4b5
2021-02-18 00:17:27 -08:00
Raziel Alvarez Guevara
70bed6a55a Removes deprecated preprocess method from the backend interface (#52258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/52258

Removes deprecated preprocess method from the backend interface.

Preprocessing logic should now be registered along with the backend interface (i.e. PyTorchBackendInterface) via the BackendPreprocessFunction.

Also refactored internal dependencies.
ghstack-source-id: 121704837

Test Plan:
Validates all related tests pass:

buck test mode/dev //caffe2/test/cpp/jit:jit -- --exact 'caffe2/test/cpp/jit:jit - BackendTest.ToBackend'

python test/test_jit.py TestBackends

===== Glow

buck test mode/dev //glow/fb/torch_glow/tests:TorchGlowBackendTests

buck test mode/dev //glow/fb/torch_glow/tests:torch_glow_backend_tests

Reviewed By: jackm321

Differential Revision: D26443479

fbshipit-source-id: afdc51ae619ced293d10c7a6a12f3530e4c4e53c
2021-02-17 17:53:36 -08:00
Meghan Lele
cbede834d4 [JIT] Add support for default argument values to Torchbind (#51253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51253

**Summary**
This commit adds support to Torchbind for specifying default values for
arguments of custom class methods.

**Test Plan**
This commit adds a unit test to `test_torchbind.py` that exercises this
feature.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D26131529

Pulled By: SplitInfinity

fbshipit-source-id: 68bc86b045dd2f03ba41e1a116081a6eae6ba9ff
2021-02-17 11:27:03 -08:00
Nikita Shulga
f235c65a2b [TorchScript] C++ interface of to_<backend> (Re-land) (#52340)
Summary:
This is a re-land of https://github.com/pytorch/pytorch/pull/51797 with a fix for a spurious libcuda dependency.

The fix limits the scope of the `no-as-needed` linker flag to just `jitbackend_test`.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/52340

Reviewed By: agolynski, iseeyuan

Differential Revision: D26476168

Pulled By: malfet

fbshipit-source-id: f909428af82182b3bffd020ca18cca7a9b5846b6
2021-02-17 07:17:50 -08:00
Nikita Shulga
cd46ee6175 Revert D26280518: [TorchScript] C++ interface of to_<backend>
Test Plan: revert-hammer

Differential Revision:
D26280518 (a184ef8df5)

Original commit changeset: fd466e4b4488

fbshipit-source-id: e4def49703ab525c063b8cc5d11296b9cc614fbb
2021-02-15 08:05:16 -08:00
Meghan Lele
73de98204d [JIT] Add static method support for TorchBind (#51177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51177

**Summary**
This commit adds support for static methods to TorchBind. Just like
pybind, the API for declaring a static method is `def_static(...)`. A
static method must be called on the class directly, and can be called
both in Python as well as TorchScript.

Support for static methods is implemented in a manner similar to that of
instance methods. Registered static functions are wrapped in a layer of
unboxing logic, their schemas are inferred using templates and
metaprogramming, and they are added to the `ClassType` object
corresponding to the TorchBind class on which they are registered.
ScriptClass has been extended to support a `__getattr__` function so
that static methods of TorchBind classes can be invoked in Python. The
implementation of `__getattr__` returns `ScriptClassFunctionPtr`, a
version of `StrongFunctionPtr` without a compilation unit (since the
functions of a TorchBind class live inside the TorchBind registry).
Within TorchScript, TorchBind static functions are desugared in
`PythonClassValue::attr` by looking them up on the class type of the
`PythonClassValue` instance.

**Test Plan**
This commit adds a unit test that tests a simple static method on a
TorchBind class.

Test Plan: Imported from OSS

Reviewed By: pbelevich

Differential Revision: D26356942

Pulled By: SplitInfinity

fbshipit-source-id: 1b6a9bc2e5f3e22071ad78e331a0201fbbf7ab30
2021-02-13 19:41:27 -08:00
Martin Yuan
a184ef8df5 [TorchScript] C++ interface of to_<backend> (#51797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51797

The C++ API ```codegen_backend_module``` is added to ```to_<backend>```. Python-related logic is decoupled from this function, so it can be used from both C++ and Python.

* Tests
Python: The existing ```test_backends.py```, which calls the C++ API under the hood.
C++: The end-to-end test ```jit.BackendTest.ToBackend``` is added in ```test_backend.cpp```. The original class definitions in this file are moved to ```test_backend_lib.cpp```.

ghstack-source-id: 121687464

(Note: this ignores all push blocking failures!)

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26280518

fbshipit-source-id: fd466e4b448847ce64010a3297fff0b5760c5280
2021-02-13 15:15:45 -08:00
Raziel Alvarez Guevara
9a964ce89b Enables backend preprocessing to take place outside of the backend interface (#51757)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51757

Enables backend preprocessing to take place outside of the backend interface.

What's new:
* A new definition for backend preprocessing (i.e. BackendPreprocessFunction).
* Registration of the backend's PyTorchBackendInterface implementation is augmented to take the BackendPreprocessFunction.
* A new registry is created to handle the BackendPreprocessFunction functions, using the backend's name as key.
* When a BackendPreprocessFunction is used, the PyTorchBackendInterface's "preprocess" method is not added to the LoweredModule. Instead, the BackendPreprocessFunction is called and its output used to set the LoweredModule's __processed_module.
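
A minimal sketch of the registration scheme. A second registry, keyed by backend name, holds optional preprocess functions. Lowering prefers a registered function and otherwise falls back to the interface's own `preprocess` method, which it attaches to the lowered module. The registry and function names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

_interfaces: Dict[str, object] = {}
_preprocess_fns: Dict[str, Callable[[str], str]] = {}


def register_backend(name: str, interface: object,
                     preprocess: Optional[Callable[[str], str]] = None) -> None:
    _interfaces[name] = interface
    if preprocess is not None:
        _preprocess_fns[name] = preprocess  # new registry, keyed by backend name


def lower(name: str, module: str) -> Tuple[str, List[str]]:
    """Return (processed module, extra methods attached to the LoweredModule)."""
    if name in _preprocess_fns:
        # New path: preprocessing runs outside the lowered module.
        return _preprocess_fns[name](module), []
    # Legacy path: "preprocess" is attached and invoked via the interface.
    return _interfaces[name].preprocess(module), ["preprocess"]
```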

Why?:
These changes are needed to avoid forcing backend preprocessing to be part of the LoweredModule, and in the future be able to eliminate "preprocess" from the PyTorchBackendInterface.
This is important for Mobile use cases where "preprocess" can take the bulk of the compilation process, and thus contain code dependencies that we do not want to bring (or cannot bring) to the Mobile binary.

What didn't change:
* Everything is backwards compatible:
** The existing "preprocess" method in PyTorchBackendInterface is still there.
** When backend registration is done without the BackendPreprocessFunction, as before, things work the same way: "preprocess" is added to LoweredModule, and invoked through the module's instance of the backend interface.

Longer term, the plan is to refactor existing users to move to the new backend registration.
ghstack-source-id: 121190883

Test Plan:
Updated existing tests (test_backend.py) to use the new registration mechanism.
Verified the tests ran and passed (in my OSS build).

Reviewed By: iseeyuan

Differential Revision: D26261042

fbshipit-source-id: 0dc378acd5f2ab60fcdc01f7373616d1db961e61
2021-02-06 01:07:17 -08:00
Martin Yuan
23c50a4a50 [PyTorch Mobile] Support torchbind custom classes in lite interpreter (#51432)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432

ghstack-source-id: 120976584

torchbind is a convenient way to expose custom classes to both Python and TorchScript. CREATE_OBJECT is used to create an object of a custom class.

CREATE_OBJECT was not supported by the lite interpreter. The main reason was that for custom classes defined directly in Python, the lite interpreter has no language parser, and that is still the case. However, torchbind classes are defined in C++, so no Python/TorchScript parser is needed for them.

This diff adds support for torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not a supported torchbind class, an error message is provided at the export stage, and a workaround is suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.
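
A sketch of the export-time check in step 2. It assumes torchbind classes appear in TorchScript under the `__torch__.torch.classes.` prefix. The function name and the error text, including the suggested workaround, are illustrative rather than the actual message.

```python
TORCHBIND_PREFIX = "__torch__.torch.classes."  # assumed naming of torchbind classes


def check_create_object(qualified_name: str) -> None:
    """Export-time check: only torchbind (C++-defined) classes are creatable."""
    if not qualified_name.startswith(TORCHBIND_PREFIX):
        raise RuntimeError(
            f"CREATE_OBJECT of {qualified_name} is not supported by the lite "
            "interpreter; consider defining the class in C++ via torchbind."
        )
```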

Test Plan: CI

Reviewed By: raziel

Differential Revision: D26168913

fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
2021-02-03 21:57:19 -08:00
Xu Zhao
4fdebdc0c9 Improve PyTorch profiler flop computation formulas (#51377)
Summary:
Improve the FLOPs computation formula of the aten::conv2d operator to support the stride, pad, dilation, and groups arguments.

This diff also fixes the following issues:
- Apply a factor of 2 to aten::mm, because each multiply-accumulate in the output counts as two FLOPs (one multiplication and one addition).
- Correct the names of the scalar operators to aten::mul and aten::add.
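
The standard FLOP counts these fixes point toward can be written out directly. This is a sketch of the usual formulas (output size from stride, padding and dilation; 2 FLOPs per multiply-accumulate; bias ignored), not necessarily the profiler's exact code. Weight shape follows the `(C_out, C_in / groups, kH, kW)` convention.

```python
from typing import Tuple


def conv2d_out_size(size: int, kernel: int, stride: int, pad: int, dilation: int) -> int:
    return (size + 2 * pad - dilation * (kernel - 1) - 1) // stride + 1


def conv2d_flops(input_shape: Tuple[int, int, int, int],
                 weight_shape: Tuple[int, int, int, int],
                 stride=(1, 1), padding=(0, 0), dilation=(1, 1), groups: int = 1) -> int:
    n, c_in, h, w = input_shape
    c_out, c_in_per_group, kh, kw = weight_shape
    if c_in_per_group * groups != c_in:
        raise ValueError("weight shape does not match input channels and groups")
    h_out = conv2d_out_size(h, kh, stride[0], padding[0], dilation[0])
    w_out = conv2d_out_size(w, kw, stride[1], padding[1], dilation[1])
    # Each output element needs c_in_per_group * kh * kw multiply-adds (2 FLOPs each).
    return 2 * n * c_out * h_out * w_out * c_in_per_group * kh * kw


def mm_flops(a_shape: Tuple[int, int], b_shape: Tuple[int, int]) -> int:
    m, k = a_shape
    k2, n = b_shape
    if k != k2:
        raise ValueError("inner dimensions must match")
    return 2 * m * k * n  # factor of 2: one multiply and one add per term
```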

Pull Request resolved: https://github.com/pytorch/pytorch/pull/51377

Test Plan:
```bash
python test/test_profiler.py
```

Reviewed By: jspark1105

Differential Revision: D26165223

Pulled By: xuzhao9

fbshipit-source-id: 2c5f0155c47af2e6a19332fd6ed73ace47fa072a
2021-02-02 11:49:04 -08:00
Scott Wolchok
7328710cbc [PyTorch][codemod] Replace immediately-dereferenced cast calls w/castRaw (#50229)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229

`fastmod -m 'cast(<((at|c10)::)?\w+Type>\(\)\s*)->' 'castRaw${1}->'`
Presuming it builds, this is a safe change: the result of `cast()` wasn't
being saved anywhere, so we can use a raw pointer instead of creating a new
`shared_ptr`.
ghstack-source-id: 120769170

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837494

fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada
2021-02-01 23:12:07 -08:00
Frank Seide
87ad77eb4e T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
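
The core of default-argument support is binding: when a caller passes fewer arguments than the method's schema declares, the missing trailing ones are filled from the schema's defaults. This sketch uses a hypothetical list-of-(name, default) schema and a sentinel for required arguments. It is not the `mobile::Module` implementation.

```python
from typing import Any, List, Sequence, Tuple

REQUIRED = object()  # sentinel: argument has no default


def bind_arguments(schema: Sequence[Tuple[str, Any]], args: Sequence[Any]) -> List[Any]:
    """Append schema defaults for any trailing arguments the caller omitted."""
    if len(args) > len(schema):
        raise TypeError(f"expected at most {len(schema)} arguments, got {len(args)}")
    bound = list(args)
    for name, default in schema[len(args):]:
        if default is REQUIRED:
            raise TypeError(f"missing required argument '{name}'")
        bound.append(default)
    return bound
```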

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: iseeyuan

Differential Revision: D25896212

fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5
2021-02-01 18:35:13 -08:00
anjali411
508bab43e7 Support complex number list in JIT (#51145)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/51145

Test Plan: Imported from OSS

Reviewed By: SplitInfinity

Differential Revision: D26154025

Pulled By: anjali411

fbshipit-source-id: 74645f9b6467757ddb9d75846e778222109848f0
2021-01-31 23:54:14 -08:00
Richard Barnes
89cafde8a4 Modernize for-loops (#50912)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/50912

Test Plan: Sandcastle tests

Reviewed By: ansley

Differential Revision: D26001948

fbshipit-source-id: 3bfe6a8283a2b1882ed472f836ae1b6e720e519f
2021-01-22 10:53:24 -08:00
Meghan Lele
4aea007351 [JIT] Fix archive file extension in examples and docs (#50649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50649

**Summary**
Tutorials, documentation and comments are not consistent with the file
extension they use for JIT archives. This commit replaces certain
instances of `*.pth` in `torch.jit.save` calls with `*.pt`.

**Test Plan**
Continuous integration.

**Fixes**
This commit fixes #49660.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D25961628

Pulled By: SplitInfinity

fbshipit-source-id: a40c97954adc7c255569fcec1f389aa78f026d47
2021-01-20 02:04:46 -08:00
Meghan Lele
8f5ad00e13 [JIT] Print out CU address in ClassType::repr_str() (#50194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50194

**Summary**
`ClassType::repr_str()` prints out only the name of a `ClassType`, which
is not always enough to disambiguate it. In some situations, two
`ClassTypes` are compared and do not match despite having identical
names because they are in separate compilation units. In such cases, the
error message can seem nonsensical (e.g. `expected type T but found type
T`). This commit modifies `ClassType::repr_str()` so that it prints out
the address of the type's compilation unit to make these messages less
puzzling (e.g. `expected type T (0x239023) but found type T (0x230223)`).
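
A small model of the disambiguation. Two types share the name `T` but belong to different compilation units, so they compare unequal, and including the unit's address in the printed form makes the difference visible. `CompilationUnit` and `ClassTypeSketch` are stand-ins, and Python's `id()` plays the role of the address.

```python
class CompilationUnit:
    """Stand-in for a JIT compilation unit (illustrative only)."""


class ClassTypeSketch:
    def __init__(self, name: str, cu: CompilationUnit) -> None:
        self.name = name
        self.cu = cu

    def __eq__(self, other: object) -> bool:
        # Same name is not enough: the compilation unit must match too.
        return (isinstance(other, ClassTypeSketch)
                and self.name == other.name and self.cu is other.cu)

    def repr_str(self) -> str:
        return f"{self.name} ({hex(id(self.cu))})"
```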

**Test Plan**
This commit adds a unit test, `ClassTypeTest.IdenticalTypesDifferentCus`
that reproduces this situation.

**Fixes**
This commit fixes #46212.

Test Plan: Imported from OSS

Reviewed By: tugsbayasgalan

Differential Revision: D25933082

Pulled By: SplitInfinity

fbshipit-source-id: ec71b6728be816edd6a9c2b2d5075ead98d8bc88
2021-01-19 23:04:30 -08:00
Scott Wolchok
4a0d17ba2d [PyTorch][codemod] Replace immediately-dereferenced expect calls w/expectRef (#50228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228

`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we can take a reference instead of
creating a new `shared_ptr`.
ghstack-source-id: 119782961

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D25837374

fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
2021-01-13 16:13:55 -08:00
Andres Suarez
8530c65e25 [codemod][fbcode/caffe2] Apply clang-format update fixes
Test Plan: Sandcastle and visual inspection.

Reviewed By: igorsugak

Differential Revision: D25849205

fbshipit-source-id: ef664c1ad4b3ee92d5c020a5511b4ef9837a09a0
2021-01-09 14:37:36 -08:00
Thomas Viehmann
ea087e2d92 JIT: guard DifferentiableGraph node (#49433)
Summary:
This adds guarding for DifferentiableGraph nodes in order to not depend on
Also bailing out on required gradients for the CUDA fuser.

Fixes https://github.com/pytorch/pytorch/issues/49299

I still need to look into a handful of failing tests, but maybe this can serve as a basis for discussion.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433

Reviewed By: ngimel

Differential Revision: D25681374

Pulled By: Krovatkin

fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
2021-01-08 20:01:27 -08:00
Nikitha Malgi
12b73fdbbf Adding JIT support for cuda streams and events (#48020)
Summary:
=======

This PR addresses the following:

 * Adds JIT support for CUDA Streams
 * Adds JIT support for CUDA Events
 * Adds JIT support for CUDA Stream context manager

Testing:
======

python test/test_jit.py -v TestCUDA

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020

Reviewed By: navahgar

Differential Revision: D25725749

Pulled By: nikithamalgifb

fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c
2020-12-29 20:24:57 -08:00
Dhruv Matani
4a870f6518 [PyTorch Mobile] Export Operator List from Mobile CompilationUnit instead of from TorchScript Model (#49385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385

Currently, the API to export operator lists accepts a `torch::jit::Module` object, and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optmized and exported for mobile.

What we need to do instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically) and expose that.

Also updated the logic in `converter`.

### Before this change:
1. Get the operator list from the TorchScript model
2. Convert to bytecode mobile model

### After this change:
1. Convert to bytecode mobile model
2. Use this converted mobile model to get the list of operators for each method on the model

ghstack-source-id: 118796752
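The "after" flow can be sketched in C++ as follows, assuming a hypothetical in-memory form where each converted mobile method already lists the operator names its bytecode references. `MobileMethod` and `exportOperatorList` are illustrative names, not the real `mobile::Module` API; the sketch only shows that the exported list is the de-duplicated union over every method of the converted model.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for one method of a converted mobile model:
// the operator names referenced by that method's bytecode.
struct MobileMethod {
  std::string name;
  std::vector<std::string> operators;  // e.g. "aten::add.Tensor"
};

// Collect the de-duplicated set of operators used by any method.
// std::set also keeps the result sorted, so repeated exports are stable.
std::set<std::string> exportOperatorList(const std::vector<MobileMethod>& methods) {
  std::set<std::string> ops;
  for (const auto& method : methods) {
    ops.insert(method.operators.begin(), method.operators.end());
  }
  return ops;
}
```

Because the output is sorted and de-duplicated, two exports of the same model yield identical lists, which keeps before/after comparisons simple.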

Test Plan:
Added a unit test in `test_lite_interpreter.cpp` to ensure that all operators referenced by the model show up in the exported operator list. Also made `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK`, since this is where the production code will be built from.

Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.

{P147863234}

Also verified that the operator list for the BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132}

Reviewed By: iseeyuan

Differential Revision: D24690094

fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
2020-12-18 11:17:57 -08:00
Xu Zhao
573f4aa352 FLOPS Roofline Analysis Feature for PyTorch Profiler. (#46506)
Summary:
FLOPs Roofline Analysis Feature for PyTorch Profiler.

Currently, PyTorch Profiler lacks the ability to measure the FLOPs of operators, such as mm and conv.
FLOPs help estimate the computational complexity of operators.
For now, we use input shapes to estimate the number of floating-point operations.
In the future, we may compute this information by tracking hardware counters.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/46506

Test Plan:
Run `python test/test_profiler_flops.py -k test_flops`. The test prints a profiler table with an "MFLOPS" column, like the following:
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
                        Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg    # of Calls                                   Input Shapes        MFLOPS
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
                aten::matmul         0.06%      57.653us        82.97%      79.310ms      79.310ms             1                 [[40, 33, 1, 243], [243, 243]]            --
                    aten::mm        82.84%      79.186ms        82.86%      79.204ms      79.204ms             1                      [[1320, 243], [243, 243]]       984.323
                aten::conv2d         0.04%      36.345us        16.06%      15.347ms      15.347ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [  44065010.318
           aten::convolution         0.02%      16.016us        16.02%      15.310ms      15.310ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
          aten::_convolution         0.07%      63.855us        16.00%      15.294ms      15.294ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
    aten::mkldnn_convolution        15.89%      15.188ms        15.93%      15.225ms      15.225ms             1  [[40, 16, 18, 260], [33, 16, 18, 18], [33], [            --
                  aten::relu         0.10%      98.223us         0.64%     612.157us     306.079us             2                             [[40, 33, 1, 243]]            --
             aten::threshold         0.49%     465.416us         0.54%     513.934us     256.967us             2                     [[40, 33, 1, 243], [], []]            --
                  aten::add_         0.29%     279.301us         0.29%     279.301us     279.301us             1                  [[40, 33, 1, 243], [243], []]            --
                 aten::empty         0.10%      99.113us         0.10%      99.113us      24.778us             4                       [[], [], [], [], [], []]            --
----------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ---------------------------------------------  ------------
Self CPU time total: 95.584ms

.
----------------------------------------------------------------------
Ran 1 test in 0.176s

For now, we only provide FLOPs calculation for aten::conv2d and aten::mm operators.
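Shape-based estimation for the two supported operators can be sketched as below. This is a minimal sketch under stated assumptions, not the profiler's code: it counts one multiply-add as two FLOPs, and `conv2dFlops` assumes stride 1, no padding, dilation 1 and groups 1. The profiler's own formula and units may differ, so these numbers are not meant to reproduce its MFLOPS column. All function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Shape = std::vector<std::int64_t>;

// aten::mm: [M, K] x [K, N] -> [M, N]; each output element needs K multiply-adds.
std::int64_t mmFlops(const Shape& a, const Shape& b) {
  assert(a.size() == 2 && b.size() == 2 && a[1] == b[0]);
  return 2 * a[0] * a[1] * b[1];
}

// aten::conv2d with input [N, C_in, H, W] and weight [C_out, C_in, kH, kW].
// Assumes stride 1, padding 0, dilation 1, groups 1 (a simplification).
std::int64_t conv2dFlops(const Shape& input, const Shape& weight) {
  assert(input.size() == 4 && weight.size() == 4 && input[1] == weight[1]);
  const std::int64_t outH = input[2] - weight[2] + 1;
  const std::int64_t outW = input[3] - weight[3] + 1;
  const std::int64_t outputElems = input[0] * weight[0] * outH * outW;
  // Each output element is a dot product over C_in * kH * kW inputs.
  const std::int64_t macsPerOutput = weight[1] * weight[2] * weight[3];
  return 2 * outputElems * macsPerOutput;
}

// FLOPs per microsecond equals millions of FLOPs per second (MFLOPS).
double mflops(std::int64_t flops, double timeUs) {
  return static_cast<double>(flops) / timeUs;
}
```

For example, under this convention `mmFlops({1320, 243}, {243, 243})` is 155,889,360.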

Reviewed By: ezyang

Differential Revision: D25214452

Pulled By: xuzhao9

fbshipit-source-id: 0ae841bd8dbdeb032346dc3d9d38e19875aa1da3
2020-12-17 21:19:25 -08:00
Martin Yuan
2b61e4d84c Revert D25152559: T66557700 Support default argument values of a method
Test Plan: revert-hammer

Differential Revision:
D25152559 (6bde0ca6d3)

Original commit changeset: bbf52f1fbdbf

fbshipit-source-id: 592fdb3078b1ac86cd394adc6c1bfd6b10d829e1
2020-12-17 14:05:49 -08:00
Frank Seide
6bde0ca6d3 T66557700 Support default argument values of a method (#48863)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863

Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
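The idea of filling omitted trailing arguments from declared defaults can be sketched as follows. `ArgSpec` and `fillDefaults` are hypothetical names, and plain `int` values stand in for the interpreter's real value type to keep the sketch self-contained; it is not the lite interpreter's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical schema entry: an argument name plus an optional default.
struct ArgSpec {
  std::string name;
  std::optional<int> defaultValue;  // std::nullopt means "required"
};

// Append defaults for any trailing arguments the caller omitted.
// Throws if there are too many arguments or an omitted one has no default.
std::vector<int> fillDefaults(std::vector<int> args,
                              const std::vector<ArgSpec>& schema) {
  if (args.size() > schema.size()) {
    throw std::invalid_argument("too many arguments");
  }
  for (std::size_t i = args.size(); i < schema.size(); ++i) {
    if (!schema[i].defaultValue.has_value()) {
      throw std::invalid_argument("missing required argument: " + schema[i].name);
    }
    args.push_back(*schema[i].defaultValue);
  }
  return args;
}
```

With a schema `(x, scale=2, bias=1)`, calling with `{5}` yields `{5, 2, 1}`, while calling with no arguments throws because `x` has no default.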

Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation

buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg

Reviewed By: raziel, iseeyuan

Differential Revision: D25152559

fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
2020-12-16 15:55:03 -08:00
Scott Wolchok
22c6dafd33 [PyTorch] Use plain old function pointer for RecordFunctionCallback (reapply) (#49408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408

Almost no non-test call site needs to capture variables, and this change saves 48 bytes per callback.
ghstack-source-id: 118665808
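The language rule this change relies on can be shown in a small sketch: a lambda with no captures converts implicitly to a plain function pointer, while a capturing lambda does not and would still need a wrapper such as `std::function`. `Callback`, `makeIncrement` and `makeAddStep` are hypothetical names, not `RecordFunctionCallback` members.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical callback signature, for illustration only.
using Callback = void (*)(int& counter);

// A captureless lambda converts implicitly to a plain function pointer,
// so it can be stored without any type-erasure wrapper.
inline Callback makeIncrement() {
  return [](int& counter) { ++counter; };
}

// A capturing lambda carries state, so it has no conversion to a
// function pointer and needs a wrapper such as std::function.
inline std::function<void(int&)> makeAddStep(int step) {
  auto addStep = [step](int& counter) { counter += step; };
  static_assert(!std::is_convertible_v<decltype(addStep), Callback>,
                "capturing lambdas do not convert to function pointers");
  return addStep;
}
```

With a function-pointer member, a call site that genuinely needs state cannot pass a capturing lambda directly.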

Test Plan:
Wait for GitHub CI, since we had C++14-specific issues with
this one in the previous PR https://github.com/pytorch/pytorch/pull/48629

Reviewed By: malfet

Differential Revision: D25563207

fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
2020-12-15 19:16:01 -08:00
Mikhail Zolotukhin
38a59a67f3 [JIT] Support multiple outputs in subgraph matcher. (#48992)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/48992

Differential Revision: D25388100

Test Plan: Imported from OSS

Reviewed By: heitorschueroff

Pulled By: ZolotukhinM

fbshipit-source-id: d95713af2220cf4f99ac92f59f8e5b902f2f3822
2020-12-15 13:09:24 -08:00
Mike Ruberry
25bc906281 Revert D25135415: [PyTorch] Use plain old function pointer for RecordFunctionCallback
Test Plan: revert-hammer

Differential Revision:
D25135415 (7e23ee1598)

Original commit changeset: 5e92dc79da64

fbshipit-source-id: 45b1634a100084c84dca158a1f16ca760fef6988
2020-12-14 21:04:27 -08:00
Scott Wolchok
7e23ee1598 [PyTorch] Use plain old function pointer for RecordFunctionCallback (#48629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629

Almost no non-test call site needs to capture variables, and this change saves 48 bytes per callback.
ghstack-source-id: 118568240

Test Plan: CI

Reviewed By: dhruvbird

Differential Revision: D25135415

fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3
2020-12-14 20:08:16 -08:00
Scott Wolchok
900aa4ee97 [PyTorch] remove convenience RecordFunctionCallback interface (#48620)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48620

In preparation for storing a bare function pointer (8 bytes)
instead of a std::function (32 bytes).
ghstack-source-id: 118568242
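To check the storage difference on a given toolchain, a quick sketch can compare the two member types directly. The holder structs and `StartFn` are hypothetical. The 8-byte and 32-byte figures match a typical 64-bit libstdc++ build, but `sizeof(std::function)` is implementation-defined, so the sketch only relies on the relationship between the sizes, not on exact numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical signature standing in for a profiler start callback.
using StartFn = void (*)(int);

// Before: a type-erased std::function member.
struct WithStdFunction {
  std::function<void(int)> start;
};

// After: a bare function pointer member.
struct WithFunctionPointer {
  StartFn start = nullptr;
};

// Sizes of the two layouts on the current platform.
constexpr std::size_t kPointerBytes = sizeof(WithFunctionPointer);
constexpr std::size_t kStdFunctionBytes = sizeof(WithStdFunction);
```

On x86-64 with libstdc++ these are typically 8 and 32 bytes respectively.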

Test Plan: CI

Reviewed By: ezyang

Differential Revision: D25132183

fbshipit-source-id: 3790cfb5d98479a46cf665b14eb0041a872c13da
2020-12-14 20:03:15 -08:00
Shijun Kong
f965b0fcfb Expose run_async function on torch::jit::Method (#48607)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48607

This change builds on top of
https://github.com/pytorch/pytorch/pull/46865,
further exposing the async interface to `torch::jit::Method`.

Added a unit test for the new `run_async`.
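The general shape of an async entry point can be sketched as follows: instead of blocking, the call hands the work to a task launcher and returns a future the caller can wait on. `SimpleMethod`, `TaskLauncher` and `launchOnNewThread` are hypothetical stand-ins built on `std::future` and `std::thread`; the real `torch::jit::Method::run_async` works with PyTorch's own future type (`c10::ivalue::Future`), and its exact signature should be checked in the headers.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <future>
#include <memory>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical launcher type: runs a task somewhere else.
using TaskLauncher = std::function<void(std::function<void()>)>;

// Default launcher for the sketch: run the task on a detached thread.
inline void launchOnNewThread(std::function<void()> task) {
  std::thread(std::move(task)).detach();
}

// Hypothetical stand-in for a compiled method: sums its integer inputs.
struct SimpleMethod {
  int run(const std::vector<int>& inputs) const {
    int sum = 0;
    for (int x : inputs) sum += x;
    return sum;
  }

  // Schedules run() through the launcher and returns a future for the result.
  // The method is captured by copy so the task never outlives its target.
  std::future<int> run_async(std::vector<int> inputs,
                             TaskLauncher launcher = launchOnNewThread) const {
    auto promise = std::make_shared<std::promise<int>>();
    std::future<int> result = promise->get_future();
    launcher([self = *this, promise, inputs = std::move(inputs)]() {
      try {
        promise->set_value(self.run(inputs));
      } catch (...) {
        // Errors are delivered through the future instead of being thrown here.
        promise->set_exception(std::current_exception());
      }
    });
    return result;
  }
};
```

Passing a launcher that simply calls the task inline makes the call synchronous, which can be handy in tests.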

Test Plan: `buck test caffe2/test/cpp/jit/...`

Reviewed By: dzhulgakov

Differential Revision: D25219726

fbshipit-source-id: 89743c82a0baa1affe0254c1e2dbf873de8e5c76
2020-12-11 11:17:58 -08:00
Chen Lai
416dc68341 [Pytorch][Annotation] Update inlined callstack with module instance info (#47416)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/47416

Test Plan: Imported from OSS

Reviewed By: kimishpatel

Differential Revision: D24752846

Pulled By: cccclai

fbshipit-source-id: 94d3c18c56161d1de3a16bb7c93502fedf71644c
2020-12-03 10:44:46 -08:00
Scott Wolchok
d1df4038ff [PyTorch] Make RecordFunctionCallback::should_run_ a function pointer (#48274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48274

The std::function-ness of it was used only for tests. (std::function is huge at 32 bytes, and not particularly efficient.)
ghstack-source-id: 117498491
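A callback record whose "should this run?" check is a plain function pointer, as described for `should_run_`, might look like the sketch below. The struct, its fields and the `onlyEven` predicate are hypothetical; the point is that a stateless predicate needs no `std::function`, and a null pointer can stand for "always run".

```cpp
#include <cassert>

// Hypothetical callback record; names are illustrative only.
struct CallbackRecord {
  // Predicate deciding whether the callback fires for a given event id.
  bool (*shouldRun)(int eventId) = nullptr;
  int timesRun = 0;

  void maybeRun(int eventId) {
    // A null predicate means "always run".
    if (shouldRun == nullptr || shouldRun(eventId)) {
      ++timesRun;
    }
  }
};

// A stateless predicate: run only for even event ids.
inline bool onlyEven(int eventId) { return eventId % 2 == 0; }
```

Setting `record.shouldRun = onlyEven;` makes the record fire on every other event without any heap allocation or type erasure.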

Test Plan: CI

Reviewed By: dzhulgakov

Differential Revision: D25102077

fbshipit-source-id: fd941ddf32235a9659a1a17609c27cc5cb446a54
2020-12-01 13:02:25 -08:00
Ilia Cherniavskii
f7a8bf2855 Use libkineto in profiler (#46470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/46470

Adds the ability to use Kineto (CUPTI) to profile CUDA kernels.

Test Plan:
USE_KINETO=1 USE_CUDA=1 USE_MKLDNN=1 BLAS=MKL BUILD_BINARY=1 python setup.py develop install
python test/test_profiler.py

python test/test_autograd.py -k test_profile
python test/test_autograd.py -k test_record

```
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                                                   Name    Self CPU %      Self CPU   CPU total %     CPU total  CPU time avg     Self CUDA   Self CUDA %    CUDA total  CUDA time avg    # of Calls
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
                       Memcpy HtoD (Pageable -> Device)         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       1.000us             2
                                      sgemm_32x32x32_NN         0.00%       0.000us         0.00%       0.000us       0.000us       2.000us        33.33%       2.000us       2.000us             1
void at::native::vectorized_elementwise_kernel<4, at...         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                       Memcpy DtoH (Device -> Pageable)         0.00%       0.000us         0.00%       0.000us       0.000us       1.000us        16.67%       1.000us       1.000us             1
                                            aten::randn         5.17%      74.000us         6.71%      96.000us      48.000us       0.000us         0.00%       0.000us       0.000us             2
                                            aten::empty         1.33%      19.000us         1.33%      19.000us       4.750us       0.000us         0.00%       0.000us       0.000us             4
                                          aten::normal_         1.05%      15.000us         1.05%      15.000us       7.500us       0.000us         0.00%       0.000us       0.000us             2
                                               aten::to        77.90%       1.114ms        91.61%       1.310ms     436.667us       0.000us         0.00%       3.000us       1.000us             3
                                    aten::empty_strided         2.52%      36.000us         2.52%      36.000us      12.000us       0.000us         0.00%       0.000us       0.000us             3
                                            aten::copy_         2.73%      39.000us        11.19%     160.000us      53.333us       0.000us         0.00%       3.000us       1.000us             3
                                        cudaMemcpyAsync         4.34%      62.000us         4.34%      62.000us      20.667us       0.000us         0.00%       0.000us       0.000us             3
                                  cudaStreamSynchronize         1.61%      23.000us         1.61%      23.000us       7.667us       0.000us         0.00%       0.000us       0.000us             3
                                               aten::mm         0.21%       3.000us         7.20%     103.000us     103.000us       0.000us         0.00%       2.000us       2.000us             1
                                           aten::stride         0.21%       3.000us         0.21%       3.000us       1.000us       0.000us         0.00%       0.000us       0.000us             3
                                       cudaLaunchKernel         2.45%      35.000us         2.45%      35.000us      17.500us       0.000us         0.00%       0.000us       0.000us             2
                                              aten::add         0.49%       7.000us         4.27%      61.000us      61.000us       0.000us         0.00%       1.000us       1.000us             1
-------------------------------------------------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------  ------------
```

benchmark: https://gist.github.com/ilia-cher/a5a9eb6b68504542a3cad5150fc39b1a

Reviewed By: Chillee

Differential Revision: D25142223

Pulled By: ilia-cher

fbshipit-source-id: b0dff46c28da5fb0a8e01cf548aa4f2b723fde80
2020-11-25 04:32:16 -08:00