Commit graph

1409 commits

Author SHA1 Message Date
Kimish Patel
ede3f5421f [Pytorch Delegated Backend] Save function name in debug info (#57481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57481

This diff introduces function name to InlinedCallStack.
Since we are using InlinedCallStack for debug information in lite
interpreter as well as delegate backends, where InlinedCallStack cannot
be constructed from model source code, we need to save function name.
In the absence of function name Function* is used to get name of the
function. This is when JIT compiles code at runtime.
When that is not possible, this diff introduces a way to obtain function
name.

Test Plan:
test_backend
test_cs_debug_info_serialization

Imported from OSS

Differential Revision:
D28159097

Reviewed By: raziel, ZolotukhinM

Pulled By: kimishpatel

fbshipit-source-id: deacaea3325e27273f92ae96cf0cd0789bbd6e72
2021-05-25 13:19:02 -07:00
Kimish Patel
813adf1076 [Pytorch Delegated Backend] Save operator name and function name in (#57441)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57441

debug info

Previous diffs did not save operator name in debug info. For delegated
backends that only identify op for profiling with debug handle, operator
name should be stored as well.
Furthermore, to complete debug information, also serialize function name.

Test Plan:
Existing lite interpreter and backend tests

Imported from OSS

Differential Revision:
D28144581

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 415210f147530a53b444b07f1d6ee699a3570d99
2021-05-25 13:17:54 -07:00
Raghavan Raman
dd7bbe1a63 [NNC] Make splitWithMask transform in-place (#58269)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58269

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427227

Pulled By: navahgar

fbshipit-source-id: 4e38a436abcf4752fd7ef6ab3666876eec6ea5ba
2021-05-25 11:32:51 -07:00
Raghavan Raman
e2467cc43e [NNC] Make splitWithTail transform in-place (#58268)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58268

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28427228

Pulled By: navahgar

fbshipit-source-id: 270b62c4e83739ad21dd68f375120e56881b394f
2021-05-25 11:31:14 -07:00
Adnios
09a8f22bf9 Add mish activation function (#58648)
Summary:
See issue: https://github.com/pytorch/pytorch/issues/58375

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58648

Reviewed By: gchanan

Differential Revision: D28625390

Pulled By: jbschlosser

fbshipit-source-id: 23ea2eb7d5b3dc89c6809ff6581b90ee742149f4
2021-05-25 10:36:21 -07:00
Zhengxu Chen
2b0ec9c3cf Reapply "[jit] Implement ScriptProfile to collect instruction profiles." (#58783)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58783

This reverts commit fc804b5def.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28617037

Pulled By: zhxchen17

fbshipit-source-id: 645de2ede20500a5c218d6ec3c7faae94de37a14
2021-05-24 18:23:21 -07:00
Thomas J. Fan
a7f4f80903 ENH Adds dtype to nn.functional.one_hot (#58090)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33046
Related to https://github.com/pytorch/pytorch/issues/53785

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58090

Reviewed By: zou3519

Differential Revision: D28640893

Pulled By: jbschlosser

fbshipit-source-id: 3686579517ccc75beaa74f0f6d167f5e40a83fd2
2021-05-24 13:48:25 -07:00
Jacob Szwejbka
1c5f63d86d [Pytorch Edge] Model Ops compatibility api (#57501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57501

Add an api _get_model_ops_and_info to get root operators and versioning info of a model in both cxx and python, and the input can be from a file path or buffer.
ghstack-source-id: 129620112

Test Plan: unit test.

Reviewed By: xcheng16, raziel

Differential Revision: D28162765

fbshipit-source-id: 4413c1e906b8a872e4a717d849da37347adbbea4
2021-05-24 12:00:06 -07:00
Kimish Patel
d6d726f781 [Pytorch Backend delegation] Add api for backend lowering to query debug (#55462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55462

handles and symbolicate exception callstack thrown from backend.

The objective of this diff is to improve error reporting when
exceptions are raised from lowered backend. We would effectively like to
get the same model level stack trace that you would get without having
lowered some module to backend.

For example:
```
import torch
from torch import nn

class AA(nn.Module):
  def forward(self, x, y):
    return x + y

class A(nn.Module):
  def __init__(self):
    super().__init__()
    self.AA0 = AA()
  def forward(self, x, y):
    return self.AA0.forward(x, y) + 3

class B(nn.Module):
  def forward(self, x):
    return x + 2

class C(nn.Module):
  def __init__(self):
    super().__init__()
    self.A0 = A()
    self.B0 = B()
  def forward(self, x, y):
    return self.A0.forward(x, y) + self.B0.forward(x)
```
If we then do C().forward(torch.rand((2,3)), torch.rand(14,2)) we
will likely see an error stack like:
```
C++ exception with description "The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in forward

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```

We would like to see the same error stack if we lowered C.A0 to some
backend.

With this diff we get something like:
```
  Module hierarchy:top(C).A0(backend_with_compiler_demoLoweredModule).AA0(AA)
Traceback of TorchScript (most recent call last):
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.A0.forward(x, y) + self.B0.forward(x)
             ~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 5, in FunctionName_UNKNOWN
                typed_inputs: List[Any] = [x, y, ]
                if self.__backend.is_available() :
                  _0, = self.__backend.execute(self.__handles["forward"], typed_inputs)
                        ~~~~~~~~~~~~~~~~~~~~~~ <--- HERE
                  assert isinstance(_0, Tensor)
                  return _0
  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return self.AA0.forward(x, y) + 3
             ~~~~~~~~~~~~~~~~ <--- HERE

  File "<string>", line 3, in FunctionName_UNKNOWN

    def forward(self, x, y):
      return x + y
             ~~~~~ <--- HERE
```
This is achieved in 3 parts:
Part 1:
A. BackendDebugInfoRecorder:
   Instantiated during backend lowering, in `to_backend`, before calling the
   preprocess function corresponding to the backend. This will facilitate
   recording of debug info (such as source range + inlined callstack) for the
   lowered module.
B. Instantiate WithBackendDebugInfoRecorder with BackendDebugInfoRecorder.
   This initializes thread local pointer to BackendDebugInfoRecorder.
C. generate_debug_handles:
   In preprocess function, the backend will call generate_debug_handles
   for each method being lowered separately. generate_debug_handles
   takes `Graph` of the method being lowered and returns a map
   of Node*-to-debug_handles. Backend is responsible for storing debug
   handles appropriately so as to raise exception (and later profiling)
   using debug handles when the exception being raised corresponds to
   particular Node that was lowered.
   Inside generate_debug_handles, we will query the current
   BackendDebugHandleInfoRecorder, that is issuing debug handles. This debug
   handle manager will issue debug handles as well as record
   debug_handles-to-<source range, inlined callstack> map.
D. Back in `to_backend`, once the preprocess function has finished
   lowering the module, we will call `stopRecord` on
   BackendDebugInfoRecorder. This will return the debug info map. This
   debug info is then stored inside the lowered module.

Part 2:
Serialization:
During serialization for bytecode (lite interpreter), we will do two
things:
1. Extract all the source ranges that are contained inside
debug_handles-to-<source range, inlined callstack> map for lowered
module. This will be source range corresponding to debug handles,
including what is in the inlined callstack. Since we replaced the original
module with the lowered module, we won't be serializing code for the original
module and thus no source range. That is why the source range will have
to be stored separately. We will lump all the source ranges for all the
lowered modules in one single debug_pkl file.
2. Then we will serialize debug_handles-to-<source range, inlined
callstack> map.

Now during deserialization we will be able to reconstruct
debug_handles-to-<source range, inlined callstack> map. Given all
debug_handles are unique we would not need any module information.
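
A minimal Python sketch of the recording flow described in Part 1, for illustration only. The class and method names below (`DebugInfoRecorderSketch`, `stop_recording`) and the tuple-based "source range" are hypothetical stand-ins, not the C++ classes this diff adds. It only shows why globally unique handles make the handle-to-debug-info map sufficient without module information.

```python
from itertools import count


class DebugInfoRecorderSketch:
    """Hypothetical stand-in: issues unique debug handles and records
    handle -> (source range, inlined callstack)."""

    def __init__(self):
        self._next_handle = count()  # handles are unique across all methods
        self._info = {}

    def generate_debug_handles(self, nodes):
        """nodes: iterable of (node name, source range, inlined callstack).
        Returns a node-name -> debug-handle map for one lowered method."""
        handles = {}
        for name, source_range, callstack in nodes:
            handle = next(self._next_handle)
            self._info[handle] = (source_range, list(callstack))
            handles[name] = handle
        return handles

    def stop_recording(self):
        """Return the recorded map and reset the recorder."""
        info, self._info = self._info, {}
        return info


recorder = DebugInfoRecorderSketch()
forward_handles = recorder.generate_debug_handles(
    [("add", ("<string>", 3), [("<string>", 3)])]
)
other_handles = recorder.generate_debug_handles([("mul", ("<string>", 5), [])])
debug_info = recorder.stop_recording()
```

A backend that raises on the node `add` would report `forward_handles["add"]`, and the handle alone is enough to look up the source range and callstack in `debug_info`.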

Test Plan:
Tests are added in test_backend.cpp

Imported from OSS

Differential Revision:
D27621330

Reviewed By: raziel

Pulled By: kimishpatel

fbshipit-source-id: 0650ec68cda0df0a945864658cab226a97ba1890
2021-05-22 08:33:07 -07:00
Xiaodong Wang
4c961beacb Revert D28474878: Always use intrusive_ptr for Message (1 out of 2)
Test Plan: revert-hammer

Differential Revision:
D28474878 (4d704e607d)

Original commit changeset: 5b76d45e05f6

fbshipit-source-id: 677c5bc7f02dca23213f778eb0e626a2f6600f3b
2021-05-21 19:24:22 -07:00
Xiaodong Wang
b8a04e25ec Revert D28474982: Make TP agent use streams from Future when sending response
Test Plan: revert-hammer

Differential Revision:
D28474982 (19a7472702)

Original commit changeset: c0034eb3f2a2

fbshipit-source-id: fb260c71e6c9dd5a2c44121fe4729a4f4418532b
2021-05-21 19:23:01 -07:00
Luca Wehrstedt
19a7472702 Make TP agent use streams from Future when sending response (#58428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58428

Until now, the TP agent expected the output of a remote function to be on the same streams as the inputs. In other words, it used the lazy stream context of the inputs to synchronize the output tensors. This was true in the most common case of a synchronous remote function. However it wasn't true for async functions, for fetching RRefs, ... The more generic way is to use the CUDA events held by the Future to perform this synchronization. (These events may be on the input streams, or they may not be!).
ghstack-source-id: 129567045

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474982

fbshipit-source-id: c0034eb3f2a2ea525efb63a31b839bc086060e7e
2021-05-21 13:15:35 -07:00
Luca Wehrstedt
4d704e607d Always use intrusive_ptr for Message (1 out of 2) (#58422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58422

Similar to Future (which I tackled recently), Message is an ivalue type (a "custom class" one), and the natural way to represent it is inside an intrusive_ptr. However in the RPC code we had a mix of usages, often passing Message by value. This has undesirable consequences, as it could easily trigger a copy by accident, which I believe is why in many places we accepted _rvalue references_ to Message, in order to force the caller to move. In my experience this is non-idiomatic in C++ (normally a function signature specifies how the function consumes its arguments, and it's up to the caller to then decide whether to copy or move).

By moving to intrusive_ptr everywhere I think we eliminate and simplify many of the problems above.

In this PR I do half of the migration, by updating everything except the `toMessageImpl` methods, which will come in the next PR.
ghstack-source-id: 129567053

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28474878

fbshipit-source-id: 5b76d45e05f6fa58c831e369c5c964d126187a6c
2021-05-21 13:15:24 -07:00
Edward Yang
fc804b5def Revert D28133579: [jit] Implement ScriptProfile to collect instruction profiles.
Test Plan: revert-hammer

Differential Revision:
D28133579 (034a238bab)

Original commit changeset: e7e30e961513

fbshipit-source-id: 5a7756468b4f2eeed24d2abb7b52ab46d081a95e
2021-05-21 08:18:40 -07:00
Zhengxu Chen
034a238bab [jit] Implement ScriptProfile to collect instruction profiles. (#57397)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57397

Introduces two main classes in C++ runtime:

ScriptProfile is the implementation for enabling and disabling interpreter
profiling in C++. This should be only used from Python, and we will add
corresponding Python API in the next diff.

InstructionSpan is a utility class to instrument execution of each single
instruction. A start timestamp is recorded in the constructor, and an end
timestamp is recorded in the destructor. During destruction, this will send
runtime data to all enabled ScriptProfile instances.
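
The span pattern described above can be approximated in Python with a context manager. This is only an analogue under assumptions: the real classes live in the C++ runtime, and `ScriptProfileSketch` and `instruction_span` are hypothetical names chosen for this sketch.

```python
import time
from contextlib import contextmanager


class ScriptProfileSketch:
    """Hypothetical analogue of ScriptProfile: collects spans while enabled."""

    _enabled = []  # every profile that is currently collecting

    def __init__(self):
        self.records = []  # (instruction name, start, end)

    def enable(self):
        ScriptProfileSketch._enabled.append(self)

    def disable(self):
        ScriptProfileSketch._enabled.remove(self)


@contextmanager
def instruction_span(name):
    start = time.perf_counter()  # plays the role of the constructor
    try:
        yield
    finally:
        end = time.perf_counter()  # plays the role of the destructor
        # Send the measurement to all enabled profiles.
        for profile in list(ScriptProfileSketch._enabled):
            profile.records.append((name, start, end))


profile = ScriptProfileSketch()
profile.enable()
with instruction_span("aten::add"):
    pass
profile.disable()
with instruction_span("aten::mul"):  # not recorded: no profile is enabled
    pass
```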

Test Plan:
build/bin/test_jit --gtest_filter='ScriptProfileTest.Basic'

Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28133579

fbshipit-source-id: e7e30e96151367022793ab3ad323f01c51ad4a3b
2021-05-20 14:11:03 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
9db64e6e56 Revert "Striding for lists Part 2 (#49352)" (#58523)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58523

This reverts commit fee7e8b91d.

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D28528023

Pulled By: tugsbayasgalan

fbshipit-source-id: 9fa1d86f0c81fcc6fd3798e0d51a712a3c9b3952
2021-05-20 13:20:33 -07:00
Edvard Ghazaryan
ccad77aa22 Added OperatorMap for mapping Operator to any template <T> (#58060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58060

Generic way to check whether an Operator belongs to a predefined map and, if so, to access the mapped value via public method(s). In general the value can be anything, for example the Operator's schema.

Test Plan: buck test caffe2/test/cpp/jit:jit -- OperatorMap

Reviewed By: Krovatkin

Differential Revision: D28357933

fbshipit-source-id: ba3248cf06c07f16aebafccb7ae71c1245afb083
2021-05-19 11:38:49 -07:00
Bert Maher
dcfc2050bd VaryingShape<Strides>::isComplete() needs to consider whether each Stride is complete (#58510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58510

In some case that I don't fully understand we're getting a stride that is:
```
{2:1, 1:1, 0:*}
```
(in this debug output, M:N means stride index M, stride value N).  This shape
should be considered incomplete, since we don't actually know the values of the
stride, but VaryingShape::isComplete considers it complete because it only
checks the presence of elements in the vector, not whether those elements are
themselves complete.
ghstack-source-id: 129279583
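
A small Python sketch of the difference between the two checks, assuming a simplified `Stride` record where `None` models the `*` in the debug output. The types and function names are hypothetical and are not the C++ `VaryingShape` API.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stride:
    stride_index: Optional[int]
    stride: Optional[int]  # None models the unknown "*" value

    def is_complete(self) -> bool:
        return self.stride_index is not None and self.stride is not None


def is_complete_shallow(strides: List[Optional[Stride]]) -> bool:
    """Behaviour described as the bug: only checks that elements are present."""
    return all(s is not None for s in strides)


def is_complete_deep(strides: List[Optional[Stride]]) -> bool:
    """Behaviour described as the fix: each element must itself be complete."""
    return all(s is not None and s.is_complete() for s in strides)


# The stride from the message: {2:1, 1:1, 0:*}
strides = [Stride(2, 1), Stride(1, 1), Stride(0, None)]
```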

Test Plan:
new unit test in test/cpp/jit

To see the failure in the context of a real model:
```
./fblearner/predictor/loadgen/download-requests.sh 272478342_0 10 ~/local/requests/272478342_0.recordio

buck-out/gen/fblearner/predictor/loadgen/replay_model_requests --model_id=272478342_0 --replay_record_source=recordio:/data/users/bertrand/requests/272478342_0.recordio --remote_port=9119 --output_file=/data/users/bertrand/responses/272478342_0_actual.recordio --output_type=recordio
```

Reviewed By: Krovatkin

Differential Revision: D28520062

fbshipit-source-id: 3ca900337d86480a40fbd90349a698cbb2fa5f11
2021-05-18 21:45:46 -07:00
Raghavan Raman
4b859cbca1 [NNC] Do not optimize conditionals when the corresponding loop is not normalized (#57675)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57675

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231375

Pulled By: navahgar

fbshipit-source-id: bcbcebca25577744c7190a0aa9fa376f76dea77d
2021-05-18 14:25:53 -07:00
Raghavan Raman
a71b99b50d [NNC] Add a method to check if a loop is normalized (#57674)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57674

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231377

Pulled By: navahgar

fbshipit-source-id: 3d92d532f1e1f78c9d94619980340622b73f99ec
2021-05-18 14:25:50 -07:00
Raghavan Raman
3fe72d30dc [NNC] Optimize conditionals that correspond to the form generated for aten::cat op. (#57673)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57673

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28231374

Pulled By: navahgar

fbshipit-source-id: 1777a63df4e5ebed6d515683bd772a88be465b3a
2021-05-18 14:23:48 -07:00
Raghavan Raman
34d6618386 [NNC] Fixing a bug in simplifier (#58291)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58291

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28435393

Pulled By: navahgar

fbshipit-source-id: 517e47385a93a43d2ddf054382adc81c18484066
2021-05-18 01:28:33 -07:00
Elias Ellison
211bac53ef [JIT] Add optimize_for_inference API (#58193)
Summary:
Freezing exists as a pass which partially evaluates your model and applies generic optimizations which should speed it up. Optimize for inference is a counterpart to these optimizations which runs build & server specific optimizations.  The interaction with existing `optimize_frozen_module` is not great, I guess we could just deprecate the API entirely? it was never officially released but just existed to document the `optimize_numerics` keyword.

Eventually, I would like to add a way of adding example inputs but I didn't add that here because they are not being used at all yet. I also have not yet included a way to blacklist individual optimizations, and would like to wait until we move this to Beta and have a little more clarity on how everything will fit together. I also think blacklisting will be an uncommon use case for the current optimizations.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58193

Reviewed By: bertmaher, navahgar

Differential Revision: D28443714

Pulled By: eellison

fbshipit-source-id: b032355bb2585720a6d2f00c89d0d9a7ef60e649
2021-05-15 15:50:14 -07:00
Lunwen He
0a561f83ca [PyTorch Mobile]Fix unit test (#58202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58202

This unit test was testing the wrong target. It should test the sampler under jit::mobile. This diff fixes it.

Test Plan: run unit tests

Reviewed By: shreyanb98

Differential Revision: D28384839

fbshipit-source-id: 35cc63be2e73ca9b1a7d30d6f67fffcfe5021fa2
2021-05-14 13:43:22 -07:00
Lunwen He
73d51406fa [PyTorch Mobile]Move train related files to their own folder (#58205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58205

It's worth moving train related files into their own folder since we are adding more code under the mobile directory.

This diff does that.

Test Plan: run unit tests and ci

Reviewed By: iseeyuan

Differential Revision: D28402432

fbshipit-source-id: cd76a1c4f8ff06508cdc3aad8a169fbf34bb4995
2021-05-14 12:54:44 -07:00
Lunwen He
a8122062c0 [PyTorch Mobile]Add light version of RandomSampler (#58201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58201

Add a light version of RandomSampler which can be used in torch mobile.

Test Plan: run unit test

Reviewed By: iseeyuan

Differential Revision: D28364467

fbshipit-source-id: 3148129fa56533f5f4b76b63b60e8778eeaf815f
2021-05-13 22:53:21 -07:00
Martin Yuan
d833caaf6b [PyTorch Mobile][Forward/backward compatibility] Number of arguments for operators (#56845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56845

Handle forward/backward compatibility caused by added default arguments in mobile. As an example,

In older version, operator aten::foo's schema is
```
foo(Tensor a, Tensor b) -> Tensor
```
In the new version, the schema is updated to
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```

## Model file
Serialize the number of specified arguments to each operator into the bytecode operator table. Previously, the operator table contained only the operator name and overload name:
```
('operators', (('aten::foo', ''),))
```
Now the number of specified arguments is added:
```
# bytecode version 6
('operators', (('aten::foo', '', 2),))
```
where "2" means the number of specified arguments.

Since there's bytecode schema change, the bytecode version number is bumped. This PR is to be landed after #56002 , where the version number is bumped from 4 to 5. This PR bumps the version number from 5 to 6.

## Runtime and backward compatibility
When the operator is found (either jit or c10), we have the OperatorHandle, where the operator schema can be accessed by
```
op.value().schema().arguments()
```
Adaptation is implemented to handle backward compatibility. For the example above, the new runtime holds the updated schema:
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
Whereas the model file carries
```
('aten::foo', '', 2)
```
We can implement a wrapper around the original function pointer to push the default argument to the stack.
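
A minimal sketch of such a wrapper, assuming a list-based stack and a schema given as (name, default) pairs. `NO_DEFAULT`, `FOO_SCHEMA`, `foo`, and `adapt` are hypothetical names for illustration and are not the runtime's actual data structures.

```python
NO_DEFAULT = object()  # marks arguments that have no default value

# Hypothetical schema for: foo(Tensor a, Tensor b, int groups=1) -> Tensor
FOO_SCHEMA = [("a", NO_DEFAULT), ("b", NO_DEFAULT), ("groups", 1)]


def foo(stack):
    """Stand-in for the new kernel: pops three arguments, pushes one result."""
    groups = stack.pop()
    b = stack.pop()
    a = stack.pop()
    stack.append((a + b) * groups)


def adapt(fn, schema, num_specified):
    """Wrap fn so that a model which specified only the first
    num_specified arguments still runs against the newer schema."""
    missing = [default for _, default in schema[num_specified:]]
    if any(default is NO_DEFAULT for default in missing):
        raise ValueError("model omits an argument that has no default")

    def wrapper(stack):
        stack.extend(missing)  # push the defaults the old model did not pass
        fn(stack)

    return wrapper


old_model_stack = [2, 3]  # model file says 2 specified arguments
adapt(foo, FOO_SCHEMA, 2)(old_model_stack)

new_model_stack = [2, 3, 4]  # model file says 3 specified arguments
adapt(foo, FOO_SCHEMA, 3)(new_model_stack)
```

Building the list of missing defaults once, at operator lookup time, keeps the per-call cost to a single `extend`.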

## Delivery time and forward compatibility
At model delivery time, two checks can be done:
### Operator check
Two APIs to be provided:
* Runtime: An API to get a runtime’s ops and their schemas (i.e. the # of args). D27920185(WIP)
* Model: An API to get a model’s ops and their schema requirements (i.e. the # of args required).

The APIs can be used to check
* runtime.ops() is a superset of model.ops()
* for each op in model.ops() validate their schemas are compatible with those in runtime.ops() -- i.e. the # args required in a model op are <= # args in the runtime op.

Note that only root ops in the model need to be checked here. For transient ops it's not necessary. For example, if a root op, "aten::root" calls "aten::foo", it's "aten::root"'s responsibility to adapt to "aten::foo"'s change, or "aten::root" itself needs to be updated too.
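
The operator check can be sketched as below, assuming both sides are available as plain dictionaries from op name to argument count. The function name and the dictionary shape are assumptions for illustration. The two APIs mentioned above are not modelled here.

```python
def check_ops_compatible(runtime_ops, model_ops):
    """runtime_ops: op name -> number of args the runtime op accepts.
    model_ops: root op name -> number of args the model specifies.
    Returns a list of human-readable problems (empty means compatible)."""
    problems = []
    for name, needed in model_ops.items():
        if name not in runtime_ops:
            # runtime.ops() must be a superset of model.ops()
            problems.append(f"{name}: missing from runtime")
        elif needed > runtime_ops[name]:
            # model args must be <= runtime args
            problems.append(
                f"{name}: model passes {needed} args, "
                f"runtime accepts {runtime_ops[name]}"
            )
    return problems


runtime = {"aten::foo": 3, "aten::bar": 1}
compatible = check_ops_compatible(runtime, {"aten::foo": 2})
incompatible = check_ops_compatible(runtime, {"aten::bar": 2, "aten::baz": 1})
```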
### Bytecode version backport (PR coming)
When delivering a model with bytecode v6, if the runtime only works with bytecode v5 and lower, backport is needed.
* The number of arguments is removed from the operator table
* The bytecode version is changed from 6 to 5

Note that this backport is a pure format change; it does not guarantee that the backported model always runs in an old runtime. The operator check mentioned before should be done first, before the model is backported to v5.
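
As a rough illustration of the two backport steps, the sketch below operates on a hypothetical dictionary representation of a model (a `version` field plus an `operators` tuple in the v6 layout shown earlier). The real model file format and backport code are not reproduced here.

```python
def backport_v6_to_v5(model):
    """Pure format change on a hypothetical dict-based model:
    drop the argument count from each operator entry and lower the version."""
    if model["version"] != 6:
        raise ValueError("this sketch only handles bytecode version 6")
    operators = tuple(
        (name, overload) for name, overload, _num_args in model["operators"]
    )
    # Return a new dict so the v6 model is left untouched.
    return {**model, "version": 5, "operators": operators}


model_v6 = {"version": 6, "operators": (("aten::foo", "", 2),)}
model_v5 = backport_v6_to_v5(model_v6)
```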

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27986544

Pulled By: iseeyuan

fbshipit-source-id: 143e19d4798cfb96b65095538dd648eead4e3fda
2021-05-13 14:20:47 -07:00
Jacob Szwejbka
1de9f51782 [Pytorch Edge] Runtime ops compatibility api (#57570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57570

Move runtime ops compatibility api to OSS and introduce schema information
ghstack-source-id: 128789159

Test Plan: unit test and manually ran it for a runtime with all (non custom) ops, and the bixray models unittest {P412728176}

Reviewed By: raziel

Differential Revision: D28203104

fbshipit-source-id: 432a7d0247bccfb2e1ce90e8d41f81596efa3d67
2021-05-13 10:20:41 -07:00
Jeffrey Wan
e71b526e7e Add inference mode python bindings and tests (#58045)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56608

 - Adds binding to the `c10::InferenceMode` RAII class in `torch._C._autograd.InferenceMode` through pybind. Also binds the `torch.is_inference_mode` function.
 - Adds context manager `torch.inference_mode` to manage an instance of `c10::InferenceMode` (global).  Implemented in `torch.autograd.grad_mode.py` to reuse the `_DecoratorContextManager` class.
 - Adds some tests based on those linked in the issue + several more for just the context manager
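
For readers unfamiliar with the decorator/context-manager pattern mentioned above, here is a pure-Python sketch using the standard library's `contextlib.ContextDecorator` and a thread-local flag. It does not use or reproduce `torch.inference_mode` or `_DecoratorContextManager`. All names below are hypothetical.

```python
import threading
from contextlib import ContextDecorator

_state = threading.local()  # per-thread flag, loosely mirroring TLS state


def is_mode_enabled():
    return getattr(_state, "enabled", False)


class mode_sketch(ContextDecorator):
    """Usable both as `with mode_sketch():` and as `@mode_sketch()`."""

    def __init__(self, mode=True):
        self.mode = mode
        self._previous = []  # stack, so the same instance can be re-entered

    def __enter__(self):
        self._previous.append(is_mode_enabled())
        _state.enabled = self.mode
        return self

    def __exit__(self, *exc):
        _state.enabled = self._previous.pop()  # restore the outer state
        return False


@mode_sketch()
def run_decorated():
    return is_mode_enabled()


def run_nested():
    seen = []
    with mode_sketch():
        seen.append(is_mode_enabled())
        with mode_sketch(False):
            seen.append(is_mode_enabled())
        seen.append(is_mode_enabled())
    seen.append(is_mode_enabled())
    return seen
```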

Issues/todos (not necessarily for this PR):
- Improve short inference mode description
- Small example
- Improved testing since there is no direct way of checking TLS/dispatch keys

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58045

Reviewed By: agolynski

Differential Revision: D28390595

Pulled By: soulitzer

fbshipit-source-id: ae98fa036c6a2cf7f56e0fd4c352ff804904752c
2021-05-13 08:55:35 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
fee7e8b91d Striding for lists Part 2 (#49352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49352

In this PR, we replace all definitions of slice to take None parameters for the start, end, and step. This will simplify the compiler logic

Test Plan:
test_jit test cases

Imported from OSS

Reviewed By: jamesr66a, nikithamalgifb

Differential Revision: D25929903

fbshipit-source-id: 5bfc6bad514a8aafbef2dacc706f95f867fe85f1
2021-05-13 00:16:02 -07:00
Mikhail Zolotukhin
c751e53800 [TensorExpr] Implement 'call_raw' in IREval. (#57882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57882

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28306752

Pulled By: ZolotukhinM

fbshipit-source-id: 11d0034f9bfbadf8483de90c457f952a2161f10b
2021-05-12 14:08:18 -07:00
Shen Li
cf7a0e5af4 Use RPC context streams to cover serde ops (#57926)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57926

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D28316526

Pulled By: mrshenli

fbshipit-source-id: 1907ec8f46e40fa5049d810c6ad959263361b6aa
2021-05-11 07:07:51 -07:00
Ailing Zhang
481806be97 Fix creation_meta for multi view outputs in NoGradMode/InferenceMode. (#57842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57842

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28295649

Pulled By: ailzhang

fbshipit-source-id: e0e11f537a97825e3fb7255aa561d3e855a6d3ce
2021-05-10 12:37:30 -07:00
CodemodService FBSourceClangFormatLinterBot
cbfce376a8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28319469

fbshipit-source-id: 8295597a8ee16b2fef3f7aacdd6c892cb22db988
2021-05-10 03:39:31 -07:00
Raghavan Raman
259d19a733 [JIT] Adding a concat optimization pass (#55474)
Summary:
This PR adds a new pass in JIT that optimizes `aten::cat` ops.

Specifically, here are optimizations performed:
* Eliminate redundant inputs in `cat` by performing CSE on the list of inputs.
   - This includes eliminating fully redundant `cat` ops when all the inputs are the same, as well as the case when "all but one" of the inputs have already been concatenated.
* Expand `cat` into multiple copies and eliminate redundancies.
   - This also includes eliminating redundancies in the underlying buffers used for `cat`.

These optimizations are not enabled in any compilation flow at this point.
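
A toy sketch of the first optimization, assuming each `cat` is represented as an (output name, input names) pair, all along the same dimension and with no mutation in between. It reads "all but one" as "the leading inputs were already concatenated", which is only one possible interpretation. It is not the JIT pass itself.

```python
def dedupe_cats(cats):
    """cats: list of (output name, [input names]) in program order.
    Returns (rewritten cats, aliases for eliminated outputs)."""
    seen = {}     # tuple of inputs -> output that already holds that concat
    aliases = {}  # eliminated output -> surviving output
    rewritten = []
    for out, inputs in cats:
        key = tuple(aliases.get(name, name) for name in inputs)
        if key in seen:
            # Fully redundant cat: reuse the earlier result.
            aliases[out] = seen[key]
            continue
        prefix = key[:-1]
        new_inputs = list(key)
        if len(key) > 2 and prefix in seen:
            # All but the last input were already concatenated.
            new_inputs = [seen[prefix], key[-1]]
        seen[key] = out
        rewritten.append((out, new_inputs))
    return rewritten, aliases


program = [
    ("c1", ["x", "y"]),
    ("c2", ["x", "y"]),       # same inputs as c1
    ("c3", ["x", "y", "z"]),  # x, y already concatenated in c1
]
optimized, eliminated = dedupe_cats(program)
```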

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55474

Reviewed By: albanD

Differential Revision: D27624511

Pulled By: navahgar

fbshipit-source-id: d509289fafc23e73b02f64a90219148896817339
2021-05-09 22:06:44 -07:00
Shen Li
fc55290e5b Fix distributed autograd gradients synchronization (#57792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57792

There are two problems when using CUDA RPC with distributed autograd
and distributed optimizer:

1) In local autograd engine, all autograd functions/nodes, including
AccumulateGrad will use the forward stream for backward computation.
But distributed autograd skips AccumulateGrad autograd function/node
and directly calls into `AccumulateGrad::accumulateGrad`. As a
result, it will use the default stream to accumulate gradients
instead of the forward stream. This commit changes that and uses the
forward stream to accumulate gradients, matching forward behavior.
2) Distributed optimizer and distributed autograd backward are
separate RPC calls, and CUDA streams are not synchronized across
different RPC calls. As a result, distributed optimizer might
consume gradients before they are ready. This commit uses CUDA
events to record the completion of gradient computation, and uses
those events to block current streams when getGradients() is called.

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D28274876

Pulled By: mrshenli

fbshipit-source-id: 22e607152324ae918084066cde8c5dbb418bba7c
2021-05-09 17:32:59 -07:00
Martin Yuan
737f48dfc5 Remove _save_data() and _load_data() from mobile (#57879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57879

_save_data() and _load_data() were designed as a protocol of data serialization of trainer client. As confirmed with kwanmacher and dreiss , they are not used. In addition, there's no plan to use them in Federated Learning flow. Remove them for now.

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D28306682

Pulled By: iseeyuan

fbshipit-source-id: 1b993ce4d78e372ae9b83bcbe496a196f9269d47
2021-05-08 10:52:44 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Chen Lai
8c04593c0a [PyTorch Edge] Add backport to export old bytecode models (#56802)
Summary:
Add an api to backport a model vn to model vi. It accepts an input model (file or buffer) and outputs a model (file or buffer) with an expected bytecode version.

In this change, the input is a model and it can come from a file or buffer. The output is a model and can be either file path or buffer.

When backport fails, the function returns false with a warning message:
```
/Users/chenlai/pytorch/cmake-build-debug/bin/test_jit --gtest_filter=LiteInterpreterTest.BackPortByteCodeModelV4:LiteInterpreterTest/*.BackPortByteCodeModelV4:*/LiteInterpreterTest.BackPortByteCodeModelV4/*:*/LiteInterpreterTest/*.BackPortByteCodeModelV4 --gtest_color=no
Testing started at 2:32 PM ...
CUDA not available. Disabling CUDA and MultiCUDA tests

[W backport.cpp:419] Warning: Backport doesn't support backport to version3 (function _backport_for_mobile_impl)
Process finished with exit code 0
```

## Test
1. Run both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py`.
2. Run all prod models with backport api.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56802

ghstack-source-id: 128425510

Test Plan: CI

Reviewed By: raziel, iseeyuan

Differential Revision: D27844651

fbshipit-source-id: 8a803cf6c76433ee0a3049b1a5570585d569f8d6
2021-05-07 18:14:33 -07:00
Luca Wehrstedt
36e47af58b Pass reference to parent future in callbacks (#57635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635

Note: this PR looks massive, but it's just one simple change, codemodded many times.

In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, thus in many cases we worked around this by capturing the future in its own callback. This is risky (leads to reference cycle and thus memory leak) and must be done carefully (spoiler: sometimes we weren't).
ghstack-source-id: 128296580

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D28178783

fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
2021-05-07 03:59:18 -07:00
Luca Wehrstedt
9aa1461a68 Make wrapPropagateTLSState more generic (#57634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57634

`wrapPropagateTLSState` was restricting its argument to be an argument-less function, and I need to relax this for later work.

It also required its argument to be converted to `std::function`, and it returned a `std::function` as well. Each creation of a `std::function` could cause a heap allocation. That is not particularly expensive, but here we can easily avoid it by having `wrapPropagateTLSState` operate directly on generic callables (thus, possibly, raw lambdas).
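
As a rough Python analogy only (the real helper is C++ and its implementation is not shown here), the sketch below captures the caller's context once and restores it around a callable that takes arbitrary arguments. It uses `contextvars` in place of thread-local state; `wrap_propagate_state`, `request_id`, and `report` are hypothetical names.

```python
import contextvars
import threading

# Stand-in for thread-local state that should follow the callback.
request_id = contextvars.ContextVar("request_id", default=None)


def wrap_propagate_state(fn):
    # Snapshot the caller's context at wrap time.
    ctx = contextvars.copy_context()

    def wrapped(*args, **kwargs):
        # Run fn inside the snapshot, forwarding any arguments unchanged.
        return ctx.run(fn, *args, **kwargs)

    return wrapped


def report(prefix, value):
    return f"{prefix}:{request_id.get()}:{value}"


request_id.set("req-7")
wrapped = wrap_propagate_state(report)

# Invoke the wrapped callable on another thread; it still sees "req-7".
out = []
worker = threading.Thread(target=lambda: out.append(wrapped("done", 3)))
worker.start()
worker.join()
```

The sketch covers only the argument-forwarding aspect. The allocation savings from avoiding `std::function` are specific to C++ and have no counterpart here.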
ghstack-source-id: 128295264

Test Plan: CI

Reviewed By: ilia-cher

Differential Revision: D28178782

fbshipit-source-id: d657f5751514974518606dd4fc4175e805dcb90a
2021-05-07 03:58:08 -07:00
Raghavan Raman
1f178de800 [NNC] Add support for computing conv with dynamic shapes (#57514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57514

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28226918

Pulled By: navahgar

fbshipit-source-id: 818ac8411b809033388d419c8f33db6aeece4b33
2021-05-06 01:08:25 -07:00
Raghavan Raman
95fbc158d4 [NNC] Add a method to compute conv without bias (#57512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57512

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28226919

Pulled By: navahgar

fbshipit-source-id: e84b944f7fdc84a77409d59218ceaa0862298f3c
2021-05-06 01:07:21 -07:00
albanD
0b51ee311d Add missing return statement from 57057 (#57669)
Summary:
Fixes a bug introduced by https://github.com/pytorch/pytorch/issues/57057

cc ailzhang while writing the tests, I realized that for these functions, we don't properly set the CreationMeta in no grad mode and Inference mode. Added a todo there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57669

Reviewed By: soulitzer

Differential Revision: D28231005

Pulled By: albanD

fbshipit-source-id: 08a68d23ded87027476914bc87f3a0537f01fc33
2021-05-05 16:13:35 -07:00
Alban Desmaison
15c092b888 Revert "Make grad mode error just a warning (#56401)" (#57640)
Summary:
This reverts commit 63dac82444.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57640

Reviewed By: soulitzer, yuguo68

Differential Revision: D28223946

Pulled By: albanD

fbshipit-source-id: 641b87cff1e2f08162ca8cacae333105e89438f1
2021-05-05 13:07:29 -07:00
Chen Lai
fb9a32b7b4 [PyTorch][Edge] Add api to get bytecode model version (#56801)
Summary:
Add an API, `_get_bytecode_version`, that returns the version number of a bytecode model. It is available in both C++ and Python, and the input can be either a file path or a buffer.
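
The path-or-buffer input handling can be sketched in plain Python. This is a simplified stand-in, not the implementation added here: it assumes the model is a zip archive with a `bytecode.pkl` entry whose first element is the version number, and the name `get_bytecode_version_sketch` is hypothetical.

```python
import io
import os
import pickle
import tempfile
import zipfile


def get_bytecode_version_sketch(source):
    """Read the version from a file path or a binary file-like object."""
    # zipfile.ZipFile accepts both paths and file-like objects.
    with zipfile.ZipFile(source) as archive:
        name = next(n for n in archive.namelist() if n.endswith("bytecode.pkl"))
        # Only unpickle data you trust; this toy archive is built below.
        payload = pickle.loads(archive.read(name))
    return payload[0]


# Build a toy archive in memory (assumed layout, for illustration).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("model/bytecode.pkl", pickle.dumps((4, ("placeholder",))))
buffer.seek(0)
version_from_buffer = get_bytecode_version_sketch(buffer)

# The same bytes written to disk are read through a file path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.ptl")
    with open(path, "wb") as f:
        f.write(buffer.getvalue())
    version_from_path = get_bytecode_version_sketch(path)
```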
## Test
CI (the newly added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801

ghstack-source-id: 128169647

Test Plan:
CI (the newly added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Reviewed By: iseeyuan

Differential Revision: D27961417

fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
2021-05-05 09:17:26 -07:00
Mikhail Zolotukhin
dedaf4fad7 Reland: [TensorExpr] Add methods for inspecting generated code in TensorExprKernel. (#57560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57560

The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. The description says whether a given
parameter is a scalar var or a buffer and, if it is a buffer, gives
access to the corresponding `Buf*` pointer, from which we can get the
expected sizes.

Relanding #57074, which was reverted because I forgot to guard a new
test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28199048

Pulled By: ZolotukhinM

fbshipit-source-id: 636e838e7e242a3c63e97ec453b8fae9b6380231
2021-05-05 09:11:40 -07:00
Mikhail Zolotukhin
e686c66fe7 Reland: [TensorExpr] Add TensorExprKernel::runFast method. (#57552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57552

This method uses `CodeGen::call_raw` instead of `CodeGen::call`.

Relanding #57328 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195047

Pulled By: ZolotukhinM

fbshipit-source-id: bcfd3cb5b4f33a149b7549515ffd705e2c4f208f
2021-05-05 09:11:37 -07:00
Mikhail Zolotukhin
0bf69278f7 Reland: [TensorExpr] Add CodeGen::call_raw method. (#57551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57551

The new method allows passing input and output arguments as `void*`
pointers instead of CallArgs, which helps reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.

Relanding #55113 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195049

Pulled By: ZolotukhinM

fbshipit-source-id: 035b77ae996dbbcd542b4b0e4c011b41e8d7828b
2021-05-05 09:10:25 -07:00
Mike Ruberry
1461859fde Revert D28048289: [TensorExpr] Add methods for inspecting generated code in TensorExprKernel.
Test Plan: revert-hammer

Differential Revision:
D28048289 (6b2cb939c5)

Original commit changeset: 3867e862a0ec

fbshipit-source-id: bdd45dcc4b229673efeb06da411bbf0c58d44026
2021-05-04 11:29:14 -07:00