Commit graph

1388 commits

Author SHA1 Message Date
Raghavan Raman
34d6618386 [NNC] Fixing a bug in simplifier (#58291)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/58291

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28435393

Pulled By: navahgar

fbshipit-source-id: 517e47385a93a43d2ddf054382adc81c18484066
2021-05-18 01:28:33 -07:00
Elias Ellison
211bac53ef [JIT] Add optimize_for_inference API (#58193)
Summary:
Freezing exists as a pass which partially evaluates your model and applies generic optimizations which should speed it up. Optimize for inference is a counterpart to these optimizations which runs build- and server-specific optimizations. The interaction with the existing `optimize_frozen_module` is not great; I guess we could just deprecate that API entirely? It was never officially released but existed only to document the `optimize_numerics` keyword.

Eventually, I would like to add a way of providing example inputs, but I didn't add that here because they are not being used at all yet. I also have not yet included a way to blacklist individual optimizations, and would like to wait until we move this to Beta and have a little more clarity on how everything will fit together. I also think blacklisting will be an uncommon use case for the current optimizations.
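The intended layering (freezing does generic partial evaluation; optimize-for-inference then runs extra build/server-specific passes on the frozen module) can be sketched in plain Python. All names here are illustrative stand-ins, not the actual torch.jit API:

```python
# Hypothetical sketch of the pass layering described above.
# "Modules" are plain dicts; passes are functions over them.

def freeze(module):
    # Stand-in for the generic freezing pass (partial evaluation).
    frozen = dict(module)
    frozen["frozen"] = True
    return frozen

def optimize_for_inference(module, extra_passes=()):
    # Freeze first (if not already frozen), then run the extra passes.
    frozen = module if module.get("frozen") else freeze(module)
    for p in extra_passes:
        frozen = p(frozen)
    return frozen

def fold_conv_bn(m):
    # Example of a server-specific pass: fuse a conv;bn pair in the "graph".
    m = dict(m)
    m["graph"] = m["graph"].replace("conv;bn", "conv_bn_fused")
    return m

m = {"graph": "conv;bn;relu", "frozen": False}
opt = optimize_for_inference(m, extra_passes=[fold_conv_bn])
print(opt["graph"])  # conv_bn_fused;relu
```

This also shows why deprecating a separate `optimize_frozen_module` entry point is plausible: the inference API can subsume it by freezing on demand.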

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58193

Reviewed By: bertmaher, navahgar

Differential Revision: D28443714

Pulled By: eellison

fbshipit-source-id: b032355bb2585720a6d2f00c89d0d9a7ef60e649
2021-05-15 15:50:14 -07:00
Lunwen He
0a561f83ca [PyTorch Mobile]Fix unit test (#58202)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58202

This unit test was testing the wrong target. It should test the sampler under jit::mobile. This diff fixes it.

Test Plan: run unit tests

Reviewed By: shreyanb98

Differential Revision: D28384839

fbshipit-source-id: 35cc63be2e73ca9b1a7d30d6f67fffcfe5021fa2
2021-05-14 13:43:22 -07:00
Lunwen He
73d51406fa [PyTorch Mobile]Move train related files to their own folder (#58205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58205

It's worth moving train-related files into their own folder since we are adding more code under the mobile directory.

This diff does that.

Test Plan: run unit tests and ci

Reviewed By: iseeyuan

Differential Revision: D28402432

fbshipit-source-id: cd76a1c4f8ff06508cdc3aad8a169fbf34bb4995
2021-05-14 12:54:44 -07:00
Lunwen He
a8122062c0 [PyTorch Mobile]Add light version of RandomSampler (#58201)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/58201

Add a light version of RandomSampler which can be used in torch mobile.

Test Plan: run unit test

Reviewed By: iseeyuan

Differential Revision: D28364467

fbshipit-source-id: 3148129fa56533f5f4b76b63b60e8778eeaf815f
2021-05-13 22:53:21 -07:00
Martin Yuan
d833caaf6b [PyTorch Mobile][Forward/backward compatibility] Number of arguments for operators (#56845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56845

Handle forward/backward compatibility caused by added default arguments in mobile. As an example,

In older version, operator aten::foo's schema is
```
foo(Tensor a, Tensor b) -> Tensor
```
In the new version, the schema is updated to
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```

## Model file
Serialize the number of specified arguments for each operator into the bytecode operator table. Previously, the operator table contained only the operator name and overload name:
```
('operators', (('aten::foo', ''),))
```
Now the number of specified arguments is added:
```
# bytecode version 6
('operators', (('aten::foo', '', 2),))
```
where "2" means the number of specified arguments.

Since there's bytecode schema change, the bytecode version number is bumped. This PR is to be landed after #56002 , where the version number is bumped from 4 to 5. This PR bumps the version number from 5 to 6.

## Runtime and backward compatibility
When the operator is found (either jit or c10), we have the OperatorHandle, where the operator schema can be accessed by
```
op.value().schema().arguments()
```
Adaptation is implemented to handle backward compatibility. For the example above, the new runtime holds the updated schema:
```
foo(Tensor a, Tensor b, int groups=1) -> Tensor
```
Whereas the model file carries
```
(('aten::foo', ''), 2)
```
We can implement a wrapper around the original function pointer to push the default argument to the stack.
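The wrapper idea can be sketched in plain Python (names and data layout are illustrative, not the actual mobile runtime types): the model recorded that only 2 arguments were specified, the runtime schema has 3, so the adapter pushes the missing trailing defaults onto the stack before dispatching:

```python
# Hedged sketch of the backward-compatibility adapter described above.

def make_adapter(fn, schema_args, num_specified):
    # schema_args: list of (name, default) in schema declaration order.
    # Arguments beyond num_specified must have defaults in the new schema.
    trailing_defaults = [default for _name, default in schema_args[num_specified:]]

    def adapted(stack):
        # Push the defaults the old model did not serialize, then dispatch.
        stack = list(stack) + trailing_defaults
        return fn(stack)
    return adapted

def foo(stack):
    # Stand-in for the new aten::foo(Tensor a, Tensor b, int groups=1).
    a, b, groups = stack
    return (a + b) * groups

schema = [("a", None), ("b", None), ("groups", 1)]
call = make_adapter(foo, schema, num_specified=2)  # old model: 2 args
print(call([3, 4]))  # 7 -- groups falls back to its default of 1
```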

## Delivery time and forward compatibility
At model delivery time, two checks can be done:
### Operator check
Two APIs to be provided:
* Runtime: An API to get a runtime’s ops and their schemas (i.e. the # of args). D27920185(WIP)
* Model: An API to get a model’s ops and their schema requirements (i.e. the # of args required).

The APIs can be used to check
* runtime.ops() is a superset of model.ops()
* for each op in model.ops() validate their schemas are compatible with those in runtime.ops() -- i.e. the # args required in a model op are <= # args in the runtime op.

Note that only root ops in the model need to be checked here; it's not necessary for transient ops. For example, if a root op "aten::root" calls "aten::foo", it's "aten::root"'s responsibility to adapt to "aten::foo"'s change, or "aten::root" itself needs to be updated too.
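The two checks above can be sketched as a small compatibility predicate (illustrative only; the real APIs mentioned above are still WIP):

```python
# Sketch: a model is deliverable if every root op it needs exists in the
# runtime, and the number of arguments the model specifies does not
# exceed what the runtime schema accepts.

def is_compatible(model_ops, runtime_ops):
    # model_ops:   {op_name: num_args_specified_by_model}
    # runtime_ops: {op_name: num_args_in_runtime_schema}
    for name, n_model in model_ops.items():
        n_runtime = runtime_ops.get(name)
        if n_runtime is None:        # op missing from the runtime
            return False
        if n_model > n_runtime:      # model requires a newer schema
            return False
    return True

runtime = {"aten::foo": 3, "aten::bar": 2}
print(is_compatible({"aten::foo": 2}, runtime))   # True: old model, new runtime
print(is_compatible({"aten::foo": 4}, runtime))   # False: model too new
print(is_compatible({"aten::baz": 1}, runtime))   # False: unknown op
```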
### Bytecode version backport (PR coming)
When delivering a model with bytecode v6, if the runtime only works with bytecode v5 and lower, backport is needed.
* The number of arguments is removed from the operator table
* The bytecode version is changed from 6 to 5

Note that this backport is a pure format change; it does not guarantee the backported model always runs in an old runtime. The operator check mentioned above should be done first, before the model is backported to v5.
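The two-step format change above can be sketched as a pure data transformation (the tuple layout mirrors the bytecode examples earlier in this message; the function name is illustrative):

```python
# Sketch of the v6 -> v5 backport: drop the argument count from each
# operator-table entry and lower the bytecode version number.

def backport_v6_to_v5(bytecode):
    version, ops = bytecode
    assert version == 6
    # Each v6 entry is (name, overload, num_specified_args);
    # a v5 entry is just (name, overload).
    stripped = tuple((name, overload) for name, overload, _num_args in ops)
    return (5, stripped)

v6 = (6, (("aten::foo", "", 2), ("aten::bar", "out", 3)))
v5 = backport_v6_to_v5(v6)
print(v5)  # (5, (('aten::foo', ''), ('aten::bar', 'out')))
```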

Test Plan: Imported from OSS

Reviewed By: gmagogsfm

Differential Revision: D27986544

Pulled By: iseeyuan

fbshipit-source-id: 143e19d4798cfb96b65095538dd648eead4e3fda
2021-05-13 14:20:47 -07:00
Jacob Szwejbka
1de9f51782 [Pytorch Edge] Runtime ops compatibility api (#57570)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57570

Move runtime ops compatibility api to OSS and introduce schema information
ghstack-source-id: 128789159

Test Plan: unit test and manually ran it for a runtime with all (non custom) ops, and the bixray models unittest {P412728176}

Reviewed By: raziel

Differential Revision: D28203104

fbshipit-source-id: 432a7d0247bccfb2e1ce90e8d41f81596efa3d67
2021-05-13 10:20:41 -07:00
Jeffrey Wan
e71b526e7e Add inference mode python bindings and tests (#58045)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56608

 - Adds binding to the `c10::InferenceMode` RAII class in `torch._C._autograd.InferenceMode` through pybind. Also binds the `torch.is_inference_mode` function.
 - Adds context manager `torch.inference_mode` to manage an instance of `c10::InferenceMode` (global).  Implemented in `torch.autograd.grad_mode.py` to reuse the `_DecoratorContextManager` class.
 - Adds some tests based on those linked in the issue + several more for just the context manager

Issues/todos (not necessarily for this PR):
- Improve short inference mode description
- Small example
- Improved testing since there is no direct way of checking TLS/dispatch keys

Pull Request resolved: https://github.com/pytorch/pytorch/pull/58045

Reviewed By: agolynski

Differential Revision: D28390595

Pulled By: soulitzer

fbshipit-source-id: ae98fa036c6a2cf7f56e0fd4c352ff804904752c
2021-05-13 08:55:35 -07:00
Tugsbayasgalan (Tugsuu) Manlaibaatar
fee7e8b91d Striding for lists Part 2 (#49352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49352

In this PR, we replace all definitions of slice to take None parameters for the start, end, and step. This will simplify the compiler logic.
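Python's own slicing already treats None as "use the default", which is the behavior a unified slice signature mirrors. A minimal sketch:

```python
# Sketch: a single slice signature where start, end, and step all accept
# None, matching Python's native l[start:end:step] semantics.

def slice_list(l, start=None, end=None, step=None):
    return l[start:end:step]

xs = [0, 1, 2, 3, 4, 5]
print(slice_list(xs))                  # [0, 1, 2, 3, 4, 5] (all defaults)
print(slice_list(xs, 1, None, 2))      # [1, 3, 5]
print(slice_list(xs, None, None, -1))  # [5, 4, 3, 2, 1, 0]
```

With one signature accepting None everywhere, the compiler no longer needs separate overloads for each combination of present/absent slice parameters.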

Test Plan:
test_jit test cases

Imported from OSS

Reviewed By: jamesr66a, nikithamalgifb

Differential Revision: D25929903

fbshipit-source-id: 5bfc6bad514a8aafbef2dacc706f95f867fe85f1
2021-05-13 00:16:02 -07:00
Mikhail Zolotukhin
c751e53800 [TensorExpr] Implement 'call_raw' in IREval. (#57882)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57882

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D28306752

Pulled By: ZolotukhinM

fbshipit-source-id: 11d0034f9bfbadf8483de90c457f952a2161f10b
2021-05-12 14:08:18 -07:00
Shen Li
cf7a0e5af4 Use RPC context streams to cover serde ops (#57926)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57926

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D28316526

Pulled By: mrshenli

fbshipit-source-id: 1907ec8f46e40fa5049d810c6ad959263361b6aa
2021-05-11 07:07:51 -07:00
Ailing Zhang
481806be97 Fix creation_meta for multi view outputs in NoGradMode/InferenceMode. (#57842)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57842

Test Plan: Imported from OSS

Reviewed By: bdhirsh

Differential Revision: D28295649

Pulled By: ailzhang

fbshipit-source-id: e0e11f537a97825e3fb7255aa561d3e855a6d3ce
2021-05-10 12:37:30 -07:00
CodemodService FBSourceClangFormatLinterBot
cbfce376a8 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28319469

fbshipit-source-id: 8295597a8ee16b2fef3f7aacdd6c892cb22db988
2021-05-10 03:39:31 -07:00
Raghavan Raman
259d19a733 [JIT] Adding a concat optimization pass (#55474)
Summary:
This PR adds a new pass in JIT that optimizes `aten::cat` ops.

Specifically, here are optimizations performed:
* Eliminate redundant `cat` inputs by performing CSE on the list of inputs.
   - This includes eliminating fully redundant `cat` ops when all the inputs are the same, as well as the case when "all but one" of the inputs have already been concatenated.
* Expand `cat` into multiple copies and eliminate redundancies.
   - This also includes eliminating redundancies in the underlying buffers used for `cat`.

These optimizations are not enabled in any compilation flow at this point.
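The CSE part of the pass can be illustrated with a plain-Python stand-in (lists playing the role of tensors; this is not the actual pass implementation): two `cat` nodes with identical input lists and dim can share one result.

```python
# Sketch: memoized "cat" keyed on the identity of its inputs, so a
# fully redundant second cat reuses the first result instead of
# recomputing (and re-allocating) it.

def make_cat_cse():
    memo = {}

    def cat(inputs, dim=0):
        key = (tuple(id(t) for t in inputs), dim)
        if key not in memo:
            result = []
            for t in inputs:
                result.extend(t)  # "concatenate" along dim 0
            memo[key] = result
        return memo[key]

    return cat

cat = make_cat_cse()
a, b = [1, 2], [3, 4]
r1 = cat([a, b])
r2 = cat([a, b])        # redundant cat: same inputs, same dim
print(r1, r1 is r2)     # [1, 2, 3, 4] True
```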

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55474

Reviewed By: albanD

Differential Revision: D27624511

Pulled By: navahgar

fbshipit-source-id: d509289fafc23e73b02f64a90219148896817339
2021-05-09 22:06:44 -07:00
Shen Li
fc55290e5b Fix distributed autograd gradients synchronization (#57792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57792

There are two problems when using CUDA RPC with distributed autograd
and distributed optimizer:

1) In the local autograd engine, all autograd functions/nodes, including
AccumulateGrad, use the forward stream for backward computation.
But distributed autograd skips the AccumulateGrad autograd
function/node and directly calls into `AccumulateGrad::accumulateGrad`.
As a result, it will use the default stream to accumulate gradients
instead of the forward stream. This commit changes that and uses the
forward stream to accumulate gradients, matching the forward behavior.
2) Distributed optimizer and distributed autograd backward are
separate RPC calls, and CUDA streams are not synchronized across
different RPC calls. As a result, distributed optimizer might
consume gradients before they are ready. This commit uses CUDA
events to record the completion of gradient computation, and uses
those events to block the current streams when getGradients() is called.
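The second fix can be illustrated with a CPU analogy (a threading.Event standing in for a CUDA event; this is only a sketch of the synchronization shape, not the actual CUDA code): backward records an event when each gradient is ready, and getGradients() blocks on those events so the optimizer never reads a half-written gradient.

```python
import threading

grads = {}
ready = {"w": threading.Event()}   # one "CUDA event" per gradient

def backward():
    grads["w"] = 0.5               # gradient accumulation finishes...
    ready["w"].set()               # ...then the completion event is recorded

def get_gradients():
    for ev in ready.values():      # block the current "stream" on the events
        ev.wait()
    return dict(grads)

t = threading.Thread(target=backward)
t.start()
print(get_gradients())  # {'w': 0.5} -- never observed before it's ready
t.join()
```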

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D28274876

Pulled By: mrshenli

fbshipit-source-id: 22e607152324ae918084066cde8c5dbb418bba7c
2021-05-09 17:32:59 -07:00
Martin Yuan
737f48dfc5 Remove _save_data() and _load_data() from mobile (#57879)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57879

_save_data() and _load_data() were designed as a data serialization protocol for the trainer client. As confirmed with kwanmacher and dreiss, they are not used. In addition, there's no plan to use them in the Federated Learning flow. Remove them for now.

Test Plan: Imported from OSS

Reviewed By: kwanmacher

Differential Revision: D28306682

Pulled By: iseeyuan

fbshipit-source-id: 1b993ce4d78e372ae9b83bcbe496a196f9269d47
2021-05-08 10:52:44 -07:00
Nikita Shulga
3a66a1cb99 [clang-tidy] Exclude cppcoreguidelines-avoid-magic-numbers (#57841)
Summary:
Add cppcoreguidelines-avoid-magic-numbers exclusion to clang-tidy
Remove existing nolint warnings using following script:
```
for file in `git ls-files | grep -v \.py`; do gsed '/^ *\/\/ NOLINTNEXTLINE(cppcoreguidelines-avoid-magic-numbers)/d' -i  $file; done
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57841

Reviewed By: samestep

Differential Revision: D28295045

Pulled By: malfet

fbshipit-source-id: 7c6e8d1213c9593f169ed3df6a916498f1a97163
2021-05-07 20:02:33 -07:00
Chen Lai
8c04593c0a [PyTorch Edge] Add backport to export old bytecode models (#56802)
Summary:
Add an API to backport a model from version n to version i. It accepts an input model (file or buffer) and outputs a model (file or buffer) with the expected bytecode version.

In this change, the input model can come from a file or a buffer. The output model can go to either a file path or a buffer.

When backport fails, the function returns false with a warning message:
```
/Users/chenlai/pytorch/cmake-build-debug/bin/test_jit --gtest_filter=LiteInterpreterTest.BackPortByteCodeModelV4:LiteInterpreterTest/*.BackPortByteCodeModelV4:*/LiteInterpreterTest.BackPortByteCodeModelV4/*:*/LiteInterpreterTest/*.BackPortByteCodeModelV4 --gtest_color=no
Testing started at 2:32 PM ...
CUDA not available. Disabling CUDA and MultiCUDA tests

[W backport.cpp:419] Warning: Backport doesn't support backport to version3 (function _backport_for_mobile_impl)
Process finished with exit code 0
```

## Test
1. Run both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py`.
2. Run all prod models with backport api.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56802

ghstack-source-id: 128425510

Test Plan: CI

Reviewed By: raziel, iseeyuan

Differential Revision: D27844651

fbshipit-source-id: 8a803cf6c76433ee0a3049b1a5570585d569f8d6
2021-05-07 18:14:33 -07:00
Luca Wehrstedt
36e47af58b Pass reference to parent future in callbacks (#57635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57635

Note: this PR looks massive, but it's just one simple change, codemodded many times.

In many cases, a callback needs to access the value/error produced by the parent future. In Python this was easy because the callback was invoked with the parent future as an argument, and could thus inspect it. In C++ the callbacks didn't take any arguments, so in many cases we worked around this by capturing the future in its own callback. This is risky (it leads to a reference cycle and thus a memory leak) and must be done carefully (spoiler: sometimes we weren't).
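The pattern change can be sketched in plain Python (a toy Future class, not the actual c10 or RPC types): instead of the callback capturing the future it is attached to (future -> callback -> future, a cycle), the future invokes callbacks with itself as an argument, so no capture is needed.

```python
# Sketch: callbacks receive the parent future as an argument.

class Future:
    def __init__(self):
        self._value = None
        self._callbacks = []

    def then(self, cb):            # cb will receive the parent future
        self._callbacks.append(cb)

    def set_value(self, v):
        self._value = v
        for cb in self._callbacks:
            cb(self)               # pass the parent; nothing is captured

    def value(self):
        return self._value

results = []
f = Future()
# No reference to f inside the lambda -> no reference cycle.
f.then(lambda parent: results.append(parent.value() * 2))
f.set_value(21)
print(results)  # [42]
```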
ghstack-source-id: 128296580

Test Plan: CI

Reviewed By: wanchaol

Differential Revision: D28178783

fbshipit-source-id: 6de02c4568be42123372edc008f630d5ddae0081
2021-05-07 03:59:18 -07:00
Luca Wehrstedt
9aa1461a68 Make wrapPropagateTLSState more generic (#57634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57634

`wrapPropagateTLSState` was restricting its argument to be an argument-less function, and I need to relax this for later work.

Also, it required its argument to be converted to a `std::function`, and it returned a `std::function`. Each creation of a `std::function` could cause a heap allocation. It's not particularly expensive, but here we can easily avoid it by having `wrapPropagateTLSState` directly operate on generic callables (thus, possibly, raw lambdas).
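The underlying idea (independent of the C++ typing question) can be sketched in plain Python: snapshot thread-local state at wrap time and re-install it around the call, while accepting any callable and any signature, which is the analogue of taking a generic callable rather than a fixed argument-less `std::function`.

```python
import threading

tls = threading.local()

def wrap_propagate_tls(fn):
    captured = getattr(tls, "state", None)   # snapshot TLS at wrap time

    def wrapper(*args, **kwargs):            # generic: any arguments
        prev = getattr(tls, "state", None)
        tls.state = captured                 # re-install the snapshot
        try:
            return fn(*args, **kwargs)
        finally:
            tls.state = prev                 # restore the caller's state
    return wrapper

tls.state = "grad-enabled"
f = wrap_propagate_tls(lambda x: (tls.state, x))

tls.state = "grad-disabled"  # state changes after wrapping...
print(f(7))  # ('grad-enabled', 7) -- ...but the wrapped call sees the snapshot
```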
ghstack-source-id: 128295264

Test Plan: CI

Reviewed By: ilia-cher

Differential Revision: D28178782

fbshipit-source-id: d657f5751514974518606dd4fc4175e805dcb90a
2021-05-07 03:58:08 -07:00
Raghavan Raman
1f178de800 [NNC] Add support for computing conv with dynamic shapes (#57514)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57514

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28226918

Pulled By: navahgar

fbshipit-source-id: 818ac8411b809033388d419c8f33db6aeece4b33
2021-05-06 01:08:25 -07:00
Raghavan Raman
95fbc158d4 [NNC] Add a method to compute conv without bias (#57512)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/57512

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D28226919

Pulled By: navahgar

fbshipit-source-id: e84b944f7fdc84a77409d59218ceaa0862298f3c
2021-05-06 01:07:21 -07:00
albanD
0b51ee311d Add missing return statement from 57057 (#57669)
Summary:
Fixes a bug introduced by https://github.com/pytorch/pytorch/issues/57057

cc ailzhang: while writing the tests, I realized that for these functions we don't properly set the CreationMeta in no-grad mode and inference mode. Added a todo there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57669

Reviewed By: soulitzer

Differential Revision: D28231005

Pulled By: albanD

fbshipit-source-id: 08a68d23ded87027476914bc87f3a0537f01fc33
2021-05-05 16:13:35 -07:00
Alban Desmaison
15c092b888 Revert "Make grad mode error just a warning (#56401)" (#57640)
Summary:
This reverts commit 63dac82444.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57640

Reviewed By: soulitzer, yuguo68

Differential Revision: D28223946

Pulled By: albanD

fbshipit-source-id: 641b87cff1e2f08162ca8cacae333105e89438f1
2021-05-05 13:07:29 -07:00
Chen Lai
fb9a32b7b4 [PyTorch][Edge] Add api to get bytecode model version (#56801)
Summary:
Add an API `_get_bytecode_version` to get the version number of a bytecode model, in both C++ and Python; the input can be either a file path or a buffer.
## Test
CI (new added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56801

ghstack-source-id: 128169647

Test Plan:
CI (new added unit test will run as part of `pytorch_core-buck`)

1. run test_lite_interpreter.cpp
2. `python test/mobile/test_bytecode.py`

Reviewed By: iseeyuan

Differential Revision: D27961417

fbshipit-source-id: f786cc9573d855feecff0b4fe8e5363e25f5728c
2021-05-05 09:17:26 -07:00
Mikhail Zolotukhin
dedaf4fad7 Reland: [TensorExpr] Add methods for inspecting generated code in TensorExprKernel. (#57560)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57560

The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. This description includes whether a
given parameter is a scalar var or a buffer, and if it's a buffer,
allows getting the corresponding `Buf*` pointer from which we could get
the expected sizes.

Relanding #57074  which was reverted because I forgot to guard a new
test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28199048

Pulled By: ZolotukhinM

fbshipit-source-id: 636e838e7e242a3c63e97ec453b8fae9b6380231
2021-05-05 09:11:40 -07:00
Mikhail Zolotukhin
e686c66fe7 Reland: [TensorExpr] Add TensorExprKernel::runFast method. (#57552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57552

This method uses `CodeGen::call_raw` instead of `CodeGen::call`.

Relanding #57328 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195047

Pulled By: ZolotukhinM

fbshipit-source-id: bcfd3cb5b4f33a149b7549515ffd705e2c4f208f
2021-05-05 09:11:37 -07:00
Mikhail Zolotukhin
0bf69278f7 Reland: [TensorExpr] Add CodeGen::call_raw method. (#57551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57551

The new method allows passing input and output arguments via `void*`
pointers instead of CallArgs, which helps reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.

Relanding #55113 (the entire stack) which was reverted because I forgot
to guard a new test with `ifdef LLVM`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28195049

Pulled By: ZolotukhinM

fbshipit-source-id: 035b77ae996dbbcd542b4b0e4c011b41e8d7828b
2021-05-05 09:10:25 -07:00
Mike Ruberry
1461859fde Revert D28048289: [TensorExpr] Add methods for inspecting generated code in TensorExprKernel.
Test Plan: revert-hammer

Differential Revision:
D28048289 (6b2cb939c5)

Original commit changeset: 3867e862a0ec

fbshipit-source-id: bdd45dcc4b229673efeb06da411bbf0c58d44026
2021-05-04 11:29:14 -07:00
Kimish Patel
5326ec60e6 [Inlined Callstack Fix] Fix inlined callstack for blocks of the node. (#56562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56562

Earlier, inlined callstacks were annotated only on nodes. This left out
nodes such as If, which have blocks of nodes. These nodes should also
be updated similarly.

Test Plan:
Added test in test_misc

Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27902516

fbshipit-source-id: 4e65c686fa6b4977e8719db45f71f7d2599d4d8e
2021-05-04 09:21:15 -07:00
Kimish Patel
bb3c6699a5 [Pytorch Mobile DebugInfo Serialization] Save debug handles for all instructions. (#55252)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55252

Earlier, for bytecode serialization, we were saving debug handles only
for OPs and not for all instructions. This PR adds them for all instructions.

Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule

Imported from OSS

Reviewed By: dreiss

Differential Revision: D27542502

fbshipit-source-id: cff75118c721ce9f0c2f60d2c9471481f05264ca
2021-05-04 09:21:13 -07:00
Kimish Patel
e0fc473e47 [Pytorch, Mobile] Serialize inlined callstack pointer with debug handle. (#55062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55062

This diff introduces the following changes:
1. InlinedCallStack pickler/serializer is introduced. It is serialized
as a tuple of {module_instance_info, source range tag, callee:InlinedCallStack}
Module instance info is serialized as tuple of {class_type_name,
instance_name}.
Note that the callee of the serialized inlined callstack points to the
tuple of an already serialized callstack. This means the first
callstack ptr to be serialized will serialize the entire path of the
tree, where some callee nodes might be shared with callstack pointers
that are serialized subsequently. The pickler supports memoization of
pickled objects: if a tuple has already been serialized, its object id
is emitted instead of serializing the object again. Thus we still
serialize the tree, and not every path from the root separately.
Furthermore, InlinedCallStackSerializer also uses a cache to look up
the pointer and return the serialized IValue.
Note that we must also serialize the source range of the
InlinedCallStack. To do this, the serializer requires a map from
source-range tags to source ranges. This was done in the previous diff,
where, as part of source range serialization, we also generate unique
tags. These are the tags that are serialized in the InlinedCallStack.
Thus, during deserialization, we have to deserialize source ranges
before deserializing InlinedCallStacks.
2. Furthermore, each serialized InlinedCallStack is serialized with a
unique debug_handle and source range tag.
BackendDebugHandleManager manages generation of
unique debug handles and saves the map of
debug-handles-to-{source_range_tag, inlined-callstack-ptr}.
This map is then serialized as callstack_debug_map.pkl. Note that the
inlined callstack alone is not sufficient to get all the source
information, since it contains source information only about the nodes
that were inlined. The top-of-the-stack (or bottom) node, which is the
actual op node, is not part of the inlined callstack pointer, so the
source range of this node is serialized separately using
source_range_tag. This is similar to how JIT creates the callstack in
torch/csrc/jit/runtime/interpreter.cpp

Unique debug handles facilitates exception throwing or profiling using
just the debug handle without any further qualifications, such as which
function or module the inlined-callstack belongs to.

Furthermore, this diff refactors the old mobile code for tracking
module hierarchy information per op. Now bytecode serialization
serializes debug handles corresponding to ops/nodes in the graph, and
callstack_debug_map.pkl helps generate:
1. The entire callstack, and
2. Module hierarchy information.
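The memoization behavior described above can be demonstrated with Python's own pickler (a toy analogy, not the actual JIT pickler): a shared callee node is serialized once and referenced by memo id afterwards, so a callstack "tree" round-trips with sharing intact rather than duplicating every path from the root.

```python
import pickle

# Two toy "InlinedCallStack" tuples sharing one callee node.
callee = ("forward", "ModuleA")   # shared callee: (instance_name, class_type)
stack1 = ("op1", callee)
stack2 = ("op2", callee)

blob = pickle.dumps((stack1, stack2))
s1, s2 = pickle.loads(blob)

# The shared callee was pickled only once: after loading, both
# callstacks still point at the same object.
print(s1[1] is s2[1])  # True
```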

Test Plan:
python test/mobile/test_lite_script_module.py TestLiteScriptModule
./build/bin/test_jit --gtest_filter=*ModuleInfo

Imported from OSS

Reviewed By: raziel

Differential Revision: D27468709

fbshipit-source-id: 53e2413e7703ead01c77718b7c333c7c6ff50a23
2021-05-04 09:21:12 -07:00
Mikhail Zolotukhin
6b2cb939c5 [TensorExpr] Add methods for inspecting generated code in TensorExprKernel. (#57074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57074

The new methods allow peeking into bufferArgs, which describe the
parameters that codegen expects. This description includes whether a
given parameter is a scalar var or a buffer, and if it's a buffer,
allows getting the corresponding `Buf*` pointer from which we could get
the expected sizes.

Differential Revision: D28048289

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: 3867e862a0ec3593906820826c2344bd8a8f5c0a
2021-05-03 20:02:28 -07:00
Chen Lai
ac71432c54 [PyTorch][Edge] Add api to get bytecode version from runtime (#56948)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56948

Add api to get runtime bytecode version

## Test
Both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py` pass
ghstack-source-id: 127939889

Test Plan: Both `caffe2/test/cpp/jit/test_lite_interpreter.cpp` and `caffe2/test/mobile/test_bytecode.py` pass

Reviewed By: raziel, iseeyuan

Differential Revision: D27987811

fbshipit-source-id: 35ed9bd626aecffc226f6dacfa046e6cdabfed51
2021-05-03 11:26:38 -07:00
Ailing Zhang
0ecdbfebff s/InplaceOrView/ADInplaceOrView/g (#57372)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57372

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57324

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28121821

Pulled By: ailzhang

fbshipit-source-id: f568dd2505f6279da9ffb93ce1d22e0f98c606bb
2021-05-01 22:56:18 -07:00
Mike Ruberry
05b255c543 Revert D27487549: [TensorExpr] Add CodeGen::call_raw method.
Test Plan: revert-hammer

Differential Revision:
D27487549 (c9ab384af7)

Original commit changeset: d8f3d92262cd

fbshipit-source-id: ea8e71dbe2d632bc0fb557362c8bd899eb6aa83a
2021-05-01 19:48:07 -07:00
Hui Guo
afe6b4c8ee [NNC] Add logical Operators '&&' and '||' (#56947)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56947

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D28007342

Pulled By: huiguoo

fbshipit-source-id: a2ad8d2e99d7c8d8c8bdcd8f65fa3f340bdd2bbc
2021-05-01 18:44:27 -07:00
Mike Ruberry
3018093066 Revert D28110359: [TensorExpr] Add TensorExprKernel::runFast method.
Test Plan: revert-hammer

Differential Revision:
D28110359 (f219ed6627)

Original commit changeset: 4fdffc8196d2

fbshipit-source-id: 3c93a058b5dd7a3b71e399341a408ec74949ef56
2021-05-01 16:16:37 -07:00
Luca Wehrstedt
0422e67336 Use Devices instead of DeviceIndexes in TensorPipe agent (#57294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57294

With the advent of CPUs in the device maps, and to be more generic (e.g., to support AMD GPUs), and to avoid conversions when passing to Future and RRef and such, it's easier to use Devices instead of DeviceIndices. This started by just migrating the TensorPipe agent but the RPC layer is quite intertwined so I had to migrate a lot of stuff.
ghstack-source-id: 127916562

Test Plan: CI

Reviewed By: mrshenli

Differential Revision: D28092733

fbshipit-source-id: 024dcb3648c5898ab13e770413c43958f04f1a8a
2021-05-01 16:12:55 -07:00
Jiakai Liu
3c4d57c18b [pytorch][nnc] update external functions for mobile build (#56850)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56850

This is part of the changes to enable NNC AOT compilation for mobile.
The generated kernels need to call these external functions, so this changes the declarations to use C linkage when building the mobile runtime.

Added nnc_aten_addmm external function.

ghstack-source-id: 127877411

Test Plan:
- build & CI;
- tested mobile build with stacked PRs;

Reviewed By: ZolotukhinM

Differential Revision: D27897154

fbshipit-source-id: 61d5499d7781a83bd2657859659fd1b5043d6b04
2021-04-30 19:07:19 -07:00
Mikhail Zolotukhin
f219ed6627 [TensorExpr] Add TensorExprKernel::runFast method. (#57328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57328

This method uses `CodeGen::call_raw` instead of `CodeGen::call`.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D28110359

Pulled By: ZolotukhinM

fbshipit-source-id: 4fdffc8196d24fc3300a9b4bc69f67562042a045
2021-04-30 15:26:18 -07:00
Mikhail Zolotukhin
c9ab384af7 [TensorExpr] Add CodeGen::call_raw method. (#55113)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55113

The new method allows passing input and output arguments via `void*`
pointers instead of CallArgs, which helps reduce the invocation
overhead. Currently this is only supported in the LLVM codegen.

Differential Revision: D27487549

Test Plan: Imported from OSS

Reviewed By: bertmaher

Pulled By: ZolotukhinM

fbshipit-source-id: d8f3d92262cde1c155beefb629454370d9af2f89
2021-04-30 15:24:37 -07:00
Scott Wolchok
b87d3fa432 [PyTorch][jit] Don't allow create() on singleton types (#56807)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56807

If I understand correctly, there's no reason to create your own instance of these global singleton types.
ghstack-source-id: 127312270

Test Plan: CI

Reviewed By: SplitInfinity

Differential Revision: D27973447

fbshipit-source-id: f12df69d185f1baaa45f2ac6eac70570a7a65912
2021-04-30 10:28:50 -07:00
Raghavan Raman
e795f88d6b [NNC] Make flatten transform in-place (#56629)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56157

This PR updates the `flatten` API in `LoopNest` to perform the flattening transformation in-place. After this transformation, the first loop in the input becomes the flattened loop.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56629

Reviewed By: H-Huang

Differential Revision: D28004787

Pulled By: navahgar

fbshipit-source-id: 7474ae237fae3fff0cd1c64a276a8831dc5b7db0
2021-04-30 09:51:45 -07:00
CodemodService FBSourceClangFormatLinterBot
e903e16d40 [AutoAccept][Codemod][FBSourceClangFormatLinter] Daily arc lint --take CLANGFORMAT
Reviewed By: zertosh

Differential Revision: D28088724

fbshipit-source-id: 3a350580427b92719a3c300bec310aea78375996
2021-04-29 04:12:25 -07:00
Nikita Shulga
4cb534f92e Make PyTorch code-base clang-tidy compliant (#56892)
Summary:
This is an automatic change generated by the following script:
```
#!/usr/bin/env python3
from subprocess import check_output, check_call
import os

def get_compiled_files_list():
    import json
    with open("build/compile_commands.json") as f:
        data = json.load(f)
    files = [os.path.relpath(node['file']) for node in data]
    for idx, fname in enumerate(files):
        if fname.startswith('build/') and fname.endswith('.DEFAULT.cpp'):
            files[idx] = fname[len('build/'):-len('.DEFAULT.cpp')]
    return files

def run_clang_tidy(fname):
    check_call(["python3", "tools/clang_tidy.py", "-c", "build", "-x", fname,"-s"])
    changes = check_output(["git", "ls-files", "-m"])
    if len(changes) == 0:
        return
    check_call(["git", "commit","--all", "-m", f"NOLINT stubs for {fname}"])

def main():
    git_files = check_output(["git", "ls-files"]).decode("ascii").split("\n")
    compiled_files = get_compiled_files_list()
    for idx, fname in enumerate(git_files):
        if fname not in compiled_files:
            continue
        if fname.startswith("caffe2/contrib/aten/"):
            continue
        print(f"[{idx}/{len(git_files)}] Processing {fname}")
        run_clang_tidy(fname)

if __name__ == "__main__":
    main()
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56892

Reviewed By: H-Huang

Differential Revision: D27991944

Pulled By: malfet

fbshipit-source-id: 5415e1eb2c1b34319a4f03024bfaa087007d7179
2021-04-28 14:10:25 -07:00
Chen Lai
c91ea7d488 [PyTorch][Edge] Add binaries for unittests (#57039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/57039

## Summary
Add two models (v4 and v5) for testing the runtime. (v5 will be introduced in https://github.com/pytorch/pytorch/pull/56002)

## Test plan
CI

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D28047615

Pulled By: cccclai

fbshipit-source-id: 47f7df3094dadb7e013ed57bc713cc8b3d1c8ce0
2021-04-27 20:46:34 -07:00
Nikita Shulga
a93ceb333d Workaround intermittent gcc-7.5 ICE in cpp tests (#57016)
Summary:
The gcc-7.5 optimizer can hit an internal compiler error if both
`-fopenmp` and `-faligned-new` are passed:
```
/var/lib/jenkins/workspace/test/cpp/api/transformer.cpp: In function 'void transformer_decoder_test_helper(bool)':
/var/lib/jenkins/workspace/test/cpp/api/transformer.cpp:609:6: internal compiler error: in equal_mem_array_ref_p, at tree-ssa-scopedtables.c:429
 void transformer_decoder_test_helper(bool is_cuda) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Fixes https://github.com/pytorch/pytorch/issues/40941


Pull Request resolved: https://github.com/pytorch/pytorch/pull/57016

Reviewed By: walterddr

Differential Revision: D28027670

Pulled By: malfet

fbshipit-source-id: 834e34b95e09bcae39ada25e02749f479a7e9013
2021-04-27 09:21:23 -07:00
Mikhail Zolotukhin
f3743f097f [TensorExpr] Nuke tensorexpr::ScalarType and instead use c10::ScalarType directly. (#56825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56825

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27977461

Pulled By: ZolotukhinM

fbshipit-source-id: f8a72938ba395e426e2d9449627113abb1c9c34f
2021-04-26 01:51:21 -07:00
Tugsbayasgalan Manlaibaatar
2041cd6707 Enable forward/backward compatibility in TS mobile (#56079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56079

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D27828149

Pulled By: tugsbayasgalan

fbshipit-source-id: 9291ddbf01853354fca0fa0a58b8115d5d2294da
2021-04-23 16:55:18 -07:00