Commit graph

1340 commits

Author SHA1 Message Date
Mikhail Zolotukhin
f3743f097f [TensorExpr] Nuke tensorexpr::ScalarType and instead use c10::ScalarType directly. (#56825)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56825

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27977461

Pulled By: ZolotukhinM

fbshipit-source-id: f8a72938ba395e426e2d9449627113abb1c9c34f
2021-04-26 01:51:21 -07:00
Tugsbayasgalan Manlaibaatar
2041cd6707 Enable forward/backward compatibility in TS mobile (#56079)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56079

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D27828149

Pulled By: tugsbayasgalan

fbshipit-source-id: 9291ddbf01853354fca0fa0a58b8115d5d2294da
2021-04-23 16:55:18 -07:00
Rong Rong (AI Infra)
5288d05cfd Revert D27958477: [PyTorch][Edge] Add v4 and v5 models and remove unused model
Test Plan: revert-hammer

Differential Revision:
D27958477 (2e4c68a727)

Original commit changeset: 2e6f985a988d

fbshipit-source-id: 520cb8a353d91cd26cb27880a0a8e27dbfcd2d99
2021-04-23 14:42:01 -07:00
Raghavan Raman
5b7317b562 [NNC] API for Buffer Compression (#55853)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54338

This PR adds the following API in NNC to implement "buffer compression".

```
static void compressBuffer(Buf* buf, Stmt* stmt);
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55853

Reviewed By: ezyang

Differential Revision: D27960986

Pulled By: navahgar

fbshipit-source-id: a69988e607196f3e2db0212313ea5deefb9859ac
2021-04-23 14:12:03 -07:00
Chen Lai
2e4c68a727 [PyTorch][Edge] Add v4 and v5 models and remove unused model (#56751)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56751

## Summary
1. Add two models (v4 and v5) for testing runtime. (v5 will be introduced in https://github.com/pytorch/pytorch/pull/56002)
2. Remove an unused model.

Side note: these binaries are part of the test in https://github.com/pytorch/pytorch/pull/56002, and currently there is an ongoing issue to `ghexport` with binaries (post is https://fb.workplace.com/groups/533197713799375/permalink/1130109004108240/). `ghimport` can work with binary after checking temporary diff (D23336574).

Test Plan: Imported from OSS

Reviewed By: iseeyuan

Differential Revision: D27958477

Pulled By: cccclai

fbshipit-source-id: 2e6f985a988da55ad08fb9a5037434a2b6db0776
2021-04-23 11:52:42 -07:00
Raghavan Raman
d43d6593cd [NNC] Handling conditionals in reorderAxis (#56063)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/53093

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56063

Reviewed By: huiguoo

Differential Revision: D27894772

Pulled By: navahgar

fbshipit-source-id: 403b65f20567c27eab73faf670087cfab9885f84
2021-04-21 09:35:17 -07:00
Bert Maher
c91c4a081d [NNC] Horizontally fuse all loops (#56324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56324

Inlining is great if LLVM's CSE kicks in; but if a kernel has multiple outputs
(and thus multiple loops), CSE has no chance.

So, this pass "horizontally" fuses the output loops together so that CSE can go
to town. Essentially we want to turn
```
for (...) {
  output_1[] = some_complicated_expr...
}
for (...) {
  output_2[] = some_complicated_expr...
}
```

Into:
```
for (...) {
  output_1[] = complicated_expr
  output_2[] = complicated_expr. // llvm cse should take care of this
}
```

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27841194

Pulled By: bertmaher

fbshipit-source-id: 54153bb59786be87183c636d64f05963c4b1624a
2021-04-20 23:54:40 -07:00
Raghavan Raman
13ac0019ae [NNC] Update loop-carried dependence check to handle all known dependences (#56354)
Summary:
This PR includes:
 * Update to the loop-carried dependence check API to correctly ignore loop-independent dependences and handle all kinds of loop-carried dependences like RAW, WAR and WAW.
 * Fix for the overlap API to look only for conflicting buffer accesses where at least one of them is a Store.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56354

Reviewed By: bertmaher

Differential Revision: D27856202

Pulled By: navahgar

fbshipit-source-id: 206e4ec771fe0f7f2ccf4b11b29e35df7b9b18bc
2021-04-20 17:12:51 -07:00
Ailing Zhang
1d8053655d Rename AutoNonVariableTypeMode to AutoDispatchBelowAutograd and add a warning. (#56422)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56422

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27866608

Pulled By: ailzhang

fbshipit-source-id: 507bbcaa4c25edf23e67162780efaa70f64ad14a
2021-04-20 17:04:08 -07:00
Lucas Hosseini
8868f9c8e3 [TensorPipe] Use targetDevice in tensorpipe_agent. (#56346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56346

Now that TensorPipe's API has `targetDevice`, use that instead of
manually writing the CUDA device index in `metadata`.

Test Plan: CI

Reviewed By: lw

Differential Revision: D27703235

fbshipit-source-id: c5b620e3b3ce619367412efdbe9fa3778f6b8869
2021-04-20 11:54:13 -07:00
davidriazati@fb.com
4e0760f41a Remove is_variable from tests (#56305)
Summary:
`is_variable` spits out a deprecation warning during the build (if it's
still something that needs to be tested we can ignore deprecated
warnings for the whole test instead of this change).

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56305

Pulled By: driazati

Reviewed By: ezyang

Differential Revision: D27834218

fbshipit-source-id: c7bbea7e9d8099bac232a3a732a27e4cd7c7b950
2021-04-20 09:03:53 -07:00
Alban Desmaison
63dac82444 Make grad mode error just a warning (#56401)
Summary:
Temporary fix to give people extra time to finish the deprecation.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56401

Reviewed By: xw285cornell, drdarshan

Differential Revision: D27862196

Pulled By: albanD

fbshipit-source-id: ed460267f314a136941ba550b904dee0321eb0c6
2021-04-20 06:30:55 -07:00
Raghavan Raman
0d94c04247 [NNC] Change fuseLoops API to return bool flag and not throw any exceptions (#56353)
Summary:
Partial fix for https://github.com/pytorch/pytorch/issues/56357

Changes the `fuseLoops` API to the following form:
```
static bool fuseLoops(const std::vector<For*>& loops, For** fused);
```

Also, adds a new API to check for loop-carried dependences:
```
static bool hasLoopCarriedDependence(For* loop);
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56353

Reviewed By: bertmaher

Differential Revision: D27856214

Pulled By: navahgar

fbshipit-source-id: 443557088692585657faee296602c547a00117dd
2021-04-19 17:20:40 -07:00
Ailing Zhang
98162cb0bb Enable AutoGradMode in InferenceMode. (#56107)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56107

Test Plan: Imported from OSS

Reviewed By: pbelevich, driazati

Differential Revision: D27807137

Pulled By: ailzhang

fbshipit-source-id: bfacf11ec5a431589cec73d6371cac81b425a115
2021-04-19 10:24:20 -07:00
Raghavan Raman
b387f7ca47 [NNC] Make normalization transformation in-place (#56158)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/56157

This PR changes `normalize` API in `LoopNest` to transform the given `For` statement and not create a new one.

New API:

```
static bool normalize(For* f);
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56158

Reviewed By: agolynski

Differential Revision: D27798361

Pulled By: navahgar

fbshipit-source-id: 57626a5a367bdf94a0efbd9dc8538f5e4e410d6b
2021-04-18 23:54:13 -07:00
Raghavan Raman
29c5cb797d [NNC] Fuse loops that have the same bounds as expressions (#55997)
Summary:
This PR allows fusing loops whose bounds are specified as expressions that are equal.

For example:
```
   for (int j = 0; j < M + N; j++) {
     A[j] = 10 * j;
   }
   for (int k = 0; k < M + N; k++) {
     B[k] = 20 * k;
   }
```
`fuseLoops(j, k)` is possible since the stop bounds of the two loops are equal though they are different `Expr*` and will result in:
```
   for (int j = 0; j < M + N; j++) {
     A[j] = 10 * j;
     B[j] = 20 * j;
   }
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55997

Reviewed By: bertmaher

Differential Revision: D27841270

Pulled By: navahgar

fbshipit-source-id: a64e4503b7f8f28bc0c9823225bc923177bb4c2e
2021-04-18 11:14:26 -07:00
Mikhail Zolotukhin
85126629a5 [TensorExpr] Add support for constant tensors in tensorexpr kernel. (#56319)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56319

With this change the TorchScript graph can have constant tensors in it
and we still will be able to lower it to TE. The constants are
registered (or bound) within the `TensorExprKernel` object and when the
codegen is called, they are passed along with usual inputs and outputs.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27838747

Pulled By: ZolotukhinM

fbshipit-source-id: 4a519d66fcc07fe5fa53f5cf9af28d25611f8437
2021-04-17 11:15:35 -07:00
Bert Maher
8e82e932f3 Reland: D27652485: [nnc] Enable CPU fusion only when num_threads == 1" (#56120)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56120

This reverts commit ad17fadbfc (D27786457).

The big annoyance here is that depending on the threading mode you may not be
able to toggle num_threads at will, so the fusion tests won't fail.

I hate this solution, but I'm adding a secondary override for the TE fuser.
Now you need to both turn on fusion (_jit_override_can_fuse_on_cpu), and you're
OK if you're running with 1 thread, or you can add
`_jit_set_texpr_parallel_cpu_enabled` to enable it anyways.

This is (a) mainly for tests, since a real user probably won't fiddle aimlessly
with the thread count, and (b) will go away once NNC's threading support is
fully baked.

Test Plan: Imported from OSS

Reviewed By: Krovatkin

Differential Revision: D27788199

Pulled By: bertmaher

fbshipit-source-id: 070d04474f15e9689dbdf8cc1fde43050c6506b1
2021-04-15 15:50:18 -07:00
Lucas Hosseini
3802e577fb [TensorPipe] Use Descriptor::Tensor::sourceDevice in tensorpipe_agent. (#55821)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55821

Test Plan: CI

Reviewed By: lw

Differential Revision: D27661608

fbshipit-source-id: fd241f073d8928528a749758c7d0f570dfeb677b
2021-04-15 03:21:26 -07:00
Lucas Hosseini
047164437e [TensorPipe] Prepare for new Pipe API. (#55820)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55820

Test Plan: CI

Reviewed By: lw

Differential Revision: D27648291

fbshipit-source-id: e08db6e8c1f5f333ec355de29e25fbe552904b25
2021-04-15 03:20:32 -07:00
Mikhail Zolotukhin
556dfcb0db [TensorExpr] Re-enable "LoopNest.VectorizeUse" test. (#56094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56094

Now FunctionCalls are merged with Loads and vectorization for
intermediate values automatically started to work.

Fixes #53553.

Test Plan: Imported from OSS

Reviewed By: bertmaher

Differential Revision: D27781519

Pulled By: ZolotukhinM

fbshipit-source-id: 1ed68ca2399e9bd4598639bd6dd8f369365f0ef0
2021-04-14 21:39:03 -07:00
Natalia Gimelshein
ad17fadbfc Revert D27652485: [nnc] Enable CPU fusion only when num_threads == 1
Test Plan: revert-hammer

Differential Revision:
D27652485 (e7e164f9e6)

Original commit changeset: 182580cf758d

fbshipit-source-id: e3c95b06d1eef668095f3cf461485395179d94af
2021-04-14 20:23:15 -07:00
Kurt Mohler
3fe4718d16 Add padding_idx argument to EmbeddingBag (#49237)
Summary:
This PR adds a `padding_idx` parameter to `nn.EmbeddingBag` and `nn.functional.embedding_bag`. As with `nn.Embedding`'s `padding_idx` argument, if an embedding's index is equal to `padding_idx` it is ignored, so it is not included in the reduction.

This PR does not add support for `padding_idx` for quantized or ONNX `EmbeddingBag` for opset10/11 (opset9 is supported). In these cases, an error is thrown if `padding_idx` is provided.

Fixes https://github.com/pytorch/pytorch/issues/3194

Pull Request resolved: https://github.com/pytorch/pytorch/pull/49237

Reviewed By: walterddr, VitalyFedyunin

Differential Revision: D26948258

Pulled By: jbschlosser

fbshipit-source-id: 3ca672f7e768941f3261ab405fc7597c97ce3dfc
2021-04-14 09:38:01 -07:00
Bert Maher
e7e164f9e6 [nnc] Enable CPU fusion only when num_threads == 1 (#55621)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55621

Fuser support for thread-level parallelism is a work in progress, so
only fuse when the program is running single-threaded.
ghstack-source-id: 126069259

Test Plan: observe fusion groups formed when torch.get_num_threads==1 vs not

Reviewed By: ZolotukhinM

Differential Revision: D27652485

fbshipit-source-id: 182580cf758d99dd499cc4591eb9d080884aa7ef
2021-04-14 09:16:54 -07:00
Mikhail Zolotukhin
7ab654afd7 [TensorExpr] Rename Tensor::call to Tensor::load to be consistent with Buf and Placeholder. (#55826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55826

It's a mechanical change.

Differential Revision: D27717777

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: fbc1bb99602250c706cf2c8c2684119c323e4d51
2021-04-13 12:08:53 -07:00
Mikhail Zolotukhin
1263448cb2 [TensorExpr] Remove mask field from Load and Store classes. (#55825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55825

The mask has never been used (in vectorization we generate an explicit
`IfThenElse` construct when we need to mask out some elements). The PR
removes it and cleans up all its traces from tests.

Differential Revision: D27717776

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 41d1feeea4322da75b3999d661801c2a7f82b9db
2021-04-13 12:08:51 -07:00
Mikhail Zolotukhin
b01a15d3d3 [TensorExpr] Redesign Rfactor loopnest transformation. (#55324)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55324

With this change `rfactor` only affects the passed loop and its body
never touching anything outside (that was a rootcause of a bug with the
previous implementation). Also, we don't have an `insertion_point`
parameter anymore - its meaning was vague, and the effect of it
should've been achievable with other transformations anyway.

The new `rfactor` semantics is as follows:

```
Requirements:
 * S is the reduction store
 * S is the only statement in the innermost loop
 * There is at least two reduction arguments in S
 * OUTER_REDUCTION_FOR loop corresponds to the outermost reduction variable
 used in the store and all other reduction variables are index variables of
 children loops of OUTER_REDUCTION_FOR
 * OUTER_REDUCTION_FOR is a perfect loop nest, i.e. it has only loops
 corresponding to the other reduction variables and the store, nested into
 each other

What it does:
  * Introduce a new buffer with an extra dimension of a size equal to the
  span of the loop OUTER_REDUCTION_FOR (the new buffer is returned via
  RFAC_BUF_PTR)
  * Insert an initialization store for the new buffer in
  OUTER_REDUCTION_FOR before its nested loop
  * Replace the reduction store to the original buffer with the reduction
  store to the temp buffer, removing the index var of OUTER_REDUCTION_FOR
  from reduction arguments
  * Insert a final reduction store over the extra dimension of the new
  buffer to the original buffer
  * Returns TRUE if the transformation succeeded and FALSE otherwise

Example:
Original IR:
S1: for i        # normal axis
S2:   X[i] = 0
S3:   for j      # reduction axis
S4:     for k    # reduction axis
S5:       X[i] = ReduceOp(X[i] + Y[i,j,k], reduce_axis={j,k})

After RFACTOR(S5, S3)
S1: for i               # normal axis
S2:   X[i] = 0
S3:   for j             # reduction axis for X, normal axis for X_rfac
        X_rfac[i,j] = 0
S4:     for k           # reduction axis
          X_rfac[i,j] = ReduceOp(X_rfac[i,j] + Y[i,j,k], reduce_axis={k})
        X[i] = ReduceOp(X[i] + X_rfac[i,j], reduce_axis={j})
```

Differential Revision: D27694960

Test Plan: Imported from OSS

Reviewed By: navahgar

Pulled By: ZolotukhinM

fbshipit-source-id: 076fa6a1df2c23f5948302aa6b43e82cb222901c
2021-04-13 12:08:48 -07:00
Raghavan Raman
d805908c34 [NNC] API to reorder multiple loops (#55568)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/52690

This PR adds the following APIs:

```
static bool areLoopsPerfectlyNested(const std::vector<For*>& loops);

static std::vector<For*> reorder(
      const std::vector<For*>& loops,
      const std::vector<size_t>& permutation);
```

The first API checks if the given list of loops are perfectly nested. The second API reorders the given list of loops according to the permutation specified.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55568

Reviewed By: albanD

Differential Revision: D27689734

Pulled By: navahgar

fbshipit-source-id: dc1bffdbee068c3f401188035772b41847cbc7c6
2021-04-12 18:12:24 -07:00
Yukio Siraichi
93bf0ae6fc Remove legacy constructor calls from pytorch codebase. (#54142)
Summary:
Follow up from https://github.com/pytorch/pytorch/issues/53889
Related to https://github.com/pytorch/pytorch/issues/47112

Removing every occurrence of the legacy constructor call present in PyTorch at:
- _docs_
- _benchmarks_
- _test_
- _caffe2_
- _CONTRIBUTING.md_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54142

Reviewed By: ngimel

Differential Revision: D27699450

Pulled By: mruberry

fbshipit-source-id: 530aa3f5746cc8bc1407d5d51b2bbd8075e30546
2021-04-11 15:45:17 -07:00
Ailing Zhang
6842da6251 [WIP]Relax some limitations of InferenceMode. (#54403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54403

A few important points about InferenceMode behavior:
1. All tensors created in InferenceMode are inference tensors except for view ops.
   - view ops produce output has the same is_inference_tensor property as their input.
     Namely view of normal tensor inside InferenceMode produce a normal tensor, which is
     exactly the same as creating a view inside NoGradMode. And view of
     inference tensor outside InferenceMode produce inference tensor as output.
2. All ops are allowed inside InferenceMode, faster than normal mode.
3. Inference tensor cannot be saved for backward.

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27316483

Pulled By: ailzhang

fbshipit-source-id: e03248a66d42e2d43cfe7ccb61e49cc4afb2923b
2021-04-09 14:40:37 -07:00
Joel Schlosser
defc649eca Update to short forms of splitWithTail / splitWithMask (#55542)
Summary:
Switched to short forms of `splitWithTail` / `splitWithMask` for all tests in `test/cpp/tensorexpr/test_*.cpp` (except test_loopnest.cpp)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55542

Reviewed By: mrshenli

Differential Revision: D27632033

Pulled By: jbschlosser

fbshipit-source-id: dc2ba134f99bff8951ae61e564cd1daea92c41df
2021-04-09 10:15:20 -07:00
Bert Maher
90f848572c NNC depthwise conv2d implementation (#54920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54920

Add a depthwise convolution implementation and reasonably good
schedules for 3x3 stride=1,2.
ghstack-source-id: 126076113

Test Plan: new tensorexpr test: Conv.DepthwiseConv2D

Reviewed By: ZolotukhinM

Differential Revision: D27413745

fbshipit-source-id: 833da6072b655fbe2b679704e9d56a08e1bf7e7e
2021-04-08 21:56:53 -07:00
Jeffrey Wan
3f9492c8b3 [Hackathon] Modernize API used in NNC C++ tests (1/3) (#55512)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/55203

Fixes issues (1) and (2) in the following tests:
tests in test/cpp/tensorexpr/test_loopnest.cpp from the beginning to LoopNestReorderLongStringFull (including)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55512

Reviewed By: mrshenli

Differential Revision: D27630679

Pulled By: soulitzer

fbshipit-source-id: b581aaea4f5f54b3285f0348aa76e99779418f80
2021-04-08 08:34:25 -07:00
Brian Hirsh
dd2bccafc5 nnc hackathon - use new APIs in tests (#55497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55497

Migrating some of the NNC API's used in testing, from this issue: https://github.com/pytorch/pytorch/issues/55203

I covered the second half of `test_loopnest.cpp`, and migrated (1) and (2) in the above issue: `LoopNest::getLoopStmtsFor`, `splitWithTail`, and `splitWithMask`

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27628625

Pulled By: bdhirsh

fbshipit-source-id: ec15efba45fae0bbb442ac3577fb9ca2f8023c2d
2021-04-07 13:03:25 -07:00
Martin Yuan
3551bd31be [PyTorch] Lite interpreter with a backend delegate (#54462)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54462

Unclean files during sync - Sat Mar 20 04:00:02 PDT 2021

Unclean files during sync - Sun Mar 21 04:00:01 PDT 2021
ghstack-source-id: 124585992

Test Plan:
```
buck run xplat/caffe2/fb/test/delegate:interpreter_test -- --model_file_path=/path/to/mobile_model.ptl
```

Reviewed By: raziel

Differential Revision: D27232309

fbshipit-source-id: 8504a3185339d73bfa6e924485c4745acf269cec
2021-04-06 00:55:26 -07:00
Nikitha Malgi
197f9f0826 Merge CUDA Streams and Events (#53902)
Summary:
-----------
- Updates current_stream and default stream API's to take `optional[device]` argument
- Adds parsing logic to replace `torch.cuda.Stream` and `torch.cuda.Event` -> `torch.classes.cuda.Stream` and `torch.classes.cuda.Event` for JIT
- Merges StreamContext manager for both Eager and JIT.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53902

Test Plan:
------
Run JIT tests:
python test/test_jit.py -v TestCUDA

Run eager tests:
python test/test_cuda.py -v TestCuda

Reviewed By: glaringlee

Differential Revision: D27494627

Pulled By: nikithamalgifb

fbshipit-source-id: b30b0570e38a33fb335c83762eb06ffd46a44b5c
2021-04-05 08:19:55 -07:00
Louis Feng
159fdde9ae Support needsOutputs for RecordFunction and ObserverUtil improvements (#55012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55012

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442

Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data. Minor refactor names to be consistent.

To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.

For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled.

Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN      ] RecordFunctionTest.TracedTestInputsOutputs
[       OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN      ] RecordFunctionTest.SampledCallbacks
[       OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN      ] RecordFunctionTest.RecordFunctionGuard
[       OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN      ] RecordFunctionTest.Callbacks
[       OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN      ] RecordFunctionTest.ShouldRun
[       OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN      ] RecordFunctionTest.Basic
[       OK ] RecordFunctionTest.Basic (1 ms)
[ RUN      ] RecordFunctionTest.OperatorNameOverload
[       OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[  PASSED  ] 7 tests.

```

Reviewed By: ilia-cher

Differential Revision: D27449877

fbshipit-source-id: 69918b729565f5899471d9db42a587f9af52238d
2021-04-02 15:16:17 -07:00
Maxim Grechkin
38a08a49ea Flip clip_grad_norm default for error_if_nonfinite to false (#55169)
Summary:
Non-backwards-compatible change introduced in https://github.com/pytorch/pytorch/pull/53843 is tripping up a lot of code. Better to set it to False initially and then potentially flip to True in the later version to give people time to adapt.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55169

Reviewed By: mruberry

Differential Revision: D27511150

Pulled By: jbschlosser

fbshipit-source-id: 1ac018557c0900b31995c29f04aea060a27bc525
2021-04-02 12:25:32 -07:00
Lucas Hosseini
09f1f14569 Transition to new tensorpipe::Pipe API. (#55193)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55193

Test Plan: CI

Reviewed By: lw

Differential Revision: D27466387

fbshipit-source-id: 07b831d699f56874dd45f37e448b8c4244ead5e3
2021-04-02 02:28:07 -07:00
Mikhail Zolotukhin
0b75f862c7 [TensorExpr] Nuke FunctionCall. (#54998)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54998

The only reason why we couldn't use Load instead of FunctionCall was
DepTracker. Now this is gone and we finally could replace FunctionCall
with Load.

Test Plan: Imported from OSS

Reviewed By: bertmaher, pbelevich

Differential Revision: D27446412

Pulled By: ZolotukhinM

fbshipit-source-id: 9183ae5541c2618abc9026b1dc4c4c9fab085d47
2021-04-01 19:47:59 -07:00
Mikhail Zolotukhin
688e350725 [TensorExpr] Nuke DepTracker and findAllNeededTensors. (#54997)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54997

DepTracker was used to automatically pull in dependent computations from
output ones. While it seems quite convenient, it's led to several
architectural issues, which are fixed in this stack.

DepTracker worked on Tensors, which is a pair of Buf and Stmt. However,
Stmt could become stale and there was no way to reliably update the
corresponding tensor. We're now using Bufs and Stmts directly and moving
away from using Tensors to avoid these problems.

Removing DepTracker allowed to unify Loads and FunctionCalls, which
essentially were duplicates of each other.

Test Plan: Imported from OSS

Reviewed By: navahgar

Differential Revision: D27446414

Pulled By: ZolotukhinM

fbshipit-source-id: a2a32749d5b28beed92a601da33d126c0a2cf399
2021-04-01 19:46:26 -07:00
Lucas Hosseini
9d6a81d1a6 Avoid aggregate initialization for tensorpipe::{Cpu,Cuda}Buffer and tensorpipe::Message::Tensor. (#55136)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55136

This will ease the transition to the new API where `Buffer` does not
store a length anymore.

Test Plan: CI

Reviewed By: lw

Differential Revision: D27466385

fbshipit-source-id: 9a167f8c501455a3ab49ce75257c69d8b4869925
2021-04-01 06:55:02 -07:00
Ailing Zhang
43d4f3b8d0 Implement public API InferenceMode and its error handling (#55008)
Summary:
https://www.internalfb.com/phabricator/paste/view/P360377337Pull Request resolved: https://github.com/pytorch/pytorch/pull/53343

For easier review, here's a diff between the version before revert. https://www.internalfb.com/phabricator/paste/view/P360750919

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55008

Test Plan: Imported from OSS

Pulled By: ailzhang

Reviewed By: bhosmer

Differential Revision: D27443229

fbshipit-source-id: 01b03446a1f6373f43dd5c7170d26226b50f363c
2021-03-31 10:48:00 -07:00
Jianyu Huang
7fc03dd7c9 Back out "[pytorch][PR] Merge CUDA Streams and Events" (#54996)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54996

Original commit changeset: 45d9fee9a582

Test Plan: CI

Reviewed By: jspark1105

Differential Revision: D27444718

fbshipit-source-id: deb627230817923eaf84ade50ecb14bfbce4e779
2021-03-31 10:21:35 -07:00
Jacob Szwejbka
a0ae3e520f [Pytorch Mobile] 'fix' filter of named parameters for FL (#54633)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54633

Theres currently no information that could be used to determine what is a parameter during the loading of a mobile module. This prevents named parameters from functioning correctly. This change is a temporary hack to help out federated learning the sole user of this api currently.
ghstack-source-id: 124885201

Test Plan: todo

Reviewed By: dhruvbird

Differential Revision: D27308738

fbshipit-source-id: 0af5d1e8381ab7b7a43b20560941aa070a02e7b8
2021-03-31 09:21:35 -07:00
Qi Zhao
5b448cf21a Revert D25966661: Support needsOutputs for RecordFunction and ObserverUtil improvements
Test Plan: revert-hammer

Differential Revision:
D25966661 (0e43a73f76)

Original commit changeset: 707886e1f212

fbshipit-source-id: a4e4af29abf622c1e0aaaf7dfb019c045988b4bc
2021-03-30 15:41:12 -07:00
Louis Feng
0e43a73f76 Support needsOutputs for RecordFunction and ObserverUtil improvements (#54442)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/54442

Added needsOutputs support to RecordFunction, improved ObserverUtil functions to handle list data. Minor refactor names to be consistent.

To get output data from kernel calls, we need to temporarily capture them before passing them to the record function. Then the results are released to function return. We handle two cases, for unboxed and boxed kernels. The boxed version is fairly simple since all outputs are stored in the stack object. For unboxed kernel calls, we added a `ReturnValue` utility class to properly handle the different return values of unboxed kernels.

For optimization, this intermediate capture is only enabled for observers that request `needsOutputs(true)` and should not affect other observers or when the observer is not enabled.

Test Plan:
```
=> buck build //caffe2/test/cpp/jit: --show-output
=> buck-out/gen/caffe2/test/cpp/jit/jit --gtest_filter=RecordFunctionTest*
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = RecordFunctionTest*-*_CUDA:*_MultiCUDA
[==========] Running 7 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 7 tests from RecordFunctionTest
[ RUN      ] RecordFunctionTest.TracedTestInputsOutputs
[       OK ] RecordFunctionTest.TracedTestInputsOutputs (226 ms)
[ RUN      ] RecordFunctionTest.SampledCallbacks
[       OK ] RecordFunctionTest.SampledCallbacks (771 ms)
[ RUN      ] RecordFunctionTest.RecordFunctionGuard
[       OK ] RecordFunctionTest.RecordFunctionGuard (0 ms)
[ RUN      ] RecordFunctionTest.Callbacks
[       OK ] RecordFunctionTest.Callbacks (2 ms)
[ RUN      ] RecordFunctionTest.ShouldRun
[       OK ] RecordFunctionTest.ShouldRun (0 ms)
[ RUN      ] RecordFunctionTest.Basic
[       OK ] RecordFunctionTest.Basic (1 ms)
[ RUN      ] RecordFunctionTest.OperatorNameOverload
[       OK ] RecordFunctionTest.OperatorNameOverload (1 ms)
[----------] 7 tests from RecordFunctionTest (1001 ms total)

[----------] Global test environment tear-down
[==========] 7 tests from 1 test case ran. (1002 ms total)
[  PASSED  ] 7 tests.

```

Reviewed By: ilia-cher

Differential Revision: D25966661

fbshipit-source-id: 707886e1f212f40ba16a1fe292ea7dd33f2646e3
2021-03-30 14:26:22 -07:00
Sam Estep
5bcbbf5373 Lint trailing newlines (#54737)
Summary:
*Context:* https://github.com/pytorch/pytorch/issues/53406 added a lint for trailing whitespace at the ends of lines. However, in order to pass FB-internal lints, that PR also had to normalize the trailing newlines in four of the files it touched. This PR adds an OSS lint to normalize trailing newlines.

The changes to the following files (made in 54847d0adb9be71be4979cead3d9d4c02160e4cd) are the only manually-written parts of this PR:

- `.github/workflows/lint.yml`
- `mypy-strict.ini`
- `tools/README.md`
- `tools/test/test_trailing_newlines.py`
- `tools/trailing_newlines.py`

I would have liked to make this just a shell one-liner like the other three similar lints, but nothing I could find quite fit the bill. Specifically, all the answers I tried from the following Stack Overflow questions were far too slow (at least a minute and a half to run on this entire repository):

- [How to detect file ends in newline?](https://stackoverflow.com/q/38746)
- [How do I find files that do not end with a newline/linefeed?](https://stackoverflow.com/q/4631068)
- [How to list all files in the Git index without newline at end of file](https://stackoverflow.com/q/27624800)
- [Linux - check if there is an empty line at the end of a file [duplicate]](https://stackoverflow.com/q/34943632)
- [git ensure newline at end of each file](https://stackoverflow.com/q/57770972)

To avoid giving false positives during the few days after this PR is merged, we should probably only merge it after https://github.com/pytorch/pytorch/issues/54967.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54737

Test Plan:
Running the shell script from the "Ensure correct trailing newlines" step in the `quick-checks` job of `.github/workflows/lint.yml` should print no output and exit in a fraction of a second with a status of 0. That was not the case prior to this PR, as shown by this failing GHA workflow run on an earlier draft of this PR:

- https://github.com/pytorch/pytorch/runs/2197446987?check_suite_focus=true

In contrast, this run (after correcting the trailing newlines in this PR) succeeded:

- https://github.com/pytorch/pytorch/pull/54737/checks?check_run_id=2197553241

To unit-test `tools/trailing_newlines.py` itself (this is run as part of our "Test tools" GitHub Actions workflow):
```
python tools/test/test_trailing_newlines.py
```

Reviewed By: malfet

Differential Revision: D27409736

Pulled By: samestep

fbshipit-source-id: 46f565227046b39f68349bbd5633105b2d2e9b19
2021-03-30 13:09:52 -07:00
Ailing Zhang
263180d7fc Revert D26973911: Implement public API InferenceMode and its error handling
Test Plan: revert-hammer

Differential Revision:
D26973911 (7caa464631)

Original commit changeset: 0ebdac7a3cd5

fbshipit-source-id: afd37a3785bc694e8ffbd679eba1cfed89ef2273
2021-03-29 11:17:49 -07:00
Kurt Mohler
3ddc6174da Raise error in clip_grad_norm_ if norm is non-finite (#53843)
Summary:
**BC-breaking note**: This change throws errors for cases that used to silently pass. The old behavior can be obtained by setting `error_if_nonfinite=False`

Fixes https://github.com/pytorch/pytorch/issues/46849

Pull Request resolved: https://github.com/pytorch/pytorch/pull/53843

Reviewed By: malfet

Differential Revision: D27291838

Pulled By: jbschlosser

fbshipit-source-id: 216d191b26e1b5919a44a3af5cde6f35baf825c4
2021-03-29 08:41:21 -07:00