Commit graph

36005 commits

Akifumi Imanishi
9da0f2e95e Support __pos__ and positive (#55891)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/55604.

This PR implements `torch.Tensor.__pos__` and `torch.positive` for compatibility with NumPy's interface. (cc: mruberry, rgommers, emcastillo and kmaehashi)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55891

Reviewed By: H-Huang

Differential Revision: D28025928

Pulled By: mruberry

fbshipit-source-id: e43e329a802f31bf8805f6efab5c2c7ef34c88b9
2021-04-27 13:23:59 -07:00
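The new behavior matches NumPy's unary plus; a quick sketch against a recent PyTorch build:

```python
import torch

t = torch.tensor([1.0, -2.0, 3.0])

# __pos__ makes the unary plus operator valid on tensors; like
# np.positive, it returns the tensor unchanged.
assert torch.equal(+t, t)
assert torch.equal(torch.positive(t), t)
```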
Shen Li
5b3c0ae563 Use a FutureFactoryRegistry to allow libtorch_cpu files to create CUDAFuture (#56984)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56984

This is a preparation PR before we can create CUDAFuture in rref_impl.cpp.

The solution is adding a `FutureFactoryRegistry` in `rpc/utils.*`. The
TensorPipe RPC agent is responsible for registering the `CUDAFuture` factory
and the `ivalue::Future` factory. The reason we need this change instead
of directly using the `USE_CUDA` macro in RRef files is as follows. There are
three build targets: `torch_cpu`, `torch_cuda`, and `torch_python`.
`torch_python` is built on top of the other two. `torch_cpu` is CPU-only,
containing no CUDA-related code and hence no `USE_CUDA` macro.
`tensorpipe_*` files are in `torch_python`, which does have access to CUDA.
However, RRef source files are in `torch_cpu`, which cannot contain CUDA
code. The recommended solution is to allow dynamic dispatching; hence
this PR.

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D28020917

Pulled By: mrshenli

fbshipit-source-id: e67c76a273074aebb61877185cc5e6bc0a1a5448
2021-04-27 12:34:15 -07:00
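The registry idea can be sketched in Python (a stand-in for the C++ mechanism; all names here are illustrative, not the actual `rpc/utils.*` API):

```python
class FutureFactoryRegistry:
    """Maps a device type to a factory, so CPU-only code can create
    device-specific futures without compile-time knowledge of CUDA."""

    def __init__(self):
        self._factories = {}

    def register(self, device_type, factory):
        # A CUDA-aware library (here, the TensorPipe agent) registers
        # its factory at load time; torch_cpu code never mentions CUDA.
        self._factories[device_type] = factory

    def create(self, device_type):
        return self._factories[device_type]()


registry = FutureFactoryRegistry()
registry.register("cpu", lambda: "ivalue::Future")
registry.register("cuda", lambda: "CUDAFuture")  # done only in CUDA-aware code

assert registry.create("cpu") == "ivalue::Future"
assert registry.create("cuda") == "CUDAFuture"
```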
Shen Li
f9e7e2e20e Remove unnecessary noCuda arg from AtomicJitFuture (#56973)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56973

Test Plan: Imported from OSS

Reviewed By: lw

Differential Revision: D28020918

Pulled By: mrshenli

fbshipit-source-id: 99d0e4306f7650be97f73af00d89bdbb762595bc
2021-04-27 12:33:02 -07:00
Edvard Ghazaryan
cea265b8d8 Support layer_norm for static runtime (#56444)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56444

Added out version for layer_norm

Test Plan:
buck test caffe2/aten:math_kernel_test -- NativeLayerNorm

buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest

Reviewed By: hlu1

Differential Revision: D27873846

fbshipit-source-id: 53ee9fec4ff9a4e78198b031e86b5afd013626dd
2021-04-27 12:28:37 -07:00
Xiang Gao
3de86b951d Migrate thrust->cub for index put (#55693)
Summary:
64-bit indexing is not supported, because if `num_indices = 2^31`, then 4 long tensors of `num_indices` elements will take 64GB of RAM. I don't think anybody will be interested in running `index_put` with 64GB of GPU RAM.

Benchmark on CUDA 11.3 RTX3090:
```python
import torch
import itertools

def run50_sync(f):
    for _ in range(50):
        f()
    torch.cuda.synchronize()

run50_sync(lambda: torch.randperm(1000000, device='cuda'))

def benchmark(M, L):
    a = torch.randn(M, device='cuda')
    i1 = torch.randint(M, (L,), dtype=torch.long, device='cuda')
    v = torch.randn(L, device='cuda')

    torch.cuda.synchronize()

    %timeit run50_sync(lambda:a.index_put_((i1,), v, True))

for M, L in itertools.product((100, 100000, 10000000), repeat=2):
    print(M, L)
    benchmark(M, L)
```

Before
```
100 100
5.13 ms ± 91 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100 100000
30.2 ms ± 471 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
100 10000000
3.17 s ± 14.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100000 100
5.19 ms ± 61.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 100000
11.9 ms ± 200 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 10000000
712 ms ± 3.49 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10000000 100
5.07 ms ± 66.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 100000
12.1 ms ± 76.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 10000000
627 ms ± 7.65 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After
```
100 100
3.75 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100 100000
26.2 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
100 10000000
2.81 s ± 23.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
100000 100
3.85 ms ± 16.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 100000
9.74 ms ± 40.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
100000 10000000
444 ms ± 1.86 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
10000000 100
3.85 ms ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 100000
10.7 ms ± 116 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10000000 10000000
396 ms ± 2.63 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55693

Reviewed By: albanD

Differential Revision: D27895967

Pulled By: ngimel

fbshipit-source-id: 0616ce33395ce46f1a4161dfd38940b8e54fedc2
2021-04-27 12:27:09 -07:00
Edward Yang
6c602eb099 Don't hold ThreadPool lock when destructing task (#56817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56817

Fix https://github.com/pytorch/pytorch/issues/56701 and https://github.com/pytorch/pytorch/issues/56786

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: pritamdamania87

Differential Revision: D27975642

Pulled By: ezyang

fbshipit-source-id: b7f4a6c18a4fa65c38bacc7c46246f0865c95f86
2021-04-27 12:22:49 -07:00
Peter Bell
a18f3aacee Vectorize floating point floor_divide (#55380)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/55380

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27993499

Pulled By: mruberry

fbshipit-source-id: 45ea9c3295e4d85316bae9487db20914e0cbe3ed
2021-04-27 12:10:06 -07:00
Yukio Siraichi
cf17fd6dd5 Fix multinomial CUDA misalignment and non-deterministic behavior (#55364)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46702

- fails on probability distributions with an odd number of items
  - trying to access an `acc_type` (`float`) in `scalar_t` (`float16`)-aligned memory
- produces unrepeatable results for large input tensors
  - parallel cumsum not monotonic at some positions

### Fixes
- computing cumsum on `acc_type` (`float`) instead of using `scalar_t` (`float16`) fixed both issues
- the non-monotonic behavior may happen even using `float`, though
  - in these cases, deterministic behavior may be achieved by eliminating the race condition when writing the result, using the atomic function `atomicMax`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/55364

Reviewed By: mruberry

Differential Revision: D28031666

Pulled By: ngimel

fbshipit-source-id: 0fc6289e0b9ea2d31ef3771e7ca370de8f5c02de
2021-04-27 12:04:32 -07:00
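The with-replacement sampling scheme involved here (an inclusive cumsum over the probabilities, followed by a binary search per sample) can be sketched in NumPy. This is an illustration of the algorithm only, not the CUDA kernel; the wider accumulation dtype stands in for the float16-to-float fix:

```python
import numpy as np

rng = np.random.default_rng(0)
probs = np.array([0.1, 0.2, 0.7])

# Accumulate the CDF in a wider dtype than the inputs, analogous to
# computing the cumsum in acc_type (float) for float16 probabilities.
cdf = np.cumsum(probs, dtype=np.float64)
cdf /= cdf[-1]  # normalize so the last entry is exactly 1.0

# Each sample is the first bucket whose CDF value exceeds a uniform draw;
# a non-monotonic CDF would make this binary search return wrong buckets.
u = rng.random(10_000)
samples = np.searchsorted(cdf, u, side="right")

assert samples.min() >= 0 and samples.max() <= 2
```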
Akifumi Imanishi
6e91e90b4d Use OpInfo for unsqueeze test (#56924)
Summary:
This PR is ready for https://github.com/pytorch/pytorch/issues/56774.

(cc: mruberry, emcastillo, kmaehashi)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56924

Reviewed By: H-Huang

Differential Revision: D28026529

Pulled By: mruberry

fbshipit-source-id: 3afb33bb2999110c565728cd761d3e7d9d3fc82b
2021-04-27 11:58:30 -07:00
Serhat Yilmaz
6c37788cb1 [torch] Add cuda support for segment reduction 'max' (#56704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56704

This is a resubmit of PR: https://github.com/pytorch/pytorch/pull/54175

Main changes compared to the original PR:
- Switch to importing "<ATen/cuda/cub.cuh>"
- Use CUB_WRAPPER to reduce boilerplate code.

Test Plan:
Will check CI status.

Added unit test

Reviewed By: ngimel

Differential Revision: D27941257

fbshipit-source-id: 24a0e0c7f6c46126d2606fe42ed03dca15684415
2021-04-27 11:29:03 -07:00
lezcano
d578e8cfa2 Improved docs for torch.linalg (#56265)
Summary:
This PR tries to make the docs of `torch.linalg`:
- More uniform in notation and structure for every function.
- More uniform in the use of back-quotes and the `:attr:` directive.
- More readable for a non-specialised audience, through explanations of the form the factorisations take and of when it is beneficial to use which arguments in some solvers.
- More connected among the different functions through the use of the `.. seealso::` directive.
- More informative on when gradients explode, when a function silently returns a wrong result, and when things do not work in general.

I tried to follow the structure of "one short description and then the rest" to be able to format the docs like those of `torch.` or `torch.nn`. I did not do that yet, as I am waiting for the green light on this idea:
https://github.com/pytorch/pytorch/issues/54878#issuecomment-816636171

What this PR does not do:
- Clean the documentation of other functions that are not in the `linalg` module (although I started doing this for `torch.svd`, but then I realised that this PR would touch way too many functions).

Fixes https://github.com/pytorch/pytorch/issues/54878

cc mruberry IvanYashchuk

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56265

Reviewed By: H-Huang

Differential Revision: D27993986

Pulled By: mruberry

fbshipit-source-id: adde7b7383387e1213cc0a6644331f0632b7392d
2021-04-27 11:16:09 -07:00
Yukio Siraichi
9d54475032 Hide module paths leaking in the documentation. (#54585)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/54585

Reviewed By: H-Huang

Differential Revision: D28027037

Pulled By: mruberry

fbshipit-source-id: 219874e143221f5e8349d007f88464e0be1a6243
2021-04-27 10:58:01 -07:00
Ilia Cherniavskii
c203c921bc Revert D27926270: [pytorch][PR] [profiler] Add cuda synchronization points
Test Plan: revert-hammer

Differential Revision:
D27926270 (38bb0ac3e8)

Original commit changeset: 5cf30128590c

fbshipit-source-id: 940da27f5c921d8921191188230807f1708e3e1f
2021-04-27 09:27:35 -07:00
Nikita Shulga
a93ceb333d Workaround intermittent gcc-7.5 ICE in cpp tests (#57016)
Summary:
gcc-7.5 optimizer can hit internal compiler error if both `-fopenmp` and
`-faligned-new` are passed:
```
/var/lib/jenkins/workspace/test/cpp/api/transformer.cpp: In function 'void transformer_decoder_test_helper(bool)':
/var/lib/jenkins/workspace/test/cpp/api/transformer.cpp:609:6: internal compiler error: in equal_mem_array_ref_p, at tree-ssa-scopedtables.c:429
 void transformer_decoder_test_helper(bool is_cuda) {
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
```

Fixes https://github.com/pytorch/pytorch/issues/40941


Pull Request resolved: https://github.com/pytorch/pytorch/pull/57016

Reviewed By: walterddr

Differential Revision: D28027670

Pulled By: malfet

fbshipit-source-id: 834e34b95e09bcae39ada25e02749f479a7e9013
2021-04-27 09:21:23 -07:00
Eli Uriegas
11d455fa8b .github: Enable Linux CPU GHA on PRs (#56942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56942

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D28018455

Pulled By: seemethere

fbshipit-source-id: 2b4ba3d616c217b4e960871f1428dda03f2ad92a
2021-04-27 09:16:33 -07:00
Nikita Shulga
ed617a61ce Adjust computeLRWorkDim() to work with Accelerate.framework (#56847)
Summary:
According to `vecLib.framework/Headers/clapack.h`, Accelerate.framework's LAPACK implementation is based on 3.2.1, and so LRWORK should be computed using the following formula (from the LAPACK documentation):
```
*>          If JOBZ = 'N', LRWORK >= 7*min(M,N).
*>          Otherwise,
*>          LRWORK >= min(M,N)*max(5*min(M,N)+7,2*max(M,N)+2*min(M,N)+1)
```

Found while looking at test_linalg.py crashes on M1, but it would happen on x86_64 as well if PyTorch were built against Accelerate.framework there.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56847

Reviewed By: albanD

Differential Revision: D27983352

Pulled By: malfet

fbshipit-source-id: f757c515c85b32c1e09d00a91bc20fe4b390a75a
2021-04-27 09:12:54 -07:00
Richard Zou
338a600e78 Add dispatch keys for out-of-tree grad+vmap prototype (#56824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56824

This PR adds 6 dispatch keys to be used for prototyping.

I'm not sure what the best way to name these is; please let me know if
you think they should share the same prefix.

Test Plan: - wait for tests

Reviewed By: driazati

Differential Revision: D27999963

Pulled By: zou3519

fbshipit-source-id: 0c3ef4788854f7a93d077cc454b773a6eedbbc22
2021-04-27 09:02:49 -07:00
Nikolay Korovaiko
cfbd06d7a1 add all pools, Batchnorm and Tanh (i.e. all ideeped MKLDNN ops) to MKLDNNFuser (#56541)
Summary:

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56541

Reviewed By: pbelevich

Differential Revision: D27930353

Pulled By: Krovatkin

fbshipit-source-id: 4d5b932bad4154e8bdd6e35498354e13b39c87a1
2021-04-27 08:59:30 -07:00
Eli Uriegas
8d29ac2033 .github: Bump linux.2xlarge runners to 500 (#56945)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56945

In preparation to turn these on for CI

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: walterddr

Differential Revision: D28018454

Pulled By: seemethere

fbshipit-source-id: fa94d666499877f2cdd7b8fd3fc8b2d8127f61e8
2021-04-27 08:49:22 -07:00
Eli Uriegas
e138987818 .github: Build test binaries in build/ directory (#56941)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56941

Sets the custom test binaries we build in .jenkins/pytorch/build.sh to
be built in the `build` directory instead of the directory above the
workspace.

This should alleviate any weirdness we were seeing before with test
binaries having to be overwritten

Signed-off-by: Eli Uriegas <eliuriegas@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28018453

Pulled By: seemethere

fbshipit-source-id: 74add11037a622e011d00fb6292bfe20e1d55d9e
2021-04-27 08:48:09 -07:00
Hui Guo
6bbd8ba658 [NNC] removed the second run of llvm passmanager - it is repeated and caused a slowdown in the generated code (#56837)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56837

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27980073

Pulled By: huiguoo

fbshipit-source-id: 4bc821adb7bba67078f0a4cb3294143f701f5335
2021-04-27 08:36:04 -07:00
Erjia Guan
3b977a0d28 [DataLoader] Add generate_state for NumPy seeding (#56797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56797

After adding a default seeding strategy for the NumPy random module within each DataLoader worker (#56488), two concerns were raised:
- We dropped support for NumPy < 1.17 due to `SeedSequence`.
- In order to support seeding for NumPy < 1.17, how can we provide a seed for `numpy.random`?
  - The first option is to set the same seed as `random`. The problem is that the same algorithm is shared between `numpy.random` and `random`; with the same seed, they will produce exactly the same state sequence. Thanks to rkern, we noticed these so-called [bad things](https://github.com/PyTorchLightning/pytorch-lightning/pull/6960#issuecomment-818393659).
  - Considering that most users are not aware of this problem, we can provide a better default seed for `numpy.random` using the same `SeedSequence` algorithm as NumPy. This is just a workaround with a hard-coded function that generates an array of four int32 values as the seed.

To better cope with this problem (many 3rd-party libraries, not just NumPy, have random modules), we may eventually need to implement a `SeedSequence` within the `torch.random` module, so that users can `spawn` a new `SeedSequence` for each library.

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28000619

Pulled By: ejguan

fbshipit-source-id: 5701c8124a38ea5ded69eb8eee70f9680877ffa6
2021-04-27 08:14:02 -07:00
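For NumPy >= 1.17, the non-hard-coded version of this seeding looks roughly as follows (a sketch of the idea, not the actual DataLoader code; `base_seed` is a placeholder for the per-worker seed):

```python
import numpy as np

base_seed = 12345  # hypothetical per-worker seed from the DataLoader

# SeedSequence turns one integer into well-distributed entropy; the
# hard-coded workaround in this PR reimplements generate_state for
# older NumPy versions that lack SeedSequence.
ss = np.random.SeedSequence(base_seed)
state = ss.generate_state(4)  # array of four uint32 words

np.random.seed(state)  # the legacy global RNG accepts an array seed
```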
Philip Meier
759cfb7495 add missing comma to run_test.py (#57010)
Summary:
Factored out from https://github.com/pytorch/pytorch/pull/57008#discussion_r621137121:

> Without this comma, the strings are concatenated to `test_binary_ufuncstest_numpy_interop`

Pull Request resolved: https://github.com/pytorch/pytorch/pull/57010

Reviewed By: malfet

Differential Revision: D28028061

Pulled By: walterddr

fbshipit-source-id: 97c64b79a6aaaf0242def03c8808c1a032537258
2021-04-27 08:00:13 -07:00
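The underlying pitfall is Python's implicit concatenation of adjacent string literals, easy to reproduce:

```python
# Missing comma: the two literals silently merge into one string.
tests_broken = [
    "test_binary_ufuncs"
    "test_numpy_interop",
]
assert tests_broken == ["test_binary_ufuncstest_numpy_interop"]

# With the comma, the list has the two intended entries.
tests_fixed = [
    "test_binary_ufuncs",
    "test_numpy_interop",
]
assert tests_fixed == ["test_binary_ufuncs", "test_numpy_interop"]
```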
Jeffrey Wan
201ad938b2 Enable fixed fast_mode for complex (#55699)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55699

Todo:
- error message should be updated to say whether the failure is for fn's real or imaginary component

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28007887

Pulled By: soulitzer

fbshipit-source-id: 1819201f59c8586a1d9631db05983969438bde66
2021-04-27 07:54:19 -07:00
Jeffrey Wan
7fe6e8e5a2 Refactor C->C to C->R twice (#55692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55692

### Release notes
`get_numerical_jacobian` and `get_analytical_jacobian` only support `grad_out=1`, and their `fn` argument no longer accepts functions that return complex output

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28004614

Pulled By: soulitzer

fbshipit-source-id: 9592c9c69584b4035b39be62252f138dce39d3b5
2021-04-27 07:53:13 -07:00
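The idea of handling a C->C function as two C->R problems can be sketched numerically: write f(x + iy) = u(x, y) + i v(x, y) and differentiate u and v separately. A NumPy illustration (my own sketch, not the gradcheck code):

```python
import numpy as np

def f(z):
    return (1 + 2j) * z  # a C -> C function; its real Jacobian is [[1, -2], [2, 1]]

def numerical_jacobian(f, z0, eps=1e-6):
    # Treat f as two real-valued functions u, v of (x, y), where z = x + iy,
    # and build the 2x2 real Jacobian by central differences.
    J = np.zeros((2, 2))
    for col, dz in enumerate([eps, eps * 1j]):
        df = (f(z0 + dz) - f(z0 - dz)) / (2 * eps)
        J[0, col] = df.real  # du/dx (col 0) or du/dy (col 1)
        J[1, col] = df.imag  # dv/dx (col 0) or dv/dy (col 1)
    return J

J = numerical_jacobian(f, 0.5 + 0.25j)
assert np.allclose(J, [[1.0, -2.0], [2.0, 1.0]], atol=1e-4)
```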
anjali411
268cc117a8 Add OpInfos for torch.{complex, view_as_real, view_as_complex} (#56524)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56524

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D27909165

Pulled By: anjali411

fbshipit-source-id: 38592cdb357386549c8309792ef7c3218665d286
2021-04-27 07:40:46 -07:00
Heitor Schueroff
57e37080cd Added OpInfo for torch.einsum (#56276)
Summary:
Adds OpInfo testing for torch.einsum.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56276

Reviewed By: mruberry

Differential Revision: D27967095

Pulled By: heitorschueroff

fbshipit-source-id: 60524273d2ca885e7eeb932db3e7fd697ae5ca8e
2021-04-27 07:39:38 -07:00
Edward Yang
ab1457ad14 Remove C++17 only optional include (#56782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56782

Fixes #56749

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: H-Huang

Differential Revision: D28000019

Pulled By: ezyang

fbshipit-source-id: 87f86a402dac87e6c101aef8c78a928ce7d21340
2021-04-27 07:35:15 -07:00
Edward Yang
0d777a808c Make test_randperm work with meta device (#56976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56976

Band-aid fix for #54282

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Test Plan: Imported from OSS

Reviewed By: mruberry

Differential Revision: D28020401

Pulled By: ezyang

fbshipit-source-id: 50546d5275eade408d65e9c883999fb3b65ff55a
2021-04-27 07:26:58 -07:00
Joel Schlosser
f7fba854bf Implement module.to_empty() (#56610)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/54600

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56610

Reviewed By: malfet

Differential Revision: D27921653

Pulled By: jbschlosser

fbshipit-source-id: 10734b3eaa5b84bb4ba6eeba1043cfc8bb570a17
2021-04-27 06:19:54 -07:00
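Typical use is materializing a module whose parameters live on the `meta` device; a minimal sketch, assuming a PyTorch version with both the module `device=` kwarg and `to_empty()`:

```python
import torch

# Parameters on the meta device carry shape and dtype but no storage.
m = torch.nn.Linear(4, 4, device="meta")
assert m.weight.is_meta

# to_empty() recreates the parameters on a real device with uninitialized
# memory, ready to be loaded from a state_dict or re-initialized.
m = m.to_empty(device="cpu")
assert m.weight.device.type == "cpu"
```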
Bharat123rox
f2acdff73d DOC: Add note to mutating methods (#56877)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56243 by adding a note to mutating functions not following the trailing `_` convention in `torch/nn/modules/module.py`

I can also raise separate PRs for other files, if needed

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56877

Reviewed By: ezyang

Differential Revision: D28008856

Pulled By: jbschlosser

fbshipit-source-id: 63bfca0df05e49fceadd3167b1427dcb5542206a
2021-04-27 06:16:56 -07:00
Mike Ruberry
1145e2c6e2 Revert D27831996: ns for fx: move node I/O dtype mapping to be local instead of global
Test Plan: revert-hammer

Differential Revision:
D27831996 (93de80203d)

Original commit changeset: 782f5e77de0e

fbshipit-source-id: 6637ef4e8ba76fc4f2b3836ad1ed8d37ce040576
2021-04-27 01:01:08 -07:00
Mike Ruberry
45e96b5410 Revert D27833189: ns for fx: allow user functions in shadowing
Test Plan: revert-hammer

Differential Revision:
D27833189 (1917350977)

Original commit changeset: dac418e294d1

fbshipit-source-id: c6f58dac1a35806ea7d1dfb993d67e698196dee1
2021-04-27 01:01:06 -07:00
Mike Ruberry
982c72ac33 Revert D27836064: ns for fx: add fp16 function shadowing
Test Plan: revert-hammer

Differential Revision:
D27836064 (96a9eafcfb)

Original commit changeset: 37a434a04e2b

fbshipit-source-id: e85088f5e301e14a0fc9ac1f7241c2baaf0a957e
2021-04-27 01:01:04 -07:00
Mike Ruberry
90d554bd86 Revert D27857735: ns for fx: bug fix for shadowing fp16 emulation patterns
Test Plan: revert-hammer

Differential Revision:
D27857735 (f35540be38)

Original commit changeset: 7c1a067f035a

fbshipit-source-id: 6816223975b2e7b1f395e8894d17e3358fdb50ed
2021-04-27 01:01:02 -07:00
Mike Ruberry
abb8b6c1c1 Revert D27864296: ns for fx: support binary ops when adding unshadowed loggers for inputs
Test Plan: revert-hammer

Differential Revision:
D27864296 (c004346c88)

Original commit changeset: 3cbeb728297a

fbshipit-source-id: bc87cb707b14a0965452e9a1aa0d4e37ffbe5bf1
2021-04-27 01:01:01 -07:00
Mike Ruberry
cc8c5c1447 Revert D27886107: ns for fx: add option to skip matching classes and functions
Test Plan: revert-hammer

Differential Revision:
D27886107 (92c7aec5f5)

Original commit changeset: ec92c4f7ab71

fbshipit-source-id: 87d3b91c3d601f1706b61a2b2ce287a7b44f3d81
2021-04-27 01:00:59 -07:00
Mike Ruberry
5dc7a6b050 Revert D27960767: ns for fx: allow comparing int8 to int8 for functionals
Test Plan: revert-hammer

Differential Revision:
D27960767 (502c58ad84)

Original commit changeset: abc911ca4b9e

fbshipit-source-id: 9bb1aa9d0e764bfd2dd6745af897d958c054ef3a
2021-04-27 01:00:57 -07:00
Mike Ruberry
5db03b4109 Revert D27960766: ns for fx: additional bugfix for user defined functions
Test Plan: revert-hammer

Differential Revision:
D27960766 (9bd14da6e4)

Original commit changeset: 02935d2f400a

fbshipit-source-id: e7026c8637a591b6ffef288da8ef6306cdb9eb95
2021-04-27 00:59:57 -07:00
Andrew Millspaugh
a0483cd06b Back out "fx: Fix type_matches for Optional[List[int]] arguments" (#56991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56991

Original commit changeset: c5aa5f61a215

Diff: D27987746 (267b554b6f)

Test Plan: `buck test` under the glow-buck target, which is what this reversion is intended to fix

Reviewed By: jfix71

Differential Revision: D28019659

fbshipit-source-id: 37584ff404fc9195b309a5a6afdb4edbc2b4f088
2021-04-27 00:15:15 -07:00
Bert Maher
780f454297 Add some functions for manipulating mkldnn tensors to TORCH_API (#56954)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56954

Test Plan: Imported from OSS

Reviewed By: jamesr66a

Differential Revision: D28010327

Pulled By: bertmaher

fbshipit-source-id: 59872a40c7bc06187f0d87046446dd39193a1d71
2021-04-26 23:52:49 -07:00
Bert Maher
c42dd8b257 Revert "Use at::cpu in bench_approx (#56563)" (#56816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56816

This doesn't actually work.  For some reason the linker can't find
at::cpu::logit_out, and it's not worth digging into why not.

Test Plan: Imported from OSS

Reviewed By: ZolotukhinM

Differential Revision: D27977406

Pulled By: bertmaher

fbshipit-source-id: d0235a393f25243e2c8a011e9baf267daf483ae4
2021-04-26 23:51:49 -07:00
Ilia Cherniavskii
38bb0ac3e8 [profiler] Add cuda synchronization points (#56651)
Summary:
Adding cuda synchronization when entering and exiting the profiler
context manager

Pull Request resolved: https://github.com/pytorch/pytorch/pull/56651

Test Plan: CI

Reviewed By: gdankel

Differential Revision: D27926270

Pulled By: ilia-cher

fbshipit-source-id: 5cf30128590c1c71a865f877578975c4a6e2cb48
2021-04-26 23:21:05 -07:00
Pritam Damania
dc8a8cea79 Move caffe2 signal_handler to c10. (#56717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717

The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.

I've fixed this by moving it to the c10 namespace, where now both C2 and PyTorch
can use it.

The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility for C2, but most of the common code is moved to c10.
ghstack-source-id: 127446929

Test Plan: waitforbuildbot

Reviewed By: ezyang

Differential Revision: D27946738

fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
2021-04-26 23:08:12 -07:00
Lucas Hosseini
6ed5bbfb46 [TensorPipe] Give higher priority to CPU-only channels. (#56908)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56908

CUDA channels might implement CPU-to-CPU transfers, but will usually be
less efficient for that purpose.

Test Plan: CI

Reviewed By: lw

Differential Revision: D27994069

fbshipit-source-id: fefa7f243eb43cf769864233df518f2a1819f949
2021-04-26 22:27:44 -07:00
Edvard Ghazaryan
a09bbe73fd static runtime support for fb::equally_split (#56812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56812

fb::equally_split gets fused with ListUnpack, and all outputs from ListUnpack get attached to fb::equally_split.
So fb::equally_split will have as many outputs as ListUnpack.

Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/torch/fb/sparsenn:test -- test_equally_split_op

Reviewed By: hlu1

Differential Revision: D27974999

fbshipit-source-id: b2ca19ff86aec76b977c1e3cfc56567adab66b35
2021-04-26 20:18:09 -07:00
Yi Wang
35f3feca28 [RPC Framework] Supporting reading the input from the remote worker (#56943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56943

If the module is placed on a CUDA device, then all the CPU tensors in `args` and `kwargs` will also be implicitly moved to the same CUDA device to run forward.

Currently we still need to move the forward output from the CUDA device back to CPU, until:
1) Process group RPC backend is completely deprecated, and we always use TensorPipe RPC backend;
2) A device map is explicitly provided to TensorPipe RPC backend.

These steps will be done in a separate PR.

#Original PR issue: https://github.com/pytorch/pytorch/issues/51670
ghstack-source-id: 127457584

Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device_script

buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule

buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- --exact 'caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test - test_load_di_parts (caffe2.torch.fb.training_toolkit.applications.sparse_nn.batch_distributed_inference.tests.batch_distributed_inference_test.BatchDistributedInferenceTest)'

Reviewed By: wanchaol

Differential Revision: D27934791

fbshipit-source-id: de27e27b905db83cc52800e63684fc6c942e9dc7
2021-04-26 20:04:06 -07:00
Meghan Lele
3721e01d60 Port adaptive_max_pool3d_backward to structured kernel (#56800)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56800

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27984077

Pulled By: SplitInfinity

fbshipit-source-id: 1425ae741474128f3aacd032d7f926ce5ea81101
2021-04-26 20:01:09 -07:00
Meghan Lele
77e3f5d73d Port adaptive_max_pool2d_backward to structured kernel (#56799)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/56799

Test Plan: Imported from OSS

Reviewed By: ezyang

Differential Revision: D27984078

Pulled By: SplitInfinity

fbshipit-source-id: 6404513f413fc6966687d8f1e9ea2a423a332ec9
2021-04-26 20:00:07 -07:00
Guilherme Leobas
e7c79cb158 Add type annotations to nnapi (#48142)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48141

~Mypy is complaining about a missing arg in a function call.~
```bash
torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary"  [call-arg]
Found 1 error in 1 file (checked 1140 source files)
```

9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)

~dreiss, would you mind taking a look when you have some cycles to spare and see what would be the appropriate value for `fuse_code` here? Thanks :)~

Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142

Reviewed By: ezyang

Differential Revision: D28006249

Pulled By: walterddr

fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c
2021-04-26 19:08:07 -07:00