Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56984
This is a preparation PR before we can create CUDAFuture in rref_impl.cpp.
The solution is to add a `FutureFactoryRegistry` in `rpc/utils.*`. The
TensorPipe RPC agent is responsible for registering the `CUDAFuture` and
`ivalue::Future` factories. The reason we need this change, instead of
directly using the `USE_CUDA` macro in the RRef files, is as follows. There
are three build targets: `torch_cpu`, `torch_cuda`, and `torch_python`.
`torch_python` is built on top of the other two. `torch_cpu` is CPU-only, so
it contains no CUDA-related code and hence no `USE_CUDA` macro. The
`tensorpipe_*` files are in `torch_python`, which does have access to CUDA.
However, the RRef source files are in `torch_cpu`, which cannot contain CUDA
code. The recommended solution is dynamic dispatch, hence this PR (see the
sketch below).
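A minimal sketch of what such a registry could look like, assuming a hypothetical `FutureFactoryRegistry` class (names and signatures here are illustrative, not necessarily the exact ones this PR adds):
```cpp
#include <functional>
#include <utility>

#include <ATen/core/ivalue.h>

// torch_cpu only sees this abstract factory type; a CUDA-aware library
// (e.g. the TensorPipe agent) swaps in a richer factory at load time.
class FutureFactoryRegistry {
 public:
  using factory_t =
      std::function<c10::intrusive_ptr<c10::ivalue::Future>(c10::TypePtr)>;

  static FutureFactoryRegistry& getInstance() {
    static FutureFactoryRegistry registry;
    return registry;
  }

  void registerFactory(factory_t factory) {
    factory_ = std::move(factory);
  }

  c10::intrusive_ptr<c10::ivalue::Future> createFuture(c10::TypePtr type) {
    return factory_(type);
  }

 private:
  // Default factory builds a plain ivalue::Future; torch_cuda/torch_python
  // can replace it with one that builds a CUDAFuture.
  factory_t factory_ = [](c10::TypePtr type) {
    return c10::make_intrusive<c10::ivalue::Future>(std::move(type));
  };
};
```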
Test Plan: Imported from OSS
Reviewed By: lw
Differential Revision: D28020917
Pulled By: mrshenli
fbshipit-source-id: e67c76a273074aebb61877185cc5e6bc0a1a5448
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56444
Added an out variant for `layer_norm`.
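For context, ATen out variants write into caller-provided tensors instead of allocating new ones. A hedged sketch of what such a signature could look like, mirroring the functional `native_layer_norm` (the exact signature added here may differ):
```cpp
#include <ATen/ATen.h>

// Hypothetical out-variant declaration: same inputs as native_layer_norm,
// but output, mean, and rstd are preallocated by the caller.
std::tuple<at::Tensor&, at::Tensor&, at::Tensor&> native_layer_norm_out(
    at::Tensor& out,
    at::Tensor& mean_out,
    at::Tensor& rstd_out,
    const at::Tensor& input,
    at::IntArrayRef normalized_shape,
    const c10::optional<at::Tensor>& weight,
    const c10::optional<at::Tensor>& bias,
    double eps);
```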
Test Plan:
buck test caffe2/aten:math_kernel_test -- NativeLayerNorm
buck test caffe2/benchmarks/static_runtime:static_runtime_cpptest
Reviewed By: hlu1
Differential Revision: D27873846
fbshipit-source-id: 53ee9fec4ff9a4e78198b031e86b5afd013626dd
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46702
- fails on probability distributions with an odd number of items
- tries to access an `acc_type` (`float`) in memory aligned for `scalar_t` (`float16`)
- produces unrepeatable results for large input tensors
- the parallel cumsum is not monotonic at some positions
### Fixes
- computing the cumsum on `acc_type` (`float`) instead of `scalar_t` (`float16`) fixed both issues
- the non-monotonic behavior may still happen even with `float`, though
- in these cases, deterministic behavior may be achieved by eliminating the race condition when writing the result, using the atomic function `atomicMax` (see the sketch below)
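A minimal host-side sketch of the accumulation-type part of the fix, assuming `scalar_t` is the half-precision input type and `acc_t` its float accumulation type (the real code is a CUDA kernel, where ties between threads are additionally resolved with `atomicMax` on the chosen bucket index):
```cpp
#include <cstddef>
#include <vector>

// Accumulate in acc_t (float) rather than scalar_t (float16); summing in
// float16 loses precision and produced the non-monotonic partial sums.
template <typename scalar_t, typename acc_t>
std::vector<acc_t> cumsum_acc(const std::vector<scalar_t>& probs) {
  std::vector<acc_t> out(probs.size());
  acc_t running = acc_t(0);
  for (std::size_t i = 0; i < probs.size(); ++i) {
    running += static_cast<acc_t>(probs[i]);
    out[i] = running;  // monotonic up to float rounding
  }
  return out;
}
```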
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55364
Reviewed By: mruberry
Differential Revision: D28031666
Pulled By: ngimel
fbshipit-source-id: 0fc6289e0b9ea2d31ef3771e7ca370de8f5c02de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56704
This is a re-submission of PR https://github.com/pytorch/pytorch/pull/54175.
Main changes compared to the original PR:
- Switch to including `<ATen/cuda/cub.cuh>`.
- Use `CUB_WRAPPER` to reduce boilerplate code (see the sketch below).
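For context, CUB device-wide algorithms follow a two-call protocol: the first call with a null buffer only reports the scratch size, and the second call does the work. That is the boilerplate a wrapper like `CUB_WRAPPER` typically hides. A hedged sketch of the raw pattern, using `cub::DeviceScan::InclusiveSum` as an example (not necessarily the op this PR touches):
```cpp
#include <cub/cub.cuh>

#include <ATen/cuda/CUDAContext.h>
#include <c10/cuda/CUDACachingAllocator.h>

// d_in/d_out are device pointers and n the element count (illustrative).
void inclusive_sum(const float* d_in, float* d_out, int n) {
  size_t temp_bytes = 0;
  cudaStream_t stream = at::cuda::getCurrentCUDAStream();
  // First call with a null buffer only computes temp_bytes.
  cub::DeviceScan::InclusiveSum(nullptr, temp_bytes, d_in, d_out, n, stream);
  auto temp = c10::cuda::CUDACachingAllocator::get()->allocate(temp_bytes);
  // Second call performs the actual scan using the scratch space.
  cub::DeviceScan::InclusiveSum(temp.get(), temp_bytes, d_in, d_out, n, stream);
}
```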
Test Plan:
Will check CI status to make sure everything passes.
Added a unit test.
Reviewed By: ngimel
Differential Revision: D27941257
fbshipit-source-id: 24a0e0c7f6c46126d2606fe42ed03dca15684415
Summary:
This PR tries to make the docs of `torch.linalg`:
- More uniform in notation and structure across every function.
- More uniform in the use of backquotes and the `:attr:` directive.
- More readable for a non-specialised audience, by explaining the form the factorisations take and when it is beneficial to use which arguments in some solvers.
- More connected among the different functions, through the `.. seealso::` directive.
- More informative about when gradients explode, when a function silently returns a wrong result, and when things do not work in general.
I tried to follow the structure of "one short description and then the rest" to be able to format the docs like those of `torch.` or `torch.nn`. I did not do that yet, as I am waiting for the green light on this idea:
https://github.com/pytorch/pytorch/issues/54878#issuecomment-816636171
What this PR does not do:
- Clean the documentation of other functions that are not in the `linalg` module (although I started doing this for `torch.svd`, but then I realised that this PR would touch way too many functions).
Fixes https://github.com/pytorch/pytorch/issues/54878
cc mruberry IvanYashchuk
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56265
Reviewed By: H-Huang
Differential Revision: D27993986
Pulled By: mruberry
fbshipit-source-id: adde7b7383387e1213cc0a6644331f0632b7392d
Summary:
According to `vecLib.framework/Headers/clapack.h`, Accelerate.framework's LAPACK implementation is based on 3.2.1, and so LRWORK should be computed using the following formula (from the LAPACK 3.2.1 documentation):
```
*> If JOBZ = 'N', LRWORK >= 7*min(M,N).
*> Otherwise,
*> LRWORK >= min(M,N)*max(5*min(M,N)+7,2*max(M,N)+2*min(M,N)+1)
```
Found while looking at `test_linalg.py` crashes on M1, but it would have happened on x86_64 as well if PyTorch with the Accelerate framework were tested there.
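For concreteness, the rule above translates to something like this (a minimal sketch; the helper name is hypothetical, and `jobz`, `m`, `n` follow the LAPACK convention):
```cpp
#include <algorithm>
#include <cstdint>

// Real workspace size per the LAPACK 3.2.1 rule quoted above.
int64_t compute_lrwork(char jobz, int64_t m, int64_t n) {
  int64_t mn = std::min(m, n);
  int64_t mx = std::max(m, n);
  return jobz == 'N'
      ? 7 * mn
      : mn * std::max(5 * mn + 7, 2 * mx + 2 * mn + 1);
}
```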
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56847
Reviewed By: albanD
Differential Revision: D27983352
Pulled By: malfet
fbshipit-source-id: f757c515c85b32c1e09d00a91bc20fe4b390a75a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56824
This PR adds 6 dispatch keys to be used for prototyping.
I'm not sure what the best way to name these is; please let me know if
you think they should have the same prefix.
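As a reminder of how such keys are exercised, here is a hedged sketch of routing an op through a dispatch key; `PrivateUse1` is an existing placeholder key and stands in for whatever these new keys end up being named:
```cpp
#include <ATen/ATen.h>
#include <torch/library.h>

// Hypothetical backend kernel; a real one would compute on the custom
// backend rather than calling back into aten (which would re-dispatch).
at::Tensor my_add(
    const at::Tensor& self,
    const at::Tensor& other,
    const c10::Scalar& alpha) {
  TORCH_CHECK(false, "my_add: placeholder kernel for this sketch");
  return self;  // unreachable, keeps the signature well-formed
}

// Route aten::add.Tensor to my_add for tensors carrying PrivateUse1.
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("add.Tensor", my_add);
}
```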
Test Plan: - wait for tests
Reviewed By: driazati
Differential Revision: D27999963
Pulled By: zou3519
fbshipit-source-id: 0c3ef4788854f7a93d077cc454b773a6eedbbc22
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56945
In preparation to turn these on for CI
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: walterddr
Differential Revision: D28018454
Pulled By: seemethere
fbshipit-source-id: fa94d666499877f2cdd7b8fd3fc8b2d8127f61e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56941
Sets the custom test binaries we build in .jenkins/pytorch/build.sh to
be built in the `build` directory instead of the directory above the
workspace.
This should alleviate any weirdness we were seeing before with test
binaries having to be overwritten
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28018453
Pulled By: seemethere
fbshipit-source-id: 74add11037a622e011d00fb6292bfe20e1d55d9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56797
After adding a default seeding strategy for the NumPy random module within each DataLoader worker (#56488), two concerns were raised:
- We dropped support for NumPy < 1.17 due to `SeedSequence`.
- To support seeding for NumPy < 1.17, how can we provide a seed for `numpy.random`?
  - The first option is to set the same seed as `random`. The problem is that `numpy.random` and `random` share the same algorithm, so with the same seed they produce exactly the same state sequence. Thanks to rkern, we noticed these so-called [bad things](https://github.com/PyTorchLightning/pytorch-lightning/pull/6960#issuecomment-818393659).
  - Since most users are not aware of this problem, we can provide a better default seed for `numpy.random` using the same `SeedSequence` algorithm as NumPy. This is just a workaround: a hard-coded function generates an array of four int32 values as the seed.
To better cope with this problem, since many third-party libraries besides NumPy have their own random modules, we may eventually need to implement a `SeedSequence` within the `torch.random` module, so that users can `spawn` a new `SeedSequence` for each library.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28000619
Pulled By: ejguan
fbshipit-source-id: 5701c8124a38ea5ded69eb8eee70f9680877ffa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55699
Todo:
- error message should be updated to say whether the failure is for fn's real or imaginary component
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28007887
Pulled By: soulitzer
fbshipit-source-id: 1819201f59c8586a1d9631db05983969438bde66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/55692
### Release notes
`get_numerical_jacobian` and `get_analytical_jacobian` only support `grad_out=1`, and `fn` no longer accepts functions that return complex output.
Test Plan: Imported from OSS
Reviewed By: H-Huang
Differential Revision: D28004614
Pulled By: soulitzer
fbshipit-source-id: 9592c9c69584b4035b39be62252f138dce39d3b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56976
Band-aid fix for #54282
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D28020401
Pulled By: ezyang
fbshipit-source-id: 50546d5275eade408d65e9c883999fb3b65ff55a
Summary:
Fixes https://github.com/pytorch/pytorch/issues/56243 by adding a note to mutating functions that do not follow the trailing `_` convention in `torch/nn/modules/module.py`.
I can also raise separate PRs for other files, if needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56877
Reviewed By: ezyang
Differential Revision: D28008856
Pulled By: jbschlosser
fbshipit-source-id: 63bfca0df05e49fceadd3167b1427dcb5542206a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56991
Original commit changeset: c5aa5f61a215
Diff: D27987746 (267b554b6f)
Test Plan: `buck test` under the glow-buck target, which is the target this reversion is intended to fix.
Reviewed By: jfix71
Differential Revision: D28019659
fbshipit-source-id: 37584ff404fc9195b309a5a6afdb4edbc2b4f088
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56816
This doesn't actually work. For some reason the linker can't find
`at::cpu::logit_out`, and it's not worth digging into why.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D27977406
Pulled By: bertmaher
fbshipit-source-id: d0235a393f25243e2c8a011e9baf267daf483ae4
Summary:
Adds CUDA synchronization when entering and exiting the profiler
context manager.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56651
Test Plan: CI
Reviewed By: gdankel
Differential Revision: D27926270
Pulled By: ilia-cher
fbshipit-source-id: 5cf30128590c1c71a865f877578975c4a6e2cb48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56717
The signal_handler was under the caffe2 namespace but was being used
by PyTorch as well.
I've fixed this by moving it to the c10 namespace, where both C2 and PyTorch
can now use it.
The signal_handler interface in caffe2/utils/signal_handler.h is kept the same
for backward compatibility for C2, but most of the common code is moved to c10 (see the sketch below).
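A common way to keep the old interface intact while relocating the implementation looks roughly like this (a hedged sketch; the actual header paths and class names may differ):
```cpp
// c10/util/signal_handler.h -- new home for the shared implementation.
namespace c10 {
class SignalHandler {
  // ... common signal handling code shared by C2 and PyTorch ...
};
} // namespace c10

// caffe2/utils/signal_handler.h -- kept so existing C2 call sites compile.
#include <c10/util/signal_handler.h>
namespace caffe2 {
// The old name resolves to the moved implementation.
using SignalHandler = c10::SignalHandler;
} // namespace caffe2
```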
ghstack-source-id: 127446929
Test Plan: waitforbuildbot
Reviewed By: ezyang
Differential Revision: D27946738
fbshipit-source-id: d6228d1a0108f4c807d405e7a0bb799c5375388f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56908
CUDA channels might implement CPU-to-CPU transfers, but will usually be
less efficient for that purpose.
Test Plan: CI
Reviewed By: lw
Differential Revision: D27994069
fbshipit-source-id: fefa7f243eb43cf769864233df518f2a1819f949
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56812
`fb::equally_split` gets fused with `ListUnpack`, and all outputs from `ListUnpack` get attached to `fb::equally_split`.
So `fb::equally_split` will have as many outputs as `ListUnpack`.
Test Plan:
buck test caffe2/benchmarks/static_runtime/fb:test_fb_operators
buck test caffe2/torch/fb/sparsenn:test -- test_equally_split_op
Reviewed By: hlu1
Differential Revision: D27974999
fbshipit-source-id: b2ca19ff86aec76b977c1e3cfc56567adab66b35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/56943
If the module is placed on a CUDA device, then all the CPU tensors in `args` and `kwargs` will also be implicitly moved to the same CUDA device to run forward.
Currently we still need to move the forward output from the CUDA device back to CPU, until:
1) the process group RPC backend is completely deprecated and we always use the TensorPipe RPC backend; or
2) a device map is explicitly provided to the TensorPipe RPC backend.
These steps will be done in a separate PR.
Original PR issue: https://github.com/pytorch/pytorch/issues/51670
ghstack-source-id: 127457584
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- test_input_moved_to_cuda_device_script
buck test mode/dev-nosan caffe2/test/distributed/rpc:process_group_agent -- RemoteModule
buck test mode/dev-nosan //caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test -- --exact 'caffe2/torch/fb/training_toolkit/applications/sparse_nn/batch_distributed_inference/tests:batch_distributed_inference_test - test_load_di_parts (caffe2.torch.fb.training_toolkit.applications.sparse_nn.batch_distributed_inference.tests.batch_distributed_inference_test.BatchDistributedInferenceTest)'
Reviewed By: wanchaol
Differential Revision: D27934791
fbshipit-source-id: de27e27b905db83cc52800e63684fc6c942e9dc7
Summary:
Fixes https://github.com/pytorch/pytorch/issues/48141
~Mypy is complaining about a missing arg in a function call.~
```bash
torch/backends/_nnapi/serializer.py:806: error: Too few arguments for "_do_add_binary" [call-arg]
Found 1 error in 1 file (checked 1140 source files)
```
9392137dbe/torch/backends/_nnapi/serializer.py (L804-L806)
~dreiss, would you mind taking a look when you have some cycles to spare, and seeing what the appropriate value for `fuse_code` would be here? Thanks :)~
Edit: https://github.com/pytorch/pytorch/issues/48925 got merged a couple of days ago. The blocking part is now unblocked, and I just pushed the changes to make mypy happy again. This PR is ready for review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48142
Reviewed By: ezyang
Differential Revision: D28006249
Pulled By: walterddr
fbshipit-source-id: 5e43eeba7143512a549efaad31541f86718add7c