Summary:
This PR enables autodiff to use the forward/backward graph compiled from Python code, instead of using symbolic gradients (modifying the original graph directly).
We put the map in a separate .h file for now, pending the native_functions.yaml and derivatives.yaml merge; this should ideally go into native_functions.yaml eventually.
This PR should be enough to unblock us for now; we can start writing gradients for aten functions in Python.
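As a rough illustration (not the exact format stored in the map), a Python-defined gradient of this style pairs a forward with a backward closure, both of which get compiled into the forward/backward graphs that autodiff consumes; the `add` sketch below is hypothetical:
```python
# Hypothetical sketch of a Python-defined gradient: the forward returns its
# result together with a backward closure. The exact registration format
# lives in the separate .h map mentioned above and may differ.
def add(self, other):
    def backward(grad_output):
        # gradient of a + b w.r.t. both inputs is just the incoming gradient
        return grad_output, grad_output
    return self + other, backward
```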
Differential Revision: D13494635
Pulled By: ailzhang
fbshipit-source-id: f8d51a15243ac46afd09d930c573ccdfcd9fdaaf
Summary:
Modified step_lr for StepLR, MultiStepLR, ExponentialLR and CosineAnnealingLR. In this way, multiple schedulers can be used simultaneously to modify the learning rates.
Related issue: https://github.com/pytorch/pytorch/issues/13022
Added unit tests combining multiple schedulers.
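For illustration, a minimal sketch of chaining two schedulers on the same optimizer (model and hyperparameters are arbitrary):
```python
import torch
from torch.optim.lr_scheduler import StepLR, ExponentialLR

model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
step = StepLR(optimizer, step_size=3, gamma=0.1)
exp = ExponentialLR(optimizer, gamma=0.9)

for epoch in range(10):
    optimizer.step()
    # both schedulers now adjust the same optimizer's learning rate each epoch
    step.step()
    exp.step()
```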
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14010
Reviewed By: ezyang
Differential Revision: D13494941
Pulled By: chandlerzuo
fbshipit-source-id: 7561270245639ba1f2c00748f8e4a5f7dec7160c
Summary:
Changelog:
- Change some expect tests that didn't have to be expect tests; use self.assertAllFused instead.
- Some of the fuser tests weren't using self.assertAllFused.
- Minor test renames
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15134
Differential Revision: D13507481
Pulled By: zou3519
fbshipit-source-id: dd0788530a60bb5ed2f42b961fae3db2b4404b64
Summary:
max and reducemax are smashed together, so we need to support the single-input case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15241
Reviewed By: yinghai
Differential Revision: D13473312
Pulled By: houseroad
fbshipit-source-id: 9b8c847286a2631b006ca900271bc0d26574101a
Summary:
This PR changes Method (just Method, not all graphs) to always have a single
return argument.
This is part 1 in a set of changes that will enable us to have better handling of early return statements.
The simplification that this change provides greatly reduces the work for the next step.
This change makes it so that Method and Python handle multiple returns in the same way:
* 0 - None
* 1 - <single value>
* many - Tuple[...]
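For example (a minimal sketch, not taken from this PR), scripted functions follow the same convention as Python:
```python
import torch

@torch.jit.script
def zero(x):
    pass                 # 0 returns -> None

@torch.jit.script
def one(x):
    return x + 1         # 1 return  -> the single value

@torch.jit.script
def many(x):
    return x, x * 2      # many      -> Tuple[Tensor, Tensor]
```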
The result is that a lot of special-case handling in compiler.cpp and its
bindings can be removed. It also fixes several bugs in return handling,
including one where return values were not always checked against their
attributed values.
Notes:
* inferTypeFrom is renamed to be more accurate and discourage use.
* This has uncovered some bugs in other components, which are noted in
the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15289
Differential Revision: D13481649
Pulled By: zdevito
fbshipit-source-id: 0e2242a40bb28cca2d0e8be48bede96195e4858c
Summary:
There is still a limitation on this: if a script module is somewhere
in the trace, the inputs/outputs can only be tensors or tuples of
tensors.
Resolves #15052
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15184
Differential Revision: D13457691
Pulled By: highker
fbshipit-source-id: 8fe46afc41357a0eb8eadd83f687b31d074deb0e
Summary:
mean is calculated in two steps: sum()/numel() (see #12115). For half precision, data gets
cast back to half after sum().
We fused the division into the reduction kernel by adding pre_op/post_op.
This allows torch.ones(65536).cuda().half().mean() to return the correct
result.
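A small illustration of the failure mode this fixes (assuming a CUDA device is available):
```python
import torch

x = torch.ones(65536, device="cuda").half()
# The intermediate sum (65536) does not fit in fp16 (max finite value 65504),
# so computing mean as sum()/numel() and casting back to half gave inf.
# With the division fused into the reduction kernel, this prints 1.0.
print(x.mean())
```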
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14878
Differential Revision: D13491159
Pulled By: soumith
fbshipit-source-id: e83802e1628b6d2615c45e18d7acf991d143a09e
Summary:
Fixes an issue that arose from https://github.com/pytorch/pytorch/pull/13481 where `.shared_memory()` couldn't be called. Effectively undoes all changes to `nn.Module` from that PR and solves the relevant problem in a different way (the goal was to be able to call `._apply()` on the Python wrapper for a C++ module).
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15305
Differential Revision: D13493937
Pulled By: goldsborough
fbshipit-source-id: 4cb8687f90fc8709a536c5e7eacd0dc8edf6f750
Summary:
Addresses #918; interpolation results should be similar to TF.
* Adds bicubic interpolation operator to `nn.functional.interpolate`
* Corresponding test in `test_nn.py`
The operator is added in legacy `TH` to be aligned with the other upsampling operators; they can be refactored/moved to ATen all at once when #10482 is resolved
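A minimal usage sketch of the new mode:
```python
import torch
import torch.nn.functional as F

x = torch.rand(1, 3, 8, 8)
# bicubic upsampling on a 4D (N, C, H, W) input
y = F.interpolate(x, scale_factor=2, mode='bicubic', align_corners=False)
print(y.shape)  # torch.Size([1, 3, 16, 16])
```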
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9849
Differential Revision: D9007525
Pulled By: driazati
fbshipit-source-id: 93ef49a34ce4e5ffd4bda94cd9a6ddc939f0a4cc
Summary:
This PR adds isinstance to do static type checking in the JIT.
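A rough sketch of what this enables (assuming the check is resolved statically against the declared type of the argument):
```python
import torch

@torch.jit.script
def is_tensor(x: torch.Tensor) -> bool:
    # the isinstance check is evaluated at compile time from the declared
    # type of x, so this compiles to a constant
    return isinstance(x, torch.Tensor)
```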
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15076
Differential Revision: D13471067
Pulled By: wanchaol
fbshipit-source-id: d39b7ed5db9fcca4b503659d02cf7795950ea8ea
Summary:
`torch.expand` and `torch.ne` are used often in models and this PR adds ONNX export support for them. ArmenAg has created issue https://github.com/pytorch/pytorch/issues/10882 for this.
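A minimal export sketch (module and shapes are arbitrary):
```python
import io
import torch

class ExpandNe(torch.nn.Module):
    def forward(self, x, y):
        # both expand and ne now map to ONNX ops during export
        return x.expand(2, 3).ne(y)

f = io.BytesIO()
torch.onnx.export(ExpandNe(), (torch.zeros(1, 3), torch.ones(2, 3)), f)
```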
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15050
Differential Revision: D13453036
Pulled By: houseroad
fbshipit-source-id: 4724b4ffcebda6cd6b2acac51d6733cb27318daf
Summary:
`rsplit` doesn't accept keyword arguments in Python 2, so this line raises an error.
Fixes #15135
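For reference, the Python 2 vs. Python 3 behavior being worked around:
```python
"a.b.c".rsplit(".", maxsplit=1)   # Python 3: ['a.b', 'c']
"a.b.c".rsplit(".", maxsplit=1)   # Python 2: TypeError, rsplit() takes no keyword arguments
"a.b.c".rsplit(".", 1)            # works on both: ['a.b', 'c']
```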
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12732
Differential Revision: D10458630
Pulled By: driazati
fbshipit-source-id: a63e42fbc0e39e4291480775b516c98122ec05a1
Summary:
This makes DCE more granular by tracking live values/aliases through the graph (rather than just nodes). So we can be more aggressive in DCE around control flow blocks. For example, in:
```
%a0 = aten::foo()
%b = aten::foo()
%a2, %b2 = prim::If(%cond) {
  block0() {
    %a1 = aten::foo(%a0)
    %b1 = aten::foo(%b)
  } -> (%a1, %b1)
}
return (%a2)
```
we will now dce all the `%b` stuff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14910
Differential Revision: D13476445
Pulled By: suo
fbshipit-source-id: 2bf5db19711c07dde946697a4f4b270bd8baf791
Summary:
We need this, for example, to properly call `_unpack` when we have a traced module in the hierarchy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15101
Differential Revision: D13468467
Pulled By: jamesr66a
fbshipit-source-id: c2b6740b12cde6e23395d12e42d4fc2c4c7ca3f2
Summary:
Tests work on ROCm 1.9.2 as present on CI (fp16 bringup, hipMemset, and sparse improvements).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15232
Differential Revision: D13470991
Pulled By: bddppq
fbshipit-source-id: 45acc4f9ea5baaaf7672b86eb022948055779925
Summary:
Methods like `module.named_modules()` return a container of `shared_ptr<nn::Module>`. Currently the `nn::Module` base class does not have Python bindings. This PR fixes this and adds more unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15193
Differential Revision: D13458713
Pulled By: goldsborough
fbshipit-source-id: 4091fe1b96a1be8db14c6a4307fbacc2b41ff6fe
Summary:
Adding support for torch.tensor in script.
The input list is typed as t[] because it can be arbitrarily nested. I added a compile-time check that the inner type of the list is a bool, float, or int.
Also adds specialization for boolean lists, which already existed at the IValue level but had not been added to the compiler yet.
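A small sketch of what now compiles (illustrative, not taken from the PR's tests):
```python
import torch

@torch.jit.script
def make_tensors():
    a = torch.tensor([[1, 2], [3, 4]])   # nested int list
    b = torch.tensor([True, False])      # bool list specialization
    c = torch.tensor([1.0, 2.5])         # float list
    return a, b, c
```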
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14913
Differential Revision: D13407930
Pulled By: eellison
fbshipit-source-id: d17f1195a22149d5b0d08d76c89a7fab8444f7c5
Summary:
This PR fixes around 250 places in the codebase where we were making unnecessary copies of objects (some large, some small).
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15026
Differential Revision: D13458784
Pulled By: goldsborough
fbshipit-source-id: be5148b2ce09493588d70952e6f6d6ff5ec5199b
Summary:
This PR removes the usage of _finfo defined in torch.distributions.utils and changes the call sites
to use torch.finfo instead
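For reference, a minimal usage of torch.finfo:
```python
import torch

finfo = torch.finfo(torch.float32)
print(finfo.eps)   # machine epsilon: smallest representable step from 1.0
print(finfo.tiny)  # smallest positive normal number
print(finfo.max)   # largest finite value
```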
Differential Revision: D13451936
Pulled By: soumith
fbshipit-source-id: 6dbda3a6179d9407bc3396bf1a2baf3e85bc4cf2
Summary:
This PR enables C++ frontend modules to be bound into Python and added as submodules of Python modules. For this, I added lots of pybind11 bindings for the `torch::nn::Module` class, and modified the `torch.nn.Module` class in Python to have a new Metaclass that makes `isinstance(m, torch.nn.Module)` return true when `m` is a C++ frontend module. The methods and fields of C++ modules are bound in such a way that they work seamlessly as submodules of Python modules for most operations (one exception I know of: calling `.to()` ends up calling `.apply()` on each submodule with a Python lambda, which cannot be used in C++ -- this may require small changes on Python side).
I've added quite a bunch of tests to verify the bindings and equality with Python. I think I should also try out adding a C++ module as part of some large PyTorch module, like a WLM or something, and see if everything works smoothly.
The next step for inter-op across our system is ScriptModule <-> C++ Frontend Module inter-op. I think this will then also allow using C++ frontend modules from TorchScript.
apaszke zdevito
CC dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13481
Differential Revision: D12981996
Pulled By: goldsborough
fbshipit-source-id: 147370d3596ebb0e94c82cec92993a148fee50a7
Summary:
Before this PR, loop unrolling + the graph fuser was creating multiple
FusionGroups with the same bodies (with different variable names) for
JIT LSTMs. Each FusionGroup got registered to a separate fusion key;
each key resulted in a different compilation for the same
specializations.
This PR makes it so that when registering FusionGroups with the fusion
compiler, the compiler first checks the KernelSpec cache to see if the
FusionGroup's graph exists already. If it does, then return the
corresponding KernelSpec's key to share compiled kernels.
In addition, graphs in the KernelSpec cache are canonicalized before
being cached. I added a flag to the canonicalize pass to remove unique
names of values.
This shortens the compile time for a JIT LSTM (seq_len of 100, loop
unroll factor of 8) from 5.3s to 2.3s. Most of this compile time is
running the graph fuser and/or fusion compiler; while this PR
makes it so that there is only one unique kernel in the forward pass,
there are a lot of different kernels (6) in the backward pass
(after loop unrolling) that should be investigated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14541
Differential Revision: D13324487
Pulled By: zou3519
fbshipit-source-id: b841d82ed35a959b5cfc72db033bf5a7b42cc4fb
Summary:
Certain tensor shapes failed when being resized. This pull request addresses the bug found in #13404.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14874
Differential Revision: D13429788
Pulled By: soumith
fbshipit-source-id: 8aa6451dbadce46d6d1c47a01cb26e6559bcfc8c
Summary:
* relax MIOpen if statement to allow fp16/fp32 mixed precision training now supported by ROCm 1.9.2
* use gemm_ex API of rocBLAS in ROCm 1.9.2 instead of the previous hgemm API
* with this: enable all but one half test in test_nn
While there, fix also:
* a group convolution issue with MIOpen, pertaining to properly initializing MIOpen on multi-GPU systems, that we detected while working on this
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14994
Differential Revision: D13439869
Pulled By: bddppq
fbshipit-source-id: 75e4eb51a59488882e64b5eabdc30555b25be25e
Summary:
This is an optimized implementation that does the following:
1. Create an empty Tensor of the correct size.
2. Fill the Tensor with the correct values.
The following three designs to fill in the Tensor result in roughly the same performance. Hence, the 2nd option is taken for simpler code, and to return contiguous tensors.
1. Sequential: fill row coordinates first, then columns. This results in two for-loops and more arithmetic operations.
2. Interleaved: fill in index coordinates one by one, which jumps between the two output Tensor rows in every iteration.
3. Transpose: create an n x 2 Tensor, fill the Tensor sequentially, and then transpose it.
<img width="352" alt="screen shot 2018-12-10 at 3 54 39 pm" src="https://user-images.githubusercontent.com/16999635/49769172-07bd3580-fc94-11e8-8164-41839185e9f9.png">
NOTE:
This implementation returns a 2D tensor, instead of a tuple of two tensors. It means that users will not be able to do the following:
```python
x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i] # need to first convert the 2D tensor into a tuple of two 1D tensors.
```
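One way to do the conversion (a usage sketch):
```python
import torch

x = torch.ones(3, 3)
i = torch.tril_indices(3, 3)
x[i[0], i[1]]   # index with the row and column index tensors separately
x[tuple(i)]     # equivalent: unpack the 2D tensor into a tuple of two 1D tensors
```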
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14904
Reviewed By: zou3519
Differential Revision: D13433027
Pulled By: mrshenli
fbshipit-source-id: 41c876aafcf584832d7069f7c5929ffb59e0ae6a
Summary:
Fixes #15119. Before this PR, we were propagating constants through
aten::warn AND running it as part of shape analysis.
This caused aten::warn to be run regardless of whether it is
supposed to be run dynamically. This PR adds an exclusion for aten::warn
in constant propagation and shape analysis, similar to that for prim::RaiseException.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15124
Differential Revision: D13432815
Pulled By: zou3519
fbshipit-source-id: 15ab533ce2accb2da3fd4e569070c7979ce61708
Summary:
Fixes #15038.
aten::_cast_Float(tensor, non_blocking) support was added in #14336.
Its second argument is a bool, but because we don't support generating values
of type bool in the fuser codegen, the codegen errored out.
aten::_cast_Float in the fuser never actually uses its non_blocking
argument, so another way to fix this would be to have a special op for a
fused cast. But I thought that we might have fusible ops that do take
bool arguments in the future, so this would be good to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15057
Differential Revision: D13432091
Pulled By: zou3519
fbshipit-source-id: 455fe574f5f080aca9a112e346b841a2534a8dc3
Summary:
While moving these scenarios into `_test_dim_ops` I accidentally left an empty loop in the actual tests, causing them to do nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15077
Differential Revision: D13428759
Pulled By: umanwizard
fbshipit-source-id: 08f53068981d9192c1408878b168e9053f4dc92e
Summary:
Fixes a bug where (de)serializing a hierarchy of submodules in which one submodule doesn't have any parameters, but its submodules do, did not load properly. This had to do with the fact that the old protobuf format couldn't store empty parameters.
Fixes https://github.com/pytorch/pytorch/issues/14891
soumith ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15033
Differential Revision: D13411322
Pulled By: goldsborough
fbshipit-source-id: 2ef73b2aa93fa9e46b1cbe1fd47d9f134d6016d5
Summary:
Removing the deprecated functions in `torch/csrc/variable_tensor_functions.h` (like `torch::CPU`) and corresponding implementations from `torch/csrc/torch.cpp` from master after the release.
ezyang gchanan soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15003
Differential Revision: D13418086
Pulled By: goldsborough
fbshipit-source-id: a0accdf6f7b0efa1ec07ac7b74b86ff2da37543f
Summary:
This PR creates TestFuser inside test_jit.py to be a home for graph fuser
specific tests.
This was a useful exercise because now that all the fuser tests are in
one place, I can spot redundant and bitrotting tests for cleanup in a
future PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/15072
Differential Revision: D13421458
Pulled By: zou3519
fbshipit-source-id: 80b1a7712feff75a0c186d1664601c4edbbca694
Summary: Removes all warnings spew for the TestJitGenerated tests
Differential Revision: D13420919
fbshipit-source-id: f251c12f923088ccc5daa2984c15003a67cbd1c1