Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51432
ghstack-source-id: 120976584
torchbind is a convenient way to include custom class to both python and torchscript. CREATE_OBJECT is used to create an object of custom class.
CREATE_OBJECT was not supported by lite interpreter. The major reason was that for custom class directly defined in Python, there's no language parser in lite interpreter. It's still the case. However, for torchbind classes that are defined in C++, a python/torchscript parser is not needed.
This diff is to support the case of torchbind custom classes.
1. The class type can be resolved at import level.
2. If the class is not the supported torchbind class, an error message is provided at export stage. Workaround is also suggested.
3. Unit tests. C++: ```LiteInterpreterTest::BuiltinClass``` is added as an end-to-end test on supported class. Python: ```test_unsupported_createobject``` is changed to ```test_unsupported_classtype``` to test unsupported classes.
Test Plan: CI
Reviewed By: raziel
Differential Revision: D26168913
fbshipit-source-id: 74e8b6a12682ad8e9c39afdfd2b605c5f8e65427
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51594
ExternalCall nodes represent opaque calls to external functions to fill a
tensor (buffer) with values. It could be used to include nodes that are
otherwise not-representable as TE, or whose TE representation is currently too
slow.
To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.
The reason the PR was previously reverted was that the LLVM generated
calls to bridge functions were breaking unwind tables. This is now fixed
by requiring bridge functions to never throw and setting the
corresponding attribute in the LLVM generated code.
Differential Revision: D26213882
Test Plan: Imported from OSS
Reviewed By: pbelevich, ngimel
Pulled By: ZolotukhinM
fbshipit-source-id: db954d8338e2d750c2bf0a41e88e38bd494f2945
Summary:
Improve the flops computation formula of aten::conv2d operator to support stride, pad, dilation, and groups arguments.
This diff also fixes the following issues:
- Apply a factor of 2 to aten::mm because output accounts for multiplication and addition.
- Fix incorrect names of scalar operators to aten::mul and aten::add.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51377
Test Plan:
```python
python test/test_profiler.py
```
Reviewed By: jspark1105
Differential Revision: D26165223
Pulled By: xuzhao9
fbshipit-source-id: 2c5f0155c47af2e6a19332fd6ed73ace47fa072a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51475
ExternalCall nodes represent opaque calls to external functions to fill a
tensor (buffer) with values. It could be used to include nodes that are
otherwise not-representable as TE, or whose TE representation is currently too
slow.
To make an external function available in NNC as ExternalCall, one needs to
implement a "bridge" function that would take raw (void*) pointers to the data
along with the arrays containing dimension info. This function would then
internally call the desired external function and make sure the results of the
call are correctly placed in the provided raw data buffers.
Test Plan: Imported from OSS
Reviewed By: pbelevich, Chillee
Differential Revision: D26179083
Pulled By: ZolotukhinM
fbshipit-source-id: 9e44de098ae94d25772cf5e2659d539fa6f3f659
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50229
`fastmod -m 'cast(<((at|c10)::)?\w+Type>\(\)\s*)->' 'castRaw${1}->'` Presuming it builds, this is a safe change: the
result of `cast()` wasn't being saved anywhere, so we didn't need
it, so we can use a raw pointer instead of a new `shared_ptr`.
ghstack-source-id: 120769170
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837494
fbshipit-source-id: 46319100dc0dfc78f6d2b45148207f83481f2ada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863
Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation
buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg
Reviewed By: iseeyuan
Differential Revision: D25896212
fbshipit-source-id: 6d7e7fd5f3244a88bd44889024d81ad2e678ffa5
Summary:
This is the initial skeleton for C++ codegen, it includes generations for Allocate and Free.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/51070
Test Plan: New unit tests are added to `test_cpp_codegen.cpp`.
Reviewed By: ZolotukhinM
Differential Revision: D26061818
Pulled By: cheng-chang
fbshipit-source-id: b5256b2dcee6b2583ba73b6c9684994dbe7cdc1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50995
This change makes 'Tensor' a thin wrapper over 'Buf' and 'Stmt', and
merges it with recently introduced 'CompoundTensor'. A statement for the
tensor is either passed directly to the Tensor constructor (akin to
'CompoundTensor'), or is built immediately in constructor.
LoopNest is no longer responsible for constructing statements from
tensors - it simply stitches already constructed statements contained in
Tensors. This has a side effect that now we cannot construct several
loopnests from the same tensors - we need to explicitly clone statements
if we want to do that. A special copy constructor was added to LoopNest
to make it more convenient (note: this only affects tests, we don't
usually create multiple loopnests in other places).
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D26038223
Pulled By: ZolotukhinM
fbshipit-source-id: 27a2e5900437cfb0c151e8f89815edec53608e17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50993
This is the first step to make 'Tensor` a thin wrapper over 'Buf' and
'Stmt', which will be finished in subsequent PRs. This change also
allows to remove 'buf_initializers_' from 'LoopNest', making it "less
stateful".
Test Plan: Imported from OSS
Reviewed By: bertmaher
Differential Revision: D26038224
Pulled By: ZolotukhinM
fbshipit-source-id: f418816e54c62f291fa45812901487394e9b95b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/44859
TensorPipe's `set_device_map` option was applied during the forward
pass. However, if we ran the backward pass for the graph we would not
automatically pick up the reverse device mapping.
As a result, users had to specify both forward and backward device mapping
which is very tedious to do.
In this PR, I've added this functionality such that TensorPipe automatically
picks up the reverse device mapping during the backward pass. This is done by
storing the appropriate device mapping in the "recv" autograd function for
distributed autograd.
#Closes: https://github.com/pytorch/pytorch/issues/44170
ghstack-source-id: 119950842
Test Plan:
1) waitforbuildbot
2) Unit test added.
Reviewed By: mrshenli
Differential Revision: D23751975
fbshipit-source-id: 2717d0ef5bde3db029a6172d98aad95734d52140
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50321
Quantization team reported that when there are two empty tensors are replicated among ranks, the two empty tensors start to share storage after resizing.
The root cause is unflatten_dense_tensor unflattened the empty tensor as view of flat tensor and thus share storage with other tensors.
This PR is trying to avoid unflatten the empty tensor as view of flat tensor so that empty tensor will not share storage with other tensors.
Test Plan: unit test
Reviewed By: pritamdamania87
Differential Revision: D25859503
fbshipit-source-id: 5b760b31af6ed2b66bb22954cba8d1514f389cca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50649
**Summary**
Tutorials, documentation and comments are not consistent with the file
extension they use for JIT archives. This commit modifies certain
instances of `*.pth` in `torch.jit.save` calls with `*.pt`.
**Test Plan**
Continuous integration.
**Fixes**
This commit fixes#49660.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D25961628
Pulled By: SplitInfinity
fbshipit-source-id: a40c97954adc7c255569fcec1f389aa78f026d47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50194
**Summary**
`ClassType::repr_str()` prints out only the name of a `ClassType`, which
is not always enough to disambiguate it. In some situations, two
`ClassTypes` are compared and do not match despite having identical
names because they are in separate compilation units. In such cases, the
error message can seem nonsensical (e.g. `expected type T but found type
T`). This commit modifies `ClassType::repr_str()` so that it prints out
the address of the type's compilation unit to make these messages less
puzzling (e.g. `expected type T (0x239023) but found type T (0x230223)`).
**Test Plan**
This commit adds a unit test, `ClassTypeTest.IdenticalTypesDifferentCus`
that reproduces this situation.
**Fixes**
This commit fixes#46212.
Test Plan: Imported from OSS
Reviewed By: tugsbayasgalan
Differential Revision: D25933082
Pulled By: SplitInfinity
fbshipit-source-id: ec71b6728be816edd6a9c2b2d5075ead98d8bc88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50564
When an RPC was sent, the associated future was stored in two maps:
pendingResponseMessage_ and timeoutMap_. Once the response was received, the
entry was only removed from pendingResponseMessage_ and not timeoutMap_. The
pollTimedoudRpcs method then eventually removed the entry from timeoutMap_
after the time out duration had passed.
Although, in scenarios where there is a large timeout and a large number of
RPCs being used, it is very easy for the timeoutMap_ to grow without any
bounds. This was discovered in https://github.com/pytorch/pytorch/issues/50522.
To fix this issue, I've added some code to cleanup timeoutMap_ as well once we
receive a response.
ghstack-source-id: 119925182
Test Plan:
1) Unit test added.
2) Tested with repro in https://github.com/pytorch/pytorch/issues/50522
#Closes: https://github.com/pytorch/pytorch/issues/50522
Reviewed By: mrshenli
Differential Revision: D25919650
fbshipit-source-id: a0a42647e706d598fce2ca2c92963e540b9d9dbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/50228
`fastmod -m 'expect(<((at|c10)::)?\w+Type>\(\)\s*)->'
'expectRef${1}.'`
Presuming it builds, this is a safe change: the result of `expect()`
wasn't being saved anywhere, so we didn't need it, so we can take a
reference instead of a new `shared_ptr`.
ghstack-source-id: 119782961
Test Plan: CI
Reviewed By: SplitInfinity
Differential Revision: D25837374
fbshipit-source-id: 86757b70b1520e3dbaa141001e7976400cdd3b08
Summary:
This adds guarding for DifferentiableGraph nodes in order to not depend on
Also bailing out on required gradients for the CUDA fuser.
Fixes https://github.com/pytorch/pytorch/issues/49299
I still need to look into a handful of failing tests, but maybe it can be a discussion basis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49433
Reviewed By: ngimel
Differential Revision: D25681374
Pulled By: Krovatkin
fbshipit-source-id: 8e7be53a335c845560436c0cceeb5e154c9cf296
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49348
This is a redux of #45666 post refactor, based off of
d534f7d4c5
Credit goes to peterbell10 for the implementation.
Fixes#43945.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Reviewed By: smessmer
Differential Revision: D25594004
Pulled By: ezyang
fbshipit-source-id: c8eb876bb3348308d6dc8ba7bf091a2a3389450f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49138
See for details: https://fb.quip.com/QRtJAin66lPN
We need to model optional types explicitly, mostly for schema inference. So we cannot pass a `Tensor?[]` as `ArrayRef<Tensor>`, instead we need to pass it as an optional type. This PR changes it to `torch::List<c10::optional<Tensor>>`. It also makes the ops c10-full that were blocked by this.
## Backwards Compatibility
- This should not break the Python API because the representation in Python is the same and python_arg_parser just transforms the python list into a `List<optional<Tensor>>` instead of into a `List<Tensor>`.
- This should not break serialized models because there's some logic that allows loading a serialized `List<Tensor>` as `List<optional<Tensor>>`, see https://github.com/pytorch/pytorch/pull/49138/files#diff-9315f5dd045f47114c677174dcaa2f982721233eee1aa19068a42ff3ef775315R57
- This will break backwards compatibility for the C++ API. There is no implicit conversion from `ArrayRef<Tensor>` (which was the old argument type) to `List<optional<Tensor>>`. One common call pattern is `tensor.index({indices_tensor})`, where indices_tensor is another `Tensor`, and that will continue working because the `{}` initializer_list constructor for `List<optional<Tensor>>` can take `Tensor` elements that are implicitly converted to `optional<Tensor>`, but another common call pattern was `tensor.index(indices_tensor)`, where previously, the `Tensor` got implicitly converted to an `ArrayRef<Tensor>`, and to implicitly convert `Tensor -> optional<Tensor> -> List<optional<Tensor>>` would be two implicit conversions. C++ doesn't allow chaining. two implicit conversions. So those call sites have to be rewritten to `tensor.index({indices_tensor})`.
ghstack-source-id: 119269131
Test Plan:
## Benchmarks (C++ instruction counts):
### Forward
#### Script
```py
from torch.utils.benchmark import Timer
counts = Timer(
stmt="""
auto t = {{op call to measure}};
""",
setup="""
using namespace torch::indexing;
auto x = torch::ones({4, 4, 4});
""",
language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
#### Results
| Op call |before |after |delta | |
|------------------------------------------------------------------------|---------|--------|-------|------|
|x[0] = 1 |11566015 |11566015|0 |0.00% |
|x.index({0}) |6807019 |6801019 |-6000 |-0.09%|
|x.index({0, 0}) |13529019 |13557019|28000 |0.21% |
|x.index({0, 0, 0}) |10677004 |10692004|15000 |0.14% |
|x.index({"..."}) |5512015 |5506015 |-6000 |-0.11%|
|x.index({Slice(None, None, None)}) |6866016 |6936016 |70000 |1.02% |
|x.index({None}) |8554015 |8548015 |-6000 |-0.07%|
|x.index({false}) |22400000 |22744000|344000 |1.54% |
|x.index({true}) |27624088 |27264393|-359695|-1.30%|
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})})|123472000|123463306|-8694|-0.01%|
### Autograd
#### Script
```py
from torch.utils.benchmark import Timer
counts = Timer(
stmt="""
auto t = {{op call to measure}};
""",
setup="""
using namespace torch::indexing;
auto x = torch::ones({4, 4, 4}, torch::requires_grad());
""",
language="cpp",
).collect_callgrind(number=1_000)
print(counts)
```
Note: the script measures the **forward** path of an op call with autograd enabled (i.e. calls into VariableType). It does not measure the backward path.
#### Results
| Op call |before |after |delta | |
|------------------------------------------------------------------------|---------|--------|-------|------|
|x.index({0}) |14839019|14833019|-6000| 0.00% |
|x.index({0, 0}) |28342019|28370019|28000| 0.00% |
|x.index({0, 0, 0}) |24434004|24449004|15000| 0.00% |
|x.index({"..."}) |12773015|12767015|-6000| 0.00% |
|x.index({Slice(None, None, None)}) |14837016|14907016|70000| 0.47% |
|x.index({None}) |15926015|15920015|-6000| 0.00% |
|x.index({false}) |36958000|37477000|519000| 1.40% |
|x.index({true}) |41971408|42426094|454686| 1.08% |
|x.index({"...", 0, true, Slice(1, None, 2), torch::tensor({1, 2})}) |168184392|164545682|-3638710| -2.16% |
Reviewed By: bhosmer
Differential Revision: D25454632
fbshipit-source-id: 28ab0cffbbdbdff1c40b4130ca62ee72f981b76d
Summary:
=======
This PR addresses the following:
* Adds JIT support for CUDA Streams
* Adds JIT support for CUDA Events
* Adds JIT support for CUDA Stream context manager
Testing:
======
python test/test_jit.py -v TestCUDA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48020
Reviewed By: navahgar
Differential Revision: D25725749
Pulled By: nikithamalgifb
fbshipit-source-id: b0addeb49630f8f0c430ed7badeca43bb9d2535c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49697
Mostly mechanical move. This refactoring helps to hide unnecessary
details from the SimpleIREval interface and make it more similar to a
pure 'codegen'.
Test Plan: Imported from OSS
Reviewed By: nickgg
Differential Revision: D25668696
Pulled By: ZolotukhinM
fbshipit-source-id: 423247bfcdfa88403e8ec92152f00110bb9da19c
Summary:
GCD should always return positive integers. When negative values are used, we hit a corner case that results in an infinite recursion during simplification.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49379
Reviewed By: ezyang
Differential Revision: D25597115
Pulled By: navahgar
fbshipit-source-id: b0e8ac07ee50a5eb775c032628d4840df7424927
Summary:
Makes two changes in NNC for intermediate buffer allocations:
1. Flattens dimensions of buffers allocated in LoopNest::prepareForCodegen() to match their flattened usages.
2. Adds support for tracking memory dependencies of Alloc/Free to the MemDependencyChecker, which will allow us to check safety of accesses to intermediate buffers (coming in a future diff).
I didn't add any new tests as the mem dependency checker tests already cover it pretty well, particularly the GEMM test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49554
Reviewed By: VitalyFedyunin
Differential Revision: D25643133
Pulled By: nickgg
fbshipit-source-id: 66be3054eb36f0a4279d0c36562e63aa2dae371c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49385
Currently, the API to export operator lists accepts a `torch::jit::Module` object, and spits out an operator list. The operator list is practically used only for mobile. This is not ideal because the set of root operators may change by the time the model is subsequently optmized and exported for mobile.
What we need to to instead is glean the list of operators from the mobile model itself (`bytecode.pkl` specifically), and expose that instead.
Also updated the logic in `converter`.
### Before this change:
1. Get operator List from Torch Script Model
2. Convert to bytecode mobile model
### After this change:
1. Convert to bytecode mobile model
2. Use this converted mobile model to get the list of operators for each method on the model
ghstack-source-id: 118796752
Test Plan:
Added a unit test in `test_lite_interpreter.cpp` to ensure that all model referenced operators show up in the exported operator list. Also make `test_lite_interpreter.cpp` runnable from `xplat/caffe2/BUCK` since this is where the production code will be built from.
Verified that the list of operators produced before and after this change for an example model (segmentation) are the same.
{P147863234}
Also verified that the operator lists for BI-Xray model is different (we have been having problems with missing operators for this one): {P154903132}
Reviewed By: iseeyuan
Differential Revision: D24690094
fbshipit-source-id: 0426a6ef90456a811010cfe337c415882ae2deff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49357
This is a follow-up fix for PR #48679, where the previous PR
adds support for integer inputs to aten::abs by promoting integers to
float and then demote the result back to integers. This PR supports
integer inputs to aten::abs more efficiently in the SimpleIREvaluator
by allowing implementing integer inputs for kAbs (renamed from kFabs).
- Rename kFabs to kAbs
- Add support for integer input to kAbs in SimpleIREvalator (note that:
llvm_codegen and cuda_codegen already supports integer inputs to kAbs)
Test Plan:
- `PYTORCH_TENSOREXPR_DONT_USE_LLVM=1 python test/test_jit_fuser_te.py
TestTEFuser.test_unary_ops`
- `python test/test_jit_fuser_te.py TestTEFuser.test_unary_ops`
Imported from OSS
Reviewed By: eellison
Differential Revision: D25545791
fbshipit-source-id: e52f51a352d149f66ce8341fb3beb479be08a230
Summary:
Clear static variable at the end of the test to ensure test passes after re-runs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49581
Test Plan:
`./bin/test_api "--gtest_filter=CustomAutogradTest.ReentrantPriority" --gtest_repeat=50`
Before the change all subsequent runs of the test failed with
```
../test/cpp/api/autograd.cpp:681: Failure
Expected equality of these values:
order.size()
Which is: 310
10
```
Reviewed By: mrshenli
Differential Revision: D25632374
Pulled By: malfet
fbshipit-source-id: 4814d22b5dff15e1b38a0187e51070771fd58370
Summary:
This is a fast log implementations
benchmark:
```
buck run mode/opt //caffe2/benchmarks/cpp/tensorexpr:tensorexpr_bench -c 'fbcode.caffe2_gpu_type=none'
```
Test Plan: buck test mode/no-gpu //caffe2/test/cpp/tensorexpr:tensorexpr -- *.fastLogFloat
Reviewed By: bertmaher
Differential Revision: D25445815
fbshipit-source-id: 20696eacd12a55e797f606f4a6dbbd94c9652888
Summary:
In https://github.com/pytorch/pytorch/pull/48967/ we enabled output buffer inlining, which results in duplicate computation if one output depends on another. This was done to fix correctness for CUDA, but is not needed for correctness for CPU and results in perf slowdown.
The output buffer inlining solution for CUDA is intended to be an interim solution because it does not work with reductions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49488
Reviewed By: ezyang
Differential Revision: D25596071
Pulled By: eellison
fbshipit-source-id: bc3d987645da5ce3c603b4abac3586b169656cfd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48863
Support default arguments when invoking a module via PyTorch Lite (`mobile::Module`).
Test Plan:
buck test mode/dbg //caffe2/test/cpp/jit:jit -- LiteInterpreterTest.MethodInvocation
buck test mode/dbg caffe2/test:mobile -- test_method_calls_with_optional_arg
Reviewed By: raziel, iseeyuan
Differential Revision: D25152559
fbshipit-source-id: bbf52f1fbdbfbc6f8fa8b65ab524b1cd4648f9c0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/46213
I didn't yet update the documentation, will add those change soon. A few other things that I didn't do, but want to clarify if I maybe should.
1. I didn't expose projections in c++ API: torch/csrc/api/src/nn/modules/rnn.cpp. Let me know if this is desirable and I will add those changes.
2. I didn't expose projections in "lstm_cell" function and "_thnn_differentiable_lstm_cell_backward" functions from aten/src/ATen/native/RNN.cpp. As far as I understand, they are not needed for nn.LSTM CPU execution. For lstm_cell, projections don't bring any real benefit, since if cell is used separately, it can be easily added in Python. For "_thnn_differentiable_lstm_cell_backward", I'm actually not sure where exactly that function is used, so I also disabled projections there for now. Please let me know if I should change that.
3. I added check that projections are not supported for quantized LSTMs to quantized_lstm_<data/input> functions. But I didn't add any checks to LSTMCell code. It seems that since I disabled projections in "lstm_cell" function, they should also not be available for quantized models through any other API than quantized_lstm_<data/input>. Please let me know if I'm not correct and I will add checks to other places.
4. Projections are not supported for CuDNN versions < 7.1.2. Should I add the check for CuDNN version and disable projections in that case? If so, what will be the best way to do that?
5. Currently I added projection weight as the last weight, so the layout is "w_ih, w_hh, b_ih, b_hh, w_hr". This breaks the assumption that biases come after weights and thus I had to add additional if-s in various places. Alternative way would be to have "w_ih, w_hh, w_hr, b_ih, b_hh" layout, in which case the assumption will be true. But in that case I will need to split the loop in get_parameters function from aten/src/ATen/native/cudnn/RNN.cpp. And in some cases, I will still need to add an "undefined" tensor in the 3rd position, because we get all 5 weights from CuDNN most of the time. So I'm not sure which way is better. Let me know if you think I should change to the weights-then-biases layout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/47725
Reviewed By: zou3519
Differential Revision: D25449794
Pulled By: ngimel
fbshipit-source-id: fe6ce59e481d1f5fd861a8ff7fa13d1affcedb0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/49408
Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118665808
Test Plan:
Wait for GitHub CI since we had C++14-specific issues with
this one in previous PR https://github.com/pytorch/pytorch/pull/48629
Reviewed By: malfet
Differential Revision: D25563207
fbshipit-source-id: 6a2831205917d465f8248ca37429ba2428d5626d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/48629
Nearly every non-test callsite doesn't need to capture any variables anyway, and this saves 48 bytes per callback.
ghstack-source-id: 118568240
Test Plan: CI
Reviewed By: dhruvbird
Differential Revision: D25135415
fbshipit-source-id: 5e92dc79da6473ed15d1e381a21ed315879168f3