A longstanding confusion in the implementation of fake tensor and proxy tensor is what to do about torch.ops.aten.sym_sizes and related calls. In particular, when you have a tensor that (1) has symbolic shapes and (2) has a `__torch_dispatch__` implementation, previously, you would always get `__torch_dispatch__` calls for sizes/strides queries, *even if you didn't request them* via the dispatch kwargs in `make_wrapper_subclass`.
The reason is that we were previously mixing several concepts: "I want to dispatch to Python", "I want to call a virtual method" and "I have dynamic shapes". A single boolean variable controlled all of these things, and so it was not possible to understand inside TensorImpl what the user had actually originally requested.
In this PR, we track each of these concepts individually so that we can preserve user intent. Then, we combine these into a single "policy" variable that controls whether or not we can use the fastpath. For the policy to trigger, we only need one of the exceptional cases to be true.
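A minimal sketch of the idea, in plain Python rather than the real C++ TensorImpl. The class and field names here are hypothetical stand-ins; only `matches_policy` and `matches_python_custom` are names taken from this commit message, and their real signatures may differ.

```python
from dataclasses import dataclass
from enum import IntEnum


class SizesStridesPolicy(IntEnum):
    DEFAULT = 0
    CUSTOM_STRIDES = 1
    CUSTOM_SIZES = 2


@dataclass
class ToyTensorImpl:
    # Each user request is recorded separately so none of them is lost.
    python_custom: SizesStridesPolicy = SizesStridesPolicy.DEFAULT  # "dispatch to Python"
    custom: SizesStridesPolicy = SizesStridesPolicy.DEFAULT  # "call a virtual method"
    has_symbolic_sizes_strides: bool = False  # "I have dynamic shapes"

    @property
    def policy(self) -> SizesStridesPolicy:
        # The combined policy is derived, never set directly: any one
        # exceptional case is enough to leave the fast path.
        symbolic = (
            SizesStridesPolicy.CUSTOM_SIZES
            if self.has_symbolic_sizes_strides
            else SizesStridesPolicy.DEFAULT
        )
        return max(self.python_custom, self.custom, symbolic)

    def matches_policy(self, level: SizesStridesPolicy) -> bool:
        return self.policy >= level

    def matches_python_custom(self, level: SizesStridesPolicy) -> bool:
        return self.python_custom >= level
```

With this split, a tensor that only has symbolic shapes leaves the fast path (`matches_policy` is true) without claiming that the user asked for Python dispatch (`matches_python_custom` stays false).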
Billing of changes:
* Rename `set_sizes_strides_policy` to `set_custom_sizes_strides`; in general, you cannot DIRECTLY set policy; you have to indirectly set it by the public functions.
* Some helpers for sizes and strides, since it's more complicated (as it is an enum, rather than just bools as is the case for device and layout). `matches_python_custom` is used to test the Python dispatch user ask. `matches_policy` does the policy test (only used in the user facing functions.)
* I reorganized the accessor methods so that they are more logical. This makes the diff hard to read, so I recommend reading the final code directly.
* The default custom implementations now more reliably call their default() implementations
* As bonus refactor, I devirtualized some functions that don't need to be virtual
* `set_sym_sizes_and_strides` is renamed to `set_sizes_and_strides` to make it easier to use in template contexts; it optionally takes a storage offset now so you can set all three values at the same time. If you use the SymInt overload but there are no symbolic integers, we give you a normal resize.
* This adds `sym_storage_offset` since we had that in the symbolic shapes branch and there's no reason not to put it in (and it reduces merge conflicts)
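The behavior described for `set_sizes_and_strides` can be sketched in plain Python as follows. `ToySymInt` and `ToyTensor` are hypothetical stand-ins, not the real C++ types, and the sketch only models the bookkeeping: an optional storage offset, and a fallback to the ordinary non-symbolic path when no symbolic integer is actually present.

```python
from typing import Optional, Sequence, Union


class ToySymInt:
    """Hypothetical stand-in for a symbolic integer."""

    def __init__(self, name: str) -> None:
        self.name = name


IntLike = Union[int, ToySymInt]


class ToyTensor:
    def __init__(self) -> None:
        self.sizes: tuple = ()
        self.strides: tuple = ()
        self.storage_offset: IntLike = 0
        self.has_symbolic_sizes_strides = False

    def set_sizes_and_strides(
        self,
        sizes: Sequence[IntLike],
        strides: Sequence[IntLike],
        storage_offset: Optional[IntLike] = None,
    ) -> None:
        if len(sizes) != len(strides):
            raise ValueError("sizes and strides must have the same length")
        values = [*sizes, *strides]
        if storage_offset is not None:
            values.append(storage_offset)
        # Only mark the tensor symbolic if a symbolic value is really there;
        # otherwise this behaves like a normal resize.
        self.has_symbolic_sizes_strides = any(
            isinstance(v, ToySymInt) for v in values
        )
        self.sizes = tuple(sizes)
        self.strides = tuple(strides)
        if storage_offset is not None:
            self.storage_offset = storage_offset
```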
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84641
Approved by: https://github.com/wconstab
swolchok reported that in non-tracing usage of Tensor we are wasting a
lot of time on is_symbolic() tests, e.g., when destructing SymInts. This
is a regression for no good reason because we don't actually ever
have SymInts in those cases. This PR moves the stored SymInts on
Tensor out of line, into a separate ExtraMeta struct, which is only
allocated when we make a Tensor store symbolic sizes/strides.
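A rough Python sketch of the out-of-line layout, assuming a simplified tensor with only sizes and strides. The field and method names other than `ExtraMeta` are hypothetical; the point is only that the common case stores plain integers inline and the extra struct is allocated lazily.

```python
class ExtraMeta:
    """Side allocation that exists only for tensors with symbolic shapes."""

    __slots__ = ("sym_sizes", "sym_strides")

    def __init__(self, sym_sizes, sym_strides):
        self.sym_sizes = tuple(sym_sizes)
        self.sym_strides = tuple(sym_strides)


class ToyTensorImpl:
    def __init__(self, sizes, strides):
        # Common case: plain integers stored inline, no symbolic checks needed.
        self._sizes = tuple(sizes)
        self._strides = tuple(strides)
        self._extra_meta = None  # not allocated until symbolic data is set

    def set_symbolic(self, sym_sizes, sym_strides):
        self._extra_meta = ExtraMeta(sym_sizes, sym_strides)

    def has_symbolic_sizes_strides(self):
        return self._extra_meta is not None

    def sym_sizes(self):
        if self._extra_meta is None:
            return self._sizes
        return self._extra_meta.sym_sizes
```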
To avoid adding another word to TensorImpl, I take over the named tensor
metadata field. This makes named tensor require a double indirection
and use up more space, but it's OK since we're going to delete this
feature anyway soon.
I restore regular int64_t storage on Tensor. This entailed reverting
https://github.com/pytorch/pytorch/pull/82467 ; there are no other
substantive changes to SizesAndStrides so a close review is not
necessary.
I don't bother optimizing sizes and strides in ExtraMeta in the same
way stock tensor is optimized. I add a SymDimVector alias. I make
SymInt UNCHECKED constructor public as it is a useful optimization
in some situations when the int is known to be positive.
I thought about storing the SymInts on the Python object instead.
However, because we can allocate symbolic shape tensors directly
from C++, we cannot guarantee that there is a PyInterpreter for
a Tensor. So we do it this way instead; it's also faster since you
don't have to take out the GIL to do accesses.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84390
Approved by: https://github.com/swolchok, https://github.com/Krovatkin
I realized that we can deal with the dead vtable problem by...
introducing another indirection! The resulting code is worse
(you have to do one more dereference to get to the vtable), but
the reduction in boilerplate is, IMO, worth it.
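One way to picture the extra indirection, as a plain Python sketch. This rests on an assumption not stated above: that the "dead vtable problem" refers to callers still holding an interpreter handle after the code behind it is gone. All class and method names here are hypothetical.

```python
class ToyVTable:
    """The table of operations an interpreter handle forwards to."""

    def name(self):
        return "python"

    def detach(self, tensor):
        return f"detach({tensor}) via Python"


class NoopVTable(ToyVTable):
    """Replacement table used once the real one must no longer be called."""

    def name(self):
        return "<unloaded interpreter>"

    def detach(self, tensor):
        raise RuntimeError("interpreter is no longer available")


class ToyInterpreter:
    def __init__(self, vtable):
        # One extra dereference: the handle points at a table instead of
        # holding the individual function pointers itself.
        self._vtable = vtable

    def name(self):
        return self._vtable.name()

    def detach(self, tensor):
        return self._vtable.detach(tensor)

    def disarm(self):
        # Swapping a single pointer retargets every method at once.
        self._vtable = NoopVTable()
```

The cost is one more hop per call; the benefit in this sketch is that adding a method means touching the table classes only, not the per-method plumbing on the handle.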
I did this refactor because I'm about to add a lot more methods
to PyInterpreter to handle expunging SymInt from TensorImpl.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84388
Approved by: https://github.com/albanD
The bug is that:
(1) functionalization kernels internally call `at::empty_strided()` to construct meta tensors, and then call the meta tensor op
(2) This happens with the Python dispatch key already added to the TLS exclude set, so we expect these meta tensors never to enter python
(3) When calling detach() though, `TensorImpl::shallow_copy_and_detach()` will currently always call into python when a PythonMode is set. Instead, I updated it to check if the Python key is in the TLS exclude set first.
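The fix in (3) can be illustrated with a small Python model of a thread-local exclude set. Everything here is a hypothetical stand-in (the helper names, the string key `"Python"`, the return values); it only shows the order of the checks.

```python
import threading
from contextlib import contextmanager

_tls = threading.local()


def tls_excluded():
    """Return the set of dispatch keys excluded on the current thread."""
    return getattr(_tls, "excluded", frozenset())


@contextmanager
def exclude_dispatch_key(key):
    previous = tls_excluded()
    _tls.excluded = previous | {key}
    try:
        yield
    finally:
        _tls.excluded = previous


def shallow_copy_and_detach(python_mode_active):
    # Consult the exclude set first: if Python is excluded, stay in C++
    # even though a Python mode is active.
    if python_mode_active and "Python" not in tls_excluded():
        return "python"
    return "cpp"
```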
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83701
Approved by: https://github.com/ezyang
# Summary
This PR pulls out all the changes from #81838 that are specific to properly creating nested_tensor views. I will update this comment with a design doc once that has been made. This should enable proper creation of NestedTensor views: two nested_tensors sharing the same buffer_ but with different NestedTensor metadata.
The function `create_nested_tensor_view` is a helper function for creating a new nested tensor whose storage aliases the base causing the underlying storage to be shared - and is therefore a view.
This function by itself is not differentiable and therefore autograd does not track its uses. If a nested tensor function uses this helper in its implementation, the aten_op must meet two requirements:
- The function must return a view of the input
- The function must be explicit and define its backward
## Testing
A bug was found when creating a base tensor outside of inference mode and then creating a view in inference mode. This test has been added to this PR in order to show the effect of the change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82658
Approved by: https://github.com/albanD
I noticed I was missing tensor creations with modes when I tried
to delete proxy tensor. This was the cause.
Hypothetically, all PyInterpreter calls could get this treatment.
But I think it only matters for detach; the rest do not return
Tensors and most modes will not be interested in them.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83372
Approved by: https://github.com/zou3519
Add `TensorImpl::sym_strides`, bind it to python with `torch.ops.aten.sym_strides`, and use it in `ProxyTensor` and `FakeTensor`.
Before, `ProxyTensor` was generating `ProxySymInt`'s for the sizes, but not for the strides. Internally we still represent strides with a `SymIntArrayRef` though, so I ran into some weird issues where sizes were showing up as `ProxySymInt`, but strides were `PySymInt`'s.
Differential Revision: [D38594558](https://our.internmc.facebook.com/intern/diff/D38594558)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81300
Approved by: https://github.com/ezyang
This PR relands sym_numel #82374 and fixes the iOS build break in commit 8cbd0031c5,
which was a type mismatch in an equality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82731
Approved by: https://github.com/malfet
From PR:
```
Note: [Fake Tensor Dispatch Keys]
In order to model the behavior of device-specific autocast
and autograd logic, we update the dispatch keys of FakeTensors
to reflect their fake device. This includes the BackendComponent
(DispatchKey::Meta -> DispatchKey::CUDA), and also the BackendComponent
related Autocast and Autograd keys. __torch_dispatch__ sits below
Autocast and Autograd, and is only invoked when we are at the
kernel for the BackendComponent. Then, we add Meta to the
thread-local dispatch include set to hit the meta kernel
instead of the kernel of the BackendComponent for the fake device.
```
Also adds the `conv1/2/3d.padding` operators to the Autocast rule set. Without that fix, the FakeTensor dtype would diverge.
See: https://github.com/pytorch/pytorch/issues/81608
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82449
Approved by: https://github.com/ezyang
This PR adds support for `SymInt`s in python. Namely,
* `THPVariable_size` now returns `sym_sizes()`
* python arg parser is modified to parse PyObjects into ints and `SymbolicIntNode`s
* pybind11 bindings for `SymbolicIntNode` are added, so size expressions can be traced
* a large number of tests added to demonstrate how to implement python symints.
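To give a feel for what "tracing size expressions" means, here is a toy pure-Python symbolic integer that records the arithmetic applied to it. It is not the real `SymbolicIntNode` binding; the class name and the string-based expression format are assumptions made for illustration.

```python
class ToySymIntNode:
    """Records arithmetic as a string expression instead of computing it."""

    def __init__(self, expr):
        self.expr = expr

    @staticmethod
    def _wrap(value):
        # Plain ints appear as literals; symbolic nodes contribute their expr.
        return value.expr if isinstance(value, ToySymIntNode) else str(value)

    def __add__(self, other):
        return ToySymIntNode(f"({self.expr} + {self._wrap(other)})")

    def __radd__(self, other):
        return ToySymIntNode(f"({self._wrap(other)} + {self.expr})")

    def __mul__(self, other):
        return ToySymIntNode(f"({self.expr} * {self._wrap(other)})")

    def __rmul__(self, other):
        return ToySymIntNode(f"({self._wrap(other)} * {self.expr})")


s0 = ToySymIntNode("s0")
padded = s0 * 2 + 1
print(padded.expr)  # ((s0 * 2) + 1)
```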
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78135
Approved by: https://github.com/ezyang
Change our representation of sizes and strides to contain SymInts
instead of int64_t.
Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.
But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`.
This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78272
Approved by: https://github.com/Krovatkin, https://github.com/Chillee
Change our representation of sizes and strides to contain SymInts
instead of int64_t.
Right now it's not actually possible to create a Tensor with symbolic
shape, so this change is intended to be a no-op.
But the intended behavior is:
- If you create a Tensor with symbolic shape, a `CustomSizes` policy
will be set, and the `has_symbolic_sizes_strides_` bit will be set. (not
currently implemented)
- Calling any TensorImpl function that naively interacts with sizes and
strides will throw. For hot-path functions (`sizes()`, `strides()`), we
make use of the existing policy check to throw. For others, we just have
a regular `TORCH_CHECK(!has_symbolic_sizes_strides_)`.
This also undoes the explicit constructor I made in
https://github.com/pytorch/pytorch/pull/77666; it ended up being more
annoying than useful when making these changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77994
Approved by: https://github.com/Krovatkin
Prior to this PR, we had a mish-mash of ways of getting unconventional
sizes/strides behavior:
- In OSS (but not in fbcode), some methods are virtual and you
can override them directly
- There is an is_contiguous policy which is a bitfield tag that lets
you toggle is_contiguous to error or hit a virtual method
is_contiguous_custom if it is set. Ordinarily is_contiguous()
is virtual and you can just override it, but this works EVEN IF
is_contiguous() is non-virtual (e.g., in fbcode)
- There is also a sizes policy which is the same idea but for sizes
This PR unifies these mechanisms, and in doing so, eliminates the
maybe virtual/not-virtualness of the methods in question. The primary
downside of this change is that it is BC-breaking (but the BC break is
very easy to fix!)
The new scheme works like this: we have three levels of policy for
sizes/strides (order matters).
- The Default policy is a conventional dense tensor, where we use
all of the built-in fields to directly represent the
sizes/strides/numel/contiguity of the tensor, and it is possible
to bypass virtual call entirely.
- The CustomStrides policy represents tensors which have a custom
notion of strides (most typically, that they don't support them),
shunting strides() and is_contiguous() to virtual methods
strides_custom() and is_contiguous_custom(). This INCLUDES handling
for contiguity, since they typically go hand-in-hand (although
the situation is murky with batched tensors). The default
implementations of these functions raise errors saying the tensor
doesn't support them.
- The CustomSizes policy represents tensors which have a custom
notion of sizes (the two notable examples are nested tensor, which
doesn't have a representation of sizes in the conventional form, and
XLA/LTC tensor, which synchronizes its sizes with an underlying
compiler backend). This shunts sizes(), numel() and dim() (along
with everything from strides) to _custom() variants.
There is no special policy for erroring; instead, we just do a vcall
and expect the virtual method to raise an exception (the performance
hit from the vcall doesn't matter because you're about to raise a C++
exception anyway). The default implementations of all overridable
functions are available at _default() which is helpful in some
situations when you just want to do a "sync" and then run the
conventional semantics.
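The scheme above can be sketched in plain Python. This is a toy model, not the C++ TensorImpl: the subclass names are hypothetical, and having `sizes_custom()` raise by default is an assumption (the text only states that for the strides functions).

```python
from enum import IntEnum


class SizesStridesPolicy(IntEnum):
    # Order matters: each level includes the behavior of the ones below it.
    DEFAULT = 0
    CUSTOM_STRIDES = 1
    CUSTOM_SIZES = 2


class ToyTensorImpl:
    policy = SizesStridesPolicy.DEFAULT

    def __init__(self, sizes, strides):
        self._sizes = tuple(sizes)
        self._strides = tuple(strides)

    # Non-virtual entry points: one policy comparison, then the fast path.
    def sizes(self):
        if self.policy >= SizesStridesPolicy.CUSTOM_SIZES:
            return self.sizes_custom()
        return self.sizes_default()

    def strides(self):
        if self.policy >= SizesStridesPolicy.CUSTOM_STRIDES:
            return self.strides_custom()
        return self.strides_default()

    # Conventional semantics, always available to subclasses.
    def sizes_default(self):
        return self._sizes

    def strides_default(self):
        return self._strides

    # Overridable hooks; erroring is just what the base hook does.
    def sizes_custom(self):
        raise NotImplementedError("this tensor does not support sizes")

    def strides_custom(self):
        raise NotImplementedError("this tensor does not support strides")


class NoStridesTensor(ToyTensorImpl):
    policy = SizesStridesPolicy.CUSTOM_STRIDES


class SyncedSizesTensor(ToyTensorImpl):
    policy = SizesStridesPolicy.CUSTOM_SIZES

    def __init__(self, sizes, strides):
        super().__init__(sizes, strides)
        self.sync_count = 0

    def sizes_custom(self):
        self.sync_count += 1  # stand-in for syncing with a backend
        return self.sizes_default()

    def strides_custom(self):
        return self.strides_default()
```

`SyncedSizesTensor` shows the "sync, then run the conventional semantics" use of the `_default()` variants.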
This PR could be extended further in two ways but I did not do them
due to time constraints:
- Ideally, all TENSORIMPL_MAYBE_VIRTUAL would be eliminated from
TensorImpl, by using the same policy trick.
- set_size and set_stride are still virtual; it's not entirely clear
the same trick should be used here though as these methods are
deprecated.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77036
Approved by: https://github.com/bdhirsh
Whether or not this is a reasonable operation to do in the presence of
subclasses is a good question in and of itself, but this fixes an
obvious invariant violation, which is that if a Tensor reports that
it is a tensor subclass, it had better have the Python dispatch key.
Previously, the dispatch key would have gotten unconditionally cleared;
now we preserve whatever the original bit was.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75644
Approved by: https://github.com/albanD
The pattern of a PyObject* bundled with a PyInterpreter* is pretty
useful in many contexts (e.g., TorchDispatchTypeObject) so I have turned
it into a dedicated class SafePyObject. In the process I fixed a
bug with the old TorchDispatchTypeObject (copy constructor/assignment
was not deleted), made the API more safe (retrieving the PyObject*
pointer requires verification that the PyInterpreter* matches) and
fixed some minor inefficiencies in C++ code.
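The safety property can be sketched in Python as follows. This is only a model of the idea: the real SafePyObject is a C++ class, `ToyInterpreter` is a hypothetical stand-in for `PyInterpreter`, and raising from `__copy__` is used here to mimic a deleted copy constructor.

```python
import copy


class ToyInterpreter:
    """Hypothetical stand-in for a PyInterpreter handle."""

    def __init__(self, name):
        self.name = name


class SafePyObject:
    """An object bundled with the interpreter that owns it."""

    def __init__(self, obj, interpreter):
        self._obj = obj
        self._interpreter = interpreter

    def ptr(self, interpreter):
        # The raw object is only handed out to the interpreter that owns it.
        if interpreter is not self._interpreter:
            raise RuntimeError("object belongs to a different interpreter")
        return self._obj

    def __copy__(self):
        raise TypeError("SafePyObject is not copyable")

    def __deepcopy__(self, memo):
        raise TypeError("SafePyObject is not copyable")
```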
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75142
Approved by: https://github.com/zou3519