Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73028
A typical use case for `TensorExprKernel` is to create the kernel once and call it multiple times, possibly in parallel. For the parallel calls to work, we need to ensure that the run() method calls do not change any state in `TensorExprKernel`.
Before this change, the `run()` method was modifying the sizes and strides vectors when dynamic shapes were present. This manifested as a data race when running a model with Static Runtime.
ghstack-source-id: 149398820
Test Plan:
```
buck build mode/dev-asan //caffe2/test/cpp/tensorexpr:tensorexpr
./buck-out/dev/gen/caffe2/test/cpp/tensorexpr/tensorexpr --gtest_filter="DynamicShapes.MultiThreadedExecution"
```
Reviewed By: eellison
Differential Revision: D34287960
fbshipit-source-id: d311f3c5a66c5d5de4e1deaeaa01816b53e9906e
(cherry picked from commit 161568bfae9fc1497a36d6103f49deda001509a4)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/70464
Add handling of strided input tensors to dynamic fusion. This is done with the same set of input striding specializations as https://github.com/pytorch/pytorch/pull/60684/:
```
S_ONE, // STRIDE_ONE: packed
S_CONT, // STRIDE_CONTIGUOUS: stride[i + 1] * sizes[i + 1]
S_TRAN_CONT, // STRIDE_TRANSPOSED_CONTIGUOUS: stride[i-1] * sizes[i-1]
S_AS_ARG, // STRIDE_AS_ARG: stride passed in as runtime value
```
and then two additional specializations for a) contiguous tensor and b) channels-last tensor. channels-last is a common case and we should optimize for it. additionally, tensors natively store whether they are contiguous/channels-last contiguous, which makes it faster to check if tensors follow this pattern.
Output striding will be done in a follow up.
The striding is stored on both the TensorGroup node and on the guard node. The striding descriptors are stored as a vector of strings on the node for debugability and to make use of storing ivalues as attributes on nodes.
As an example:
```
%8 : Double(10, 11, 12, 13, strides=[1716, 1, 143, 11], requires_grad=0, device=cpu) = prim::TensorExprGroup_0[symbolic_shape_inputs=[-37, -36, -35, -34], striding_inputs_desc=[["TENSOR_CONT_CHANNELS_LAST"]](%x, %24, %23, %22, %21)```
```
Test Plan: Imported from OSS
Reviewed By: navahgar
Differential Revision: D33458649
Pulled By: eellison
fbshipit-source-id: c42616d3c683d70f6258180d23d3841a31a6030d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/67861
Previously submitted as https://github.com/pytorch/pytorch/pull/67197.
This got reverted because its failures were hidden by the failures of
another PR.
Test Plan: Imported from OSS
Reviewed By: ZolotukhinM
Differential Revision: D32178196
Pulled By: navahgar
fbshipit-source-id: cc8a5c68aed360d06289e69645461cfa773e1300