This reverts commit 1187dedd33.
Reverted https://github.com/pytorch/pytorch/pull/83436 on behalf of https://github.com/huydhn because the previous change breaks internal builds (D38714900) and other OSS tests. The bug has been fixed by this PR, but we decided it is safer to revert both, merge them into one PR, and then reland the fix.
No need to have a lagging op db because there are no more sync issues
between functorch and pytorch. If someone adds a new OpInfo, then we
should explicitly check if we support it or not.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83418
Approved by: https://github.com/samdow
### Description
Enables jiterator for ROCm builds. This includes the necessary porting where hiprtc and nvrtc behavior differ, and also ports the ROCm-versus-CUDA differences in MAX_DIMS and NUM_THREADS from the non-jiterator code paths into jiterator.
### Testing
CI with ciflow/trunk label to force running ROCm workflows that are currently trunk-only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77982
Approved by: https://github.com/ngimel
Summary:
This change adds `input_type_to_index` mappings to the backend patterns for `nn.functional.linear`, `nn.functional.conv1d`, `nn.functional.conv2d`, and `nn.functional.conv3d`.
This lets us remove `WEIGHT_INDEX_DICT` and `BIAS_INDEX_DICT` from `prepare.py`.
Instead, we pass around `backend_config` and check whether an arg is a weight or bias against that config.
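For illustration, here is a minimal, hypothetical sketch of what such a pattern entry could look like in the dict-style backend config; the key name `input_type_to_index` comes from this change, but the surrounding entry layout is assumed rather than taken from the PR:
```
import torch.nn.functional as F

# Hypothetical backend pattern entry: records which positional args of
# F.linear are the weight and the bias, so prepare.py can consult the
# backend_config instead of the hard-coded WEIGHT_INDEX_DICT / BIAS_INDEX_DICT.
linear_pattern_config = {
    "pattern": F.linear,
    # F.linear(input, weight, bias): arg 1 is the weight, arg 2 is the bias
    "input_type_to_index": {
        "weight": 1,
        "bias": 2,
    },
}
```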
Test Plan:
```
python test/test_quantization.py TestQuantizeFx
python test/test_quantization.py TestQuantizeFxOps
```
Reviewers:
@andrewor14
Subscribers:
Tasks:
Tags: quant, fx
Differential Revision: [D38705516](https://our.internmc.facebook.com/intern/diff/D38705516)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83263
Approved by: https://github.com/andrewor14
Summary: During execution graph replay, we also want to know where the tensor is allocated to properly model the behavior. The device name is now captured inside the tensor tuple.
Test Plan:
```
buck build mode/dev-nosan caffe2/test:profiler --show-output
buck-out/gen/caffe2/test/profiler#binary.par test_profiler.TestExecutionGraph
```
Reviewed By: aaronenyeshi
Differential Revision: D38017717
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82895
Approved by: https://github.com/aaronenyeshi
Summary:
Previously we did not generate an out variant (neither schema nor kernel) for an operator that only has a functional variant. This adds support for that and adds tests.
## Changes to `native_function_generation.py`
We now generate an out variant for every functional variant where possible. This PR introduces a lot of newly generated out variants, and `native_functions.yaml` incorporates the changes by adding `autogen` keywords.
The logic for determining which operators should get a generated out variant is the following (a rough sketch of this check follows the list):
1. No existing out variant for this `NativeFunction`
2. Has an existing in-place, mutable, or functional variant
3. Has at least one tensor-like return
Operators matching the first two conditions but failing the third are listed in `FUNCTIONAL_OPS_THAT_CANNOT_GET_AN_OUT_VARIANT`.
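The check itself is roughly as follows; this is a purely illustrative sketch, and the names `group`, `functional`, `inplace`, `mutable`, `out`, `returns`, and `is_tensor_like` are assumptions loosely mirroring torchgen's operator grouping, not the actual `native_function_generation.py` code:
```
# Illustrative only: decide whether to autogenerate an out= variant
# for an operator group, following the three criteria above.
def should_autogen_out_variant(group) -> bool:
    # 1. No existing out variant for this NativeFunction
    if group.out is not None:
        return False
    # 2. Has an existing in-place, mutable, or functional variant
    variants = [v for v in (group.functional, group.inplace, group.mutable) if v is not None]
    if not variants:
        return False
    # 3. Has at least one tensor-like return
    return any(r.is_tensor_like for r in variants[0].returns)
```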
## Special handling
The following operators satisfy all 3 criteria above, but we chose not to autogen them for the following reasons:
* `mkldnn_adaptive_avg_pool2d`: the generated out variant `mkldnn_adaptive_avg_pool2d.out` collides with the `mkldnn_adaptive_avg_pool2d_out` kernel of the `adaptive_avg_pool2d.out` operator. I manually created `mkldnn_adaptive_avg_pool2d.out` and renamed `mkldnn_adaptive_avg_pool2d_out` to `mkldnn_adaptive_avg_pool2d_out_stub`.
* `min`, `max`, and `mean`: `min.out`, `max.out`, and `mean.out` already exist, but they have different semantics from the functional variants. I manually created `min.unary_out`, `max.unary_out`, and `mean.dtype_out` to disambiguate.
## Autograd Changes
We introduced logic to not match derivative info in `derivatives.yaml` to out variants, since we generate `NOT_IMPLEMENTED` kernels for those out variants anyway. The issue with the original logic is that it does not handle `TensorOptions` arguments well. For example, we have these two operators:
* `_to_copy(Tensor self, *, ScalarType? dtype=None, Layout? layout=None, Device? device=None, bool? pin_memory=None, bool non_blocking=False, MemoryFormat? memory_format=None) -> Tensor`
* `_to_copy.out(Tensor self, *, bool non_blocking=False, MemoryFormat? memory_format=None, Tensor(a!) out) -> Tensor(a!)`
If we used the `_to_copy` derivative info, there would be a compilation error since `dtype` is missing from the `_to_copy.out` signature.
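As a minimal sketch of the new rule (the names `is_autogen_out_variant`, `base_name`, and `derivatives_by_name` are hypothetical, not the actual autograd codegen API):
```
# Illustrative only: autogenerated out= variants are skipped when matching
# entries from derivatives.yaml; they fall back to NOT_IMPLEMENTED kernels,
# so a signature mismatch (e.g. the missing `dtype` above) never reaches
# a derivative formula.
def derivative_info_for(func, derivatives_by_name):
    if func.is_autogen_out_variant:  # hypothetical flag on the schema
        return None  # autograd emits a NOT_IMPLEMENTED kernel
    return derivatives_by_name.get(func.base_name)
```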
Test Plan: Rely on unit tests
Differential Revision: D37832342
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81437
Approved by: https://github.com/iseeyuan, https://github.com/bdhirsh
I'm planning on removing functorch lagging op db because it doesn't make
sense in the context of being a part of PyTorch. Before that happens,
this PR updates it, and a future PR will delete it.
Test Plan:
- wait for tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83346
Approved by: https://github.com/samdow
PR #82146 made `Block` non-copyable by adding a `unique_ptr` member,
and used this hacky work-around to copy it anyway. However, it
fails under -Werror with this message:
```
../c10/cuda/CUDACachingAllocator.cpp:1411:51: error: ‘void* memcpy(void*, const void*, size_t)’ writing to an object of type ‘struct c10::cuda::CUDACachingAllocator::{anonymous}::Block’ with no trivial copy-assignment [-Werror=class-memaccess]
1411 | std::memcpy(&key, &p.search_key, sizeof(Block));
```
Instead, this constructs a new `Block` with all the relevant
properties copied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83275
Approved by: https://github.com/malfet