Summary:
This patch makes send/recv custom ops so that they are dispatcher-passable.
It is one part of the effort to route comm ops through the dispatcher so
that tracing mechanisms that rely on the dispatcher, e.g., LazyTensor and
AOTAutograd, can trace them.
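For illustration, a minimal sketch of the custom-op mechanism this relies on, using `torch.library` with a made-up namespace and op (not this PR's actual registration code): ops registered this way go through the dispatcher and are therefore visible to dispatcher-based tracers.
```py
import torch

# Hypothetical example: "demo_comm" and "scale" are made-up names, not the
# ops added in this PR.
lib = torch.library.Library("demo_comm", "DEF")
lib.define("scale(Tensor t, float f) -> Tensor")

def scale_impl(t, f):
    # Plain Python implementation; the dispatcher routes calls here.
    return t * f

lib.impl("scale", scale_impl, "CompositeExplicitAutograd")

x = torch.ones(3)
# Dispatched like any native op, so dispatcher-based tracers can observe it.
print(torch.ops.demo_comm.scale(x, 2.0))
```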
Test Plan:
python test/distributed/test_c10d_nccl.py -k test_send_recv
...and other existing distributed tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79779
Approved by: https://github.com/mrshenli, https://github.com/wanchaol
Summary:
X-link: https://github.com/pytorch/data/pull/547
Fixes https://github.com/pytorch/data/issues/538
- Improve the validation function to raise a warning about unpicklable functions when either a lambda or a local function is provided to a DataPipe.
- The inner function of a `functools.partial` object is extracted for validation as well (see the sketch after the example below).
- Mimic the behavior of the `pickle` module for a local lambda function: pickle raises the error about the local function rather than the lambda, so we likewise warn about a local function, not a lambda function.
```py
>>> import pickle
>>> def fn():
...     lf = lambda x: x
...     pickle.dumps(lf)
>>> fn()
AttributeError: Can't pickle local object 'fn.<locals>.<lambda>'
```
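A hedged sketch of the validation logic described above (illustrative helper names, not TorchData's actual implementation):
```py
import functools

def _unwrap_partial(fn):
    # Extract the inner function from functools.partial objects for validation.
    while isinstance(fn, functools.partial):
        fn = fn.func
    return fn

def _fn_kind(fn):
    fn = _unwrap_partial(fn)
    # Mirror pickle's precedence: a local lambda is reported as a local
    # function, not as a lambda.
    if "<locals>" in getattr(fn, "__qualname__", ""):
        return "local function"
    if getattr(fn, "__name__", "") == "<lambda>":
        return "lambda"
    return "picklable"
```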
This diff also fixes the error introduced by https://github.com/pytorch/pytorch/pull/79344.
Test Plan:
CI on PyTorch and TorchData
Manually validated the tests from TorchVision
Differential Revision: D37417556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80232
Approved by: https://github.com/NivekT
This test configuration runs PyTorch's test suite under torchdynamo.
Once it stabilizes, we will make it the default and remove this dedicated
CI job.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80342
Approved by: https://github.com/anijain2305
This allows subclasses such as NestedTensorImpl to provide special behavior for `int64_t size(int64_t d)` that will also be accessible from our Python frontend.
It follows the same pattern as `sizes_custom`.
Currently gathering CI signal before asking for a review.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80236
Approved by: https://github.com/ezyang
### Summary:
This PR updates the design of APoT Observer, Quantizer, and Tensor to be more consistent with their uniform counterparts in the PyTorch framework. APoT Observer now calculates alpha as the maximum of the absolute values of the minimum and maximum values in the input tensor. APoT Quantizer is modified so that its instance methods `quantize_APoT` and `dequantize_APoT` are called by their global function counterparts. APoT Tensor is modified to account for the new signature of `quantize_APoT` from APoT Quantizer.
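For reference, the alpha computation described above reduces to the following (a minimal sketch, not the observer's actual code):
```py
import torch

def calc_alpha(x: torch.Tensor) -> torch.Tensor:
    # alpha = max(|min(x)|, |max(x)|): a symmetric clipping range.
    return torch.max(x.min().abs(), x.max().abs())

print(calc_alpha(torch.tensor([-3.5, 0.2, 2.0])))  # tensor(3.5000)
```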
### Test Plan:
Run APoT Observer class unit tests with: `python pytorch/test/quantization/core/experimental/test_nonuniform_observer.py`
Run APoT Quantize class unit tests with: `python pytorch/test/quantization/core/experimental/test_quantizer.py`
Run APoT Tensor class unit tests with: `python pytorch/test/quantization/core/experimental/test_quantized_tensor.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80075
Approved by: https://github.com/jerryzh168
In the transformer, the scale step in attention performs a `nested_tensor / scalar` operation. There are two ways to support it (see the sketch after the list):
1. directly support `nested_tensor / scalar`:
* pro: straightforward, good UX
* con: is dispatching `mul(nested tensor, regular tensor)` good practice?
2. have the user manually convert the `scalar` to `nested_scalar = torch.nested_tensor([broadcast_scalar])`:
* pro: the dispatcher only has to deal with `mul(nested tensor, nested tensor)`
* con: confusing manual conversions, bad UX
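A hedged sketch of the two options' user experience (using the `torch.nested_tensor` constructor named above; whether the direct division runs depends on which option landed):
```py
import torch

nt = torch.nested_tensor([torch.randn(2, 8), torch.randn(4, 8)])
scale = 8.0 ** 0.5

# Option 1: direct division, as in the attention scale step.
out = nt / scale

# Option 2: the user wraps the scalar into a nested tensor first.
nested_scale = torch.nested_tensor([torch.tensor(scale)])
```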
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80284
Approved by: https://github.com/cpuhrsch
Summary:
This patch makes barrier a custom op so that it is dispatcher-passable.
It is one part of the effort to route comm ops through the dispatcher so
that tracing mechanisms that rely on the dispatcher, e.g., LazyTensor and
AOTAutograd, can trace them.
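For context, a rough sketch of the kind of dispatcher-level visibility this buys, via a hypothetical logging tensor subclass (not LazyTensor or AOTAutograd themselves; the subclass machinery is simplified):
```py
import torch

class LoggingTensor(torch.Tensor):
    @staticmethod
    def __new__(cls, t):
        return torch.Tensor._make_subclass(cls, t)

    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        print(f"dispatched: {func}")  # only dispatcher-routed ops show up here
        # Unwrap subclass instances so the real kernel runs on plain tensors.
        unwrap = lambda x: x.as_subclass(torch.Tensor) if isinstance(x, LoggingTensor) else x
        args = tuple(unwrap(a) for a in args)
        kwargs = {k: unwrap(v) for k, v in (kwargs or {}).items()}
        return func(*args, **kwargs)

x = LoggingTensor(torch.ones(2))
x + 1  # prints something like: dispatched: aten.add.Tensor
```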
Test Plan:
python test/distributed/test_c10d_nccl.py -k test_nccl_barrier
...and other existing distributed tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79777
Approved by: https://github.com/mrshenli, https://github.com/wanchaol
Summary:
This patch makes alltoall a custom op so that it is dispatcher-passable.
It is one part of the effort to route comm ops through the dispatcher so
that tracing mechanisms that rely on the dispatcher, e.g., LazyTensor and
AOTAutograd, can trace them.
Test Plan:
BACKEND=nccl WORLD_SIZE=2 python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_all_to_all_cuda
BACKEND=nccl WORLD_SIZE=2 python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_all_to_all_cuda_complex
BACKEND=nccl WORLD_SIZE=2 python test/distributed/test_distributed_spawn.py -v TestDistBackendWithSpawn.test_all_to_all_full_group_cuda
and other existing distributed tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79691
Approved by: https://github.com/mrshenli, https://github.com/wanchaol
The correct variable name is USE_SYSTEM_PYBIND11, as defined in the root
CMakeLists.txt. In cmake/Dependencies.cmake it is misspelled as
USE_SYSTEM_BIND11, and CMake does not complain about the undefined variable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80272
Approved by: https://github.com/suo
Create Z3 types: in particular, dynamic dimensions, a dynamic tensor type, and tensor types up to size 4. Note that for Z3 decidability reasons, we use uninterpreted functions for tensor types, which means we must explicitly define tensor constructors with a concrete size (for now, up to size 4). We defer lifting this requirement to future work.
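One plausible z3py rendering of that encoding (illustrative names, not necessarily the PR's exact definitions):
```py
import z3

# A dimension datatype, then tensor constructors of each concrete arity,
# since the encoding requires a fixed size per constructor (here up to 4).
D = z3.Datatype("Dim")
D.declare("dim", ("val", z3.IntSort()))
Dim = D.create()

T = z3.Datatype("TensorType")
T.declare("dyn")  # the fully dynamic tensor type
T.declare("tensor1", ("d0", Dim))
T.declare("tensor2", ("d0", Dim), ("d1", Dim))
T.declare("tensor3", ("d0", Dim), ("d1", Dim), ("d2", Dim))
T.declare("tensor4", ("d0", Dim), ("d1", Dim), ("d2", Dim), ("d3", Dim))
TensorType = T.create()
```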
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80084
Approved by: https://github.com/anijain2305
Which is, in essence, a composite of `eq`->`all`->`item`.
`native/mps/operators/Equal.cpp` is an almost verbatim copy of `native/cuda/Equal.cpp`
Fix codegen by generating MPSFunctions headers
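A rough Python rendering of that composite (a sketch of the decomposition, not the MPS code itself):
```py
import torch

def equal(a: torch.Tensor, b: torch.Tensor) -> bool:
    # eq -> all -> item, plus the usual shape/dtype guards.
    if a.shape != b.shape or a.dtype != b.dtype:
        return False
    return a.eq(b).all().item()
```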
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80195
Approved by: https://github.com/albanD
Summary:
Enable a flag in the tracer to expose the tracing status, rather than checking the module call function.
Context: Feature_processor needs `is_fx_tracing`, and currently this function is implemented by PyPer and cannot be open-sourced. We want to open-source this function and implement it inside the FX Tracer.
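A hedged sketch of the flag-based design (illustrative names; the actual tracer code may differ):
```py
_is_fx_tracing_flag = False

def is_fx_tracing() -> bool:
    # Exported status flag, instead of inspecting the module call function.
    return _is_fx_tracing_flag

class Tracer:
    def trace(self, root):
        global _is_fx_tracing_flag
        prev, _is_fx_tracing_flag = _is_fx_tracing_flag, True
        try:
            ...  # symbolic tracing runs here; is_fx_tracing() now returns True
        finally:
            _is_fx_tracing_flag = prev
```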
Test Plan: tested in the stacked diff D37386855
Reviewed By: jamesr66a
Differential Revision: D37386820
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80255
Approved by: https://github.com/jamesr66a