Commit graph

61894 commits

Joel Schlosser
ece19bf018 Update run_test.py to use TEST_WITH_SLOW_GRADCHECK flag (#104819)
Finishes the job from #104537. See https://github.com/pytorch/pytorch/pull/104537#pullrequestreview-1520065008
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104819
Approved by: https://github.com/huydhn
2023-07-11 21:58:46 +00:00
PyTorch MergeBot
24aa8b9b9a Revert "Deprecate registering autograd kernels at not an autograd key (#104481)"
This reverts commit ed13ab6664.

Reverted https://github.com/pytorch/pytorch/pull/104481 on behalf of https://github.com/atalman due to failed in periodic tests ([comment](https://github.com/pytorch/pytorch/pull/104481#issuecomment-1631552846))
2023-07-11 21:48:22 +00:00
Aaron Gokaslan
2f95a3d0fc [BE]: Apply ruff PERF fixes to torch (#104917)
Applies automated ruff fixes for the PERF rules and enables all the automatic ones. I also updated ruff, which applied some additional fixes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104917
Approved by: https://github.com/ezyang, https://github.com/albanD
2023-07-11 20:45:21 +00:00
Jack Taylor
c9a806be28 [ROCm] enable additional inductor/dynamo UTs (#104624)
Enables additional inductor UTs on ROCm and unskips outdated skips.

I have also removed a group of expected failures in `test_torchinductor_opinfo` which now pass on both CUDA and ROCm:

```
-    # The following 3 tests fail on CUDA with AssertionError: expected size 5==5, stride 5==1 at dim=0
-    # linalg._svd's return value has different strides on CUDA vs CPU which causes this
-    # In test_meta.py there is a mechanism to skipping strides checks for some ops
-    # (including _linalg_svd), possibly we should have something similar here
-    "linalg.cond": {f32, f64},
-    "linalg.svdvals": {f32, f64},
-    "linalg.matrix_rank": {f32, f64},
-    "linalg.svd": {f32, f64},
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104624
Approved by: https://github.com/malfet
2023-07-11 20:44:02 +00:00
Svetlana Karslioglu
6f27c5185f Fix broken link to torch.compile docs (#104982)
The existing link https://pytorch.org/docs/master/dynamo/custom-backends.html returns a 404. Updating to use the new link.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104982
Approved by: https://github.com/msaroufim
2023-07-11 20:35:47 +00:00
William Wen
c60cb91700 [dynamo] fix bug where trace_source and graph_sizes artifacts were not being printed with TORCH_LOGS='+dynamo' (#104912)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104912
Approved by: https://github.com/Skylion007, https://github.com/mlazos
2023-07-11 20:09:44 +00:00
Edward Z. Yang
a2f04e9841 Force multi-line messages to still get log format prefix (#104932)
This makes it easier to exclude multi-line messages with single-line
grepping. If your screen is wide enough, this should not be a big
problem.

Example of what it looks like:

```
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG] GUARDS:
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG]   hasattr(L['x'], '_dynamo_dynamic_indices') == False
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG]   ___is_grad_enabled()
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG]   not ___are_deterministic_algorithms_enabled()
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG]   utils_device.CURRENT_DEVICE == None
```
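
For illustration, a minimal sketch of the idea (an assumption about the mechanism, not the PR's actual code): a `logging.Formatter` subclass that re-derives the computed prefix and repeats it on every line of a multi-line message.

```python
import logging

class MultiLinePrefixFormatter(logging.Formatter):
    """Repeat the log prefix on every line of a multi-line message so
    single-line greps can match or exclude each line (sketch only)."""

    def format(self, record: logging.LogRecord) -> str:
        formatted = super().format(record)
        if "\n" not in formatted:
            return formatted
        # Re-format the record with an empty message to recover the prefix.
        msg, args = record.msg, record.args
        record.msg, record.args = "", None
        prefix = super().format(record)
        record.msg, record.args = msg, args
        head, *rest = formatted.split("\n")
        return "\n".join([head] + [prefix + line for line in rest])

handler = logging.StreamHandler()
handler.setFormatter(
    MultiLinePrefixFormatter("[%(asctime)s] %(name)s: [%(levelname)s] %(message)s")
)
```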

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104932
Approved by: https://github.com/mlazos, https://github.com/albanD
2023-07-11 20:00:52 +00:00
Edward Z. Yang
515e3f2bb9 Add [rankN]: to log messages when distributed is initialized (#104929)
Doing it in the formatter is kind of naughty, but I stared at logging.setLogRecordFactory for a while and decided it was a bit too global for a library to use well.
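
As a hedged sketch of what "doing it in the formatter" could look like (class name and placement are illustrative, not the PR's actual code):

```python
import logging

import torch.distributed as dist

class RankPrefixFormatter(logging.Formatter):
    def format(self, record: logging.LogRecord) -> str:
        s = super().format(record)
        # Only add the prefix once distributed is initialized, per the title.
        if dist.is_available() and dist.is_initialized():
            s = f"[rank{dist.get_rank()}]: {s}"
        return s
```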

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104929
Approved by: https://github.com/mlazos, https://github.com/Skylion007
2023-07-11 20:00:52 +00:00
Nikita Shulga
5e4ee15e85 [MPS] Fix unique flatten logic (#104938)
The tensor must be flattened if `dim` is `None` before checking whether or not the `dim` dimension is already `None`.

Fixes https://github.com/pytorch/pytorch/issues/104879

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104938
Approved by: https://github.com/albanD
2023-07-11 19:55:56 +00:00
Yukio Siraichi
ad37dd5155 Make unspecified ints to range over negative and positive. (#104658)
Currently, negative unspecified ints get specialized. This PR creates symbolic values for
unspecified ints (including negative ones).

For example, with this PR, the following code only compiles once instead of three times:

```python
import torch

@torch.compile
def foo(x, y):
    return torch.fill(torch.zeros(x.shape), y)

# y changes sign across calls; previously each new value could recompile
x = torch.zeros(3)
foo(x, 10)
foo(x, -5)
foo(x, -3)
```

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104658
Approved by: https://github.com/ezyang
2023-07-11 19:13:16 +00:00
Andrew Or
4b29829ece [quant][pt2] Fix QAT convert for mobilenetv2 (#104110)
Summary:
QAT convert for mobilenetv2 was previously not working
because we incorrectly applied dropout during eval as well as
training. This is because, for exported models, model.eval() does
not change the behavior of dropout, unlike models with torch ops.
This commit simulates the effects of model.eval() for exported
models as well by replacing the aten dropout pattern before eval.
As of this commit, end-to-end QAT numerics now match for
mobilenetv2 between FX and PT2.
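
A minimal sketch of the kind of graph rewrite described above, assuming an exported `torch.fx.GraphModule` (the helper name and overload are assumptions, not the actual pass):

```python
import torch

def replace_dropout_for_eval(gm: torch.fx.GraphModule) -> None:
    # model.eval() has no effect on exported graphs, so rewrite the aten
    # dropout pattern into its eval form by hard-coding train=False.
    for node in gm.graph.nodes:
        if node.op == "call_function" and node.target == torch.ops.aten.dropout.default:
            inp, p, _train = node.args
            node.args = (inp, p, False)
    gm.graph.lint()
    gm.recompile()
```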

Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2

Differential Revision: D46750343

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104110
Approved by: https://github.com/jerryzh168
2023-07-11 18:42:42 +00:00
Svetlana Karslioglu
eb03af44ee Fixes to the torch.compile doc and doctest (#104911)
Fixes the user warning in doctest by removing autosummary from compile/index.rst:
```
/opt/conda/envs/py_3.8/lib/python3.8/site-packages/torch/__init__.py:docstring of torch.compile:1: WARNING: duplicate object description of torch.compile, other instance in compile/generated/torch.compile, use :noindex: for one of them
```
The error is no longer present in the log: https://github.com/pytorch/pytorch/actions/runs/5513741050/jobs/10052379357?pr=104911
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104911
Approved by: https://github.com/kit1980, https://github.com/malfet
2023-07-11 17:54:12 +00:00
Yukio Siraichi
6abe0b2ee8 Disable translation validation on performance runs. (#104887)
This PR disables translation validation (TV) when running the benchmark suites on
performance workflows: inductor with A100s.

In summary, the changes are:

- Add flag for turning TV on and off on _benchmarks/dynamo/common.py_
- Turn TV on only on CI accuracy builds
- Add `--no-translation-validation` target flag to _.ci/pytorch/test.sh_

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104887
Approved by: https://github.com/ezyang
2023-07-11 17:30:40 +00:00
DanilBaibak
5d4b2fcc6f Updated pillow version to 9.3.0 for Python version <= 3.8 (#104958)
There are several vulnerabilities with pillow version 9.2.0. In the worst case, this can lead to arbitrary code execution - https://security.gentoo.org/glsa/202211-10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104958
Approved by: https://github.com/jeanschmidt, https://github.com/malfet
2023-07-11 17:27:09 +00:00
PyTorch MergeBot
f01deb23d5 Revert "[dynamo][numpy] Add support for np.dtype (#103546)"
This reverts commit 0710791929.

Reverted https://github.com/pytorch/pytorch/pull/103546 on behalf of https://github.com/voznesenskym due to Failed on bench, unclear why bench test did not run on CI ([comment](https://github.com/pytorch/pytorch/pull/103546#issuecomment-1631203461))
2023-07-11 17:23:11 +00:00
Nikita Karetnikov
49a2b72927 [inductor] handle Min and Max in TritonPrinter (#104944)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104944
Approved by: https://github.com/ezyang
2023-07-11 17:11:31 +00:00
Jane Xu
15aa401baa [foreach][NAdam] Minimize use of intermediates to decrease peak memory (#104910)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104910
Approved by: https://github.com/Skylion007, https://github.com/albanD
2023-07-11 17:08:07 +00:00
Jane Xu
6878d3a157 [foreach][RAdam] Minimize use of intermediates to decrease peak memory (#104904)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104904
Approved by: https://github.com/albanD, https://github.com/Skylion007
2023-07-11 17:08:07 +00:00
Richard Zou
ed13ab6664 Deprecate registering autograd kernels at not an autograd key (#104481)
Context
-------
This PR adds a new fallback to the Autograd dispatch keys.

If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
`torch._C._set_autograd_fallback("nothing")`
- Register "torch::CppFunction::makeFallthrough()" to your Autograd key,
like in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8

It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).

Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at a key that is not an
autograd key, we add a fallback to the Autograd dispatch keys. This fallback
raises a warning if the user attempts to backprop through the operator,
and it is configurable to either warn or not warn.

The goal of this PR is to
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong.
- be as performant as possible

There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then
we install an autograd hook that raises a warning. We are preserving BC
in that it is possible that the user has a torch::autograd::Function
registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad,
then we make them require_grad and install a WarnNotImplemented grad fn
that warns in the backward pass. This is mildly BC-breaking (see next
section).

Test Plan:
- bunch of new tests

BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects
custom operators that do not have a kernel registered to the Autograd
keys (e.g. AutogradCPU and AutogradCUDA).

If the previous behavior was that the custom operator would return
Tensors that do not require grad if the inputs do require grad, then
this PR changes it so that all floating-point and complex returns do
require grad. See the "Context" section above for how to get the old
behavior.
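
To make the BC note concrete, a hedged sketch of the new behavior (the `mylib::foo` operator is hypothetical):

```python
import torch

lib = torch.library.Library("mylib", "DEF")
lib.define("foo(Tensor x) -> Tensor")
# Kernel registered at CPU only -- no Autograd/AutogradCPU registration.
lib.impl("foo", lambda x: x.clone(), "CPU")

x = torch.randn(3, requires_grad=True)
y = torch.ops.mylib.foo(x)
# Previously, y might silently not require grad. With the new fallback,
# floating-point returns require grad and backprop warns through the
# installed hook / WarnNotImplemented grad_fn.
```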

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104481
Approved by: https://github.com/soulitzer
2023-07-11 16:48:39 +00:00
Jenny
e095716161 Add a note for Incorrect signature in nn.Module.register_full_backward_pre_hook (#104964)

Fixes #102645

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104964
Approved by: https://github.com/albanD
2023-07-11 16:24:13 +00:00
Jane Xu
231364fd06 [optim] use lerp whenever possible (#104796)
This is a better copy (with fixes) of #104781.

Test plan:
CI will pass once https://github.com/pytorch/pytorch/pull/104784 is landed

Internal CI (and the newly enabled compiled optim tests) will pass after https://github.com/pytorch/pytorch/pull/104866 is landed.
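
For illustration, a hedged sketch of the kind of rewrite the title refers to (the actual call sites live in the optimizer implementations):

```python
import torch

beta = 0.9
exp_avg = torch.randn(4)
grad = torch.randn(4)

# exp_avg <- beta * exp_avg + (1 - beta) * grad, written two ways:
manual = exp_avg.mul(beta).add(grad, alpha=1 - beta)
via_lerp = exp_avg.lerp(grad, 1 - beta)  # single op, no extra intermediate
torch.testing.assert_close(manual, via_lerp)
```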

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104796
Approved by: https://github.com/albanD
2023-07-11 14:32:59 +00:00
Nikita Shulga
999abd56a7 [BE] Make ONNX imports lazy (#104843)
This reduces the total number of imported modules by default from 1419 to 1322, according to
```
time python -c "import sys;before=len(sys.modules);import torch;after=len(sys.modules);print(f'torch-{torch.__version__} imported {after-before} modules')"
```

and slightly reduces import time, while having no effect on UX (i.e., the `torch.onnx` submodule is kept intact).
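
The standard way to keep a submodule importable while deferring its import is PEP 562's module-level `__getattr__`; a minimal sketch of that pattern (an assumption about the mechanism, not necessarily this PR's exact code):

```python
# In torch/__init__.py (sketch):
import importlib

def __getattr__(name):
    # Defer importing heavy submodules until first attribute access.
    if name == "onnx":
        return importlib.import_module("torch.onnx")
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```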

Suppress lint errors that appear after mypy accidentally starts listing more files, for more details see: https://github.com/pytorch/pytorch/issues/104940

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104843
Approved by: https://github.com/jansel, https://github.com/albanD
2023-07-11 12:54:22 +00:00
Huy Do
26f7f470df Handle empty PR body in filter_test_configs (#104914)
This is a bug discovered by https://github.com/pytorch/pytorch/pull/104810.  Basically, when the PR body is empty, the GitHub API returns a `None` value, which is then passed into `parse_reenabled_issues`, causing it to fail.
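
A hedged sketch of the guard such a fix typically adds (the regex and return type are illustrative, not the script's actual logic):

```python
import re
from typing import List, Optional

def parse_reenabled_issues(pr_body: Optional[str]) -> List[str]:
    # GitHub returns None for an empty PR body; treat it as "no issues".
    if not pr_body:
        return []
    return re.findall(r"#(\d+)", pr_body)
```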

### Testing

```
python3 .github/scripts/filter_test_configs.py \
  --workflow "pull" \
  --job-name "linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single-full-jit / filter," \
  --test-matrix "{ include: [ { config: 'default', shard: 1, num_shards: 1, runner: 'linux.2xlarge' }, ]}" \
  --pr-number "104810" \
  --tag "" \
  --event-name "pull_request" \
  --schedule "" \
  --branch ""
```

The command now runs correctly without failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104914
Approved by: https://github.com/clee2000
2023-07-11 10:16:58 +00:00
Danni Li
db4aed6a03 Include nn.ParameterDict in dynamo __getitem__ (#99771)
Summary:

Fix: #99735

Test Plan: Please see GitHub tests.

Differential Revision: D45200616

Pull Request resolved: https://github.com/pytorch/pytorch/pull/99771
Approved by: https://github.com/Skylion007, https://github.com/anijain2305
2023-07-11 08:19:04 +00:00
chunyuan
ba167e6578 Inductor cpp wrapper: fix codegen of ScatterFallback (#104524)
Fix cpp wrapper failures on the TorchBench models `basic_gnn_edgecnn` and `hf_Reformer`, which contain the scatter op.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104524
Approved by: https://github.com/jgong5, https://github.com/jansel
2023-07-11 08:17:56 +00:00
Mengwei Liu
0710791929 [dynamo][numpy] Add support for np.dtype (#103546)
## Problem

Trying to support numpy function calls in dynamo with a numpy dtype as an argument.

For example:

```python
def fn(x: int):
    return np.empty_like(x, dtype=np.float64)
```

## Solution

This currently doesn't work because `NumpyVariable` doesn't implement `as_proxy()`. The idea in `as_proxy()` for now is to convert `np.float64` and the other `np.<dtype>`s into the corresponding `torch.dtype` and then feed that into the corresponding `torch_np` method.

For the previous example, we convert `np.float64` to `torch.float64` in `as_proxy()` and then feed it into the `torch_np.empty_like()` method.
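
A hedged sketch of the dtype translation idea (the mapping table and helper name are illustrative, not dynamo's actual ones):

```python
import numpy as np
import torch

NP_TO_TORCH_DTYPE = {
    np.float32: torch.float32,
    np.float64: torch.float64,
    np.int32: torch.int32,
    np.int64: torch.int64,
}

def numpy_dtype_as_proxy(np_dtype):
    # as_proxy() can hand the torch dtype to the corresponding torch_np call
    return NP_TO_TORCH_DTYPE[np_dtype]
```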

Pull Request resolved: https://github.com/pytorch/pytorch/pull/103546
Approved by: https://github.com/ezyang
2023-07-11 06:29:15 +00:00
kshitij12345
90eaa98d13 dynamo : kwarg support for wrap (higher order op) (#104180)
Ref: https://github.com/pytorch/pytorch/issues/100278

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104180
Approved by: https://github.com/zou3519
2023-07-11 06:08:18 +00:00
Elias Ellison
ed5ea15714 [Easy] remove debug code (#104915)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104915
Approved by: https://github.com/mlazos
2023-07-11 04:01:02 +00:00
Thiago Crepaldi
f1bff6601c [ONNX] Add fake tensor support to torch.onnx.dynamo_export (#103865)
## Context prior to this PR

https://github.com/pytorch/pytorch/pull/100017/ was merged onto PyTorch `main` branch with the goal of enabling `torch._dynamo.export` to perform symbolic tracing.
In that context, symbolic tracing is defined as tracing a model using fake inputs and weights. An input is fake when a `torch.Tensor` is replaced by a `torch._subclasses.FakeTensor`, whereas a weight is fake when a `torch.nn.Parameter` is replaced by a `torch._subclasses.FakeTensor`.

For additional context, several strategies were discussed with Meta to enable this feature, including 1) calling `torch._dynamo.export` within a `torch._subclass.FakeTensorMode` context and 2) **fake**fying the input and model as a separate step and then calling `torch._dynamo.export` without an active `torch._subclass.FakeTensorMode` context. In the end, 2) was preferred and implemented by #100017 to minimize the number of side effects fake tensor mode has on the code base.

As a consequence, the `torch._dynamo.export` API introduced a new argument called `fake_mode`. When symbolic tracing is used, the user must pass in the `fake_mode` used to fakefy both the input and the model. Internally, `torch._dynamo.export` will adopt this `fake_mode` instead of creating its own instance. This is needed because each instance of `FakeTensorMode` has metadata on the tensors/parameters it fakefied. Thus, using a real tensor/model while specifying a `fake_mode` to `torch._dynamo.export` is an error. Likewise, specifying a `fake_mode` instance to `torch._dynamo.export` different from the one used to fakefy the model and input is an error.

## Changes introduced from this PR

This PR is intended to integrate `torch._dynamo.export(fake_mode=...)` through `torch.onnx.dynamo_export`. In essence, it
* Introduces a new public API `ONNXFakeContext` which wraps a `FakeTensorMode` under the hood. This removes complexity from the user side while still allowing the exporter to leverage the fake mode.
* Adds a new public API `enable_fake_mode` *context manager* that instantiates and returns an `ONNXFakeContext`.
* Adds a new `ExportOptions.fake_context` that will be used to persist the `ONNXFakeContext` created by `enable_fake_mode` and plumb it through until it reaches the call to `torch._dynamo.export`.
* Adds a `model_state_dict` argument to `ExportOutput.save` API.
  * When model is exported with fake tensors, no actual data exist in the FX module and, therefore, in the ONNX graph.
    * In fact, `torch.fx.make_fx` lifts initializers as model input when fake tensors are used
      * https://github.com/pytorch/pytorch/pull/104493 is needed to enforce name matching between Parameters and inputs
    *  A model checkpoint file or state_dict is needed to populate the ONNX graph with real initializers through `export_output.save(model_state_dict=...)` API

Symbolic tracing, or onnx fake mode, is only enabled when the user instantiates the input and model within the `enable_fake_mode` context. Otherwise, real tracing is done, which preserves the current behavior.

## Usability

Because symbolic tracing depends heavily on changes made on the Dynamo side before they can be consumed by the ONNX exporter, this feature may have its API and assumptions changed as symbolic tracing matures upstream. Nonetheless, it is still important to have this feature merged ASAP on the ONNX exporter side to "lock in" changes on Dynamo that would otherwise break the ONNX exporter without warning.

Example:

```python
import torch
from torch.onnx import ExportOptions

class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        return out

with torch.onnx.enable_fake_mode() as fake_context:
    x = torch.rand(5, 2, 2)
    model = Model()

# Export the model with fake inputs and parameters
export_options = ExportOptions(fake_context=fake_context)
export_output = torch.onnx.dynamo_export(
    model, x, export_options=export_options
)

model_state_dict = Model().state_dict()  # optional
export_output.save("/path/to/model.onnx", model_state_dict=model_state_dict)
```

## Next steps

* Add unit tests running the exported model with ORT
Today this is not possible yet because `make_fx`, used by our decomposition pass, lifts initializers as model inputs. However, the initializer names are not preserved by FX tracing, causing a mismatch between the initializer and input names.
https://github.com/pytorch/pytorch/pull/104493 and https://github.com/pytorch/pytorch/pull/104741 should fix the initializer mismatch, enabling model execution.

* Revisit `ONNXTorchPatcher` and how the ONNX initializers are saved in the graph as external data
We can try to get rid of the PyTorch patcher. If we can't, we might prefer to create specific patchers, say an `FXSymbolicTracePatcher` used specifically during an export using `torch.fx.symbolic_trace`, and maybe an `ExportOutputSavePatcher` used specifically for `ExportOutput.save`, to avoid patching too much PyTorch API that we don't need.

## References

* [FakeTensor implementation](https://github.com/pytorch/pytorch/blob/main/torch/_subclasses/fake_tensor.py)
* [PR that adds fake tensor support to torch._dynamo.export](https://github.com/pytorch/pytorch/pull/100017)
* [Short fake tensor documentation](https://pytorch.org/torchdistx/latest/fake_tensor.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103865
Approved by: https://github.com/BowenBao
2023-07-11 03:17:17 +00:00
haozhe.zhu
ca8c56ff5d fix QuantizeAvx512 (#104400)
For the quantize kernel:
```
  for (; i < len / VLEN * VLEN; i += VLEN) {
    __m512 x_vals = _mm512_load_ps(src + i);
    __m512 x_transformed_v = _mm512_mul_ps(x_vals, inverse_scale_v);
    x_transformed_v =
        _mm512_min_ps(x_transformed_v, _mm512_set1_ps(int32_float_max_val));
    __m512i x_rounded_v = _mm512_cvtps_epi32(x_transformed_v);
    x_rounded_v = _mm512_add_epi32(x_rounded_v, _mm512_set1_epi32(zero_point));
    __m512i x_clipped_v =
        _mm512_max_epi32(min_v, _mm512_min_epi32(max_v, x_rounded_v));

    x_clipped_v = _mm512_shuffle_epi8(x_clipped_v, shuffle_mask_v);
    x_clipped_v = _mm512_permutexvar_epi32(permute_mask_l8_v, x_clipped_v);
    _mm_storeu_si128(
        reinterpret_cast<__m128i*>(dst + i),
        _mm512_castsi512_si128(x_clipped_v));
  }
```

```
    x_clipped_v = _mm512_shuffle_epi8(x_clipped_v, shuffle_mask_v);
    x_clipped_v = _mm512_permutexvar_epi32(permute_mask_l8_v, x_clipped_v);
```
aims to cast `int32` down to `int8` and shuffle the 16 resulting `int8` values into the first 128 bits.

For example, with `A1` representing 8 bits:
```
    x_clipped_v = _mm512_shuffle_epi8(x_clipped_v, shuffle_mask_v);
    A1A2A3**A4** B1B2B3**B4** C1C2C3**C4** D1D2D3**D4**            -> D4C4B4A4 other 32 * 3 bit
    E1E2E3**E4** F1F2F3**F4** G1G2G3**G4** H1H2H3**H4**            -> H4G4F4E4 other 32 * 3 bit
    I1I2I3**I4** J1J2J3**J4** K1K2K3**K4** L1L2L3**L4**            -> L4K4J4I4 other 32 * 3 bit
    M1M2M3**M4** N1N2N3**N4** O1O2O3**O4** P1P2P3**P4**            -> P4O4N4M4 other 32 * 3 bit
    x_clipped_v = _mm512_permutexvar_epi32(permute_mask_l8_v, x_clipped_v);
    D4C4B4A4 other 32 * 3 bit        -> D4C4B4A4 H4G4F4E4 L4K4J4I4 P4O4N4M4
    H4G4F4E4 other 32 * 3 bit           other 3 * 4 * 32 bits
    L4K4J4I4 other 32 * 3 bit
    P4O4N4M4 other 32 * 3 bit

```

Based on https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_permutexvar_epi32&ig_expand=4966,5088.
```
FOR j := 0 to 15
	i := j*32
	id := idx[i+3:i]*32
	dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:512] := 0
```
the `permute_mask_l8_v` should satisfy
```
permute_mask_l8_v[3:0] = 0
permute_mask_l8_v[3 + 32:0 + 32] = 4
permute_mask_l8_v[3 + 64:0 + 64] = 8
permute_mask_l8_v[3 + 96:0 + 96] = 12
```
The other parts of `permute_mask_l8_v` do not matter.
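
A small Python sketch that emulates the intrinsic's pseudocode to check the mask property above (illustration only):

```python
def permutexvar_epi32(idx, a):
    # dst lane j takes source lane idx[j] & 0xF (per the Intel pseudocode)
    return [a[idx[j] & 0xF] for j in range(16)]

# After the byte shuffle, dword lanes 0, 4, 8, 12 hold the packed int8 groups,
# so the first four mask entries must be 0, 4, 8, 12; the rest are don't-cares.
mask = [0, 4, 8, 12] + [0] * 12
lanes = [f"lane{i}" for i in range(16)]
assert permutexvar_epi32(mask, lanes)[:4] == ["lane0", "lane4", "lane8", "lane12"]
```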

The `AVX2` version is correct.

The bug was not exposed before because this path is only called with the fixed length `64`: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/vec/vec512/vec512_qint.h#L545-L546.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104400
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/jerryzh168
2023-07-11 02:02:23 +00:00
Michael Lazos
dbb69f78fe Add assert + test for artifact log booleans (#104907)
Fixes https://github.com/pytorch/pytorch/issues/104885

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104907
Approved by: https://github.com/ezyang
2023-07-11 01:59:23 +00:00
Driss Guessous
d184c81166 Add -fstandalone-debug debug flag (#104475)
# Summary

While debugging something in lldb, I found that the formatter I wrote for `c10::IntArrayRef` was not working correctly, producing:
`(std::string) $6 = error: summary string parsing error`

Based off of this thread: https://github.com/vadimcn/codelldb/issues/415

I added the standalone-debug information and fixed the std::string formatting issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104475
Approved by: https://github.com/ezyang, https://github.com/malfet
2023-07-11 01:29:20 +00:00
Andrew Gu
63d1fb21f5 [FSDP] Default limit_all_gathers=True (#104900)
This PR defaults to `limit_all_gathers=True`.

I included a `record_function()` for the rate limiter synchronization to help with user confusion about the gap in the pre-forward:
(Screenshot: profiler trace showing the annotated gap in the pre-forward: https://github.com/pytorch/pytorch/assets/31054793/61f55e0e-58d7-4162-9395-bea06d3e8d8a)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104900
Approved by: https://github.com/fegin
2023-07-11 01:04:29 +00:00
Rodrigo Kumpera
7c3c3dd7ca [C10D] Reimplement TCPStore wait timeout logic. (#100594)
The current TCPStore wait logic leaves the client socket in a bad state if the wait times out.

This happens because all recv functions raise an exception on timeout, and that's it.
The problem is that on timeout we need to unregister the wait.

We implement this with client side cancelation by adding a new CANCEL_WAIT instruction.

So, if no data arrives before the deadline, the client sends a CANCEL_WAIT command.
The server sends a WAIT_CANCELED response to that command, always.

This gets us down to the last issue, which is that there's a race between timing out,
canceling the wait, and the wait completing. The client needs to handle the server sending
a STOP_WAITING followed by a WAIT_CANCELED answer.

This ensures client and server state are synchronized regardless of whether the wait
times out or not.
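
A hedged sketch of the client-side handshake described above (message names follow the text; the one-byte framing in `send_msg`/`recv_msg` is illustrative, not the real wire protocol):

```python
import socket
from enum import Enum

class Msg(Enum):
    STOP_WAITING = 1
    CANCEL_WAIT = 2
    WAIT_CANCELED = 3

def send_msg(sock: socket.socket, msg: Msg) -> None:
    sock.sendall(bytes([msg.value]))

def recv_msg(sock: socket.socket) -> Msg:
    return Msg(sock.recv(1)[0])

def wait_with_timeout(sock: socket.socket, timeout_s: float) -> bool:
    """Return True if the wait completed, False if it was canceled."""
    sock.settimeout(timeout_s)
    try:
        msg = recv_msg(sock)
    except socket.timeout:
        send_msg(sock, Msg.CANCEL_WAIT)  # unregister the wait server-side
        sock.settimeout(None)
        # Race: the wait may complete before the server sees the cancel, so
        # the client must accept STOP_WAITING followed by the WAIT_CANCELED ack.
        msg = recv_msg(sock)
        if msg is Msg.STOP_WAITING:
            assert recv_msg(sock) is Msg.WAIT_CANCELED
            return True
        assert msg is Msg.WAIT_CANCELED
        return False
    return msg is Msg.STOP_WAITING
```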
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100594
Approved by: https://github.com/H-Huang
2023-07-11 00:36:41 +00:00
maxren
332f2057df [XNNPACK][QS8] torch.nn.ELU (#104307)
Differential Revision: [D47075933](https://our.internmc.facebook.com/intern/diff/D47075933/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104307
Approved by: https://github.com/digantdesai
2023-07-11 00:35:13 +00:00
maxren
c4e084e3c7 [XNNPACK][QS8] torch.nn.ConstantPad2d (#104306)
Differential Revision: [D47075932](https://our.internmc.facebook.com/intern/diff/D47075932/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104306
Approved by: https://github.com/digantdesai
2023-07-11 00:35:02 +00:00
maxren
2c960c73a3 [XNNPACK][QS8] torch.permute (#104305)
Differential Revision: [D47075934](https://our.internmc.facebook.com/intern/diff/D47075934/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104305
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
maxren
d41c4a8338 [XNNPACK][QS8] torch.clamp (#104304)
Differential Revision: [D47075935](https://our.internmc.facebook.com/intern/diff/D47075935/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104304
Approved by: https://github.com/digantdesai
2023-07-11 00:34:58 +00:00
Huy Do
66c41e1c5e Avoid generating core dumps when CONTINUE_THROUGH_ERROR is set (#104905)
Fixes https://github.com/pytorch/pytorch/issues/104234.  This closes another loophole where multiple core files could be generated when the CONTINUE_THROUGH_ERROR flag is set in CI.  This ensures that only one core file is generated in a regular Linux test job.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104905
Approved by: https://github.com/clee2000
2023-07-11 00:20:33 +00:00
Ying Zhang
e940d5d3c3 Disable cudagraphs by default when dynamic shape is enabled. (#104448)
Disable cudagraphs when dynamic shapes are enabled (via torch.compile(dynamic=True)).
Otherwise, Inductor recompiles for each new shape, which doesn't seem very reasonable.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104448
Approved by: https://github.com/jansel, https://github.com/ezyang
2023-07-11 00:16:37 +00:00
Matthew Hoffman
3279f06410 Merge and improve torch optim optimizer type stubs (#102593)
Fixes #102428

Also improves hook registration type hints:

```python
from typing import Any, Dict, Tuple

from torch import nn
from torch.optim import Adam, Adagrad, Optimizer

linear = nn.Linear(2,2)
optimizer = Adam(linear.parameters(), lr=0.001)

def pre_hook_fn_return_none(optimizer: Adam, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

def pre_hook_fn_return_modified(
    optimizer: Optimizer, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]
) -> Tuple[Tuple[Any, ...], Dict[str, Any]]:
    return inputs, kwargs

def hook_fn(optimizer: Optimizer, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

def hook_fn_other_optimizer(optimizer: Adagrad, inputs: Tuple[Any, ...], kwargs: Dict[str, Any]) -> None:
    return None

optimizer.register_step_post_hook(hook_fn)  # OK

optimizer.register_step_pre_hook(pre_hook_fn_return_none)  # OK
optimizer.register_step_pre_hook(pre_hook_fn_return_modified)  # OK

optimizer.register_step_post_hook(hook_fn_other_optimizer)  # Parameter 1: type "Adam" cannot be assigned to type "Adagrad"

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/102593
Approved by: https://github.com/janeyx99
2023-07-11 00:07:30 +00:00
Edward Z. Yang
6059fea760 Make perf_hint_log report at info level (#104873)
If you do it at warning, these log messages will get displayed by
default, which is not the intended behavior.

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104873
Approved by: https://github.com/mlazos
2023-07-10 23:46:34 +00:00
Michael Lazos
4063158df9 Enable running compiled optimizers in CI (#104888)
as title

for reference: this is a followup to https://github.com/pytorch/pytorch/pull/104121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104888
Approved by: https://github.com/janeyx99
2023-07-10 23:45:41 +00:00
Jane Xu
7e9c891056 [foreach][AdamW] Minimize intermediates to save peak memory (#104898)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104898
Approved by: https://github.com/albanD
2023-07-10 23:40:52 +00:00
Yukio Siraichi
d5dbe77629 Fix mod semantics for Z3Ops. (#104827)
Python's `mod` semantics are not the same as the mathematical modulus operation. According to
the Python reference: `a = floor(a / b) * b + a % b`.

In other words: `a % b = a - floor(a / b) * b`.

This PR fixes the old implementation, which used SMT-LIB2 semantics for `mod`. In short, it
only worked with integers and had the following guarantee: `0 <= a % b < b`.

In summary, the changes are:
- `a % b = a - floordiv(a, b) * b`
- `a` and `b` can be both integer or real
- The result will be real if any of the arguments is real. Otherwise, it will be integer
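
A plain-Python illustration of the adopted semantics:

```python
import math

def py_mod(a, b):
    # a % b = a - floor(a / b) * b  (Python semantics, ints or reals)
    return a - math.floor(a / b) * b

assert py_mod(-7, 3) == -7 % 3 == 2       # sign follows b, not a
assert py_mod(7, -3) == 7 % -3 == -2
assert py_mod(7.5, 2) == 7.5 % 2 == 1.5   # works for reals too
```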

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104827
Approved by: https://github.com/lezcano
2023-07-10 23:35:04 +00:00
Edward Z. Yang
951b9a6a14 Update torchbench pin (#104829)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104829
Approved by: https://github.com/albanD
2023-07-10 23:31:27 +00:00
Edward Z. Yang
0300be5b7b Fix AttributeError("'constexpr' object has no attribute 'type'") (#104831)
Fixes #104759

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104831
Approved by: https://github.com/Skylion007, https://github.com/voznesenskym
2023-07-10 23:26:42 +00:00
fduwjj
aa84078c6c [PTD][TP] Add BWD support for colwise embedding sharding (#104820)
Originally, we didn't enable BWD for colwise embedding sharding because we thought it was just for inference, but it turns out that we do need it for training. So let's enable it for now; a unit test is also added.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104820
Approved by: https://github.com/fegin
2023-07-10 22:33:20 +00:00
atalman
fd378db6a8 Fix lint after 104902 (#104909)
Fix lint after PR: #104902

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104909
Approved by: https://github.com/clee2000, https://github.com/malfet, https://github.com/huydhn
2023-07-10 22:17:06 +00:00
Michael Lazos
9861c4a3f8 Add lerp decomps + meta registrations (#104866)
as title

Pull Request resolved: https://github.com/pytorch/pytorch/pull/104866
Approved by: https://github.com/janeyx99
2023-07-10 22:07:57 +00:00