Enables additional inductor UTs on ROCm and unskips tests whose skips are now outdated.
I have also removed a group of expected failures in `test_torchinductor_opinfo` which now pass on both CUDA and ROCm:
```
- # The following 3 tests fail on CUDA with AssertionError: expected size 5==5, stride 5==1 at dim=0
- # linalg._svd's return value has different strides on CUDA vs CPU which causes this
- # In test_meta.py there is a mechanism to skipping strides checks for some ops
- # (including _linalg_svd), possibly we should have something similar here
- "linalg.cond": {f32, f64},
- "linalg.svdvals": {f32, f64},
- "linalg.matrix_rank": {f32, f64},
- "linalg.svd": {f32, f64},
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104624
Approved by: https://github.com/malfet
This makes it easier to exclude multi-line messages using single-line grepping: since every line now carries the log prefix, e.g. `grep -v __guards` filters out all guard output, including continuation lines. If your screen is wide enough, the extra prefix width should not be a big problem.
Example of what it looks like:
```
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG] GUARDS:
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG] hasattr(L['x'], '_dynamo_dynamic_indices') == False
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG] ___is_grad_enabled()
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG] not ___are_deterministic_algorithms_enabled()
[2023-07-10 20:11:30,529] torch._dynamo.convert_frame.__guards: [DEBUG] utils_device.CURRENT_DEVICE == None
```
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104932
Approved by: https://github.com/mlazos, https://github.com/albanD
Currently, negative unspecified ints get specialized. This PR creates symbolic values for
unspecified ints (including negative ones).
For example, with this PR, the following code only compiles once, instead of 3 times:
```python
def foo(x, y):
    return torch.fill(torch.zeros(x.shape), y)

x = torch.randn(3)
foo(x, 10)
foo(x, -5)
foo(x, -3)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104658
Approved by: https://github.com/ezyang
Summary:
QAT convert for mobilenetv2 was previously broken because we
incorrectly applied dropout during eval as well as during training.
This is because, for exported models, `model.eval()` does not change
the behavior of dropout, unlike models built from torch ops.
This commit simulates the effect of `model.eval()` for exported
models as well by replacing the aten dropout pattern before eval.
As of this commit, end-to-end QAT numerics now match for
mobilenetv2 between FX and PT2.
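A minimal sketch of the kind of rewrite involved, using `torch.fx.subgraph_rewriter` on an exported `GraphModule` (the fixed `p=0.5` and the helper name are illustrative assumptions, not necessarily the exact pattern the commit matches):
```python
import torch
from torch.fx import subgraph_rewriter

def _dropout_train(x):
    # pattern: aten dropout in training mode, as left in the exported graph
    return torch.ops.aten.dropout.default(x, 0.5, True)

def _dropout_eval(x):
    # replacement: the same op with train=False, i.e. a no-op at inference
    return torch.ops.aten.dropout.default(x, 0.5, False)

def simulate_eval(gm: torch.fx.GraphModule) -> torch.fx.GraphModule:
    subgraph_rewriter.replace_pattern(gm, _dropout_train, _dropout_eval)
    gm.recompile()
    return gm
```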
Test Plan: python test/test_quantization.py TestQuantizePT2EModels.test_qat_mobilenet_v2
Differential Revision: D46750343
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104110
Approved by: https://github.com/jerryzh168
This PR disables translation validation (TV) when running the benchmark suites on
performance workflows: inductor with A100s.
In summary, the changes are:
- Add a flag for turning TV on and off in _benchmarks/dynamo/common.py_ (sketched below)
- Turn TV on only on CI accuracy builds
- Add `--no-translation-validation` target flag to _.ci/pytorch/test.sh_
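A hedged sketch of what such a flag could look like in _benchmarks/dynamo/common.py_ (the exact wiring in the benchmark runner may differ, and the config attribute is an assumption about how TV is toggled):
```python
import argparse

import torch._dynamo.config as dynamo_config

parser = argparse.ArgumentParser()
parser.add_argument(
    "--no-translation-validation",
    action="store_true",
    help="Disable translation validation of dynamic-shape reasoning.",
)
args = parser.parse_args()

if args.no_translation_validation:
    # assumption: TV is controlled by this Dynamo config flag
    dynamo_config.translation_validation = False
```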
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104887
Approved by: https://github.com/ezyang
Context
-------
This PR adds a new fallback to the Autograd dispatch keys.
If you would prefer the old behavior:
- A quick (unsupported) way to get the previous behavior is to call
`torch._C._set_autograd_fallback("nothing")`
- Register `torch::CppFunction::makeFallthrough()` to your Autograd key,
as in https://gist.github.com/zou3519/d09a5f4b1afe2430af09fea67c6ff2c8 (a Python sketch follows below)
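A hedged Python sketch of the same idea, assuming your build exposes `torch.library.fallthrough_kernel` (the `myops::myop` operator is hypothetical):
```python
import torch

# Restore the old behavior for one custom op by registering a fallthrough
# for its Autograd key. "myops::myop" is a hypothetical, already-defined op.
lib = torch.library.Library("myops", "IMPL")
lib.impl("myop", torch.library.fallthrough_kernel, "Autograd")
```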
It is possible that this PR regresses performance of overhead-bound
models. If this is the case, please reach out (and apply one of the
temporary fixes in the previous section).
Description for reviewers
-------------------------
In order to deprecate registering autograd kernels at a key that is not an
Autograd key, we add a fallback to the Autograd dispatch keys. This fallback
raises a warning if the user attempts to backprop through the operator,
and it is configurable to either warn or not warn.
The goals of this PR are to:
- preserve as much BC as possible
- raise a warning that whatever the user is doing is potentially wrong
- be as performant as possible
There are roughly two cases:
- if the post-autograd kernels return a Tensor that requires grad, then
we install an autograd hook that raises a warning. We are preserving BC
in that it is possible that the user has a torch::autograd::Function
registered to their CPU key.
- if the post-autograd kernels return Tensors that do not require grad,
then we make them require grad and install a WarnNotImplemented grad fn
that warns in the backward pass (sketched below). This is mildly BC-breaking (see next
section).
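A conceptual Python sketch of that second case (the real fallback is implemented in C++ inside the dispatcher; this only illustrates the warn-on-backward behavior):
```python
import warnings

import torch

class WarnNotImplemented(torch.autograd.Function):
    """Identity op whose only job is to warn when backprop reaches it."""

    @staticmethod
    def forward(ctx, x):
        return x.clone()

    @staticmethod
    def backward(ctx, grad_output):
        warnings.warn(
            "This operator has no kernel registered to an Autograd key; "
            "its gradients may be incorrect."
        )
        return grad_output

# conceptually, the fallback rewraps each floating-point/complex output:
#   out = WarnNotImplemented.apply(out)
# which also makes the output require grad, matching the BC note below.
```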
Test Plan:
- a bunch of new tests
BC-Breaking Note
----------------
This PR adds a new fallback to the Autograd dispatch keys. It affects
custom operators that do not have a kernel registered to the Autograd
keys (e.g. AutogradCPU and AutogradCUDA).
If the previous behavior was that the custom operator would return
Tensors that do not require grad if the inputs do require grad, then
this PR changes it so that all floating-point and complex returns do
require grad. See the "Context" section above for how to get the old
behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104481
Approved by: https://github.com/soulitzer
This reduces the total number of modules imported by default from 1419 to 1322, according to
```
time python -c "import sys;before=len(sys.modules);import torch;after=len(sys.modules);print(f'torch-{torch.__version__} imported {after-before} modules')"
```
and slightly reduces import time, while having no effect on UX (i.e. the `torch.onnx` submodule is kept intact).
Also suppress lint errors that appeared after mypy accidentally started listing more files; for more details see https://github.com/pytorch/pytorch/issues/104940.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104843
Approved by: https://github.com/jansel, https://github.com/albanD
This is a bug discovered by https://github.com/pytorch/pytorch/pull/104810. Basically, when the PR body is empty, the GitHub API returns a None value, which is passed into `parse_reenabled_issues`, causing it to fail.
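A hedged sketch of the kind of guard that fixes this; the simplified `parse_reenabled_issues` body below is a stand-in for the real helper in `filter_test_configs.py`:
```python
import re
from typing import List, Optional

def parse_reenabled_issues(pr_body: str) -> List[str]:
    # simplified stand-in for the real parsing logic
    return re.findall(r"#(\d+)", pr_body)

def get_reenabled_issues(pr_body: Optional[str]) -> List[str]:
    # The GitHub API returns None for an empty PR body, so normalize it
    # to an empty string before parsing
    return parse_reenabled_issues(pr_body or "")
```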
### Testing
```
python3 .github/scripts/filter_test_configs.py \
--workflow "pull" \
--job-name "linux-focal-py3-clang7-android-ndk-r19c-gradle-custom-build-single-full-jit / filter," \
--test-matrix "{ include: [ { config: 'default', shard: 1, num_shards: 1, runner: 'linux.2xlarge' }, ]}" \
--pr-number "104810" \
--tag "" \
--event-name "pull_request" \
--schedule "" \
--branch ""
```
The command now works correctly without failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104914
Approved by: https://github.com/clee2000
## Problem
Trying to support numpy function calls in dynamo, with a numpy dtype as an argument.
For example:
```python
def fn(x: int):
    return np.empty_like(x, dtype=np.float64)
```
## Solution
This currently doesn't work because `NumpyVariable` doesn't implement `as_proxy()`. The idea in `as_proxy()` for now is to convert `np.float64` and other `np.<dtype>` values into the corresponding `torch.dtype` and then feed it into the corresponding `torch_np` method.
For the previous example, we convert `np.float64` to `torch.float64` in `as_proxy()` and then feed it into the `torch_np.empty_like()` method.
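A minimal sketch of the dtype mapping this implies (the table below is illustrative, not the exact one used by `as_proxy()`):
```python
import numpy as np
import torch

# illustrative numpy-dtype -> torch-dtype table
_NP_TO_TORCH_DTYPE = {
    np.float32: torch.float32,
    np.float64: torch.float64,
    np.int32: torch.int32,
    np.int64: torch.int64,
}

def _to_torch_dtype(value):
    # leave non-dtype arguments untouched
    return _NP_TO_TORCH_DTYPE.get(value, value)
```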
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103546
Approved by: https://github.com/ezyang
## Context prior to this PR
https://github.com/pytorch/pytorch/pull/100017/ was merged onto the PyTorch `main` branch with the goal of enabling `torch._dynamo.export` to perform symbolic tracing.
In that context, symbolic tracing is defined as tracing a model using fake inputs and weights. An input is fake when a `torch.Tensor` is replaced by a `torch._subclasses.FakeTensor`, whereas a weight is fake when a `torch.nn.Parameter` is replaced by a `torch._subclasses.FakeTensor`.
For additional context, several strategies were discussed with Meta to enable this feature, including 1) calling `torch._dynamo.export` within a `torch._subclasses.FakeTensorMode` context and 2) **fake**fying the input and model as a separate step and then calling `torch._dynamo.export` without an active `torch._subclasses.FakeTensorMode` context. In the end, 2) was preferred and implemented by #100017 to minimize the number of side effects the fake tensor mode has on the code base.
As a consequence, the `torch._dynamo.export` API introduced a new argument called `fake_mode`. When symbolic tracing is used, the user must pass in the `fake_mode` used to fakefy both the input and the model. Internally, `torch._dynamo.export` will adopt this `fake_mode` instead of creating its own instance. This is needed because each instance of `FakeTensorMode` holds metadata on the tensors/parameters it fakefied. Thus, using a real tensor/model while specifying a `fake_mode` to `torch._dynamo.export` is an error; likewise, specifying a `fake_mode` instance different from the one used to fakefy the model and input is also an error.
## Changes introduced by this PR
This PR is intended to integrate `torch._dynamo.export(fake_mode=...)` into `torch.onnx.dynamo_export`. In essence, it:
* Introduces a new public API `ONNXFakeContext` which wraps a `FakeTensorMode` under the hood. This removes complexity from the user side while still allowing the exporter to leverage the fake mode.
* Adds a new public API `enable_fake_mode` *context manager* that instantiates and returns an `ONNXFakeContext`.
* Adds a new `ExportOptions.fake_context` that will be used to persist the `ONNXFakeContext` created by `enable_fake_mode` and plumb it through until it reaches the call to `torch._dynamo.export`.
* Adds a `model_state_dict` argument to the `ExportOutput.save` API.
  * When the model is exported with fake tensors, no actual data exists in the FX module nor, therefore, in the ONNX graph.
  * In fact, `torch.fx.make_fx` lifts initializers as model inputs when fake tensors are used.
  * https://github.com/pytorch/pytorch/pull/104493 is needed to enforce name matching between Parameters and inputs.
  * A model checkpoint file or state_dict is needed to populate the ONNX graph with real initializers through the `export_output.save(model_state_dict=...)` API.
Symbolic tracing, or onnx fake mode, is only enabled when the user instantiates the input and model within the `enable_fake_mode` context. Otherwise, real tracing is done, which preserves the current behavior.
## Usability
Because symbolic tracing depends a lot on having changes made on the Dynamo side before they can be consumed on the ONNX exporter side, this feature may have its API and assumptions changed as symbolic tracing matures upstream. Nonetheless, it is still important to have this feature merged ASAP on the ONNX exporter side to "lock" changes on Dynamo that would otherwise break the ONNX exporter without warning.
Example:
```python
class Model(torch.nn.Module):
    def __init__(self) -> None:
        super().__init__()
        self.linear = torch.nn.Linear(2, 2)

    def forward(self, x):
        out = self.linear(x)
        return out

with torch.onnx.enable_fake_mode() as fake_context:
    x = torch.rand(5, 2, 2)
    model = Model()

# Export the model with fake inputs and parameters
export_options = ExportOptions(fake_context=fake_context)
export_output = torch.onnx.dynamo_export(
    model, x, export_options=export_options
)

model_state_dict = Model().state_dict()  # optional
export_output.save("/path/to/model.onnx", model_state_dict=model_state_dict)
```
## Next steps
* Add unit tests running the exported model with ORT
Today this is not possible yet because `make_fx`, used by our Decomposition pass, lifts initializers as model inputs. However, the initializer names are not preserved by FX tracing, causing a mismatch between the initializer and input names.
https://github.com/pytorch/pytorch/pull/104493 and https://github.com/pytorch/pytorch/pull/104741 should fix the initializer mismatch, enabling model execution.
* Revisit `ONNXTorchPatcher` and how the ONNX initializers are saved in the graph as external data
We can try to get rid of the PyTorch patcher. If we can't, we might prefer to create specific patchers, say an `FXSymbolicTracePatcher` used specifically during an export using `torch.fx.symbolic_trace`, and maybe an `ExportOutputSavePatcher` used specifically for `ExportOutput.save`, to prevent patching more PyTorch APIs than we need.
## References
* [FakeTensor implementation](https://github.com/pytorch/pytorch/blob/main/torch/_subclasses/fake_tensor.py)
* [PR that adds fake tensor support to torch._dynamo.export](https://github.com/pytorch/pytorch/pull/100017)
* [Short fake tensor documentation](https://pytorch.org/torchdistx/latest/fake_tensor.html)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/103865
Approved by: https://github.com/BowenBao
For quantize:
```
for (; i < len / VLEN * VLEN; i += VLEN) {
  __m512 x_vals = _mm512_load_ps(src + i);
  __m512 x_transformed_v = _mm512_mul_ps(x_vals, inverse_scale_v);
  x_transformed_v =
      _mm512_min_ps(x_transformed_v, _mm512_set1_ps(int32_float_max_val));
  __m512i x_rounded_v = _mm512_cvtps_epi32(x_transformed_v);
  x_rounded_v = _mm512_add_epi32(x_rounded_v, _mm512_set1_epi32(zero_point));
  __m512i x_clipped_v =
      _mm512_max_epi32(min_v, _mm512_min_epi32(max_v, x_rounded_v));
  x_clipped_v = _mm512_shuffle_epi8(x_clipped_v, shuffle_mask_v);
  x_clipped_v = _mm512_permutexvar_epi32(permute_mask_l8_v, x_clipped_v);
  _mm_storeu_si128(
      reinterpret_cast<__m128i*>(dst + i),
      _mm512_castsi512_si128(x_clipped_v));
}
```
```
x_clipped_v = _mm512_shuffle_epi8(x_clipped_v, shuffle_mask_v);
x_clipped_v = _mm512_permutexvar_epi32(permute_mask_l8_v, x_clipped_v);
```
is aiming to cast `int32` to `int8` and shuffle the 16 `int8` values into the first 128 bits.
For example, letting a symbol such as `A1` represent 8 bits:
```
x_clipped_v = _mm512_shuffle_epi8(x_clipped_v, shuffle_mask_v);
A1A2A3**A4** B1B2B3**B4** C1C2C3**C4** D1D2D3**D4** -> D4C4B4A4 other 3 * 32 bits
E1E2E3**E4** F1F2F3**F4** G1G2G3**G4** H1H2H3**H4** -> H4G4F4E4 other 3 * 32 bits
I1I2I3**I4** J1J2J3**J4** K1K2K3**K4** L1L2L3**L4** -> L4K4J4I4 other 3 * 32 bits
M1M2M3**M4** N1N2N3**N4** O1O2O3**O4** P1P2P3**P4** -> P4O4N4M4 other 3 * 32 bits

x_clipped_v = _mm512_permutexvar_epi32(permute_mask_l8_v, x_clipped_v);
D4C4B4A4 other 3 * 32 bits -> D4C4B4A4 H4G4F4E4 L4K4J4I4 P4O4N4M4
H4G4F4E4 other 3 * 32 bits    other 3 * 4 * 32 bits
L4K4J4I4 other 3 * 32 bits
P4O4N4M4 other 3 * 32 bits
```
Based on the Intel intrinsics guide pseudocode for `_mm512_permutexvar_epi32` (https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm512_permutexvar_epi32&ig_expand=4966,5088):
```
FOR j := 0 to 15
i := j*32
id := idx[i+3:i]*32
dst[i+31:i] := a[id+31:id]
ENDFOR
dst[MAX:512] := 0
```
the `permute_mask_l8_v` should satisfy
```
permute_mask_l8_v[3:0] = 0
permute_mask_l8_v[3 + 32:0 + 32] = 4
permute_mask_l8_v[3 + 64:0 + 64] = 8
permute_mask_l8_v[3 + 96:0 + 96] = 12
```
The other bits of `permute_mask_l8_v` do not matter.
The `AVX2` version is correct.
This bug was not exposed before because this code path was only called with a fixed length of `64`: https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/cpu/vec/vec512/vec512_qint.h#L545-L546.
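To make the intended result concrete, here is a hedged Python model of what the shuffle + permute should accomplish for one 64-byte vector, assuming little-endian `int32` lanes:
```python
def pack_int32_low_bytes(vec: bytes) -> bytes:
    # Keep the low byte of each of the 16 int32 lanes and pack the 16
    # resulting int8 values into the first 128 bits (16 bytes).
    assert len(vec) == 64
    return bytes(vec[4 * lane] for lane in range(16))
```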
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104400
Approved by: https://github.com/jgong5, https://github.com/mingfeima, https://github.com/jerryzh168
The current TCPStore wait logic leaves the client socket in a bad state if the wait times out.
This happens because all recv functions raise an exception on timeout, and that's it.
The problem is that, on timeout, we need to unregister the wait.
We implement this with client-side cancellation by adding a new CANCEL_WAIT instruction.
If no data arrives before the deadline, the client sends a CANCEL_WAIT command.
The server always sends a WAIT_CANCELED response to that command.
This gets us down to the last issue, which is that there's a race between timing out,
canceling the wait, and the wait completing. The client needs to handle the server sending
a STOP_WAITING followed by a WAIT_CANCELED answer.
This ensures client and server state are synchronized regardless of whether the wait
times out or not.
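A hedged pseudocode sketch of the client-side protocol described above; `send_msg`, `recv_msg`, and `poll` are hypothetical stand-ins for the real TCPStore socket plumbing:
```python
from enum import Enum, auto

class Msg(Enum):
    WAIT = auto()
    STOP_WAITING = auto()
    CANCEL_WAIT = auto()
    WAIT_CANCELED = auto()

def wait(sock, keys, timeout):
    # send_msg/recv_msg/poll are hypothetical helpers, not real APIs
    send_msg(sock, Msg.WAIT, keys)
    if poll(sock, timeout):  # data arrived before the deadline
        assert recv_msg(sock) == Msg.STOP_WAITING
        return
    # timed out: ask the server to unregister the wait
    send_msg(sock, Msg.CANCEL_WAIT)
    reply = recv_msg(sock)
    if reply == Msg.STOP_WAITING:  # race: the wait completed while we canceled
        reply = recv_msg(sock)  # the server still always sends WAIT_CANCELED
    assert reply == Msg.WAIT_CANCELED
    raise TimeoutError("TCPStore wait timed out")
```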
Pull Request resolved: https://github.com/pytorch/pytorch/pull/100594
Approved by: https://github.com/H-Huang
Python's `mod` semantics are not the same as the mathematical modulus operation. According to
the Python reference: `a = floor(a / b) * b + a % b`.
In other words: `a % b = a - floor(a / b) * b`.
This PR fixes the old implementation, which used SMT-LIB2 semantics for `mod`. In short, that
semantics only worked with integers and had the following guarantee: `0 <= a % b < b`.
In summary, the changes are:
- `a % b = a - floordiv(a, b) * b`
- `a` and `b` can both be integer or real
- The result will be real if either argument is real; otherwise, it will be integer
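The new definition matches Python's built-in `%`, which can be checked directly:
```python
import math

def pymod(a, b):
    # a % b = a - floor(a / b) * b
    return a - math.floor(a / b) * b

assert pymod(-7, 3) == -7 % 3 == 2      # negative dividend
assert pymod(7, -3) == 7 % -3 == -2     # negative divisor
assert pymod(7.5, 2) == 7.5 % 2 == 1.5  # real operands
```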
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104827
Approved by: https://github.com/lezcano
Originally, we didn't enable BWD for colwise embedding because we thought it was only needed for inference, but it turns out that we do need it for training. So, let's enable it for now; a unit test is also added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/104820
Approved by: https://github.com/fegin