64781 commits

Author SHA1 Message Date
Nikita Shulga
be02103786 [BE] Get rid of code duplication (#110619)
Replace `dispatch_to_CDouble`, `dispatch_to_CLong` and `dispatch_to_CComplexDouble` with `dispatch_to<T>` template

### <samp>🤖 Generated by Copilot at c3d9d01</samp>

> _Sing, O Muse, of the clever coder who devised_
> _A wondrous template function, `dispatch_to<T>`, that could_
> _Handle with ease the various scalar types that vexed_
> _The previous code, which was verbose and dull as wood._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110619
Approved by: https://github.com/soulitzer, https://github.com/albanD
ghstack dependencies: #110618
2023-10-05 22:05:57 +00:00
Nikita Shulga
82e353fffc [BE] Use nested namespaces in autograd/templates (#110618)
As PyTorch can now use C++17 language features
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110618
Approved by: https://github.com/soulitzer
2023-10-05 22:05:57 +00:00
albanD
cae537126f Set _diffThreshold on our TestCase (#110603)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110603
Approved by: https://github.com/albanD
2023-10-05 21:49:28 +00:00
Aaron Gokaslan
668eb55488 [BE]: Enable some basic pytest style rules (#110362)
Adds some basic flake8-pytest-style rules from ruff with their autofixes. I just picked a couple of uncontroversial changes about having a consistent pytest style that we're already following. We should consider enabling some more in the future, but this is a good start. I also upgraded ruff to the latest version.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110362
Approved by: https://github.com/ezyang, https://github.com/albanD, https://github.com/kit1980
2023-10-05 21:40:43 +00:00
Wanchao Liang
c95cf4b4c9 [dtensor] add grad placements kwarg to to_local API (#110629)
When we convert to a local tensor, dtensor can no longer track autograd or the
gradient layout of the local tensor. If the user does something unexpected, there
needs to be a way for the user to hint at the gradient layout of the
local tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110629
Approved by: https://github.com/zdevito
2023-10-05 21:34:01 +00:00
chilli
ada65508d2 Add option to flop counter formula registration to get raw values (#110591)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110591
Approved by: https://github.com/awgu
ghstack dependencies: #110501, #110504
2023-10-05 21:14:41 +00:00
Scott Wolchok
9e72c9cccd [torch] easy missing move in aoti_runtime/model.h (#110469)
Just an extra shared_ptr copy, nothing fancy.

Differential Revision: [D49792510](https://our.internmc.facebook.com/intern/diff/D49792510/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110469
Approved by: https://github.com/Skylion007
2023-10-05 20:56:06 +00:00
William Wen
71beca4899 [dynamo, logging] Report name of defining class alongside function name in Dynamo logs (#110190)
Implement https://github.com/pytorch/pytorch/issues/109236

Sample code:
```python
import torch

class AAA:
    class DUMMY:
        class DUMMY2:
            pass
    def dummy(self):
        def dummy2():
            pass
    class BBB:
        @staticmethod
        def CCC():
            class DDD:
                if True:
                    @staticmethod
                    def EEE():
                        x = [torch.ones(3, 3) for _ in range(5)]
                        return x
            return DDD

def fn():
    return AAA.BBB.CCC().EEE()

opt_fn = torch.compile(fn, backend="eager")

opt_fn()
```

Logs:
```bash
$ TORCH_LOGS="trace_source" python playground2.py
[2023-09-27 17:38:35,641] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:21 in fn (fn)
[2023-09-27 17:38:35,641] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]     def fn():
[2023-09-27 17:38:35,642] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:22 in fn (fn)
[2023-09-27 17:38:35,642] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]         return AAA.BBB.CCC().EEE()
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:11 in CCC (AAA.BBB) (inline depth: 1)
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]             @staticmethod
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:13 in CCC (AAA.BBB.CCC.DDD) (inline depth: 1)
[2023-09-27 17:38:35,661] [0/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]                 class DDD:
[2023-09-27 17:38:35,723] [1/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG] TRACE starts_line /data/users/williamwen/pytorch/playground2.py:17 in <listcomp> (AAA.BBB.CCC.DDD.EEE)
[2023-09-27 17:38:35,723] [1/0] torch._dynamo.symbolic_convert.__trace_source: [DEBUG]                             x = [torch.ones(3, 3) for _ in range(5)]
```
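
The scope shown in parentheses in these logs can be approximated from a function's `__qualname__`. The sketch below is only an illustration and is not Dynamo's actual implementation; `defining_scope` is a hypothetical helper that drops the `<locals>` markers and the function's own name.

```python
class AAA:
    class BBB:
        @staticmethod
        def CCC():
            class DDD:
                @staticmethod
                def EEE():
                    return [i for i in range(5)]
            return DDD


def defining_scope(fn):
    """Return the dotted scope enclosing fn (hypothetical helper)."""
    # A nested __qualname__ looks like "AAA.BBB.CCC.<locals>.DDD.EEE".
    parts = [p for p in fn.__qualname__.split(".") if p != "<locals>"]
    # Everything except the last component is the enclosing scope.
    return ".".join(parts[:-1])


print(defining_scope(AAA.BBB.CCC))        # AAA.BBB
print(defining_scope(AAA.BBB.CCC().EEE))  # AAA.BBB.CCC.DDD
```

For a module-level function such as `fn`, the helper returns an empty string, since there is no enclosing class or function.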

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110190
Approved by: https://github.com/ezyang, https://github.com/mlazos
2023-10-05 20:41:38 +00:00
Jon Chuang
c99de9f37c fix(optim): adagrad sparse multitensor incorrect early exit (#110454)
Fixes https://github.com/pytorch/pytorch/issues/110444#issuecomment-1745181530

This PR:
Passes

Main:
```
test/optim/test_optim.py::TestOptim::test_adagrad_sparse FAILED [0.0058s]

==================================================================================================================================== FAILURES =====================================================================================================================================
__________________________________________________________________________________________________________________________ TestOptim.test_adagrad_sparse __________________________________________________________________________________________________________________________
Traceback (most recent call last):
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 1448, in test_adagrad_sparse
    self._test_rosenbrock_sparse(
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/test/optim/test_optim.py", line 128, in _test_rosenbrock_sparse
    self.assertEqual(params, params_c, atol=1e-6, rtol=1e-6)
  File "/home/jonch/Desktop/Programming/mlsys/pytorch/torch/testing/_internal/common_utils.py", line 3309, in assertEqual
    raise error_metas.pop()[0].to_error(
AssertionError: Tensor-likes are not close!

Mismatched elements: 1 / 2 (50.0%)
Greatest absolute difference: 0.09999999999993325 at index (1,) (up to 1e-06 allowed)
Greatest relative difference: 0.06249999999996089 at index (1,) (up to 1e-06 allowed)

```
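
The PR description does not show the diff. As a generic illustration of the "incorrect early exit" bug class, assume a loop that visits several parameter groups and handles sparse groups on a separate path. The function names and the `(name, is_sparse)` tuples below are hypothetical and are not taken from the optimizer code.

```python
def apply_updates_buggy(groups):
    """Update every group; sparse groups take a separate path."""
    updated = []
    for name, is_sparse in groups:
        if is_sparse:
            updated.append((name, "sparse"))
            # BUG: this leaves the whole loop, so later groups are skipped.
            return updated
        updated.append((name, "dense"))
    return updated


def apply_updates_fixed(groups):
    """Same loop, but only the current iteration is cut short."""
    updated = []
    for name, is_sparse in groups:
        if is_sparse:
            updated.append((name, "sparse"))
            continue  # move on to the next group
        updated.append((name, "dense"))
    return updated


groups = [("w_sparse", True), ("w_dense", False)]
print(apply_updates_buggy(groups))  # only the sparse group is updated
print(apply_updates_fixed(groups))  # both groups are updated
```

A mismatch in only some parameters, as in the failure above, is the kind of symptom such a skipped update would be expected to produce.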

CC: @janeyx99
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110454
Approved by: https://github.com/janeyx99
2023-10-05 20:37:57 +00:00
CK Luk
ecdd1bcf03 Back out "[Inductor] Break the loop fusion when node2 depends on node1 mutations (#109172)" (#110622)
Summary:
Original commit changeset: 03980fb054d5

Original Phabricator Diff: D49519512

Bisecting shows that this diff is the cause of S369683. Since this affects Ads production, need to back out this diff immediately.

Test Plan: See S369683

Reviewed By: ezyang

Differential Revision: D49958638

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110622
Approved by: https://github.com/yanboliang
2023-10-05 20:09:09 +00:00
Chien-Chin Huang
88616349d7 [state_dict][1/N] Implement the basic functions of distributed.checkpoint._state_dict (#105902)
This PR implements the basic functions of distributed.checkpoint._state_dict. This PR currently contains the flattening of optimizer state_dict which makes the PR too large. A later version may split it into 2 for a better code review.

Differential Revision: [D47647719](https://our.internmc.facebook.com/intern/diff/D47647719/)

**NOTE FOR REVIEWERS**: This PR has internal Meta-specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D47647719/)!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/105902
Approved by: https://github.com/wz337
2023-10-05 20:04:15 +00:00
Bin Bao
298f01d9a2 [aotinductor] Avoid generating redundant kernel loading code (#110510)
Summary: 1) Stop forcing triton.unique_kernel_names to True for AOTInductor, because the unique kernel name can be read from metadata; 2) Only generate load_kernel once for each kernel since we don't have control flow in our generated code.  This solves https://github.com/pytorch/pytorch/issues/105553.
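
Point 2 can be sketched as a small de-duplication pass. This is a minimal illustration under the stated assumption that the generated code is straight-line (no control flow), so the first load of a kernel is always executed before any later use; `emit_kernel_loads` and the emitted `load_kernel(...)` text are hypothetical and do not reproduce the real code generator.

```python
def emit_kernel_loads(kernel_calls):
    """Emit one load statement per distinct kernel, in first-use order."""
    seen = set()
    lines = []
    for name in kernel_calls:
        if name in seen:
            # Already loaded earlier in the straight-line code.
            continue
        seen.add(name)
        lines.append(f'{name} = load_kernel("{name}");')
    return lines


calls = ["triton_fused_0", "triton_fused_1", "triton_fused_0"]
for line in emit_kernel_loads(calls):
    print(line)
```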

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110510
Approved by: https://github.com/chenyang78, https://github.com/jansel
2023-10-05 19:59:38 +00:00
Sherlock Huang
f1b94461aa [AOTInductor] ProxyExecutor support Dynamic Shape (#110526)
Summary:
Extend ProxyExecutor to support dynamic shape.

Example of ProxyExecutor invocation with symints.
```
    int64_t* arg0_1_size;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_get_sizes(arg0_1, &arg0_1_size));
    auto s0 = arg0_1_size[0];
    auto s1 = arg0_1_size[1];
    int64_t* arg1_1_size;
    AOTI_TORCH_ERROR_CODE_CHECK(aoti_torch_get_sizes(arg1_1, &arg1_1_size));
    auto s2 = arg1_1_size[0];
    auto s3 = arg1_1_size[1];
    ...
    aoti_torch_proxy_executor_call_function(proxy_executor, 0, 15, std::vector<int64_t>{42, 16, 17, s0 + s1, s0 + s1, s2*s3, 45, 67, 16, 17, s2*s3, s2*s3, s0 + s1, 89, 910}.data(), 7, std::vector<AtenTensorHandle>{arg0_1, arg0_1, arg1_1, buf2, arg0_1, arg1_1, buf4}.data());
```

Example of serialized SymInt(s) arguments:
```
          {
            "name": "symint",
            "arg": {
              "asSymInt": {
                "asName": "s0 + s1"
              }
            }
          },
          {
            "name": "symints",
            "arg": {
              "asSymInts": [
                {
                  "asName": "s0 + s1"
                },
                {
                  "asName": "s2*s3"
                }
              ]
            }
          },
          ...
          {
            "name": "o_symint",
            "arg": {
              "asSymInt": {
                "asName": "s2*s3"
              }
            }
          },
          {
            "name": "o_symints",
            "arg": {
              "asSymInts": [
                {
                  "asName": "s2*s3"
                },
                {
                  "asName": "s0 + s1"
                }
              ]
            }
          },
```

Test Plan: buck2 run mode/dev-nosan deeplearning/aot_inductor/test:test_custom_ops

Differential Revision: D49887555

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110526
Approved by: https://github.com/chenyang78
2023-10-05 19:05:20 +00:00
Dmytro Dzhulgakov
a0cea517e7 Add 9.0a to cpp_extension supported compute archs (#110587)
There's an extended compute capability 9.0a for Hopper that was introduced in Cuda 12.0: https://docs.nvidia.com/cuda/archive/12.0.0/cuda-compiler-driver-nvcc/index.html#gpu-feature-list

E.g. Cutlass leverages it: 5f13dcad78/python/cutlass/emit/pytorch.py (L684)

This adds it to the list of permitted architectures to use in `cpp_extension` directly.
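
As a rough sketch of what a permitted-architecture check looks like, the snippet below validates a `TORCH_CUDA_ARCH_LIST`-style string and builds `nvcc` `-gencode` flags. `SUPPORTED_ARCHES` is an assumed, shortened list and `gencode_flags` is a hypothetical helper; neither is copied from `cpp_extension`, and `+PTX` suffixes are not handled here.

```python
# Assumed subset of permitted architectures, including the new "9.0a".
SUPPORTED_ARCHES = ["8.0", "8.6", "8.9", "9.0", "9.0a"]


def gencode_flags(arch_list):
    """Turn e.g. "8.0;9.0a" into nvcc -gencode flags (hypothetical helper)."""
    flags = []
    # Accept both space- and semicolon-separated entries.
    for arch in arch_list.replace(" ", ";").split(";"):
        if not arch:
            continue
        if arch not in SUPPORTED_ARCHES:
            raise ValueError(f"Unknown CUDA arch ({arch}) or GPU not supported")
        num = arch.replace(".", "")  # "9.0a" -> "90a"
        flags.append(f"-gencode=arch=compute_{num},code=sm_{num}")
    return flags


print(gencode_flags("8.0;9.0a"))
```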
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110587
Approved by: https://github.com/ezyang
2023-10-05 17:41:06 +00:00
dependabot[bot]
c89d35adfe Bump pillow from 9.5.0 to 10.0.1 in /.ci/docker (#110494)
Bumps [pillow](https://github.com/python-pillow/Pillow) from 9.5.0 to 10.0.1.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/9.5.0...10.0.1)

---
updated-dependencies:
- dependency-name: pillow
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2023-10-05 10:37:26 -07:00
Antoni Viros i Martin
efdf155383 Add requirement for input to AllGatherIntoTensor to be contiguous (#109561)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109561
Approved by: https://github.com/Chillee
2023-10-05 17:04:48 +00:00
Ikko Eltociear Ashimine
f21c322e20 Fix typo in BatchLinearAlgebraLibBlas.cpp (#110608)
accomodate -> accommodate

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110608
Approved by: https://github.com/malfet
2023-10-05 16:48:53 +00:00
Catherine Lee
d6e5898e8d Quieter logs in CI (#110033)
To reduce the amount of logs
* for successes, only print the part that says what tests ran and don't print the rest.  Zip the log into an artifact.  The line listing all the test names is really long, but if you view source of the raw logs, it will not wrap so it will only be one line.  The log classifier can also be configured to ignore this line. Gets rid of lines like `test_ops.py::TestCommonCPU::test_multiple_devices_round_cpu_int64 SKIPPED [0.0010s] (Only runs on cuda) [  9%]`
* for failures/reruns, print logs.  Do not zip.

Also
* change log artifact name

Examples of various logs:
a074db0f7f failures
1b439e24c4 failures

possibly controversial haha
should i include an option for always printing?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110033
Approved by: https://github.com/huydhn
2023-10-05 16:40:37 +00:00
Joel Schlosser
3597325bc7 pin_memory support for NT (#110404)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110404
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
ghstack dependencies: #110292
2023-10-05 16:33:22 +00:00
ydwu4
cc1de49340 [HigherOrderOp] fallthrough some keys by default. (#110478)
Fixes #109253

Test Plan:
Added a new test that shows default fallthrough keys can be overridden.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110478
Approved by: https://github.com/ezyang
2023-10-05 16:25:42 +00:00
Jason Park
26f634eefb Enable aarch64 for fixing undefined symbol error. (#110542)
Summary: ARM can be safely supported

Reviewed By: andrewjcg, aaronenyeshi

Differential Revision: D49921679

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110542
Approved by: https://github.com/aaronenyeshi
2023-10-05 16:16:06 +00:00
Jeff Daily
a94b6f39d1 [ROCm] conditionally enable hipsparse const descriptors for version >= 2.4.0 (#110317)
This is in preparation for upcoming backwards-incompatible hipsparse changes.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110317
Approved by: https://github.com/malfet
2023-10-05 16:07:51 +00:00
chilli
f767a6c57a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 15:47:30 +00:00
PyTorch MergeBot
1e4c0641ce Revert "Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)"
This reverts commit 9648df1a6a.

Reverted https://github.com/pytorch/pytorch/pull/110504 on behalf of https://github.com/PaliC due to temporarily will revert as it's causing problems with difftrain import ([comment](https://github.com/pytorch/pytorch/pull/110504#issuecomment-1749132253))
2023-10-05 15:28:23 +00:00
Chien-Chin Huang
1a729618ef [FSDP][optim_state_dict] Make the new optimizer allgather fusion work with fine-tuning models (#110540)
With use_orig_params=True, it is possible that some parameters with the same FlatParameter are in the optimizer while other parameters are frozen. This PR makes the allgather fusion logic support this case.

Differential Revision: [D49922028](https://our.internmc.facebook.com/intern/diff/D49922028/)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110540
Approved by: https://github.com/awgu, https://github.com/rohan-varma
2023-10-05 15:17:10 +00:00
Joel Schlosser
f17fe89e14 Multiprocessing support for NT (#110292)
Fixes #110161

Allows NTs to be used in DataLoaders with `num_workers > 1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110292
Approved by: https://github.com/cpuhrsch, https://github.com/albanD
2023-10-05 15:04:48 +00:00
Andrew Or
7c72238e4b Back out "Enable pickling model prepared with QAT qconfig" (#110392)
Summary:
D49187352 caused our model conversion and loading of the QAT checkpoint to get stuck with a Thrift timeout.

We are actively checking in the final code and model for the static quant HTP prod model, and encountered this breakage at head on Thursday.

A Thrift timeout does not surface as a failure, which makes it hard to bisect and find the culprit. It is also hard to set up a unit test, because the job simply times out. A better test is needed to guard downstream model conversion against upstream changes.

Our suspicion of why this diff broke us is that we create a lot of modules with QAT (in a recursive manner), but our model is not a QAT-traceable module (it is a graph with many QAT modules and floating point modules). With `functools.partial` as in the original diff, we will be caching modules in memory, causing the machine's memory to be taken up completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110392
Approved by: https://github.com/junesg, https://github.com/jerryzh168
2023-10-05 14:41:00 +00:00
Oleg Khabinov
cf1b494afd [AOTInductor] Store loaded kernels in the model (#110554)
Defining kernels as static vars is problematic for subsequent model loading on non-default CUDA devices.

Assuming those kernels were loaded in the context of device #0, they are no longer nullptr, so they are not loaded again and won't work on devices other than device #0.

This change makes devices remembered at model level in AOT mode.
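
The problem can be mimicked in a few lines. This is an analogy only, assuming a hypothetical `load_kernel` that returns a handle bound to one device; the module-level `_static_kernel` stands in for a static variable in the generated code, and `Model` stands in for per-model storage.

```python
def load_kernel(name, device):
    """Hypothetical loader: the returned handle is tied to one device."""
    return f"{name}@cuda:{device}"


_static_kernel = None  # plays the role of a static variable


def get_kernel_static(device):
    global _static_kernel
    if _static_kernel is None:  # only the first caller's device is used
        _static_kernel = load_kernel("triton_kernel_0", device)
    return _static_kernel


class Model:
    """Each model instance keeps its own kernels for its own device."""

    def __init__(self, device):
        self.device = device
        self.kernels = {}

    def get_kernel(self, name):
        if name not in self.kernels:
            self.kernels[name] = load_kernel(name, self.device)
        return self.kernels[name]


get_kernel_static(0)
print(get_kernel_static(1))                      # still bound to cuda:0
print(Model(1).get_kernel("triton_kernel_0"))    # bound to cuda:1
```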

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110554
Approved by: https://github.com/chenyang78, https://github.com/desertfire
2023-10-05 10:17:05 +00:00
Sehoon Kim
c36b31d530 torch::nn::AdaptiveLogSoftmaxWithLoss: check length of cutoffs (#106777)
Fixes #106698

Also added a check for python API, because current error message
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/sehoon/pytorch-latest/torch/nn/modules/adaptive.py", line 128, in __init__
    or (min(cutoffs) <= 0) \
ValueError: min() arg is an empty sequence
```
is not very comprehensible.
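
A minimal sketch of the kind of up-front check described here: test the length before calling `min()`, so that an empty sequence produces a targeted message. `validate_cutoffs` and its message texts are assumptions made for illustration, not the code added by the PR.

```python
def validate_cutoffs(cutoffs, n_classes):
    """Hypothetical validation helper for a list of cluster cutoffs."""
    cutoffs = list(cutoffs)
    # Check emptiness first; min([]) would raise an unrelated-looking error.
    if len(cutoffs) == 0:
        raise ValueError("cutoffs should be a sequence of length larger than 0")
    if (
        cutoffs != sorted(cutoffs)
        or min(cutoffs) <= 0
        or max(cutoffs) > n_classes - 1
        or len(set(cutoffs)) != len(cutoffs)
    ):
        raise ValueError(
            "cutoffs should be unique, positive, increasing, "
            "and each value should be between 1 and n_classes-1"
        )
    return cutoffs


print(validate_cutoffs([10, 100], n_classes=1000))
try:
    validate_cutoffs([], n_classes=1000)
except ValueError as exc:
    print(exc)
```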

Pull Request resolved: https://github.com/pytorch/pytorch/pull/106777
Approved by: https://github.com/albanD
2023-10-05 05:35:47 +00:00
PyTorch UpdateBot
00b9afa429 [vision hash update] update the pinned vision hash (#110571)
This PR is auto-generated nightly by [this action](https://github.com/pytorch/pytorch/blob/main/.github/workflows/_update-commit-hash.yml).
Update the pinned vision hash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110571
Approved by: https://github.com/pytorchbot
2023-10-05 05:14:04 +00:00
Avik Chaudhuri
416eca9736 export db links for user errors (#110555)
Ideally all `_dynamo.exc.UserError`s should have "case names", i.e., link to examples in `exportdb`.

This PR adds case names to several instances of `_dynamo.exc.UserError`. In particular, looking at coverage based on `UserErrorType`:
* `DYNAMIC_CONTROL_FLOW`, `ANTI_PATTERN`, and `STANDARD_LIBRARY` are fully covered.
* `CONSTRAINT_VIOLATION` and `DYNAMIC_DIM` have no coverage. We don't seem to have any dedicated examples of specifying dynamic shapes in `exportdb` (although they are used in some other examples without explanation, to avoid some specialization that would make such examples moot).
* `INVALID_INPUT` is only partly covered. Frankly this is tedious to cover via examples.

Differential Revision: [D49928518](https://our.internmc.facebook.com/intern/diff/D49928518/)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110555
Approved by: https://github.com/angelayi, https://github.com/ydwu4
2023-10-05 05:03:04 +00:00
PyTorch MergeBot
21019620ee Revert "[Dynamo] SizeVariable can be indexed by symint (#110349)"
This reverts commit 510ec7e3c5.

Reverted https://github.com/pytorch/pytorch/pull/110349 on behalf of https://github.com/PaliC due to breaking internal tests (check diff) ([comment](https://github.com/pytorch/pytorch/pull/110349#issuecomment-1748021641))
2023-10-05 04:42:33 +00:00
andrewor14
62cad5b5b0 [quant][pt2] Support cudnn_batch_norm in QAT fusion (#109908)
Summary: Today, we get different batch norm ops depending on
the device the model is placed on at export time. Exporting
`model.cpu()` gives `_native_batch_norm_legit`, while exporting
`model.cuda()` gives `cudnn_batch_norm`. QAT fusion currently
only supports the former and silently ignores the latter. This
commit fixes this by additionally matching on the latter op
during QAT fusion.

Test Plan:
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_fusion
python test/test_quantization.py TestQuantizePT2EQAT.test_qat_conv_bn_relu_fusion

Reviewers: jerryzh168, kimishpatel

Subscribers: jerryzh168, kimishpatel, supriyar

Differential Revision: [D49615145](https://our.internmc.facebook.com/intern/diff/D49615145)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/109908
Approved by: https://github.com/jerryzh168
2023-10-05 04:08:44 +00:00
lezcano
4b1e138162 [dynamo] [easy]Remove InstructionTranslator from within Set (#110521)
I believe this was a leftover from the before times. See if CI agrees.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110521
Approved by: https://github.com/ezyang
2023-10-05 04:01:18 +00:00
Angela Yi
a93337ed55 [export] Add ir spec (#110394)
Summary: Copied IR spec over from Executorch

Test Plan: _docs_

Differential Revision: D49829187

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110394
Approved by: https://github.com/ydwu4, https://github.com/gmagogsfm
2023-10-05 03:06:30 +00:00
drisspg
a8653f35de One more small Perf Tweak to fill_ (#110294)
# Summary
Perf win by checking which device the tensors are on

## Before this PR:
``` Shell
CPU | CPU: 1.3328152848407626
GPU | GPU: 6.614773320034146
CPU | GPU: 29.027153505012393
GPU | CPU: 17.22372299991548
```
## After this PR
``` Shell
CPU | CPU: 1.4241038949694484
GPU | GPU: 7.060713530518115
CPU | GPU: 15.149936103262007
GPU | CPU: 5.774620908778161
```

#### Repro Script
``` Python
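import torch
import torch.utils.benchmark as benchmark


# Assumed helper (not shown in the PR description): mean runtime in microseconds.
def benchmark_torch_function_in_microseconds(f, *args, **kwargs):
    t0 = benchmark.Timer(
        stmt="f(*args, **kwargs)", globals={"args": args, "kwargs": kwargs, "f": f}
    )
    return t0.blocked_autorange().mean * 1e6


# Labels read "<device of a> | <device of amax>".
a = torch.tensor([0.2, 0.5], device="cpu")
amax = torch.tensor(0.5, device="cpu")
print(f"CPU | CPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

a = torch.tensor([0.2, 0.5], device="cuda")
amax = torch.tensor(0.5, device="cuda")
print(f"GPU | GPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

a = torch.tensor([0.2, 0.5], device="cpu")
amax = torch.tensor(0.5, device="cuda")
print(f"CPU | GPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")

a = torch.tensor([0.2, 0.5], device="cuda")
amax = torch.tensor(0.5, device="cpu")
print(f"GPU | CPU: {benchmark_torch_function_in_microseconds(torch.fill_, a, amax)}")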
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110294
Approved by: https://github.com/mikaylagawarecki
2023-10-05 02:42:57 +00:00
Kazuaki Ishizaki
434a996c42 Fix typo under torch/_inductor directory (#110530)
This PR fixes typos in comments and messages in files under the `torch/_inductor` directory.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110530
Approved by: https://github.com/kit1980
2023-10-05 02:17:20 +00:00
chilli
9648df1a6a Made pattern-matcher diagnostics lazily reported + added TORCH_COMPILE_CPROFILE (#110504)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110504
Approved by: https://github.com/mlazos, https://github.com/eellison
ghstack dependencies: #110501
2023-10-05 01:34:57 +00:00
chilli
e686341f64 Consider that ops can be fused into cat in the min-cut partitioner (#110501)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110501
Approved by: https://github.com/eellison
2023-10-05 01:34:57 +00:00
Justin Chu
d24e7be243 Include onnx and onnxscript information in collect_env.py (#110560)
`onnx` and `onnxscript` are used in torch.onnx.dynamo_export since 2.0. It would be helpful to collect version information in user issue reports.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110560
Approved by: https://github.com/albanD
2023-10-05 01:29:04 +00:00
Amadeusz Skrzypczak
653f966df0 Fix type promotion of float8_e5m2 and float8_e4m3fn (#110279)
There is an issue with float8 type promotion, because `_promoteTypesLookup` doesn't contain records for a few types between bfloat16 and float8.
I have simply moved the float8 types to just after bfloat16; however, I'm not sure whether this breaks serialization.

Please decide whether it can stay like this, or whether I should instead insert the missing records filled with "ud" into `_promoteTypesLookup` rather than moving the types.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110279
Approved by: https://github.com/albanD
2023-10-05 01:28:48 +00:00
Bin Bao
c121f957c2 [aotinductor] Enable test_non_default_cuda_device on CI (#110509)
Summary: test_non_default_cuda_device needs to run on a multi-gpu CI instance

Differential Revision: [D49937115](https://our.internmc.facebook.com/intern/diff/D49937115)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110509
Approved by: https://github.com/angelayi, https://github.com/khabinov, https://github.com/chenyang78
2023-10-05 01:25:50 +00:00
Jane Xu
9f40ffeec6 [optim] disable large_tensor tests for ROCm (#110559)
Closes #105825 #105820 #105754 by replacing them with an in-code skip.

Fixes #105825, fixes #105820, fixes #105754

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110559
Approved by: https://github.com/albanD
2023-10-05 01:21:21 +00:00
Edward Z. Yang
6a974bec5d Change flash attention outputs to be SymInt instead of int (#110533)
Fixes https://github.com/pytorch/pytorch/issues/110322

Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110533
Approved by: https://github.com/albanD
2023-10-05 01:00:07 +00:00
Edward Z. Yang
f1d81134ef Print output type if assert fires (#110534)
Signed-off-by: Edward Z. Yang <ezyang@meta.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110534
Approved by: https://github.com/albanD
2023-10-05 00:59:17 +00:00
Justin Chu
f3aba45049 [ONNX] Create onnxscript-torchlib specific xfails/skips for fx tests (#110536)
Creates xfail_onnxscript/skip_onnxscript so that it is clear torchlib needs to support it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110536
Approved by: https://github.com/BowenBao
2023-10-05 00:39:05 +00:00
Mihir Patel
95c59b30b8 Update fully_sharded_data_parallel to fix typing (#110545)
Fixes typing so that linter does not complain when using CustomPolicy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/110545
Approved by: https://github.com/awgu, https://github.com/Skylion007
2023-10-05 00:00:10 +00:00
Xuehai Pan
0daa7d4815 [test][docs] Fix doctest warnings for syntax errors (#110517)
Fixes some syntax errors in doctests found in CI tests.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110517
Approved by: https://github.com/albanD
2023-10-05 00:00:06 +00:00
Fabrice Pont
053367b1ed fix: flake8-bugbear code B024 (#107265)
See #106571 item B024

This fix concerns the addition of `abstractmethod` to methods declared inside abstract classes.
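
For context, B024 flags abstract base classes that declare no abstract methods. The sketch below uses only the standard `abc` module; both class names are hypothetical. Without `@abstractmethod`, a class deriving from `ABC` can still be instantiated, which is usually not what the author intended.

```python
from abc import ABC, abstractmethod


class HandlerWithoutAbstract(ABC):
    # The pattern B024 reports: an ABC with no abstract methods.
    def run(self):
        raise NotImplementedError


class HandlerWithAbstract(ABC):
    @abstractmethod
    def run(self):
        """Subclasses must override this."""


HandlerWithoutAbstract()  # succeeds even though the class is "abstract"

try:
    HandlerWithAbstract()
except TypeError as exc:
    print(f"instantiation blocked: {exc}")
```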

Should I also include PEP8-compliant reformatting of the files I had to modify?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/107265
Approved by: https://github.com/kit1980
2023-10-04 23:52:52 +00:00
Xuehai Pan
449271f3f1 [pytree] Extract reusable generic tests for pytree (#110395)
Part of #109684

- #109684

Changes:

- Add new functions `tree_structure`, `tree_leaves`, `tree_map_` and `tree_map_only_` to Python pytree.
- Extract reusable tests for pytree to `TestGenericPytree`.
- Change `treespec_dumps` and `treespec_loads` in C++ pytree to call Python pytree and use JSON string as serialization type.
- Rename `torch.utils.pytree` -> `torch.utils._cxx_pytree`.
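
To illustrate what functions such as `tree_leaves` and `tree_map` do conceptually, here is a simplified pure-Python stand-in that handles only dicts, lists, and tuples. It borrows the names for readability but is not the PyTorch implementation and does not call into `torch`.

```python
def tree_leaves(tree):
    """Collect the leaves of a nested dict/list/tuple in traversal order."""
    if isinstance(tree, dict):
        return [leaf for value in tree.values() for leaf in tree_leaves(value)]
    if isinstance(tree, (list, tuple)):
        return [leaf for item in tree for leaf in tree_leaves(item)]
    return [tree]  # anything else is treated as a leaf


def tree_map(fn, tree):
    """Apply fn to every leaf while keeping the container structure."""
    if isinstance(tree, dict):
        return {key: tree_map(fn, value) for key, value in tree.items()}
    if isinstance(tree, (list, tuple)):
        return type(tree)(tree_map(fn, item) for item in tree)
    return fn(tree)


nested = {"a": [1, 2], "b": (3, {"c": 4})}
print(tree_leaves(nested))
print(tree_map(lambda x: x * 10, nested))
```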

Pull Request resolved: https://github.com/pytorch/pytorch/pull/110395
Approved by: https://github.com/zou3519
2023-10-04 23:40:50 +00:00