pytorch

mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Author	SHA1	Message	Date
Nikita Shulga	0e83e7d56e	[EZ] Add logic to build Metal shader with debug info (#146768 ) By appending `-frecord-sources -gline-tables-only` to the compilation command. Helpful when debugging shaders compiled into libtorch. Test plan: Run `python ../tools/build_with_debinfo.py ../aten/src/ATen/native/mps/kernels/UpSample.metal ../aten/src/ATen/native/mps/operations/UpSample.mm` and then run the following to capture the shader and check that it contains debug info ```python import torch import os os.environ["MTL_CAPTURE_ENABLED"]="1" inp = torch.rand(size=(6, 3, 10, 20), device="mps", dtype=torch.float32) with torch.mps.profiler.metal_capture("bilinear2d"): out = torch.nn.functional.interpolate(inp, scale_factor=(1.7,0.9), mode="bilinear") ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/146768 Approved by: https://github.com/dcci	2025-02-08 23:40:23 +00:00
Aaron Gokaslan	292af3cc89	[BE][Ez]: ISC001 Auto concatenate implicit one line strings (#146408 ) Apply ruff rule about implicit string concatenation, this autofixes strings that are all the same type and on the same line. These lines are broken up likely as the result of autoformatters in the past. All fixes are automated using the autofixes in ISC001. Pull Request resolved: https://github.com/pytorch/pytorch/pull/146408 Approved by: https://github.com/justinchuby, https://github.com/janeyx99	2025-02-04 19:07:04 +00:00
Yang Wang	fd73ae2068	[Utilization] Convert timestamp to str for datetime64 (#145985 ) Convert all float timestamps to int timestamps in the data pipeline for the db type datetime64. A float does not work when trying to insert into ClickHouse using jsonExtract. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145985 Approved by: https://github.com/huydhn	2025-02-03 21:05:18 +00:00
Scott Wolchok	3fae5c8509	torchgen: support exception boundary for ExecuTorch functions (#144341 ) Needed for ExecuTorch diff D67904052. Differential Revision: [D67906411](https://our.internmc.facebook.com/intern/diff/D67906411/) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144341 Approved by: https://github.com/Jack-Khuu	2025-01-31 01:05:21 +00:00
cyy	d94d816d96	Simplify handling of max jobs in CMake builds (#145820 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145820 Approved by: https://github.com/malfet	2025-01-31 00:55:39 +00:00
Yang Wang	a9ed7bd78e	[utilization] pipeline to create clean db records (#145327 ) upload_utilization_script to generate db-ready-insert records to s3 - generate two files: metadata and timeseries in ossci-utilization buckets - convert log record to db format ones - add unit test job for tools/stats/ Related Prs: setup composite action for data pipeline: https://github.com/pytorch/pytorch/pull/145310 add permission for composite action to access S3 bucket: https://github.com/pytorch-labs/pytorch-gha-infra/pull/595 add insert logic in s3 replicator: https://github.com/pytorch/test-infra/pull/6217 Pull Request resolved: https://github.com/pytorch/pytorch/pull/145327 Approved by: https://github.com/huydhn Co-authored-by: Huy Do <huydhn@gmail.com>	2025-01-29 23:48:50 +00:00
Catherine Lee	953e80936e	[linter] Grep linter batches long command (#145950 ) If the command is too long, the linter fails with ``` Failed due to OSError: [Errno 7] Argument list too long: 'grep' ``` Fix this by batching the command so it is shorter. The limit of 750k was chosen because `getconf ARG_MAX` returns ~1M on my Mac. My guess is that most people shouldn't hit this unless they run --all-files and the directory length is long. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145950 Approved by: https://github.com/wdvr	2025-01-29 21:23:27 +00:00
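The batching idea in the grep linter commit can be sketched as follows. This is a minimal illustration and not the linter's actual code: the helper name `batch_by_length` and the way length is counted (encoded bytes of each argument plus one separator) are assumptions; only the 750k limit comes from the commit message.

```python
from collections.abc import Iterator

# Limit quoted in the commit message; the real linter may count length differently.
MAX_ARG_LENGTH = 750_000


def batch_by_length(
    args: list[str], max_length: int = MAX_ARG_LENGTH
) -> Iterator[list[str]]:
    """Yield consecutive slices of `args` whose combined length does not exceed `max_length`.

    A single argument longer than the limit is still emitted, alone in its own batch.
    """
    batch: list[str] = []
    size = 0
    for arg in args:
        cost = len(arg.encode()) + 1  # +1 for the separator between arguments
        if batch and size + cost > max_length:
            yield batch
            batch, size = [], 0
        batch.append(arg)
        size += cost
    if batch:
        yield batch


# Hypothetical file list, with a tiny limit so the batching is visible.
files = [f"dir/file_{i}.py" for i in range(10)]
batches = list(batch_by_length(files, max_length=50))
```

Each batch could then be passed to a separate `grep` invocation, so no single command line approaches the `ARG_MAX` limit of the operating system.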
Zain Rizvi	a6e3f294f1	Don't use mypy daemon in CI (#145961 ) This is an attempt to fix flaky mypy errors in CI that look like: ``` dmypy status --verbose connection_name : /var/folders/rf/qrn1jkgj0b9_tcznwp8ck46w0000gn/T/tmpjoqsid7_/dmypy.sock pid : 32233 error : timed out Daemon is stuck; consider /Users/zainr/pytorch/venv/bin/dmypy kill ``` "Fix" it by not using the daemon at all, since it doesn't actually provide any perf benefits in CI. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145961 Approved by: https://github.com/malfet	2025-01-29 21:15:29 +00:00
rzou	ea141d8134	functional compiled autograd (#144707 ) This PR squashes together the following commits: https://github.com/pytorch/pytorch/pull/144115 https://github.com/pytorch/pytorch/pull/143417 https://github.com/pytorch/pytorch/pull/143405 https://github.com/pytorch/pytorch/pull/143387 https://github.com/pytorch/pytorch/pull/143304 https://github.com/pytorch/pytorch/pull/143296 This is a refactor of compiled autograd to use "functional autograd". The end goal is that it gets compiled autograd's initial capture to stop specializing on Tensor metadata, therefore allowing compiled autograd to better handle Tensor subclasses. For more information, please read the commit messages for each PR. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144707 Approved by: https://github.com/bdhirsh, https://github.com/xmfan, https://github.com/jansel	2025-01-27 05:20:56 +00:00
Isalia20	b75afa2e2e	[MPS] cholesky implementation (#145701 ) Requested in #77764 Closed #144193 due to a lot of conflicts when rebasing Pull Request resolved: https://github.com/pytorch/pytorch/pull/145701 Approved by: https://github.com/malfet	2025-01-27 01:53:03 +00:00
Aaron Gokaslan	f3304571fc	[BE][Ez]: FURB148 - remove useless enumerate calls (#145619 ) Remove useless enumerate calls Pull Request resolved: https://github.com/pytorch/pytorch/pull/145619 Approved by: https://github.com/drisspg	2025-01-24 23:37:15 +00:00
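As an illustration of the kind of pattern FURB148 targets, here is a small before/after sketch. The variable names are made up for this example and are not taken from the PyTorch code base.

```python
names = ["conv", "relu", "pool"]

# Before: the index is bound but never used, so enumerate() adds nothing.
upper_before = [name.upper() for _, name in enumerate(names)]

# After: iterate over the values directly.
upper_after = [name.upper() for name in names]

# Likewise, when only the index is needed, range(len(...)) is enough.
indices_before = [i for i, _ in enumerate(names)]
indices_after = list(range(len(names)))
```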
Aaron Orenstein	1335882b2a	If mypy fails it should report the error back to lintrunner (#145550 ) This happened to me because I had a bad LD_LIBRARY_PATH and mypy was failing to run (.so load error) - but lintrunner was silent about the underlying problem. Differential Revision: [D68593081](https://our.internmc.facebook.com/intern/diff/D68593081) Pull Request resolved: https://github.com/pytorch/pytorch/pull/145550 Approved by: https://github.com/bobrenjc93, https://github.com/Skylion007	2025-01-24 15:40:30 +00:00
PyTorch MergeBot	6dd8283381	Revert "[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296 )" This reverts commit `5531fafffe`. Reverted https://github.com/pytorch/pytorch/pull/143296 on behalf of https://github.com/izaitsevfb due to breaking internal tests T213390054 ([comment](https://github.com/pytorch/pytorch/pull/143296#issuecomment-2611224926))	2025-01-23 23:34:13 +00:00
Yang Wang	6d4f5f7688	[Utilization][Usage Log] Add data model for record (#145114 ) Add data model for consistency and data model change in the future. The data model will be used during the post-test-process pipeline Pull Request resolved: https://github.com/pytorch/pytorch/pull/145114 Approved by: https://github.com/huydhn	2025-01-23 19:04:41 +00:00
Andy Lugo	faa10faa2c	[ROCm] CK SDPA - Move arch check to CK patch (#144777 ) __gfxXXX__ should only be visible by device code. Move the check to the ck kernel Pull Request resolved: https://github.com/pytorch/pytorch/pull/144777 Approved by: https://github.com/jeffdaily, https://github.com/xw285cornell, https://github.com/jianyuh	2025-01-23 04:12:25 +00:00
PyTorch MergeBot	dddf52b1b9	Revert "Enable grep_linter to use -a (#144589 )" This reverts commit `3c55669b88`. Reverted https://github.com/pytorch/pytorch/pull/144589 on behalf of https://github.com/clee2000 due to the line parameter is kind of important and -a is not as important as I thought it was so I'm going to revert this ([comment](https://github.com/pytorch/pytorch/pull/144589#issuecomment-2608349155))	2025-01-22 21:55:27 +00:00
rzou	5531fafffe	[compiled autograd] Proxy opaque nodes for built-in autograd nodes (#143296 ) This PR is on the way to getting compiled autograd's initial capture to stop specializing on Tensor metadata. This PR changes compiled autograd's initial capture to proxy an opaque (w.r.t. Dynamo) function into the graph for all built-in codegen'ed autograd nodes and validate_outputs. We changed each codegen'ed apply_with_saved (e.g. MulBackward0::apply_with_saved) to call into Python to proxy a function (compiled_autograd.ops.MulBackward0) into the graph. Then, we use the node's InputMetadata to "guess" at the properties of the output Tensors to create some new FakeTensors. Some details: - MulBackward0::apply_with_saved lives in libtorch_cpu, but needs to call into Python via libtorch_python. There is an indirection (PyCompilerInterface) to do this. - MulBackward0::apply_with_saved passes a C++ function to Python. To make our lives easier, every codegen'ed apply_with_saved passes a C++ function with the same signature `(variable_list, ivalue_list) -> variable_list`. - We define how to pack arbitrary C++ types into IValue via a helper IValuePacker struct and codegen functional variants of each builtin C++ autograd node (e.g. MulBackward0_apply_functional_ivalue). MulBackward0 before this PR: https://gist.github.com/zou3519/a80381d5fa38e970e413fcd91b0530de MulBackward0 after this PR: https://gist.github.com/zou3519/0c2eee8b3d8d96232b51ef430b53c5b0 Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/143296 Approved by: https://github.com/jansel	2025-01-22 21:50:29 +00:00
Aaron Orenstein	07669ed960	PEP585 update - benchmarks tools torchgen (#145101 ) This is one of a series of PRs to update us to PEP585 (changing Dict -> dict, List -> list, etc). Most of the PRs were completely automated with RUFF as follows: Since RUFF UP006 is considered an "unsafe" fix first we need to enable unsafe fixes: ``` --- a/tools/linter/adapters/ruff_linter.py +++ b/tools/linter/adapters/ruff_linter.py @@ -313,6 +313,7 @@ "ruff", "check", "--fix-only", + "--unsafe-fixes", "--exit-zero", *([f"--config={config}"] if config else []), "--stdin-filename", ``` Then we need to tell RUFF to allow UP006 (as a final PR once all of these have landed this will be made permanent): ``` --- a/pyproject.toml +++ b/pyproject.toml @@ -40,7 +40,7 @@ [tool.ruff] -target-version = "py38" +target-version = "py39" line-length = 88 src = ["caffe2", "torch", "torchgen", "functorch", "test"] @@ -87,7 +87,6 @@ "SIM116", # Disable Use a dictionary instead of consecutive `if` statements "SIM117", "SIM118", - "UP006", # keep-runtime-typing "UP007", # keep-runtime-typing ] select = [ ``` Finally running `lintrunner -a --take RUFF` will fix up the deprecated uses. Pull Request resolved: https://github.com/pytorch/pytorch/pull/145101 Approved by: https://github.com/bobrenjc93	2025-01-18 05:05:07 +00:00
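The effect of the PEP 585 rewrite on a single function can be sketched like this. The functions below are hypothetical examples, not code from the PR, and the built-in generic syntax assumes Python 3.9 or newer.

```python
from typing import Dict, List  # pre-PEP 585 spelling, the kind UP006 rewrites


def count_old(words: List[str]) -> Dict[str, int]:
    """Annotated with the typing aliases."""
    counts: Dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_new(words: list[str]) -> dict[str, int]:
    """Same function annotated with the built-in generics."""
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```

Runtime behavior is identical; only the annotations change, which is why the bulk of such a migration can be automated.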
Tom Ritchford	46fbd63405	Fix unbind_copy and add its decomposition (#134319 ) * Fixes https://github.com/pytorch/pytorch/issues/130829 Pull Request resolved: https://github.com/pytorch/pytorch/pull/134319 Approved by: https://github.com/amjames, https://github.com/eellison	2025-01-17 18:21:22 +00:00
PyTorch MergeBot	6c713ccb5e	Revert "Make functionalization `ViewMeta` serializable with pickle. (#143712 )" This reverts commit `b8abdaa286`. Reverted https://github.com/pytorch/pytorch/pull/143712 on behalf of https://github.com/kit1980 due to breaking internal builds ([comment](https://github.com/pytorch/pytorch/pull/143712#issuecomment-2597205261))	2025-01-17 00:52:50 +00:00
Yang Wang	fea9d18d5a	[Utilization Log] Concurrently collect aggregate data during the output interval (#143235 ) # Overview Add a worker to collect metrics at short intervals. 1. Worker: add a worker that collects usage metrics every 500 ms by default (configurable). 2. Calculate avg and max as a data point every 5 seconds by default. # Other Clean up the log format to what is needed; currently we do not need to track GPU processes etc., or all pids from psutil. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143235 Approved by: https://github.com/huydhn	2025-01-16 23:52:43 +00:00
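The aggregation step described in the utilization commit (many short-interval samples collapsed into one avg/max data point per output interval) might look roughly like the sketch below. The names `DataPoint` and `aggregate` are hypothetical, and the threading and timing of the real worker are left out.

```python
from dataclasses import dataclass


@dataclass
class DataPoint:
    mean: float  # average of the samples in one output interval
    peak: float  # maximum of the samples in one output interval


def aggregate(samples: list[float]) -> DataPoint:
    """Collapse the samples gathered during one output interval into a data point."""
    if not samples:
        raise ValueError("no samples collected in this interval")
    return DataPoint(mean=sum(samples) / len(samples), peak=max(samples))


# With a 500 ms sampling interval and a 5 s output interval,
# each data point summarizes about 10 samples (made-up values here).
cpu_percent = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0, 90.0, 100.0]
point = aggregate(cpu_percent)
```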
PyTorch MergeBot	46b92c025d	Revert "Cholesky mps implementation (#144193 )" This reverts commit `727ae13318`. Reverted https://github.com/pytorch/pytorch/pull/144193 on behalf of https://github.com/malfet due to Alas, inductor changes broke inductor tests, see `aa4a1ff027/1` ([comment](https://github.com/pytorch/pytorch/pull/144193#issuecomment-2596938163))	2025-01-16 21:37:32 +00:00
Yukio Siraichi	b8abdaa286	Make functionalization `ViewMeta` serializable with pickle. (#143712 ) Fix: #141974 This PR makes `ViewMeta` sequence, present in functional tensors, serializable with pickle. In order to accomplish that, it makes `ViewMeta` an abstract class with overridable `forward` and `reverse` functions. In this context, each operation that once instantiated `ViewMeta` should now create a new specialized class that inherits from `ViewMeta`. Therefore, this PR also uses codegen for creating these specializations. In summary, these are the changes this PR introduces: - `ViewMeta` is turned into an abstract class (see _FunctionalStorageImpl.cpp_). `forward` and `reverse` are pure virtual functions that need to be implemented. `to_out_index` should be implemented by operations that might return more than 1 output. - New `ViewMeta` specializations for `resize_` and `_unsafe_view` are created (see _FunctionalizeFallbackKernel.h_). - New templates _ViewMetaClasses.{cpp,h}_ are created. They hold the declaration and definition of the `ViewMeta` specializations, which are automatically generated in the ATen codegen (see _gen.py_). - New `_functionalization` Python sub-module is created (see _Module.cpp_). It serves as namespace for the `ViewMeta` specializations and `InverseReturnMode` enum. - New template _ViewMetaClassesPythonBinding.cpp_ is created. It holds the automatically generated Python bindings for the `ViewMeta` specialization, which are generated in the torch codegen (see _generate_code.py_). Note that this PR makes use of codegen at 2 different moments: - ATen codegen (_gen.py_): generates the `ViewMeta` specialized classes. - Torch codegen (_generate_code.py_): generates the Python bindings for them. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143712 Approved by: https://github.com/bdhirsh	2025-01-16 19:41:41 +00:00
Isalia20	727ae13318	Cholesky mps implementation (#144193 ) Requested in #77764 PR is still in draft because it needs some cleanups and optimizations to get to cpu performance the least. Tasks: - [x] Make `upper=True` work, only `upper=False` works now - [x] Code cleanup - [x] Optimizations(Though might need some help on this)(tried my best, maybe there is still some more to squeeze out) - [x] Checks for positive definite input - [x] Support for (*, N, N) input, currently only supports (B, N, N) input - [x] Support other dtypes(float16, bfloat16) Pull Request resolved: https://github.com/pytorch/pytorch/pull/144193 Approved by: https://github.com/malfet Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>	2025-01-16 16:26:46 +00:00
fduwjj	e3c4d1b7d6	[c10d][fr] Fix the bug when we still mark mismatch when there are match case (#144916 ) When we introduced partial match, we accidentally marked the full-match case as a mismatch. This is wrong and this PR fixes it. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144916 Approved by: https://github.com/c-p-i-o	2025-01-16 04:36:30 +00:00
Catherine Lee	3c55669b88	Enable grep_linter to use -a (#144589 ) Lintrunner can only apply changes (-a) if only one suggestion is made per file. The grep_linter makes a suggestion for every line it finds incorrect, so it creates multiple suggestions per file if there are multiple lines that it wants to change. This sets the `line` parameter of the LintMessage to None for all of grep_linter, but I'm not sure if that entry did anything. I'm not sure if enabling -a is the best idea, since it's currently used for tabs and tab width might differ each time. I had one instance where running with -a caused the spacing to change. On the other hand, -a would have already worked if only one line was bad. Pull Request resolved: https://github.com/pytorch/pytorch/pull/144589 Approved by: https://github.com/huydhn	2025-01-13 21:18:24 +00:00
PyTorch MergeBot	99f2491af9	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit `45411d1fc9`. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/jeanschmidt due to Breaking internal CI, @albanD please help get this PR merged ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2571316444))	2025-01-04 14:17:20 +00:00
Xiaodong Wang	0a94bb432e	[ROCm] CK Flash Attention Backend (#143695 ) Replace https://github.com/pytorch/pytorch/pull/138947 for re-import. Replaces https://github.com/ROCm/pytorch/pull/1592 This PR contains the initial implementation of SDPA with composable_kernel backend. The CK path can be forced by simply calling torch.backends.cuda.preferred_rocm_fa_library("ck"). Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". As you'd expect, not setting this option will result in aotriton to be used as the backend. In the case of CK, if pytorch deems flash attention usable, then it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics which select which attention scheme to use (i.e. flash attention vs memory efficient attention vs math etc etc). It only gets called when flash attention is both enabled (via USE_FLASH_ATTENTION) and is selected at runtime by the existing heuristics. Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention courtesy of @tridao's hard work who is the co-author NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143695 Approved by: https://github.com/malfet Co-authored-by: Andy Lugo <Andy.LugoReyes@amd.com> Co-authored-by: Jithun Nair <jithun.nair@amd.com>	2025-01-03 22:01:36 +00:00
Xuehai Pan	45411d1fc9	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2025-01-03 20:03:40 +00:00
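For readers unfamiliar with the difference between the two `pathlib` methods named in this commit, the sketch below shows one observable distinction. The relative path is hypothetical, and the commit message itself does not state which behavior motivated the change.

```python
from pathlib import Path

# Hypothetical relative path containing a ".." component.
relative = Path("pkg") / ".." / "tools"

# absolute() only prepends the current working directory:
# ".." components are kept and symlinks are not followed.
abs_path = relative.absolute()

# resolve() also normalizes the path: ".." components are collapsed
# and symlinks are followed, so the result can point somewhere
# other than the path as it was written.
res_path = relative.resolve()
```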
Catherine Lee	bb5e439f2d	Add networkx as bazel dep to fix CI failure (#143995 ) Add networkx as a dependency for test_bazel Example failure: https://github.com/pytorch/pytorch/actions/runs/12551752021/job/34996706301 ``` INFO: From Testing //:test_bazel: ==================== Test output for //:test_bazel: Traceback (most recent call last): File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/test/_test_bazel.py", line 33, in <module> test_simple_compile_eager() File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/test/_test_bazel.py", line 27, in test_simple_compile_eager opt_foo1 = torch.compile(foo, backend="eager") File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/__init__.py", line 2533, in compile backend = _TorchCompileWrapper(backend, mode, options, dynamic) File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/__init__.py", line 2342, in __init__ self.compiler_fn = lookup_backend(backend) File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/registry.py", line 66, in lookup_backend _lazy_import() File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/registry.py", line 102, in 
_lazy_import import_submodule(backends) File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/utils.py", line 2797, in import_submodule importlib.import_module(f"{mod.__name__}.{filename[:-3]}") File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/execroot/pytorch/external/python3_10_x86_64-unknown-linux-gnu/lib/python3.10/importlib/__init__.py", line 126, in import_module return _bootstrap._gcd_import(name[level:], package, level) File "<frozen importlib._bootstrap>", line 1050, in _gcd_import File "<frozen importlib._bootstrap>", line 1027, in _find_and_load File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked File "<frozen importlib._bootstrap>", line 688, in _load_unlocked File "<frozen importlib._bootstrap_external>", line 883, in exec_module File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_dynamo/backends/common.py", line 12, in <module> from torch._functorch.aot_autograd import ( File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/aot_autograd.py", line 147, in <module> from .partitioners import default_partition File "/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/partitioners.py", line 31, in <module> from ._activation_checkpointing.graph_info_provider import GraphInfoProvider File 
"/var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/sandbox/processwrapper-sandbox/6504/execroot/pytorch/bazel-out/k8-fastbuild/bin/test_bazel.runfiles/pytorch/torch/_functorch/_activation_checkpointing/graph_info_provider.py", line 3, in <module> import networkx as nx ModuleNotFoundError: No module named 'networkx' ``` No periodic runs on this PR or its main branch commit, but I'm pretty sure its started on https://togithub.com/pytorch/pytorch/pull/143539 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143995 Approved by: https://github.com/huydhn	2025-01-02 19:42:18 +00:00
Benjamin Glass	d88a8c41d5	Fix flaky "Upload test stats" job (#143991 ) Test stat uploading was intermittently failing due to certain XML strings being opportunistically converted to numbers, when string output was expected. This PR makes the conversion behavior optional, which should fix the stat uploads. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143991 Approved by: https://github.com/clee2000, https://github.com/huydhn	2024-12-30 21:40:01 +00:00
Xuehai Pan	b6bdb67f82	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 ) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-12-29 17:23:13 +00:00
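Steps 2 and 4 of the list in this commit message rely on the equivalence sketched below. The file path is hypothetical; `PurePosixPath` is used here only so the example behaves the same on every platform.

```python
import posixpath
from pathlib import PurePosixPath

# Hypothetical file location inside a repository checkout.
script = PurePosixPath("/repo/tools/linter/adapters/example.py")

# Nested dirname calls, the style being replaced.
nested = posixpath.dirname(posixpath.dirname(str(script)))

# Chained .parent, one hop per directory level.
chained = script.parent.parent.parent.parent

# parents[N - 1] is the same directory as N chained .parent hops.
indexed = script.parents[3]
```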
Xuehai Pan	d2f769476f	[Easy] add quotes to shell activation commands (#143902 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143902 Approved by: https://github.com/Skylion007, https://github.com/malfet	2024-12-27 19:17:46 +00:00
Xuehai Pan	c4bff71854	[Easy] Add ROCm support to nightly pull tool (#141282 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/141282 Approved by: https://github.com/malfet ghstack dependencies: #143263	2024-12-27 00:07:38 +00:00
Xuehai Pan	51a7ecde80	[Easy] Bump CUDA nightly version to 11.8 / 12.4 / 12.6 in nightly pull tool (#143263 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/143263 Approved by: https://github.com/malfet	2024-12-26 19:01:38 +00:00
PyTorch MergeBot	475656fd9c	Revert "[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 )" This reverts commit `2293fe1024`. Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/malfet due to failing internal ROCM builds with error: ModuleNotFoundError: No module named hipify ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2562973920))	2024-12-26 17:32:23 +00:00
PyTorch MergeBot	cc4e70b7c3	Revert "Use absolute path `path.resolve()` -> `path.absolute()` (#129409 )" This reverts commit `135c7db99d`. Reverted https://github.com/pytorch/pytorch/pull/129409 on behalf of https://github.com/malfet due to need to revert to as dependency of https://github.com/pytorch/pytorch/pull/129374 ([comment](https://github.com/pytorch/pytorch/pull/129409#issuecomment-2562969825))	2024-12-26 17:26:06 +00:00
Xuehai Pan	b77406a9ec	[BE][CI] bump `ruff` to 0.8.4 (#143753 ) Changes: 1. Bump `ruff` from 0.7.4 to 0.8.4 2. Change `%`-formatted strings to f-string 3. Change arguments with the `__`-prefix to positional-only arguments with the `/` separator in function signature. Pull Request resolved: https://github.com/pytorch/pytorch/pull/143753 Approved by: https://github.com/Skylion007	2024-12-24 12:24:10 +00:00
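Changes 2 and 3 from this commit can be illustrated with a short before/after sketch. The function and variable names are invented for the example; the `/` separator requires Python 3.8 or newer.

```python
name, count = "ruff", 3

# Change 2: %-formatting rewritten as an f-string.
old_message = "%s found %d issues" % (name, count)
new_message = f"{name} found {count} issues"


# Change 3: the "__" prefix was only a naming convention that type checkers
# treat as positional-only; Python itself does not enforce it.
def clamp_old(__value: int, __limit: int) -> int:
    return min(__value, __limit)


# The "/" separator makes the parameters positional-only at runtime too.
def clamp_new(value: int, limit: int, /) -> int:
    return min(value, limit)
```

Calling `clamp_new(value=5, limit=3)` raises `TypeError`, whereas the old spelling would still accept `clamp_old(__value=5, __limit=3)`.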
Xuehai Pan	135c7db99d	Use absolute path `path.resolve()` -> `path.absolute()` (#129409 ) Changes: 1. Always explicit `.absolute()`: `Path(__file__)` -> `Path(__file__).absolute()` 2. Replace `path.resolve()` with `path.absolute()` if the code is resolving the PyTorch repo root directory. Pull Request resolved: https://github.com/pytorch/pytorch/pull/129409 Approved by: https://github.com/albanD	2024-12-24 08:33:08 +00:00
Jason Ansel	eebc93d41e	Better fix for f-strings in set_linter for py3.12 (#143725 ) #143628 didn't handle a few cases right for example: ```py $ python3 tools/linter/adapters/set_linter.py torch/_inductor/scheduler.py torch/_inductor/scheduler.py:261:24: Builtin `set` is deprecated 259 \| multiline=False, 260 \| ) 261 \| return f"{self}{data_str}" ^ 262 \| 263 \| def log_details(self) -> None: torch/_inductor/scheduler.py:261:33: Builtin `set` is deprecated 259 \| multiline=False, 260 \| ) 261 \| return f"{self}{data_str}" ^ 262 \| 263 \| def log_details(self) -> None: ``` also multi-line fstrings Pull Request resolved: https://github.com/pytorch/pytorch/pull/143725 Approved by: https://github.com/yanboliang	2024-12-22 22:51:27 +00:00
Tom Ritchford	f1cbf4b1b5	Enable ruff's unused variable checking everywhere in pytorch (#136965 ) Pull Request resolved: https://github.com/pytorch/pytorch/pull/136965 Approved by: https://github.com/cyyever, https://github.com/albanD	2024-12-22 02:33:11 +00:00
Xuehai Pan	2293fe1024	[BE][Easy] use `pathlib.Path` instead of `dirname` / `".."` / `pardir` (#129374 ) Changes by apply order: 1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`. 2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`. 3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first. `.parent{...}.absolute()` -> `.absolute().parent{...}` 4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.) `.parent.parent.parent.parent` -> `.parents[3]` 5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~ ~`.parents[3]` -> `.parents[4 - 1]`~ 6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~ Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374 Approved by: https://github.com/justinchuby, https://github.com/malfet	2024-12-21 22:08:01 +00:00
Jason Ansel	04b26ee1e8	Fix false positive from f-strings in set_linter (#143628 ) This linter was going crazy in python 3.12, example: ```py $ python3 tools/linter/adapters/set_linter.py torch/_inductor/runtime/triton_heuristics.py torch/_inductor/runtime/triton_heuristics.py:192:25: Builtin `set` is deprecated 190 \| args_str += ", ".join(call_args) 191 \| for k, v in call_kwargs.items(): 192 \| args_str += f", {k}={v}" ^ 193 \| 194 \| abs_path = os.path.abspath(sys.argv[0]) torch/_inductor/runtime/triton_heuristics.py:192:27: Builtin `set` is deprecated 190 \| args_str += ", ".join(call_args) 191 \| for k, v in call_kwargs.items(): 192 \| args_str += f", {k}={v}" ^ 193 \| 194 \| abs_path = os.path.abspath(sys.argv[0]) torch/_inductor/runtime/triton_heuristics.py:192:29: Builtin `set` is deprecated 190 \| args_str += ", ".join(call_args) 191 \| for k, v in call_kwargs.items(): 192 \| args_str += f", {k}={v}" ^ 193 \| 194 \| abs_path = os.path.abspath(sys.argv[0]) torch/_inductor/runtime/triton_heuristics.py:192:31: Builtin `set` is deprecated 190 \| args_str += ", ".join(call_args) 191 \| for k, v in call_kwargs.items(): 192 \| args_str += f", {k}={v}" ^ 193 \| 194 \| abs_path = os.path.abspath(sys.argv[0]) torch/_inductor/runtime/triton_heuristics.py:195:17: Builtin `set` is deprecated 193 \| 194 \| abs_path = os.path.abspath(sys.argv[0]) 195 \| with open(f"{abs_path}.launch_params", "a") as f: ^ 196 \| f.write(f"{kernel_name} \| {args_str}\n") 197 \| torch/_inductor/runtime/triton_heuristics.py:195:26: Builtin `set` is deprecated 193 \| 194 \| abs_path = os.path.abspath(sys.argv[0]) 195 \| with open(f"{abs_path}.launch_params", "a") as f: ^ 196 \| f.write(f"{kernel_name} \| {args_str}\n") 197 \| torch/_inductor/runtime/triton_heuristics.py:196:19: Builtin `set` is deprecated 194 \| abs_path = os.path.abspath(sys.argv[0]) 195 \| with open(f"{abs_path}.launch_params", "a") as f: 196 \| f.write(f"{kernel_name} \| {args_str}\n") ^ 197 \| 198 \| 
torch/_inductor/runtime/triton_heuristics.py:196:31: Builtin `set` is deprecated 194 \| abs_path = os.path.abspath(sys.argv[0]) 195 \| with open(f"{abs_path}.launch_params", "a") as f: 196 \| f.write(f"{kernel_name} \| {args_str}\n") ^ 197 \| 198 \| torch/_inductor/runtime/triton_heuristics.py:196:35: Builtin `set` is deprecated 194 \| abs_path = os.path.abspath(sys.argv[0]) 195 \| with open(f"{abs_path}.launch_params", "a") as f: 196 \| f.write(f"{kernel_name} \| {args_str}\n") ^ 197 \| 198 \| torch/_inductor/runtime/triton_heuristics.py:196:44: Builtin `set` is deprecated 194 \| abs_path = os.path.abspath(sys.argv[0]) 195 \| with open(f"{abs_path}.launch_params", "a") as f: 196 \| f.write(f"{kernel_name} \| {args_str}\n") ^ 197 \| 198 \| torch/_inductor/runtime/triton_heuristics.py:729:26: Builtin `set` is deprecated 727 \| exec( 728 \| f""" 729 \| def launcher({', '.join(def_args)}, grid, stream): ^ 730 \| if callable(grid): 731 \| grid_0, grid_1, grid_2 = grid(grid_meta) torch/_inductor/runtime/triton_heuristics.py:729:46: Builtin `set` is deprecated 727 \| exec( 728 \| f""" 729 \| def launcher({', '.join(def_args)}, grid, stream): ^ 730 \| if callable(grid): 731 \| grid_0, grid_1, grid_2 = grid(grid_meta) torch/_inductor/runtime/triton_heuristics.py:735:24: Builtin `set` is deprecated 733 \| grid_0, grid_1, grid_2 = grid 734 \| 735 \| args = {', '.join(call_args)}, ^ 736 \| launch_args = get_launch_args( 737 \| grid, grid_0, grid_1, grid_2, stream, function, torch/_inductor/runtime/triton_heuristics.py:735:45: Builtin `set` is deprecated 733 \| grid_0, grid_1, grid_2 = grid 734 \| 735 \| args = {', '.join(call_args)}, ^ 736 \| launch_args = get_launch_args( 737 \| grid, grid_0, grid_1, grid_2, stream, function, torch/_inductor/runtime/triton_heuristics.py:1144:20: Builtin `set` is deprecated 1142 \| cur_file = inspect.stack()[1].filename 1143 \| summary_str = ( 1144 \| f"SUMMARY ({cur_file})\n" ^ 1145 \| f"{overall_time:.2f}ms \t {overall_gb:.2f} GB\t 
{overall_gb / (overall_time / 1e3):.2f}GB/s" 1146 \| ) torch/_inductor/runtime/triton_heuristics.py:1144:29: Builtin `set` is deprecated 1142 \| cur_file = inspect.stack()[1].filename 1143 \| summary_str = ( 1144 \| f"SUMMARY ({cur_file})\n" ^ 1145 \| f"{overall_time:.2f}ms \t {overall_gb:.2f} GB\t {overall_gb / (overall_time / 1e3):.2f}GB/s" 1146 \| ) torch/_inductor/runtime/triton_heuristics.py:1162:61: Builtin `set` is deprecated 1160 \| ) 1161 \| file.write("====================\n") 1162 \| file.write(f"TRITON KERNELS BANDWIDTH INFO ({cur_file})\n") ^ 1163 \| for ms, num_gb, gb_per_s, kernel_name in sorted_calls: 1164 \| # also display the runtime percentage for each kernel torch/_inductor/runtime/triton_heuristics.py:1162:70: Builtin `set` is deprecated 1160 \| ) 1161 \| file.write("====================\n") 1162 \| file.write(f"TRITON KERNELS BANDWIDTH INFO ({cur_file})\n") ^ 1163 \| for ms, num_gb, gb_per_s, kernel_name in sorted_calls: 1164 \| # also display the runtime percentage for each kernel torch/_inductor/runtime/triton_heuristics.py:1166:36: Builtin `set` is deprecated 1164 \| # also display the runtime percentage for each kernel 1165 \| percentage = f"{ms / overall_time * 100:.2f}%" 1166 \| suffix = f" \t {percentage} \t {kernel_name}" ^ 1167 \| bw_info_str = create_bandwidth_info_str( 1168 \| ms, torch/_inductor/runtime/triton_heuristics.py:1166:47: Builtin `set` is deprecated 1164 \| # also display the runtime percentage for each kernel 1165 \| percentage = f"{ms / overall_time * 100:.2f}%" 1166 \| suffix = f" \t {percentage} \t {kernel_name}" ^ 1167 \| bw_info_str = create_bandwidth_info_str( 1168 \| ms, torch/_inductor/runtime/triton_heuristics.py:1166:52: Builtin `set` is deprecated 1164 \| # also display the runtime percentage for each kernel 1165 \| percentage = f"{ms / overall_time * 100:.2f}%" 1166 \| suffix = f" \t {percentage} \t {kernel_name}" ^ 1167 \| bw_info_str = create_bandwidth_info_str( 1168 \| ms, 
torch/_inductor/runtime/triton_heuristics.py:1166:64: Builtin `set` is deprecated 1164 \| # also display the runtime percentage for each kernel 1165 \| percentage = f"{ms / overall_time * 100:.2f}%" 1166 \| suffix = f" \t {percentage} \t {kernel_name}" ^ 1167 \| bw_info_str = create_bandwidth_info_str( 1168 \| ms, torch/_inductor/runtime/triton_heuristics.py:1175:30: Builtin `set` is deprecated 1173 \| ) 1174 \| file.write(bw_info_str + "\n") 1175 \| file.write(f"{summary_str}\n\n") ^ 1176 \| except Exception as e: 1177 \| log.warning( torch/_inductor/runtime/triton_heuristics.py:1175:42: Builtin `set` is deprecated 1173 \| ) 1174 \| file.write(bw_info_str + "\n") 1175 \| file.write(f"{summary_str}\n\n") ^ 1176 \| except Exception as e: 1177 \| log.warning( torch/_inductor/runtime/triton_heuristics.py:1205:29: Builtin `set` is deprecated 1203 \| else: 1204 \| possible_names = _find_names(self) 1205 \| kernel_name = f"{max(possible_names, key=len)}" ^ 1206 \| if not re.match(self.regex_filter, kernel_name): 1207 \| return torch/_inductor/runtime/triton_heuristics.py:1205:58: Builtin `set` is deprecated 1203 \| else: 1204 \| possible_names = _find_names(self) 1205 \| kernel_name = f"{max(possible_names, key=len)}" ^ 1206 \| if not re.match(self.regex_filter, kernel_name): 1207 \| return torch/_inductor/runtime/triton_heuristics.py:1241:60: Builtin `set` is deprecated 1239 \| "%s", 1240 \| create_bandwidth_info_str( 1241 \| ms, num_gb, gb_per_s, suffix=f" \t {kernel_name}" ^ 1242 \| ), 1243 \| ) torch/_inductor/runtime/triton_heuristics.py:1241:72: Builtin `set` is deprecated 1239 \| "%s", 1240 \| create_bandwidth_info_str( 1241 \| ms, num_gb, gb_per_s, suffix=f" \t {kernel_name}" ^ 1242 \| ), 1243 \| ) torch/_inductor/runtime/triton_heuristics.py:1256:15: Builtin `set` is deprecated 1254 \| for cfg in configs: 1255 \| hasher.update( 1256 \| f"{sorted(cfg.kwargs.items())} {cfg.num_warps} {cfg.num_stages}\n".encode() ^ 1257 \| ) 1258 \| return hasher.hexdigest() 
torch/_inductor/runtime/triton_heuristics.py:1256:42: Builtin `set` is deprecated 1254 \| for cfg in configs: 1255 \| hasher.update( 1256 \| f"{sorted(cfg.kwargs.items())} {cfg.num_warps} {cfg.num_stages}\n".encode() ^ 1257 \| ) 1258 \| return hasher.hexdigest() torch/_inductor/runtime/triton_heuristics.py:1256:44: Builtin `set` is deprecated 1254 \| for cfg in configs: 1255 \| hasher.update( 1256 \| f"{sorted(cfg.kwargs.items())} {cfg.num_warps} {cfg.num_stages}\n".encode() ^ 1257 \| ) 1258 \| return hasher.hexdigest() torch/_inductor/runtime/triton_heuristics.py:1256:58: Builtin `set` is deprecated 1254 \| for cfg in configs: 1255 \| hasher.update( 1256 \| f"{sorted(cfg.kwargs.items())} {cfg.num_warps} {cfg.num_stages}\n".encode() ^ 1257 \| ) 1258 \| return hasher.hexdigest() torch/_inductor/runtime/triton_heuristics.py:1256:60: Builtin `set` is deprecated 1254 \| for cfg in configs: 1255 \| hasher.update( 1256 \| f"{sorted(cfg.kwargs.items())} {cfg.num_warps} {cfg.num_stages}\n".encode() ^ 1257 \| ) 1258 \| return hasher.hexdigest() torch/_inductor/runtime/triton_heuristics.py:1256:75: Builtin `set` is deprecated 1254 \| for cfg in configs: 1255 \| hasher.update( 1256 \| f"{sorted(cfg.kwargs.items())} {cfg.num_warps} {cfg.num_stages}\n".encode() ^ 1257 \| ) 1258 \| return hasher.hexdigest() torch/_inductor/runtime/triton_heuristics.py:1377:23: Builtin `set` is deprecated 1375 \| if numel is None: 1376 \| continue 1377 \| block = cfg[f"{label}BLOCK"] ^ 1378 \| if numel == 1: 1379 \| assert block == 1, ( torch/_inductor/runtime/triton_heuristics.py:1377:29: Builtin `set` is deprecated 1375 \| if numel is None: 1376 \| continue 1377 \| block = cfg[f"{label}BLOCK"] ^ 1378 \| if numel == 1: 1379 \| assert block == 1, ( torch/_inductor/runtime/triton_heuristics.py:1381:24: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} 
(cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:38: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:46: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:52: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:58: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:64: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:71: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." 
^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:77: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:84: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1381:88: Builtin `set` is deprecated 1379 \| assert block == 1, ( 1380 \| f"TritonKernel.indexing assumes numel == 1 => BLOCK == 1" 1381 \| f" but {label.lower()}numel=={numel} and {label}BLOCK={block} (cfg={cfg})." ^ 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] torch/_inductor/runtime/triton_heuristics.py:1384:52: Builtin `set` is deprecated 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] 1384 \| max_block_str = f'config.triton.max_block["{label}"]' ^ 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" torch/_inductor/runtime/triton_heuristics.py:1384:58: Builtin `set` is deprecated 1382 \| ) 1383 \| max_block = TRITON_MAX_BLOCK[label] 1384 \| max_block_str = f'config.triton.max_block["{label}"]' ^ 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" torch/_inductor/runtime/triton_heuristics.py:1386:45: Builtin `set` is deprecated 1384 \| max_block_str = f'config.triton.max_block["{label}"]' 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" ^ 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." 
1388 \| ) torch/_inductor/runtime/triton_heuristics.py:1386:51: Builtin `set` is deprecated 1384 \| max_block_str = f'config.triton.max_block["{label}"]' 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" ^ 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." 1388 \| ) torch/_inductor/runtime/triton_heuristics.py:1386:66: Builtin `set` is deprecated 1384 \| max_block_str = f'config.triton.max_block["{label}"]' 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" ^ 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." 1388 \| ) torch/_inductor/runtime/triton_heuristics.py:1386:80: Builtin `set` is deprecated 1384 \| max_block_str = f'config.triton.max_block["{label}"]' 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" ^ 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." 1388 \| ) torch/_inductor/runtime/triton_heuristics.py:1387:20: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:26: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:33: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." 
^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:39: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:45: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:59: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:61: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:71: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:78: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." 
^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1387:82: Builtin `set` is deprecated 1385 \| assert max_block % block == 0, ( 1386 \| f"TritonKernel.indexing assumes {label}BLOCK divides {max_block_str}" 1387 \| f" but {label}BLOCK={block} and {max_block_str}={max_block} (cfg={cfg})." ^ 1388 \| ) 1389 \| torch/_inductor/runtime/triton_heuristics.py:1402:19: Builtin `set` is deprecated 1400 \| assert ( 1401 \| val <= max_block 1402 \| ), f"'{var}' too large. Maximum: {max_block}. Actual: {val}." ^ 1403 \| 1404 \| torch/_inductor/runtime/triton_heuristics.py:1402:23: Builtin `set` is deprecated 1400 \| assert ( 1401 \| val <= max_block 1402 \| ), f"'{var}' too large. Maximum: {max_block}. Actual: {val}." ^ 1403 \| 1404 \| torch/_inductor/runtime/triton_heuristics.py:1402:46: Builtin `set` is deprecated 1400 \| assert ( 1401 \| val <= max_block 1402 \| ), f"'{var}' too large. Maximum: {max_block}. Actual: {val}." ^ 1403 \| 1404 \| torch/_inductor/runtime/triton_heuristics.py:1402:56: Builtin `set` is deprecated 1400 \| assert ( 1401 \| val <= max_block 1402 \| ), f"'{var}' too large. Maximum: {max_block}. Actual: {val}." ^ 1403 \| 1404 \| torch/_inductor/runtime/triton_heuristics.py:1402:67: Builtin `set` is deprecated 1400 \| assert ( 1401 \| val <= max_block 1402 \| ), f"'{var}' too large. Maximum: {max_block}. Actual: {val}." ^ 1403 \| 1404 \| torch/_inductor/runtime/triton_heuristics.py:1402:71: Builtin `set` is deprecated 1400 \| assert ( 1401 \| val <= max_block 1402 \| ), f"'{var}' too large. Maximum: {max_block}. Actual: {val}." 
^ 1403 \| 1404 \| torch/_inductor/runtime/triton_heuristics.py:1551:21: Builtin `set` is deprecated 1549 \| rnumels = {} 1550 \| for idx in range(num_reduction_dims - 1, -1, -1): 1551 \| prefix = f"r{idx}_" ^ 1552 \| max_size = min(size_hints[prefix], TRITON_MAX_BLOCK[prefix.upper()]) 1553 \| dim = min(max_size, remaining) torch/_inductor/runtime/triton_heuristics.py:1551:25: Builtin `set` is deprecated 1549 \| rnumels = {} 1550 \| for idx in range(num_reduction_dims - 1, -1, -1): 1551 \| prefix = f"r{idx}_" ^ 1552 \| max_size = min(size_hints[prefix], TRITON_MAX_BLOCK[prefix.upper()]) 1553 \| dim = min(max_size, remaining) torch/_inductor/runtime/triton_heuristics.py:1556:34: Builtin `set` is deprecated 1554 \| assert ( 1555 \| remaining % dim == 0 1556 \| ), f"Expected dimension '{dim}' to divide remaining size '{remaining}'" ^ 1557 \| rnumels[prefix] = dim 1558 \| remaining //= dim torch/_inductor/runtime/triton_heuristics.py:1556:38: Builtin `set` is deprecated 1554 \| assert ( 1555 \| remaining % dim == 0 1556 \| ), f"Expected dimension '{dim}' to divide remaining size '{remaining}'" ^ 1557 \| rnumels[prefix] = dim 1558 \| remaining //= dim torch/_inductor/runtime/triton_heuristics.py:1556:67: Builtin `set` is deprecated 1554 \| assert ( 1555 \| remaining % dim == 0 1556 \| ), f"Expected dimension '{dim}' to divide remaining size '{remaining}'" ^ 1557 \| rnumels[prefix] = dim 1558 \| remaining //= dim torch/_inductor/runtime/triton_heuristics.py:1556:77: Builtin `set` is deprecated 1554 \| assert ( 1555 \| remaining % dim == 0 1556 \| ), f"Expected dimension '{dim}' to divide remaining size '{remaining}'" ^ 1557 \| rnumels[prefix] = dim 1558 \| remaining //= dim torch/_inductor/runtime/triton_heuristics.py:1564:38: Builtin `set` is deprecated 1562 \| assert ( 1563 \| r == final_numel 1564 \| ), f"Expected ND reduction size ({rnumels}) to have {r} elements." 
^ 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels torch/_inductor/runtime/triton_heuristics.py:1564:46: Builtin `set` is deprecated 1562 \| assert ( 1563 \| r == final_numel 1564 \| ), f"Expected ND reduction size ({rnumels}) to have {r} elements." ^ 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels torch/_inductor/runtime/triton_heuristics.py:1564:57: Builtin `set` is deprecated 1562 \| assert ( 1563 \| r == final_numel 1564 \| ), f"Expected ND reduction size ({rnumels}) to have {r} elements." ^ 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels torch/_inductor/runtime/triton_heuristics.py:1564:59: Builtin `set` is deprecated 1562 \| assert ( 1563 \| r == final_numel 1564 \| ), f"Expected ND reduction size ({rnumels}) to have {r} elements." ^ 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels torch/_inductor/runtime/triton_heuristics.py:1567:37: Builtin `set` is deprecated 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels 1567 \| ), f"rnumels exceed size_hints. {rnumels} > {size_hints}" ^ 1568 \| 1569 \| return rnumels torch/_inductor/runtime/triton_heuristics.py:1567:45: Builtin `set` is deprecated 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels 1567 \| ), f"rnumels exceed size_hints. {rnumels} > {size_hints}" ^ 1568 \| 1569 \| return rnumels torch/_inductor/runtime/triton_heuristics.py:1567:49: Builtin `set` is deprecated 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels 1567 \| ), f"rnumels exceed size_hints. {rnumels} > {size_hints}" ^ 1568 \| 1569 \| return rnumels torch/_inductor/runtime/triton_heuristics.py:1567:60: Builtin `set` is deprecated 1565 \| assert all( 1566 \| rnumels[prefix] <= size_hints[prefix] for prefix in rnumels 1567 \| ), f"rnumels exceed size_hints. 
{rnumels} > {size_hints}" ^ 1568 \| 1569 \| return rnumels torch/_inductor/runtime/triton_heuristics.py:1746:49: Builtin `set` is deprecated 1744 \| 1745 \| if not configs: 1746 \| raise NotImplementedError(f"size_hints: {size_hints}") ^ 1747 \| return cached_autotune( 1748 \| size_hints, torch/_inductor/runtime/triton_heuristics.py:1746:60: Builtin `set` is deprecated 1744 \| 1745 \| if not configs: 1746 \| raise NotImplementedError(f"size_hints: {size_hints}") ^ 1747 \| return cached_autotune( 1748 \| size_hints, torch/_inductor/runtime/triton_heuristics.py:1928:32: Builtin `set` is deprecated 1926 \| for prefix in size_hints: 1927 \| if prefix_is_reduction(prefix): 1928 \| c.kwargs.pop(f"{prefix.upper()}BLOCK") ^ 1929 \| 1930 \| if disable_pointwise_autotuning(inductor_meta): torch/_inductor/runtime/triton_heuristics.py:1928:47: Builtin `set` is deprecated 1926 \| for prefix in size_hints: 1927 \| if prefix_is_reduction(prefix): 1928 \| c.kwargs.pop(f"{prefix.upper()}BLOCK") ^ 1929 \| 1930 \| if disable_pointwise_autotuning(inductor_meta): torch/_inductor/runtime/triton_heuristics.py:1975:49: Builtin `set` is deprecated 1973 \| assert triton_meta is not None 1974 \| if len(size_hints) != 2: 1975 \| raise NotImplementedError(f"size_hints: {size_hints}") ^ 1976 \| 1977 \| configs = _reduction_configs(size_hints=size_hints, inductor_meta=inductor_meta) torch/_inductor/runtime/triton_heuristics.py:1975:60: Builtin `set` is deprecated 1973 \| assert triton_meta is not None 1974 \| if len(size_hints) != 2: 1975 \| raise NotImplementedError(f"size_hints: {size_hints}") ^ 1976 \| 1977 \| configs = _reduction_configs(size_hints=size_hints, inductor_meta=inductor_meta) torch/_inductor/runtime/triton_heuristics.py:2082:56: Builtin `set` is deprecated 2080 \| xnumel, ynumel, znumel = numels[2], numels[1], numels[0] 2081 \| else: 2082 \| raise AssertionError(f"invalid size for numels {len(numels)}") ^ 2083 \| 2084 \| def get_grid_dim(numel, block): 
torch/_inductor/runtime/triton_heuristics.py:2082:68: Builtin `set` is deprecated 2080 \| xnumel, ynumel, znumel = numels[2], numels[1], numels[0] 2081 \| else: 2082 \| raise AssertionError(f"invalid size for numels {len(numels)}") ^ 2083 \| 2084 \| def get_grid_dim(numel, block): torch/_inductor/runtime/triton_heuristics.py:2104:57: Builtin `set` is deprecated 2102 \| torch._check( 2103 \| y_grid <= max_y_grid, 2104 \| lambda: f"Generated y grid beyond 2^16 ({y_grid}) not supported with z dimension present. File issue", ^ 2105 \| ) 2106 \| torch/_inductor/runtime/triton_heuristics.py:2104:64: Builtin `set` is deprecated 2102 \| torch._check( 2103 \| y_grid <= max_y_grid, 2104 \| lambda: f"Generated y grid beyond 2^16 ({y_grid}) not supported with z dimension present. File issue", ^ 2105 \| ) 2106 \| torch/_inductor/runtime/triton_heuristics.py:2113:43: Builtin `set` is deprecated 2111 \| ) 2112 \| 2113 \| setattr(grid_fn, "grid_fn_str", f"grid{numels}") # noqa: B010 ^ 2114 \| 2115 \| return grid_fn torch/_inductor/runtime/triton_heuristics.py:2113:50: Builtin `set` is deprecated 2111 \| ) 2112 \| 2113 \| setattr(grid_fn, "grid_fn_str", f"grid{numels}") # noqa: B010 ^ 2114 \| 2115 \| return grid_fn torch/_inductor/runtime/triton_heuristics.py:2122:48: Builtin `set` is deprecated 2120 \| return (meta["RSPLIT"], ceildiv(xnumel, meta.get("XBLOCK", 1)), 1) 2121 \| 2122 \| grid_fn_str = f"cooperative_reduction_grid({xnumel})" ^ 2123 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2124 \| return grid_fn torch/_inductor/runtime/triton_heuristics.py:2122:55: Builtin `set` is deprecated 2120 \| return (meta["RSPLIT"], ceildiv(xnumel, meta.get("XBLOCK", 1)), 1) 2121 \| 2122 \| grid_fn_str = f"cooperative_reduction_grid({xnumel})" ^ 2123 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2124 \| return grid_fn torch/_inductor/runtime/triton_heuristics.py:2135:54: Builtin `set` is deprecated 2133 \| coop_grid = cooperative_reduction_grid(xnumel) 2134 \| 
normal_grid = grid(xnumel) 2135 \| grid_fn_str = f"maybe_cooperative_reduction_grid({xnumel})" ^ 2136 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2137 \| return grid_fn torch/_inductor/runtime/triton_heuristics.py:2135:61: Builtin `set` is deprecated 2133 \| coop_grid = cooperative_reduction_grid(xnumel) 2134 \| normal_grid = grid(xnumel) 2135 \| grid_fn_str = f"maybe_cooperative_reduction_grid({xnumel})" ^ 2136 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2137 \| return grid_fn torch/_inductor/runtime/triton_heuristics.py:2145:37: Builtin `set` is deprecated 2143 \| return (ceildiv(rnumel, meta.get("R0_BLOCK", 1)), xnumel, 1) 2144 \| 2145 \| grid_fn_str = f"split_scan_grid({xnumel}, {rnumel})" ^ 2146 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2147 \| torch/_inductor/runtime/triton_heuristics.py:2145:44: Builtin `set` is deprecated 2143 \| return (ceildiv(rnumel, meta.get("R0_BLOCK", 1)), xnumel, 1) 2144 \| 2145 \| grid_fn_str = f"split_scan_grid({xnumel}, {rnumel})" ^ 2146 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2147 \| torch/_inductor/runtime/triton_heuristics.py:2145:47: Builtin `set` is deprecated 2143 \| return (ceildiv(rnumel, meta.get("R0_BLOCK", 1)), xnumel, 1) 2144 \| 2145 \| grid_fn_str = f"split_scan_grid({xnumel}, {rnumel})" ^ 2146 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2147 \| torch/_inductor/runtime/triton_heuristics.py:2145:54: Builtin `set` is deprecated 2143 \| return (ceildiv(rnumel, meta.get("R0_BLOCK", 1)), xnumel, 1) 2144 \| 2145 \| grid_fn_str = f"split_scan_grid({xnumel}, {rnumel})" ^ 2146 \| setattr(grid_fn, "grid_fn_str", grid_fn_str) # noqa: B010 2147 \| torch/_inductor/runtime/triton_heuristics.py:2173:42: Builtin `set` is deprecated 2171 \| assert ( 2172 \| min_blocks_d is None or min_blocks == min_blocks_d 2173 \| ), f"inconsistent min_blocks {min_blocks} vs x grid {numels[-1]}" ^ 2174 \| else: 2175 \| # sequential dispatch 
torch/_inductor/runtime/triton_heuristics.py:2173:53: Builtin `set` is deprecated 2171 \| assert ( 2172 \| min_blocks_d is None or min_blocks == min_blocks_d 2173 \| ), f"inconsistent min_blocks {min_blocks} vs x grid {numels[-1]}" ^ 2174 \| else: 2175 \| # sequential dispatch torch/_inductor/runtime/triton_heuristics.py:2173:66: Builtin `set` is deprecated 2171 \| assert ( 2172 \| min_blocks_d is None or min_blocks == min_blocks_d 2173 \| ), f"inconsistent min_blocks {min_blocks} vs x grid {numels[-1]}" ^ 2174 \| else: 2175 \| # sequential dispatch torch/_inductor/runtime/triton_heuristics.py:2173:77: Builtin `set` is deprecated 2171 \| assert ( 2172 \| min_blocks_d is None or min_blocks == min_blocks_d 2173 \| ), f"inconsistent min_blocks {min_blocks} vs x grid {numels[-1]}" ^ 2174 \| else: 2175 \| # sequential dispatch ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143628 Approved by: https://github.com/yanboliang, https://github.com/rec	2024-12-20 11:45:26 +00:00
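Illustration for #143628 above. A plausible cause of these false positives (an assumption; the commit message does not state the mechanism) is that Python 3.12 tokenizes f-strings into separate tokens, so each `{` in a replacement field appears as an ordinary `OP` token, just as it would in a set literal. The sketch below uses only the standard `tokenize` module. The helper names `brace_tokens` and `braces_outside_fstrings` are hypothetical and are not taken from `set_linter.py`.

```python
import io
import tokenize

# FSTRING_START / FSTRING_END only exist on Python 3.12+; fall back to None.
FSTRING_START = getattr(tokenize, "FSTRING_START", None)
FSTRING_END = getattr(tokenize, "FSTRING_END", None)


def brace_tokens(source):
    """Return the (row, col) of every `{` reported as an OP token."""
    readline = io.StringIO(source).readline
    return [
        tok.start
        for tok in tokenize.generate_tokens(readline)
        if tok.type == tokenize.OP and tok.string == "{"
    ]


def braces_outside_fstrings(source):
    """Like brace_tokens, but skip everything between f-string start/end."""
    depth = 0  # nesting depth of f-strings currently open
    found = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == FSTRING_START:
            depth += 1
        elif tok.type == FSTRING_END:
            depth -= 1
        elif depth == 0 and tok.type == tokenize.OP and tok.string == "{":
            found.append(tok.start)
    return found


# One of the lines flagged in the log above, without its indentation.
positions = brace_tokens('args_str += f", {k}={v}"\n')

# A real set literal followed by an f-string that merely formats it.
filtered = braces_outside_fstrings('s = {1, 2}\nt = f"{s}"\n')
```

On interpreters older than 3.12 the whole f-string is a single `STRING` token, so `positions` is empty there. On 3.12 and later it holds one entry per replacement field. The depth counter is a deliberately simple design choice: it would also hide a genuine set literal written inside a replacement field, which a real linter may want to handle separately.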
Ryan Guo	629de4da60	[dynamo] Add a lint rule to restrict what 3P library one can import (#143312 ) As title, this patch prevents developers from importing third party libraries to patch things in Dynamo, unless there's no other easy workaround (in which case one would add the library to the allowlist in `import_linter.py`, as instructed by the lint error). For instance, if we remove `einops` from the allowlist, we'd get this ```verbatim >>> Lint for torch/_dynamo/decorators.py: Error (IMPORT) Disallowed import importing from einops is not allowed, if you believe there's a valid reason, please add it to import_linter.py 608 \|# Note: this carefully avoids eagerly import einops. 609 \|# TODO: we should delete this whole _allow_in_graph_einops logic by approximately 2024 Q2 610 \|def _allow_in_graph_einops(): >>> 611 \| import einops 612 \| 613 \| try: 614 \| # requires einops > 0.6.1, torch >= 2.0 Error (IMPORT) Disallowed import importing from einops is not allowed, if you believe there's a valid reason, please add it to import_linter.py 612 \| 613 \| try: 614 \| # requires einops > 0.6.1, torch >= 2.0 >>> 615 \| from einops._torch_specific import ( # type: ignore[attr-defined] # noqa: F401 616 \| _ops_were_registered_in_torchdynamo, 617 \| ) 618 \| ``` Pull Request resolved: https://github.com/pytorch/pytorch/pull/143312 Approved by: https://github.com/zou3519	2024-12-19 20:59:16 +00:00
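Illustration for #143312 above. This is a minimal sketch of how an allowlist-based import check could work, assuming the check walks the file's AST. It is not the code in `import_linter.py`: the names `ALLOWED_TOP_LEVEL` and `disallowed_imports` and the contents of the allowlist are hypothetical.

```python
import ast

# Hypothetical allowlist of top-level packages; the real list lives in
# import_linter.py and is not reproduced here.
ALLOWED_TOP_LEVEL = frozenset(["torch", "os", "sys", "typing"])


def disallowed_imports(source, allowed=ALLOWED_TOP_LEVEL):
    """Return sorted (lineno, top_level_package) pairs for disallowed imports."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            if node.level:  # relative import, always first-party
                continue
            names = [node.module]
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top not in allowed:
                violations.append((node.lineno, top))
    return sorted(violations)


# Shaped like the einops snippet quoted in the lint output above.
EXAMPLE = '''\
import os

def _allow_in_graph_einops():
    import einops
    try:
        from einops._torch_specific import _ops_were_registered_in_torchdynamo
    except ImportError:
        pass
'''

violations = disallowed_imports(EXAMPLE)
```

Walking the whole tree rather than only module-level statements matters here, because both flagged imports in the commit message sit inside a function body.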
Eli Uriegas	b247f87845	tools: Add a tool to build wheels for multiple python versions (#143361 ) Adds a tool to build bdist_wheels sequentially for multiple different python versions (if specified). The goal of this tool is to eventually be able to utilize this in our binary build runs to significantly reduce the amount of time we take to build packages by utilizing a local ccache from the first build. Tested locally using the following: ``` $ ccache -C # clear cache # -p could actually reference any python interpreter $ python tools/packaging/build_wheel.py \ -p /home/eliuriegas/.local/share/uv/python/cpython-3.12.7-linux-x86_64-gnu/bin/python3.12 \ -p /home/eliuriegas/.local/share/uv/python/cpython-3.13.0-linux-x86_64-gnu/bin/python3.13 \ -d dist-multi/ ... 2024-12-17 10:48:11,365 - INFO - Build time (3.12.7): 571.440689s 2024-12-17 10:48:11,365 - INFO - Build time (3.13.0): 191.147503s ``` Signed-off-by: Eli Uriegas <eliuriegas@meta.com> Pull Request resolved: https://github.com/pytorch/pytorch/pull/143361 Approved by: https://github.com/malfet, https://github.com/atalman	2024-12-17 21:56:06 +00:00
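Illustration for #143361 above. This is a rough sketch of building a wheel with each interpreter in turn and timing every build, so that later builds can benefit from a compiler cache warmed by the first one. It is not the contents of `tools/packaging/build_wheel.py`: the function names and the `setup.py bdist_wheel -d` invocation are assumptions made for illustration.

```python
import subprocess
import time


def wheel_command(python, dist_dir):
    """Command line for one build (assumed form, not taken from the tool)."""
    return [python, "setup.py", "bdist_wheel", "-d", dist_dir]


def build_all(pythons, dist_dir, run=subprocess.check_call):
    """Build sequentially with each interpreter; return seconds per build."""
    timings = {}
    for python in pythons:
        start = time.perf_counter()
        run(wheel_command(python, dist_dir))  # raises if the build fails
        timings[python] = time.perf_counter() - start
    return timings


# Dry run: record the commands instead of executing them.
recorded = []
dry_timings = build_all(["python3.12", "python3.13"], "dist-multi/", run=recorded.append)
```

Passing `run` as a parameter keeps the loop testable without invoking a real build. Sequential rather than parallel execution is what lets the second interpreter reuse the cache, which matches the drop from roughly 571 s to 191 s in the log above.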
Chirag Pandya	0bdc173ab6	[fr] recognize all_reduce_barrier as a valid op (#143354 ) Summary: D67068632 introduced a better profiling name for barrier operations to be able to distinguish various ops. Unfortunately, this broke Flight Recorder Analysis with the following error as reported by dmwu ``` fr_trace -m torchx-param_bench_16g_mi300x-all_to_all -a 0 --mast_job_version 98 -w 16 Traceback (most recent call last): File "/usr/local/fbcode/platform010/lib/python3.10/runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "/usr/local/fbcode/platform010/lib/python3.10/runpy.py", line 86, in _run_code ``` Test Plan: Test manually. Differential Revision: D67305997 Pull Request resolved: https://github.com/pytorch/pytorch/pull/143354 Approved by: https://github.com/wconstab	2024-12-17 21:09:18 +00:00
PyTorch MergeBot	969b07b96f	Revert "[ROCm] CK Flash Attention Backend (#138947 )" This reverts commit `500d02921b`. Reverted https://github.com/pytorch/pytorch/pull/138947 on behalf of https://github.com/atalman due to Breaks default windows checkout ([comment](https://github.com/pytorch/pytorch/pull/138947#issuecomment-2548998359))	2024-12-17 16:46:57 +00:00
Andy Lugo	500d02921b	[ROCm] CK Flash Attention Backend (#138947 ) Replaces https://github.com/ROCm/pytorch/pull/1592 This PR contains the initial implementation of SDPA with the composable_kernel (CK) backend. The CK path can be forced by calling `torch.backends.cuda.preferred_rocm_fa_library("ck")`. Similarly, you can force the incumbent aotriton implementation by passing in "aotriton" or "default". If this option is not set, aotriton is used as the backend. In the case of CK, if PyTorch deems flash attention usable, it will use the CK path in all the same places aotriton would have been used. This PR makes no changes to the heuristics that select which attention scheme to use (i.e. flash attention vs. memory-efficient attention vs. math, etc.). The CK path is only taken when flash attention is both enabled (via `USE_FLASH_ATTENTION`) and selected at runtime by the existing heuristics. Files located in pytorch/aten/src/ATen/native/transformers/hip/flash_attn/ck/mha* have been pulled from https://github.com/Dao-AILab/flash-attention, courtesy of the hard work of @tridao, who is the co-author. NOTE: In order to use this backend, the user MUST set USE_CK_FLASH_ATTENTION=1 in their environment when they build PyTorch. Pull Request resolved: https://github.com/pytorch/pytorch/pull/138947 Approved by: https://github.com/pruthvistony, https://github.com/xw285cornell, https://github.com/leitian Co-authored-by: Xiaodong Wang <xw285@cornell.edu>	2024-12-17 02:18:07 +00:00
rzou	557da8014d	[gen_autograd_functions] rename some variables (#143166 ) This is a follow-up from https://github.com/pytorch/pytorch/pull/141278. Test Plan: - existing tests Pull Request resolved: https://github.com/pytorch/pytorch/pull/143166 Approved by: https://github.com/soulitzer	2024-12-16 23:18:55 +00:00
Huy Do	39cacc1d81	Fix missing tests on test tool lint job (#143052 ) A follow-up from https://github.com/pytorch/pytorch/pull/142476#discussion_r1878888558 where some tests are not discovered correctly by pytest ### Testing https://github.com/pytorch/pytorch/actions/runs/12287448581/job/34289531307?pr=143052#step:14:162 shows the correct number of tests now Pull Request resolved: https://github.com/pytorch/pytorch/pull/143052 Approved by: https://github.com/ZainRizvi	2024-12-12 20:29:32 +00:00


5216 commits