Haoci Zhang
8fe5b93667
support zb1p and zb2p algorithms ( #130752 )
...
Previously, we proved that ZB2P is not truly zero-bubble when num_local_stages exceeds 4, so only ZB1P was supported.
This PR makes a few tweaks to ZB2P to make it truly zero-bubble. The algorithm and proof are attached.
[zero_bubble.pdf](https://github.com/user-attachments/files/16238738/zero_bubble.pdf )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130752
Approved by: https://github.com/H-Huang
2024-07-24 17:58:46 +00:00
Jun Luo
abb313b466
[torch.mtia] Noop set_rng_state and get_rng_state APIs ( #130873 )
...
Summary: As title
Test Plan: CI tests
Reviewed By: joebos
Differential Revision: D59036602
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130873
Approved by: https://github.com/hanzlfs
2024-07-24 01:52:21 +00:00
Shangdi Yu
68c725a094
[custom ops] Add register_vmap for custom ops ( #130589 )
...
Fixes #130284
Fixes #130653
- Add `torch.library.register_vmap` to custom ops
- Add `register_vmap` for the operators in custom_op_db.
- Make `torch.autograd.Function` support kwarg-only arguments for vmap
- Test operators in op_db with `tests/test_vmap`.
- Change `test_vmap` to allow a custom `out_dim` and allow "None" in `out_dim` when testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589
Approved by: https://github.com/zou3519
2024-07-23 17:48:38 +00:00
PyTorch MergeBot
e4b5645f83
Revert "Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )"
...
This reverts commit 5b5e0698a5 .
Reverted https://github.com/pytorch/pytorch/pull/130633 on behalf of https://github.com/clee2000 due to breaking a lot of jobs and build rules internally D60085885, possibly needs to update some bazel build? ([comment](https://github.com/pytorch/pytorch/pull/130633#issuecomment-2245806738 ))
2024-07-23 17:19:34 +00:00
PyTorch MergeBot
b435d84261
Revert "[custom ops] Add register_vmap for custom ops ( #130589 )"
...
This reverts commit 074b420641 .
Reverted https://github.com/pytorch/pytorch/pull/130589 on behalf of https://github.com/atalman due to Please fix lint and reland ([comment](https://github.com/pytorch/pytorch/pull/130589#issuecomment-2244092174 ))
2024-07-23 01:44:44 +00:00
Shangdi Yu
074b420641
[custom ops] Add register_vmap for custom ops ( #130589 )
...
Fixes #130284
Fixes #130653
- Add `torch.library.register_vmap` to custom ops
- Add `register_vmap` for the operators in custom_op_db.
- Make `torch.autograd.Function` support kwarg-only arguments for vmap
- Test operators in op_db with `tests/test_vmap`.
- Change `test_vmap` to allow a custom `out_dim` and allow "None" in `out_dim` when testing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130589
Approved by: https://github.com/zou3519
2024-07-23 00:54:52 +00:00
Mikayla Gawarecki
5b5e0698a5
Add wrappers for synchronous GPUDirect Storage APIs ( #130633 )
...
Based in part on https://github.com/NVIDIA/apex/pull/1774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130633
Approved by: https://github.com/albanD
2024-07-22 14:51:24 +00:00
PyTorch MergeBot
26383a6cc0
Revert "Added and_masks and or_masks utilities ( #131073 )"
...
This reverts commit 92bb323d36 .
Reverted https://github.com/pytorch/pytorch/pull/131073 on behalf of https://github.com/albanD due to The docs build fails here and in trunk ([comment](https://github.com/pytorch/pytorch/pull/131073#issuecomment-2242997958 ))
2024-07-22 13:44:55 +00:00
chilli
92bb323d36
Added and_masks and or_masks utilities ( #131073 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131073
Approved by: https://github.com/drisspg
ghstack dependencies: #130871 , #130904
2024-07-22 11:48:03 +00:00
Soumith Chintala
8e478d4fb1
Add Alban and Piotr into Core Maintainers ( #130903 )
...
See official announcement here: https://dev-discuss.pytorch.org/t/alban-desmaison-and-piotr-bialecki-are-now-pytorch-core-maintainers/2280
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130903
Approved by: https://github.com/albanD , https://github.com/Skylion007
2024-07-20 16:02:42 +00:00
Li-Huai (Allan) Lin
125be005eb
[Docs] Fix fake tensor doc ( #131205 )
...
Fix this: `# AttributeError: 'FakeTensorMode' object has no attribute 'from_real_tensor'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/131205
Approved by: https://github.com/eellison
2024-07-19 17:59:45 +00:00
Jerry Zhang
793b17ebcb
Add numeric_debugger top level APIs ( #130643 )
...
Summary:
Add three top-level APIs for the numeric debugger in the pt2e flow. They can log intermediate outputs in the model
and compute summaries for metric comparisons between nodes in two graphs:
* `prepare_for_propagation_comparison`
* `extract_results_from_loggers`
* `compare_results`
Test Plan:
python test/test_quantization.py -k test_prepare_for_propagation_comparison
python test/test_quantization.py -k test_extract_results_from_loggers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130643
Approved by: https://github.com/dulinriley , https://github.com/tarun292
2024-07-18 20:54:18 +00:00
redwrasse
63a0a65df9
Define 'zero-preserving unary functions' in docs ( #130804 )
...
Make explicit the definition of 'zero-preserving unary functions' in the sparse tensors documentation.
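As a sketch of the definition (not from the PR itself): a unary function f is zero-preserving iff f(0) == 0, so applying it to a sparse tensor's stored values cannot create new nonzeros. `torch.sin` is one such function:

```python
import torch

# sin(0) == 0, so torch.sin can act on just the stored values of a
# sparse tensor without densifying it or changing its sparsity pattern.
indices = torch.tensor([[0, 2]])
values = torch.tensor([1.0, -1.0])
s = torch.sparse_coo_tensor(indices, values, (4,))

out = torch.sin(s)  # result is still sparse with the same 2 nonzeros
```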
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130804
Approved by: https://github.com/soulitzer
2024-07-18 13:30:29 +00:00
drisspg
2b43d339fe
Make FlexAttention API public ( #130755 )
...
# Summary
Makes the prototype API flex_attention public
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130755
Approved by: https://github.com/Chillee
2024-07-16 16:21:25 +00:00
Xuehai Pan
a3abfa5cb5
[BE][Easy][1/19] enforce style for empty lines in import segments ( #129752 )
...
See https://github.com/pytorch/pytorch/pull/129751#issue-2380881501 . Most changes are auto-generated by linter.
You can review these PRs via:
```bash
git diff --ignore-all-space --ignore-blank-lines HEAD~1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129752
Approved by: https://github.com/ezyang , https://github.com/malfet
2024-07-16 00:42:56 +00:00
Jerry Zhang
b893aa71ca
Rename generate_numeric_debug_handle to numeric_debugger ( #130590 )
...
Summary:
As titled.
Test Plan:
CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130590
Approved by: https://github.com/dulinriley , https://github.com/tarun292
2024-07-15 22:42:27 +00:00
Yu, Guangye
7cd48df2da
Refine the logic of device construction when only device index is given ( #129119 )
...
# Motivation
Before this PR, the constructed device was of `cuda` type when only a device index was given (or of `PrivateUse1` type if a `PrivateUse1` backend is registered).
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```
It works well on a CUDA GPU, but it raises an unexpected error when running on XPU:
```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
With this PR, the logic is refined to use the currently available device type instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129119
Approved by: https://github.com/albanD , https://github.com/gujinghui , https://github.com/EikanWang
ghstack dependencies: #129463 , #129205 , #129363
2024-07-15 14:34:29 +00:00
Yu, Guangye
9cae2160f5
Introduce the concept of Accelerators to PyTorch doc ( #129363 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129363
Approved by: https://github.com/EikanWang , https://github.com/gujinghui , https://github.com/albanD
ghstack dependencies: #129463 , #129205
2024-07-15 14:24:46 +00:00
Mikayla Gawarecki
7c289c2a5c
Add torch.serialization.safe_globals context manager ( #127939 )
...
Add context manager mentioned in https://github.com/pytorch/pytorch/pull/127808#pullrequestreview-2096298486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127939
Approved by: https://github.com/albanD
2024-07-12 20:38:43 +00:00
rzou
9c69684af8
[custom_ops] expose torch.library.register_torch_dispatch ( #130261 )
...
This is the API for defining the interaction between a torch_dispatch
class and a custom op. Taking API bikeshedding.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261
Approved by: https://github.com/albanD
ghstack dependencies: #130064
2024-07-12 14:13:01 +00:00
Shangdi Yu
fb9bc6d74a
[custom op] add doc for CustomOpDef.set_kernel_enabled ( #130406 )
...
<img width="1067" alt="Screenshot 2024-07-09 at 6 14 55 PM" src="https://github.com/pytorch/pytorch/assets/22356083/941751f8-8e12-43cb-8477-c739476e0096 ">
<img width="965" alt="Screenshot 2024-07-09 at 6 14 59 PM" src="https://github.com/pytorch/pytorch/assets/22356083/aa9be099-f26c-45a3-8a14-742a2bb7c28b ">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130406
Approved by: https://github.com/zou3519
2024-07-11 15:47:35 +00:00
Shangdi Yu
a4576dad34
[reland][custom ops] infer schema ( #130079 )
...
Fixes #129617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-11 03:39:07 +00:00
PyTorch MergeBot
86bca69c5f
Revert "[custom_ops] expose torch.library.register_torch_dispatch ( #130261 )"
...
This reverts commit bb9a73f767 .
Reverted https://github.com/pytorch/pytorch/pull/130261 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130261#issuecomment-2221569707 ))
2024-07-10 21:43:28 +00:00
PyTorch MergeBot
e14a0f45ed
Revert "[reland][custom ops] infer schema ( #130079 )"
...
This reverts commit bef085bdfa .
Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/izaitsevfb due to depends on #130064 which needs to be reverted ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2221561483 ))
2024-07-10 21:40:16 +00:00
Shangdi Yu
bef085bdfa
[reland][custom ops] infer schema ( #130079 )
...
Fixes #129617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-10 16:18:36 +00:00
rzou
bb9a73f767
[custom_ops] expose torch.library.register_torch_dispatch ( #130261 )
...
This is the API for defining the interaction between a torch_dispatch
class and a custom op. Taking API bikeshedding.
Test Plan:
- new tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130261
Approved by: https://github.com/albanD
ghstack dependencies: #130064
2024-07-09 21:11:27 +00:00
Yuanhao Ji
312652c325
[RFC] Add support for device extension autoloading ( #127074 )
...
Fixes #122468
- Load device extensions at the end of `torch/__init__.py`
- Enabled by default, or you can disable it with `TORCH_DEVICE_BACKEND_AUTOLOAD=0`
run test:
```python
python test/run_test.py -i test_autoload_enable
python test/run_test.py -i test_autoload_disable
```
doc:
https://docs-preview.pytorch.org/pytorch/pytorch/127074/miscellaneous_environment_variables.html
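For out-of-tree backends, the hookup is a Python entry point in the `torch.backends` group; a hypothetical `torch_foo` package might declare (all names illustrative):

```python
# setup.py of a hypothetical out-of-tree backend package.
from setuptools import setup

setup(
    name="torch_foo",
    version="1.0",
    entry_points={
        # torch/__init__.py discovers this group at import time and calls
        # the referenced function, unless TORCH_DEVICE_BACKEND_AUTOLOAD=0.
        "torch.backends": ["torch_foo = torch_foo:_autoload"],
    },
)
```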
co-author: @jgong5 @bsochack @bkowalskiINTEL @jczaja @FFFrog @hipudding
Co-authored-by: albanD <desmaison.alban@gmail.com>
Co-authored-by: Jiong Gong <jiong.gong@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127074
Approved by: https://github.com/albanD , https://github.com/jgong5
2024-07-09 06:14:13 +00:00
PyTorch MergeBot
44a773c121
Revert "[custom ops] infer schema ( #130079 )"
...
This reverts commit 3fe324ffb6 .
Reverted https://github.com/pytorch/pytorch/pull/130079 on behalf of https://github.com/huydhn due to The test_public_bindings failure looks legit 3fe324ffb6 ([comment](https://github.com/pytorch/pytorch/pull/130079#issuecomment-2215420957 ))
2024-07-08 22:02:29 +00:00
Shangdi Yu
3fe324ffb6
[custom ops] infer schema ( #130079 )
...
Fixes #129617
Pull Request resolved: https://github.com/pytorch/pytorch/pull/130079
Approved by: https://github.com/zou3519
2024-07-08 20:46:23 +00:00
Kurt Mohler
e590168865
Enable sharing meta tensors between processes ( #129520 )
...
Fixes #129436
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129520
Approved by: https://github.com/ezyang
2024-07-04 20:29:48 +00:00
Li-Huai (Allan) Lin
42f3d7e948
[MPS] Add mps profiler env vars to docs ( #129552 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129552
Approved by: https://github.com/malfet
ghstack dependencies: #129451
2024-07-04 06:44:48 +00:00
Zhengxu Chen
042d764872
[export] Update example inputs format for DB. ( #129982 )
...
Summary: To give users simpler example code, we are getting rid of ExportArgs in favor of example_args and example_kwargs.
Test Plan: CI
Differential Revision: D59288920
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129982
Approved by: https://github.com/angelayi
2024-07-03 17:53:15 +00:00
Edward Z. Yang
29c68df600
Stop immediately specializing common constants 0/1 for plain int ( #128327 )
...
Fixes https://github.com/pytorch/pytorch/issues/128319
Signed-off-by: Edward Z. Yang <ezyang@meta.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128327
Approved by: https://github.com/lezcano
ghstack dependencies: #129983
2024-07-03 16:41:51 +00:00
Howard Huang
4eb449f7dc
[pipelining] add small logging section to docs ( #129368 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129368
Approved by: https://github.com/wconstab
2024-07-02 18:19:28 +00:00
Haoci Zhang
1ad683033b
Implemented flexible PP schedule ( #129597 )
...
Enables some cases where num_microbatches % pp_size != 0 to work. With the flex_pp schedule, we have
num_rounds = max(1, n_microbatches // pp_group_size), and it works as long as n_microbatches % num_rounds == 0. A few supported examples:
pp_group_size = 4, n_microbatches = 10: num_rounds = 2 and 10 % 2 == 0.
pp_group_size = 4, n_microbatches = 3: num_rounds = 1 and 3 % 1 == 0.
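The round computation above can be sketched as (illustrative helper, not the actual implementation):

```python
def num_rounds(n_microbatches: int, pp_group_size: int) -> int:
    # At least one round; otherwise microbatches are split evenly
    # across rounds of pp_group_size each.
    rounds = max(1, n_microbatches // pp_group_size)
    assert n_microbatches % rounds == 0, (
        "flex_pp requires n_microbatches to divide evenly into rounds"
    )
    return rounds

r1 = num_rounds(10, 4)  # example (1) from the text
r2 = num_rounds(3, 4)   # example (2): fewer microbatches than ranks
```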
Moved over from PiPPy (https://github.com/pytorch/PiPPy/pull/1129 )
Tested using the config in (1), schedule looks like the following graph:
```
=========== ALL_RANK_ACTIONS ===========
Rank 0 Rank 1 Rank 2 Rank 3
Step 00: F0_s0 None None None
Step 01: F1_s0 F0_s1 None None
Step 02: F2_s0 F1_s1 F0_s2 None
Step 03: F3_s0 F2_s1 F1_s2 F0_s3
Step 04: F4_s0 F3_s1 F2_s2 F1_s3
Step 05: F0_s4 F4_s1 F3_s2 F2_s3
Step 06: F1_s4 F0_s5 F4_s2 F3_s3
Step 07: F2_s4 F1_s5 F0_s6 F4_s3
Step 08: F3_s4 F2_s5 F1_s6 F0_s7
Step 09: F4_s4 F3_s5 None B0_s7
Step 10: F5_s0 None F2_s6 F1_s7
Step 11: None None B0_s6 B1_s7
Step 12: None F4_s5 F3_s6 F2_s7
Step 13: None B0_s5 B1_s6 B2_s7
Step 14: F6_s0 F5_s1 F4_s6 F3_s7
Step 15: B0_s4 B1_s5 B2_s6 B3_s7
Step 16: F7_s0 F6_s1 F5_s2 F4_s7
Step 17: B1_s4 B2_s5 B3_s6 B4_s7
Step 18: F8_s0 F7_s1 F6_s2 F5_s3
Step 19: B2_s4 B3_s5 B4_s6 B0_s3
Step 20: F9_s0 F8_s1 F7_s2 F6_s3
Step 21: B3_s4 B4_s5 B0_s2 B1_s3
Step 22: F5_s4 F9_s1 F8_s2 F7_s3
Step 23: B4_s4 B0_s1 B1_s2 B2_s3
Step 24: F6_s4 F5_s5 F9_s2 F8_s3
Step 25: B0_s0 B1_s1 B2_s2 B3_s3
Step 26: F7_s4 F6_s5 F5_s6 F9_s3
Step 27: B1_s0 B2_s1 B3_s2 B4_s3
Step 28: F8_s4 F7_s5 F6_s6 F5_s7
Step 29: B2_s0 B3_s1 B4_s2 B5_s7
Step 30: F9_s4 F8_s5 F7_s6 F6_s7
Step 31: B3_s0 B4_s1 B5_s6 B6_s7
Step 32: None F9_s5 F8_s6 F7_s7
Step 33: B4_s0 B5_s5 B6_s6 B7_s7
Step 34: None None F9_s6 F8_s7
Step 35: B5_s4 B6_s5 B7_s6 B8_s7
Step 36: None None None F9_s7
Step 37: B6_s4 B7_s5 B8_s6 B9_s7
Step 38: None None None None
Step 39: B7_s4 B8_s5 B9_s6 B5_s3
Step 40: None None None None
Step 41: B8_s4 B9_s5 B5_s2 B6_s3
Step 42: None None None None
Step 43: B9_s4 B5_s1 B6_s2 B7_s3
Step 44: None None None None
Step 45: B5_s0 B6_s1 B7_s2 B8_s3
Step 46: None None None None
Step 47: B6_s0 B7_s1 B8_s2 B9_s3
Step 48: None None None
Step 49: B7_s0 B8_s1 B9_s2
Step 50: None None
Step 51: B8_s0 B9_s1
Step 52: None
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129597
Approved by: https://github.com/H-Huang
2024-07-02 07:54:38 +00:00
eqy
f845a7a91a
[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 ( #125343 )
...
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.
What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...
Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-30 19:22:16 +00:00
PyTorch MergeBot
3d96217891
Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )"
...
This reverts commit 9e1f3ecaa7 .
Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it is still failing with the same error ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2197801405 ))
2024-06-29 00:47:15 +00:00
PyTorch MergeBot
999eec8dea
Revert "[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 ( #125343 )"
...
This reverts commit b7e7a4cb01 .
Reverted https://github.com/pytorch/pytorch/pull/125343 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it seems to break some test_transformer running on internal A100 and V100 ([comment](https://github.com/pytorch/pytorch/pull/125343#issuecomment-2196202003 ))
2024-06-28 06:03:54 +00:00
Xuehai Pan
9e1f3ecaa7
[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )
...
Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
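As a before/after sketch of the refactor (the path is illustrative):

```python
import os.path
from pathlib import Path

file = "/repo/torch/utils/data/dataset.py"

# Old style: climb three directories with repeated os.pardir.
old = os.path.abspath(os.path.join(file, os.pardir, os.pardir, os.pardir))

# New style: resolve the path first, then take parents[2]
# (equivalent to .parent.parent.parent, per steps 3 and 4).
new = str(Path(file).absolute().parents[2])
```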
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby , https://github.com/malfet
2024-06-28 00:35:15 +00:00
Li-Huai (Allan) Lin
84ad5452f6
[MPS] Fused SGD optimizer ( #129350 )
...
```
[-------------------------------------- Fused SGD --------------------------------------]
| Fused: True | Fused: False
1 threads: ------------------------------------------------------------------------------
numel: 1024, num_tensors: 100, momentum: True | 2 | 15
numel: 1024, num_tensors: 100, momentum: False | 2 | 5
numel: 65536, num_tensors: 100, momentum: True | 3 | 16
numel: 65536, num_tensors: 100, momentum: False | 2 | 5
numel: 1048576, num_tensors: 100, momentum: True | 11 | 16
numel: 1048576, num_tensors: 100, momentum: False | 8 | 6
numel: 1024, num_tensors: 500, momentum: True | 29 | 70
numel: 1024, num_tensors: 500, momentum: False | 20 | 24
numel: 65536, num_tensors: 500, momentum: True | 33 | 76
numel: 65536, num_tensors: 500, momentum: False | 22 | 26
numel: 1048576, num_tensors: 500, momentum: True | 70 | 80
numel: 1048576, num_tensors: 500, momentum: False | 43 | 40
numel: 1024, num_tensors: 1000, momentum: True | 108 | 139
numel: 1024, num_tensors: 1000, momentum: False | 72 | 48
numel: 65536, num_tensors: 1000, momentum: True | 116 | 150
numel: 65536, num_tensors: 1000, momentum: False | 77 | 52
numel: 1048576, num_tensors: 1000, momentum: True | 190 | 170
numel: 1048576, num_tensors: 1000, momentum: False | 120 | 50
```
```python
def profile_fused_sgd():
from torch.optim.sgd import sgd
import torch.utils.benchmark as benchmark
import itertools
def profile(fn, params, grads, momentum_buffer_list, fused):
fn(
params,
grads,
momentum_buffer_list,
momentum=True if len(momentum_buffer_list) > 0 else False,
dampening=0.0,
nesterov=False,
foreach=False,
fused=fused,
lr=1e-3,
weight_decay=.0,
maximize=False,
grad_scale=None,
found_inf=None,
)
torch.mps.synchronize()
device = "mps"
results = []
for num_tensors, numel, momentum in itertools.product([100, 500, 1000], [1024, 65536, 1048576], [True, False]):
sublabel = f"numel: {numel}, num_tensors: {num_tensors}, momentum: {momentum}"
print(sublabel)
params, grads = [[torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] for _ in range(2)]
momentum_buffer_list = [torch.arange(numel, dtype=torch.float32, device=device) + (numel * i) for i in range(num_tensors)] if momentum else []
fn = sgd
for fused in [True, False]:
t = benchmark.Timer(
stmt='profile(fn, params, grads, momentum_buffer_list, fused)',
label='Fused SGD',
sub_label=sublabel,
globals=locals(),
description= f"Fused: {fused}",
).blocked_autorange(min_run_time=5)
results.append(t)
compare = benchmark.Compare(results)
compare.trim_significant_figures()
compare.colorize(rowwise=True)
compare.print()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129350
Approved by: https://github.com/janeyx99
ghstack dependencies: #129006 , #129008 , #129007 , #129105
2024-06-27 04:37:14 +00:00
PyTorch MergeBot
895316119d
Revert "[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )"
...
This reverts commit 0314c4c101 .
Reverted https://github.com/pytorch/pytorch/pull/129374 on behalf of https://github.com/huydhn due to Sorry for reverting your change but it causes lots of internal build failures where they fail to find hipify module ([comment](https://github.com/pytorch/pytorch/pull/129374#issuecomment-2192437052 ))
2024-06-26 19:03:57 +00:00
Shangdi Yu
cca85c96cd
[export] minor typo fix ( #129543 )
...
Fixes a typo in torch.export doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129543
Approved by: https://github.com/angelayi
2024-06-26 18:35:31 +00:00
Eddie Yan
b7e7a4cb01
[cuDNN][SDPA] Remove TORCH_CUDNN_SDPA_ENABLED=1, enable cuDNN SDPA by default on H100 and 2nd on other archs >= sm80 ( #125343 )
...
Looks like one of the first failures seen is `test_causal_variants_compile_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` when `test_causal_variants_causal_variant_CausalVariant_LOWER_RIGHT_shape0_cuda` passes.
What seems interesting here is that the `torch.compile` version fails while the eager version passes. Not sure what the difference would be here...
Nevertheless, is there a recommended mechanism to skip cuDNN SDPA as a backend for this test? CC @drisspg
Pull Request resolved: https://github.com/pytorch/pytorch/pull/125343
Approved by: https://github.com/Skylion007
2024-06-26 00:49:18 +00:00
Zhengxu Chen
e58ef5b65f
[export] Rewrite exportdb formatting. ( #129260 )
...
Summary: It'll be easier to generate examples if the code doesn't depend on the exportdb library.
Test Plan: CI
Differential Revision: D58886554
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129260
Approved by: https://github.com/tugsbayasgalan
2024-06-25 21:04:53 +00:00
Li-Huai (Allan) Lin
71ebe5121a
[MPS] Fast math env var ( #129007 )
...
Allow users to decide whether they want to have fast math enabled via env var
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129007
Approved by: https://github.com/malfet
ghstack dependencies: #129006 , #129008
2024-06-25 13:52:07 +00:00
Xuehai Pan
0314c4c101
[BE][Easy] use pathlib.Path instead of dirname / ".." / pardir ( #129374 )
...
Changes by apply order:
1. Replace all `".."` and `os.pardir` usage with `os.path.dirname(...)`.
2. Replace nested `os.path.dirname(os.path.dirname(...))` call with `str(Path(...).parent.parent)`.
3. Reorder `.absolute()` ~/ `.resolve()`~ and `.parent`: always resolve the path first.
`.parent{...}.absolute()` -> `.absolute().parent{...}`
4. Replace chained `.parent x N` with `.parents[${N - 1}]`: the code is easier to read (see 5.)
`.parent.parent.parent.parent` -> `.parents[3]`
5. ~Replace `.parents[${N - 1}]` with `.parents[${N} - 1]`: the code is easier to read and does not introduce any runtime overhead.~
~`.parents[3]` -> `.parents[4 - 1]`~
6. ~Replace `.parents[2 - 1]` with `.parent.parent`: because the code is shorter and easier to read.~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129374
Approved by: https://github.com/justinchuby , https://github.com/malfet
2024-06-25 08:28:38 +00:00
Will Constable
2f8b301c32
Clean up distributed/CONTRIBUTING.md ( #128450 )
...
Click [here](cf6c88af48/torch/distributed/CONTRIBUTING.md ) to see the rendered version of the file in this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/128450
Approved by: https://github.com/wanchaol
2024-06-22 02:41:22 +00:00
rzou
311fadb1fb
[docs] Redirect custom ops landing page to the correct place ( #129177 )
...
I'm moving it to pytorch/tutorials
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129177
Approved by: https://github.com/albanD
2024-06-21 13:31:32 +00:00
cyy
5c676bb8b3
Remove Caffe2 handling from onnx_unpack_quantized_weights ( #129021 )
...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/129021
Approved by: https://github.com/justinchuby , https://github.com/albanD
2024-06-21 06:16:44 +00:00
Jing Xu
5fba5d83f0
add xpu for amp ( #127276 )
...
As support for Intel GPU has been upstreamed, this PR adds the XPU-related content to the AMP doc.
Co-authored-by: Yu, Guangye <guangye.yu@intel.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/127276
Approved by: https://github.com/dvrogozh , https://github.com/albanD , https://github.com/malfet
2024-06-20 21:49:35 +00:00