Commit graph

419 commits

Author SHA1 Message Date
Catherine Lee
49e10c1598 [ci] test_ops in parallel, ci tests log to file (#85528)
Part one of splitting up https://github.com/pytorch/pytorch/pull/84961 into (probably two) parts.

Contains:
* logging to file
* running test_ops in parallel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/85528
Approved by: https://github.com/huydhn
2022-09-23 20:45:20 +00:00
Nikita Shulga
46a6a50f4e Skip MPS test from generic M1 testsuite (#85500)
As there is a separate "Run MPS" shard right now.

See if this reduces the number of crashes

Pull Request resolved: https://github.com/pytorch/pytorch/pull/85500
Approved by: https://github.com/clee2000, https://github.com/kit1980, https://github.com/huydhn
2022-09-22 22:13:47 +00:00
PyTorch MergeBot
3dce26635f Revert "test in parallel at file granularity (#84961)"
This reverts commit 8107666c6a.

Reverted https://github.com/pytorch/pytorch/pull/84961 on behalf of https://github.com/clee2000 due to makes test_forward_ad_nn_functional_max_unpool2d_cuda_float32 flakily unexpectedly pass
2022-09-21 20:21:25 +00:00
Catherine Lee
8107666c6a test in parallel at file granularity (#84961)
Run tests in parallel at test-file granularity.

Runs 3 files in parallel using a multiprocessing pool; output goes to a file, which is then printed when the test file finishes. Some tests cannot be run in parallel (usually due to insufficient memory), so we run those afterwards. Sharding is changed to attempt to mask large files with other large files by running them on the same shard.

test_ops* gets a custom handler because it is simply too big (2 hours on Windows) and linalg_cholesky fails (I would really like a solution to this if possible, but until then we use the custom handler).

Reduces CUDA test time by a lot and total Windows test time by ~1 hour.
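A minimal sketch of the approach described above: run test files in a pool of 3 workers, log each file's output, and print the log once the file finishes. The file names are hypothetical, a trivial command stands in for actually invoking a test file, and a thread pool (`multiprocessing.dummy`) stands in for the process pool so the sketch stays self-contained.

```python
import os
import subprocess
import sys
import tempfile
from multiprocessing.dummy import Pool  # thread-backed stand-in for a process pool


def run_test_file(test_file):
    """Run one test "file", capturing its output to a per-file log."""
    log_path = os.path.join(tempfile.gettempdir(), test_file.replace("/", "_") + ".log")
    with open(log_path, "w") as log:
        # Stand-in for `python test/<file>.py` so the sketch runs anywhere.
        ret = subprocess.run(
            [sys.executable, "-c", f"print('ran {test_file}')"],
            stdout=log,
            stderr=subprocess.STDOUT,
        )
    return test_file, log_path, ret.returncode


def run_in_parallel(test_files, num_procs=3):
    """Run test files with num_procs workers; collect (output, returncode) per file."""
    results = {}
    with Pool(num_procs) as pool:
        for name, log_path, code in pool.imap_unordered(run_test_file, test_files):
            with open(log_path) as log:
                # The real runner prints the captured log here, serialized per file.
                results[name] = (log.read(), code)
    return results
```

Logging to a file and printing on completion keeps interleaved output from the parallel workers readable, which is the same reason the PR routes test output through files.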

Ref. https://github.com/pytorch/pytorch/issues/82894
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84961
Approved by: https://github.com/huydhn
2022-09-21 16:58:11 +00:00
Arindam Roy
a185dc2e63 [ROCm] re-enable tensorexpr and test_openmp (#81367)
The following tests are being re-enabled for ROCm:
- test_openmp.py
- TestTensorExprPyBind tests in test_tensorexpr_pybind.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81367
Approved by: https://github.com/jeffdaily, https://github.com/malfet
2022-09-14 00:41:16 +00:00
Xiang Gao
08c4f8c7a7 ProcessGroupUCC tests (#83285)
- [x] Direct dependency on UCX is completely removed, UCC active set API always enabled
- [x] Remove `TORCH_UCC_PROFILING_ENABLE`, always enable profiling
- [x] Fixes profiling of `recv` and `all_gather`
- [x] Use the NCCL TL of UCC on CUDA, as the UCP TL is not well supported on CUDA

Most tests are passing, but there are a few skipped tests:
- `scatter` and `gather` are not supported by the UCP TL of UCC on CPU tensors
- A few flaky tests in PyTorch's CI environment
- Profiler-related failures, some of them will be fixed by @Fuzzkatt in https://github.com/pytorch/pytorch/pull/84368

After this PR is merged, I will continue to work on these skipped failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83285
Approved by: https://github.com/vtlam, https://github.com/malfet, https://github.com/kwen2501
2022-09-10 10:56:05 +00:00
Huy Do
90d6112a94 Test distributed backends in parallel (#84034)
This allows multiple backends (nccl, gloo) to be tested in parallel, speeding up the process. The improvement is mainly in the 1st distributed CUDA shard, where the long-pole `distributed/test_distributed_spawn` test is executed:

* [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596825?check_suite_focus=true#logs) takes 1h24m. This is better than the current average expectation of 2h12m

On the other hand, there is no improvement for the following two jobs:

* [linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)](https://github.com/pytorch/pytorch/runs/8007417353?check_suite_focus=true#logs) takes 1h47m
* [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596870?check_suite_focus=true#logs) takes 1h40m

This is still a gain, though, because it allows us to add more shards for distributed tests if needed.

Issue https://github.com/pytorch/pytorch/issues/83694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84034
Approved by: https://github.com/wanchaol
2022-09-01 03:48:54 +00:00
PyTorch MergeBot
772721a4b7 Revert "Test distributed backends in parallel (#84034)"
This reverts commit 3ae5be74ac.

Reverted https://github.com/pytorch/pytorch/pull/84034 on behalf of https://github.com/huydhn due to This somehow revives the flaky test https://github.com/pytorch/pytorch/issues/76428
2022-08-30 21:01:25 +00:00
Huy Do
3ae5be74ac Test distributed backends in parallel (#84034)
This allows multiple backends (nccl, gloo) to be tested in parallel, speeding up the process. The improvement is mainly in the 1st distributed CUDA shard, where the long-pole `distributed/test_distributed_spawn` test is executed:

* [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 1, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596825?check_suite_focus=true#logs) takes 1h24m. This is better than the current average expectation of 2h12m

On the other hand, there is no improvement for the following two jobs:

* [linux-focal-py3.7-gcc7 / test (distributed, 1, 1, linux.2xlarge)](https://github.com/pytorch/pytorch/runs/8007417353?check_suite_focus=true#logs) takes 1h47m
* [linux-bionic-cuda11.6-py3.10-gcc7 / test (distributed, 2, 2, linux.8xlarge.nvidia.gpu)](https://github.com/pytorch/pytorch/runs/8007596870?check_suite_focus=true#logs) takes 1h40m

This is still a gain, though, because it allows us to add more shards for distributed tests if needed.

Issue https://github.com/pytorch/pytorch/issues/83694
Pull Request resolved: https://github.com/pytorch/pytorch/pull/84034
Approved by: https://github.com/wanchaol
2022-08-30 19:06:49 +00:00
joncrall
b136f3f310 More doctest refinements. (#83317)
Follow up to #82797

Now that the doctests themselves are in a better state, we should be able to enable xdoctest on the CI so they stay that way.

@ezyang @vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/83317
Approved by: https://github.com/ezyang
2022-08-22 20:07:26 +00:00
joncrall
4618371da5 Integrate xdoctest - Rebased (#82797)
This is a new version of #15648 based on the latest master branch.

Unlike the previous PR where I fixed a lot of the doctests in addition to integrating xdoctest, I'm going to reduce the scope here. I'm simply going to integrate xdoctest, and then I'm going to mark all of the failing tests as "SKIP". This will let xdoctest run on the dashboards, provide some value, and still let the dashboards pass. I'll leave fixing the doctests themselves to another PR.

In my initial commit, I do the bare minimum to get something running with failing dashboards. The few tests that I marked as skip are causing segfaults. Running xdoctest results in 293 failed, 201 passed tests. The next commits will disable those tests. (Unfortunately I don't have a tool that will insert the `# xdoctest: +SKIP` directive over every failing test, so I'm going to do this mostly manually.)
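For illustration, a skipped doctest looks like the sketch below: the directive comment on its own `>>>` line tells xdoctest not to execute the example that follows. The function and doctest here are hypothetical, not from the PR.

```python
def launch_training_example():
    """
    A doctest that xdoctest collects but does not execute, because of the
    SKIP directive (e.g. the example segfaults or needs GPUs unavailable in CI):

    >>> # xdoctest: +SKIP
    >>> run_distributed_training(num_gpus=8)  # hypothetical call
    """
    return None
```

The directive lives inside the docstring itself, so marking a failing test requires editing each docstring, which is why the PR describes doing this mostly manually.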

Fixes https://github.com/pytorch/pytorch/issues/71105

@ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82797
Approved by: https://github.com/ezyang
2022-08-12 02:08:01 +00:00
Mateusz Sypniewski
916def84d4 CUDA trace Python hooks (#82824)
### Description
This adds Python hooks into PyTorch that allow the user to register their own callbacks for events such as tensor allocation, stream allocation, event record / wait etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82824
Approved by: https://github.com/lw, https://github.com/ezyang, https://github.com/malfet
2022-08-11 10:21:40 +00:00
pbialecki
b4f7e22640 Enable periodic builds for CUDA 11.7 (#81688)
CC @atalman
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81688
Approved by: https://github.com/atalman
2022-08-10 00:03:51 +00:00
Nikita Shulga
b9cdd6d0ac Do not assume what is in os.environ (#82495)
`os.environ['FOO']` will raise `KeyError` if `FOO` is not found, while `os.environ.get('FOO')` would simply return `None`.
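The difference in a short, runnable form (the variable name is hypothetical, used only for illustration):

```python
import os

VAR = "PYTORCH_EXAMPLE_FLAG"  # hypothetical variable name
os.environ.pop(VAR, None)     # make sure it is unset

# Subscripting raises KeyError when the variable is missing...
try:
    value = os.environ[VAR]
except KeyError:
    value = "fallback"

# ...while .get() simply returns None, or a supplied default:
assert os.environ.get(VAR) is None
assert os.environ.get(VAR, "fallback") == "fallback"
```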

Fixes https://github.com/pytorch/pytorch/issues/82492

Pull Request resolved: https://github.com/pytorch/pytorch/pull/82495
Approved by: https://github.com/huydhn, https://github.com/kit1980
2022-07-29 22:57:18 +00:00
Catherine Lee
86f038dd56 download test times during build to avoid race conditions (#81915)
After https://github.com/pytorch/pytorch/pull/81116, we started pulling test times straight from the source instead of first downloading them in the build job and having the test job take the build job's version. This can cause an issue where different shards pull different versions of the file, leading to incorrect sharding (e.g. two shards running the same test file by accident). This generally happens if the test jobs run while the test times file is being updated (unlikely, but not impossible) or if someone reruns a test job the next day.

In this PR, I return to the old method of downloading the test times file during the build job and having the test jobs pull from the build job's uploaded artifacts. If there is no test times file in the build job's artifacts, we fall back to the default sharding plan.

Notes:
* The script moved to a new file to avoid needing to import torch, which would require torch to be built and can cause issues with ASan
* I got errors with ASan (`ASan runtime does not come first in initial library list; you should either link runtime to your application or manually preload it with LD_PRELOAD.`), so I put the script at the beginning of the build

### Test Plan
Verified that the number of tests run in the pull and trunk workflows is similar to workflows run on master. Checked logs to see if artifacts were being used for sharding. Spot-checked a few test configs to verify that their lists of selected tests didn't overlap.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81915
Approved by: https://github.com/huydhn
2022-07-28 16:35:01 +00:00
Richard Zou
5f4e8c0a4d Add ability to run functorch tests via run_test.py (#82012)
This PR:
- adds the ability to run functorch tests via run_test.py
- changes the functorch shards in PyTorch CI to invoke functorch tests
via run_test.py

The main motivation for this is so that functorch tests hook into the
standard PyTorch test infrastructure.

Questions for reviewers:
- the functorch tests are located outside of the pytorch/test folder
(they're in the pytorch/functorch/test folder). Is this OK? (run_test.py
works locally for me).

Test Plan:

- checked that `python run_test.py --functorch` ran functorch tests
locally
- Local mock test: added `{"test_compilation_for_dynamic_shape
(__main__.TestCompileCache)":
["https://github.com/pytorch/pytorch/issues/82016", ["linux"]]}` to .pytorch-disabled-tests.json, ran functorch tests, verified that the test was skipped.
- Wait for CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/82012
Approved by: https://github.com/janeyx99
2022-07-25 14:23:18 +00:00
Michael Suo
9f58d5d7ce [test stats] use published test stats for sharding (#81116)
Use the nightly-published test stats to perform sharding, instead of
calculating it in every build job.
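The time-based sharding these commits rely on can be sketched as a greedy longest-processing-time assignment: sort files by recorded runtime, then repeatedly give the next file to the shard with the least accumulated time. This is a sketch under that assumption, not the actual `run_test.py` implementation; the file names and times are made up.

```python
def shard_by_time(test_times, num_shards):
    """Greedily assign test files (longest first) to the least-loaded shard."""
    shards = [{"files": [], "total": 0.0} for _ in range(num_shards)]
    for name, secs in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        target = min(shards, key=lambda s: s["total"])  # least accumulated runtime
        target["files"].append(name)
        target["total"] += secs
    return shards


# Hypothetical published test times, in seconds:
times = {"test_ops": 100.0, "test_nn": 60.0, "test_autograd": 50.0, "test_utils": 10.0}
shards = shard_by_time(times, num_shards=2)
```

With consistent input data every shard computes the same plan, which is why the surrounding commits care so much about all shards reading the same version of the test-times file: if two shards see different times, their greedy plans diverge and files can be run twice or not at all.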
Pull Request resolved: https://github.com/pytorch/pytorch/pull/81116
Approved by: https://github.com/janeyx99
2022-07-12 04:50:19 +00:00
Brian Hirsh
282de5539d add open device registration test with cpp extensions (#80477)
Adding a test for open device registration using cpp extensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80477
Approved by: https://github.com/albanD, https://github.com/malfet
2022-07-12 01:46:16 +00:00
Charlie Yan
ffae7308c9 Enable test: distributed/algorithms/quantization/test_quantization (#80097)
fixes  https://github.com/pytorch/pytorch/issues/69017
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80097
Approved by: https://github.com/wanchaol
2022-07-01 01:32:33 +00:00
Charlie Yan
14eadf937b Enable test: test/distributed/algorithms/ddp_comm_hooks/test_ddp_hooks.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/80077

Approved by: https://github.com/wanchaol
2022-06-23 00:11:53 +00:00
Alex Hedges
cb2b7b1e57 Fix code that triggers BytesWarning (#79868)
Fixes #74812.

I have fixed the multiple instances in the repository that trigger
`BytesWarning`, and I have enabled the `-bb` option when tests are run
to prevent regressions.
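The effect of `-bb` can be demonstrated in a small self-contained check: under `python -bb`, an implicit bytes/str comparison raises `BytesWarning` instead of passing silently, which is how the option catches the regressions this commit fixes. The helper name is mine, not from the PR.

```python
import subprocess
import sys


def compare_bytes_str_under_bb():
    """Run a bytes-vs-str comparison in a subprocess with -bb; report the result."""
    proc = subprocess.run(
        # Under -bb, `b'x' == 'x'` raises BytesWarning rather than returning False.
        [sys.executable, "-bb", "-c", "b'x' == 'x'"],
        capture_output=True,
        text=True,
    )
    return proc.returncode, proc.stderr
```

Without `-b`/`-bb` the same comparison quietly evaluates to `False`, which is exactly the class of silent bug the flag is meant to surface in CI.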
Pull Request resolved: https://github.com/pytorch/pytorch/pull/79868
Approved by: https://github.com/janeyx99
2022-06-21 01:12:21 +00:00
PyTorch MergeBot
e10cbe3880 Revert "Fix BytesWarning in torch.load() (#74813)"
This reverts commit 6c2e8119dd.

Reverted https://github.com/pytorch/pytorch/pull/74813 on behalf of https://github.com/janeyx99 due to Broke slow tests in cuda 10.2 https://github.com/pytorch/pytorch/runs/6944238177?check_suite_focus=true
2022-06-18 03:53:54 +00:00
Alex Hedges
6c2e8119dd Fix BytesWarning in torch.load() (#74813)
Fixes #74812.

I have enabled the `-bb` option when tests are run to prevent regressions. I don't think it will make CI run more slowly, but I'm not entirely sure.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74813
Approved by: https://github.com/kit1980
2022-06-17 22:56:43 +00:00
Michael Suo
842da8a5de [ci] remove TD + test specification code from run_test.py
In the case of target determination, this is just removing comments that
refer to non-existent code.

In the case of the test specification code; this removes (what I believe
to be) an unused feature. If we're using this somehow let me know and I
can revise the PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79372

Approved by: https://github.com/janeyx99
2022-06-13 16:09:53 +00:00
Michael Suo
943c09a53e [ci] clean up dead code related to PR test selection
This is never used and not tested, so removing it for clarity.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79363

Approved by: https://github.com/janeyx99
2022-06-13 16:09:51 +00:00
Michael Suo
c978b609f7 [ci] remove IN_CI env var
The conventional env var to set is CI. Both CircleCI and GHA set it, so IN_CI is unnecessary.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/79229

Approved by: https://github.com/janeyx99
2022-06-11 17:16:30 +00:00
Jagadish Krishnamoorthy
2d354cdc2a [ROCm] Enable test_instantiator, test_type_hints (#78633)
Signed-off-by: Jagadish Krishnamoorthy <jagdish.krishna@gmail.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78633
Approved by: https://github.com/malfet, https://github.com/pruthvistony
2022-06-06 06:09:34 +00:00
Xiao Wang
ef0332e36d Allow relocatable device code linking in pytorch CUDA extensions (#78225)
Close https://github.com/pytorch/pytorch/issues/57543

Doc: check `Relocatable device code linking:` in https://docs-preview.pytorch.org/78225/cpp_extension.html#torch.utils.cpp_extension.CUDAExtension
Pull Request resolved: https://github.com/pytorch/pytorch/pull/78225
Approved by: https://github.com/ezyang, https://github.com/malfet
2022-06-02 21:35:56 +00:00
Kurt Mohler
1705be8ff7 Fix _free_weak_ref error (#78575)
Fixes #74016

Pull Request resolved: https://github.com/pytorch/pytorch/pull/78575
Approved by: https://github.com/ezyang
2022-06-01 00:07:48 +00:00
pritam
37eb31599c [reland] Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77987)
1. Enabled multigpu tests.
2. Fixed failing multigpu tests.
3. Fixed custom operator decorator to be first preference in operator dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77987
Approved by: https://github.com/fduwjj, https://github.com/wanchaol, https://github.com/janeyx99
2022-05-21 22:33:58 +00:00
PyTorch MergeBot
0f74b44f1a Revert "Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77825)"
This reverts commit 8d4c8df33a.

Reverted https://github.com/pytorch/pytorch/pull/77825 on behalf of https://github.com/janeyx99 due to as it will break multigpu test reporting
2022-05-20 17:59:03 +00:00
pritam
8d4c8df33a Add sharding tests to multigpu-test.sh and fix custom operator decorator (#77825)
1. Enabled multigpu tests.
2. Fixed failing multigpu tests.
3. Fixed custom operator decorator to be first preference in operator dispatch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77825
Approved by: https://github.com/wanchaol, https://github.com/fduwjj
2022-05-20 16:53:27 +00:00
PyTorch MergeBot
5e0f559d23 Revert "Add sharding tests to multigpu-test.sh (#77708)"
This reverts commit a7cf95a609.

Reverted https://github.com/pytorch/pytorch/pull/77708 on behalf of https://github.com/suo
2022-05-18 21:47:11 +00:00
pritam
a7cf95a609 Add sharding tests to multigpu-test.sh (#77708)
Summary: These tests were being skipped since they don't run on multigpu
jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/77708
Approved by: https://github.com/wanchaol
2022-05-18 17:37:55 +00:00
Wanchao Liang
25fa964d96 [shard] add clone/detach and set requires_grad for ShardedTensor
This PR adds clone/detach and sets requires_grad for ShardedTensor.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/77367

Approved by: https://github.com/pritamdamania87
2022-05-16 21:42:27 +00:00
pritam
9e52b50e34 Additional ops for ShardedTensor, ReplicatedTensor and PartialTensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76477

Adding the following ops:

1) softmax for ShardedTensor
2) getitem and unsqueeze for ReplicatedTensor
3) transpose and cat for PartialTensor

Differential Revision: [D35979510](https://our.internmc.facebook.com/intern/diff/D35979510/)

Approved by: https://github.com/fduwjj, https://github.com/wanchaol
2022-05-06 16:28:04 +00:00
Catherine Lee
56ea57de61 shard pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed 1->2

shard `pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed ...` from 1 shard to 2

Pros:
- It currently takes about 2.6 hours and is the 3rd-longest-running job on pull
- Theoretically minimal overhead

Cons:
- Requires changes to the run_test.py which might have correctness issues

Notes:
- Cannot shard further, as one of the test files is responsible for about half of the total run time

spreadsheet regarding sharding: https://docs.google.com/spreadsheets/d/1BdtVsjRr0Is9LXMNilR02FEdPXNq7zEWl8AmR3ArsLQ/edit#gid=1153012347

Test Plan:
<details><summary>expand to see test plan (its long)</summary>

tests from a commit ran on master (90 tests ran)
```
2022-05-03T12:45:34.7974184Z Selected tests:
2022-05-03T12:45:34.7974495Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-03T12:45:34.7974839Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-03T12:45:34.7975209Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-03T12:45:34.7975575Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-03T12:45:34.7976180Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-03T12:45:34.7976802Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-03T12:45:34.7977361Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-03T12:45:34.7978157Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-03T12:45:34.7978879Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-03T12:45:34.7979594Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-03T12:45:34.7980366Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-03T12:45:34.7981066Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-03T12:45:34.7981877Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-03T12:45:34.7982387Z  distributed/_shard/test_partial_tensor
2022-05-03T12:45:34.7982691Z  distributed/_shard/test_replicated_tensor
2022-05-03T12:45:34.7982994Z  distributed/_shard/test_sharder
2022-05-03T12:45:34.7983280Z  distributed/algorithms/test_join
2022-05-03T12:45:34.7983695Z  distributed/elastic/events/lib_test
2022-05-03T12:45:34.7983984Z  distributed/elastic/metrics/api_test
2022-05-03T12:45:34.7984308Z  distributed/elastic/multiprocessing/api_test
2022-05-03T12:45:34.7984624Z  distributed/elastic/timer/api_test
2022-05-03T12:45:34.7984924Z  distributed/elastic/timer/local_timer_example
2022-05-03T12:45:34.7985254Z  distributed/elastic/timer/local_timer_test
2022-05-03T12:45:34.7985575Z  distributed/elastic/utils/distributed_test
2022-05-03T12:45:34.7985889Z  distributed/elastic/utils/logging_test
2022-05-03T12:45:34.7986176Z  distributed/elastic/utils/util_test
2022-05-03T12:45:34.7986492Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-03T12:45:34.7986799Z  distributed/fsdp/test_fsdp_apply
2022-05-03T12:45:34.7987078Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-03T12:45:34.7987388Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-03T12:45:34.7987691Z  distributed/fsdp/test_fsdp_comm
2022-05-03T12:45:34.7987961Z  distributed/fsdp/test_fsdp_core
2022-05-03T12:45:34.7988251Z  distributed/fsdp/test_fsdp_exec_order
2022-05-03T12:45:34.7988570Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-03T12:45:34.7988865Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-03T12:45:34.7989176Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-03T12:45:34.7989478Z  distributed/fsdp/test_fsdp_input
2022-05-03T12:45:34.7989950Z  distributed/fsdp/test_fsdp_memory
2022-05-03T12:45:34.7990241Z  distributed/fsdp/test_fsdp_meta
2022-05-03T12:45:34.7990640Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-03T12:45:34.7990964Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-03T12:45:34.7991293Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-03T12:45:34.7991610Z  distributed/fsdp/test_fsdp_optim_state
2022-05-03T12:45:34.7991895Z  distributed/fsdp/test_fsdp_overlap
2022-05-03T12:45:34.7992195Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-03T12:45:34.7992500Z  distributed/fsdp/test_fsdp_state_dict
2022-05-03T12:45:34.7992818Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-03T12:45:34.7993117Z  distributed/fsdp/test_fsdp_traversal
2022-05-03T12:45:34.7993861Z  distributed/fsdp/test_fsdp_uneven
2022-05-03T12:45:34.7994181Z  distributed/fsdp/test_shard_utils
2022-05-03T12:45:34.7994447Z  distributed/fsdp/test_utils
2022-05-03T12:45:34.7994721Z  distributed/fsdp/test_wrap
2022-05-03T12:45:34.7995015Z  distributed/nn/jit/test_instantiator
2022-05-03T12:45:34.7995328Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-03T12:45:34.7995664Z  distributed/pipeline/sync/skip/test_api
2022-05-03T12:45:34.7995983Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-03T12:45:34.7996315Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-03T12:45:34.7996652Z  distributed/pipeline/sync/skip/test_leak
2022-05-03T12:45:34.7996977Z  distributed/pipeline/sync/skip/test_portal
2022-05-03T12:45:34.7997292Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-03T12:45:34.7997623Z  distributed/pipeline/sync/skip/test_tracker
2022-05-03T12:45:34.7997968Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-03T12:45:34.7998301Z  distributed/pipeline/sync/test_balance
2022-05-03T12:45:34.7998591Z  distributed/pipeline/sync/test_bugs
2022-05-03T12:45:34.7998927Z  distributed/pipeline/sync/test_checkpoint
2022-05-03T12:45:34.7999243Z  distributed/pipeline/sync/test_copy
2022-05-03T12:45:34.7999557Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-03T12:45:34.7999896Z  distributed/pipeline/sync/test_dependency
2022-05-03T12:45:34.8000215Z  distributed/pipeline/sync/test_inplace
2022-05-03T12:45:34.8000516Z  distributed/pipeline/sync/test_microbatch
2022-05-03T12:45:34.8000826Z  distributed/pipeline/sync/test_phony
2022-05-03T12:45:34.8001130Z  distributed/pipeline/sync/test_pipe
2022-05-03T12:45:34.8001424Z  distributed/pipeline/sync/test_pipeline
2022-05-03T12:45:34.8001733Z  distributed/pipeline/sync/test_stream
2022-05-03T12:45:34.8002055Z  distributed/pipeline/sync/test_transparency
2022-05-03T12:45:34.8002353Z  distributed/pipeline/sync/test_worker
2022-05-03T12:45:34.8002672Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-03T12:45:34.8002982Z  distributed/rpc/test_faulty_agent
2022-05-03T12:45:34.8003270Z  distributed/rpc/test_tensorpipe_agent
2022-05-03T12:45:34.8003568Z  distributed/test_c10d_common
2022-05-03T12:45:34.8003839Z  distributed/test_c10d_gloo
2022-05-03T12:45:34.8004088Z  distributed/test_c10d_nccl
2022-05-03T12:45:34.8004369Z  distributed/test_c10d_spawn_gloo
2022-05-03T12:45:34.8004656Z  distributed/test_c10d_spawn_nccl
2022-05-03T12:45:34.8004938Z  distributed/test_data_parallel
2022-05-03T12:45:34.8005212Z  distributed/test_distributed_spawn
2022-05-03T12:45:34.8005496Z  distributed/test_launcher
2022-05-03T12:45:34.8005767Z  distributed/test_nccl
2022-05-03T12:45:34.8006019Z  distributed/test_pg_wrapper
2022-05-03T12:45:34.8006285Z  distributed/test_store
```

tests ran on first shard for distributed on this PR (34 tests)
```
2022-05-02T21:26:00.1385256Z Selected tests:
2022-05-02T21:26:00.1385767Z  distributed/test_distributed_spawn
2022-05-02T21:26:00.1386403Z  distributed/elastic/multiprocessing/api_test
2022-05-02T21:26:00.1387051Z  distributed/fsdp/test_fsdp_memory
2022-05-02T21:26:00.1387607Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-02T21:26:00.1388179Z  distributed/fsdp/test_fsdp_apply
2022-05-02T21:26:00.1388600Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-02T21:26:00.1389181Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-02T21:26:00.1389545Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-02T21:26:00.1389878Z  distributed/fsdp/test_fsdp_uneven
2022-05-02T21:26:00.1390186Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-02T21:26:00.1390526Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-02T21:26:00.1390877Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-02T21:26:00.1391219Z  distributed/_shard/test_partial_tensor
2022-05-02T21:26:00.1391542Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-02T21:26:00.1391915Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-02T21:26:00.1392297Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-02T21:26:00.1392585Z  distributed/fsdp/test_utils
2022-05-02T21:26:00.1392883Z  distributed/nn/jit/test_instantiator
2022-05-02T21:26:00.1393167Z  distributed/test_nccl
2022-05-02T21:26:00.1393466Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-02T21:26:00.1393787Z  distributed/_shard/test_sharder
2022-05-02T21:26:00.1394085Z  distributed/elastic/timer/api_test
2022-05-02T21:26:00.1394383Z  distributed/pipeline/sync/skip/test_api
2022-05-02T21:26:00.1394738Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-02T21:26:00.1395090Z  distributed/pipeline/sync/skip/test_portal
2022-05-02T21:26:00.1395424Z  distributed/pipeline/sync/skip/test_tracker
2022-05-02T21:26:00.1395935Z  distributed/pipeline/sync/test_balance
2022-05-02T21:26:00.1396288Z  distributed/pipeline/sync/test_checkpoint
2022-05-02T21:26:00.1396635Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-02T21:26:00.1396953Z  distributed/pipeline/sync/test_inplace
2022-05-02T21:26:00.1397269Z  distributed/pipeline/sync/test_phony
2022-05-02T21:26:00.1397587Z  distributed/pipeline/sync/test_pipeline
2022-05-02T21:26:00.1397903Z  distributed/pipeline/sync/test_transparency
2022-05-02T21:26:00.1398221Z  distributed/rpc/test_faulty_agent
```

tests ran on second shard for distributed on this PR (56 tests)
```
2022-05-02T21:26:55.1342892Z Selected tests:
2022-05-02T21:26:55.1343201Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-02T21:26:55.1343526Z  distributed/fsdp/test_fsdp_core
2022-05-02T21:26:55.1343829Z  distributed/test_c10d_nccl
2022-05-02T21:26:55.1344089Z  distributed/test_c10d_gloo
2022-05-02T21:26:55.1344408Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-02T21:26:55.1344749Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-02T21:26:55.1345085Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-02T21:26:55.1345423Z  distributed/fsdp/test_fsdp_optim_state
2022-05-02T21:26:55.1345773Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-02T21:26:55.1346088Z  distributed/fsdp/test_fsdp_state_dict
2022-05-02T21:26:55.1346379Z  distributed/test_store
2022-05-02T21:26:55.1346661Z  distributed/test_c10d_spawn_gloo
2022-05-02T21:26:55.1346966Z  distributed/test_pg_wrapper
2022-05-02T21:26:55.1347252Z  distributed/test_c10d_spawn_nccl
2022-05-02T21:26:55.1347565Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-02T21:26:55.1347871Z  distributed/fsdp/test_wrap
2022-05-02T21:26:55.1348369Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-02T21:26:55.1348679Z  distributed/algorithms/test_join
2022-05-02T21:26:55.1349004Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-02T21:26:55.1349305Z  distributed/fsdp/test_fsdp_comm
2022-05-02T21:26:55.1349593Z  distributed/test_c10d_common
2022-05-02T21:26:55.1349885Z  distributed/fsdp/test_fsdp_meta
2022-05-02T21:26:55.1350171Z  distributed/fsdp/test_fsdp_exec_order
2022-05-02T21:26:55.1350486Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-02T21:26:55.1350798Z  distributed/fsdp/test_fsdp_overlap
2022-05-02T21:26:55.1351105Z  distributed/elastic/timer/local_timer_example
2022-05-02T21:26:55.1351423Z  distributed/fsdp/test_fsdp_input
2022-05-02T21:26:55.1351749Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-02T21:26:55.1352190Z  distributed/elastic/timer/local_timer_test
2022-05-02T21:26:55.1352520Z  distributed/elastic/utils/distributed_test
2022-05-02T21:26:55.1352841Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-02T21:26:55.1353150Z  distributed/test_data_parallel
2022-05-02T21:26:55.1353437Z  distributed/fsdp/test_fsdp_traversal
2022-05-02T21:26:55.1353792Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-02T21:26:55.1354174Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-02T21:26:55.1354534Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-02T21:26:55.1354858Z  distributed/test_launcher
2022-05-02T21:26:55.1355149Z  distributed/elastic/utils/util_test
2022-05-02T21:26:55.1355441Z  distributed/elastic/utils/logging_test
2022-05-02T21:26:55.1355755Z  distributed/elastic/metrics/api_test
2022-05-02T21:26:55.1356095Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-02T21:26:55.1356455Z  distributed/_shard/test_replicated_tensor
2022-05-02T21:26:55.1356754Z  distributed/elastic/events/lib_test
2022-05-02T21:26:55.1357065Z  distributed/fsdp/test_shard_utils
2022-05-02T21:26:55.1357387Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-02T21:26:55.1357702Z  distributed/pipeline/sync/skip/test_leak
2022-05-02T21:26:55.1358040Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-02T21:26:55.1358396Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-02T21:26:55.1358716Z  distributed/pipeline/sync/test_bugs
2022-05-02T21:26:55.1359027Z  distributed/pipeline/sync/test_copy
2022-05-02T21:26:55.1359350Z  distributed/pipeline/sync/test_dependency
2022-05-02T21:26:55.1359662Z  distributed/pipeline/sync/test_microbatch
2022-05-02T21:26:55.1359983Z  distributed/pipeline/sync/test_pipe
2022-05-02T21:26:55.1360299Z  distributed/pipeline/sync/test_stream
2022-05-02T21:26:55.1360593Z  distributed/pipeline/sync/test_worker
2022-05-02T21:26:55.1360912Z  distributed/rpc/test_tensorpipe_agent
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76564
Approved by: https://github.com/jeffdaily, https://github.com/janeyx99
2022-05-03 23:01:42 +00:00
Junjie Wang (PyTorch)
7c44d560ba [PT-D][Sharding] Enable ops needed in the transformer model training (#75374)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75374

From the FairSeq and MetaSeq codebases (which are essentially transformer models), we have found that many ops are not supported by ShardedTensor. We now implement a simple version of each so that we can at least run a transformer example:

Ops include: chunk, transpose, view, masked_fill, dropout, softmax, and type_as.

We isolate the common logic of registering simple ops into a function; for future registrations, we only need to implement at most three functions per new op.
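The registration pattern described above can be sketched without any PyTorch internals. Everything below (registry, hook names) is illustrative, not the actual ShardedTensor API:

```python
# Hypothetical sketch of a simple-op registry: each op name maps to a
# customized function plus optional pre/post hooks, so registering a new
# op needs at most three small functions.
_SIMPLE_OPS = {}

def register_simple_op(name, customized_func, pre_hook=None, post_hook=None):
    """Register an op by name with up to three small functions."""
    _SIMPLE_OPS[name] = (customized_func, pre_hook, post_hook)

def dispatch(name, *args, **kwargs):
    """Look up the registered op and run it, applying hooks if present."""
    func, pre, post = _SIMPLE_OPS[name]
    if pre is not None:
        args, kwargs = pre(args, kwargs)
    out = func(*args, **kwargs)
    return post(out) if post is not None else out

# Example: a toy "transpose" that just transposes a list-of-lists shard.
register_simple_op("transpose", lambda shard: [list(row) for row in zip(*shard)])
```

With this shape, adding an op like `type_as` or `softmax` is one `register_simple_op` call rather than a bespoke dispatch branch.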

ghstack-source-id: 155309147

Test Plan: CI

Reviewed By: pritamdamania87

Differential Revision: D35123021

fbshipit-source-id: 660e559fb8b4a910eb63e0586c63ab927873a2ce
(cherry picked from commit 83a87ebf627d863448dfe1019c7c5f7112cc14ab)
2022-05-03 17:20:28 +00:00
Junjie Wang (PyTorch)
c1037d0d4c [PT-D][Sharding] Move Partial Tensor to the _shard folder and add logic to remove padding (#76199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76199

Since PartialTensor is somewhat independent of ShardedTensor, we move it to the _shard folder.

Also, we added logic to remove padding when the size is not divisible by the world size, and modified the unit tests to reflect these changes.

Finally, we need to consider the placement order in the resharding spec for PartialTensor; the related logic is added in this change. Furthermore, for sharded linear we need to order the placements by rank to get the expected local result.
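The padding logic can be sketched in plain Python (function names are hypothetical, not the actual implementation): pad up to the next multiple of world_size before sharding, then strip the trailing pad after the result is reassembled.

```python
def pad_for_world_size(values, world_size, pad_value=0):
    """Pad so len(values) is divisible by world_size.

    Returns the padded list and the number of pad elements added,
    which the caller must remember in order to undo the padding.
    """
    pad = (-len(values)) % world_size
    return values + [pad_value] * pad, pad

def remove_padding(gathered, pad):
    """Strip the trailing padding added before sharding."""
    return gathered[: len(gathered) - pad] if pad else gathered
```

For example, 5 elements on a world size of 4 get 3 pad elements so every rank holds an equal 2-element shard, and the pad is dropped after gathering.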
ghstack-source-id: 154853290

Test Plan: CI

Reviewed By: pritamdamania87, wanchaol

Differential Revision: D35827894

fbshipit-source-id: 58dab77969b8b6557f42afa7e8f5a8a053dd5793
(cherry picked from commit abeb28f16582dcf707c2e165f39df6caf692384d)
2022-04-28 06:22:02 +00:00
Alban Desmaison
3d7abc0e55 Make -h work with run_test.py
As per title.

### When running `python run_test.py -h`
It used to show:
- The general unittest parser help that we print via a second thread 35545d85dc/torch/testing/_internal/common_utils.py (L467-L470)
- The common_utils's parser help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]]

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
```

</p>
</details>

It now prints:
- The general unittest parser help, printed the same way. Should we remove this? We can't merge the parsers, unfortunately, as unittest does not accept a parent parser / does not expose its parser for us to use as a parent.
- The combined common_utils + run_test parsers help

<details><summary>Full result</summary>
<p>

```bash
$ python run_test.py -h
usage: run_test.py [-h] [-v] [-q] [--locals] [-f] [-c] [-b] [-k TESTNAMEPATTERNS] [tests [tests ...]]

positional arguments:
  tests                a list of any number of test modules, classes and test methods.

optional arguments:
  -h, --help           show this help message and exit
  -v, --verbose        Verbose output
  -q, --quiet          Quiet output
  --locals             Show local variables in tracebacks
  -f, --failfast       Stop on first fail or error
  -c, --catch          Catch Ctrl-C and display results so far
  -b, --buffer         Buffer stdout and stderr during tests
  -k TESTNAMEPATTERNS  Only run tests which match the given substring

Examples:
  run_test.py                           - run default set of tests
  run_test.py MyTestSuite               - run suite 'MyTestSuite'
  run_test.py MyTestCase.testSomething  - run MyTestCase.testSomething
  run_test.py MyTestCase                - run all 'test*' test methods
                                       in MyTestCase

Ignoring disabled issues:  []
usage: run_test.py [-h] [--subprocess] [--seed SEED] [--accept] [--jit_executor JIT_EXECUTOR] [--repeat REPEAT] [--test_bailouts]
                   [--save-xml [SAVE_XML]] [--discover-tests] [--log-suffix LOG_SUFFIX] [--run-parallel RUN_PARALLEL]
                   [--import-slow-tests [IMPORT_SLOW_TESTS]] [--import-disabled-tests [IMPORT_DISABLED_TESTS]] [-v] [--jit]
                   [--distributed-tests] [-core] [-pt] [-c] [-i TESTS [TESTS ...]] [-x TESTS [TESTS ...]] [-f TESTS] [-l TESTS]
                   [--bring-to-front TESTS [TESTS ...]] [--ignore-win-blocklist] [--continue-through-error]
                   [--export-past-test-times [EXPORT_PAST_TEST_TIMES]] [--shard SHARD SHARD] [--exclude-jit-executor]
                   [--exclude-distributed-tests] [--run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]]
                   [--use-specified-test-cases-by {include,bring-to-front}] [--dry-run]
                   [additional_unittest_args [additional_unittest_args ...]]

Run the PyTorch unit test suite

positional arguments:
  additional_unittest_args
                        additional arguments passed through to unittest, e.g., python run_test.py -i sparse -- TestSparse.test_factory_size_check

optional arguments:
  -h, --help            show this help message and exit
  --subprocess          whether to run each test in a subprocess
  --seed SEED
  --accept
  --jit_executor JIT_EXECUTOR
  --repeat REPEAT
  --test_bailouts
  --save-xml [SAVE_XML]
  --discover-tests
  --log-suffix LOG_SUFFIX
  --run-parallel RUN_PARALLEL
  --import-slow-tests [IMPORT_SLOW_TESTS]
  --import-disabled-tests [IMPORT_DISABLED_TESTS]
  -v, --verbose         print verbose information and test-by-test results
  --jit, --jit          run all jit tests
  --distributed-tests, --distributed-tests
                        run all distributed tests
  -core, --core         Only run core tests, or tests that validate PyTorch's ops, modules,and autograd. They are defined by CORE_TEST_LIST.
  -pt, --pytest         If true, use `pytest` to execute the tests. E.g., this runs TestTorch with pytest in verbose and coverage mode: python run_test.py -vci torch -pt
  -c, --coverage        enable coverage
  -i TESTS [TESTS ...], --include TESTS [TESTS ...]
                        select a set of tests to include (defaults to ALL tests). tests must be a part of the TESTS list defined in run_test.py
  -x TESTS [TESTS ...], --exclude TESTS [TESTS ...]
                        select a set of tests to exclude
  -f TESTS, --first TESTS
                        select the test to start from (excludes previous tests)
  -l TESTS, --last TESTS
                        select the last test to run (excludes following tests)
  --bring-to-front TESTS [TESTS ...]
                        select a set of tests to run first. This can be used in situations where you want to run all tests, but care more about some set, e.g. after making a change to a specific component
  --ignore-win-blocklist
                        always run blocklisted windows tests
  --continue-through-error
                        Runs the full test suite despite one of the tests failing
  --export-past-test-times [EXPORT_PAST_TEST_TIMES]
                        dumps test times from previous S3 stats into a file, format JSON
  --shard SHARD SHARD   runs a shard of the tests (taking into account other selections), e.g., --shard 2 3 will break up the selected tests into 3 shards and run the tests in the 2nd shard (the first number should not exceed the second)
  --exclude-jit-executor
                        exclude tests that are run for a specific jit config
  --exclude-distributed-tests
                        exclude distributed tests
  --run-specified-test-cases [RUN_SPECIFIED_TEST_CASES]
                        load specified test cases file dumped from previous OSS CI stats, format CSV.  If all test cases should run for a <test_module> please add a single row:
                         test_filename,test_case_name
                         ...
                         <test_module>,__all__
                         ...
                        how we use the stats will be based on option "--use-specified-test-cases-by".
  --use-specified-test-cases-by {include,bring-to-front}
                        used together with option "--run-specified-test-cases". When specified test case file is set, this option allows the user to control whether to only run the specified test modules or to simply bring the specified modules to front and also run the remaining modules. Note: regardless of this option, we will only run the specified test cases  within a specified test module. For unspecified test modules with the bring-to-front option, all test cases will be run, as one may expect.
  --dry-run             Only list the test that will run.

where TESTS is any of: benchmark_utils/test_benchmark_utils, distributed/_shard/sharded_optim/test_sharded_optim, distributed/_shard/sharded_tensor/ops/test_binary_cmp, distributed/_shard/sharded_tensor/ops/test_elementwise_ops, distributed/_shard/sharded_tensor/ops/test_embedding, distributed/_shard/sharded_tensor/ops/test_embedding_bag, distributed/_shard/sharded_tensor/ops/test_init, distributed/_shard/sharded_tensor/ops/test_linear, distributed/_shard/sharded_tensor/ops/test_math_ops, distributed/_shard/sharded_tensor/test_megatron_prototype, distributed/_shard/sharded_tensor/test_partial_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor, distributed/_shard/sharded_tensor/test_sharded_tensor_reshard, distributed/_shard/sharding_spec/test_sharding_spec, distributed/_shard/test_replicated_tensor, distributed/algorithms/test_join, distributed/elastic/events/lib_test, distributed/elastic/metrics/api_test, distributed/elastic/multiprocessing/api_test, distributed/elastic/timer/api_test, distributed/elastic/timer/local_timer_example, distributed/elastic/timer/local_timer_test, distributed/elastic/utils/distributed_test, distributed/elastic/utils/logging_test, distributed/elastic/utils/util_test, distributed/fsdp/test_flatten_params_wrapper, distributed/fsdp/test_fsdp_apply, distributed/fsdp/test_fsdp_checkpoint, distributed/fsdp/test_fsdp_clip_grad_norm, distributed/fsdp/test_fsdp_comm, distributed/fsdp/test_fsdp_core, distributed/fsdp/test_fsdp_freezing_weights, distributed/fsdp/test_fsdp_grad_acc, distributed/fsdp/test_fsdp_ignored_modules, distributed/fsdp/test_fsdp_input, distributed/fsdp/test_fsdp_memory, distributed/fsdp/test_fsdp_mixed_precision, distributed/fsdp/test_fsdp_multiple_forward, distributed/fsdp/test_fsdp_multiple_wrapping, distributed/fsdp/test_fsdp_optim_state, distributed/fsdp/test_fsdp_overlap, distributed/fsdp/test_fsdp_pure_fp16, distributed/fsdp/test_fsdp_state_dict, distributed/fsdp/test_fsdp_summon_full_params, 
distributed/fsdp/test_fsdp_traversal, distributed/fsdp/test_fsdp_uneven, distributed/fsdp/test_shard_utils, distributed/fsdp/test_utils, distributed/fsdp/test_wrap, distributed/nn/jit/test_instantiator, distributed/optim/test_zero_redundancy_optimizer, distributed/pipeline/sync/skip/test_api, distributed/pipeline/sync/skip/test_gpipe, distributed/pipeline/sync/skip/test_inspect_skip_layout, distributed/pipeline/sync/skip/test_leak, distributed/pipeline/sync/skip/test_portal, distributed/pipeline/sync/skip/test_stash_pop, distributed/pipeline/sync/skip/test_tracker, distributed/pipeline/sync/skip/test_verify_skippables, distributed/pipeline/sync/test_balance, distributed/pipeline/sync/test_bugs, distributed/pipeline/sync/test_checkpoint, distributed/pipeline/sync/test_copy, distributed/pipeline/sync/test_deferred_batch_norm, distributed/pipeline/sync/test_dependency, distributed/pipeline/sync/test_inplace, distributed/pipeline/sync/test_microbatch, distributed/pipeline/sync/test_phony, distributed/pipeline/sync/test_pipe, distributed/pipeline/sync/test_pipeline, distributed/pipeline/sync/test_stream, distributed/pipeline/sync/test_transparency, distributed/pipeline/sync/test_worker, distributed/rpc/cuda/test_tensorpipe_agent, distributed/rpc/test_faulty_agent, distributed/rpc/test_tensorpipe_agent, distributed/test_c10d_common, distributed/test_c10d_gloo, distributed/test_c10d_nccl, distributed/test_c10d_spawn_gloo, distributed/test_c10d_spawn_nccl, distributed/test_data_parallel, distributed/test_distributed_spawn, distributed/test_launcher, distributed/test_nccl, distributed/test_pg_wrapper, distributed/test_store, distributions/test_constraints, distributions/test_distributions, lazy/test_bindings, lazy/test_extract_compiled_graph, lazy/test_ts_opinfo, test_ao_sparsity, test_autocast, test_autograd, test_binary_ufuncs, test_bundled_inputs, test_complex, test_cpp_api_parity, test_cpp_extensions_aot_ninja, test_cpp_extensions_aot_no_ninja, test_cpp_extensions_jit, 
test_cuda, test_cuda_primary_ctx, test_dataloader, test_datapipe, test_deploy, test_deploy, test_dispatch, test_expanded_weights, test_foreach, test_function_schema, test_functional_autograd_benchmark, test_functional_optim, test_functionalization, test_futures, test_fx, test_fx_experimental, test_hub, test_import_stats, test_indexing, test_jit, test_jit_autocast, test_jit_cuda_fuser, test_jit_disabled, test_jit_fuser_legacy, test_jit_fuser_te, test_jit_legacy, test_jit_profiling, test_license, test_linalg, test_logging, test_masked, test_mkldnn, test_mobile_optimizer, test_model_dump, test_module_init, test_modules, test_monitor, test_multiprocessing, test_multiprocessing_spawn, test_namedtensor, test_namedtuple_return_api, test_native_functions, test_nestedtensor, test_nn, test_numba_integration, test_numpy_interop, test_openmp, test_ops, test_ops_gradients, test_ops_jit, test_optim, test_overrides, test_package, test_per_overload_api, test_profiler, test_pruning_op, test_public_bindings, test_python_dispatch, test_pytree, test_quantization, test_reductions, test_scatter_gather_ops, test_serialization, test_set_default_mobile_cpu_allocator, test_shape_ops, test_show_pickle, test_sort_and_select, test_sparse, test_sparse_csr, test_spectral_ops, test_stateless, test_tensor_creation_ops, test_tensorboard, test_tensorexpr, test_tensorexpr_pybind, test_testing, test_torch, test_type_hints, test_type_info, test_type_promotion, test_unary_ufuncs, test_utils, test_view_ops, test_vmap, test_vulkan, test_xnnpack_integration
```

</p>
</details>

### When running anything else (for example  `python test_autograd.py -h`)
It did not change and still does:
- The general unittest parser help that we print via a second thread
- The common_utils's parser help
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76152
Approved by: https://github.com/malfet, https://github.com/seemethere
2022-04-25 14:01:33 +00:00
Wanchao Liang
78ea86a445 [shard] Sharder and ShardingPlan prototype (#73873)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73873

Basic ShardingPlan interface and Sharder implementation:
1. We provide `ShardingPlan` to allow users to specify all parameter sharding strategies for a given model, including `plan` for sharding the parameters, `output_plan` for tagging the output layout, and `return_local_tensor` for converting back to DDP.
2. Introduce the `shard_module` API, which takes an nn.Module and a ShardingPlan, then shards the module according to the plan.
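A torch-free sketch of the interface shape, using the names from the description above; the toy `shard_module` simply chunks each planned parameter across ranks, standing in for real sharding:

```python
from dataclasses import dataclass, field

@dataclass
class ShardingPlan:
    """Illustrative stand-in: maps follow the fields described above."""
    plan: dict = field(default_factory=dict)          # param name -> sharding spec
    output_plan: dict = field(default_factory=dict)   # module path -> output layout
    return_local_tensor: list = field(default_factory=list)

def shard_module(module_params, sharding_plan):
    """Toy shard_module: split each planned parameter into per-rank chunks.

    module_params is a plain dict of name -> list, standing in for an
    nn.Module's parameters; unplanned parameters are left untouched.
    """
    sharded = {}
    for name, param in module_params.items():
        spec = sharding_plan.plan.get(name)
        if spec is None:
            sharded[name] = param
            continue
        ws = spec["world_size"]
        chunk = (len(param) + ws - 1) // ws  # ceil division
        sharded[name] = [param[i : i + chunk] for i in range(0, len(param), chunk)]
    return sharded
```

The real API works on process groups and ShardedTensor placements; this only shows how a single plan object drives sharding of every parameter it names.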

TODO:
The next PR will introduce an extensible Sharder and ShardingPlanner.
ghstack-source-id: 154682421

Test Plan: test_sharding_plann.py

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D34695159

fbshipit-source-id: 3d695803c4b7e9a7543177ade5b709b5f847baa9
(cherry picked from commit 670cd279b0e5304a9bf0ce6e6651a08273a77035)
2022-04-25 13:01:24 +00:00
Jeff Daily
44bbb247a6 [ROCm] enable fsdp tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/75632
Approved by: https://github.com/kumpera, https://github.com/malfet
2022-04-22 19:50:36 +00:00
wanchaol
be354d8139 [shard] Add basic math ops to ShardedTensor and add ReplicatedTensor inter-op
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73703

This PR adds basic math ops (+-*/) to ShardedTensor and adds ReplicatedTensor inter-op with ShardedTensor for those math ops. This enables ShardedTensor (op) ReplicatedTensor to avoid communication in certain cases.
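A minimal sketch of why ShardedTensor (op) ReplicatedTensor needs no communication, using plain Python stand-ins for the tensor types (all names illustrative):

```python
class Replicated:
    """Stand-in: the same value is present on every rank."""
    def __init__(self, value):
        self.value = value

class Sharded:
    """Stand-in: each rank holds only its local shard."""
    def __init__(self, local_shard):
        self.local_shard = local_shard

def add(a, b):
    """Sharded + Replicated is communication-free: the replicated value
    already exists on every rank, so each rank adds it to its own shard
    locally. Sharded + Sharded with mismatched layouts would instead
    require a reshard (not sketched here)."""
    if isinstance(a, Sharded) and isinstance(b, Replicated):
        return Sharded([x + b.value for x in a.local_shard])
    raise NotImplementedError("only the communication-free case is sketched")
```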

Differential Revision: [D34560867](https://our.internmc.facebook.com/intern/diff/D34560867/)

**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.internmc.facebook.com/intern/diff/D34560867/)!

Approved by: https://github.com/pritamdamania87
2022-04-12 04:25:10 +00:00
Andrey Talman
622cff3e95 Cuda 11.6 Disable failing tests (#75420)
Summary:
This mitigates a number of issues with the CUDA 11.6 update and updates the Linux driver.

New issues discovered
#[75391](https://github.com/pytorch/pytorch/issues/75391)
#[75375](https://github.com/pytorch/pytorch/issues/75375)

Old issue present since 11.3
#[57482](https://github.com/pytorch/pytorch/issues/57482)
#[70111](https://github.com/pytorch/pytorch/issues/70111)

These changes were already tested in a WIP PR:
#[75337](https://github.com/pytorch/pytorch/pull/75337)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/75420

Reviewed By: seemethere

Differential Revision: D35481973

Pulled By: atalman

fbshipit-source-id: 4db00c646e2df4f8650404763963c3b215110f1f
(cherry picked from commit 518e19dc361b43273f5bd6bdfff942614e8466f5)
2022-04-07 22:43:15 +00:00
Brian Hirsh
9429dbb434 make functionalization work better with subclasses
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73441

Approved by: https://github.com/ezyang, https://github.com/albanD
2022-04-04 15:33:27 +00:00
David Berard
27deefb5e1 [JIT] Enable NVFuser tests in OSS CI (#73322)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73322

These tests have been disabled in OSS CI since #34785.

Test Plan: Imported from OSS

Reviewed By: eellison

Differential Revision: D34436844

Pulled By: davidberard98

fbshipit-source-id: c5b14b33e7f369a6fa1e9cfbcb484a30dffc659e
(cherry picked from commit b08f51587c0203c3e8b69f06ea613759e740aa4f)
2022-04-01 23:48:30 +00:00
Wanchao Liang
0524b2829a [shard] Add ReplicatedTensor (#73529)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73529

Add ReplicatedTensor. A ReplicatedTensor is a type of tensor that has the same value on all ranks across the world_size.

ReplicatedTensor is a :class:`~torch.Tensor` subclass, and it can be used together with ShardedTensor/Tensor to express different types of computation. The inter-op rules are defined as follows (using torch.add as an example op):
    ReplicatedTensor + ReplicatedTensor = ReplicatedTensor
    ReplicatedTensor + torch.Tensor = torch.Tensor
    ReplicatedTensor + ShardedTensor = ShardedTensor

We also added a `validate()` API to help users check whether a replicated tensor on a certain process_group is truly replicated.
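The promotion rules above can be captured in a tiny helper; this is a sketch of the rule table, not the actual dispatch code:

```python
def result_type(a, b):
    """Inter-op promotion rules from the description above:
    replicated + replicated -> replicated
    replicated + tensor     -> tensor
    replicated + sharded    -> sharded
    Arguments are kind labels standing in for the tensor subclasses."""
    kinds = {a, b}
    if kinds == {"replicated"}:
        return "replicated"
    if "sharded" in kinds:
        return "sharded"
    return "tensor"
```

The intuition: the result keeps the "most constrained" layout among the operands, so sharding always wins and replication only survives when both sides are replicated.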

TODO: the next PR will add ShardedTensor/PartialTensor logic to handle ReplicatedTensor.
ghstack-source-id: 152064781

Test Plan: test_replicated_tensor

Reviewed By: pritamdamania87, fduwjj

Differential Revision: D34529374

fbshipit-source-id: 16ccb300e9f9c47ac29a17eb6d46d029ab7d60b8
(cherry picked from commit 44f4e11e795a1bf330a8108bda256950ca769525)
2022-03-24 12:41:17 +00:00
Jeff Daily
956a028b55 [ROCm] enable HIP IPC
Enables code paths that use hipIpc* functions.  Also enables test_multiprocessing.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/74383
Approved by: https://github.com/osalpekar
2022-03-21 19:32:01 +00:00
Sahan Paliskara
0bfa2f8255 Move torch::deploy tests to their own workflow job (#73676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/73676

For some reason https://github.com/pytorch/pytorch/pull/72637 ended up getting messed up during rebasing, so please refer to that PR for review history.

This PR creates a new workflow called ` deploy-linux-xenial-cuda11.3-py3.7-gcc7` for torch::deploy tests.

For testing go to https://www.torch-ci.com/pytorch/pytorch/pull/73676 and check if a build and test job occur with ` deploy-linux-xenial-cuda11.3-py3.7-gcc7`

Test Plan: Imported from OSS

Reviewed By: soulitzer

Differential Revision: D34586702

Pulled By: PaliC

fbshipit-source-id: 5627cf4ff411a4a04030f8b7726f84af979da213
(cherry picked from commit df6dddebb9fe078a6053a31033b5a40cc742fcf3)
2022-03-17 12:19:48 +00:00
atalman
ebca80ed08 Move test ops gradients and test ops jit to separate files
Fixes #72368

As per the referenced issue, test_ops in a single file takes around 3:30-4:00 hours to execute on ASAN jobs:

Reference : pytorch_test_times.json

```
{
    "commit": "39535fec6c3ff5bf7c2d322d096c59571c3295ed",
    "JOB_BASE_NAME": "linux-xenial-py3.7-clang7-asan",
    "job_times": {
        "test_ops": 14928.355000000636, <- This test group is over 4hrs alone
```
----

Hence we separate test_ops into the following parts:
1. TestGradients
2. TestJit
3.  TestCommon and TestMathBits
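Splitting one multi-hour module into smaller files also lets time-based sharding balance shards better. A toy greedy sharder in the spirit of what run_test.py does with recorded job times (illustrative, not its actual code):

```python
def shard_by_time(job_times, num_shards):
    """Greedy longest-processing-time sharding: assign each test file,
    largest recorded time first, to the currently lightest shard.
    job_times maps test-file name -> seconds (e.g. from pytorch_test_times.json).
    """
    shards = [{"time": 0.0, "tests": []} for _ in range(num_shards)]
    for name, seconds in sorted(job_times.items(), key=lambda kv: -kv[1]):
        lightest = min(shards, key=lambda s: s["time"])
        lightest["time"] += seconds
        lightest["tests"].append(name)
    return shards
```

One ~15000s test_ops forces any shard containing it to run at least that long; after the split, its three pieces can land on different shards and shorten the critical path.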

Pull Request resolved: https://github.com/pytorch/pytorch/pull/74297
Approved by: https://github.com/malfet
2022-03-17 02:07:50 +00:00