Tensors and Dynamic neural networks in Python with strong GPU acceleration
Catherine Lee 56ea57de61 shard pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed) 1->2
Fixes #ISSUE_NUMBER

shard `pull / linux-xenial-cuda11.3-py3.7-gcc7 / test (distributed ...` from 1 shard to 2

Pros:
- It currently takes about 2.6 hours and is the 3rd-longest-running job on pull
- Theoretically minimal overhead

Cons:
- Requires changes to run_test.py, which might introduce correctness issues

Notes:
- Cannot shard further as one of the test files is responsible for about half of the total run time

spreadsheet regarding sharding: https://docs.google.com/spreadsheets/d/1BdtVsjRr0Is9LXMNilR02FEdPXNq7zEWl8AmR3ArsLQ/edit#gid=1153012347

Test Plan:
<details><summary>expand to see test plan (it's long)</summary>

Tests from a commit run on master (90 tests):
```
2022-05-03T12:45:34.7974184Z Selected tests:
2022-05-03T12:45:34.7974495Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-03T12:45:34.7974839Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-03T12:45:34.7975209Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-03T12:45:34.7975575Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-03T12:45:34.7976180Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-03T12:45:34.7976802Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-03T12:45:34.7977361Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-03T12:45:34.7978157Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-03T12:45:34.7978879Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-03T12:45:34.7979594Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-03T12:45:34.7980366Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-03T12:45:34.7981066Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-03T12:45:34.7981877Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-03T12:45:34.7982387Z  distributed/_shard/test_partial_tensor
2022-05-03T12:45:34.7982691Z  distributed/_shard/test_replicated_tensor
2022-05-03T12:45:34.7982994Z  distributed/_shard/test_sharder
2022-05-03T12:45:34.7983280Z  distributed/algorithms/test_join
2022-05-03T12:45:34.7983695Z  distributed/elastic/events/lib_test
2022-05-03T12:45:34.7983984Z  distributed/elastic/metrics/api_test
2022-05-03T12:45:34.7984308Z  distributed/elastic/multiprocessing/api_test
2022-05-03T12:45:34.7984624Z  distributed/elastic/timer/api_test
2022-05-03T12:45:34.7984924Z  distributed/elastic/timer/local_timer_example
2022-05-03T12:45:34.7985254Z  distributed/elastic/timer/local_timer_test
2022-05-03T12:45:34.7985575Z  distributed/elastic/utils/distributed_test
2022-05-03T12:45:34.7985889Z  distributed/elastic/utils/logging_test
2022-05-03T12:45:34.7986176Z  distributed/elastic/utils/util_test
2022-05-03T12:45:34.7986492Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-03T12:45:34.7986799Z  distributed/fsdp/test_fsdp_apply
2022-05-03T12:45:34.7987078Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-03T12:45:34.7987388Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-03T12:45:34.7987691Z  distributed/fsdp/test_fsdp_comm
2022-05-03T12:45:34.7987961Z  distributed/fsdp/test_fsdp_core
2022-05-03T12:45:34.7988251Z  distributed/fsdp/test_fsdp_exec_order
2022-05-03T12:45:34.7988570Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-03T12:45:34.7988865Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-03T12:45:34.7989176Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-03T12:45:34.7989478Z  distributed/fsdp/test_fsdp_input
2022-05-03T12:45:34.7989950Z  distributed/fsdp/test_fsdp_memory
2022-05-03T12:45:34.7990241Z  distributed/fsdp/test_fsdp_meta
2022-05-03T12:45:34.7990640Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-03T12:45:34.7990964Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-03T12:45:34.7991293Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-03T12:45:34.7991610Z  distributed/fsdp/test_fsdp_optim_state
2022-05-03T12:45:34.7991895Z  distributed/fsdp/test_fsdp_overlap
2022-05-03T12:45:34.7992195Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-03T12:45:34.7992500Z  distributed/fsdp/test_fsdp_state_dict
2022-05-03T12:45:34.7992818Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-03T12:45:34.7993117Z  distributed/fsdp/test_fsdp_traversal
2022-05-03T12:45:34.7993861Z  distributed/fsdp/test_fsdp_uneven
2022-05-03T12:45:34.7994181Z  distributed/fsdp/test_shard_utils
2022-05-03T12:45:34.7994447Z  distributed/fsdp/test_utils
2022-05-03T12:45:34.7994721Z  distributed/fsdp/test_wrap
2022-05-03T12:45:34.7995015Z  distributed/nn/jit/test_instantiator
2022-05-03T12:45:34.7995328Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-03T12:45:34.7995664Z  distributed/pipeline/sync/skip/test_api
2022-05-03T12:45:34.7995983Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-03T12:45:34.7996315Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-03T12:45:34.7996652Z  distributed/pipeline/sync/skip/test_leak
2022-05-03T12:45:34.7996977Z  distributed/pipeline/sync/skip/test_portal
2022-05-03T12:45:34.7997292Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-03T12:45:34.7997623Z  distributed/pipeline/sync/skip/test_tracker
2022-05-03T12:45:34.7997968Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-03T12:45:34.7998301Z  distributed/pipeline/sync/test_balance
2022-05-03T12:45:34.7998591Z  distributed/pipeline/sync/test_bugs
2022-05-03T12:45:34.7998927Z  distributed/pipeline/sync/test_checkpoint
2022-05-03T12:45:34.7999243Z  distributed/pipeline/sync/test_copy
2022-05-03T12:45:34.7999557Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-03T12:45:34.7999896Z  distributed/pipeline/sync/test_dependency
2022-05-03T12:45:34.8000215Z  distributed/pipeline/sync/test_inplace
2022-05-03T12:45:34.8000516Z  distributed/pipeline/sync/test_microbatch
2022-05-03T12:45:34.8000826Z  distributed/pipeline/sync/test_phony
2022-05-03T12:45:34.8001130Z  distributed/pipeline/sync/test_pipe
2022-05-03T12:45:34.8001424Z  distributed/pipeline/sync/test_pipeline
2022-05-03T12:45:34.8001733Z  distributed/pipeline/sync/test_stream
2022-05-03T12:45:34.8002055Z  distributed/pipeline/sync/test_transparency
2022-05-03T12:45:34.8002353Z  distributed/pipeline/sync/test_worker
2022-05-03T12:45:34.8002672Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-03T12:45:34.8002982Z  distributed/rpc/test_faulty_agent
2022-05-03T12:45:34.8003270Z  distributed/rpc/test_tensorpipe_agent
2022-05-03T12:45:34.8003568Z  distributed/test_c10d_common
2022-05-03T12:45:34.8003839Z  distributed/test_c10d_gloo
2022-05-03T12:45:34.8004088Z  distributed/test_c10d_nccl
2022-05-03T12:45:34.8004369Z  distributed/test_c10d_spawn_gloo
2022-05-03T12:45:34.8004656Z  distributed/test_c10d_spawn_nccl
2022-05-03T12:45:34.8004938Z  distributed/test_data_parallel
2022-05-03T12:45:34.8005212Z  distributed/test_distributed_spawn
2022-05-03T12:45:34.8005496Z  distributed/test_launcher
2022-05-03T12:45:34.8005767Z  distributed/test_nccl
2022-05-03T12:45:34.8006019Z  distributed/test_pg_wrapper
2022-05-03T12:45:34.8006285Z  distributed/test_store
```

Tests run on the first shard for distributed on this PR (34 tests):
```
2022-05-02T21:26:00.1385256Z Selected tests:
2022-05-02T21:26:00.1385767Z  distributed/test_distributed_spawn
2022-05-02T21:26:00.1386403Z  distributed/elastic/multiprocessing/api_test
2022-05-02T21:26:00.1387051Z  distributed/fsdp/test_fsdp_memory
2022-05-02T21:26:00.1387607Z  distributed/fsdp/test_fsdp_ignored_modules
2022-05-02T21:26:00.1388179Z  distributed/fsdp/test_fsdp_apply
2022-05-02T21:26:00.1388600Z  distributed/_shard/sharded_tensor/ops/test_binary_cmp
2022-05-02T21:26:00.1389181Z  distributed/_shard/sharding_spec/test_sharding_spec
2022-05-02T21:26:00.1389545Z  distributed/_shard/sharded_tensor/ops/test_linear
2022-05-02T21:26:00.1389878Z  distributed/fsdp/test_fsdp_uneven
2022-05-02T21:26:00.1390186Z  distributed/fsdp/test_fsdp_multiple_wrapping
2022-05-02T21:26:00.1390526Z  distributed/fsdp/test_fsdp_multiple_forward
2022-05-02T21:26:00.1390877Z  distributed/_shard/sharded_tensor/ops/test_embedding
2022-05-02T21:26:00.1391219Z  distributed/_shard/test_partial_tensor
2022-05-02T21:26:00.1391542Z  distributed/_shard/sharded_optim/test_sharded_optim
2022-05-02T21:26:00.1391915Z  distributed/_shard/sharded_tensor/ops/test_elementwise_ops
2022-05-02T21:26:00.1392297Z  distributed/fsdp/test_flatten_params_wrapper
2022-05-02T21:26:00.1392585Z  distributed/fsdp/test_utils
2022-05-02T21:26:00.1392883Z  distributed/nn/jit/test_instantiator
2022-05-02T21:26:00.1393167Z  distributed/test_nccl
2022-05-02T21:26:00.1393466Z  distributed/_shard/sharding_plan/test_sharding_plan
2022-05-02T21:26:00.1393787Z  distributed/_shard/test_sharder
2022-05-02T21:26:00.1394085Z  distributed/elastic/timer/api_test
2022-05-02T21:26:00.1394383Z  distributed/pipeline/sync/skip/test_api
2022-05-02T21:26:00.1394738Z  distributed/pipeline/sync/skip/test_inspect_skip_layout
2022-05-02T21:26:00.1395090Z  distributed/pipeline/sync/skip/test_portal
2022-05-02T21:26:00.1395424Z  distributed/pipeline/sync/skip/test_tracker
2022-05-02T21:26:00.1395935Z  distributed/pipeline/sync/test_balance
2022-05-02T21:26:00.1396288Z  distributed/pipeline/sync/test_checkpoint
2022-05-02T21:26:00.1396635Z  distributed/pipeline/sync/test_deferred_batch_norm
2022-05-02T21:26:00.1396953Z  distributed/pipeline/sync/test_inplace
2022-05-02T21:26:00.1397269Z  distributed/pipeline/sync/test_phony
2022-05-02T21:26:00.1397587Z  distributed/pipeline/sync/test_pipeline
2022-05-02T21:26:00.1397903Z  distributed/pipeline/sync/test_transparency
2022-05-02T21:26:00.1398221Z  distributed/rpc/test_faulty_agent
```

Tests run on the second shard for distributed on this PR (56 tests):
```
2022-05-02T21:26:55.1342892Z Selected tests:
2022-05-02T21:26:55.1343201Z  distributed/rpc/cuda/test_tensorpipe_agent
2022-05-02T21:26:55.1343526Z  distributed/fsdp/test_fsdp_core
2022-05-02T21:26:55.1343829Z  distributed/test_c10d_nccl
2022-05-02T21:26:55.1344089Z  distributed/test_c10d_gloo
2022-05-02T21:26:55.1344408Z  distributed/fsdp/test_fsdp_summon_full_params
2022-05-02T21:26:55.1344749Z  distributed/fsdp/test_fsdp_mixed_precision
2022-05-02T21:26:55.1345085Z  distributed/optim/test_zero_redundancy_optimizer
2022-05-02T21:26:55.1345423Z  distributed/fsdp/test_fsdp_optim_state
2022-05-02T21:26:55.1345773Z  distributed/_shard/sharded_tensor/test_sharded_tensor
2022-05-02T21:26:55.1346088Z  distributed/fsdp/test_fsdp_state_dict
2022-05-02T21:26:55.1346379Z  distributed/test_store
2022-05-02T21:26:55.1346661Z  distributed/test_c10d_spawn_gloo
2022-05-02T21:26:55.1346966Z  distributed/test_pg_wrapper
2022-05-02T21:26:55.1347252Z  distributed/test_c10d_spawn_nccl
2022-05-02T21:26:55.1347565Z  distributed/fsdp/test_fsdp_clip_grad_norm
2022-05-02T21:26:55.1347871Z  distributed/fsdp/test_wrap
2022-05-02T21:26:55.1348369Z  distributed/fsdp/test_fsdp_grad_acc
2022-05-02T21:26:55.1348679Z  distributed/algorithms/test_join
2022-05-02T21:26:55.1349004Z  distributed/fsdp/test_fsdp_freezing_weights
2022-05-02T21:26:55.1349305Z  distributed/fsdp/test_fsdp_comm
2022-05-02T21:26:55.1349593Z  distributed/test_c10d_common
2022-05-02T21:26:55.1349885Z  distributed/fsdp/test_fsdp_meta
2022-05-02T21:26:55.1350171Z  distributed/fsdp/test_fsdp_exec_order
2022-05-02T21:26:55.1350486Z  distributed/fsdp/test_fsdp_checkpoint
2022-05-02T21:26:55.1350798Z  distributed/fsdp/test_fsdp_overlap
2022-05-02T21:26:55.1351105Z  distributed/elastic/timer/local_timer_example
2022-05-02T21:26:55.1351423Z  distributed/fsdp/test_fsdp_input
2022-05-02T21:26:55.1351749Z  distributed/_shard/sharded_tensor/ops/test_init
2022-05-02T21:26:55.1352190Z  distributed/elastic/timer/local_timer_test
2022-05-02T21:26:55.1352520Z  distributed/elastic/utils/distributed_test
2022-05-02T21:26:55.1352841Z  distributed/fsdp/test_fsdp_pure_fp16
2022-05-02T21:26:55.1353150Z  distributed/test_data_parallel
2022-05-02T21:26:55.1353437Z  distributed/fsdp/test_fsdp_traversal
2022-05-02T21:26:55.1353792Z  distributed/_shard/sharded_tensor/test_sharded_tensor_reshard
2022-05-02T21:26:55.1354174Z  distributed/_shard/sharded_tensor/ops/test_embedding_bag
2022-05-02T21:26:55.1354534Z  distributed/_shard/sharded_tensor/test_megatron_prototype
2022-05-02T21:26:55.1354858Z  distributed/test_launcher
2022-05-02T21:26:55.1355149Z  distributed/elastic/utils/util_test
2022-05-02T21:26:55.1355441Z  distributed/elastic/utils/logging_test
2022-05-02T21:26:55.1355755Z  distributed/elastic/metrics/api_test
2022-05-02T21:26:55.1356095Z  distributed/_shard/sharded_tensor/ops/test_math_ops
2022-05-02T21:26:55.1356455Z  distributed/_shard/test_replicated_tensor
2022-05-02T21:26:55.1356754Z  distributed/elastic/events/lib_test
2022-05-02T21:26:55.1357065Z  distributed/fsdp/test_shard_utils
2022-05-02T21:26:55.1357387Z  distributed/pipeline/sync/skip/test_gpipe
2022-05-02T21:26:55.1357702Z  distributed/pipeline/sync/skip/test_leak
2022-05-02T21:26:55.1358040Z  distributed/pipeline/sync/skip/test_stash_pop
2022-05-02T21:26:55.1358396Z  distributed/pipeline/sync/skip/test_verify_skippables
2022-05-02T21:26:55.1358716Z  distributed/pipeline/sync/test_bugs
2022-05-02T21:26:55.1359027Z  distributed/pipeline/sync/test_copy
2022-05-02T21:26:55.1359350Z  distributed/pipeline/sync/test_dependency
2022-05-02T21:26:55.1359662Z  distributed/pipeline/sync/test_microbatch
2022-05-02T21:26:55.1359983Z  distributed/pipeline/sync/test_pipe
2022-05-02T21:26:55.1360299Z  distributed/pipeline/sync/test_stream
2022-05-02T21:26:55.1360593Z  distributed/pipeline/sync/test_worker
2022-05-02T21:26:55.1360912Z  distributed/rpc/test_tensorpipe_agent
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/76564
Approved by: https://github.com/jeffdaily, https://github.com/janeyx99
2022-05-03 23:01:42 +00:00



PyTorch is a Python package that provides two high-level features:

  • Tensor computation (like NumPy) with strong GPU acceleration
  • Deep neural networks built on a tape-based autograd system

You can reuse your favorite Python packages such as NumPy, SciPy, and Cython to extend PyTorch when needed.
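
For instance, a minimal sketch of both features together (the shapes here are arbitrary, and the GPU branch simply falls back to CPU when no device is available):

```python
import numpy as np
import torch

# Tensor computation (like NumPy), optionally on the GPU
a = torch.from_numpy(np.arange(6.0).reshape(2, 3))  # zero-copy bridge from NumPy
device = "cuda" if torch.cuda.is_available() else "cpu"
a = a.to(device)

# Dynamic autograd: gradients are recorded as operations execute
w = torch.randn(3, 2, dtype=torch.float64, device=device, requires_grad=True)
loss = (a @ w).sum()
loss.backward()
print(w.grad.shape)  # torch.Size([3, 2])
```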

Our trunk health (Continuous Integration signals) can be found at hud.pytorch.org.

More About PyTorch

At a granular level, PyTorch is a library that consists of the following components:

  • torch: a Tensor library like NumPy, with strong GPU support
  • torch.autograd: a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch
  • torch.jit: a compilation stack (TorchScript) to create serializable and optimizable models from PyTorch code
  • torch.nn: a neural networks library deeply integrated with autograd, designed for maximum flexibility
  • torch.multiprocessing: Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training
  • torch.utils: DataLoader and other utility functions for convenience

Usually, PyTorch is used either as:

  • A replacement for NumPy to use the power of GPUs.
  • A deep learning research platform that provides maximum flexibility and speed.

Elaborating Further:

A GPU-Ready Tensor Library

If you use NumPy, then you have used Tensors (a.k.a. ndarray).


PyTorch provides Tensors that can live either on the CPU or the GPU and accelerates the computation by a huge amount.

We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs such as slicing, indexing, math operations, linear algebra, reductions. And they are fast!
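
As a small illustration (not an exhaustive tour, and the tensors are made up), each family of routines is a one-liner:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
t = torch.arange(12.0, device=device).reshape(3, 4)

print(t[1])                # indexing: the second row
print(t[:, ::2])           # slicing: every other column
print((t * 2).sqrt())      # elementwise math operations
print(t.sum(dim=0))        # reductions along a dimension
q, r = torch.linalg.qr(t)  # linear algebra on the same tensor
```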

Dynamic Neural Networks: Tape-Based Autograd

PyTorch has a unique way of building neural networks: using and replaying a tape recorder.

Most frameworks such as TensorFlow, Theano, Caffe, and CNTK have a static view of the world. One has to build a neural network and reuse the same structure again and again. Changing the way the network behaves means that one has to start from scratch.

With PyTorch, we use a technique called reverse-mode auto-differentiation, which allows you to change the way your network behaves arbitrarily with zero lag or overhead. Our inspiration comes from several research papers on this topic, as well as current and past work such as torch-autograd, autograd, Chainer, etc.

While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date. You get the best of speed and flexibility for your crazy research.
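
A short sketch of what define-by-run means in practice; the module and sizes below are illustrative, not from the original text. The forward pass uses ordinary Python control flow whose shape depends on the data, and backward() replays exactly the operations that ran:

```python
import torch
import torch.nn as nn

class DynamicNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(8, 8)

    def forward(self, x):
        # Data-dependent control flow: the number of layer applications
        # is decided at run time, and the tape records whatever actually ran.
        for _ in range(int(x.abs().sum().item()) % 3 + 1):
            x = torch.relu(self.linear(x))
        return x

net = DynamicNet()
out = net(torch.randn(2, 8))
out.sum().backward()  # reverse-mode autograd over this run's graph
```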


Python First

PyTorch is not a Python binding into a monolithic C++ framework. It is built to be deeply integrated into Python. You can use it naturally like you would use NumPy / SciPy / scikit-learn etc. You can write your new neural network layers in Python itself, using your favorite libraries and use packages such as Cython and Numba. Our goal is to not reinvent the wheel where appropriate.
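
For example, a sketch of that interop, assuming SciPy is installed: on the CPU, tensors round-trip to NumPy views without copying, so existing scientific-Python code slots right in:

```python
import torch
from scipy import linalg  # any NumPy-based library works the same way

t = torch.randn(4, 4, dtype=torch.float64)
lu, piv = linalg.lu_factor(t.numpy())  # .numpy() is a zero-copy view on CPU
result = torch.from_numpy(lu)          # and back again, also without a copy
print(result.shape)  # torch.Size([4, 4])
```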

Imperative Experiences

PyTorch is designed to be intuitive, linear in thought, and easy to use. When you execute a line of code, it gets executed. There isn't an asynchronous view of the world. When you drop into a debugger or receive error messages and stack traces, understanding them is straightforward. The stack trace points to exactly where your code was defined. We hope you never spend hours debugging your code because of bad stack traces or asynchronous and opaque execution engines.
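
As an illustrative (not canonical) example of what this means in practice, a shape error surfaces on the exact line that caused it:

```python
import torch

x = torch.randn(4, 5)
w = torch.randn(3, 5)
try:
    y = x @ w  # incompatible shapes: (4, 5) @ (3, 5)
except RuntimeError as e:
    # The exception is raised here, on this line, with a plain Python
    # stack trace, not deferred to an opaque graph executor.
    print(e)
```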

Fast and Lean

PyTorch has minimal framework overhead. We integrate acceleration libraries such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed. At the core, its CPU and GPU Tensor and neural network backends are mature and have been tested for years.

Hence, PyTorch is quite fast whether you run small or large neural networks.

The memory usage in PyTorch is extremely efficient compared to Torch or some of the alternatives. We've written custom memory allocators for the GPU to make sure that your deep learning models are maximally memory efficient. This enables you to train bigger deep learning models than before.
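
A small sketch of how you might observe the caching allocator at work (CUDA-only; the tensor size is arbitrary):

```python
import torch

if torch.cuda.is_available():
    x = torch.randn(1024, 1024, device="cuda")
    print(torch.cuda.memory_allocated())  # bytes currently held by live tensors
    del x
    # Freed blocks are kept in PyTorch's caching allocator for fast reuse
    # rather than being returned to the OS immediately:
    print(torch.cuda.memory_reserved())
```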

Extensions Without Pain

Writing new neural network modules, or interfacing with PyTorch's Tensor API, is designed to be straightforward, with minimal abstractions.

You can write new neural network layers in Python using the torch API or your favorite NumPy-based libraries such as SciPy.
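
One way to do this is torch.autograd.Function with a NumPy-backed forward and backward; the op below is a toy chosen for illustration:

```python
import numpy as np
import torch

class NumpyExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        # The heavy lifting can live in NumPy (or SciPy) ...
        out = torch.from_numpy(np.exp(x.detach().numpy()))
        ctx.save_for_backward(out)
        return out

    @staticmethod
    def backward(ctx, grad_output):
        # ... while autograd still sees a differentiable op.
        (out,) = ctx.saved_tensors
        return grad_output * out  # d/dx exp(x) = exp(x)

x = torch.randn(5, requires_grad=True)
NumpyExp.apply(x).sum().backward()
print(torch.allclose(x.grad, x.exp()))  # True
```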

If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and has minimal boilerplate. No wrapper code needs to be written. You can see a tutorial at https://pytorch.org/tutorials/advanced/cpp_extension.html and an example at https://github.com/pytorch/extension-cpp.

Installation

Binaries

Commands to install binaries via Conda or pip wheels are on our website: https://pytorch.org/get-started/locally/

NVIDIA Jetson Platforms

Python wheels for NVIDIA's Jetson Nano, Jetson TX2, and Jetson AGX Xavier are provided here, and the L4T container is published here.

They require JetPack 4.2 and above, and @dusty-nv and @ptrblck are maintaining them.

From Source

If you are installing from source, you will need Python 3.7 or later and a C++14 compiler. Also, we highly recommend installing an Anaconda environment. You will get a high-quality BLAS library (MKL) and you get controlled dependency versions regardless of your Linux distro.

Once you have Anaconda installed, here are the instructions.

If you want to compile with CUDA support, install the following:

  • NVIDIA CUDA 10.2 or above
  • NVIDIA cuDNN v7 or above
  • A compiler compatible with CUDA

If you want to disable CUDA support, export the environment variable USE_CUDA=0. Other potentially useful environment variables may be found in setup.py.

If you are building for NVIDIA's Jetson platforms (Jetson Nano, TX1, TX2, AGX Xavier), instructions to install PyTorch for Jetson Nano are available here.

If you want to compile with ROCm support, install

  • AMD ROCm 4.0 and above installation
  • ROCm is currently supported only for Linux systems.

If you want to disable ROCm support, export the environment variable USE_ROCM=0. Other potentially useful environment variables may be found in setup.py.
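
After building, one quick (unofficial) sanity check of which backend actually got compiled in:

```python
import torch

# Each attribute is None when the corresponding backend was compiled out
# (e.g. with USE_CUDA=0 or USE_ROCM=0).
print(torch.version.cuda)         # e.g. '11.3' on a CUDA build
print(torch.version.hip)          # set on ROCm builds instead
print(torch.cuda.is_available())  # True only if a usable device is present
```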

Install Dependencies

Common

```
conda install astunparse numpy ninja pyyaml mkl mkl-include setuptools cmake cffi typing_extensions future six requests dataclasses
```

On Linux

```
# CUDA only: Add LAPACK support for the GPU if needed
conda install -c pytorch magma-cuda110  # or the magma-cuda* that matches your CUDA version from https://anaconda.org/pytorch/repo
```

On MacOS

```
# Add these packages if torch.distributed is needed
conda install pkg-config libuv
```

On Windows

```
# Add these packages if torch.distributed is needed.
# Distributed package support on Windows is a prototype feature and is subject to changes.
conda install -c conda-forge libuv=1.39
```

Get the PyTorch Source

```
git clone --recursive https://github.com/pytorch/pytorch
cd pytorch
# if you are updating an existing checkout
git submodule sync
git submodule update --init --recursive --jobs 0
```

Install PyTorch

On Linux

```
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py install
```

Note that if you are compiling for ROCm, you must run this command first:

```
python tools/amd_build/build_amd.py
```

Note that if you are using Anaconda, you may experience an error caused by the linker:

```
build/temp.linux-x86_64-3.7/torch/csrc/stub.o: file not recognized: file format not recognized
collect2: error: ld returned 1 exit status
error: command 'g++' failed with exit status 1
```

This is caused by ld from the Conda environment shadowing the system ld. You should use a newer version of Python that fixes this issue. The recommended Python versions are 3.7.6+ and 3.8.1+.

On macOS

```
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install
```

CUDA is not supported on macOS.

On Windows

Choose Correct Visual Studio Version.

Sometimes there are regressions in new versions of Visual Studio, so it's best to use the same Visual Studio version as PyTorch CI (currently 16.8.5).

PyTorch CI uses Visual C++ BuildTools, which come with Visual Studio Enterprise, Professional, or Community Editions. You can also install the build tools from https://visualstudio.microsoft.com/visual-cpp-build-tools/. The build tools do not come with Visual Studio Code by default.

If you want to build legacy Python code, please refer to Building on legacy code and CUDA.

Build with CPU

It's fairly easy to build with CPU.

```
conda activate
python setup.py install
```

Note on OpenMP: The desired OpenMP implementation is Intel OpenMP (iomp). In order to link against iomp, you'll need to manually download the library and set up the build environment by tweaking CMAKE_INCLUDE_PATH and LIB. The instructions here are an example of setting up both MKL and Intel OpenMP. Without these configurations for CMake, the Microsoft Visual C OpenMP runtime (vcomp) will be used.
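
Once the build finishes, an (unofficial) way to check which OpenMP and MKL libraries it actually linked against:

```python
import torch

# Lists the OpenMP runtime, MKL version, and thread settings of this build
print(torch.__config__.parallel_info())
```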

Build with CUDA

NVTX is needed to build PyTorch with CUDA. NVTX is part of the CUDA distribution, where it is called "Nsight Compute". To install it onto an already installed CUDA, run the CUDA installation once again and check the corresponding checkbox. Make sure that CUDA with Nsight Compute is installed after Visual Studio.

Currently, VS 2017 / 2019 and Ninja are supported as CMake generators. If ninja.exe is detected in PATH, Ninja will be used as the default generator; otherwise, VS 2017 / 2019 will be used.
If Ninja is selected as the generator, the latest MSVC will be selected as the underlying toolchain.

Additional libraries such as Magma, oneDNN (a.k.a. MKL-DNN or DNNL), and Sccache are often needed. Please refer to the installation-helper to install them.

You can refer to the build_pytorch.bat script for other environment variable configurations.

```cmd
:: Set the environment variables after you have downloaded and unzipped the mkl package,
:: else CMake would throw an error as `Could NOT find OpenMP`.
set CMAKE_INCLUDE_PATH={Your directory}\mkl\include
set LIB={Your directory}\mkl\lib;%LIB%

:: Read the content in the previous section carefully before you proceed.
:: [Optional] If you want to override the underlying toolset used by Ninja and Visual Studio with CUDA, please run the following script block.
:: "Visual Studio 2019 Developer Command Prompt" will be run automatically.
:: Make sure you have CMake >= 3.12 before you do this when you use the Visual Studio generator.
set CMAKE_GENERATOR_TOOLSET_VERSION=14.27
set DISTUTILS_USE_SDK=1
for /f "usebackq tokens=*" %i in (`"%ProgramFiles(x86)%\Microsoft Visual Studio\Installer\vswhere.exe" -version [15^,17^) -products * -latest -property installationPath`) do call "%i\VC\Auxiliary\Build\vcvarsall.bat" x64 -vcvars_ver=%CMAKE_GENERATOR_TOOLSET_VERSION%

:: [Optional] If you want to override the CUDA host compiler
set CUDAHOSTCXX=C:\Program Files (x86)\Microsoft Visual Studio\2019\Community\VC\Tools\MSVC\14.27.29110\bin\HostX64\x64\cl.exe

python setup.py install
```

Adjust Build Options (Optional)

You can optionally adjust the configuration of CMake variables (without building first) as follows. For example, adjusting the pre-detected directories for CuDNN or BLAS can be done this way.

On Linux

```
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
python setup.py build --cmake-only
ccmake build  # or cmake-gui build
```

On macOS

```
export CMAKE_PREFIX_PATH=${CONDA_PREFIX:-"$(dirname $(which conda))/../"}
MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py build --cmake-only
ccmake build  # or cmake-gui build
```

Docker Image

Using pre-built images

You can also pull a pre-built docker image from Docker Hub and run it with docker v19.03+.

```
docker run --gpus all --rm -ti --ipc=host pytorch/pytorch:latest
```

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders), the default shared memory segment size that the container runs with is not enough; you should increase the shared memory size with either the --ipc=host or --shm-size command-line option to nvidia-docker run.
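
A minimal sketch of why this matters; the dataset and sizes are made up for illustration. Each DataLoader worker is a separate process, and batches travel back to the parent through shared memory:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# A toy in-memory dataset; real workloads would read from disk
ds = TensorDataset(torch.randn(1000, 16), torch.randint(0, 2, (1000,)))

if __name__ == "__main__":  # guard needed when workers are spawned as processes
    # With num_workers > 0 each batch crosses a process boundary via /dev/shm,
    # so an undersized shared memory segment fails here, not in your model code.
    loader = DataLoader(ds, batch_size=32, num_workers=4)
    for xb, yb in loader:
        pass
```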

Building the image yourself

NOTE: Must be built with a docker version > 18.06

The Dockerfile is supplied to build images with CUDA 11.1 support and cuDNN v8. You can pass the PYTHON_VERSION=x.y make variable to specify which Python version is to be used by Miniconda, or leave it unset to use the default.

```
make -f docker.Makefile
# images are tagged as docker.io/${your_docker_username}/pytorch
```

Building the Documentation

To build documentation in various formats, you will need Sphinx and the readthedocs theme.

```
cd docs/
pip install -r requirements.txt
```

You can then build the documentation by running make <format> from the docs/ folder. Run make to get a list of all available output formats.

If you get a katex error, run npm install katex. If it persists, try npm install -g katex.

Previous Versions

Installation instructions and binaries for previous PyTorch versions may be found on our website.

Getting Started

Three pointers to get you started:

  • Tutorials: get you started with understanding and using PyTorch
  • Examples: easy-to-understand PyTorch code across all domains
  • The API Reference

Resources

Communication

Releases and Contributing

PyTorch has a 90-day release cycle (major releases). Please let us know if you encounter a bug by filing an issue.

We appreciate all contributions. If you are planning to contribute back bug-fixes, please do so without any further discussion.

If you plan to contribute new features, utility functions, or extensions to the core, please first open an issue and discuss the feature with us. Sending a PR without discussion might end up resulting in a rejected PR because we might be taking the core in a different direction than you might be aware of.

To learn more about making a contribution to PyTorch, please see our Contribution page.

The Team

PyTorch is a community-driven project with several skillful engineers and researchers contributing to it.

PyTorch is currently maintained by Adam Paszke, Sam Gross, Soumith Chintala and Gregory Chanan with major contributions coming from hundreds of talented individuals in various forms and means. A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Koepf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary DeVito.

Note: This project is unrelated to hughperkins/pytorch with the same name. Hugh is a valuable contributor to the Torch community and has helped with many things Torch and PyTorch.

License

PyTorch has a BSD-style license, as found in the LICENSE file.