Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-15 21:00:47 +00:00.
Summary: Before this change there were two ways for machines to rendezvous for a distributed run: a shared file system or Redis. On an MPI cluster it is much more convenient to simply execute mpirun and expect the "right thing (tm)" to happen. This change adds an "mpi_rendezvous" option to the CreateCommonWorld operator. When it is set, the common world size and rank are pulled from the MPI context, and the Gloo rendezvous takes place over MPI. Note that this does NOT mean the MPI BTL is used; MPI is used only for rendezvous.

Closes https://github.com/caffe2/caffe2/pull/1190

Reviewed By: akyrola

Differential Revision: D5796060

Pulled By: pietern

fbshipit-source-id: f8276908d3f3afef2ac88594ad377e38c17d0226
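Conceptually, any of the rendezvous backends (file system, Redis, or now MPI) plays the same role: each rank publishes its own address to a shared medium and then reads back every peer's address to form the common world. The sketch below illustrates that pattern with a plain in-memory dict standing in for the shared store; the class and function names are hypothetical and do not match Gloo's or Caffe2's actual APIs.

```python
# Hypothetical sketch of store-based rendezvous. In the real change, the MPI
# context supplies rank/size and carries the address exchange; here a plain
# dict stands in for the shared medium (file system, Redis, or MPI).

class KVStore:
    """Minimal key-value store shared by all ranks (illustrative only)."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data[key]


def rendezvous(store, rank, size, address):
    # Each rank publishes its own address under a well-known key...
    store.set(f"rank/{rank}", address)
    # ...then collects every peer's address to form the common world.
    # (A real implementation would block until all keys are present.)
    return [store.get(f"rank/{r}") for r in range(size)]
```

In a single process we can emulate three ranks by pre-publishing their addresses before any rank gathers:

```python
store = KVStore()
addrs = [f"tcp://node{r}:1234" for r in range(3)]
for r, a in enumerate(addrs):
    store.set(f"rank/{r}", a)
peers = rendezvous(store, 0, 3, addrs[0])
```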
Files in this directory:

- char_rnn.py
- lmdb_create_example.py
- resnet50_trainer.py