[docs] [testing] distributed training (#7993)

* distributed training
* fix
* fix formatting
* wording
2026-05-14 20:58:08 +00:00 · 2020-10-26 05:15:05 -07:00 · 2020-10-26 05:15:05 -07:00 · 101186bc1f
commit 101186bc1f
parent c153bcc5c8
1 changed files with 18 additions and 0 deletions
--- a/docs/source/testing.rst
+++ b/docs/source/testing.rst
@@ -451,6 +451,24 @@ Inside tests:


   
+Distributed training
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+``pytest`` can't deal with distributed training directly. If this is attempted, the sub-processes don't do the right thing: they end up thinking they are ``pytest`` and start running the test suite in loops. It works, however, if one spawns a normal process that then spawns off multiple workers and manages the IO pipes.
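
A minimal sketch of the pattern described above, and not the code used in the tests linked in this section: the test launches an ordinary Python child process through the standard ``subprocess`` module and reads its output pipes. The helper name ``run_in_subprocess`` and the inline child script are hypothetical and exist only for illustration; the child here just prints a line, where a real test would run a script that spawns the distributed workers.

```python
import subprocess
import sys


def run_in_subprocess(cmd, timeout=60):
    """Run cmd as a plain child process and capture both of its output pipes.

    Hypothetical helper for illustration; not part of transformers or pytest.
    """
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the pipes to str
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the child's stderr so a failing worker is visible in the test report.
        raise RuntimeError(
            f"{cmd} failed with code {result.returncode}:\n{result.stderr}"
        )
    return result


def test_child_process_runs():
    # Stand-in for a launcher script that would spawn the distributed workers.
    child_script = "print('worker launcher started')"
    result = run_in_subprocess([sys.executable, "-c", child_script])
    assert "worker launcher started" in result.stdout
```

The intent is that ``pytest`` only ever sees one ordinary child process, so any workers are started by that child and not by ``pytest`` itself.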
+
+This is still under development, but you can study two tests that do this successfully:
+
+* `test_seq2seq_examples_multi_gpu.py <https://github.com/huggingface/transformers/blob/master/examples/seq2seq/test_seq2seq_examples_multi_gpu.py>`__ - a ``pytorch-lightning``-running test (it had to use PL's ``ddp`` spawning method, which is the default)
+* `test_finetune_trainer.py <https://github.com/huggingface/transformers/blob/master/examples/seq2seq/test_finetune_trainer.py>`__ - a normal (non-PL) test
+
+To jump right into the execution point, search for the ``execute_async_std`` function in those tests.
+
+You will need at least 2 GPUs to see these tests in action:
+
+.. code-block:: bash
+
+   CUDA_VISIBLE_DEVICES="0,1" RUN_SLOW=1 pytest -sv examples/seq2seq/test_finetune_trainer.py \
+   examples/seq2seq/test_seq2seq_examples_multi_gpu.py


 Output capture