Mirror of https://github.com/saymrwulf/pytorch.git, synced 2026-05-15 21:00:47 +00:00
Summary: Use data_parallel_model for seq2seq multi-GPU training. The main reason for the complexity here is that GatherOp has not yet been implemented on GPU. This diff also adds a better clipping procedure: clip by global norm rather than by absolute value. Differential Revision: D4778691 fbshipit-source-id: bff184dae02ecc227413fef51f48a4726e5d3825
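The global-norm clipping mentioned in the summary can be sketched as follows. This is a minimal NumPy illustration of the general technique, not the actual Caffe2 implementation; the function name and epsilon constant are assumptions for the sketch.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm, eps=1e-6):
    """Scale all gradient tensors jointly so their combined L2 norm
    does not exceed max_norm (hypothetical helper, for illustration)."""
    # Global L2 norm across every gradient tensor at once
    global_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    # Scale factor is 1.0 when already within the limit, else shrink
    scale = min(1.0, max_norm / (global_norm + eps))
    return [g * scale for g in grads]
```

Unlike clipping each value by an absolute threshold, this preserves the relative direction of the full gradient: every tensor is scaled by the same factor, so only the overall magnitude changes.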
| Name |
|---|
| char_rnn.py |
| lmdb_create_example.py |
| resnet50_trainer.py |
| seq2seq.py |
| seq2seq_util.py |