mirror of https://github.com/saymrwulf/pytorch.git synced 2026-05-14 20:57:59 +00:00

Tensors and Dynamic neural networks in Python with strong GPU acceleration


Wei Zhang 1d4e996b87 Separate parameter downloading tasks from training tasks and run them in a different group		2018-01-22 14:04:12 -08:00

Summary: At the end of distributed training, the trainer needs to download the parameters back from the parameter servers in order to save the model. Currently, this parameter downloading happens at the end of the job's epoch task group, which creates several problems when checkpointing is enabled for distributed training:

1. When checkpointing is enabled, we run multiple training epochs. At the end of each epoch, the model download tasks run to collect parameters, but the model is not saved until the true end of training, so the intermediate downloads waste resources.
2. After trainer0 downloads the parameters, they take up a lot of memory, so trainer0 can easily run out of memory in the next training epoch.

Our solution is to insert a parameter download task group between the job's training epoch_group and the job's exit_group.

Reviewed By: azzolini
Differential Revision: D6765393
fbshipit-source-id: 5a4f556fc3c1cd7834a7c406a3c0de3fccd50c49

.github	Add placeholders for issues/pull requests	2017-12-11 14:35:25 -08:00
.jenkins	Semi-automatically generate scripts out of our tutorials	2018-01-19 22:36:47 -08:00
.travis	Run build_android.sh in Jenkins	2017-11-21 15:53:38 -08:00
caffe/proto	cmake: relative paths for install()	2017-08-22 09:52:09 -07:00
caffe2	Separate parameter downloading tasks from training tasks and run them in a different group	2018-01-22 14:04:12 -08:00
cmake	Checking performance flags during init.	2018-01-22 14:04:11 -08:00
conda	Adapting conda build to work for ubuntu and adding a flag to control precedence of Anaconda include dirs	2018-01-11 12:01:04 -08:00
docker	Add doxygen and graphviz to Jenkins docker base.	2018-01-19 15:05:45 -08:00
docs	Build doxygen docs with cmake and fix catalog generation	2018-01-18 18:47:59 -08:00
modules	Enable the detectron module in cmake	2018-01-18 10:21:22 -08:00
scripts	Adding a separate script for anaconda builds	2018-01-18 16:03:45 -08:00
third_party	Bump gloo	2018-01-04 17:49:21 -08:00
.gitattributes	Fix linguist detection with gitattribute overrides	2017-10-23 17:03:07 -07:00
.gitignore	Misc Windows lint	2017-12-23 20:07:27 -08:00
.gitmodules	Adding zstd to build	2017-11-13 22:18:44 -08:00
.travis.yml	disable travis webhook as we are moving to jenkins as CI	2018-01-02 14:42:15 -08:00
appveyor.yml	Fix a few typos and grammars in comment	2017-06-14 18:22:39 -07:00
CMakeLists.txt	Checking performance flags during init.	2018-01-22 14:04:11 -08:00
LICENSE	Re-license to Apache	2017-09-28 16:22:00 -07:00
Makefile
NOTICE	Re-license to Apache	2017-09-28 16:22:00 -07:00
README.md	Remove request for proposal link from README.md	2018-01-04 09:11:05 -08:00
release-notes.md
setup.py	OSError will be raised in setup.py if "git" is not installed	2018-01-22 14:04:10 -08:00
VERSION_NUMBER	Add setup.py	2017-11-17 12:22:52 -08:00
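
The latest commit message above describes running the parameter download as its own task group, placed between the job's training epoch_group and its exit_group, so that parameters are fetched once rather than after every epoch. The ordering can be illustrated with a minimal toy model in plain Python. This is a sketch under stated assumptions, not Caffe2 code: `TaskGroup`, `make_task`, `run_job`, and `download_group` are hypothetical names introduced here for illustration, and only `epoch_group` and `exit_group` are taken from the commit message.

```python
# Toy model only: TaskGroup, make_task, run_job, and download_group are
# hypothetical names for illustration and are NOT Caffe2's actual API.
from dataclasses import dataclass, field
from typing import Callable, List

Task = Callable[[List[str]], None]


@dataclass
class TaskGroup:
    """A named list of tasks that run in order."""
    name: str
    tasks: List[Task] = field(default_factory=list)

    def run(self, log: List[str]) -> None:
        for task in self.tasks:
            task(log)


def make_task(label: str) -> Task:
    """Build a task that only records its label when it runs."""
    def task(log: List[str]) -> None:
        log.append(label)
    return task


def run_job(num_epochs: int) -> List[str]:
    """Run the epoch group once per epoch, then download, then exit."""
    epoch_group = TaskGroup("epoch_group", [make_task("train")])
    # Assumed name for the separate group the commit message describes.
    download_group = TaskGroup("download_group", [make_task("download_params")])
    exit_group = TaskGroup("exit_group", [make_task("save_model")])

    log: List[str] = []
    for _ in range(num_epochs):
        epoch_group.run(log)   # no parameter download inside the epoch loop
    download_group.run(log)    # runs exactly once, after the last epoch
    exit_group.run(log)
    return log


if __name__ == "__main__":
    print(run_job(3))
    # ['train', 'train', 'train', 'download_params', 'save_model']
```

In this toy ordering the download step runs once regardless of the number of epochs, which corresponds to the two problems listed in the commit message: no repeated downloads, and no downloaded parameters held in memory during later epochs.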

README.md

Caffe2

Caffe2 is a lightweight, modular, and scalable deep learning framework. Building on the original Caffe, Caffe2 is designed with expression, speed, and modularity in mind.

Questions and Feedback

Please use GitHub issues (https://github.com/caffe2/caffe2/issues) to ask questions, report bugs, and request new features.

Please participate in our survey (https://www.surveymonkey.com/r/caffe2). We will send you information about new releases and special developer events/webinars.

License

Caffe2 is released under the Apache 2.0 license. See the NOTICE file for details.

Further Resources on Caffe2.ai