Example workflow for running distributed (sync SGD) ImageNet training in Flow

Summary:
This diff introduces a simplified ImageNet trainer that uses data_parallel_model to parallelize training over GPUs and nodes in a synchronous manner. Flow's gang scheduling launches the nodes, and data_parallel_model handles synchronization among the gang members.
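To illustrate the synchronous part, here is a minimal, self-contained sketch of sync SGD semantics: each worker computes a gradient on its own data shard, the gradients are averaged, and every worker applies the same averaged update. This mirrors what data_parallel_model does across GPUs and nodes; the names `worker_gradient` and `sync_sgd_step` are illustrative, not Caffe2 or Flow APIs.

```python
def worker_gradient(w, shard):
    # Gradient of mean squared error 0.5 * (w*x - y)^2 over this worker's shard.
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def sync_sgd_step(w, shards, lr=0.1):
    # Each "worker" computes a local gradient; gradients are averaged
    # (the synchronization step), then one identical update is applied,
    # so all replicas stay bit-for-bit in lockstep.
    grads = [worker_gradient(w, s) for s in shards]
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Two workers, each holding a shard of data generated from y = 2*x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
w = 0.0
for _ in range(200):
    w = sync_sgd_step(w, shards)
print(round(w, 3))  # converges toward 2.0
```

Because every replica sees the same averaged gradient, the result is identical to running plain SGD on the combined data, which is the key property that distinguishes sync SGD from asynchronous variants.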

This example also uses the operator-per-epoch model, where each epoch produces a checkpoint consumed by the follow-up epoch.
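The operator-per-epoch pattern can be sketched as follows: each epoch runs as an independent task that loads the previous epoch's checkpoint (if any), trains, and writes a new checkpoint for the next epoch to consume. The file layout and helper names here are illustrative assumptions, not the actual Flow or Caffe2 interfaces.

```python
import os
import pickle
import tempfile

def checkpoint_path(root, epoch):
    # Hypothetical naming scheme: one checkpoint file per epoch.
    return os.path.join(root, "epoch_%d.pkl" % epoch)

def run_epoch(root, epoch):
    # Resume from the previous epoch's checkpoint, if one exists.
    if epoch > 0:
        with open(checkpoint_path(root, epoch - 1), "rb") as f:
            state = pickle.load(f)
    else:
        state = {"w": 0.0, "epochs_done": 0}
    # Stand-in for one epoch of actual training.
    state["w"] += 1.0
    state["epochs_done"] += 1
    # Persist the result so the follow-up epoch (or a restart) can consume it.
    with open(checkpoint_path(root, epoch), "wb") as f:
        pickle.dump(state, f)
    return state

root = tempfile.mkdtemp()
for epoch in range(3):
    state = run_epoch(root, epoch)
print(state)  # {'w': 3.0, 'epochs_done': 3}
```

A side benefit of this structure is fault tolerance: if an epoch fails, the scheduler can rerun just that epoch from the last completed checkpoint rather than restarting the whole job.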

Reviewed By: salexspb

Differential Revision: D4223384

fbshipit-source-id: 8c2c73f4f6b2fdadb98511075ebbd8426c91eadb
1 file changed