Fix consecutive checkpoint syncs

Summary: Switching to Pieter-MPI changed the way we set up the network between operators. To synchronize parameters after a checkpoint load, we ran a checkpoint_net that contained both the operators for creating the common world and the broadcast operators. Unfortunately, this fails when the checkpoint sync is done a second time, because it would create a duplicate common world. The solution is to separate the common world op and the broadcast ops into an init net and the actual broadcasting net, and to run the init net only once. This problem did not arise in the Flow version because I did only one checkpoint load per operator (process).
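
As an illustration only, the init-once pattern described above might be sketched as follows. This is a toy model in plain Python and does not use the Caffe2 API: `FakeWorkspace`, `CheckpointSyncer`, `naive_sync`, and the `checkpoint_cw` name are all hypothetical stand-ins.

```python
class DuplicateCommonWorldError(RuntimeError):
    """Raised when a common world with the same name is created twice."""


class FakeWorkspace:
    """Toy stand-in for a workspace (hypothetical, not the Caffe2 API)."""

    def __init__(self):
        self.worlds = set()
        self.broadcasts = 0

    def create_common_world(self, name):
        # Mimics the failure mode: a second creation is an error.
        if name in self.worlds:
            raise DuplicateCommonWorldError(name)
        self.worlds.add(name)

    def broadcast(self, world_name, params):
        if world_name not in self.worlds:
            raise KeyError(world_name)
        self.broadcasts += 1
        return list(params)


def naive_sync(ws, params, world_name="checkpoint_cw"):
    """Old behavior: one net creates the common world AND broadcasts."""
    ws.create_common_world(world_name)
    return ws.broadcast(world_name, params)


class CheckpointSyncer:
    """New behavior: the init step runs once, the broadcast step every time."""

    def __init__(self, ws, world_name="checkpoint_cw"):
        self.ws = ws
        self.world_name = world_name
        self._initialized = False

    def _run_init_once(self):
        # Plays the role of the init net: create the common world once.
        if not self._initialized:
            self.ws.create_common_world(self.world_name)
            self._initialized = True

    def sync(self, params):
        # Plays the role of the broadcasting net: safe to call repeatedly.
        self._run_init_once()
        return self.ws.broadcast(self.world_name, params)
```

In this sketch, calling `naive_sync` twice on the same workspace raises `DuplicateCommonWorldError`, while `CheckpointSyncer.sync` can be called after every checkpoint load.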

Differential Revision: D4251754

fbshipit-source-id: ba030579e651e529e29bbf2d27920075078d8ff9
1 file changed
tree: ad27e92b88d824842bc9830e0ac98b2e6a9fa19e
  1. caffe/
  2. caffe2/
  3. docs/
  4. third_party/
  5. .Doxyfile
  6. .gitignore
  7. .gitmodules
  8. build.py
  9. build_android.py
  10. build_android_prepare.py
  11. LICENSE
  12. Makefile
  13. README.md
README.md

Caffe2

Caffe2 is a deep learning framework made with expression, speed, and modularity in mind. It is an experimental refactoring of Caffe, and allows a more flexible way to organize computation.

See the installation instructions for details.

License and Citation

Caffe2 is released under the BSD 2-Clause license.