commit | 96a5e88d630b8d99c151329f018c0bda76c3a34a | |
---|---|---|
author | Aapo Kyrola <akyrola@fb.com> | Thu Dec 01 15:05:24 2016 -0800 |
committer | Bram Wasti <bwasti@dev11999.prn1.facebook.com> | Mon Dec 05 11:53:26 2016 -0800 |
tree | ad27e92b88d824842bc9830e0ac98b2e6a9fa19e | |
parent | 3125e6a821f00c402d6cd2b4beae9f9eb731ebee | |
Fix consecutive checkpoint syncs

Summary: Switching to Pieter-MPI changed the way we set up the network between operators. To synchronize parameters after a checkpoint load, we ran a checkpoint_net that contained operators for creating the common world as well as broadcast operators. Unfortunately, this fails when the checkpoint sync is done a second time, because we would create a duplicate common world. The solution is to split the common world op and the broadcast ops into an init net and the actual broadcasting net, and to run the init net only once. This problem did not arise in the Flow version since I did only one checkpoint load per operator (process).

Differential Revision: D4251754

fbshipit-source-id: ba030579e651e529e29bbf2d27920075078d8ff9
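The run-once split described in the summary can be illustrated with a small, framework-free sketch. It does not use the Caffe2 API. The names `CommonWorldRegistry`, `CheckpointSync`, and `checkpoint_cw` are hypothetical stand-ins, and the "broadcast" here only copies a dictionary. The point is the control flow: the init step runs on the first sync only, while the broadcast step runs on every sync.

```python
class CommonWorldRegistry:
    """Toy stand-in for the runtime that owns common worlds (hypothetical)."""

    def __init__(self):
        self._worlds = set()

    def create(self, name):
        # Creating the same common world twice is an error, mirroring the bug.
        if name in self._worlds:
            raise RuntimeError("duplicate common world: " + name)
        self._worlds.add(name)


class CheckpointSync:
    """Runs the init step once and the broadcast step on every sync."""

    def __init__(self, registry, world_name="checkpoint_cw"):
        self._registry = registry
        self._world_name = world_name
        self._initialized = False
        self.broadcast_count = 0

    def _run_init_net(self):
        # Stand-in for the init net that holds the common world op.
        self._registry.create(self._world_name)

    def _run_broadcast_net(self, params):
        # Stand-in for the broadcast ops; here it only copies the values.
        self.broadcast_count += 1
        return dict(params)

    def sync(self, params):
        # The init net runs on the first sync only; later syncs skip it.
        if not self._initialized:
            self._run_init_net()
            self._initialized = True
        return self._run_broadcast_net(params)
```

Calling `sync` twice on the same `CheckpointSync` object succeeds in this sketch, whereas calling `create` a second time on the registry with the same name raises, which is the failure mode the commit avoids.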
Caffe2 is a deep learning framework made with expression, speed, and modularity in mind. It is an experimental refactoring of Caffe, and allows a more flexible way to organize computation.
See the installation instructions for details.
Caffe2 is released under the BSD 2-Clause license.