Example workflow for running distributed (SyncSGD) ImageNet training in Flow

Summary:
This diff introduces a simplified ImageNet trainer that uses data_parallel_model to parallelize training over GPUs and nodes in a synchronous manner. Flow's gang scheduling is used to launch the nodes, and data_parallel_model handles the synchronization among the gang members.
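
For illustration, here is a minimal sketch (not the trainer added in this diff) of wiring data_parallel_model up for multi-node synchronous SGD. The rendezvous keys mirror the ones read by the code in the diff below; the store path, gang size, device list, and builder bodies are placeholders.

from caffe2.python import core, workspace, data_parallel_model
from caffe2.python.model_helper import ModelHelper

# Each gang member creates a store handler pointing at shared storage,
# which CreateCommonWorld uses to rendezvous with the other shards.
workspace.RunOperatorOnce(core.CreateOperator(
    "FileStoreHandlerCreate", [], ["store_handler"],
    path="/mnt/shared/rendezvous"))  # hypothetical shared path

rendezvous = {
    "kv_handler": "store_handler",
    "shard_id": 0,       # this node's rank within the gang
    "num_shards": 2,     # gang size, i.e. number of nodes
    "engine": "GLOO",    # collective engine for CreateCommonWorld
}

def add_inputs(model):
    pass  # add the per-GPU data readers here

def forward_pass(model, loss_scale):
    pass  # build the model; return the list of losses

def param_update(model):
    pass  # attach the parameter update ops to each gradient

model = ModelHelper(name="imagenet_trainer")
data_parallel_model.Parallelize_GPU(
    model,
    input_builder_fun=add_inputs,
    forward_pass_builder_fun=forward_pass,
    param_update_builder_fun=param_update,
    devices=[0, 1, 2, 3],   # GPUs on this node
    rendezvous=rendezvous,  # enables cross-node synchronization
)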

This example also uses the operator-per-epoch model, where each epoch produces a checkpoint that is consumed by the follow-up epoch. The CommonWorld op names are also prefixed with the creating net's name, so that ops created for different nets do not collide.
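
A sketch of that checkpointing pattern, assuming Caffe2's Save/Load operators and a hypothetical per-epoch checkpoint path; the real trainer's checkpoint format may differ:

from caffe2.python import core, workspace

def run_epoch(epoch, param_names):
    # Resume from the checkpoint the previous epoch's operator produced.
    if epoch > 0:
        workspace.RunOperatorOnce(core.CreateOperator(
            "Load", [], param_names,
            db="checkpoint_%d.minidb" % (epoch - 1), db_type="minidb"))

    # ... run the training net for one epoch here ...

    # Write the checkpoint that the follow-up epoch's operator consumes.
    workspace.RunOperatorOnce(core.CreateOperator(
        "Save", param_names, [],
        db="checkpoint_%d.minidb" % epoch, db_type="minidb"))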

Reviewed By: salexspb

Differential Revision: D4223384

fbshipit-source-id: 8c2c73f4f6b2fdadb98511075ebbd8426c91eadb
diff --git a/caffe2/python/data_parallel_model.py b/caffe2/python/data_parallel_model.py
index 44fb524..5b75f94 100644
--- a/caffe2/python/data_parallel_model.py
+++ b/caffe2/python/data_parallel_model.py
@@ -231,7 +231,7 @@
         comm_world = net.CreateCommonWorld(
             rendezvous['kv_handler'],
             "iter_cw",
-            name="iter_cw_op",
+            name=net.Proto().name + ".iter_cw_op",
             size=rendezvous['num_shards'],
             rank=rendezvous['shard_id'],
             engine=rendezvous['engine'],
@@ -252,7 +252,7 @@
             comm_world = net.CreateCommonWorld(
                 rendezvous['kv_handler'],
                 "{}_cw".format(param_name),
-                name="{}_cw_op".format(param_name),
+                name=net.Proto().name + ".{}_cw_op".format(param_name),
                 size=rendezvous['num_shards'],
                 rank=rendezvous['shard_id'],
                 engine=rendezvous['engine'],