strengthen gloo_test by checking for success

Summary: Weakness in gloo_test led to an embarrassing diff review (D5494956): my test "succeeded" although each of the workers failed hard in an assertion. The failure went unnoticed because no exception was raised that could be caught and put into the result queue. Change the logic so that each worker puts a success token into the queue, signaling successful completion.
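
For illustration, the success-token pattern can be sketched outside of gloo_test as below. This is a minimal standalone sketch, not the test's actual code: `run_in_process`, `_worker`, and the three sample functions are hypothetical names, the "fork" start method is an assumption (POSIX only), and `os._exit(1)` stands in for a worker that dies hard without raising a Python exception.

```python
import multiprocessing
import os
import queue as queue_module


def _worker(fn, results):
    # Hypothetical worker wrapper: report either the exception or a
    # success token, so that the parent can tell "finished" from "died".
    try:
        fn()
        results.put(True)
    except Exception as ex:
        results.put(ex)


def run_in_process(fn, timeout=1.0):
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = multiprocessing.get_context("fork")
    results = ctx.Queue()
    proc = ctx.Process(target=_worker, args=(fn, results))
    proc.start()
    proc.join()
    try:
        outcome = results.get(timeout=timeout)
    except queue_module.Empty:
        # The worker exited without reporting anything: treat as failure.
        raise AssertionError("Job failed without a result")
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


def ok():
    pass


def raises():
    raise ValueError("boom")


def dies():
    # Stand-in for a hard failure that raises no Python exception.
    os._exit(1)
```

With this shape, `run_in_process(dies)` fails because the queue stays empty, whereas a check that only looks for queued exceptions would have let it pass.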

Reviewed By: pietern

Differential Revision: D5503760

fbshipit-source-id: f2415bcc55638595cefa5d64dea811d86e77f24d
diff --git a/caffe2/contrib/gloo/gloo_test.py b/caffe2/contrib/gloo/gloo_test.py
index 79841b0..4f937de 100755
--- a/caffe2/contrib/gloo/gloo_test.py
+++ b/caffe2/contrib/gloo/gloo_test.py
@@ -50,6 +50,7 @@
                 with core.DeviceScope(device_option):
                     fn(*args, **kwargs)
                     workspace.ResetWorkspace()
+                    queue.put(True)
             except Exception as ex:
                 queue.put(ex)
 
@@ -67,14 +68,19 @@
         while len(procs) > 0:
             proc = procs.pop(0)
             while proc.is_alive():
-                proc.join(1)
+                proc.join(10)
 
-                # Raise exception if we find any.
+                # Raise exception if we find any. Otherwise each worker
+                # should put a True into the queue
                 # Note that the following is executed ALSO after
                 # the last process was joined, so if ANY exception
                 # was raised, it will be re-raised here.
-                if not queue.empty():
-                    raise queue.get()
+                self.assertFalse(queue.empty(), "Job failed without a result")
+                o = queue.get()
+                if isinstance(o, Exception):
+                    raise o
+                else:
+                    self.assertTrue(o)
 
     def run_test_distributed(self, fn, device_option=None, **kwargs):
         comm_rank = os.getenv('COMM_RANK')