Add __syncthreads() between CUB reductions for elementwise linear gradient kernel

Summary: Thanks to ezyang, I now know that if a CUB TempStorage is reused for a second reduction, a __syncthreads() is needed between the two uses. Added this to the elementwise linear gradient kernel.
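
For context, a minimal standalone sketch of the pattern (kernel and variable names are hypothetical, not the actual Caffe2 kernel): two block-wide BlockReduce sums share one TempStorage, so a __syncthreads() must sit between them, otherwise the second reduction can start overwriting the shared storage while some threads are still reading the result of the first.

    #include <cub/cub.cuh>

    template <int BLOCK_SIZE>
    __global__ void TwoSumsKernel(const float* a, const float* b, float* out, int n) {
      typedef cub::BlockReduce<float, BLOCK_SIZE> BlockReduce;
      __shared__ typename BlockReduce::TempStorage temp_storage;

      int i = blockIdx.x * BLOCK_SIZE + threadIdx.x;
      float va = (i < n) ? a[i] : 0.f;
      float vb = (i < n) ? b[i] : 0.f;

      // First block-wide reduction using the shared temp storage.
      float a_sum = BlockReduce(temp_storage).Sum(va);
      __syncthreads();  // required before reusing temp_storage
      // Second reduction reuses the same temp storage.
      float b_sum = BlockReduce(temp_storage).Sum(vb);

      if (threadIdx.x == 0) {
        out[2 * blockIdx.x] = a_sum;
        out[2 * blockIdx.x + 1] = b_sum;
      }
    }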

Reviewed By: wickedfoo, ezyang

Differential Revision: D4949250

fbshipit-source-id: fbbbd336a962a51be43784207105cadd391a8ef2
diff --git a/caffe2/operators/elementwise_linear_op.cu b/caffe2/operators/elementwise_linear_op.cu
index ebd9ead..18efb71 100644
--- a/caffe2/operators/elementwise_linear_op.cu
+++ b/caffe2/operators/elementwise_linear_op.cu
@@ -37,6 +37,7 @@
   __shared__ typename BlockReduce::TempStorage temp_storage;
 
   float g_a_sum_tot = BlockReduce(temp_storage).Sum(g_a_sum);
+  __syncthreads();
   float g_b_sum_tot = BlockReduce(temp_storage).Sum(g_b_sum);
 
   if (threadIdx.x == 0) {