Fix batch norm multiplier init (#12325)

Summary:
Fixes #12259
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12325
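
As a minimal sanity check of the new behavior (a sketch, assuming this patch is applied and torch is importable), a freshly constructed BatchNorm layer now starts out as an identity scale-and-shift: gamma (weight) is all ones and beta (bias) is all zeros instead of gamma being drawn from U(0, 1).

    import torch
    import torch.nn as nn

    # Illustrative check of the default affine initialization after this patch.
    bn = nn.BatchNorm2d(8)
    assert torch.equal(bn.weight.data, torch.ones(8))   # gamma == 1 everywhere
    assert torch.equal(bn.bias.data, torch.zeros(8))    # beta == 0 everywhere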

Differential Revision: D10203439

Pulled By: SsnL

fbshipit-source-id: 999cc134a45e2554313adb7eb93ee98e1f84335f
diff --git a/torch/nn/modules/batchnorm.py b/torch/nn/modules/batchnorm.py
index deb280a..393e44b 100644
--- a/torch/nn/modules/batchnorm.py
+++ b/torch/nn/modules/batchnorm.py
@@ -43,7 +43,7 @@
     def reset_parameters(self):
         self.reset_running_stats()
         if self.affine:
-            init.uniform_(self.weight)
+            init.ones_(self.weight)
             init.zeros_(self.bias)
 
     def _check_input_dim(self, input):
@@ -97,8 +97,8 @@
 
     The mean and standard-deviation are calculated per-dimension over
     the mini-batches and :math:`\gamma` and :math:`\beta` are learnable parameter vectors
-    of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are sampled
-    from :math:`\mathcal{U}(0, 1)` and the elements of :math:`\beta` are set to 0.
+    of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are set
+    to 1 and the elements of :math:`\beta` are set to 0.
 
     Also by default, during training this layer keeps running estimates of its
     computed mean and variance, which are then used for normalization during
@@ -169,8 +169,8 @@
 
     The mean and standard-deviation are calculated per-dimension over
     the mini-batches and :math:`\gamma` and :math:`\beta` are learnable parameter vectors
-    of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are sampled
-    from :math:`\mathcal{U}(0, 1)` and the elements of :math:`\beta` are set to 0.
+    of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are set
+    to 1 and the elements of :math:`\beta` are set to 0.
 
     Also by default, during training this layer keeps running estimates of its
     computed mean and variance, which are then used for normalization during
@@ -241,8 +241,8 @@
 
     The mean and standard-deviation are calculated per-dimension over
     the mini-batches and :math:`\gamma` and :math:`\beta` are learnable parameter vectors
-    of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are sampled
-    from :math:`\mathcal{U}(0, 1)` and the elements of :math:`\beta` are set to 0.
+    of size `C` (where `C` is the input size). By default, the elements of :math:`\gamma` are set
+    to 1 and the elements of :math:`\beta` are set to 0.
 
     Also by default, during training this layer keeps running estimates of its
     computed mean and variance, which are then used for normalization during