Use aten's GRAIN_SIZE for TH Tensor ops (#28770)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/28198 in my tests on a 24-core AMD Threadripper.

Profiling the benchmark showed that most of the slowdown in https://github.com/pytorch/pytorch/issues/28198 came from `THFloatTensor_fill` not being distributed across threads. Internally it uses `TH_TENSOR_APPLY_CONTIG`, a thin wrapper around `at::parallel_for`, with `TH_OMP_OVERHEAD_THRESHOLD` (100,000) as the grain size.

Here I've changed it to use `at::internal::GRAIN_SIZE`, which is 32,768, or roughly 1/3 of the old value. I think it makes sense to unify these two values so that any future tuning in `ATen` will apply to `TH` as well. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent, but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before (now 1/2, 1/4 and 1/16, previously 1/2, 1/5 and 1/50).
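
To make the effect of the smaller grain size concrete, here is a minimal standalone sketch of the arithmetic. It does not include any ATen headers: the constants are copied from this summary and the diff, and `eligible_for_parallel` is a hypothetical helper that models only the assumption that a loop is split across threads once it has at least `grain` elements. The real `at::parallel_for` also takes other conditions into account (for example, whether it is already inside a parallel region).

```cpp
#include <cassert>
#include <cstdint>

// Values quoted in this summary and diff (not read from ATen at build time).
constexpr int64_t kOldThreshold = 100000;  // previous TH_OMP_OVERHEAD_THRESHOLD
constexpr int64_t kGrainSize = 32768;      // at::internal::GRAIN_SIZE, per the summary

// New derived thresholds, using the same divisors as the diff.
constexpr int64_t kHyper = kGrainSize / 16;     // 2048  (previously 2000)
constexpr int64_t kOrdin = kGrainSize / 4;      // 8192  (previously 20000)
constexpr int64_t kUncertain = kGrainSize / 2;  // 16384 (previously 50000)

// Hypothetical helper, simplified model (assumption): work is considered
// for parallel execution only when it spans at least `grain` elements.
constexpr bool eligible_for_parallel(int64_t numel, int64_t grain) {
  return numel >= grain;
}

// Example: a 50,000-element fill (size chosen for illustration only).
inline void check_example() {
  const int64_t numel = 50000;
  assert(!eligible_for_parallel(numel, kOldThreshold));  // stayed serial before
  assert(eligible_for_parallel(numel, kGrainSize));      // may be split now
}
```

Under this model, any tensor with between 32,768 and 99,999 elements was previously processed on a single thread and can now be distributed.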

Here are the timing results I get:

| Version    | Full iteration time | `index_select` | `mm`     | `addmm`  |
|:----------:|--------------------:|---------------:|---------:|---------:|
| master     |       3505.85 ms/it |     184.302 ms | 9.520 ms | 8.494 ms |
| no scaling |       3453.18 ms/it |     184.456 ms | 5.810 ms | 5.069 ms |
| this PR    |       3453.23 ms/it |     184.526 ms | 5.824 ms | 5.202 ms |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/28770

Differential Revision: D18202646

Pulled By: ezyang

fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854
diff --git a/aten/src/TH/generic/THTensorApply.hpp b/aten/src/TH/generic/THTensorApply.hpp
index 579a0c3..2ec2a48 100644
--- a/aten/src/TH/generic/THTensorApply.hpp
+++ b/aten/src/TH/generic/THTensorApply.hpp
@@ -4,10 +4,10 @@
   #define NAN (nan(NULL))
 #endif
 
-#define HYPER_TH_OMP_OVERHEAD_THRESHOLD 2000
-#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD 20000
-#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD 50000
-#define TH_OMP_OVERHEAD_THRESHOLD 100000
+#define HYPER_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 16)
+#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 4)
+#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 2)
+#define TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE)
 
 #define TH_CHECK_SAME_SIZE(TENSOR1, TENSOR2) \
 { \