Diff - fe8804695b^! - platform/external/pytorch

commit	fe8804695bd5a72616a4088ca046af26fed11c16
author	Peter Bell <peterbell10@live.co.uk>	Thu Oct 31 07:17:04 2019 -0700
committer	Facebook Github Bot <facebook-github-bot@users.noreply.github.com>	Thu Oct 31 07:18:46 2019 -0700
tree	85da95c45974f12d750c0887b275e5081ecc3877
parent	9630b78c49596299b7876ee32cbf1002ad326568

Use aten's GRAIN_SIZE for TH Tensor ops (#28770)

Summary:
Fixes https://github.com/pytorch/pytorch/issues/28198 in my tests on a 24 core AMD threadripper.

Profiling the benchmark showed that most of the slowdown in https://github.com/pytorch/pytorch/issues/28198 was from `THFloatTensor_fill` not being distributed across threads. It internally uses `TH_TENSOR_APPLY_CONTIG` which is a thin wrapper around `at::parallel_for` and uses `TH_OMP_OVERHEAD_THRESHOLD` or 100,000 as the grain size.

Here I've changed it to use `at::internal::GRAIN_SIZE` which is 32,768 so ~1/3 of the old value. I think it makes sense to unify these two values so any future tuning in `ATen` will apply to `TH` as well. It's not entirely clear to me what the "uncertain", "ordin" and "hyper" variants are meant to represent but I've kept them at roughly the same ratio to `TH_OMP_OVERHEAD_THRESHOLD` as before.

Here are the timing results I get:

| Version    | Full iteration time | `index_select` | `mm`     | `addmm`  |
|:----------:|--------------------:|---------------:|---------:|---------:|
| master     | 3505.85 ms/it       | 184.302 ms     | 9.520 ms | 8.494 ms |
| no scaling | 3453.18 ms/it       | 184.456 ms     | 5.810 ms | 5.069 ms |
| this PR    | 3453.23 ms/it       | 184.526 ms     | 5.824 ms | 5.202 ms |

Pull Request resolved: https://github.com/pytorch/pytorch/pull/28770

Differential Revision: D18202646

Pulled By: ezyang

fbshipit-source-id: ab30e5ef24e62213f9bd3abace5c6442c75c9854
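To make the effect of the grain size concrete, here is a minimal sketch of the decision it controls. It assumes a simplified rule in which a range is split across threads only when it holds at least `grain_size` elements. The helper `would_parallelize` and the constants `kOldGrainSize` and `kNewGrainSize` are hypothetical names for illustration; this is not the actual `at::parallel_for` implementation, which also considers other conditions such as whether it is already inside a parallel region.

```cpp
#include <cstdint>

// Hypothetical helper, not part of ATen or TH: a simplified model of the
// check a grain size implies. Ranges smaller than the grain size are
// assumed to run serially on the calling thread.
constexpr bool would_parallelize(int64_t numel, int64_t grain_size) {
  return numel >= grain_size;
}

// Values quoted in the commit message.
constexpr int64_t kOldGrainSize = 100000;  // old TH_OMP_OVERHEAD_THRESHOLD
constexpr int64_t kNewGrainSize = 32768;   // at::internal::GRAIN_SIZE

// Under this model, a 50,000-element fill stays serial with the old
// threshold but is eligible for parallel execution with the new one.
static_assert(!would_parallelize(50000, kOldGrainSize),
              "old threshold keeps a 50k-element op serial");
static_assert(would_parallelize(50000, kNewGrainSize),
              "new threshold allows a 50k-element op to be split");
```

Under this simplified model, any operation on a tensor with between 32,768 and 99,999 elements changes from serial to potentially parallel.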

diff --git a/aten/src/TH/generic/THTensorApply.hpp b/aten/src/TH/generic/THTensorApply.hpp
index 579a0c3..2ec2a48 100644
--- a/aten/src/TH/generic/THTensorApply.hpp
+++ b/aten/src/TH/generic/THTensorApply.hpp

@@ -4,10 +4,10 @@
 #define NAN (nan(NULL))
 #endif
 
-#define HYPER_TH_OMP_OVERHEAD_THRESHOLD 2000
-#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD 20000
-#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD 50000
-#define TH_OMP_OVERHEAD_THRESHOLD 100000
+#define HYPER_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 16)
+#define ORDIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 4)
+#define UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE / 2)
+#define TH_OMP_OVERHEAD_THRESHOLD (at::internal::GRAIN_SIZE)
 
 #define TH_CHECK_SAME_SIZE(TENSOR1, TENSOR2) \
 { \
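For reference, the arithmetic behind the new macros can be checked with a short standalone sketch. It assumes `at::internal::GRAIN_SIZE` is 32,768, as stated in the commit message, and stands in for it with a local constant `kGrainSize`. The `Threshold` struct and `print_threshold_table` function are hypothetical names used only for this illustration.

```cpp
#include <cstdint>
#include <cstdio>

// Assumption: stands in for at::internal::GRAIN_SIZE, using the value
// quoted in the commit message (32,768).
constexpr int64_t kGrainSize = 32768;

// Hypothetical record pairing each macro with its old literal value and
// the value its new expression evaluates to.
struct Threshold {
  const char* name;
  int64_t old_value;
  int64_t new_value;
};

constexpr Threshold kThresholds[] = {
    {"HYPER_TH_OMP_OVERHEAD_THRESHOLD", 2000, kGrainSize / 16},
    {"ORDIN_TH_OMP_OVERHEAD_THRESHOLD", 20000, kGrainSize / 4},
    {"UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD", 50000, kGrainSize / 2},
    {"TH_OMP_OVERHEAD_THRESHOLD", 100000, kGrainSize},
};

// Prints one row per macro with the old and new threshold values.
void print_threshold_table() {
  for (const Threshold& t : kThresholds) {
    std::printf("%-36s old=%6lld new=%6lld\n", t.name,
                static_cast<long long>(t.old_value),
                static_cast<long long>(t.new_value));
  }
}
```

With that assumed value, the new thresholds come out to 2,048, 8,192, 16,384 and 32,768. Relative to the base threshold, the old ratios were 1/50, 1/5 and 1/2, and the new ones are 1/16, 1/4 and 1/2, so only the "uncertain" variant keeps exactly the same ratio.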