Improved the performance of reductions on CUDA devices
2 files changed