commit | ca3968c37377458af7fcf1cab02a52f83782689b | |
---|---|---|
author | Andreas Fidjeland <andreas.fidjeland@gmail.com> | Sat May 04 11:43:00 2013 +0100 |
committer | Andreas Fidjeland <andreas.fidjeland@gmail.com> | Fri May 10 17:35:02 2013 +0100 |
tree | 29671093e0c190e8c6f10c4a2215c4fbee8b2ef2 | |
parent | 88b54ce3794f20e1ca91db161cdf12f254c75d2a | |
Added single-dimensional sum for CudaTensor, i.e. summation of the form y:sum(x,1). Tensors of up to 4 dimensions are supported in a single kernel call; more dimensions could be added if needed by looping over this kernel. Internally, two generic reduction kernels are used: one reduces the innermost dimension, the other reduces one of the outer dimensions. In either case global memory accesses are fully coalesced.
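The index arithmetic behind a single-dimension sum can be sketched on the CPU. This is an illustrative reference only, not the kernels from this commit: `sumDim` is a hypothetical helper, it assumes a contiguous row-major tensor, and it uses a 0-based `dim` (Torch's Lua API is 1-based, so `y:sum(x,1)` corresponds to `dim = 0` here). The tensor is viewed as `(outer, reduce, inner)`. When an outer dimension is reduced, consecutive `i` values are adjacent in memory, so a GPU kernel could plausibly assign one thread per `i` to keep accesses coalesced. When the innermost dimension is reduced (`inner == 1`), consecutive `r` values are adjacent instead, so the threads of a block would stride over `r` and combine partial sums.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of cutorch: sums a contiguous
// row-major tensor `x` with shape `sizes` along dimension `dim` (0-based).
// The result has the same shape as `x` with dimension `dim` collapsed.
std::vector<float> sumDim(const std::vector<float>& x,
                          const std::vector<std::size_t>& sizes,
                          std::size_t dim) {
    assert(dim < sizes.size());

    // View the tensor as (outer, reduce, inner).
    std::size_t outer = 1;
    std::size_t inner = 1;
    for (std::size_t d = 0; d < dim; ++d) outer *= sizes[d];
    for (std::size_t d = dim + 1; d < sizes.size(); ++d) inner *= sizes[d];
    const std::size_t reduce = sizes[dim];
    assert(x.size() == outer * reduce * inner);

    std::vector<float> y(outer * inner, 0.0f);
    for (std::size_t o = 0; o < outer; ++o) {
        for (std::size_t r = 0; r < reduce; ++r) {
            // Consecutive i are adjacent in memory; if inner == 1
            // (innermost-dimension case), consecutive r are adjacent instead.
            for (std::size_t i = 0; i < inner; ++i) {
                y[o * inner + i] += x[(o * reduce + r) * inner + i];
            }
        }
    }
    return y;
}
```

For a 2x3 tensor holding 1..6 in row-major order, `sumDim(x, {2, 3}, 0)` takes the outer-dimension path and sums the columns, while `sumDim(x, {2, 3}, 1)` takes the innermost-dimension path and sums each row.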