AccelerateHS/accelerate

Performance of highly skewed multidimensional reductions

Open

#140 创建于 2013年12月19日

在 GitHub 查看
 (7 评论) (0 反应) (0 负责人)Haskell (830 star) (112 fork)batch import
good first issuehelp wantedllvm-ptx

描述

Performance of multidimensional reductions is not good when the array is highly skewed. For example, a fold where the number of columns is (innermost dimension) is very small. See also this thread:

https://groups.google.com/forum/#!topic/accelerate-haskell/KAFYUz4Sjsk

Multidimensional reduction uses one thread block per reduction; so an (Z :. m :. n) sized matrix uses m thread blocks. If n is very small, then many threads in the block sit idle. We could change this to a warp-per-reduction style, which is actually the strategy segmented fold uses. This will likely have a negative impact if m is small and n large.

It would be possible to generate both variants and choose dynamically which to execute. That implies compiling four kernels per reduction (because fusion; initial vs. recursive step).

贡献者指南

Performance of highly skewed multidimensional reductions · AccelerateHS/accelerate#140 | Good First Issue