SLIDE 22 Streaming-Oriented Parallelization of Domain-Independent Irregular Kernels University of A Coruña, Spain 16 / 25
Irregular reduction
[Figure: inspector built from the indirection array 1 2 5 6 3 3, shown with 1 indirection level (inspector array only) and with 2 indirection levels (inspector array plus a positions array, 1 2 3 5 5 6 7).]
Many problems have a well-balanced data distribution, but when the distribution is uneven and a few elements receive most of the writes, this implementation wastes space. A more space-efficient alternative uses two arrays: one stores all the inspector data contiguously, and the other points to the position where the data of each element begins.
The drawback of this second approach is that it requires an additional indirection level, which in our tests resulted in worse GPU performance.
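The two inspector layouts can be sketched on the CPU as follows. This is a minimal illustration, not the implementation from the talk: the function names, the 0-based indices, the `-1` padding marker, and the assumption that the inspector stores, for each output element, the iterations that write to it are all hypothetical choices made for the example.

```python
def count_writes(indirection, n_out):
    """Number of iterations that write to each output element."""
    counts = [0] * n_out
    for target in indirection:
        counts[target] += 1
    return counts


def build_padded(indirection, n_out):
    """One indirection level: a fixed-width row per output element.

    Every row is as wide as the busiest element, so skewed inputs waste space.
    """
    counts = count_writes(indirection, n_out)
    width = max(counts, default=0)
    table = [[-1] * width for _ in range(n_out)]  # -1 marks an unused slot
    filled = [0] * n_out
    for iteration, target in enumerate(indirection):
        table[target][filled[target]] = iteration
        filled[target] += 1
    return table, counts


def build_compact(indirection, n_out):
    """Two indirection levels: contiguous data plus a positions array."""
    counts = count_writes(indirection, n_out)
    positions = [0] * (n_out + 1)  # positions[e] = start of element e's data
    for e in range(n_out):
        positions[e + 1] = positions[e] + counts[e]
    data = [0] * len(indirection)
    cursor = positions[:-1]  # slicing copies: next free slot per element
    for iteration, target in enumerate(indirection):
        data[cursor[target]] = iteration
        cursor[target] += 1
    return positions, data


def reduce_padded(table, counts, values):
    """Each output element reads its own row directly."""
    return [sum(values[table[e][k]] for k in range(counts[e]))
            for e in range(len(table))]


def reduce_compact(positions, data, values):
    """Each output element first looks up its range, then reads the data."""
    return [sum(values[data[k]] for k in range(positions[e], positions[e + 1]))
            for e in range(len(positions) - 1)]
```

With the slide's indirection array shifted to 0-based indices, `[0, 1, 4, 5, 2, 2]`, and six output elements, `build_compact` yields the positions `[0, 1, 2, 4, 4, 5, 6]`, which is the slide's positions array minus one. The extra lookup through `positions` in `reduce_compact` is the additional indirection level mentioned above.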