Efficient Stream Reduction on the GPU
David Roger, Ulf Assarsson, Nicolas Holzschuch
Grenoble University, Chalmers University of Technology, Cornell University
Stream Reduction: Removing unwanted elements
[Figure: input stream and reduced stream]
Applications:
– Ray tracing
– Collision detection
Sequential algorithm (i = 0 initially, j loops over the stream):
if x[j] is valid then
  x[i] = x[j]; i = i+1
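A minimal Python sketch of the sequential algorithm, assuming the stream is a Python list and validity is decided by a caller-supplied predicate (`reduce_sequential` and `is_valid` are illustrative names, not from the talk). One pass compacts the list in place and returns the length of the reduced stream.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def reduce_sequential(x: List[T], is_valid: Callable[[T], bool]) -> int:
    """In-place stream reduction: move every valid x[j] to x[i], keeping order.

    Returns i, the length of the reduced stream x[:i].
    """
    i = 0
    for j in range(len(x)):
        if is_valid(x[j]):
            x[i] = x[j]  # the valid element slides left over the removed ones
            i += 1
    return i


if __name__ == "__main__":
    stream = [3, None, 7, None, None, 1, 4]
    k = reduce_sequential(stream, lambda v: v is not None)
    print(stream[:k])  # [3, 7, 1, 4]
```

Each write position i depends on every earlier element, so this loop cannot be split naively across GPU threads; that dependency is what a prefix sum resolves.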
– We will speak about scatter later
[Figure: input stream, its prefix sum, and the reduced stream]
Prefix sum scan: computes the displacements
Dichotomic search: performs the displacements
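A Python sketch of the two steps, under a convention chosen for illustration (not necessarily the authors'): invalid elements are `None`, and `sum[j]` is the inclusive prefix sum of the invalid flags, so a valid `x[j]` moves left by `sum[j]` and lands at `j - sum[j]`. To keep it short, the displacement is applied here by a direct write (a scatter); the talk performs it on the GPU with a dichotomic search instead. Function names are hypothetical.

```python
from itertools import accumulate
from typing import List, Optional


def displacements(x: List[Optional[int]]) -> List[int]:
    """Prefix sum scan: sum[j] = number of invalid (None) elements in x[0..j]."""
    return list(accumulate(1 if v is None else 0 for v in x))


def apply_displacements(x: List[Optional[int]], disp: List[int]) -> List[int]:
    """Move every valid x[j] left by disp[j], i.e. to position j - disp[j]."""
    kept = len(x) - (disp[-1] if disp else 0)
    out = [0] * kept
    for j, v in enumerate(x):
        if v is not None:
            out[j - disp[j]] = v  # direct write (scatter), for illustration only
    return out


if __name__ == "__main__":
    x = [5, None, 8, None, None, 2]
    d = displacements(x)
    print(d)  # [0, 1, 1, 2, 3, 3]
    print(apply_displacements(x, d))  # [5, 8, 2]
```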
Prefix sum scan on the GPU (total work):
– Hillis and Steele, Horn: O(n log n)
– Blelloch, Sengupta et al., Harris et al.: O(n)
– Sengupta et al. hybrid: O(n)
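For reference, a CPU emulation in Python of the two families of scans listed above: a Hillis and Steele style inclusive scan (log n passes over the whole array, O(n log n) work) and a Blelloch style work-efficient exclusive scan (up-sweep then down-sweep, O(n) work). Each inner loop stands in for one parallel GPU pass; the power-of-two length required by the second function is an assumption of this sketch.

```python
from typing import List


def scan_hillis_steele(a: List[int]) -> List[int]:
    """Inclusive scan in log2(n) passes; each pass touches ~n elements -> O(n log n) work."""
    out = list(a)
    d = 1
    while d < len(out):
        # Every element of a pass reads the previous pass (one GPU pass).
        out = [out[k] + out[k - d] if k >= d else out[k] for k in range(len(out))]
        d *= 2
    return out


def scan_blelloch(a: List[int]) -> List[int]:
    """Exclusive work-efficient scan (up-sweep + down-sweep) -> O(n) work.

    Assumes len(a) is a power of two, as in the classic formulation.
    """
    n = len(a)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    t = list(a)
    d = 1
    while d < n:  # up-sweep: partial sums in a balanced binary tree
        for k in range(2 * d - 1, n, 2 * d):
            t[k] += t[k - d]
        d *= 2
    t[n - 1] = 0
    d = n // 2
    while d >= 1:  # down-sweep: push prefixes back down the tree
        for k in range(2 * d - 1, n, 2 * d):
            t[k - d], t[k] = t[k], t[k] + t[k - d]
        d //= 2
    return t
```

For example, `[1, 2, 3, 4]` scans to `[1, 3, 6, 10]` (inclusive) and `[0, 1, 3, 6]` (exclusive).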
– NV_transform_feedback
– Input stream: vertices in a VBO
– Geometry shader discards NULL elements
– Output stream: vertices in a VBO
– Slow
[Figure: input stream split in blocks → reduction of the blocks → concatenation → reduced stream]
Reduction of the blocks:
– Prefix sum scan
– Dichotomic search
Complexity, with s the size of a block:
– One block: O(s log s)
– n/s blocks: O(n log s)
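A Python sketch of the per-block reduction, assuming invalid elements are `None` and using an inclusive scan of the invalid flags (our convention). `itertools.accumulate` stands in for the GPU scan and `bisect.bisect_left` for the dichotomic search: each of the up to s outputs of a block costs one O(log s) search, which matches the O(s log s) per block. The names `reduce_block` and `reduce_blocks` are illustrative.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Optional, Tuple


def reduce_block(block: List[Optional[int]]) -> List[int]:
    """Reduce one block by gather: a prefix sum, then one binary search per output.

    sums[j] counts the invalid (None) elements in block[0..j]. The function
    f(j) = j - sums[j] is non-decreasing, and the valid element sent to output
    position i is the smallest j with f(j) == i, i.e. bisect_left(f, i).
    """
    sums = list(accumulate(1 if v is None else 0 for v in block))
    f = [j - sums[j] for j in range(len(block))]
    count = len(block) - (sums[-1] if sums else 0)
    # Every output position is independent: one fragment per i on a GPU.
    return [block[bisect_left(f, i)] for i in range(count)]


def reduce_blocks(x: List[Optional[int]], s: int) -> Tuple[List[List[int]], List[int]]:
    """Split x into blocks of size s, reduce each one, and report kept counts."""
    reduced = [reduce_block(x[b:b + s]) for b in range(0, len(x), s)]
    return reduced, [len(r) for r in reduced]
```

With s = 4, the stream `[5, None, 8, None, None, 2, None, 7]` gives the reduced blocks `[5, 8]` and `[2, 7]`, each with count 2.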
Concatenation by line drawing:
– Compute the displacements of the blocks in parallel
– Segment extremities are moved by scattering (vertex engine)
– Other elements are linearly interpolated (rasterization)
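A CPU emulation in Python of this concatenation, assuming each reduced block is stored at the start of its own block of size s and that `counts` holds the number of kept elements per block; the function name and data layout are our own illustration, not the authors' code. An exclusive scan of the counts gives the displacement of each block, and the inner loop plays the role of the rasterizer filling the pixels between the two segment extremities.

```python
from itertools import accumulate
from typing import List


def concatenate_blocks(x: List[int], s: int, counts: List[int]) -> List[int]:
    """Concatenate reduced blocks stored in place in x into one contiguous stream.

    Block b occupies x[b*s : b*s + counts[b]]. Its segment is drawn from output
    position offsets[b] to offsets[b] + counts[b] (extremities placed by scatter);
    pixels in between read the source at a linearly interpolated position.
    """
    offsets = [0] + list(accumulate(counts))[:-1]  # exclusive scan of the counts
    out = [0] * sum(counts)
    for b, (o, c) in enumerate(zip(offsets, counts)):
        src0 = b * s  # source position attached to the first extremity
        for k in range(c):  # "rasterize" the segment pixel by pixel
            out[o + k] = x[src0 + k]
    return out
```

For example, with s = 4, counts = [2, 2] and x = [5, 8, 0, 0, 2, 7, 0, 0], the offsets are [0, 2] and the result is [5, 8, 2, 7].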
[Figure: reduced blocks → reduced stream]
– s is the size of the blocks
– s is a constant, so O(n log s) is linear in n!
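Written out, the cost of the block reduction is the number of blocks times the cost of one block (scan plus one search per element); since the block size is fixed once and does not grow with the stream, the log factor is a constant:

```latex
\frac{n}{s}\cdot O(s\log s) \;=\; O(n\log s), \qquad s \text{ constant} \;\Longrightarrow\; O(n\log s) = O(n)
```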
[Figure: overview: input stream split in blocks → prefix sum scan + dichotomic search (per block) → prefix sum scan + line drawing (concatenation) → reduced stream]
– Dichotomic search is avoided
– Vertex engine: scatter ... but lower efficiency
Gather within one block (positions 0 to 15):
[Figure: input block, its prefix sum, and the reduced block]
– At output position i, search the input position j such that i = j – sum[j]
– Search bounds: i + sum[i] ≤ j ≤ i + sum[15]
– Example: i = 5 gives 6 ≤ j ≤ 11; the search returns j = 8, where sum[8] = 3 (8 – 3 = 5)
lowBound = i + sum[i]; upBound = i + sum[n-1]
if (upBound > n-1) discard
j = (lowBound + upBound) / 2; found = j - sum[j] - i
while (found ≠ 0) { if (found < 0) lowBound = j + 1 else upBound = j - 1; j = (lowBound + upBound) / 2; found = j - sum[j] - i }
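A Python version of this search for one block, assuming `sums` is the inclusive prefix sum of the invalid flags (our convention). Under that convention several j can satisfy `j - sum[j] == i` (an invalid element repeats the value of the element before it), and the valid element is the smallest such j; so, unlike the pseudocode, this sketch does not stop at found = 0 but keeps bisecting down to that smallest j. This tie-breaking and the name `gather_index` are our additions.

```python
from typing import List


def gather_index(sums: List[int], i: int) -> int:
    """Return the input position j gathered by output position i, or -1 to discard.

    sums[j] = number of invalid elements in block[0..j] (inclusive scan).
    Finds the smallest j with j - sums[j] == i, which is the valid element.
    """
    n = len(sums)
    low, up = i + sums[i], i + sums[n - 1]  # search bounds from the slide
    if up > n - 1:
        return -1  # "discard": output position i is past the reduced block
    while low < up:
        j = (low + up) // 2
        if j - sums[j] - i < 0:
            low = j + 1  # j - sums[j] too small: the answer is right of j
        else:
            up = j  # found >= 0: the smallest solution is at j or left of it
    return low


if __name__ == "__main__":
    block = [4, None, 9, 1, None, None, 6, 2]  # hypothetical block, None = invalid
    sums = [0, 1, 1, 1, 2, 3, 3, 3]  # inclusive prefix sum of invalid flags
    print([gather_index(sums, i) for i in range(len(block))])
    # [0, 2, 3, 6, 7, -1, -1, -1]
```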
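lowBound = i + sum[i]; upBound = i + sum[n-1]
if (upBound > n-1) discard
j = (lowBound + upBound) / 2; found = j - sum[j] - i
while (found ≠ 0) { if (found < 0) lowBound = j - found else upBound = j - found; j = (lowBound + upBound) / 2; found = j - sum[j] - i }
The bounds can jump |found| positions past j because j – sum[j] is contracting: it grows by at most 1 when j grows by 1, so every solution is at least |found| positions away from j.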
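The contracting variant in Python, assuming `sums` is the inclusive prefix sum of the invalid flags of one block (our convention). Because several j can satisfy `j - sum[j] == i` under that convention, the sketch keeps narrowing when found = 0 until it reaches the smallest such j, the valid element. Both bound updates are safe because `j - sums[j]` changes by at most 1 per step; the name `gather_index_contracting` is ours.

```python
from typing import List


def gather_index_contracting(sums: List[int], i: int) -> int:
    """Dichotomic search whose bounds jump by |found| (j - sums[j] is contracting).

    sums[j] = number of invalid elements in block[0..j] (inclusive scan).
    Returns the smallest j with j - sums[j] == i (the valid element), or -1.
    """
    n = len(sums)
    low, up = i + sums[i], i + sums[n - 1]
    if up > n - 1:
        return -1  # discard: output position i is past the reduced block
    while low < up:
        j = (low + up) // 2
        found = j - sums[j] - i
        if found < 0:
            low = j - found  # i is at least -found steps to the right of j
        else:
            up = j - found  # j - found already satisfies j - sums[j] >= i
    return low


if __name__ == "__main__":
    sums = [0, 1, 1, 1, 2, 3, 3, 3]  # hypothetical block [4, None, 9, 1, None, None, 6, 2]
    print([gather_index_contracting(sums, i) for i in range(8)])
    # [0, 2, 3, 6, 7, -1, -1, -1]
```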
Concatenation:
– Split all segments in two
– Use the geometry engine to split only when necessary
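The slides do not spell out why segments need splitting. One plausible reading, which this sketch assumes, is that the stream is laid out row by row in a 2D texture of width W, so a segment whose destination range crosses the end of a row cannot be drawn as a single line. The hypothetical `split_segment` below cuts a destination range [offset, offset + count) into per-row segments; when count ≤ W it yields at most two, which is consistent with "split all segments in two".

```python
from typing import List, Tuple

# (row, first column, last column + 1, source index of the first pixel)
Segment = Tuple[int, int, int, int]


def split_segment(offset: int, count: int, src: int, width: int) -> List[Segment]:
    """Split the destination range [offset, offset + count) of one block into
    horizontal segments that each stay inside one row of a width-wide texture.

    Assumption (not from the slides): the output stream is stored row by row.
    """
    segments: List[Segment] = []
    while count > 0:
        row, col = divmod(offset, width)
        length = min(count, width - col)  # stop at the end of the current row
        segments.append((row, col, col + length, src))
        offset += length
        src += length
        count -= length
    return segments
```

For instance, with a row width of 8, a block of 5 elements placed at offset 6 becomes two segments: columns 6 to 7 of row 0 and columns 0 to 2 of row 1.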
[Figure: input stream split in blocks → reduction of the blocks → concatenation → reduced stream]
Reduction of the blocks:
– Sum scan + search: O(n log s)
– Sequential algorithm (loop over the block): O(n)
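The sequential alternative for the block reduction can be sketched in Python as follows, assuming `None` marks invalid elements; the function name is ours. The outer loop over blocks is the part that would run in parallel, and the output layout (each reduced block at the start of its own block, plus a per-block count) is what a concatenation step then consumes. Every element is visited once, hence O(n) total work.

```python
from typing import List, Optional, Tuple


def reduce_blocks_sequential(
    x: List[Optional[int]], s: int
) -> Tuple[List[Optional[int]], List[int]]:
    """Reduce each block of size s with a sequential loop over the block.

    Each reduced block is written in place at the start of its own block;
    the returned counts give how many elements each block kept.
    """
    y = list(x)
    counts: List[int] = []
    for start in range(0, len(y), s):  # independent blocks: parallel on a GPU
        i = start
        for j in range(start, min(start + s, len(y))):  # sequential inside a block
            if y[j] is not None:
                y[i] = y[j]
                i += 1
        counts.append(i - start)
    return y, counts
```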
Concatenation:
– Sum scan (Harris et al. or Sengupta et al.) + scatter
– Expected speedup ≥ 2.5
– We don't compete with them, we use them!
– O(n) vs. O(n log n)