April 4-7, 2016 | Silicon Valley
John Cheng and Nanxun Dai
BGP International Inc, R&D Center April 5, 2016
PARALLEL SILENCE CODING ALGORITHMS ON GPUS
CONTENTS
1 Silence Encoding
2 Algorithm Conversion from Serial to Parallel
Each run of zeros is encoded as a data pair: (zero, its length)
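The pair encoding above can be sketched serially as follows. This is a minimal illustration, not the presenters' code: the function name silence_encode_serial and the use of int samples are assumptions made for the example.

```cpp
#include <cstddef>
#include <vector>

// Serial silence encoding (illustrative sketch, hypothetical name):
// every run of zeros is replaced by the pair (0, run length);
// non-zero samples are copied unchanged.
std::vector<int> silence_encode_serial(const std::vector<int>& in) {
    std::vector<int> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != 0) {
            out.push_back(in[i]);          // non-zero sample: copy as-is
            ++i;
        } else {
            int run = 0;                   // length of the current zero run
            while (i < in.size() && in[i] == 0) {
                ++run;
                ++i;
            }
            out.push_back(0);              // the pair: (zero, its length)
            out.push_back(run);
        }
    }
    return out;
}
```

For the input 1 2 0 0 0 3 4 0 0 5 this sketch yields 1 2 0 3 3 4 0 2 5. The running position in out depends on everything encoded so far, which is the serial dependency a parallel version has to remove.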
Which thread has the right to write?
Where should the thread write to?
What should the thread write?
How many zero elements are before it?
How many zero segments are before it?
How many zero elements are before it?
How many zero segments are before it?
How many zero elements are in its own zero segment?
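One way these counts can determine each element's output position is sketched below in plain C++, where each iteration of the second loop stands in for one GPU thread. The sketch rests on assumptions not stated on the slides: the first zero of each segment is the thread that writes the pair, the write position is i - (zeros before) + 2 * (segments before), and the name silence_encode_by_index is hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Index-based silence encoding (illustrative sketch, hypothetical name).
std::vector<int> silence_encode_by_index(const std::vector<int>& in) {
    const std::size_t n = in.size();
    // zeros_before[i]: zero elements at indices < i.
    // segs_before[i]:  zero segments starting at indices < i.
    // Built serially here; on a GPU a prefix sum would supply them.
    std::vector<std::size_t> zeros_before(n + 1, 0), segs_before(n + 1, 0);
    for (std::size_t i = 0; i < n; ++i) {
        const bool zero = (in[i] == 0);
        const bool head = zero && (i == 0 || in[i - 1] != 0);
        zeros_before[i + 1] = zeros_before[i] + (zero ? 1 : 0);
        segs_before[i + 1] = segs_before[i] + (head ? 1 : 0);
    }
    std::vector<int> out(n - zeros_before[n] + 2 * segs_before[n]);
    // Iterations are independent of each other, like threads.
    for (std::size_t i = 0; i < n; ++i) {
        const bool zero = (in[i] == 0);
        const bool head = zero && (i == 0 || in[i - 1] != 0);
        if (zero && !head) continue;       // no right to write
        // Where to write: drop earlier zeros, add two slots per earlier pair.
        const std::size_t pos = i - zeros_before[i] + 2 * segs_before[i];
        if (!zero) {
            out[pos] = in[i];              // what to write: the sample itself
            continue;
        }
        std::size_t len = 0;               // zeros in this element's own segment
        while (i + len < n && in[i + len] == 0) ++len;
        out[pos] = 0;                      // what to write: (zero, its length)
        out[pos + 1] = static_cast<int>(len);
    }
    return out;
}
```

For the input 1 2 0 0 0 3 4 0 0 5, the element 3 at index 5 has three zeros and one zero segment before it, so it lands at 5 - 3 + 2 = 4. The forward scan for the segment length is kept only for brevity; a real kernel would more likely derive it from the prefix sums as well.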
Partial sum up to index 5
Inclusive Prefix-Sum
Auxiliary variable
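A minimal serial sketch of an inclusive prefix sum follows. It assumes the auxiliary variable is a 0/1 flag per element marking zero samples; that reading, and the names inclusive_prefix_sum and zero_flags, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Inclusive prefix sum: out[i] = in[0] + ... + in[i].
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];                  // element i is included
        out[i] = running;
    }
    return out;
}

// Assumed auxiliary variable: 1 where the sample is zero, else 0.
std::vector<int> zero_flags(const std::vector<int>& samples) {
    std::vector<int> flags(samples.size());
    for (std::size_t i = 0; i < samples.size(); ++i) {
        flags[i] = (samples[i] == 0) ? 1 : 0;
    }
    return flags;
}
```

For the samples 1 2 0 0 0 3 4 0 0 5 the flags are 0 0 1 1 1 0 0 1 1 0 and their inclusive prefix sum is 0 0 1 2 3 3 3 4 5 5, so the partial sum up to index 5 is 3: three zero samples lie at or before that position.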
Exclusive Prefix-Sum
Auxiliary variable
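The exclusive variant can be sketched serially in the same spirit. Here the auxiliary variable is assumed to be a 0/1 flag marking the first zero of each zero segment, so the scan counts the segments that start strictly before each element. The names exclusive_prefix_sum and segment_head_flags are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Exclusive prefix sum: out[i] = in[0] + ... + in[i-1], with out[0] = 0.
// aggregate receives the sum of all inputs.
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in, int& aggregate) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;                  // element i itself is excluded
        running += in[i];
    }
    aggregate = running;
    return out;
}

// Assumed auxiliary variable: 1 where a zero segment starts, else 0.
std::vector<int> segment_head_flags(const std::vector<int>& samples) {
    std::vector<int> flags(samples.size(), 0);
    for (std::size_t i = 0; i < samples.size(); ++i) {
        const bool zero = (samples[i] == 0);
        const bool prev_zero = (i > 0) && (samples[i - 1] == 0);
        flags[i] = (zero && !prev_zero) ? 1 : 0;
    }
    return flags;
}
```

For the samples 1 2 0 0 0 3 4 0 0 5 the head flags are 0 0 1 0 0 0 0 1 0 0, their exclusive prefix sum is 0 0 0 1 1 1 1 1 2 2, and the aggregate is 2, the total number of zero segments.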
More info: https://nvlabs.github.io/cub/
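// Block-wide exclusive prefix sum of one value per thread, using CUB.
// Assumes <cub/cub.cuh> is included, "using namespace cub;" is in effect,
// and DIM is the compile-time number of threads per block.
__device__ __forceinline__ void cub_prefix_sum_exclusive(char in, char& out, char& aggregate)
{
    typedef BlockScan<char, DIM, BLOCK_SCAN_RAKING> BlockScanT;
    __shared__ typename BlockScanT::TempStorage iscan;

    char data[1];
    data[0] = in;
    __syncthreads();
    BlockScanT(iscan).ExclusiveSum(data, data, aggregate);
    __syncthreads();
    out = data[0];  // this thread's exclusive prefix sum
}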
[Figure: Elapsed time in ms versus data size (1024x1024x1, 1024x1024x100, 1024x1024x1000) for the Naive Kernel and the Parallel Kernel]
[Figure: Comparison of CUB block-scan algorithms at data size 1024x1024x1000. Values 323.81, 303.05, 352.57 for BLOCK_SCAN_RAKING, BLOCK_SCAN_RAKING_MEMOIZE, BLOCK_SCAN_WARP_SCANS, in the order listed]
BGP International Inc, R&D Center 10630 Haddington Dr., Houston, Texas 77043 rwcheng@bgprdc.com