PARALLEL SILENCE CODING ALGORITHMS ON GPUS



SLIDE 1

PARALLEL SILENCE CODING ALGORITHMS ON GPUS

John Cheng and Nanxun Dai
BGP International Inc, R&D Center
April 5, 2016

April 4-7, 2016 | Silicon Valley

SLIDE 2

CONTENTS

1. Silence Encoding
2. Algorithm Conversion from Serial to Parallel
3. Implementation on GPUs with CUB

SLIDE 3

SEISMIC DATA COMPRESSION ALGORITHM

- Wavelet Transformation
- Quantization
- Prefix Encoding
- Silence Encoding
- Huffman Encoding

SLIDE 4

[Figure: typical data in wave propagation]

SLIDE 5

AN ILLUSTRATION OF SILENCE ENCODING

Each run of zeros is replaced by a data pair: (zero, run length)
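
The pair encoding above can be sketched as a serial reference routine. This is a minimal illustration, not the authors' code: the function name `silence_encode_serial`, the use of `int` samples, and the flat output layout (the zero and its run length stored as two consecutive values) are all assumptions.

```cpp
#include <cstddef>
#include <vector>

// Serial reference (assumed layout): every run of zeros becomes the pair
// (0, run length); non-zero samples are copied unchanged.
std::vector<int> silence_encode_serial(const std::vector<int>& in) {
    std::vector<int> out;
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != 0) {
            out.push_back(in[i]);
            ++i;
        } else {
            int run = 0;
            while (i < in.size() && in[i] == 0) {  // consume the whole zero run
                ++run;
                ++i;
            }
            out.push_back(0);    // marker
            out.push_back(run);  // length of the run
        }
    }
    return out;
}
```

With this layout, the input 5 0 0 0 7 0 2 encodes to 5 0 3 7 0 1 2. The inner loop is what makes the routine serial: the output position of every element depends on everything before it.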

SLIDE 6

HOW TO MAKE IT RUN IN PARALLEL

- Which thread has the right to write?
- Where should the thread write to?
- What should the thread write?

SLIDE 7

FOR A NON-ZERO-THREAD

- How many zero elements precede it?
- How many zero segments precede it?

SLIDE 8

FOR A ZERO-THREAD

- How many zero elements precede it?
- How many zero segments precede it?
- How many zero elements are in its own zero segment?

SLIDE 9

PREFIX SCAN

- Prefix scan is a key primitive in parallel computation
- All the information we need can be calculated in parallel by using prefix-sum
- Therefore, we can convert the algorithm from serial to parallel

SLIDE 10

THE ILLUSTRATION OF PREFIX SCAN

[Figure: prefix scan, with the partial sum up to index 5 highlighted]
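
To make the two flavours of prefix scan concrete, here is a small host-side sketch. The helper names `inclusive_sum` and `exclusive_sum` and the sample values are illustrative assumptions, not taken from the slide.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = x[0] + ... + x[i].
std::vector<int> inclusive_sum(const std::vector<int>& x) {
    std::vector<int> out(x.size());
    std::partial_sum(x.begin(), x.end(), out.begin());
    return out;
}

// Exclusive scan: out[i] = x[0] + ... + x[i-1], with out[0] = 0.
std::vector<int> exclusive_sum(const std::vector<int>& x) {
    std::vector<int> out(x.size());
    int running = 0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = running;  // sum of everything strictly before i
        running += x[i];
    }
    return out;
}
```

For the input 3 1 7 0 4 1 6 3, the inclusive scan is 3 4 11 11 15 16 22 25 (so the partial sum up to index 5 is 16) and the exclusive scan is 0 3 4 11 11 15 16 22.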

SLIDE 11

CALCULATING PRECEDING ZERO ELEMENTS

[Figure: inclusive prefix-sum of an auxiliary variable]
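
One way to read this slide, sketched below, is that the auxiliary variable is a 0/1 flag marking zero elements, and its inclusive prefix-sum counts the zeros seen so far. The flag definition and the name `count_zeros_inclusive` are assumptions; the figure itself did not survive extraction.

```cpp
#include <cstddef>
#include <vector>

// zeros_upto[i] = number of zeros in in[0..i], including in[i] itself.
// Written as a serial loop here; on a GPU this would be a parallel scan.
std::vector<int> count_zeros_inclusive(const std::vector<int>& in) {
    std::vector<int> zeros_upto(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const int is_zero = (in[i] == 0) ? 1 : 0;  // auxiliary flag (assumed definition)
        running += is_zero;
        zeros_upto[i] = running;
    }
    return zeros_upto;
}
```

For 5 0 0 0 7 0 2 the result is 0 1 2 3 3 4 4. At a non-zero element the value is exactly the number of zeros before it; at a zero element it also counts the element itself.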

SLIDE 12

CALCULATING PRECEDING ZERO SEGMENTS

[Figure: exclusive prefix-sum of an auxiliary variable]
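
A possible realisation of this step is sketched below. It assumes the auxiliary variable flags the last zero of each zero segment (a "tail" flag); the slide does not say which element of a segment is flagged, so this choice and the name `count_segments_exclusive` are assumptions.

```cpp
#include <cstddef>
#include <vector>

// segs_before[i] = number of zero segments that end strictly before index i.
// Written as a serial loop here; on a GPU this would be a parallel scan.
std::vector<int> count_segments_exclusive(const std::vector<int>& in) {
    const std::size_t n = in.size();
    std::vector<int> segs_before(n);
    int running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        segs_before[i] = running;  // exclusive: own flag not yet added
        // Assumed auxiliary flag: 1 at the last zero of a run, else 0.
        const bool tail = (in[i] == 0) && (i + 1 == n || in[i + 1] != 0);
        running += tail ? 1 : 0;
    }
    return segs_before;
}
```

For 5 0 0 0 7 0 2 the tail flags are 0 0 0 1 0 1 0 and the exclusive sum is 0 0 0 0 1 1 2. Flagging the tail rather than the head means a zero inside a segment never counts its own segment.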

SLIDE 13

PARALLEL SILENCE ENCODING ALGORITHM

Step 1: Read global data to shared memory
Step 2: Calculate preceding zero elements with inclusive prefix-sum
Step 3: Calculate preceding zero segments with exclusive prefix-sum
Step 4: Calculate write positions for each thread
Step 5: Write the encoded string to shared memory
Step 6: Write the encoded string to global memory
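
Steps 2 to 5 can be emulated on the host to check the index arithmetic. The sketch below is not the authors' kernel: it assumes the (0, run length) output layout, lets the last zero of each segment write the pair, and recovers run lengths through a small per-segment table (`zeros_at_tail`), all of which are illustrative choices. Steps 1 and 6 (shared and global memory traffic) have no host analogue and are omitted.

```cpp
#include <vector>

std::vector<int> silence_encode_scan(const std::vector<int>& in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> zeros_inc(n), segs_exc(n);

    // Steps 2 and 3: both prefix sums (serial here, block-wide scans on a GPU).
    int zeros = 0, segs = 0;
    for (int i = 0; i < n; ++i) {
        const bool zero = (in[i] == 0);
        const bool tail = zero && (i + 1 == n || in[i + 1] != 0);
        zeros += zero ? 1 : 0;
        zeros_inc[i] = zeros;  // inclusive count of zeros
        segs_exc[i] = segs;    // exclusive count of finished segments
        segs += tail ? 1 : 0;
    }

    // Each tail records the zero count at the end of its segment.
    std::vector<int> zeros_at_tail(segs);
    for (int i = 0; i < n; ++i) {
        const bool tail = (in[i] == 0) && (i + 1 == n || in[i + 1] != 0);
        if (tail) zeros_at_tail[segs_exc[i]] = zeros_inc[i];
    }

    // Steps 4 and 5: every iteration is independent, like one GPU thread.
    std::vector<int> out(n - zeros + 2 * segs);
    for (int i = 0; i < n; ++i) {
        const bool zero = (in[i] == 0);
        const bool tail = zero && (i + 1 == n || in[i + 1] != 0);
        if (zero && !tail) continue;  // interior zeros write nothing
        const int zeros_before = zeros_inc[i] - (zero ? 1 : 0);
        // Non-zeros before i take one slot each, finished segments take two.
        const int pos = (i - zeros_before) + 2 * segs_exc[i];
        if (!zero) {
            out[pos] = in[i];
        } else {
            const int s = segs_exc[i];
            const int prev = (s > 0) ? zeros_at_tail[s - 1] : 0;
            out[pos] = 0;
            out[pos + 1] = zeros_inc[i] - prev;  // run length of this segment
        }
    }
    return out;
}
```

For 5 0 0 0 7 0 2 this produces 5 0 3 7 0 1 2, the same as a straightforward serial run-length pass. The final loop answers the three questions a thread must settle: whether it writes, where, and what.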

SLIDE 14

IMPLEMENTATION WITH CUB

- Warp-wide primitives
- Block-wide primitives
- Device-wide primitives

More info: https://nvlabs.github.io/cub/

SLIDE 15

HOW TO WRAP CUB PRIMITIVES

#include <cub/cub.cuh>

// DIM is the thread-block size, a compile-time constant defined elsewhere.
__device__ __forceinline__ void cub_prefix_sum_exclusive(char in, char& out, char& aggregate)
{
    typedef cub::BlockScan<char, DIM, cub::BLOCK_SCAN_RAKING> BlockScanT;
    __shared__ typename BlockScanT::TempStorage iscan;  // scratch space shared by the block

    char data[1];  // one item per thread
    data[0] = in;

    __syncthreads();
    BlockScanT(iscan).ExclusiveSum(data, data, aggregate);  // aggregate = block-wide total
    __syncthreads();

    out = data[0];
}
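
What the wrapper hands back to each thread can be modelled on the host. In the sketch below, element t of the vectors stands for thread t of the block; the function name `block_exclusive_sum_model` is hypothetical and the loop only mimics the results of a block-wide exclusive sum, not how CUB computes them.

```cpp
#include <cstddef>
#include <vector>

// Host-side model: out[t] is the sum of the inputs of threads 0..t-1,
// and aggregate is the sum over all threads in the block.
void block_exclusive_sum_model(const std::vector<int>& in,
                               std::vector<int>& out,
                               int& aggregate) {
    out.resize(in.size());
    int running = 0;
    for (std::size_t t = 0; t < in.size(); ++t) {
        out[t] = running;
        running += in[t];
    }
    aggregate = running;  // every thread receives the same block-wide total
}
```

When the inputs are per-thread 0/1 flags, the aggregate is the number of flagged elements in the whole block, which is the kind of total a block would need to size its encoded output.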

SLIDE 16

PERFORMANCE OF DIFFERENT KERNELS

[Bar chart: elapsed time in ms (axis 500 to 2500) versus data size (1024x1024x1, 1024x1024x100, 1024x1024x1000) for the naive kernel and the parallel kernel]

SLIDE 17

DIFFERENT CUB ALGORITHMS

Data size: 1024x1024x1000

Algorithm                    Chart value
BLOCK_SCAN_RAKING            323.81
BLOCK_SCAN_RAKING_MEMOIZE    303.05
BLOCK_SCAN_WARP_SCANS        352.57

SLIDE 18

CONCLUSION

- Prefix-sum is an efficient way to convert serial computations to parallel computations
- It is convenient to integrate CUB parallel primitives into your implementation

SLIDE 19

Each subject in the book is treated with a profile-driven approach

SLIDE 20

THANK YOU

John Cheng and Nanxun Dai
BGP International Inc, R&D Center
10630 Haddington Dr., Houston, Texas 77043
rwcheng@bgprdc.com