
PARALLEL SILENCE CODING ALGORITHMS ON GPUS, John Cheng and Nanxun Dai



  1. PARALLEL SILENCE CODING ALGORITHMS ON GPUS. John Cheng and Nanxun Dai, BGP International Inc, R&D Center. April 5, 2016 (April 4-7, 2016 | Silicon Valley)

  2. CONTENTS: (1) Silence encoding; (2) Algorithm conversion from serial to parallel; (3) Implementation on GPUs with CUB

  3. SEISMIC DATA COMPRESSION ALGORITHM: wavelet transformation, quantization, prefix encoding, silence encoding, Huffman encoding

  4. A typical data set in wave propagation

  5. AN ILLUSTRATION OF SILENCE ENCODING: each run of zeros is replaced by a pair of values, (zero, its length).
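
A minimal serial reference for the (zero, its length) scheme, written as plain host C++ so it can be checked without a GPU. The element type int and the function name silence_encode_serial are illustrative assumptions; the slides do not show the original serial code or its data types.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial silence encoding (illustrative sketch, hypothetical name):
// non-zero values are copied through unchanged, and each maximal run of
// zeros is replaced by the pair (0, run length).
std::vector<int> silence_encode_serial(const std::vector<int>& in) {
    std::vector<int> out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        if (in[i] != 0) {
            out.push_back(in[i]);  // non-zero element: copy as-is
            ++i;
        } else {
            std::size_t run = 0;   // length of this zero segment
            while (i < in.size() && in[i] == 0) {
                ++run;
                ++i;
            }
            out.push_back(0);                      // the "zero" marker
            out.push_back(static_cast<int>(run));  // its length
        }
    }
    return out;
}
```

With this sketch, {3, 0, 0, 0, 5, 0, 7} encodes to {3, 0, 3, 5, 0, 1, 7}: the three-zero run becomes 0, 3 and the single zero becomes 0, 1. Because a raw zero never appears on its own in the output, a decoder can always read the value after a 0 as a run length.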

  6. HOW TO MAKE IT RUN IN PARALLEL: Which thread has the right to write? Where should the thread write to? What should the thread write?
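
One possible answer to the first question, sketched in host C++ with one index standing in for one thread: a non-zero thread writes one value, the first zero of each zero segment writes the two-value pair, and the remaining zero threads write nothing. Letting the segment's first zero (rather than, say, its last) do the writing is an assumption, and slots_written is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-thread rule, one thread per input element i:
//  - a non-zero thread always writes its own value (1 slot);
//  - a zero thread writes only if it starts a zero segment, and then
//    emits the pair (0, length) (2 slots);
//  - every other zero thread writes nothing (0 slots).
int slots_written(const std::vector<int>& x, std::size_t i) {
    assert(i < x.size());
    if (x[i] != 0) return 1;
    const bool starts_segment = (i == 0) || (x[i - 1] != 0);
    return starts_segment ? 2 : 0;
}
```

Summing the per-thread slot counts gives the encoded length, which is 7 for {3, 0, 0, 0, 5, 0, 7} (slot counts 1, 2, 0, 0, 1, 2, 1).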

  7. FOR A NON-ZERO THREAD: How many zero elements are before it? How many zero segments are before it?
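
Assuming 0-based indices, and that every earlier zero segment occupies two output slots for its (zero, length) pair, the two counts on this slide determine where a non-zero thread writes. The symbols p_i (write position), z_i, and s_i are introduced here for illustration:

```latex
p_i = \underbrace{(i - z_i)}_{\text{non-zero elements before } i}
    + \underbrace{2\,s_i}_{\text{two slots per earlier zero segment}},
\qquad z_i = \#\{\, j < i : x_j = 0 \,\},
\qquad s_i = \#\{\text{zero segments before } i\}

\text{Example: } x = (3, 0, 0, 0, 5, 0, 7),\; i = 4:\quad
z_4 = 3,\; s_4 = 1,\; p_4 = (4 - 3) + 2 \cdot 1 = 3
```

Index 3 of the encoded string (3, 0, 3, 5, 0, 1, 7) is indeed the value 5.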

  8. FOR A ZERO THREAD: How many zero elements are before it? How many zero segments are before it? How many zero elements are in its own zero segment?
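
For a zero thread that starts a segment, the same position formula as for a non-zero thread can be applied, and the thread additionally needs its own segment length. The host C++ sketch below (hypothetical names ZeroThreadInfo and zero_thread_info) gathers the three counts by naive O(n) counting per thread, purely to make the quantities concrete; it is not how a GPU kernel would obtain them.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// The three counts a segment-starting zero thread needs, plus its write
// position (hypothetical struct for illustration).
struct ZeroThreadInfo {
    std::size_t zeros_before;     // zero elements before i
    std::size_t segments_before;  // zero segments before i (all have ended)
    std::size_t own_length;       // zeros in the segment that starts at i
    std::size_t write_pos;        // where the pair (0, own_length) goes
};

// Naive per-thread counting; precondition: x[i] is the first zero of a segment.
ZeroThreadInfo zero_thread_info(const std::vector<int>& x, std::size_t i) {
    assert(i < x.size() && x[i] == 0 && (i == 0 || x[i - 1] != 0));
    ZeroThreadInfo info{0, 0, 0, 0};
    for (std::size_t j = 0; j < i; ++j) {
        if (x[j] == 0) {
            ++info.zeros_before;
            if (j == 0 || x[j - 1] != 0) ++info.segments_before;  // j starts a segment
        }
    }
    for (std::size_t j = i; j < x.size() && x[j] == 0; ++j) ++info.own_length;
    info.write_pos = (i - info.zeros_before) + 2 * info.segments_before;
    return info;
}
```

For {3, 0, 0, 0, 5, 0, 7}, the thread at index 5 sees 3 preceding zeros and 1 preceding segment, has a segment length of 1, and writes its pair at position 4.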

  9. PREFIX SCAN: Prefix scan can be considered a key primitive in parallel computation. All the information we need can be calculated in parallel using prefix-sum; therefore, we can convert the algorithm from serial to parallel.
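
To make the "calculated in parallel" claim concrete, here is one textbook parallel prefix sum, the Hillis-Steele scan, simulated on the host in C++. Each inner-loop iteration stands for one thread, and the double buffer stands in for a barrier between rounds. This is an illustrative choice, not necessarily the scheme CUB uses internally; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hillis-Steele inclusive scan, simulated serially. In each of the
// ceil(log2 n) rounds, every "thread" i adds the value `offset` places to
// its left. Reading from `prev` and writing to `next` mimics a barrier.
std::vector<int> hillis_steele_inclusive_scan(std::vector<int> prev) {
    const std::size_t n = prev.size();
    std::vector<int> next(n);
    for (std::size_t offset = 1; offset < n; offset *= 2) {
        for (std::size_t i = 0; i < n; ++i) {  // conceptually parallel
            next[i] = (i >= offset) ? prev[i] + prev[i - offset] : prev[i];
        }
        prev.swap(next);  // end of round: results become the next input
    }
    return prev;
}
```

For {1, 2, 3, 4}, the first round produces {1, 3, 5, 7} and the second {1, 3, 6, 10}. The scheme does O(n log n) additions in O(log n) rounds, trading extra work for fewer sequential steps.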

  10. AN ILLUSTRATION OF PREFIX SCAN: the partial sum up to index 5
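
The two scan flavors used in the following slides can be illustrated with the C++17 standard algorithms std::inclusive_scan and std::exclusive_scan from <numeric>. The wrapper names below are hypothetical, and the input in the usage note is illustrative, not the slide's data.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Inclusive scan: out[i] = in[0] + ... + in[i]
std::vector<int> inclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::inclusive_scan(in.begin(), in.end(), out.begin());
    return out;
}

// Exclusive scan: out[i] = in[0] + ... + in[i - 1], with out[0] = 0
std::vector<int> exclusive_prefix_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::exclusive_scan(in.begin(), in.end(), out.begin(), 0);
    return out;
}
```

For {3, 1, 7, 0, 4, 1, 6, 3}, the inclusive result is {3, 4, 11, 11, 15, 16, 22, 25}, so the partial sum up to index 5 is 16; the exclusive result is the same sequence shifted right by one, {0, 3, 4, 11, 11, 15, 16, 22}.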

  11. CALCULATING PRECEDING ZERO ELEMENTS: auxiliary variable; inclusive prefix-sum
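
A host C++ sketch of this step, assuming the auxiliary variable is a 0/1 is-zero flag per element. The inclusive sum counts zeros up to and including index i, so subtracting the element's own flag gives the zeros strictly before it; whether the original kernel subtracts the flag in exactly this way is an assumption, and zeros_before is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Zero elements before each position, via an auxiliary 0/1 flag and an
// inclusive prefix sum (sketch; hypothetical name).
std::vector<int> zeros_before(const std::vector<int>& x) {
    std::vector<int> is_zero(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) is_zero[i] = (x[i] == 0) ? 1 : 0;

    // incl[i] = number of zeros in x[0..i]
    std::vector<int> incl(x.size());
    std::inclusive_scan(is_zero.begin(), is_zero.end(), incl.begin());

    // Remove the element's own contribution to get zeros strictly before i
    std::vector<int> before(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) before[i] = incl[i] - is_zero[i];
    return before;
}
```

For {3, 0, 0, 0, 5, 0, 7}, the flags are {0, 1, 1, 1, 0, 1, 0}, the inclusive sum is {0, 1, 2, 3, 3, 4, 4}, and the result is {0, 0, 1, 2, 3, 3, 4}.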

  12. CALCULATING PRECEDING ZERO SEGMENTS: auxiliary variable; exclusive prefix-sum
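
Similarly, assuming the auxiliary variable here is a 0/1 flag marking the first zero of each segment, an exclusive sum of those flags counts the zero segments that begin before each position. This host C++ sketch uses the hypothetical name segments_before.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Zero segments starting before each position, via a 0/1 segment-start
// flag and an exclusive prefix sum (sketch; hypothetical name).
std::vector<int> segments_before(const std::vector<int>& x) {
    std::vector<int> starts(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        // Flag = 1 where a zero follows a non-zero element (or the start)
        starts[i] = (x[i] == 0 && (i == 0 || x[i - 1] != 0)) ? 1 : 0;
    }
    std::vector<int> excl(x.size());
    std::exclusive_scan(starts.begin(), starts.end(), excl.begin(), 0);
    return excl;
}
```

For {3, 0, 0, 0, 5, 0, 7}, the result is {0, 0, 1, 1, 1, 1, 2}. At indices 2 and 3 the count includes the segment those zeros themselves belong to, but non-leading zero threads do not write, so those values are never used.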

  13. PARALLEL SILENCE ENCODING ALGORITHM
      - Step 1: Read global data into shared memory
      - Step 2: Calculate preceding zero elements with an inclusive prefix-sum
      - Step 3: Calculate preceding zero segments with an exclusive prefix-sum
      - Step 4: Calculate the write position for each thread
      - Step 5: Write the encoded string to shared memory
      - Step 6: Write the encoded string to global memory
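
Steps 2 to 5 can be sketched end to end in host C++, where each loop iteration plays one GPU thread and separate loops stand in for barriers; steps 1 and 6 (the global/shared memory copies) are omitted. The slides do not say how a zero thread learns its segment length, so this sketch assumes one parallel-friendly option: each segment head records its preceding-zero count, and a segment's length is the gap to the next head's count (or to the total zero count). The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Host-side sketch of steps 2-5 for one block (hypothetical name).
// ASSUMPTION: segment lengths come from differences of the heads'
// preceding-zero counts, not necessarily the original kernel's method.
std::vector<int> silence_encode_parallel_style(const std::vector<int>& x) {
    const std::size_t n = x.size();
    if (n == 0) return {};  // nothing to encode; the code below reads index n - 1

    // Auxiliary flags: is this element zero? does it start a zero segment?
    std::vector<int> is_zero(n), is_start(n);
    for (std::size_t i = 0; i < n; ++i) {
        is_zero[i] = (x[i] == 0);
        is_start[i] = (x[i] == 0 && (i == 0 || x[i - 1] != 0));
    }

    // Step 2: inclusive sum of is_zero (zeros up to and including i)
    std::vector<int> zeros_incl(n);
    std::inclusive_scan(is_zero.begin(), is_zero.end(), zeros_incl.begin());

    // Step 3: exclusive sum of is_start (zero segments starting before i)
    std::vector<int> segs_before(n);
    std::exclusive_scan(is_start.begin(), is_start.end(), segs_before.begin(), 0);

    const int total_zeros = zeros_incl[n - 1];
    const int total_segs = segs_before[n - 1] + is_start[n - 1];

    // Each segment head k stores its preceding-zero count in head_zeros[k]
    std::vector<int> head_zeros(static_cast<std::size_t>(total_segs) + 1);
    head_zeros[total_segs] = total_zeros;
    for (std::size_t i = 0; i < n; ++i)
        if (is_start[i]) head_zeros[segs_before[i]] = zeros_incl[i] - 1;

    // Steps 4-5: compute each writer's position and write the encoded string
    const std::size_t out_size = (n - static_cast<std::size_t>(total_zeros)) +
                                 2 * static_cast<std::size_t>(total_segs);
    std::vector<int> out(out_size);
    for (std::size_t i = 0; i < n; ++i) {
        const std::size_t zb = static_cast<std::size_t>(zeros_incl[i] - is_zero[i]);
        const std::size_t pos = (i - zb) + 2 * static_cast<std::size_t>(segs_before[i]);
        if (!is_zero[i]) {
            out[pos] = x[i];               // non-zero thread: copy its value
        } else if (is_start[i]) {
            const int k = segs_before[i];  // index of this zero segment
            out[pos] = 0;                  // pair: (zero, its length)
            out[pos + 1] = head_zeros[k + 1] - head_zeros[k];
        }                                  // other zero threads write nothing
    }
    return out;
}
```

The output matches a serial run-length pass; for example, {3, 0, 0, 0, 5, 0, 7} becomes {3, 0, 3, 5, 0, 1, 7}. On a GPU, the scatter into head_zeros would need a barrier before the writers read it.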

  14. IMPLEMENTATION WITH CUB: warp-wide primitives, block-wide primitives, and device-wide primitives. More info: https://nvlabs.github.io/cub/

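  15. HOW TO WRAP CUB PRIMITIVES

```cuda
#include <cub/cub.cuh>

// Block-wide exclusive prefix sum of one value per thread.
// DIM (threads per block) is a compile-time constant defined elsewhere.
// int replaces char: a block-wide sum of 0/1 flags can exceed char's range.
__device__ __forceinline__ void cub_prefix_sum_exclusive(int in, int& out, int& aggregate)
{
    typedef cub::BlockScan<int, DIM, cub::BLOCK_SCAN_RAKING> BlockScanT;
    __shared__ typename BlockScanT::TempStorage iscan;  // scan scratch space

    // Collective call: every thread of the block must reach it.
    // aggregate receives the block-wide total of all inputs.
    BlockScanT(iscan).ExclusiveSum(in, out, aggregate);

    // Barrier so iscan can be safely reused by a later scan in the same kernel
    __syncthreads();
}
```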

  16. PERFORMANCE OF DIFFERENT KERNELS [Chart: elapsed time in ms of the naive kernel and the parallel kernel, by data size: 1024x1024x1, 1024x1024x100, 1024x1024x1000]

  17. DIFFERENT CUB ALGORITHMS (data size 1024x1024x1000, elapsed time)
      - BLOCK_SCAN_WARP_SCANS: 352.57
      - BLOCK_SCAN_RAKING_MEMOIZE: 303.05
      - BLOCK_SCAN_RAKING: 323.81

  18. CONCLUSION: Prefix-sum is an efficient way to convert serial computations into parallel ones, and CUB's parallel primitives are convenient to integrate into your own implementation.

  19. Each subject in the book is treated with a profile-driven approach.

  20. THANK YOU. John Cheng and Nanxun Dai, BGP International Inc, R&D Center, 10630 Haddington Dr., Houston, Texas 77043. rwcheng@bgprdc.com
