

  1. Optimising In-Field Processing using GPUs
  Tarik Saidani, Senior Software Engineer, PGS
  Peng Wang, DevTech, Nvidia

  2. From a Seismic Acquisition Survey

  3. To a High-Resolution Image of the Sea Subsurface

  4. Problem: Source and Receiver Ghost
  [Figure: sea-surface ray paths showing the direct ray path, the source-ghost ray path, and the cable (receiver) ghost ray path, recorded in the far field]

  5. A Ghost-Free Marine Acquisition System

  6. Solution: Dual-Sensor Streamer Acquisition
  • Method
    – Combine pressure and velocity sensors in a solid streamer
    – Use the complementary ghost patterns of the two sensors to remove the receiver ghost (a toy illustration follows this slide)
    – Tow the dual-sensor streamer deep for low-frequency content
  • Result
    – The bandwidth of the data is increased for both low and high frequencies compared with conventional streamer data
    – There is better low-frequency penetration of the source signal
    – The acquisition method is less sensitive to weather conditions
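A toy model can illustrate why the two sensors are complementary. The sketch below is not PGS's production algorithm: it assumes vertical incidence, a free-surface reflection coefficient of -1, an illustrative 15 m tow depth, and a velocity recording pre-scaled by the acoustic impedance, so the cancellation comes out exact.

```python
# Minimal sketch of dual-sensor (P/Z) receiver deghosting at vertical
# incidence. All names and numbers are illustrative, not PGS's code.
import numpy as np

dt = 0.002                    # 2 ms sample interval (from the slides)
n = 1000                      # 2 s trace
depth = 15.0                  # assumed streamer tow depth, metres
c = 1500.0                    # water velocity, m/s
lag = int(round(2 * depth / c / dt))   # two-way ghost delay, in samples

# Up-going wavefield: a single spike arrival at 0.5 s.
up = np.zeros(n)
up[250] = 1.0

# Pressure sensor: the sea-surface ghost arrives with opposite polarity.
p = up.copy()
p[lag:] -= up[:n - lag]
# Scaled vertical-velocity sensor: the ghost arrives with the same polarity.
vz = up.copy()
vz[lag:] += up[:n - lag]

# PZ summation: the complementary ghosts cancel, recovering the up-going
# wavefield (exactly, because the model is idealised).
deghosted = 0.5 * (p + vz)
assert np.allclose(deghosted, up)
```

The pressure ghost flips polarity at the free surface while the vertical-velocity ghost does not, which is the complementarity the slide refers to.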

  7. The Big Data Challenge in the Seismic Business
  – 1995: 6 streamers
  – 2005: 16 streamers
  – 2015: 24 streamers

  8. Seismic Acquisition Data Volumes
  • A typical streamer is 8,000 meters long and contains 1,280 receivers
  • Data is recorded in time chunks or as a continuous series at a 2 ms sample interval, i.e. 500 samples per second per receiver
  • A single-sensor streamer generates 640,000 samples per second
  • One streamer spread (10 streamers) generates 6,400,000 samples per second
  • A big spread (20 streamers, dual sensor) generates 25,600,000 samples per second
  • A typical acquisition can generate multiple terabytes of data per day (see the back-of-the-envelope check below)
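The quoted figures are easy to reproduce. The 4-byte sample size in the last step is an assumption; the slides do not state a sample format.

```python
# Back-of-the-envelope check of the data volumes quoted above.
receivers_per_streamer = 1280
rate = 1 / 0.002                              # 2 ms interval -> 500 samples/s

per_streamer = receivers_per_streamer * rate  # 640,000 samples/s
spread_10 = 10 * per_streamer                 # 6,400,000 samples/s
big_spread = 20 * per_streamer * 2            # dual sensor: 25,600,000 samples/s

# Assuming 4-byte samples (an assumption, not stated on the slide):
bytes_per_day = big_spread * 4 * 86400
print(f"{bytes_per_day / 1e12:.1f} TB/day")   # ~8.8 TB/day: "multi-TBs"
```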

  9. 3D Wavefield Separation Workflow
  [Chart: runtime profile of the workflow: receiver deghosting 84%, upsampling 16% (100% total)]

  10. Getting the Best Possible Image from the Early Stages
  [Figure: image comparison, streamer-wise wavefield separation vs 3D wavefield separation]

  11. Upsampling
  • Iterative process in the frequency (wavenumber) domain
  • Not enough parallelism in the inner loop (a few thousand threads)
  • Window parallelism not exposed in the CPU code
  • Loop restructuring to expose window parallelism (sketched below)
  • After the code change, enough parallelism for the GPU (millions of threads)
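A minimal sketch of the restructuring idea, with NumPy standing in for the GPU code. The array shapes and the per-window operation are illustrative placeholders, not the actual upsampling kernel.

```python
# Before/after sketch: exposing window parallelism that a serial outer
# loop was hiding. Shapes and the per-window operation are made up.
import numpy as np

n_windows, n_freq = 2048, 4096
data = np.random.rand(n_windows, n_freq).astype(np.float32)
weights = np.random.rand(n_freq).astype(np.float32)

# Before: one window at a time; only n_freq (a few thousand) elements
# are in flight at once -- too little work to fill a GPU.
out = np.empty_like(data)
for w in range(n_windows):
    out[w] = data[w] * weights

# After: one batched operation over windows x frequencies, i.e.
# n_windows * n_freq (millions of) independent elements.
out_batched = data * weights
assert np.allclose(out, out_batched)
```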

  12. Receiver Deghosting
  • Large volume of data (hydrophone and geosensor data)
  • Frequency-domain computations
  • Parallelism over traces and frequency samples (see the sketch below)
  • Fairly straightforward parallel code
  • Parallelism available at many loop levels over a large number of iterations
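The parallel structure can be sketched as follows. The per-frequency filter is a placeholder rather than the real deghosting operator, and NumPy broadcasting stands in for a GPU kernel with one thread per (trace, frequency) element.

```python
# Parallel-structure sketch only: every (trace, frequency) pair is
# independent, so the work maps naturally onto GPU threads.
import numpy as np

n_traces, n_samples = 12800, 1024   # e.g. 10 streamers x 1,280 receivers
p  = np.random.rand(n_traces, n_samples)   # hydrophone traces
vz = np.random.rand(n_traces, n_samples)   # geosensor traces

P  = np.fft.rfft(p,  axis=1)        # batched FFT over all traces at once
Vz = np.fft.rfft(vz, axis=1)

# Hypothetical per-frequency filter (placeholder for the real operator);
# broadcast over traces, applied elementwise per frequency.
filt = np.exp(-np.linspace(0.0, 1.0, P.shape[1]))
U = 0.5 * (P + filt * Vz)           # n_traces * n_freq independent ops

u = np.fft.irfft(U, n=n_samples, axis=1)   # back to the time domain
```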

  13. Infield Constraints
  • Although she looks big in the picture, the ship has very limited space to host a compute cluster
  • Power and cooling are also limited on board a vessel
  • A CPU-based solution was considered but quickly discarded because of the constraints above

  14. But Also Facing Up to New Realities …

  15. Phase 1: Getting the Most out of the CPU Cycles
  • CPU code profiling and analysis
  • Hotspot analysis showed that not much could be improved
  • The vectorizer was not doing a great job; we had to write vector intrinsics …
  • Reached an upper bound in terms of CPU performance … not enough!

  16. Phase 2: What Can We Do Next?
  • Parallelism already present at different levels: thread, process, vectorization …
  • We cannot rely on increasing the CPU core count because of the above constraints
  • GPU accelerators were the most obvious way forward
  • GPU prototype code:
    – Ported the streamer-wise deghosting code to the GPU
    – 25x speedup compared to a single CPU core (Haswell)
    – 7x speedup on the entire flow: interesting … (a consistency check follows this slide)
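As a consistency check, the two quoted speedups can be related through Amdahl's law. The fraction of runtime spent in the ported code is inferred from them, not stated on the slide.

```python
# flow_speedup = 1 / ((1 - f) + f / kernel_speedup); solve for f, the
# fraction of total runtime spent in the ported deghosting code.
kernel_speedup = 25.0   # from the slide
flow_speedup = 7.0      # from the slide

f = (1 - 1 / flow_speedup) / (1 - 1 / kernel_speedup)
print(f"ported fraction of runtime: {f:.0%}")   # ~89%
```

The inferred ~89% is in the same ballpark as the 84% deghosting share shown in the workflow profile on slide 9.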

  17. Phase 3: The Bigger Picture
  • With the streamer-wise deghosting code ported to the GPU, upsampling became the new hotspot in the flow
  • Two parallel development branches:
    – Porting the upsampling code to the GPU
    – Porting the 3D deghosting code to the GPU
  • At the end of this phase the processing flow was 15x faster
  • In the meantime an additional processing step, extrapolation, was added to the deghosting code
  • It increased the runtime and changed the application profile (about 50% of the runtime)

  18. Phase 4: Putting It All Together
  • Ported the extrapolation to the GPU
  • Very similar compute kernel to the upsampling
  • The first benchmarks showed a throughput 40x faster than one CPU core
  • After running more production-like tests we achieved an impressive 100x!

  19. Hardware Footprint
  [Figure: roughly 20:1 footprint comparison between the CPU-based solution and the GPU-based solution (Nvidia Tesla K80)]

  20. Summary
  • Wavefield separation is a fundamental step in marine data acquisition and processing
  • It is very demanding in terms of compute power
  • Infield constraints rule out large-scale systems
  • To deliver acceptable throughput within an acceptable footprint, the only viable solution is GPU-based
  • The final result showed impressive throughput along with a very small footprint
  • It improves the geophysical quality of PGS field acquisition deliverables
  • Real-time 3D processing of data during acquisition
  • GPU deployment started on vessels in Q1 2016

  21. Titan-Class Tethys, Now with GPU-Based “3D Wavefield Separation Appliance”

  22. Acknowledgments
  Peng Wang, Ty McKercher, Ken Hester
