SLIDE 1

Optimising In-Field Processing using GPUs

Tarik Saidani

Senior Software Engineer, PGS

Peng Wang

DevTech, Nvidia

SLIDE 2

From a Seismic Acquisition Survey

SLIDE 3

To a High Resolution Image of The Sea Subsurface

SLIDE 4

Problem: Source and Receiver Ghost

[Figure: marine acquisition geometry, showing the source and its far field, the direct ray-paths and the ghost ray-paths reflecting off the sea surface, the cable (receiver), and the resulting source ghost and receiver ghost]

SLIDE 5

A Ghost Free Marine Acquisition System

SLIDE 6
  • Method

    – Combine pressure and velocity sensors in a solid streamer
    – Use the complementary ghost patterns of the two sensors to remove the receiver ghost
    – Tow the dual-sensor streamer deep for low-frequency content

  • Result

    – The bandwidth of the data is increased at both low and high frequencies compared with conventional streamer data
    – There is better low-frequency penetration of the source signal
    – The acquisition method is less sensitive to weather conditions

Solution: Dual Sensor Streamer Acquisition
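The summation idea can be illustrated with a toy trace: because the receiver ghost arrives with opposite polarity on the pressure sensor (hydrophone) and the vertical-velocity sensor (geophone), averaging the two calibrated recordings cancels the ghost. A minimal sketch, with an illustrative delay and unit amplitudes, not the PGS production algorithm:

```python
# Toy dual-sensor (P + Z) summation. The receiver ghost is modelled as a
# delayed copy of the direct arrival whose polarity differs per sensor.

def make_trace(direct, ghost_polarity, ghost_delay):
    """Direct arrival plus a delayed ghost of the given polarity."""
    trace = list(direct)
    for i in range(len(direct) - ghost_delay):
        trace[i + ghost_delay] += ghost_polarity * direct[i]
    return trace

direct = [0.0] * 16
direct[2] = 1.0                      # impulsive direct arrival

p = make_trace(direct, -1.0, 5)      # hydrophone: ghost flips polarity
z = make_trace(direct, +1.0, 5)      # geophone: ghost keeps polarity

# Summing the calibrated sensors cancels the ghost, keeping the up-going field
up = [0.5 * (pi + zi) for pi, zi in zip(p, z)]
```

In practice the combination is applied per frequency with depth-dependent calibration filters; the time-domain toy above only shows the polarity argument.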

SLIDE 7

The Big Data Challenge In the Seismic Business

1995: 6 streamers
2005: 16 streamers
2015: 24 streamers

SLIDE 8

Seismic Acquisition Data Volumes

  • A typical streamer is 8000 meters long and contains 1280 receivers
  • Data is recorded in time chunks or as a continuous series at a 2 ms sample interval, generating 500 samples per second per receiver
  • A streamer (single sensor) generates 640,000 samples per second
  • One streamer spread (10 streamers) generates 6,400,000 samples per second
  • A big spread (20 streamers, dual sensor) generates 25,600,000 samples per second
  • A typical acquisition can generate multiple TBs of data per day
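These rates follow from simple arithmetic; a quick back-of-the-envelope check of the numbers as stated on the slide:

```python
# Back-of-the-envelope check of the stated acquisition data rates.
receivers_per_streamer = 1280
sample_interval_s = 0.002                       # 2 ms sampling
samples_per_receiver = 1 / sample_interval_s    # 500 samples/s

single_streamer = receivers_per_streamer * samples_per_receiver
spread_10 = 10 * single_streamer                # 10 single-sensor streamers
big_spread_dual = 20 * 2 * single_streamer      # 20 streamers, 2 sensors each

print(single_streamer)    # 640000.0 samples/s
print(spread_10)          # 6400000.0 samples/s
print(big_spread_dual)    # 25600000.0 samples/s
```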
SLIDE 9

3D Wave-field Separation Workflow

[Figure: runtime breakdown of the workflow between upsampling and receiver deghosting, with an 84 % / 16 % split of the total]

SLIDE 10

Getting the Best Possible Image from the Early Stages

[Figure: comparative images, streamer-wise wavefield separation vs. 3D wavefield separation]

SLIDE 11

Upsampling

  • Iterative process
  • Frequency (wavenumber) domain
  • Not enough parallelism in the inner loop (a few thousand threads)
  • Window parallelism not exposed in the CPU code
  • Loops restructured to expose window parallelism
  • After the code change, enough parallelism for the GPU (millions of threads)
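The restructuring can be sketched as follows; the function and variable names are illustrative, not the actual PGS code. The point is that flattening the (window, sample) loop nest turns a few thousand units of parallel work per window into one large index space that a GPU grid can cover:

```python
# Before: the outer window loop is serial, so only the inner loop's few
# thousand iterations are available as parallel work.
def process_serial_windows(windows, kernel):
    out = []
    for w in windows:                            # serial over windows
        out.append([kernel(x) for x in w])       # parallelism limited to len(w)
    return out

# After: flatten (window, sample) into one index space, so a GPU grid can
# launch n_windows * window_len independent threads at once.
def process_flat(windows, kernel):
    n, m = len(windows), len(windows[0])
    flat = [0.0] * (n * m)
    for idx in range(n * m):                     # one flat index space
        w, i = divmod(idx, m)                    # recover (window, sample)
        flat[idx] = kernel(windows[w][i])
    return [flat[w * m:(w + 1) * m] for w in range(n)]
```

Both functions compute the same result; only the iteration structure, and hence the parallelism exposed to the GPU, differs.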

SLIDE 12

Receiver Deghosting

  • Large volume of data (hydrophone and geosensor data)
  • Frequency-domain computations
  • Parallelism over traces and frequency samples
  • Fairly straightforward parallel code
  • Parallelism available at many loop levels, with large iteration counts
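As an illustration of why this maps well to the GPU, the sketch below applies an independent filter per (trace, frequency) pair, so the natural parallel domain is n_traces x n_freqs, large and flat. The ghost operator g(f) here is a generic stand-in, not PGS's actual deghosting filter:

```python
import cmath

def deghost(spectra, dt, depth, c=1500.0):
    """spectra: list of per-trace frequency samples (complex).
    Every (trace, frequency) element is processed independently."""
    n_traces, n_freqs = len(spectra), len(spectra[0])
    out = [[0j] * n_freqs for _ in range(n_traces)]
    for t in range(n_traces):          # both loops are independent:
        for k in range(n_freqs):       # parallelise over traces x frequencies
            f = k / (n_freqs * dt)
            tau = 2.0 * depth / c      # two-way delay to the sea surface
            g = 1.0 - cmath.exp(-2j * cmath.pi * f * tau)   # ghost response
            if abs(g) > 1e-3:          # naive stabilised inverse
                out[t][k] = spectra[t][k] / g
    return out
```

On a GPU each (t, k) pair becomes one thread, which is why this kernel was "fairly straightforward" to parallelise compared with the upsampling.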

SLIDE 13

Infield Constraints


  • Although she looks big in the picture, the ship has very limited space to host a compute cluster
  • Power and cooling are also limited on board a vessel
  • A CPU-based solution was considered but was quickly discarded because of the constraints described above

SLIDE 14

But Also Facing Up to New Realities …

SLIDE 15

Phase 1: Getting the Most out of the CPU Cycles

  • CPU code profiling and analysis
  • Hotspot analysis showed that not much could be improved
  • The vectorizer was not doing a great job; we had to write vector intrinsics …
  • Reached an upper bound in terms of CPU performance … not enough!

SLIDE 16

Phase 2: What Can We Do Next?

  • Parallelism already present at different levels: thread, process, vectorization …
  • We cannot rely on increasing the CPU core count because of the above constraints
  • GPU accelerators were the most obvious way forward
  • GPU prototype code:

    – Ported the streamer-wise deghosting code to the GPU
    – 25x speedup compared to a single CPU core (Haswell)
    – 7x speedup on the entire flow: interesting …

SLIDE 17

Phase 3: The Bigger Picture

  • With the streamer-wise deghosting code ported to the GPU, upsampling was the new hotspot in the flow
  • Two parallel development branches:

    – Porting the upsampling code to the GPU
    – Porting the 3D deghosting code to the GPU

  • At the end of this phase the processing flow was 15x faster
  • In the meantime an additional processing step, extrapolation, was added to the deghosting code
  • It increased the runtime and changed the application profile (50% of the runtime)

SLIDE 18

Phase 4: Putting it All Together

  • Ported the extrapolation to the GPU
  • Very similar compute kernel to the upsampling
  • The first benchmarks showed a throughput 40x faster than one CPU core
  • After running more production-like tests we achieved an impressive 100x!

SLIDE 19

Hardware Footprint


[Figure: hardware footprint, CPU-based vs. GPU-based, a 20:1 replacement ratio using the Nvidia Tesla K80]

SLIDE 20

Summary

  • Wavefield separation is a fundamental step in marine data acquisition and processing
  • It is a very demanding process in terms of compute power
  • Infield constraints rule out large-scale systems
  • To deliver an acceptable throughput within an acceptable footprint, the only viable solution is GPU-based
  • The final result showed an impressive throughput along with a very small footprint
  • It improves the geophysical quality of PGS field acquisition deliverables
  • Real-time 3D processing of data during acquisition
  • GPU deployment started on vessels in Q1 2016

SLIDE 21

Titan Class Tethys, Now With GPU-Based “3D Wavefield Separation Appliance”

SLIDE 22

Acknowledgment

Peng Wang
Ty Mckercher
Ken Hester
