SLIDE 1
Optimising In-Field Processing Using GPUs
Tarik Saidani, Senior Software Engineer, PGS
Peng Wang, DevTech, Nvidia
SLIDE 2
From a Seismic Acquisition Survey
SLIDE 3
To a High Resolution Image of The Sea Subsurface
SLIDE 4
Problem: Source and Receiver Ghost
[Diagram: direct and ghost ray-paths from the source (far field) to the receiver cable, with the source ghost and receiver ghost reflecting off the sea surface]
SLIDE 5
A Ghost Free Marine Acquisition System
SLIDE 6
Solution: Dual Sensor Streamer Acquisition
- Method
– Combine pressure and velocity sensors in a solid streamer
– Use the complementary ghost patterns of the two sensors to remove the receiver ghost
– Tow the dual-sensor streamer deep for low-frequency content
- Result
– The bandwidth of the data is increased for both low and high frequencies compared with conventional streamer data
– There is better low-frequency penetration of the source signal
– The acquisition method is less sensitive to weather conditions
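The complementary ghost patterns can be illustrated with a toy one-dimensional model. This is a textbook-style sketch of PZ summation under simplifying assumptions (flat sea surface, vertical incidence, perfectly calibrated sensors), not PGS's production algorithm; all names and values here are illustrative.

```python
# Toy model: at the cable, the pressure (hydrophone) and scaled
# vertical-velocity (geophone) recordings see the down-going ghost
# with opposite polarity, so a simple half-sum cancels the ghost
# and recovers the up-going wavefield.

import numpy as np

rng = np.random.default_rng(0)
n = 256
up = rng.standard_normal(n)        # up-going wavefield (the signal we want)
down = -0.98 * np.roll(up, 17)     # down-going ghost: delayed, polarity-flipped

pressure = up + down               # hydrophone: ghost adds with the same sign
velocity = up - down               # scaled geophone: ghost adds with opposite sign

recovered_up = 0.5 * (pressure + velocity)   # PZ summation removes the ghost
assert np.allclose(recovered_up, up)
```

The two sensors carry redundant signal but opposite-sign ghost energy, which is what makes the receiver ghost separable regardless of tow depth.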
SLIDE 7
The Big Data Challenge In the Seismic Business
- 1995: 6 streamers
- 2005: 16 streamers
- 2015: 24 streamers
SLIDE 8
Seismic Acquisition Data Volumes
- A typical streamer is 8000 meters long and contains 1280 receivers
- Data is recorded in time chunks or as continuous series at a 2 ms sample interval, generating 500 samples per second per receiver
- A streamer (single sensor) generates 640,000 samples per second
- One streamer spread (10 streamers) generates 6,400,000 samples per second
- A big spread (20 streamers, dual sensor) generates 25,600,000 samples per second
- A typical acquisition can generate multiple TBs of data per day
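The figures above follow directly from the streamer geometry and sample interval. A minimal sketch of the arithmetic (the bytes-per-sample value is an assumption, not stated on the slide):

```python
# Back-of-the-envelope data rates for the acquisition figures quoted
# above: 1280 receivers per streamer, 2 ms sampling.

RECEIVERS_PER_STREAMER = 1280
SAMPLE_INTERVAL_S = 0.002
SAMPLES_PER_SECOND = int(1 / SAMPLE_INTERVAL_S)   # 500 per receiver

def samples_per_second(streamers, sensors_per_receiver=1):
    """Total samples generated per second by a streamer spread."""
    return streamers * RECEIVERS_PER_STREAMER * SAMPLES_PER_SECOND * sensors_per_receiver

single = samples_per_second(1)                              # 640,000
spread = samples_per_second(10)                             # 6,400,000
big_dual = samples_per_second(20, sensors_per_receiver=2)   # 25,600,000

# Assuming 4 bytes per sample (an assumption), a big dual-sensor
# spread recording around the clock produces on the order of:
tb_per_day = big_dual * 4 * 86400 / 1e12   # ~8.8 TB/day
```

Even with this conservative sample size, the multi-TB-per-day figure on the slide is easily reached.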
SLIDE 9
3D Wave-field Separation Workflow
[Chart: workflow runtime split between upsampling and receiver deghosting (84% / 16% of the 100% total)]
SLIDE 10
Getting the Best Possible Image from the Early Stages
[Image comparison: streamer-wise wavefield separation vs. 3D wavefield separation]
SLIDE 11
Upsampling
- Iterative process
- Frequency domain (wavenumber)
- Not enough parallelism in the inner loop (a few thousand threads)
- Window parallelism not exposed in the CPU code
- Loop restructuring to expose window parallelism
- After the code change, enough parallelism for the GPU (millions of threads)
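The loop restructuring can be sketched as follows. This is a hypothetical illustration of the technique, not PGS code: the original structure exposed only the inner loop to the parallel runtime, while flattening the window and sample loops into one index space yields one independent work item per (window, sample) pair, which is how a GPU kernel would map threads.

```python
# Sketch of exposing window parallelism via loop flattening.
# Sizes are kept small for illustration; in production the flat
# index space would contain millions of work items.

import numpy as np

N_WINDOWS, N_SAMPLES = 300, 400   # illustrative sizes

def process_sample(window_id, sample_id):
    # Stand-in for the real per-sample upsampling work.
    return float(window_id * N_SAMPLES + sample_id)

# Before: parallelism only over the inner loop (a few thousand threads).
def run_window_serial():
    out = np.empty((N_WINDOWS, N_SAMPLES))
    for w in range(N_WINDOWS):              # serial outer loop over windows
        for s in range(N_SAMPLES):          # only this loop was parallelisable
            out[w, s] = process_sample(w, s)
    return out

# After: one flat index space of N_WINDOWS * N_SAMPLES work items,
# each index i acting like an independent GPU thread.
def run_flat():
    out = np.empty(N_WINDOWS * N_SAMPLES)
    for i in range(N_WINDOWS * N_SAMPLES):
        w, s = divmod(i, N_SAMPLES)         # recover (window, sample) from i
        out[i] = process_sample(w, s)
    return out.reshape(N_WINDOWS, N_SAMPLES)
```

Both versions compute the same result; only the shape of the iteration space changes, which is exactly what lets the GPU schedule enough threads.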
SLIDE 12
Receiver Deghosting
- Large volume of data (hydrophone and geosensor data)
- Frequency domain computations
- Parallelism over traces and frequency samples
- Fairly straightforward parallel code
- Parallelism available at many loop levels on a large number of iterations
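A minimal, textbook-style sketch of frequency-domain receiver deghosting (not PGS's production algorithm) shows where the parallelism over traces and frequency samples comes from: the filtering is independent per (trace, frequency) pair. The ghost model parameters (sea-surface reflection coefficient, cable depth, water velocity) are assumptions here.

```python
# Each trace is transformed to the frequency domain, a per-frequency
# ghost filter is applied via stabilised spectral division, and the
# result is transformed back. All (trace, frequency) pairs are
# independent, hence the abundant parallelism noted above.

import numpy as np

def deghost(traces, dt=0.002, depth=25.0, c=1500.0, r=-0.98, eps=0.1):
    """traces: (n_traces, n_samples) array of hydrophone data."""
    n_samples = traces.shape[1]
    freqs = np.fft.rfftfreq(n_samples, dt)           # frequency axis
    delay = 2.0 * depth / c                          # two-way ghost delay
    ghost = 1.0 + r * np.exp(-2j * np.pi * freqs * delay)
    spectra = np.fft.rfft(traces, axis=1)            # vectorised over traces
    # eps stabilises the division near the ghost notches where |ghost| -> 0
    deghosted = spectra * np.conj(ghost) / (np.abs(ghost) ** 2 + eps)
    return np.fft.irfft(deghosted, n=n_samples, axis=1)

clean = deghost(np.random.randn(8, 512))
```

In the dual-sensor case the geosensor data constrains the solution near the hydrophone's ghost notches, which is why the combined inversion is better behaved than this single-sensor sketch.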
SLIDE 13
Infield Constraints
- Although she looks big in the picture, the ship has very limited space to host a compute cluster
- Power and cooling are also limited on board a vessel
- A CPU-based solution was considered but was quickly discarded because of the constraints described above
SLIDE 14
But Also Facing Up to New Realities …
SLIDE 15
Phase 1: Getting the Most Out of the CPU Cycles
- CPU code profiling and analysis
- Hotspot analysis showed that not much could be improved
- The vectorizer was not doing a great job, had to write vector intrinsics …
- Reached an upper bound in terms of CPU performance … not enough!
SLIDE 16
Phase 2: What Can We Do Next?
- Parallelism already present at different levels: thread, process, vectorization …
- We cannot rely on increasing the CPU core count because of the above constraints
- GPU accelerators were the most obvious way forward
- GPU prototype code:
– Ported the streamer-wise deghosting code to the GPU
– 25x speedup compared to a single CPU core (Haswell)
– 7x speedup on the entire flow: interesting …
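The gap between the 25x kernel speedup and the 7x end-to-end speedup is what Amdahl's law predicts when only part of the runtime is accelerated. A small sketch (the ~90% fraction is inferred from the two speedups, not a figure stated in the slides):

```python
# Amdahl's law: if a fraction p of the runtime gets a kernel speedup
# S, the overall speedup is 1 / ((1 - p) + p / S).

def amdahl_speedup(parallel_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime is accelerated."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / kernel_speedup)

# Solving 1/7 = (1 - p) + p/25 for the accelerated fraction p:
#   1/7 = 1 - p * (1 - 1/25)  =>  p = (1 - 1/7) / (1 - 1/25)
p = (1 - 1 / 7) / (1 - 1 / 25)
print(round(p, 3), round(amdahl_speedup(p, 25), 1))  # 0.893 7.0
```

That inferred ~89% fraction is close to the 84% runtime share of receiver deghosting shown earlier, so the 7x figure is internally consistent.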
SLIDE 17
Phase 3: The Bigger Picture
- With the streamer-wise deghosting code ported to the GPU, upsampling became the new hotspot in the flow
- Two parallel development branches:
– Porting the upsampling code to the GPU
– Porting the 3D deghosting code to the GPU
- At the end of this phase the processing flow was 15x faster
- In the meantime an additional processing step, extrapolation, was added to the deghosting code
- It increased the runtime and changed the application profile (50% of the runtime)
SLIDE 18
Phase 4: Putting it All Together
- Ported the extrapolation to the GPU
- Very similar compute kernel to the upsampling
- The first benchmarks showed a throughput 40x faster than 1 CPU core
- After running more production-like tests we achieved an impressive 100x!
SLIDE 19
Hardware Footprint
[Comparison: CPU-based vs. GPU-based (Nvidia Tesla K80) systems, a 20:1 hardware footprint reduction]
SLIDE 20
Summary
- Wavefield separation is a fundamental step in marine data acquisition and processing
- It is a very demanding process in terms of compute power
- Infield constraints rule out large-scale systems
- To deliver an acceptable throughput within an acceptable footprint, the only viable solution is GPU-based
- The final result showed an impressive throughput along with a very small footprint
- It improves the geophysical quality of PGS field acquisition deliverables
- Real-time 3D processing of data during acquisition
- GPU deployment started on vessels in Q1 2016
SLIDE 21
Titan Class Tethys, Now With GPU-Based “3D Wavefield Separation Appliance”
SLIDE 22
Acknowledgment
Peng Wang
Ty Mckercher
Ken Hester