Multi-parameter Waveform Inversion with GPUs for the Cloud



SLIDE 1

Multi-parameter Waveform Inversion with GPUs for the Cloud

A Pipelined Implementation

Huy Le*, Stewart A. Levin, and Robert G. Clapp

Geophysics Department, Stanford University

March 28, 2018

Huy Le Multi-parameter Waveform Inversion with GPUs for the Cloud March 28, 2018 1

SLIDE 2

Waveform inversion

$$\chi(m) = \frac{1}{2} \lVert f(m) - d \rVert_2^2$$

χ(m): objective function
m: subsurface parameters to recover
f(m): modeled data, obtained by solving wave equations
d: observed seismic data
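As a minimal illustration of the objective function, the sketch below evaluates χ for two short data vectors in plain Python. The function name `misfit` and the sample values are hypothetical. In practice f(m) would come from a wave-equation solver rather than a hand-written list.

```python
def misfit(modeled, observed):
    """Least-squares objective: 0.5 * sum((f(m) - d)^2) over all data samples."""
    if len(modeled) != len(observed):
        raise ValueError("modeled and observed data must have the same length")
    return 0.5 * sum((f - d) ** 2 for f, d in zip(modeled, observed))


# Hypothetical modeled data f(m) and observed data d.
modeled = [1.0, 2.0, 3.0]
observed = [1.0, 1.0, 5.0]
print(misfit(modeled, observed))  # 0.5 * (0 + 1 + 4) = 2.5
```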

SLIDE 3

Gradient-based optimization

$$g(m) = \int_0^T u(m)\, v(m)\, dt$$

g(m): gradients
u(m): source wavefields, obtained by solving forward wave equations
v(m): receiver wavefields, obtained by solving adjoint wave equations in reverse time with data residuals as sources
$\int_0^T$: zero-lag temporal cross-correlation
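The zero-lag cross-correlation can be sketched as a discrete sum over time steps. The code below is an illustration only: the function name, the list-of-snapshots layout, and the sample values are assumptions. It also assumes both wavefields are already indexed by the same forward time, whereas a real receiver wavefield is produced in reverse time and must be aligned first.

```python
def zero_lag_gradient(source_wavefield, receiver_wavefield, dt):
    """Approximate g = integral_0^T u * v dt at every cell with a Riemann sum.

    Both wavefields are lists of time snapshots; each snapshot is a list of
    cell values, and both are indexed by the same forward time.
    """
    num_cells = len(source_wavefield[0])
    gradient = [0.0] * num_cells
    for u_t, v_t in zip(source_wavefield, receiver_wavefield):
        for cell in range(num_cells):
            gradient[cell] += u_t[cell] * v_t[cell] * dt
    return gradient


# Hypothetical wavefields: 3 time steps, 2 cells.
u = [[1.0, 0.0], [2.0, 1.0], [0.0, 3.0]]
v = [[0.5, 1.0], [1.0, 2.0], [4.0, 1.0]]
print(zero_lag_gradient(u, v, dt=0.5))  # [1.25, 2.5]
```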

SLIDE 4

Multi-parameters for better physics

Solving wave equations with multiple parameters requires more memory. For a 1000 × 1000 × 500 volume:

Physics             Parameters  Wavefields  Memory (GB)
Acoustic isotropic  1           1           4
Acoustic VTI        3           2           12
Elastic isotropic   3           9           54
Elastic VTI         6           9           108

(VTI: vertical transverse isotropic)

SLIDE 5

Conventional domain decomposition

Divide volumes among multiple GPUs, which are potentially on different nodes.
More parameters demand more GPUs or GPUs with larger memory.
Two-way communication among devices to exchange halos.
Fast inter-node connection is not guaranteed, particularly in the cloud.

SLIDE 6

Pipelined approach

Thor Johnsen and Alex Loddoch (GTC 2014).
Divide the computational domain along one axis into blocks.
A single GPU streams through the domain block by block and advances each block by as many time steps as possible.
Performing multiple updates per transfer lets computation overlap much of the host-device IO.

SLIDE 7

Stencil for 2nd-order time difference

Divide along the z-axis.
Each block contains as many depth slices as the half-length of the stencil.
Three consecutive blocks are needed to compute second derivatives.

[Figure: stencil diagram with axes X, Y, Z and time. Labels: block i v; block i t=0; block i-1, block i and block i+1 at t=1; block i t=2.]
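The three-block requirement can be checked with a small 1D sketch. It assumes an 8th-order central-difference second derivative (half-stencil length 4, so 4 depth samples per block). The function names and the quadratic test profile are hypothetical. The second derivative of an interior block computed from only its two neighbouring blocks matches the one computed from the full array.

```python
# Standard 8th-order central-difference weights for a second derivative.
# The 8th-order choice is an assumption for this sketch.
COEFFS = [-205.0 / 72.0, 8.0 / 5.0, -1.0 / 5.0, 8.0 / 315.0, -1.0 / 560.0]
HALF = len(COEFFS) - 1  # half-stencil length = depth samples per block


def second_derivative(u, i, dz=1.0):
    """Second derivative of u at index i; reads u[i - HALF] .. u[i + HALF]."""
    acc = COEFFS[0] * u[i]
    for k in range(1, HALF + 1):
        acc += COEFFS[k] * (u[i - k] + u[i + k])
    return acc / (dz * dz)


def block_second_derivative(prev_block, block, next_block, dz=1.0):
    """Second derivative of every sample in `block` using three blocks only."""
    window = prev_block + block + next_block
    return [second_derivative(window, HALF + j, dz) for j in range(HALF)]


# Hypothetical 1D profile along z, split into blocks of HALF samples.
u = [float(z * z) for z in range(5 * HALF)]
blocks = [u[s:s + HALF] for s in range(0, len(u), HALF)]

i = 2  # an interior block
local = block_second_derivative(blocks[i - 1], blocks[i], blocks[i + 1])
reference = [second_derivative(u, i * HALF + j) for j in range(HALF)]
print(local == reference)  # True
```

Because each block is exactly one half-stencil deep, the stencil centred on any sample of block i never reaches beyond blocks i-1 and i+1.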

SLIDE 8

Pipeline iteration 0

[Diagram: CPU and GPU columns with stages transfer in, update, transfer out.]
CPU: block0 and block1, each with v, t=0, t=1.
GPU: block0 (v, t=0, t=1).
SLIDE 9

Pipeline iteration 1

[Diagram: CPU and GPU columns with stages transfer in, update, transfer out.]
CPU: block0 to block2, each with v, t=0, t=1.
GPU: block0 and block1, each with v, t=0, t=1.
SLIDE 10

Pipeline iteration 2

[Diagram: CPU and GPU columns with stages transfer in, update, transfer out.]
CPU: block0 to block3, each with v, t=0, t=1.
GPU: block1 and block2 (v, t=0, t=1); block0 (v, t=0, t=1, t=2).
SLIDE 11

Pipeline iteration 3

[Diagram: CPU and GPU columns with stages transfer in, update, transfer out.]
CPU: block0 to block4, each with v, t=0, t=1.
GPU: block2 and block3 (v, t=0, t=1); block1 (v, t=0, t=1, t=2); block0 (v, t=0, t=1, t=2, t=3).
SLIDE 12

Pipeline iteration 4

[Diagram: CPU and GPU columns with stages transfer in, update, transfer out.]
CPU: block0 to block5, each with v, t=0, t=1.
GPU: block3 and block4 (v, t=0, t=1); block2 (v, t=0, t=1, t=2); block0 and block1 (v, t=0, t=1, t=2, t=3).
SLIDE 13

Pipeline iteration 5

[Diagram: CPU and GPU columns with stages transfer in, update, transfer out.]
CPU: block1 to block6, each with v, t=0, t=1; block0 with v, t=2, t=3.
GPU: block4 and block5 (v, t=0, t=1); block3 (v, t=0, t=1, t=2); block0 to block2 (v, t=0, t=1, t=2, t=3).
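The timing in the iteration diagrams can be summarised as a schedule. The sketch below is a reconstruction under stated assumptions, not the authors' code. The function name and record layout are hypothetical, and the timing rules are read off the two-update diagrams: block k is transferred in at iteration k, update j (which produces time level t=j+1) reaches block b at iteration b+1+j, and block b is transferred out at iteration b+N+3 for N updates.

```python
def pipeline_schedule(num_blocks, num_updates):
    """Return one record (iteration, transfer_in, updates, transfer_out) per iteration.

    Assumed timing, read off the two-update diagrams: block k is transferred
    in at iteration k, update j (1-based) reaches block b at iteration
    b + 1 + j, and block b is transferred out at iteration b + num_updates + 3.
    """
    last_iteration = (num_blocks - 1) + num_updates + 3
    schedule = []
    for it in range(last_iteration + 1):
        transfer_in = it if it < num_blocks else None
        updates = []
        for j in range(1, num_updates + 1):
            b = it - 1 - j
            if 0 <= b < num_blocks:
                updates.append((b, j))  # (block index, update number)
        out = it - num_updates - 3
        transfer_out = out if 0 <= out < num_blocks else None
        schedule.append((it, transfer_in, updates, transfer_out))
    return schedule


# First six iterations for a hypothetical 7-block domain with 2 updates.
for it, t_in, updates, t_out in pipeline_schedule(num_blocks=7, num_updates=2)[:6]:
    print(it, t_in, updates, t_out)
```

Within an iteration the updates are listed in dependency order: update j on block b uses the result of update j-1 on block b+1, which appears earlier in the same list. The first and last few iterations, where some stages are idle, correspond to the pipeline filling and draining.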

SLIDE 14

Streams and threads to overlap transfer and compute

The pipeline takes some iterations to initialize and to drain.
Stagger tasks so that transfers and computation overlap.
Use cudaMemcpyAsync to copy between host and devices.
Use two CPU threads to copy between swappable buffers and pinned buffers.
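The overlap of copying and computing can be mimicked on the host with threads and bounded queues. This is only an analogy in Python, not CUDA: the queues stand in for a small pool of pinned buffers, the helper threads stand in for the two copy threads, and all names are hypothetical. A real implementation would issue cudaMemcpyAsync calls on CUDA streams instead.

```python
import queue
import threading


def run_pipeline(blocks, update, num_buffers=2):
    """Overlap 'transfer in', 'update', and 'transfer out' with two helper threads.

    Bounded queues stand in for a fixed pool of pinned buffers: a stage waits
    when all buffers are in use.
    """
    inbound = queue.Queue(maxsize=num_buffers)
    outbound = queue.Queue(maxsize=num_buffers)
    results = []
    done = object()  # sentinel marking the end of the stream

    def transfer_in():
        for block in blocks:
            inbound.put(list(block))  # copy into a "pinned" buffer
        inbound.put(done)

    def transfer_out():
        while True:
            block = outbound.get()
            if block is done:
                break
            results.append(block)  # copy back to host storage

    threads = [threading.Thread(target=transfer_in),
               threading.Thread(target=transfer_out)]
    for t in threads:
        t.start()
    while True:  # the "compute" stage runs on the calling thread
        block = inbound.get()
        if block is done:
            outbound.put(done)
            break
        outbound.put(update(block))
    for t in threads:
        t.join()
    return results


# Hypothetical workload: double every value in each block.
print(run_pipeline([[1, 2], [3, 4], [5, 6]], lambda b: [2 * x for x in b]))
# [[2, 4], [6, 8], [10, 12]]
```

The queues are first-in, first-out, so blocks leave the pipeline in the order they entered, which is what a block-by-block sweep through the domain needs.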

SLIDE 15

Pipeline for 2 GPUs

[Diagram: CPU, GPU0 and GPU1 columns with stages transfer in, update, transfer out.]
CPU: block1 to block11, each with v, t=0, t=1; block0 with v, t=4, t=5.
GPU0: block5 to block10, advancing from t=0, t=1 towards t=2, t=3.
GPU1: block0 to block5, advancing from t=2, t=3 towards t=4, t=5.

SLIDE 16

IO bottleneck

Computing the gradients requires reverse-time propagation.
Absorbing boundary conditions and checkpoints require three propagations, but are IO- and memory-intensive.
Solution: random boundary condition (Clapp, SEG 2009; Shen, SEG 2011).
Trade-off: gradients are computed on the fly and on the device, but four propagations are required.

SLIDE 17

Pipelines for source and receiver wavefields

SLIDE 18

Acoustic isotropic wave equation

One medium parameter and one wavefield at two consecutive time steps: 12 bytes per cell.
Example: 6 GB for a 1000 × 1000 × 500 volume and an 8th-order stencil.
CPU code: blocked, Intel Threading Building Blocks (TBB), Intel SPMD Program Compiler (ISPC), on a single Xeon machine with 12 cores and 24 threads.
"Optimal" speed: the volume fits in one Tesla K80 GPU (12 GB global memory, 2500 threads), i.e. no domain decomposition or host-device transfer.
The pipelined code performs 94 updates per host-device transfer for the same memory.

SLIDE 19

Acoustic isotropic wave equation: forward modeling

[Bar chart: forward-modeling throughput in GCells/s.]

Implementation   GCells/s
CPU              1.000
Pipeline, 1 GPU  2.220
Optimal          2.380

SLIDE 20

Acoustic VTI wave equations

System of two second-order wave equations, three medium parameters and two wavefields, each at two consecutive time steps: 28 bytes per cell.
Example: 28 GB for a 1000 × 1000 × 1000 volume.

Number of updates  GPU memory (GB)
2                  0.736
4                  1.024
8                  1.6
16                 2.752
32                 5.056
64                 9.664
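The per-cell figures quoted for the acoustic isotropic and acoustic VTI cases can be reproduced with a short calculation. The sketch assumes 4-byte single-precision values, which is consistent with 12 and 28 bytes per cell but is not stated in the slides. The function names are hypothetical. The last loop records an observation rather than a formula from the slides: the tabulated GPU memory values fit 0.448 + 0.144 × N GB for N updates.

```python
def bytes_per_cell(num_parameters, num_wavefields, time_levels=2, bytes_per_value=4):
    """Storage per grid cell; 4-byte single-precision values are an assumption."""
    return (num_parameters + num_wavefields * time_levels) * bytes_per_value


def volume_gb(shape, per_cell_bytes):
    """Memory for a whole volume in GB (1 GB = 1e9 bytes)."""
    nx, ny, nz = shape
    return nx * ny * nz * per_cell_bytes / 1e9


# Acoustic isotropic: 1 parameter, 1 wavefield.
print(bytes_per_cell(1, 1), volume_gb((1000, 1000, 500), bytes_per_cell(1, 1)))    # 12 6.0
# Acoustic VTI: 3 parameters, 2 wavefields.
print(bytes_per_cell(3, 2), volume_gb((1000, 1000, 1000), bytes_per_cell(3, 2)))  # 28 28.0

# GPU memory (GB) versus number of updates, copied from the table.
gpu_memory = {2: 0.736, 4: 1.024, 8: 1.6, 16: 2.752, 32: 5.056, 64: 9.664}
for n, gb in gpu_memory.items():
    # Compare each tabulated value with the observed linear fit.
    print(n, gb, round(0.448 + 0.144 * n, 3))
```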

SLIDE 21

Acoustic VTI wave equations: forward modeling

[Bar chart: forward-modeling throughput versus number of updates.]

Number of updates  GCells/s
2                  0.579
4                  1.003
8                  1.790
16                 1.816
32                 1.828
64                 1.834

SLIDE 22

Acoustic VTI wave equations: forward modeling

8 updates (1.6 GB on GPU) completely overlap host-device transfers.

$$\text{max. speed} = \frac{\text{bandwidth}}{\text{bytes per cell}} \times N_{\text{update}} = \frac{7\ \text{GB/s}}{28\ \text{bytes per cell}} \times 8 = 2\ \text{GCells/s}$$

Achieved: 1.79 GCells/s.
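The bound above is simple enough to recompute directly. The sketch below uses the numbers from this slide. The function name is hypothetical, and the ratio of achieved to maximum speed is a derived figure, not one reported in the slides.

```python
def max_speed_gcells(bandwidth_gb_per_s, bytes_per_cell, num_updates):
    """Transfer-bound throughput in GCells/s: each transferred cell is updated num_updates times."""
    return bandwidth_gb_per_s / bytes_per_cell * num_updates


bound = max_speed_gcells(bandwidth_gb_per_s=7.0, bytes_per_cell=28, num_updates=8)
achieved = 1.79
print(bound)             # 2.0
print(achieved / bound)  # 0.895
```

Under these numbers the pipelined code reaches roughly 90% of the transfer-bound limit at 8 updates.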

SLIDE 23

Acoustic VTI wave equations: forward modeling

[Bar chart: forward-modeling throughput versus number of GPUs.]

Number of GPUs  GCells/s
1               1.828
2               3.389
4               6.471
8               12.350
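One way to read these measurements is as parallel efficiency, that is, speedup over one GPU divided by the number of GPUs. The calculation below uses only the values above. The function name is hypothetical and the efficiency figures are derived here, not reported in the slides.

```python
# Throughput in GCells/s by number of GPUs, copied from the chart.
throughput = {1: 1.828, 2: 3.389, 4: 6.471, 8: 12.350}


def parallel_efficiency(measurements):
    """Speedup over one GPU divided by the GPU count."""
    base = measurements[1]
    return {n: rate / (base * n) for n, rate in measurements.items()}


for n, eff in parallel_efficiency(throughput).items():
    print(n, round(eff, 3))
```

The efficiency comes out at about 93%, 88% and 84% for 2, 4 and 8 GPUs respectively.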

SLIDE 24

Conclusions

A pipelined implementation lets a single GPU process large volumes and reduces IO needs, which makes it well suited to cloud infrastructures.
Exploiting the stencil shape to perform multiple updates per transfer can completely hide the IO cost, however slow the link.
Simple one-directional transfers among devices and between hosts and devices make adding GPUs easy.
The implementation for the acoustic VTI system scales with the number of GPUs.
