GPU Accelerated Solver for the 3D Groundwater Flow Equation, GTC 2015, Robert Zigon

SLIDE 1

GPU Accelerated Solver for the 3D Groundwater Flow Equation

Robert Zigon Sr Staff Research Engineer Beckman Coulter

GTC 2015

SLIDE 2

Outline

  • Background
  • Legacy Fortran
  • The Algorithm and CUDA attempts
  • Results
  • Lessons Learned
SLIDE 3

Background

Hydrogeology: the study of the distribution and movement of water in the Earth’s crust.

SLIDE 4

Questions asked by Hydrogeologists

  • Can an aquifer support another subdivision in a residential area?
  • Will a dam dry up if irrigation doubles?
  • Will waste products from a coal mine negatively impact wetlands?

SLIDE 5

A PDE to model the water flow

Freeze, 1971
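
The slide cites Freeze (1971) for the governing equation. For orientation, a commonly used pressure-head form of the variably saturated flow equation (Richards' equation) is sketched below. The exact form used in the talk may carry additional terms. Here ψ is the pressure head and K(ψ) the hydraulic conductivity, as in the algorithm slides; C(ψ) (specific moisture capacity) and the upward-pointing z axis are assumptions of this sketch.

```latex
\frac{\partial}{\partial x}\left[K(\psi)\,\frac{\partial \psi}{\partial x}\right]
+ \frac{\partial}{\partial y}\left[K(\psi)\,\frac{\partial \psi}{\partial y}\right]
+ \frac{\partial}{\partial z}\left[K(\psi)\left(\frac{\partial \psi}{\partial z} + 1\right)\right]
= C(\psi)\,\frac{\partial \psi}{\partial t}
```

The nonlinearity enters through K(ψ) and C(ψ), which is why the solver has to iterate within each time step.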

SLIDE 6

Discretizing the PDE

  • First order in time
  • Second order in space

[Equations: finite-difference approximations of ∂ψ/∂t and of ∂/∂x[K(ψ) ∂ψ/∂x] at the grid point (x_i, t_j)]
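
A standard scheme consistent with the two bullets above is sketched here for one spatial dimension: a first-order backward difference in time and a second-order central difference in space, with K evaluated at cell faces. The index convention (i for space, j for time), the implicit time level, and the arithmetic face average are assumptions of this sketch, not details confirmed by the slide.

```latex
\begin{aligned}
\left.\frac{\partial \psi}{\partial t}\right|_{(x_i,\,t_j)}
  &\approx \frac{\psi_i^{(j)} - \psi_i^{(j-1)}}{\Delta t} \\[4pt]
\left.\frac{\partial}{\partial x}\left[K(\psi)\,\frac{\partial \psi}{\partial x}\right]\right|_{(x_i,\,t_j)}
  &\approx \frac{1}{(\Delta x)^2}\left[
     K_{i+1/2}^{(j)}\left(\psi_{i+1}^{(j)} - \psi_i^{(j)}\right)
   - K_{i-1/2}^{(j)}\left(\psi_i^{(j)} - \psi_{i-1}^{(j)}\right)\right] \\[4pt]
K_{i\pm 1/2}^{(j)} &= \tfrac{1}{2}\left(K_i^{(j)} + K_{i\pm 1}^{(j)}\right)
\end{aligned}
```

Because K depends on the unknown ψ at the new time level, each step yields a nonlinear system, which motivates the convergence loop in the algorithm.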

SLIDE 7

Legacy Fortran

  • About 15 pages of code (Intel compiler)
  • In use for over 10 years
  • 7 day simulation, 24 hr step, 1M elements → 2 hr run time
  • 30 day simulation, 24 hr step, 19M elements → 8 days run time
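
As a rough check on how the two runs scale, the figures above can be converted to element-steps per second. This is a back-of-the-envelope sketch: it assumes the work is proportional to (time steps) × (volume elements) and ignores the number of convergence iterations per step, which the slide does not report. All names are illustrative.

```python
# Throughput implied by the two legacy runs quoted above.
# Assumption: work is proportional to (time steps) x (volume elements).
runs = {
    "7 day, 1M elements": (7, 1_000_000, 2 * 3600),            # steps, elements, seconds
    "30 day, 19M elements": (30, 19_000_000, 8 * 24 * 3600),
}


def element_steps_per_second(steps, elements, seconds):
    return steps * elements / seconds


rates = {name: element_steps_per_second(*run) for name, run in runs.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f} element-steps/s")
```

Both runs come out on the order of 800 to 1000 element-steps per second, which suggests the legacy code scales roughly linearly with problem size.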

SLIDE 8

Algorithm Overview

For each time step t
  While pressure not converged at t

  • 1. Predict Psi(t)
  • 2. Compute K(Psi(t))
  • 3. Compute Psi(t)
  • 4. Update Psi(t-2), Psi(t-1)
  • 5. Generate discharge field Q(t)
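
The five steps above can be sketched on a toy problem. The code below is a 1D analogue, not the author's solver: the conductivity curve `conductivity`, the linear-extrapolation predictor, the frozen-coefficient (Picard) iteration, the tridiagonal solve, and the choice of which steps sit inside the convergence loop are all assumptions made for illustration.

```python
# Toy 1D analogue of the five-step loop (illustrative only).
# Hypothetical model: d(psi)/dt = d/dx[ K(psi) d(psi)/dx ] with fixed end values.

def conductivity(psi):
    # Made-up nonlinear K(psi); real soils use measured curves.
    return 1.0 + psi * psi


def solve_tridiagonal(lower, diag, upper, rhs):
    # Thomas algorithm for a tridiagonal linear system.
    n = len(rhs)
    c, d = [0.0] * n, [0.0] * n
    c[0], d[0] = upper[0] / diag[0], rhs[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / m
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def step(psi_prev, psi_prev2, dx, dt, tol=1e-10, max_iter=50):
    n = len(psi_prev)
    # 1. Predict Psi(t) by linear extrapolation from the two previous levels.
    psi = [2.0 * a - b for a, b in zip(psi_prev, psi_prev2)]
    r = dt / (dx * dx)
    for _ in range(max_iter):  # "while pressure not converged"
        # 2. Compute K(Psi(t)), averaged onto cell faces.
        k = [conductivity(p) for p in psi]
        k_face = [0.5 * (k[i] + k[i + 1]) for i in range(n - 1)]
        # 3. Compute Psi(t): backward Euler with K frozen at the current iterate.
        lower, diag, upper = [0.0] * n, [1.0] * n, [0.0] * n
        for i in range(1, n - 1):  # end rows stay as identity (fixed values)
            lower[i] = -r * k_face[i - 1]
            upper[i] = -r * k_face[i]
            diag[i] = 1.0 + r * (k_face[i - 1] + k_face[i])
        new_psi = solve_tridiagonal(lower, diag, upper, list(psi_prev))
        change = max(abs(a - b) for a, b in zip(new_psi, psi))
        psi = new_psi
        if change < tol:
            break
    return psi


def discharge(psi, dx):
    # 5. Discharge field Q = -K d(psi)/dx on cell faces.
    return [-0.5 * (conductivity(psi[i]) + conductivity(psi[i + 1]))
            * (psi[i + 1] - psi[i]) / dx for i in range(len(psi) - 1)]


n, dx, dt = 21, 0.05, 0.01
psi_prev = [1.0] + [0.0] * (n - 1)  # left end held at 1, right end at 0
psi_prev2 = list(psi_prev)
for _ in range(400):
    psi = step(psi_prev, psi_prev2, dx, dt)
    psi_prev2, psi_prev = psi_prev, psi  # 4. Update Psi(t-2), Psi(t-1)
q = discharge(psi_prev, dx)
print(f"steady discharge is about {q[0]:.4f}")
```

Once the toy problem reaches steady state, the discharge is the same on every face, which is a convenient sanity check for a loop of this shape.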
SLIDE 9

First CUDA attempt

Results – Not Enough Registers!

  • 1. Predict Psi(t), compute K(Psi(t)), compute Psi(t), update Psi(t-2), Psi(t-1)
  • 2. Generate discharge field Q(t)
  • Launch 250,000 threads for 19M volume elements
  • Advance the plane of threads across the volume
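
Dividing the two figures above gives 19,000,000 / 250,000 = 76 volume elements per thread. The sweep itself can be emulated serially, in the spirit of the Micikevicius reference: each (i, j) pair stands in for one GPU thread and walks along the third axis while keeping the values below, at, and above the current cell in local variables. The 7-point stencil, the flat array layout, and all function names are assumptions for illustration, not the author's kernel.

```python
# Serial emulation of a "plane of threads" sweep (illustrative only).
# Each (i, j) pair plays the role of one GPU thread; it marches along k and keeps
# three z-neighbours in local variables (registers on a GPU).

def idx(i, j, k, nx, ny):
    # Flat index with i fastest, then j, then k.
    return (k * ny + j) * nx + i


def sweep_laplacian(u, nx, ny, nz):
    out = [0.0] * (nx * ny * nz)
    for j in range(1, ny - 1):      # on a GPU these two loops are the thread plane
        for i in range(1, nx - 1):
            below = u[idx(i, j, 0, nx, ny)]
            current = u[idx(i, j, 1, nx, ny)]
            for k in range(1, nz - 1):  # advance the plane through the volume
                above = u[idx(i, j, k + 1, nx, ny)]  # only new read along k
                out[idx(i, j, k, nx, ny)] = (
                    u[idx(i - 1, j, k, nx, ny)] + u[idx(i + 1, j, k, nx, ny)]
                    + u[idx(i, j - 1, k, nx, ny)] + u[idx(i, j + 1, k, nx, ny)]
                    + below + above - 6.0 * current
                )
                below, current = current, above  # shift the register queue
    return out


elements_per_thread = 19_000_000 // 250_000

nx, ny, nz = 5, 4, 6
# Quadratic test field: its 7-point Laplacian is 2 + 4 + 6 = 12 at interior cells.
u = [float(i * i + 2 * j * j + 3 * k * k)
     for k in range(nz) for j in range(ny) for i in range(nx)]
lap = sweep_laplacian(u, nx, ny, nz)
```

The point of the pattern is that each thread reads only one new value along the sweep direction per step and reuses the other two from local storage.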
SLIDE 10

Second CUDA attempt

Results – kernel 1 (K1): not enough registers!

  • 1. Predict Psi(t), compute K(Psi(t))

  • 2. Compute Psi(t)
  • 3. Update Psi(t-2), Psi(t-1)
  • 4. Generate discharge field Q(t)
SLIDE 11

Third CUDA attempt

Results – K2: nonlinear coefficients are expensive; K3: warp divergence at boundary conditions; numerous matrix reads from global memory (GMEM)

  • 1. Predict Psi(t)
  • 2. Compute K(Psi(t))
  • 3. Compute Psi(t)
  • 4. Update Psi(t-2), Psi(t-1)
  • 5. Generate discharge field Q(t)
SLIDE 12

Results – 7 Day, 19M elements

All arithmetic in double precision

CUDA 5.5, K20C, VS 2008, Win7/64

Time step   1 CPU (min)   4 CPU (min)   K20c (min)   1 CPU / K20c   4 CPU / K20c
24 hr       120           72            10           12.6           7.6
12 hr       251           165           21           12.0           7.9
6 hr        532           352           41           13.0           8.6
4 hr        826           510           63           13.1           8.1
2 hr        1557          967           123          12.7           7.9

[Chart: run time in minutes (log scale) for 1 CPU, 4 CPU, and Tesla K20C]
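
The speedup columns can be recomputed from the run times in the table. This is only a consistency check on the reported numbers; the variable names are illustrative.

```python
# Recompute the speedup columns from the run times in the results table.
# Each row: (time step in hours, 1-CPU min, 4-CPU min, K20c min,
#            reported 1 CPU / K20c, reported 4 CPU / K20c)
rows = [
    (24, 120, 72, 10, 12.6, 7.6),
    (12, 251, 165, 21, 12.0, 7.9),
    (6, 532, 352, 41, 13.0, 8.6),
    (4, 826, 510, 63, 13.1, 8.1),
    (2, 1557, 967, 123, 12.7, 7.9),
]

recomputed = [(step, cpu1 / gpu, cpu4 / gpu) for step, cpu1, cpu4, gpu, _, _ in rows]

for (step, s1, s4), row in zip(recomputed, rows):
    print(f"{step:>2} hr step: 1 CPU / K20c = {s1:.1f} (reported {row[4]}), "
          f"4 CPU / K20c = {s4:.1f} (reported {row[5]})")
```

The recomputed ratios match the reported ones closely. The 24 hr row differs the most (12.0 against 12.6), presumably because the tabulated minutes are rounded.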

SLIDE 13

Lessons Learned

  • Advance a “plane of threads” through the volume
  • Matrix multi-splitting operator could reduce reads
  • Simplify non-linear terms with splines
  • Porting code → 10x
  • Re-architecting code → 100x
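
The bullet on simplifying nonlinear terms can be illustrated with a precomputed table. The sketch below uses piecewise-linear interpolation (a first-order spline) as a simple stand-in for whatever spline the author had in mind. The exponential conductivity curve `k_exact`, the range of ψ, and the table size are assumptions for illustration only.

```python
import math


def k_exact(psi, k_sat=1.0, alpha=0.5):
    # Illustrative exponential conductivity curve (an assumption, not from the slides).
    return k_sat * math.exp(alpha * psi)


def build_table(f, lo, hi, n):
    # Sample f at n + 1 evenly spaced knots on [lo, hi].
    h = (hi - lo) / n
    return lo, h, [f(lo + i * h) for i in range(n + 1)]


def k_table(psi, table):
    # Piecewise-linear interpolation between the two nearest knots.
    # Inputs outside the table range are clamped to the end values.
    lo, h, values = table
    t = (psi - lo) / h
    i = min(max(int(t), 0), len(values) - 2)
    frac = min(max(t - i, 0.0), 1.0)
    return values[i] + frac * (values[i + 1] - values[i])


table = build_table(k_exact, -10.0, 0.0, 256)
samples = [-10.0 + 0.0137 * m for m in range(730)]
max_err = max(abs(k_table(p, table) - k_exact(p)) for p in samples)
print(f"max interpolation error: {max_err:.2e}")
```

On a GPU the appeal is that a table lookup plus a multiply-add replaces a transcendental function call per cell; whether the accuracy is acceptable depends on the real conductivity curve.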
SLIDE 14

Collaborators

  • Prof. Sally Letsinger, Indiana University
  • Prof. Raymond Chin, Indiana University-Purdue University Indianapolis

References

  • O’Leary, "Multi-splitting of Matrices and Parallel Solution of Linear Systems"
  • Freeze (1971), "Three-dimensional, transient, saturated-unsaturated flow in a groundwater basin"
  • Micikevicius, "3D Finite Difference Computation on GPUs using CUDA"

SLIDE 15

Questions? robert.zigon@beckman.com