Compact Stencils for the Shallow Water Equations on Graphics Processing Units Technology for a better society 1

Brief Outline
• Introduction to Computing on GPUs
• The Shallow Water Equations
• Compact Stencils on the GPU
• Physical correctness
• Summary

Introduction to GPU Computing

Long, long time ago, …
1942: Digital Electric Computer (Atanasoff and Berry)
1947: Transistor (Shockley, Bardeen, and Brattain; Nobel Prize 1956)
1958: Integrated Circuit (Kilby; Nobel Prize 2000)
1971: Microprocessor (Hoff, Faggin, Mazor)
1971-: More transistors (Moore, 1965)

The end of frequency scaling
• 1971-2004: 29% increase in frequency
• 2004-2011: Frequency constant
• 1999-2011: 25% increase in parallelism
• A serial program uses 2% of available resources!
• Parallelism technologies:
  – Multi-core (8x)
  – Hyper threading (2x)
  – AVX/SSE/MMX/etc (8x)

Processor             Year   Transistors     Frequency
Intel 4004            1971   2300            740 kHz
Intel 80286           1982   134 thousand    8 MHz
Intel Pentium P5      1993   1.18 mill.      66 MHz
Intel Pentium 4       2000   42 mill.        1.5 GHz
Intel Nehalem         2010   2.3 bill.       8 x 2.66 GHz

How does parallelism help?
• The power density of microprocessors is proportional to the clock frequency cubed

              Single Core   Multi Core   GPU
Frequency     100%          85%          30%
Power         100%          100%         100%
Performance   100%          170%         ~10x

The GPU: Massive parallelism

                     CPU     GPU
Cores                4       16
Float ops / clock    64      1024
Frequency (MHz)      3400    1544
GigaFLOPS            217     1580
Memory (GiB)         32+     3

[Figure: Performance and Memory Bandwidth]
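The GigaFLOPS row in the table is consistent with multiplying the float operations per clock by the clock frequency. A minimal C++ sketch of that arithmetic follows; the helper name `peak_gflops` is an assumption made here for illustration and does not come from the slides.

```cpp
#include <cassert>

// Theoretical peak in GFLOPS, assuming every float unit is busy on every clock:
// (float ops per clock) * (clock frequency in MHz) / 1000.
// peak_gflops is a hypothetical helper, not part of any library.
double peak_gflops(double float_ops_per_clock, double frequency_mhz) {
    assert(float_ops_per_clock >= 0.0 && frequency_mhz >= 0.0);
    return float_ops_per_clock * frequency_mhz / 1000.0;
}
```

With the table's values, `peak_gflops(64, 3400)` gives 217.6 for the CPU and `peak_gflops(1024, 1544)` gives about 1581 for the GPU, matching the rounded figures on the slide. These are upper bounds; real programs rarely sustain them.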

GPU Programming: From Academic Abuse to Industrial Use
• Timeline: Graphics APIs (~2000), "Academic" Abstractions (~2005), Dedicated C-based languages (~2010)
• Technologies shown: DirectX, BrookGPU, AMD Brook+, AMD CTM / CAL, NVIDIA CUDA, OpenCL, DirectCompute, C++ AMP

GPU Execution mode
• CPU scalar op: 1 thread, 1 operand on 1 data element
• CPU SSE op: 1 thread, 1 operand on 2-4 data elements
• GPU Warp op: 1 warp = 32 threads, 32 operands on 32 data elements
  – Exposed as individual threads
  – Actually runs the same instruction
  – Divergence implies serialization and masking

Warp Serialization and Masking
Hardware serializes and masks divergent code flow:
• Programmer is relieved of fiddling with element masks (which is necessary for SSE)
• But execution time is still the sum of branches taken
• Worst case:
  – All warp threads take individual branches (1/32 performance)
• Thus, important to minimize divergent code flow!
  – Move conditionals into data, use min, max, conditional moves.
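The advice to move conditionals into data can be illustrated with a small host-side C++ sketch. The function names are hypothetical and chosen only for this example; on a GPU the same expressions would appear inside a kernel, and a compiler may already turn a branch this simple into a predicated instruction, so the benefit is larger for longer branch bodies.

```cpp
#include <algorithm>
#include <cassert>

// Branching form: threads of one warp whose inputs differ in sign
// would follow different code paths here.
float clamp_nonnegative_branching(float v) {
    if (v < 0.0f) {
        return 0.0f;
    }
    return v;
}

// Same result expressed with max: every thread executes the same instruction.
float clamp_nonnegative_max(float v) {
    return std::max(v, 0.0f);
}

// Conditional moved into data: mask selects a (mask == 1) or b (mask == 0)
// by arithmetic blending instead of an if-statement.
float select_by_mask(float mask, float a, float b) {
    assert(mask == 0.0f || mask == 1.0f);
    return mask * a + (1.0f - mask) * b;
}
```

Both clamp variants return the same values; the second and third simply have no data-dependent control flow left to diverge on.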

Example: Warp Serialization in Newton's Method

    // MAXIT: compile-time iteration limit, defined elsewhere (value not shown on the slide)
    __global__ void
    newton(float* x, const float* a, const float* b, const float* c, int N)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < N) {                       // first if-statement: masks out superfluous threads
            const float la = a[i];
            const float lb = b[i];
            const float lc = c[i];
            float lx = 0.f;
            for (int it = 0; it < MAXIT; it++) {   // iteration loop: identical for all threads
                float f = la*lx*lx + lb*lx + lc;
                if (fabsf(f) < 1e-7f) {    // early exit: possible divergence
                    break;
                }
                float df = 2.f*la*lx + lb;
                lx = lx - f/df;
            }
            x[i] = lx;
        }
    }

• First if-statement
  – Masks out superfluous threads
  – Not significant
• Iteration loop
  – Identical for all threads
• Early exit
  – Possible divergence
  – Only beneficial when all threads in warp can exit
• Removing early exit reduces the run time from 0.84 ms to 0.69 ms (kernel only)
(But fails 7 of 1 000 000 times since multiple zeros isn't handled properly, but that is a different story :)
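To compare the two variants discussed on this slide outside the GPU, the per-thread work can be written as plain host-side C++. This is a sketch, not the author's code: the function names are hypothetical, and the `max_it` parameter stands in for the `MAXIT` constant whose value the slide does not show. Like the kernel, it does not guard against a zero derivative.

```cpp
#include <cassert>
#include <cmath>

// Newton iteration for a*x*x + b*x + c = 0, started at x = 0,
// with the data-dependent early exit used in the kernel.
float newton_early_exit(float a, float b, float c, int max_it) {
    assert(max_it >= 0);
    float x = 0.0f;
    for (int it = 0; it < max_it; ++it) {
        const float f = a * x * x + b * x + c;
        if (std::fabs(f) < 1e-7f) {
            break;  // on a GPU, threads of one warp may leave at different iterations
        }
        const float df = 2.0f * a * x + b;
        x = x - f / df;
    }
    return x;
}

// Same iteration without the early exit: every call runs exactly max_it steps,
// so all threads of a warp would stay in lockstep.
float newton_fixed_iterations(float a, float b, float c, int max_it) {
    assert(max_it >= 0);
    float x = 0.0f;
    for (int it = 0; it < max_it; ++it) {
        const float f = a * x * x + b * x + c;
        const float df = 2.0f * a * x + b;
        x = x - f / df;
    }
    return x;
}
```

For example, with a = 1, b = -3, c = 2 (roots at 1 and 2), both functions started from x = 0 approach the root at 1. The fixed-iteration form does more arithmetic per element but has uniform control flow, which is the trade-off behind the timing on the slide.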

Examples of early GPU research
• Self-intersection (~10x)
• Preparation for FEM (~5x)
• Registration of medical data (~20x)
• Fluid dynamics and FSI (Navier-Stokes)
• Inpainting (~400x matlab code)
• Euler Equations (~25x)
• Marine acoustics (~20x)
• SW Equations (~25x)
• Matlab Interface
• Linear algebra
• Water injection in a fluvial reservoir (20x)
Examples from SINTEF

Examples of GPU use today
• Heterogeneous Computing (Top500)
[Chart legend: Count top 100, Count top 500, Count Cell; time axis Oct. 2006 to Apr. 2012; vertical axis 0 to 40]
• Screenshot from NVIDIA website

Compact stencils on the GPU: Efficient Flood Simulations

The Shallow Water Equations
• A hyperbolic partial differential equation
• First described by de Saint-Venant (1797-1886)
• Conservation of mass and momentum
• Gravity waves in 2D free surface
• Gravity-induced fluid motion
• Governing flow is horizontal
• Not only for water:
  – Simplification of atmospheric flow
  – Avalanches
  – ...
Water image from http://freephoto.com / Ian Britton

The Shallow Water Equations
[Equation with labeled terms: vector of conserved variables, flux functions, bed slope source term, bed friction source term]
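For reference, the four labeled terms match the standard conservative form of the 2D shallow water equations with source terms. The notation below is an assumption, since the slide's own equation did not survive: h is the water depth, hu and hv are the momenta, B is the bed elevation, and g is the gravitational acceleration. The friction term is left abstract as H_f(Q) because the text does not say which friction law is used.

```latex
\underbrace{\begin{bmatrix} h \\ hu \\ hv \end{bmatrix}_t}_{\text{conserved variables } Q}
+ \underbrace{\begin{bmatrix} hu \\ hu^2 + \tfrac{1}{2} g h^2 \\ huv \end{bmatrix}_x
+ \begin{bmatrix} hv \\ huv \\ hv^2 + \tfrac{1}{2} g h^2 \end{bmatrix}_y}_{\text{flux functions}}
= \underbrace{\begin{bmatrix} 0 \\ -g h B_x \\ -g h B_y \end{bmatrix}}_{\text{bed slope source term}}
+ \underbrace{H_f(Q)}_{\text{bed friction source term}}
```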

Target Application Areas
• Tsunamis
  – 2011: Japan (5321+)
  – 2004: Indian Ocean (230 000)
• Floods
  – 2010: Pakistan (2000+)
  – 1931: China floods (2 500 000+)
• Storm Surges
  – 2005: Hurricane Katrina (1836)
  – 1530: Netherlands (100 000+)
• Dam breaks
  – 1975: Banqiao Dam (230 000+)
  – 1959: Malpasset (423)
Images from wikipedia.org, www.ecolo.org

Two important uses of shallow water simulations
• In preparation for events: Evaluate possible scenarios
  – Simulation of many ensemble members
  – Creation of inundation maps
  – Creation of Emergency Action Plans
• In response to ongoing events
  – Simulate possible scenarios in real-time
  – Simulate strategies for flood protection (sand bags, etc.)
  – Determine who to evacuate based on simulation, not guesswork
• High performance requirements => Use the GPU
Simulation result from NOAA
Inundation map from "Los Angeles County Tsunami Inundation Maps", http://www.conservation.ca.gov/cgs/geologic_hazards/Tsunami/Inundation_Maps/LosAngeles/Pages/LosAngeles.aspx

Solving a partial differential equation on the GPU
• Before we start with the shallow water equations, let us examine something slightly less complex: the heat equation
• Describes diffusive heat conduction
• Prototypical partial differential equation
$$\frac{\partial u}{\partial t} = \kappa \frac{\partial^2 u}{\partial x^2}$$
• u is the temperature, kappa is the diffusion coefficient, t is time, and x is space.

Finding a solution to the heat equation
• Solving such partial differential equations analytically is nontrivial in all but a few very special cases
• Solution strategy: replace the continuous derivatives with approximations at a set of grid points
• Solve for each grid point numerically on a computer
• Use many grid points, and high order of approximation to get good results

The Heat Equation with an implicit scheme
1. We can construct an implicit scheme by carefully choosing the "correct" approximation of derivatives
2. This ends up in a system of linear equations
3. Solve Ax=b using standard GPU methods to evolve the solution in time
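As a sketch of steps 1 and 2 for the 1D heat equation u_t = kappa u_xx, assume a backward Euler difference in time and a central difference in space on a uniform grid; the slide does not state which approximations are used, so this is one common choice rather than the author's. The symbols u_i^n (the approximation of u at grid point x_i and time level t_n) and r = kappa Δt / Δx² are introduced here for illustration.

```latex
\frac{u_i^{n+1} - u_i^{n}}{\Delta t}
  = \kappa \, \frac{u_{i-1}^{n+1} - 2u_i^{n+1} + u_{i+1}^{n+1}}{\Delta x^2}
\quad\Longrightarrow\quad
-r\,u_{i-1}^{n+1} + (1 + 2r)\,u_i^{n+1} - r\,u_{i+1}^{n+1} = u_i^{n},
\qquad r = \frac{\kappa\,\Delta t}{\Delta x^2}
```

Writing one such equation per interior grid point gives a tridiagonal system Ax = b, where x holds the unknown values at time level n+1 and b holds the known values at level n.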

The Heat Equation with an implicit scheme
• Such implicit schemes are often sought after
  – They allow for large time steps
  – They can be solved using standard tools
  – Allow complex geometries
  – They can be very accurate
  – …
• However …
  – for many time-varying phenomena, we are also interested in the temporal dynamics of the problem
  – Linear algebra solvers can be slow and memory hungry, especially on the GPU

Algorithmic and numerical performance
• For all problems, the total performance is the product of the algorithmic and the numerical performance
• Your mileage may vary: algorithmic performance is highly problem dependent
• Sparse linear algebra solvers have low numerical performance
• Only able to utilize a fraction of the capabilities of CPUs, and worse on GPUs
• For suitable problems, explicit schemes with compact stencils can give the best performance
• Able to reach near-peak performance
[Figure: methods placed by algorithmic performance and numerical performance (the two axes): Explicit stencils, Tridiag, QR, PLU, Red-Black, Multigrid, Krylov]

Explicit schemes with compact stencils
• Explicit schemes can give rise to compact stencils
  – Embarrassingly parallel
  – Perfect for the GPU!
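A minimal host-side C++ sketch of such a compact stencil, for the 1D heat equation u_t = kappa u_xx, is given below. It assumes forward Euler in time, a central difference in space, and fixed (Dirichlet) boundary values; the function name `heat_step_explicit` and the parameter r = kappa Δt / Δx² are introduced here for illustration and are not from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One explicit time step of u_t = kappa * u_xx on a uniform 1D grid.
// r = kappa * dt / (dx * dx); forward Euler is stable only for r <= 0.5.
// The first and last entries are treated as fixed boundary values.
std::vector<float> heat_step_explicit(const std::vector<float>& u, float r) {
    assert(r >= 0.0f && r <= 0.5f);
    std::vector<float> next(u);  // copy keeps the boundary values unchanged
    for (std::size_t i = 1; i + 1 < u.size(); ++i) {
        // Compact three-point stencil: each new value depends only on the
        // old values at i-1, i, and i+1, so all i can be updated independently.
        next[i] = u[i] + r * (u[i - 1] - 2.0f * u[i] + u[i + 1]);
    }
    return next;
}
```

Because the loop iterations do not depend on one another, a GPU version could assign one thread per grid point i. Starting from {0, 0, 1, 0, 0} with r = 0.25, one step spreads the spike to its neighbours and lowers the peak.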

Back to the shallow water equations
• A hyperbolic partial differential equation
• Enables explicit schemes
• Solutions form discontinuities / shocks
• Require high accuracy in smooth parts without oscillations near discontinuities
• Solutions include dry areas
• Negative water depths ruin simulations
• Often high requirements to accuracy
• Order of spatial/temporal discretization
• Floating point rounding errors
• Can be difficult to capture "lake at rest"
[Figure: A standing wave or shock]
