CS 5220: Locality and parallelism in simulations I David Bindel

CS 5220: Locality and parallelism in simulations I David Bindel 2017-09-12 1

Parallelism and locality • Real world exhibits parallelism and locality • Particles, people, etc. function independently • Nearby objects interact more strongly than distant ones • Can often simplify dependence on distant objects • Can get more parallelism / locality through model • Limited range of dependency between adjacent time steps • Can neglect or approximate far-field effects • Often get parallelism at multiple levels • Hierarchical circuit simulation • Interacting models for climate • Parallelizing individual experiments in MC or optimization 2

Basic styles of simulation • Discrete event systems (continuous or discrete time) • Game of life, logic-level circuit simulation • Network simulation • Particle systems • Billiards, electrons, galaxies, ... • Ants, cars, ...? • Lumped parameter models (ODEs) • Circuits (SPICE), structures, chemical kinetics • Distributed parameter models (PDEs / integral equations) • Heat, elasticity, electrostatics, ... Often more than one type of simulation appropriate. Sometimes more than one at a time! 3

Discrete events Basic setup: • Finite set of variables, updated via transition function • Synchronous case: finite state machine • Asynchronous case: event-driven simulation • Synchronous example: Game of Life Nice starting point — no discretization concerns! 4

Game of Life [Figure: example neighborhoods labeled Lonely and Crowded (dead next step), OK and Born (live next step)] Game of Life (John Conway): 1. Live cell dies with < 2 live neighbors 2. Live cell dies with > 3 live neighbors 3. Live cell lives with 2–3 live neighbors 4. Dead cell becomes live with exactly 3 live neighbors 5
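
The four rules above can be written as a single synchronous update. The following is a minimal Python sketch, not code from the lecture; the name `life_step`, the list-of-lists grid, and the choice to treat cells outside the grid as dead are all assumptions made for illustration.

```python
def life_step(grid):
    """Advance a 2D list of 0/1 cells by one generation.

    Assumption: cells outside the grid are treated as dead.
    """
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Count live cells among the (up to) eight neighbors.
            live = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols:
                        live += grid[i + di][j + dj]
            if grid[i][j]:
                # Rules 1-3: a live cell survives only with 2 or 3 live neighbors.
                new[i][j] = 1 if live in (2, 3) else 0
            else:
                # Rule 4: a dead cell is born with exactly 3 live neighbors.
                new[i][j] = 1 if live == 3 else 0
    return new


# A "blinker": three cells in a row, which oscillates with period 2.
blinker = [[0, 0, 0],
           [1, 1, 1],
           [0, 0, 0]]
after_one = life_step(blinker)
after_two = life_step(after_one)
```

Every cell reads only the old grid and writes only the new one, which is what makes the synchronous case easy to reason about.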

Game of Life [Figure: grid partitioned among processors P0, P1, P2, P3, with subdomain boundary cells in cyan] Easy to parallelize by domain decomposition. • Update work involves volume of subdomains • Communication per step on surface (cyan) 6
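
To make the volume-versus-surface point concrete, here is a serial Python sketch that mimics a domain decomposition into `p` horizontal strips. Each strip is updated using one "ghost" row from each neighbor, which stands in for the data that would be communicated per step. The function names and the strip layout are assumptions for illustration; a real code would exchange ghost rows by message passing.

```python
def life_step(grid):
    """One Game of Life generation; cells outside the grid count as dead."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            live = sum(grid[i + di][j + dj]
                       for di in (-1, 0, 1) for dj in (-1, 0, 1)
                       if (di or dj)
                       and 0 <= i + di < rows and 0 <= j + dj < cols)
            new[i][j] = 1 if (live == 3 or (grid[i][j] and live == 2)) else 0
    return new


def step_decomposed(grid, p):
    """Update the grid strip by strip, as p simulated processors would."""
    rows, cols = len(grid), len(grid[0])
    bounds = [rows * k // p for k in range(p + 1)]
    zero = [0] * cols
    result = []
    for k in range(p):
        lo, hi = bounds[k], bounds[k + 1]
        # Ghost rows: the only data a strip needs from its neighbors.
        top = grid[lo - 1] if lo > 0 else zero
        bottom = grid[hi] if hi < rows else zero
        padded = [top] + grid[lo:hi] + [bottom]
        # Update the padded strip, then discard the ghost rows.
        result.extend(life_step(padded)[1:-1])
    return result


# A glider on an 8x8 grid, split across 4 simulated processors.
grid = [[0] * 8 for _ in range(8)]
for i, j in [(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)]:
    grid[i][j] = 1
same = step_decomposed(grid, 4) == life_step(grid)
```

Each strip does work proportional to its number of cells but receives only two rows per step, so larger strips give a better computation-to-communication ratio.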

Game of Life: Pioneers and Settlers What if pattern is “dilute”? • Few or no live cells at surface at each step • Think of live cell at a surface as an “event” • Only communicate events! • This is asynchronous • Harder with message passing — when do you receive? 7

Asynchronous Game of Life How do we manage events? • Could be speculative: assume no communication across boundary for many steps, back up if needed • Or conservative: wait whenever communication possible • possible ≢ guaranteed! • Deadlock: everyone waits for everyone else to send data • Can get around this with NULL messages How do we manage load balance? • No need to simulate quiescent parts of the game! • Maybe dynamically assign smaller blocks to processors? 8

Particle simulation Particles move via Newton ($F = ma$), with • External forces: ambient gravity, currents, etc. • Local forces: collisions, Van der Waals ($1/r^6$), etc. • Far-field forces: gravity and electrostatics ($1/r^2$), etc. • Simple approximations often apply (Saint-Venant) 9

A forced example Example force: $f_i = \sum_j G m_i m_j \frac{x_j - x_i}{r_{ij}^3} \left( 1 - \left( \frac{a}{r_{ij}} \right)^4 \right), \quad r_{ij} = \| x_i - x_j \|$ • Long-range attractive force ($r^{-2}$) • Short-range repulsive force ($r^{-6}$) • Go from attraction to repulsion at radius $a$ 10
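
As a check on the formula, the example force can be evaluated directly with an all-pairs loop. This is a minimal sketch assuming 2D positions; the function name `forces` and the default values of `G` and `a` are hypothetical, chosen only for illustration.

```python
import math


def forces(x, m, G=1.0, a=0.1):
    """All-pairs evaluation of the example force.

    x: list of (x, y) positions; m: list of masses.
    Returns a list of [fx, fy] pairs, one per particle.
    """
    n = len(x)
    f = [[0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = x[j][0] - x[i][0]
            dy = x[j][1] - x[i][1]
            r = math.hypot(dx, dy)
            # Scalar factor G m_i m_j / r^3 * (1 - (a/r)^4): positive
            # (attractive) for r > a, negative (repulsive) for r < a.
            s = G * m[i] * m[j] / r**3 * (1.0 - (a / r) ** 4)
            f[i][0] += s * dx
            f[i][1] += s * dy
    return f


# Two unit masses separated by 2a: the force on particle 0 points toward 1.
pair = forces([(0.0, 0.0), (2.0, 0.0)], [1.0, 1.0], G=1.0, a=1.0)
```

The double loop is the $O(n^2)$ cost that the later slides on local and far-field forces try to avoid.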

A simple serial simulation In Matlab, we can write
    npts = 100;
    t = linspace(0, tfinal, npts);
    [tout, xyv] = ode113(@fnbody, ...
                         t, [x; v], [], m, g);
    xout = xyv(:,1:length(x))';
... but I can’t call ode113 in C in parallel (or can I?) 11

A simple serial simulation Maybe a fixed step leapfrog will do?
    npts = 100;
    steps_per_pt = 10;
    dt = tfinal/(steps_per_pt*(npts-1));
    xout = zeros(2*n, npts);
    xout(:,1) = x;
    for i = 1:npts-1
      for ii = 1:steps_per_pt
        x = x + v*dt;
        a = fnbody(x, m, g);
        v = v + a*dt;
      end
      xout(:,i+1) = x;
    end
12

Plotting particles 13

Pondering particles • Where do particles “live” (esp. in distributed memory)? • Decompose in space? By particle number? • What about clumping? • How are long-range force computations organized? • How are short-range force computations organized? • How is force computation load balanced? • What are the boundary conditions? • How are potential singularities handled? • What integrator is used? What step control? 14

External forces Simplest case: no particle interactions. • Embarrassingly parallel (like Monte Carlo)! • Could just split particles evenly across processors • Is it that easy? • Maybe some trajectories need short time steps? • Even with MC, load balance may not be entirely trivial. 15

Local forces • Simplest all-pairs check is $O(n^2)$ (expensive) • Or only check close pairs (via binning, quadtrees?) • Communication required for pairs checked • Usual model: domain decomposition 16
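
One way to "only check close pairs" is binning: sort particles into square bins whose side equals the interaction cutoff, then compare each particle only against particles in its own and adjacent bins. The sketch below assumes 2D points and a fixed cutoff; the function name `close_pairs` is hypothetical.

```python
from collections import defaultdict


def close_pairs(points, cutoff):
    """Return the set of index pairs (i, j), i < j, within distance cutoff."""
    # Bin side equals the cutoff, so any close pair lies in the same bin
    # or in two bins that touch (including diagonally).
    bins = defaultdict(list)
    for idx, (px, py) in enumerate(points):
        bins[(int(px // cutoff), int(py // cutoff))].append(idx)

    pairs = set()
    for (bx, by), members in list(bins.items()):
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                # .get avoids creating empty bins while we iterate.
                for j in bins.get((bx + dx, by + dy), ()):
                    for i in members:
                        if i < j:
                            qx = points[i][0] - points[j][0]
                            qy = points[i][1] - points[j][1]
                            if qx * qx + qy * qy <= cutoff * cutoff:
                                pairs.add((i, j))
    return pairs


# Deterministic sample points on a roughly 10 x 9 region.
pts = [((i * 37 % 101) / 10.0, (i * 53 % 89) / 10.0) for i in range(60)]
near = close_pairs(pts, 1.5)
```

With roughly uniform density the work per particle is bounded by the occupancy of nine bins, so the total cost grows about linearly in $n$ rather than quadratically. Clumped particles break that assumption, which is one reason load balance matters.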

Local forces: Communication Minimize communication: • Send particles that might affect a neighbor “soon” • Trade extra computation against communication • Want low surface area-to-volume ratios on domains 17

Local forces: Load balance • Are particles evenly distributed? • Do particles remain evenly distributed? • Can divide space unevenly (e.g. quadtree/octree) 18

Far-field forces [Figure: each processor holds its own particles (Mine) and a buffer of visiting particles (Buffered)] • Every particle affects every other particle • All-to-all communication required • Overlap communication with computation • Poor memory scaling if everyone keeps everything! • Idea: pass particles in a round-robin manner 19

Passing particles for far-field forces [Figure: each processor holds its own particles (Mine) and a buffer of visiting particles (Buffered)]
    copy local particles to current buf
    for phase = 1:p
      send current buf to rank+1 (mod p)
      recv next buf from rank-1 (mod p)
      interact local particles with current buf
      swap current buf with next buf
    end
20
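
The round-robin scheme can be checked in a serial simulation before writing any message-passing code. In the Python sketch below, the `p` ranks are simulated by list entries and the send/receive pair is replaced by list indexing. The particle layout `(gid, x, m)`, the 1D positions, and the $m_j / |x_i - x_j|$ interaction are assumptions chosen to keep the example small.

```python
def ring_potentials(local, p):
    """Accumulate phi_i = sum_{j != i} m_j / |x_i - x_j| by ring passing.

    local[r] is the list of (gid, x, m) particles owned by simulated rank r.
    """
    phi = {gid: 0.0 for part in local for (gid, _, _) in part}
    # "copy local particles to current buf"
    current = [list(part) for part in local]
    for phase in range(p):
        # "send current buf to rank+1, recv next buf from rank-1":
        # rank r receives whatever rank r-1 currently holds.
        nxt = [current[(r - 1) % p] for r in range(p)]
        # "interact local particles with current buf"
        for r in range(p):
            for gi, xi, _ in local[r]:
                for gj, xj, mj in current[r]:
                    if gi != gj:  # skip self-interaction in the first phase
                        phi[gi] += mj / abs(xi - xj)
        # "swap current buf with next buf"
        current = nxt
    return phi


# 12 particles split evenly over 3 simulated ranks.
parts = [(g, 0.5 * g + 0.01 * (g % 3), 1.0 + (g % 3)) for g in range(12)]
local = [parts[4 * r:4 * r + 4] for r in range(3)]
phi = ring_potentials(local, 3)
```

After `p` phases every buffer has visited every rank, so each particle has interacted with all others, while no rank ever stores more than its own particles plus two buffers.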

Passing particles for far-field forces Suppose $n = N/p$ particles in buffer. At each phase $t_{\mathrm{comm}} \approx \alpha + \beta n$ and $t_{\mathrm{comp}} \approx \gamma n^2$. So we can mask communication with computation if $n \geq \frac{1}{2\gamma} \left( \beta + \sqrt{\beta^2 + 4 \alpha \gamma} \right) > \frac{\beta}{\gamma}$. More efficient serial code $\implies$ larger $n$ needed to mask communication! $\implies$ worse speed-up as $p$ gets larger (fixed $N$), but scaled speed-up ($n$ fixed) remains unchanged. This analysis neglects overhead term in LogP. 21
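
The masking condition is $\gamma n^2 \geq \alpha + \beta n$, and the bound on $n$ is the positive root of the corresponding quadratic. A short Python sketch makes the trend visible; the function name and the parameter values are hypothetical, not measurements from any machine.

```python
import math


def min_buffer_size(alpha, beta, gamma):
    """Smallest real n with gamma*n^2 >= alpha + beta*n.

    This is the positive root of gamma*n^2 - beta*n - alpha = 0.
    """
    return (beta + math.sqrt(beta * beta + 4.0 * alpha * gamma)) / (2.0 * gamma)


# Hypothetical costs: latency (s), per-particle transfer (s), per-pair work (s).
alpha, beta, gamma = 1e-6, 1e-8, 1e-9
n_slow = min_buffer_size(alpha, beta, gamma)
# A serial kernel that is 10x faster (smaller gamma) needs a larger buffer
# before computation can hide the same communication cost.
n_fast = min_buffer_size(alpha, beta, gamma / 10.0)
```

Since $n = N/p$, the requirement on $n$ turns into an upper limit on $p$ for a fixed problem size $N$.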

Far-field forces: particle-mesh methods Consider electrostatic interaction ($r^{-2}$ force) • Enough charges look like a continuum! • Poisson equation maps charge distribution to potential • Use fast Poisson solvers for regular grids (FFT, multigrid) • Approximation depends on mesh and particle density • Can clean up leading part of approximation error 22

Far-field forces: particle-mesh methods • Map particles to mesh points (multiple strategies) • Solve potential PDE on mesh • Interpolate potential to particles • Add correction term – acts like local force 23
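
The first and third steps above (mapping particles to the mesh and interpolating back) can be sketched in one dimension. The version below uses linear "cloud-in-cell" weights on a periodic mesh, which is only one of the multiple mapping strategies; the function names are hypothetical, and the Poisson solve and the local correction term are deliberately left out.

```python
import math


def deposit_cic(positions, charges, n_cells, h):
    """Spread each charge linearly onto the two nearest nodes of a
    periodic 1D mesh with nodes at k*h, k = 0 .. n_cells-1."""
    rho = [0.0] * n_cells
    for x, q in zip(positions, charges):
        s = x / h
        k = math.floor(s)      # index of the node to the left
        w = s - k              # fractional distance past that node
        rho[k % n_cells] += q * (1.0 - w)
        rho[(k + 1) % n_cells] += q * w
    return rho


def interpolate_cic(field, x, h):
    """Linearly interpolate a periodic mesh field to position x,
    using the same weights as the deposit step."""
    n = len(field)
    s = x / h
    k = math.floor(s)
    w = s - k
    return field[k % n] * (1.0 - w) + field[(k + 1) % n] * w


# Three charges on a 4-node mesh with spacing 1.
rho = deposit_cic([0.25, 1.5, 3.9], [1.0, -2.0, 0.5], n_cells=4, h=1.0)
value = interpolate_cic([0.0, 1.0, 2.0, 3.0], 1.5, 1.0)
```

Using the same weights for deposit and interpolation is a common design choice, and the weights sum to one so total charge on the mesh matches total particle charge.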

Far-field forces: tree methods • Distance simplifies things • Andromeda looks like a point mass from here? • Build a tree, approximating descendants at each node • Several variants: Barnes-Hut, FMM, Anderson’s method • More on this later in the semester 24
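
The "distance simplifies things" idea can be illustrated without building a tree: replace a cluster of sources by a single point mass at its center of mass and compare against the exact sum. This Python sketch shows only that monopole approximation, not Barnes-Hut or FMM themselves; the function names and the sample cluster are made up for illustration.

```python
import math


def exact_potential(target, sources):
    """Sum of m / r over all sources; sources are (x, y, m) triples."""
    tx, ty = target
    return sum(m / math.hypot(tx - x, ty - y) for x, y, m in sources)


def monopole_potential(target, sources):
    """Approximate the cluster by its total mass at its center of mass."""
    total = sum(m for _, _, m in sources)
    cx = sum(x * m for x, _, m in sources) / total
    cy = sum(y * m for _, y, m in sources) / total
    return total / math.hypot(target[0] - cx, target[1] - cy)


def relative_error(target, sources):
    exact = exact_potential(target, sources)
    return abs(monopole_potential(target, sources) - exact) / abs(exact)


# A small cluster inside the unit box around the origin.
cluster = [(-0.5, -0.5, 1.0), (0.5, -0.5, 2.0), (0.5, 0.5, 1.0),
           (-0.5, 0.5, 3.0), (0.1, 0.2, 1.5)]
err_near = relative_error((2.0, 0.0), cluster)
err_far = relative_error((100.0, 0.0), cluster)
```

The error shrinks as the target moves away relative to the cluster size, which is the criterion tree codes use when deciding whether a node can stand in for its descendants.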

Summary of particle example • Model: Continuous motion of particles • Could be electrons, cars, whatever... • Step through discretized time • Local interactions • Relatively cheap • Load balance a pain • All-pairs interactions • Obvious algorithm is expensive ($O(n^2)$) • Particle-mesh and tree-based algorithms help An important special case of lumped/ODE models. 25
