CS 5220: Locality and parallelism in simulations I


  1. CS 5220: Locality and parallelism in simulations I (David Bindel, 2017-09-12)

  2. Parallelism and locality
  • Real world exhibits parallelism and locality
    • Particles, people, etc. function independently
    • Nearby objects interact more strongly than distant ones
    • Can often simplify dependence on distant objects
  • Can get more parallelism / locality through the model
    • Limited range of dependency between adjacent time steps
    • Can neglect or approximate far-field effects
  • Often get parallelism at multiple levels
    • Hierarchical circuit simulation
    • Interacting models for climate
    • Parallelizing individual experiments in MC or optimization

  3. Basic styles of simulation
  • Discrete event systems (continuous or discrete time)
    • Game of life, logic-level circuit simulation
    • Network simulation
  • Particle systems
    • Billiards, electrons, galaxies, ...
    • Ants, cars, ...?
  • Lumped parameter models (ODEs)
    • Circuits (SPICE), structures, chemical kinetics
  • Distributed parameter models (PDEs / integral equations)
    • Heat, elasticity, electrostatics, ...
  Often more than one type of simulation appropriate.
  Sometimes more than one at a time!

  4. Discrete events
  Basic setup:
  • Finite set of variables, updated via transition function
  • Synchronous case: finite state machine
  • Asynchronous case: event-driven simulation
  • Synchronous example: Game of Life
  Nice starting point — no discretization concerns!

  5. Game of Life
  [Figure: cell fates: Lonely and Crowded cells are dead next step; OK and Born cells are live next step]
  Game of Life (John Conway):
  1. Live cell dies with < 2 live neighbors
  2. Live cell dies with > 3 live neighbors
  3. Live cell lives with 2–3 live neighbors
  4. Dead cell becomes live with exactly 3 live neighbors
  (A minimal C sketch of this update rule follows below.)
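A minimal C sketch of one synchronous update step, assuming a dense 0/1 board stored row-major with everything outside the border treated as dead (the function name and layout are illustrative, not from the slides):

    /* One Game of Life step on an n-by-n board (1 = live, 0 = dead). */
    void life_step(int n, const char* board, char* next)
    {
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                int live = 0;
                for (int di = -1; di <= 1; ++di)
                    for (int dj = -1; dj <= 1; ++dj) {
                        if (di == 0 && dj == 0) continue;
                        int ii = i + di, jj = j + dj;
                        if (ii >= 0 && ii < n && jj >= 0 && jj < n)
                            live += board[ii*n + jj];
                    }
                if (board[i*n + j])
                    next[i*n + j] = (live == 2 || live == 3);  /* rules 1-3 */
                else
                    next[i*n + j] = (live == 3);               /* rule 4    */
            }
        }
    }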

  6. Game of Life
  [Figure: board partitioned into subdomains P0–P3, with the communicated surface layer shown in cyan]
  Easy to parallelize by domain decomposition.
  • Update work involves volume of subdomains
  • Communication per step on surface (cyan)
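The volume/surface split suggests what the per-step communication looks like. A hedged MPI sketch, assuming a 1D strip decomposition where each rank owns `rows` full rows plus one ghost row above and below (names and layout are assumptions, not from the slides):

    #include <mpi.h>

    /* Exchange boundary rows with up/down neighbors before each life_step.
       board has (rows+2) rows of cols cells; rows 1..rows are owned,
       rows 0 and rows+1 are ghost copies of the neighbors' boundaries. */
    void exchange_ghosts(char* board, int rows, int cols,
                         int rank, int size, MPI_Comm comm)
    {
        int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
        int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

        /* my first owned row goes up; neighbor's boundary fills my bottom ghost */
        MPI_Sendrecv(board + 1*cols,        cols, MPI_CHAR, up,   0,
                     board + (rows+1)*cols, cols, MPI_CHAR, down, 0,
                     comm, MPI_STATUS_IGNORE);
        /* my last owned row goes down; neighbor's boundary fills my top ghost */
        MPI_Sendrecv(board + rows*cols,     cols, MPI_CHAR, down, 1,
                     board + 0*cols,        cols, MPI_CHAR, up,   1,
                     comm, MPI_STATUS_IGNORE);
    }

The message size scales with the surface (cols), while the local update scales with the volume (rows*cols), which is the ratio the slide is pointing at.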

  7. Game of Life: Pioneers and Settlers
  What if the pattern is “dilute”?
  • Few or no live cells at the surface at each step
  • Think of a live cell at a surface as an “event”
  • Only communicate events!
    • This is asynchronous
    • Harder with message passing — when do you receive?

  8. Asynchronous Game of Life
  How do we manage events?
  • Could be speculative — assume no communication across boundary for many steps, back up if needed
  • Or conservative — wait whenever communication is possible
    (possible ≢ guaranteed!)
    • Deadlock: everyone waits for everyone else to send data
    • Can get around this with NULL messages
  How do we manage load balance?
  • No need to simulate quiescent parts of the game!
  • Maybe dynamically assign smaller blocks to processors?

  9. Particle simulation
  Particles move via Newton (F = ma), with
  • External forces: ambient gravity, currents, etc.
    • Simple approximations often apply (Saint-Venant)
  • Local forces: collisions, van der Waals (1/r^6), etc.
  • Far-field forces: gravity and electrostatics (1/r^2), etc.

  10. A forced example
  Example force:

      f_i = Σ_j G m_i m_j (x_j − x_i) / r_ij^3 · (1 − (a/r_ij)^4),   r_ij = ‖x_i − x_j‖

  • Long-range attractive force (r^-2)
  • Short-range repulsive force (r^-6)
  • Go from attraction to repulsion at radius a
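A hedged C sketch of evaluating this force on all particles, assuming 2D positions packed as x = [x1 y1 x2 y2 ...] with forces accumulated the same way (the function name and layout are mine, not from the slides):

    #include <math.h>

    /* f_i = sum_{j != i} G m_i m_j (x_j - x_i) / r_ij^3 * (1 - (a/r_ij)^4) */
    void compute_forces(int n, const double* x, const double* m,
                        double G, double a, double* f)
    {
        for (int k = 0; k < 2*n; ++k) f[k] = 0.0;
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < n; ++j) {
                if (j == i) continue;
                double dx = x[2*j]   - x[2*i];
                double dy = x[2*j+1] - x[2*i+1];
                double r  = sqrt(dx*dx + dy*dy);
                double s  = G*m[i]*m[j]/(r*r*r) * (1.0 - pow(a/r, 4.0));
                f[2*i]   += s*dx;   /* attractive toward j when r > a, */
                f[2*i+1] += s*dy;   /* repulsive when r < a            */
            }
        }
    }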

  11. A simple serial simulation
  In Matlab, we can write

      npts = 100;
      t = linspace(0, tfinal, npts);
      [tout, xyv] = ode113(@fnbody, t, [x; v], [], m, g);
      xout = xyv(:, 1:length(x))';

  ... but I can’t call ode113 in C in parallel (or can I?)

  12. A simple serial simulation
  Maybe a fixed step leapfrog will do?

      npts = 100;
      steps_per_pt = 10;
      dt = tfinal/(steps_per_pt*(npts-1));
      xout = zeros(2*n, npts);
      xout(:,1) = x;
      for i = 1:npts-1
        for ii = 1:steps_per_pt
          x = x + v*dt;          % drift: advance positions
          a = fnbody(x, m, g);   % evaluate accelerations
          v = v + a*dt;          % kick: advance velocities
        end
        xout(:,i+1) = x;
      end
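The same loop in C is not much harder. A sketch, where `accel` stands in for the Matlab fnbody routine and its interface is an assumption:

    #include <stdlib.h>

    /* Fixed-step integration: x, v hold 2*n packed coordinates; xout is
       2*n-by-npts, column major, mirroring the Matlab xout above. */
    void integrate(int n, double* x, double* v, const double* m,
                   double tfinal, int npts, int steps_per_pt,
                   void (*accel)(int n, const double* x, const double* m, double* a),
                   double* xout)
    {
        double dt = tfinal / (steps_per_pt * (npts - 1));
        double* a = malloc(2*n * sizeof(double));
        for (int k = 0; k < 2*n; ++k) xout[k] = x[k];           /* xout(:,1) = x */
        for (int i = 1; i < npts; ++i) {
            for (int s = 0; s < steps_per_pt; ++s) {
                for (int k = 0; k < 2*n; ++k) x[k] += v[k]*dt;  /* drift */
                accel(n, x, m, a);
                for (int k = 0; k < 2*n; ++k) v[k] += a[k]*dt;  /* kick  */
            }
            for (int k = 0; k < 2*n; ++k) xout[2*n*i + k] = x[k];
        }
        free(a);
    }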

  13. Plotting particles
  [Figure only]

  14. Pondering particles
  • Where do particles “live” (esp. in distributed memory)?
    • Decompose in space? By particle number?
    • What about clumping?
  • How are long-range force computations organized?
  • How are short-range force computations organized?
  • How is force computation load balanced?
  • What are the boundary conditions?
  • How are potential singularities handled?
  • What integrator is used? What step control?

  15. External forces
  Simplest case: no particle interactions.
  • Embarrassingly parallel (like Monte Carlo)!
  • Could just split particles evenly across processors
  • Is it that easy?
    • Maybe some trajectories need short time steps?
    • Even with MC, load balance may not be entirely trivial.

  16. Local forces
  • Simplest all-pairs check is O(n^2) (expensive)
  • Or only check close pairs (via binning, quadtrees?); see the binning sketch below
  • Usual model: domain decomposition
  • Communication required for pairs checked
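A hedged sketch of the binning idea, assuming particles live in [0, ncell*h)^2 with cell side h equal to the interaction cutoff (all names are illustrative); short-range checks then only touch a particle's own cell and its 8 neighbors:

    #include <stdlib.h>

    /* Counting sort of particle indices by cell.  On return, the particles
       in cell c are cell_list[cell_start[c] .. cell_start[c+1]-1]. */
    void bin_particles(int n, const double* x, double h, int ncell,
                       int* cell_start,   /* length ncell*ncell + 1 */
                       int* cell_list)    /* length n */
    {
        int ncells = ncell * ncell;
        int* count = calloc(ncells, sizeof(int));
        for (int i = 0; i < n; ++i) {
            int cx = (int)(x[2*i] / h), cy = (int)(x[2*i+1] / h);
            count[cx*ncell + cy]++;
        }
        cell_start[0] = 0;
        for (int c = 0; c < ncells; ++c)
            cell_start[c+1] = cell_start[c] + count[c];
        for (int c = 0; c < ncells; ++c) count[c] = cell_start[c];
        for (int i = 0; i < n; ++i) {
            int c = ((int)(x[2*i] / h)) * ncell + (int)(x[2*i+1] / h);
            cell_list[count[c]++] = i;
        }
        free(count);
    }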

  17. Local forces: Communication
  Minimize communication:
  • Send particles that might affect a neighbor “soon”
  • Trade extra computation against communication
  • Want low surface area-to-volume ratios on domains

  18. Local forces: Load balance
  • Are particles evenly distributed?
  • Do particles remain evenly distributed?
  • Can divide space unevenly (e.g. quadtree/octree)

  19. Far-field forces
  [Figure: each processor holds its own particles (“Mine”) plus a buffer of in-transit particles (“Buffered”)]
  • Every particle affects every other particle
  • All-to-all communication required
    • Overlap communication with computation
    • Poor memory scaling if everyone keeps everything!
  • Idea: pass particles in a round-robin manner

  20. Passing particles for far-field forces

      copy local particles to current buf
      for phase = 1:p
          send current buf to rank+1 (mod p)
          recv next buf from rank-1 (mod p)
          interact local particles with current buf
          swap current buf with next buf
      end
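A hedged C/MPI rendering of this loop (buffer layout, names, and the interact callback are assumptions; a production version would post nonblocking sends so the interaction overlaps the transfer):

    #include <mpi.h>
    #include <string.h>

    /* nbuf = number of doubles per particle buffer (same on every rank). */
    void ring_pass(double* local, int nbuf, int p, int rank, MPI_Comm comm,
                   double* current, double* next,
                   void (*interact)(double* local, const double* others, int nbuf))
    {
        memcpy(current, local, nbuf * sizeof(double));
        for (int phase = 0; phase < p; ++phase) {
            MPI_Sendrecv(current, nbuf, MPI_DOUBLE, (rank + 1) % p,     0,
                         next,    nbuf, MPI_DOUBLE, (rank + p - 1) % p, 0,
                         comm, MPI_STATUS_IGNORE);
            interact(local, current, nbuf);   /* accumulate forces from this buffer */
            double* tmp = current; current = next; next = tmp;
        }
    }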

  21. Passing particles for far-field forces
  Suppose n = N/p particles in buffer. At each phase,

      t_comm ≈ α + β n
      t_comp ≈ γ n^2

  So we can mask communication with computation if

      n ≥ (β + √(β^2 + 4αγ)) / (2γ) > β/γ

  More efficient serial code
    ⇒ larger n needed to mask communication!
    ⇒ worse speed-up as p gets larger (fixed N), but scaled speed-up (n fixed) remains unchanged.
  This analysis neglects the overhead term in LogP.
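The threshold is just the positive root of the masking condition; spelling that step out (not on the slide), in LaTeX:

    t_{\mathrm{comp}} \ge t_{\mathrm{comm}}
    \iff \gamma n^2 - \beta n - \alpha \ge 0
    \iff n \ge \frac{\beta + \sqrt{\beta^2 + 4\alpha\gamma}}{2\gamma}
       > \frac{\beta}{\gamma}

A more efficient serial kernel means a smaller γ, which raises this threshold; that is the “larger n needed” remark above.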

  22. Far-field forces: particle-mesh methods
  Consider r^-2 electrostatic potential interaction
  • Enough charges looks like a continuum!
  • Poisson equation maps charge distribution to potential
  • Use fast Poisson solvers for regular grids (FFT, multigrid)
  • Approximation depends on mesh and particle density
  • Can clean up leading part of approximation error

  23. Far-field forces: particle-mesh methods
  • Map particles to mesh points (multiple strategies; see the sketch below)
  • Solve potential PDE on mesh
  • Interpolate potential to particles
  • Add correction term – acts like local force
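One common mapping/interpolation strategy is cloud-in-cell; a minimal sketch of the first and third steps (the names, 2D layout, and the assumption that particles stay away from the top/right mesh boundary are mine; the fast Poisson solve in between is omitted):

    /* Deposit particle charge q onto an nx-by-nx mesh with spacing h
       using bilinear (cloud-in-cell) weights. */
    void deposit_charge(int n, const double* x, const double* q,
                        int nx, double h, double* rho)
    {
        for (int c = 0; c < nx*nx; ++c) rho[c] = 0.0;
        for (int i = 0; i < n; ++i) {
            double gx = x[2*i] / h, gy = x[2*i+1] / h;
            int ix = (int)gx, iy = (int)gy;        /* lower-left mesh point */
            double fx = gx - ix, fy = gy - iy;     /* fractional offsets    */
            rho[ix*nx + iy]         += q[i]*(1-fx)*(1-fy);
            rho[(ix+1)*nx + iy]     += q[i]*fx*(1-fy);
            rho[ix*nx + (iy+1)]     += q[i]*(1-fx)*fy;
            rho[(ix+1)*nx + (iy+1)] += q[i]*fx*fy;
        }
    }

    /* Interpolate the mesh potential phi back to a particle at (px, py)
       with the same bilinear weights used for deposition. */
    double interp_potential(const double* phi, int nx, double h,
                            double px, double py)
    {
        double gx = px / h, gy = py / h;
        int ix = (int)gx, iy = (int)gy;
        double fx = gx - ix, fy = gy - iy;
        return phi[ix*nx + iy]*(1-fx)*(1-fy) + phi[(ix+1)*nx + iy]*fx*(1-fy)
             + phi[ix*nx + (iy+1)]*(1-fx)*fy + phi[(ix+1)*nx + (iy+1)]*fx*fy;
    }

Using the same weights for deposition and interpolation avoids spurious self-forces, which is one reason cloud-in-cell is a common default.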

  24. Far-field forces: tree methods
  • Distance simplifies things
    • Andromeda looks like a point mass from here?
  • Build a tree, approximating descendants at each node
  • Several variants: Barnes-Hut, FMM, Anderson’s method
  • More on this later in the semester

  25. Summary of particle example
  • Model: continuous motion of particles
    • Could be electrons, cars, whatever...
  • Step through discretized time
  • Local interactions
    • Relatively cheap
    • Load balance a pain
  • All-pairs interactions
    • Obvious algorithm is expensive (O(n^2))
    • Particle-mesh and tree-based algorithms help
  An important special case of lumped/ODE models.
