Scaling Clustered N-Body/SPH Simulations, Thomas Quinn, University of Washington



SLIDE 1

Scaling Clustered N-Body/SPH Simulations

Thomas Quinn University of Washington

SLIDE 2

Fabio Governato Lauren Anderson Michael Tremmel Ferah Munshi Joachim Stadel James Wadsley Greg Stinson Laxmikant Kale Filippo Gioachin Pritish Jetley Celso Mendes Amit Sharma Lukasz Wesolowski Gengbin Zheng Edgar Solomonik Harshitha Menon

SLIDE 3

Image courtesy ESA/Planck

Cosmology at 380,000 years

SLIDE 4

Cosmology at 13.6 Gigayears

SLIDE 5

... is not so simple

SLIDE 6

SLIDE 7

Computational Cosmology

  • CMB has fluctuations of 1e-5
  • Galaxies are overdense by 1e7
  • It happens (mostly) through gravitational collapse
  • Making testable predictions from a cosmological hypothesis requires:

– A non-linear, dynamic calculation – e.g. a computer simulation
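A back-of-envelope check shows why a non-linear calculation is unavoidable: in matter-dominated linear theory, density fluctuations grow roughly in proportion to the scale factor, so growth since recombination (z ≈ 1100) falls many orders of magnitude short of turning 1e-5 fluctuations into 1e7 overdensities. A rough sketch (the growth model and numbers here are standard textbook approximations, not taken from the slides):

```python
# Why linear theory cannot carry CMB fluctuations to galaxy overdensities.
# Assumes the rough matter-dominated result delta ∝ a (scale factor).

a_cmb = 1.0 / 1100.0       # scale factor at recombination (z ≈ 1100)
a_now = 1.0                # scale factor today
delta_cmb = 1e-5           # CMB fluctuation amplitude (from the slide)
delta_galaxy = 1e7         # galaxy overdensity (from the slide)

linear_growth = a_now / a_cmb              # ≈ 1100
delta_linear = delta_cmb * linear_growth   # ≈ 1.1e-2

shortfall = delta_galaxy / delta_linear    # ≈ 1e9: non-linear growth required
print(f"linear prediction: {delta_linear:.2e}, shortfall: {shortfall:.1e}x")
```

The nine-orders-of-magnitude shortfall is what only a non-linear, dynamic calculation (i.e. a simulation) can supply.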

SLIDE 8

SLIDE 9

Michael Tremmel et al, 2017

SLIDE 10

TreePiece: basic data structure

  • A “vertical slice” of the tree, all the way to the root.
  • Nodes are either:

– Internal – External – Boundary (shared)
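The three node categories can be sketched as a classification of each tree node's particle range against the range a TreePiece locally owns. A minimal illustrative sketch; the names and the index-range representation are assumptions, not the actual ChaNGa classes:

```python
# Sketch of TreePiece node classification. Illustrative only: the real
# ChaNGa code is C++/Charm++ and uses a richer node representation.
from enum import Enum

class NodeType(Enum):
    INTERNAL = 1   # all particles under this node live on this TreePiece
    EXTERNAL = 2   # all particles live on other TreePieces
    BOUNDARY = 3   # particles split across TreePieces (shared node)

def classify(node_range, local_range):
    """Classify a node's particle index range [lo, hi) against the
    piece's locally owned range [mylo, myhi)."""
    lo, hi = node_range
    mylo, myhi = local_range
    if lo >= mylo and hi <= myhi:
        return NodeType.INTERNAL
    if hi <= mylo or lo >= myhi:
        return NodeType.EXTERNAL
    return NodeType.BOUNDARY

# A piece owning particles [100, 200):
assert classify((120, 180), (100, 200)) is NodeType.INTERNAL
assert classify((0, 50), (100, 200)) is NodeType.EXTERNAL
assert classify((150, 300), (100, 200)) is NodeType.BOUNDARY
```

Boundary nodes are the ones shared with other pieces, which is why each slice must extend all the way to the root.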

SLIDE 11

4/18/2017 Parallel Programming Laboratory @ UIUC 12

Overall treewalk structure

SLIDE 12

SLIDE 13

Speedups for 2 billion clustered particles

SLIDE 14

Multistep Speedup

SLIDE 15


Clustered/Multistepping Challenges

  • Load/particle imbalance
  • Communication imbalance
  • Rapid switching between phases

– Gravity, star formation, SMBH mergers

  • Fixed costs:

– Domain decomposition – Load balancing – Tree build
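Multistepping places particles on power-of-two "rungs" of the base timestep, so each phase switch above operates on a different subset of particles. A minimal sketch of the standard block-timestepping rung assignment (the function name and rung cap are illustrative, not ChaNGa's actual code):

```python
# Block ("rung") timestepping sketch: a particle wanting timestep dt is put
# on rung r such that dt_base / 2**r <= dt, i.e. the coarsest power-of-two
# subdivision of the base step it can tolerate. Illustrative only.
import math

def assign_rung(dt_wanted, dt_base, max_rung=29):
    """Smallest rung whose step dt_base / 2**rung fits within dt_wanted."""
    if dt_wanted >= dt_base:
        return 0                               # can take the full base step
    rung = math.ceil(math.log2(dt_base / dt_wanted))
    return min(rung, max_rung)                 # clamp to the deepest rung

dt_base = 1.0
assert assign_rung(1.0, dt_base) == 0    # full step: rung 0
assert assign_rung(0.3, dt_base) == 2    # 1/4 <= 0.3 < 1/2, so rung 2
assert assign_rung(1e-3, dt_base) == 10  # 2**-10 ≈ 9.8e-4 <= 1e-3
```

The "small rungs" later in the talk are the high-numbered rungs: few particles, many sub-steps, and therefore the hardest phases to keep balanced.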

SLIDE 16

Zoomed Cluster simulation

SLIDE 17

Load distribution

SLIDE 18

ORB Load Balancing
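Orthogonal Recursive Bisection splits the particle set along the longest axis at the weighted median, so the two halves carry equal load; recursing gives one domain per processor. A minimal one-level sketch (illustrative, not the production load balancer), where passing compute-time weights instead of uniform weights is what distinguishes the two balancing strategies compared on the following slides:

```python
# One ORB step: split points along the longest axis at the weighted median.
# Illustrative sketch only; names and representation are assumptions.

def orb_split(particles, weights=None):
    """Split a list of coordinate tuples into two equal-load halves."""
    if weights is None:
        weights = [1.0] * len(particles)       # balance by particle count
    dims = len(particles[0])
    # Choose the axis with the largest spatial extent.
    axis = max(range(dims), key=lambda d: max(p[d] for p in particles)
                                        - min(p[d] for p in particles))
    order = sorted(range(len(particles)), key=lambda i: particles[i][axis])
    # Walk the sorted points until half of the total weight is accumulated.
    total = sum(weights)
    acc = 0.0
    cut = 0
    for cut, i in enumerate(order):
        acc += weights[i]
        if acc >= total / 2:
            break
    left = [particles[i] for i in order[:cut + 1]]
    right = [particles[i] for i in order[cut + 1:]]
    return left, right

pts = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.2), (3.0, 0.3)]
left, right = orb_split(pts)                   # uniform weights: 2 / 2 split
heavy, light = orb_split(pts, weights=[3.0, 1.0, 1.0, 1.0])
assert len(left) == 2 and len(right) == 2
assert len(heavy) == 1                         # one expensive particle = half the load
```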

SLIDE 19

[Timeline figure; legend: Gravity, Gas, Communication, SMP load sharing]

LB by particle count: 29.4 seconds

SLIDE 20

LB by compute time: 15.8 seconds

[Star Formation phase marked on timeline]

SLIDE 21

Multistepping Utilization

SLIDE 22

Small rungs: [Energy plots]

SLIDE 23

Smallest step

Total interval: 1 second

SLIDE 24

CPU Scaling Summary

  • Load balancing the big steps is (mostly) solved
  • Load balancing/optimizing the small steps is what is needed:

– Small steps dominate the total time – Small steps increase throughput even when not optimal – Plenty of opportunity for improvement

SLIDE 25

GPU Implementation: Gravity Only

  • Load (SMP node) local tree/particle data onto the GPU
  • Load the prefetched remote tree onto the GPU
  • CPUs walk the tree and pass interaction lists

– Lists are batched to minimize the number of data transfers

  • “Missed” tree nodes: the walk is resumed when the data arrives; the interaction list plus new tree data are sent to the GPU.
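The CPU-side walk described above can be sketched as an accumulator that flushes interaction lists to the GPU in large batches and records missed nodes for a later resume. All names and the batching policy shown are illustrative assumptions; the real ChaNGa implementation is C++/Charm++ with CUDA:

```python
# Sketch of batched interaction-list offload with a "missed node" queue.
# Illustrative only: names and structure are assumptions, not ChaNGa's API.

class GravityWalk:
    def __init__(self, batch_size, transfers):
        self.batch_size = batch_size
        self.batch = []        # pending (bucket, node) interactions
        self.missed = []       # nodes to re-walk once remote data arrives
        self.transfers = transfers  # record of GPU data transfers

    def interact(self, bucket, node, node_is_local):
        if not node_is_local:
            self.missed.append((bucket, node))   # resume this walk later
            return
        self.batch.append((bucket, node))
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        """Ship the accumulated list in one transfer instead of many."""
        if self.batch:
            self.transfers.append(list(self.batch))
            self.batch = []

transfers = []
walk = GravityWalk(batch_size=3, transfers=transfers)
for node in range(7):
    walk.interact(bucket=0, node=node, node_is_local=(node != 5))
walk.flush()
# 6 local interactions in batches of 3 -> 2 transfers; node 5 awaits data.
assert len(transfers) == 2
assert walk.missed == [(0, 5)]
```

Batching is the key design choice: host-to-device transfers have high fixed latency, so shipping one large list beats shipping one interaction at a time.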

SLIDE 26

Grav/SPH scaling with GPUs

SLIDE 27

Tree walking on the GPU

Jianqiau Liu, Purdue University

SLIDE 28

Paratreet: parallel framework for tree algorithms

SLIDE 29

Availability

  • ChaNGa: http://github.com/N-bodyShop/changa

– See the Wiki for a developer's guide – Extensible: e.g. ChaNGa-MM by Phil Chang

  • Paratreet: http://github.com/paratreet

– Some design discussion and sample code

SLIDE 30

Acknowledgments

  • NSF ITR
  • NSF Astronomy
  • NSF XSEDE program for computing
  • BlueWaters Petascale Computing
  • NASA HST
  • NASA Advanced Supercomputing