
SLIDE 1

A Framework for Particle Advection for Very Large Data

Hank Childs, LBNL/UCDavis David Pugmire, ORNL Christoph Garth, Kaiserslautern David Camp, LBNL/UCDavis Sean Ahern, ORNL Gunther Weber, LBNL Allen Sanderson, Univ. of Utah

SLIDE 2

Advecting particles

SLIDE 3

Particle advection basics

  • Advecting particles creates integral curves
  • Streamlines: display the particle path (instantaneous velocities)
  • Pathlines: display the particle path (velocity field evolves as the particle moves)

SLIDE 4

Particle advection is the duct tape of the visualization world

Advecting particles is essential to understanding flow and other phenomena (e.g., magnetic fields)!

SLIDE 5

Outline

  • Efficient advection of particles
  • A general system for particle-advection-based analysis

SLIDE 6

Particle Advection Load Balancing

  • N particles (P1, P2, …, Pn), M MPI tasks (T1, …, Tm)
  • Each particle takes a variable number of steps: S1, S2, …, Sn
  • The total number of steps is ΣSi — we cannot do less work than this
  • Goal: distribute the ΣSi steps over the M MPI tasks such that the problem finishes in minimal time
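As a minimal sketch of the cost model above (the function names are illustrative, not part of any system on these slides): with perfect balance over M tasks, no schedule can finish in fewer than ceil(ΣSi / M) steps on its busiest task.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Total work is the sum of per-particle step counts S1..Sn.
long totalSteps(const std::vector<long>& stepsPerParticle)
{
    return std::accumulate(stepsPerParticle.begin(),
                           stepsPerParticle.end(), 0L);
}

// Lower bound on the busiest task's work under perfect load balance:
// ceil(sum(Si) / numTasks).
long idealStepsPerTask(const std::vector<long>& stepsPerParticle, long numTasks)
{
    long total = totalSteps(stepsPerParticle);
    return (total + numTasks - 1) / numTasks;  // ceiling division
}
```

The difficulty, as the next slides explain, is that the Si are unknown a priori, so no static partition can guarantee hitting this bound.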

SLIDE 7

Particle Advection Performance

  • Goal: distribute the ΣSi steps over M MPI tasks such that the problem finishes in minimal time
  • Sounds sort of like a bin-packing problem, but…
  • particles can move from MPI task to MPI task
  • the path of a particle is data dependent and unknown a priori (we don’t know Si beforehand)
  • big data significantly complicates this picture…
  • …data may not be readily available, introducing starvation

SLIDE 8

Advecting particles

Decomposition of a large data set into blocks on the filesystem.

What is the right strategy for getting particle and data together?

SLIDE 9

Strategy: load blocks necessary for advection

Decomposition of a large data set into blocks on the filesystem; go to the filesystem and read the needed block.

SLIDE 10

Strategy: load blocks necessary for advection

This strategy has multiple benefits:
1) Indifferent to data size: a serial program can process data of any size
2) Trivial parallelization (partition particles over processors)
BUT: redundant I/O (both across MPI tasks and within a task) is a significant problem.

SLIDE 11

“Parallelize over Particles”

  • “Parallelize over particles”: particles are partitioned over processors; blocks of data are loaded as needed.
  • Some additional complexities:
  • The work for a given particle (i.e., Si) is variable and not known a priori: how do we share load between processors dynamically?
  • There are more blocks than can be stored in memory: what is the best caching/purging strategy?

SLIDE 12

“Parallelize over data” strategy: parallelize over blocks and pass particles

This strategy has multiple benefits:
1) Ideal for in situ processing.
2) Data is loaded only once.
BUT: starvation is a significant problem.
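The mechanics of this strategy can be sketched in a few lines (a toy round-robin block-to-task map, invented here for illustration): each task owns fixed blocks, and a particle is communicated only when it crosses into a block owned by another task.

```cpp
#include <cassert>

// Static block-to-task assignment (round-robin over block IDs).
int owningTask(int blockId, int numTasks)
{
    return blockId % numTasks;
}

// A particle leaving block 'fromBlock' for block 'toBlock' must be sent
// over MPI only when the two blocks live on different tasks.
bool mustPassParticle(int fromBlock, int toBlock, int numTasks)
{
    return owningTask(fromBlock, numTasks) != owningTask(toBlock, numTasks);
}
```

Starvation arises because nothing in this map guarantees that particles are spread across owners: if all particles sit in one task's blocks, every other task idles.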

SLIDE 13

Both parallelization schemes have serious flaws.

  • Two approaches: parallelize over particles, and parallelize over data.

  Parallelizing over:   I/O    Efficiency
  Data                  Good   Bad
  Particles             Bad    Good

  • Solution: hybrid algorithms

SLIDE 14

The master-slave algorithm is an example of a hybrid technique.

  • The algorithm adapts during runtime to avoid the pitfalls of parallelize-over-data and parallelize-over-particles.
  • This is a nice property for production visualization tools.
  • Implemented inside the VisIt visualization and analysis package.
  • D. Pugmire, H. Childs, C. Garth, S. Ahern, G. Weber, “Scalable Computation of Streamlines on Very Large Datasets,” SC09, Portland, OR, November 2009.

SLIDE 15

Master-Slave Hybrid Algorithm

  • Divide processors into groups of N
  • Uniformly distribute seed points to each group

Master:
  • Monitor workload
  • Make decisions to optimize resource utilization

Slaves:
  • Respond to commands from the master
  • Report status when work is complete

SLIDE 16

Master Process Pseudocode

Master()
{
    while ( ! done )
    {
        if ( NewStatusFromAnySlave() )
        {
            commands = DetermineMostEfficientCommand()
            for cmd in commands
                SendCommandToSlaves( cmd )
        }
    }
}

What are the possible commands?

SLIDE 17

Commands that can be issued by master

Master → Slave: the slave is given a streamline that is contained in a block that is already loaded.

  • 1. Assign / Loaded Block
  • 2. Assign / Unloaded Block
  • 3. Handle OOB / Load
  • 4. Handle OOB / Send

OOB = out of bounds
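The four commands can be written down as a small enum, with a toy policy showing why the master distinguishes loaded from unloaded blocks (the names here are illustrative, not VisIt's actual API):

```cpp
#include <cassert>

// The four commands the master can issue, per the slide.
enum class Command
{
    AssignLoadedBlock,    // 1. slave already has the block in memory
    AssignUnloadedBlock,  // 2. slave must first read the block
    HandleOOBLoad,        // 3. slave loads the block the particle entered
    HandleOOBSend         // 4. slave sends the particle to another slave
};

// Toy decision rule: prefer assigning work on blocks a slave has already
// loaded, since that avoids redundant I/O.
Command chooseAssignment(bool slaveHasBlockLoaded)
{
    return slaveHasBlockLoaded ? Command::AssignLoadedBlock
                               : Command::AssignUnloadedBlock;
}
```

The real master weighs I/O cost against communication cost when choosing between the OOB variants; see the SC09 paper cited earlier for the actual heuristics.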

SLIDE 18

Master → Slave: the slave is given a streamline and loads the block.

Commands that can be issued by master

  • 1. Assign / Loaded Block
  • 2. Assign / Unloaded Block
  • 3. Handle OOB / Load
  • 4. Handle OOB / Send

OOB = out of bounds

SLIDE 19

Master → Slave: the slave is instructed to load a block. The streamline in that block can then be computed.

Commands that can be issued by master

  • 1. Assign / Loaded Block
  • 2. Assign / Unloaded Block
  • 3. Handle OOB / Load
  • 4. Handle OOB / Send

OOB = out of bounds

SLIDE 20

Master → Slave: the slave is instructed to send a streamline to another slave (slave J) that has loaded the block.

Commands that can be issued by master

  • 1. Assign / Loaded Block
  • 2. Assign / Unloaded Block
  • 3. Handle OOB / Load
  • 4. Handle OOB / Send

OOB = out of bounds

SLIDE 21

Master Process Pseudocode

Master()
{
    while ( ! done )
    {
        if ( NewStatusFromAnySlave() )
        {
            commands = DetermineMostEfficientCommand()
            for cmd in commands
                SendCommandToSlaves( cmd )
        }
    }
}

* See SC 09 paper for details

slide-22
SLIDE 22

Master-slave in action

  Iteration   Action
  0           T0 reads B0, T3 reads B1
  1           T1 passes points to T0, T4 passes points to T3, T2 reads B0

  • When to pass and when to read?
  • How to coordinate communication? Status? Efficiently?

slide-23
SLIDE 23

Algorithm Test Cases

  • Core collapse supernova simulation
  • Magnetic confinement fusion simulation
  • Hydraulic flow simulation
SLIDE 24

Workload distribution in parallelize-over-data

Starvation

SLIDE 25

Workload distribution in parallelize-over-particles

Too much I/O

SLIDE 26

Workload distribution in master-slave algorithm

Just right

SLIDE 27

Workload distribution in supernova simulation

[Images: workload colored by the processor doing the integration, with parallelization by Particles, by Data, and Hybrid.]

SLIDE 28

Astrophysics Test Case: total time to compute 20,000 streamlines

[Plots: seconds vs. number of procs, for uniform and non-uniform seeding; curves for Data, Particles, and Hybrid parallelization.]

SLIDE 29

Astrophysics Test Case: number of blocks loaded

[Plots: blocks loaded vs. number of procs, for uniform and non-uniform seeding; curves for Data, Particles, and Hybrid parallelization.]

SLIDE 30

Summary: Master-Slave Algorithm

  • First-ever attempt at a hybrid parallelization algorithm for particle advection
  • The algorithm adapts during runtime to avoid the pitfalls of parallelize-over-data and parallelize-over-particles.
  • This is a nice property for production visualization tools.
  • Implemented inside the VisIt visualization and analysis package.

SLIDE 31

Outline

  • Efficient advection of particles
  • A general system for particle-advection-based analysis

SLIDE 32

Goal

  • Efficient code for a variety of particle-advection-based techniques
  • Cognizant of use cases with >>10K particles.
  • Need the handling of every particle, every evaluation, to be efficient.
  • Want to support diverse flow techniques: flexibility/extensibility is key.
  • Fit within a data flow network design (i.e., a filter)

SLIDE 33

Motivating examples of system

 FTLE  Stream surfaces  Streamline  Dynamical Systems (e.g. Poincaré Maps)  Statistics based analysis  + more

SLIDE 34

Design

  • PICS filter: parallel integral curve system
  • Execution:
  • Instantiate particles at seed locations
  • Step particles to form integral curves
  • Analysis performed at each step
  • Termination criteria evaluated at each step
  • When all integral curves have completed, create the final output
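The execution flow above can be sketched as a toy loop (a 1-D particle and a fixed step-count termination criterion, both invented here for illustration; this is not the VisIt code):

```cpp
#include <cassert>
#include <vector>

struct Particle
{
    double position;
    int    steps = 0;
    bool   terminated = false;
};

// Step every live curve, run per-step analysis, test termination, and
// stop once all curves have completed.
void execute(std::vector<Particle>& particles, int maxSteps,
             double velocity, double dt)
{
    bool anyLive = true;
    while (anyLive)
    {
        anyLive = false;
        for (auto& p : particles)
        {
            if (p.terminated)
                continue;
            p.position += velocity * dt;  // advect one step
            ++p.steps;                    // per-step analysis would go here
            if (p.steps >= maxSteps)      // termination criterion
                p.terminated = true;
            else
                anyLive = true;
        }
    }
    // when all curves are complete, the final output would be created here
}
```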
SLIDE 35

Design

  • Five major types of extensibility:
  • How to parallelize?
  • How do you evaluate the velocity field?
  • How do you advect particles?
  • Where are the initial particle locations?
  • How do you analyze the particle paths?

SLIDE 36

Inheritance hierarchy

avtPICSFilter
  • StreamlineFilter
  • Your derived type of PICS filter

avtIntegralCurve
  • avtStreamlineIC
  • Your derived type of integral curve

  • We disliked the “matching inheritance” scheme, but this achieved all of our design goals cleanly.

SLIDE 37

#1: How to parallelize?

avtICAlgorithm
  • avtParDomICAlgorithm (parallel over data)
  • avtSerialICAlgorithm (parallel over seeds)
  • avtMasterSlaveICAlgorithm

SLIDE 38

#2: Evaluating velocity field

avtIVPField
  • avtIVPVTKField
  • avtIVPVTKTimeVaryingField
  • avtIVPM3DC1Field
  • avtIVP<YOUR>HigherOrderField

IVP = initial value problem

SLIDE 39

#3: How do you advect particles?

avtIVPSolver
  • avtIVPDopri5
  • avtIVPEuler
  • avtIVPLeapfrog
  • avtIVPAdamsBashforth
  • avtIVPM3DC1Integrator

IVP = initial value problem
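To make the solver extension point concrete, here is the simplest scheme in that list as a one-line sketch (a 1-D forward Euler step; the template name is illustrative, not VisIt's avtIVPEuler interface):

```cpp
#include <cassert>

// One forward Euler step: y_{n+1} = y_n + h * v(y_n), where v is the
// velocity field evaluated at the current position.
template <typename VelocityField>
double eulerStep(double y, double h, VelocityField v)
{
    return y + h * v(y);
}
```

Higher-order schemes like Dopri5 or Adams-Bashforth differ only in how many field evaluations they combine per step, which is why a common solver base class works.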

SLIDE 40

#4: Initial particle locations

 avtPICSFilter::GetInitialLocations() = 0;
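A derived filter might implement this pure virtual by seeding uniformly along a line; a 1-D sketch with an invented name (the real method returns seed points in the dataset's space):

```cpp
#include <cassert>
#include <vector>

// n seed points placed uniformly on the interval [lo, hi], endpoints included.
std::vector<double> uniformSeeds(double lo, double hi, int n)
{
    std::vector<double> seeds;
    const double step = (hi - lo) / (n - 1);
    for (int i = 0; i < n; ++i)
        seeds.push_back(lo + i * step);
    return seeds;
}
```

Non-uniform seeding (as in the astrophysics test case earlier) is just a different override of the same hook.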

SLIDE 41

#5: How do you analyze particle path?

  • avtIntegralCurve::AnalyzeStep() = 0;
  • All AnalyzeStep implementations evaluate termination criteria
  • avtPICSFilter::CreateIntegralCurveOutput(std::vector<avtIntegralCurve*> &) = 0;
  • Examples:
  • Streamline: store the location and scalars for the current step in data members
  • Poincaré: store the location for the current step in data members
  • FTLE: store only the location of the final step; no-op for preceding steps
  • NOTE: these derived types create very different types of outputs.
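The contrast between those examples can be sketched with two toy overrides (a stand-in base class and a 1-D position; these are illustrative shapes, not the avtIntegralCurve source):

```cpp
#include <cassert>
#include <vector>

struct IntegralCurve                       // stand-in for avtIntegralCurve
{
    virtual void AnalyzeStep(double position) = 0;
    virtual ~IntegralCurve() = default;
};

// A streamline needs the whole path, so it stores every step's location.
struct StreamlineCurve : IntegralCurve
{
    std::vector<double> points;
    void AnalyzeStep(double position) override { points.push_back(position); }
};

// FTLE only needs where the particle ended up, so earlier steps are
// effectively no-ops: each call just overwrites the final position.
struct FTLECurve : IntegralCurve
{
    double finalPosition = 0.0;
    void AnalyzeStep(double position) override { finalPosition = position; }
};
```

This is why the per-curve memory footprint, and therefore the scalability of each analysis, can differ so sharply across derived types.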
SLIDE 42

Putting it all together

The PICS filter combines an avtICAlgorithm, an avtIVPSolver, an avtIVPField, and a vector<avtIntegralCurve*>.

Integral curves are sent to other processors with some derived types of avtICAlgorithm.

::CreateInitialLocations() = 0;
::AnalyzeStep() = 0;

SLIDE 43

VisIt is an open source, richly featured, turn-key application for large data.

  • Used by:
  • Visualization experts
  • Simulation code developers
  • Simulation code consumers
  • Popular:
  • R&D 100 award in 2005
  • Used on many of the Top500
  • >100K downloads

[Image: 217-pin reactor cooling simulation, run on ¼ of Argonne BG/P; 1 billion grid points per time slice. Image credit: Paul Fischer, ANL.]

SLIDE 44

Final thoughts…

  • Summary:
  • Particle advection is important for understanding flow, and efficiently parallelizing this computation is difficult.
  • We have developed a freely available system for doing this analysis for large data.
  • Documentation:
  • (PICS) http://www.visitusers.org/index.php?title=Pics_dev
  • (VisIt) http://www.llnl.gov/visit
  • Future work:
  • UI extensions, including Python
  • Additional analysis techniques (FTLE & more)

SLIDE 45

Acknowledgements

  • Funding: This work was supported by the Director, Office of Science, Office of Advanced Scientific Computing Research, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 through the Scientific Discovery through Advanced Computing (SciDAC) program's Visualization and Analytics Center for Enabling Technologies (VACET).
  • Program Manager: Lucy Nowell
  • Master-Slave Algorithm: Dave Pugmire (ORNL), Hank Childs (LBNL/UCD), Christoph Garth (Kaiserslautern), Sean Ahern (ORNL), and Gunther Weber (LBNL)
  • PICS framework: Hank Childs (LBNL/UCD), Dave Pugmire (ORNL), Christoph Garth (Kaiserslautern), David Camp (LBNL/UCD), Allen Sanderson (Univ of Utah)

SLIDE 46

A Framework for Particle Advection for Very Large Data

Hank Childs, LBNL/UCDavis David Pugmire, ORNL Christoph Garth, Kaiserslautern David Camp, LBNL/UCDavis Sean Ahern, ORNL Gunther Weber, LBNL Allen Sanderson, Univ. of Utah