Managing Complexity in the Parallel Sparse Grid Combination Technique


SLIDE 1

Managing Complexity in the Parallel Sparse Grid Combination Technique

  • J. W. Larson 1
  • P. E. Strazdins 2
  • M. Hegland 1
  • B. Harding 1
  • S. Roberts 1
  • L. Stals 1
  • A. P. Rendell 2
  • M. Ali 2
  • J. Southern 3

1 Mathematical Sciences Institute, The Australian National University
2 Research School of Computer Science, The Australian National University
3 Fujitsu Laboratories Europe

July 19, 2016

SLIDE 2
Outline

the emerging hpc landscape
  • Ultraproblems at Ultrascale
  • Faults and Fault-Tolerant Techniques (FT)

understanding complexity so we can manage it
  • Sparse Grids
  • The Sparse Grid Combination Technique
  • Complexity Metrics
  • Implications for the Parallel SGCT

managing complexity
  • Numerical MapReduce Framework (NuMRF)
  • Parallel SGCT Implementation

SLIDE 3

part i: the emerging hpc landscape

SLIDE 4

the road from petascale to exascale

Ultra-parallelism:
  • > O(10^6) cores
  • High scaling efficiency required
  • Large number of hardware components, each with a finite probability of failure
  • Hardware faults such as node failures will become routine for large-scale applications running on these platforms

This means... future ultrascale applications must embody FT:
  • Run successfully through node failures
  • Recover from faults, or at least checkpoint/exit gracefully

SLIDE 5

fault recovery and fault-tolerance

Technological approaches:
  • Replication/redundancy
  • Runtime checkpointing, with recovery through restart/task reassignment
  • Runtime recreation of lost data using neighboring data

“...computational techniques for one mill...BILLION processing elements!”

Algorithm-based FT (ABFT):
  • Huang and Abraham (1984): row/column checksums to correct computational errors
  • Du et al. (2012): checksum-based fail-stop extensions to LU & QR decompositions
  • Liu (2002); Geist and Engleman (2007): chaotic relaxation
  • Dean and Ghemawat (2004): MapReduce
  • Our group: the sparse grid combination method with built-in runtime fault tolerance

SLIDE 6

part ii: understanding complexity so we can manage it

SLIDE 7

what is a sparse grid?

A solution to a complexity problem:
  • The number of gridpoints on a d-dimensional isotropic grid grows exponentially with d: this is the curse of dimensionality
  • A sparse grid provides fine-scale resolution in each dimension, but not the combined fine scales from all multidimensional subspaces
  • Constructed from a number of coarser component grids that are fine-scale in some dimensions but coarse in others
  • Developed to solve problems in high dimensions

SLIDE 8

sparse grids reduce problem size dramatically

$$|F| \propto 2^{Ld}, \qquad |S| \propto 2^L L^{d-1}, \qquad R_C = \frac{|F|}{|S|} \propto \frac{2^{L(d-1)}}{L^{d-1}}$$
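As a quick sanity check of these scalings, a short self-contained Python sketch (assuming only the leading-order forms above; constants and boundary points are dropped):

```python
# Leading-order point counts from the proportionalities above; constants
# and boundary terms are dropped, so these are scalings, not exact counts.
def full_grid_size(L, d):
    return 2 ** (L * d)               # |F| ~ 2^(Ld)

def sparse_grid_size(L, d):
    return 2 ** L * L ** (d - 1)      # |S| ~ 2^L * L^(d-1)

d = 3
for L in (5, 10, 15):
    r_c = full_grid_size(L, d) / sparse_grid_size(L, d)
    print(f"L={L}: R_C ~ {r_c:.3g}")
```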

SLIDE 9

geometric definition of sparse grid

[Figure: a simple sparse grid constructed as a union of component grids; in frequency/scale space the union captures fine scales in both dimensions, but not joint fine scales.]

SLIDE 10

constructing sparse grids from the scale lattice: part i

  • Consider a one-dimensional, level-$L$ ($L \geq 0$) equipartition of a closed interval into $2^L$ segments; including boundaries, this partition yields $2^L + 1$ grid points
  • Generalize to a closed box domain of dimensionality $d$ with the $d$-dimensional tensor product of the dimensions' grids; the result is an isotropic full grid $F$ having $|F| = (2^L + 1)^d$ grid points
  • Suppose instead we choose for each dimension $1 \leq j \leq d$ a partition of level $0 \leq l_j \leq L$; the result is a component grid $G_{\vec{l}}$ of level $\vec{l}$ having

$$|G_{\vec{l}}| = \prod_{i=1}^{d} \left( 2^{l_i} + 1 \right)$$

grid points
  • The index vector $\vec{l}$ defines a point on the scale lattice $\mathbb{L}$ of level $L$, equivalent to a unique gridded partition of the closed $d$-dimensional box domain

SLIDE 11

constructing sparse grids from the scale lattice: part ii

  • Each point $\vec{l} \in \mathbb{L}$ defines a $d$-dimensional grid $G_{\vec{l}} \in \mathcal{L}$, with $\mathcal{L}$ the set of all grids generated by the scale lattice
  • Grids with $l_1 = l_2 = \cdots = l_d$ are called isotropic; the full grid $F$ is isotropic with $l_1 = l_2 = \cdots = l_d = L$
  • The level-$L$ sparse grid $S$ is the union of a set $\mathcal{G} \subseteq \mathcal{L}$ of component grids, where $\mathcal{G}$ is defined by constraints $C(\vec{l})$ on $\vec{l}$:

$$S = \bigcup_{C(\vec{l})} G_{\vec{l}}$$

  • For example, the classic combination's constraint on the sum of the level indices $|\vec{l}|_1$ is

$$|\vec{l}|_1 \in \{L, L-1, \ldots, L-d+1\}, \qquad L \geq d-1$$
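A minimal sketch enumerating the classic combination's index set for given $L$ and $d$ (illustrative standalone code, not the talk's implementation):

```python
import itertools

def classic_combination_set(L, d):
    """Level vectors l with |l|_1 in {L, L-1, ..., L-d+1} (the classic constraint)."""
    assert L >= d - 1
    return [l for l in itertools.product(range(L + 1), repeat=d)
            if L - d + 1 <= sum(l) <= L]

for l in classic_combination_set(L=4, d=2):
    print(l)   # the |l|_1 = 4 and |l|_1 = 3 hyperplanes
```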
SLIDE 12

general combination formulae

The classic combination solution $f^C_L(\vec{x})$ for level $L$ in $d$ dimensions is, in terms of the component grid solutions $f_{\vec{l}}(\vec{x})$,

$$f^C_L(\vec{x}) = \sum_{q=0}^{d-1} (-1)^q \binom{d-1}{q} \sum_{|\vec{l}|_1 = L-q} f_{\vec{l}}(\vec{x})$$

It is possible to include $m \leq L-1$ hyperplanes' worth of “spare” component grids for FT. These spare grids are used only in scenarios of loss of one or more classic combination component grids due to fault(s).
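A small sketch of the coefficients implied by this formula; as a consistency check, the coefficients of the classic combination sum to 1 (illustrative code, not the talk's software):

```python
from itertools import product
from math import comb

def combination_terms(L, d):
    """Yield (coefficient, level vector) pairs of the classic combination."""
    for q in range(d):
        coeff = (-1) ** q * comb(d - 1, q)
        for l in product(range(L + 1), repeat=d):
            if sum(l) == L - q:
                yield coeff, l

# The coefficients of a consistent combination sum to 1.
print(sum(c for c, _ in combination_terms(L=4, d=2)))   # -> 1
```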

SLIDE 13

classic combination and example ft scenarios

[Figure: the classic combination, plus example FT scenarios: loss of component grid (3, 4) and loss of (2, 5).]

SLIDE 14

building solvers on sparse grids

algorithm

1. Pick a set $\mathcal{G}$ of multidimensional, coarser component grids
2. Solve on each component grid $G_{\vec{l}}$ (interpolate to $S$)
3. (Linearly) combine the component grids' solutions to obtain the solution on $S$
4. Optional: interpolate the solution from $S$ to $F$
5. Time evolution/iteration: propagate the solution on $S$ back to each $G_{\vec{l}} \in \mathcal{G}$
Error bounds for solutions on the sparse grid can be computed based on the scheme used on the component grids and the combination method
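A schematic of this loop in Python (a sketch only: `solve_on`, `interpolate`, and the grid objects are hypothetical stand-ins, not the talk's parallel implementation):

```python
# Hypothetical SGCT driver step; solver and interpolation details are stand-ins.
def sgct_step(component_grids, coeffs, solve_on, interpolate, sparse_grid):
    # 2. Solve on each component grid, then interpolate onto the sparse grid S.
    partials = [interpolate(solve_on(g), sparse_grid) for g in component_grids]
    # 3. Linear combination of the component solutions on S.
    combined = sum(c * p for c, p in zip(coeffs, partials))
    # 5. Propagate the combined solution back to each component grid.
    return [interpolate(combined, g) for g in component_grids]
```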

SLIDE 15

sgct complexity—number of component grids

$$|\mathcal{G}| = \sum_{k=0}^{d-1} \binom{L-k+d-1}{d-1} \qquad\qquad |\mathcal{G}_{\mathrm{FT}}| = \binom{L-1}{d-1} + \binom{L-2}{d-1}$$

This is the number of $M \times N$ parallel data transfers in the SGCT.
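A quick cross-check of the closed-form count $|\mathcal{G}|$ against direct enumeration of the index set (illustrative code):

```python
from itertools import product
from math import comb

def count_closed_form(L, d):
    return sum(comb(L - k + d - 1, d - 1) for k in range(d))

def count_by_enumeration(L, d):
    return sum(1 for l in product(range(L + 1), repeat=d)
               if L - d + 1 <= sum(l) <= L)

for L, d in [(4, 2), (5, 3)]:
    print(count_closed_form(L, d), count_by_enumeration(L, d))   # 9 9, 46 46
```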

SLIDE 16

what is an M × N transfer?

[Figure: data connections for the 2D, level-5 SGCT.]

SLIDE 17

sgct complexity—total number of component gridpoints

  • Aggregate memory usage; a (very!) crude measure of cost
  • Aggregate data traffic between the components' solvers and the solver for $S$
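This metric is just $\sum_{\vec{l}} |G_{\vec{l}}|$ over the combination set; a self-contained sketch:

```python
from itertools import product
from math import prod

L, d = 4, 2
sizes = [prod(2 ** li + 1 for li in l)          # |G_l| for each component grid
         for l in product(range(L + 1), repeat=d)
         if L - d + 1 <= sum(l) <= L]           # classic combination constraint
print(sum(sizes))   # aggregate gridpoints over the classic combination set
```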

SLIDE 18

implications for a parallel sgct

complexity analysis tells us...
  • Lossy ABFT overhead is low compared to replication
  • High values of $(L, d)$ will engender:
      • numerous component grid tasks
      • high grid data volumes
      • many (parallel) data connections routing data to/from the sparse grid
  • Further modeling is required, using application- and platform-specific information:
      • application performance data
      • hardware characteristics: processor speed, switch latency/bandwidth

SLIDE 19

implications for a parallel sgct, cont’d...

requirements for a parallel sgct system

Low-level automation:
  • Distributed grid/field data description
  • Parallel $M \times N$ transfer $G_{\vec{l}} \leftrightarrow S$
  • Data transformation (specifically, interpolation)
  • Performance measurement/timing
  • Fault detection/reporting

High-level automation:
  • Scheduling of iterative execution of large numbers of tasks
      • Load balance based on a task cost model (TCM)
      • Probabilistic fault detection (PFD) through predicted/elapsed runtime comparison
  • Automatic coordination of large numbers of $M \times N$ transfers
  • Monitoring/explicit fault detection
  • Self-steering, using an error quality-of-service (QoS) model to compute alternative solutions in the event of faults

Compatibility with legacy science/engineering codes

SLIDE 20

part iii: managing complexity

SLIDE 21

numrf

SLIDE 22

python grids and fields toolkit (PyGrAFT)

PyGrAFT is the data language for NuMRF. It is a system for:
  • Representing logically Cartesian grids (CartGrid class)
      • Arbitrary dimensionality supported
  • Field data residing on these grids (GriddedScalarField)
      • Implemented using the NumPy ndarray
      • Arbitrary dimensionality supported
      • Any NumPy base type supported
      • Any number of fields may be associated with a CartGrid
      • Complete flexibility regarding storage order
  • Expressing multi-resolution relationships (FullGrid and ComponentGrid subclasses)
  • Performing combinations involving component grids

Parallelization is currently underway. At present, there are numerous test examples, including generation of most of the sparse grid pictures in this talk.
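PyGrAFT's API is not shown in this talk, so here is a neutral NumPy illustration of "performing combinations involving component grids": a self-contained 2D classic combination of a sampled function, with all names and the bilinear interpolation being ours, not PyGrAFT's:

```python
import numpy as np
from itertools import product
from math import comb

L, d = 3, 2
xf = np.linspace(0.0, 1.0, 2**L + 1)   # level-L grid points in each dimension

def sample(lx, ly):
    """Sample a smooth test function on the (lx, ly) component grid."""
    x = np.linspace(0.0, 1.0, 2**lx + 1)
    y = np.linspace(0.0, 1.0, 2**ly + 1)
    return np.sin(np.pi * x)[:, None] * np.sin(np.pi * y)[None, :]

def refine(values):
    """Bilinearly interpolate a component grid field onto the level-L grid."""
    xs = np.linspace(0.0, 1.0, values.shape[0])
    ys = np.linspace(0.0, 1.0, values.shape[1])
    a = np.apply_along_axis(lambda col: np.interp(xf, xs, col), 0, values)
    return np.apply_along_axis(lambda row: np.interp(xf, ys, row), 1, a)

# Classic combination: +1 on the |l|_1 = L hyperplane, -1 on |l|_1 = L-1.
combo = np.zeros((2**L + 1, 2**L + 1))
for q in range(d):
    coeff = (-1) ** q * comb(d - 1, q)
    for lx, ly in product(range(L + 1), repeat=2):
        if lx + ly == L - q:
            combo += coeff * refine(sample(lx, ly))

exact = np.sin(np.pi * xf)[:, None] * np.sin(np.pi * xf)[None, :]
print(np.abs(combo - exact).max())   # modest combination error
```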
SLIDE 23

c++ parallel sgct implementation

Implemented in three C++ classes:
  • GridCombine2D: overall combination method
  • Vec2D: field storage
  • ProcGrid2D: domain decomposition for each grid

This level of abstraction reduced code complexity dramatically!

Assumptions:
  • Each component grid $G_{\vec{l}}$ is distributed over a 2D grid of MPI PEs $P_{\vec{l}}$
  • The algorithm uses gather-scatter within each grid's pool
  • Load balance is computed with an awareness of computational cost, based on the component grids' respective (fixed) $\Delta t$

Implemented using aggressive defensive programming techniques (cross-checking 2D vector calculations, etc.):
  • Robustness (the simplest $L = 4$ case requires 32 processes!)
  • Rapid development
      • Source is only about 1000 lines of code
  • Interoperable with NuMRF via ctypes
  • Performance studies just commencing
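The ctypes interoperability noted above typically follows the standard binding pattern sketched here; the library name "libsgct.so" and the symbol "combine" are hypothetical placeholders, not the project's actual interface:

```python
# Hypothetical ctypes binding sketch; "libsgct.so" and "combine" are
# placeholder names, not the actual SGCT library interface.
import ctypes
import numpy as np

lib = ctypes.CDLL("./libsgct.so")
lib.combine.argtypes = [ctypes.POINTER(ctypes.c_double), ctypes.c_int]
lib.combine.restype = None

field = np.zeros(17, dtype=np.float64)   # e.g. a 1D slice of grid data
lib.combine(field.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), field.size)
```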

SLIDE 24

summary

conclusions
  • A complexity analysis of the parallel SGCT has been presented
  • Preliminary results have driven a set of requirements for a parallel SGCT system
  • NuMRF, a MapReduce-variant, numerical-analysis-friendly, error/fault-aware calling framework, has been presented
  • Implementation of NuMRF's data model (PyGrAFT) is well underway; prototyping of the framework core is in progress
  • A robust parallel SGCT has been built and is entering performance analysis and tuning

future work
  • Completion of PyGrAFT: parallelization, $M \times N$ services, interpolation services, and sparse grid representation
  • Ongoing development of NuMRF's elements

SLIDE 25

the end

THANK YOU! QUESTIONS?
