Partitioning and Load Balancing

Slide 1
Tutorial: Partitioning, Load Balancing and the Zoltan Toolkit
Erik Boman and Karen Devine
Discrete Algorithms and Math Dept.
Sandia National Laboratories, NM
CSCAPES Institute
SciDAC Tutorial, MIT, June 2007
Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

Slide 2
Outline
Part 1:
• Partitioning and load balancing
  – "Owner computes" approach
• Static vs. dynamic partitioning
• Models and algorithms
  – Geometric (RCB, SFC)
  – Graph & hypergraph
Part 2:
• Zoltan
  – Capabilities
  – How to get it, configure, build
  – How to use Zoltan with your application

Slide 3
Parallel Computing in CS&E
• Parallel Computing Challenge
  – Scientific simulations critical to modern science.
    • Models grow in size, higher fidelity/resolution.
    • Simulations must be done on parallel computers.
  – Clusters with 64-256 nodes are widely available.
  – High-performance computers have 100,000+ processors.
• How can we use such machines efficiently?

Slide 4
Parallel Computing Approaches
• We focus on distributed memory systems.
  – Two common approaches:
    • Master–slave
      – A "master" processor is a global synchronization point, hands out work to the slaves.
    • Data decomposition + "Owner computes":
      – The data is distributed among the processors.
      – The owner performs all computation on its data.
      – Data distribution defines work assignment.
      – Data dependencies among data items owned by different processors incur communication.

Slide 5
Partitioning and Load Balancing
• Assignment of application data to processors for parallel computation.
• Applied to grid points, elements, matrix rows, particles, ….
[Figure: Partition of an unstructured finite element mesh for three processors]

Slide 6
Partitioning Goals
• Minimize total execution time by…
  – Minimizing processor idle time.
    • Load balance data and work.
  – Keeping inter-processor communication low.
    • Reduce total volume, max volume.
    • Reduce number of messages.

Slide 7
"Simple" Example (1)
• Finite difference method.
  – Assign equal numbers of grid points to processors.
  – Keep amount of data communicated small.
[Figure: 7x5 grid, 5-point stencil, 4 processors]

Slide 8
"Simple" Example (2)
• Finite difference method.
  – Assign equal numbers of grid points to processors.
  – Keep amount of data communicated small.
First 35/4 points to processor 0; next 35/4 points to processor 1; etc.
  3 3 3 3 3 3 3
  2 2 2 2 2 2 3
  1 1 1 1 2 2 2
  0 0 1 1 1 1 1
  0 0 0 0 0 0 0
Max Data Comm: 14
Total Volume: 42
Max Nbor Proc: 2
Max Imbalance: 3%
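The four numbers quoted for the linear partition on Slide 8 can be reproduced with a short script. This is only a sketch under stated assumptions: the slides do not define the metrics, so the counting convention here is a guess (a grid point is counted once for each distinct remote processor that owns one of its 5-point-stencil neighbors, and imbalance is max load divided by average load, minus one). The names `linear_partition` and `partition_metrics` are hypothetical helpers, not part of Zoltan.

```python
def linear_partition(nx, ny, nprocs):
    """Number the points row by row and give each processor one consecutive chunk."""
    n = nx * ny
    return [[(row * nx + col) * nprocs // n for col in range(nx)]
            for row in range(ny)]


def partition_metrics(owner, nprocs):
    """Communication and balance metrics for a 5-point stencil (assumed convention)."""
    ny, nx = len(owner), len(owner[0])
    load = [0] * nprocs
    sends = [set() for _ in range(nprocs)]   # (row, col, destination) triples
    nbors = [set() for _ in range(nprocs)]   # neighboring processors
    for r in range(ny):
        for c in range(nx):
            p = owner[r][c]
            load[p] += 1
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < ny and 0 <= cc < nx and owner[rr][cc] != p:
                    q = owner[rr][cc]
                    sends[p].add((r, c, q))  # point (r, c) is needed by processor q
                    nbors[p].add(q)
    volume = [len(s) for s in sends]
    average = sum(load) / nprocs
    return {
        "max_comm": max(volume),
        "total_volume": sum(volume),
        "max_nbor": max(len(s) for s in nbors),
        "imbalance": max(load) / average - 1.0,
    }


owner = linear_partition(7, 5, 4)
metrics = partition_metrics(owner, 4)
print(metrics["max_comm"], metrics["total_volume"], metrics["max_nbor"],
      f"{metrics['imbalance']:.0%}")
```

Under this convention the script gives 14, 42, 2 and 3% for the 7x5 grid, matching the slide. Other conventions (for example, counting every cut edge) would give different totals.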

Slide 9
"Simple" Example (3)
• Finite difference method.
  – Assign equal numbers of grid points to processors.
  – Keep amount of data communicated small.
Two-dimensional structured grid partition
  1 1 1 1 2 2 2
  1 1 1 1 2 2 2
  0 0 0 0 3 3 3
  0 0 0 0 3 3 3
  0 0 0 0 3 3 3
Max Data Comm: 7
Total Volume: 26
Max Nbor Proc: 2
Max Imbalance: 37%

Slide 10
"Simple" Example (4)
• Finite difference method.
  – Assign equal numbers of grid points to processors.
  – Keep amount of data communicated small.
One-dimensional striped partition
  0 0 1 1 2 2 3
  0 0 1 1 2 2 3
  0 0 1 1 2 2 3
  0 0 1 1 2 2 3
  0 0 1 1 2 2 3
Max Data Comm: 10
Total Volume: 30
Max Nbor Proc: 2
Max Imbalance: 14%

Slide 11
Static Partitioning
[Diagram: Initialize Application → Partition Data → Distribute Data → Compute Solutions → Output & End]
• Static partitioning in an application:
  – Data partition is computed.
  – Data are distributed according to partition map.
  – Application computes.
• Ideal partition:
  – Processor idle time is minimized.
  – Inter-processor communication costs are kept low.

Slide 12
Dynamic Applications
• Characteristics:
  – Work per processor is unpredictable or changes during a computation; and/or
  – Locality of objects changes during computations.
  – Dynamic redistribution of work is needed during computation.
• Example: adaptive mesh refinement (AMR) methods
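The imbalance percentages on Slides 9 and 10 follow directly from the point counts per processor. The sketch below assumes imbalance means max load divided by average load, minus one (the slides do not state the formula), and `imbalance` is a hypothetical helper name. The two grids are copied from the slides, with the striped one generated as two columns per processor.

```python
from collections import Counter


def imbalance(owner, nprocs):
    """Max load over average load, minus one (assumed definition)."""
    counts = Counter(p for row in owner for p in row)
    average = sum(counts.values()) / nprocs
    return max(counts.values()) / average - 1.0


# Two-dimensional structured grid partition (Slide 9).
block = [
    [1, 1, 1, 1, 2, 2, 2],
    [1, 1, 1, 1, 2, 2, 2],
    [0, 0, 0, 0, 3, 3, 3],
    [0, 0, 0, 0, 3, 3, 3],
    [0, 0, 0, 0, 3, 3, 3],
]

# One-dimensional striped partition (Slide 10): two columns per processor.
striped = [[col // 2 for col in range(7)] for _ in range(5)]

print(f"block:   {imbalance(block, 4):.0%}")
print(f"striped: {imbalance(striped, 4):.0%}")
```

The heaviest processor owns 12 of 35 points in the block partition and 10 of 35 in the striped one, against an average of 8.75, which gives the 37% and 14% shown on the slides.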

Slide 13
Dynamic Repartitioning (a.k.a. Dynamic Load Balancing)
[Diagram: Initialize Application → Partition Data → Redistribute Data → Compute Solutions & Adapt → Output & End]
• Dynamic repartitioning (load balancing) in an application:
  – Data partition is computed.
  – Data are distributed according to partition map.
  – Application computes and, perhaps, adapts.
  – Process repeats until the application is done.
• Ideal partition:
  – Processor idle time is minimized.
  – Inter-processor communication costs are kept low.
  – Cost to redistribute data is also kept low.

Slide 14
Static vs. Dynamic: Usage and Implementation
• Static:
  – Pre-processor to application.
  – Can be implemented serially.
  – May be slow, expensive.
  – File-based interface acceptable.
  – No consideration of existing decomposition required.
• Dynamic:
  – Must run side-by-side with application.
  – Must be implemented in parallel.
  – Must be fast, scalable.
  – Library application interface required.
  – Should be easy to use.
  – Incremental algorithms preferred.
    • Small changes in input result in small changes in partitions.
    • Explicit or implicit incrementality acceptable.

Slide 15
Two Types of Models/Algorithms
• Geometric
  – Computations are tied to a geometric domain.
  – Coordinates for data items are available.
  – Geometric locality is loosely correlated to data dependencies.
• Combinatorial (topological)
  – No geometry.
  – Connectivity among data items is known.
    • Represent as graph or hypergraph.

Slide 16
Recursive Coordinate Bisection (RCB)
• Developed by Berger & Bokhari (1987) for AMR.
  – Independently discovered by others.
• Idea:
  – Divide work into two equal parts using a cutting plane orthogonal to a coordinate axis.
  – Recursively cut the resulting subdomains.
[Figure: a domain split by the 1st cut, then by two 2nd cuts, then by four 3rd cuts]
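The RCB idea on Slide 16 can be sketched in a few lines of serial Python. This is an illustration under assumptions, not Zoltan's implementation: the function name `rcb` is hypothetical, the cut axis is chosen as the one with the largest extent (the slide only says the plane is orthogonal to a coordinate axis), and the cut is placed so that the left side receives its proportional share of the total weight.

```python
from collections import Counter


def rcb(points, weights, num_parts):
    """Assign each point a part number by recursive coordinate bisection."""
    part = [0] * len(points)
    dim = len(points[0])

    def recurse(ids, first_part, nparts):
        if nparts == 1 or not ids:
            for i in ids:
                part[i] = first_part
            return
        # Assumption: cut orthogonal to the axis along which the points spread most.
        axis = max(
            range(dim),
            key=lambda d: max(points[i][d] for i in ids)
            - min(points[i][d] for i in ids),
        )
        ordered = sorted(ids, key=lambda i: points[i][axis])
        left_parts = nparts // 2
        target = sum(weights[i] for i in ids) * left_parts / nparts
        # Move the cutting plane until the left side holds its share of the work.
        accumulated, k = 0.0, 0
        while k < len(ordered) and accumulated + weights[ordered[k]] <= target:
            accumulated += weights[ordered[k]]
            k += 1
        recurse(ordered[:k], first_part, left_parts)
        recurse(ordered[k:], first_part + left_parts, nparts - left_parts)

    recurse(list(range(len(points))), 0, num_parts)
    return part


points = [(x, y) for x in range(4) for y in range(4)]
parts = rcb(points, [1.0] * len(points), 4)
print(Counter(parts))
```

On this 4x4 grid of unit-weight points the first cut separates x < 2 from x >= 2, the second cuts split each half in y, and each of the four parts ends up with a 2x2 block of four points.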

Slide 17
RCB Advantages and Disadvantages
• Advantages:
  – Conceptually simple; fast and inexpensive.
  – Regular subdomains.
    • Can be used for structured or unstructured applications.
    • All processors can inexpensively know entire decomposition.
  – Effective when connectivity info is not available.
• Disadvantages:
  – No explicit control of communication costs.
  – Can generate disconnected subdomains.
  – Mediocre partition quality.
  – Geometric coordinates needed.

Slide 18
RCB Repartitioning
• Implicitly incremental.
• Small changes in data result in small movement of cuts.

Slide 19
Applications of RCB
• Particle Simulations
• Adaptive Mesh Refinement
• Crash Simulations and Contact Detection
• Parallel Volume Rendering
[Figures: one image per application; two panels are labeled 1.6 ms and 3.2 ms]

Slide 20
Variations on RCB: RIB
• Recursive Inertial Bisection
  – Simon, Taylor, et al., 1991
  – Cutting planes orthogonal to principal axes of geometry.
  – Not incremental.
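A single inertial bisection step, as used by RIB on Slide 20, can be sketched for 2-D points as follows. The assumptions are mine: points carry unit weight, the principal axis is taken from the 2x2 covariance matrix using the closed-form angle, and the cut is placed at the median projection. `inertial_bisect` is a hypothetical name, and a full RIB would apply the step recursively to each half.

```python
import math


def inertial_bisect(points):
    """Split 2-D points in half with a cut orthogonal to their principal axis."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Direction of greatest spread (major eigenvector of the covariance matrix).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Order the points by their projection onto the principal axis.
    order = sorted(
        range(n),
        key=lambda i: (points[i][0] - cx) * ux + (points[i][1] - cy) * uy,
    )
    half = n // 2
    return order[:half], order[half:]


diagonal = [(float(i), float(i)) for i in range(8)]
low, high = inertial_bisect(diagonal)
print(low, high)
```

For points lying along a diagonal, the principal axis is the diagonal itself, so the cut is perpendicular to it rather than aligned with x or y as an RCB cut would be.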

Slide 21
Space-Filling Curve Partitioning (SFC)
• Developed by Peano, 1890.
• Space-Filling Curve:
  – Mapping from R^3 to R^1 that completely fills a domain.
  – Applied recursively to obtain desired granularity.
• Used for partitioning by …
  – Warren and Salmon, 1993, gravitational simulations.
  – Pilkington and Baden, 1994, smoothed particle hydrodynamics.
  – Patra and Oden, 1995, adaptive mesh refinement.

Slide 22
SFC Algorithm
• Run space-filling curve through domain.
• Order objects according to position on curve.
• Perform 1-D partition of curve.
[Figure: three panels showing the same 20 objects, numbered 1 to 20 in the order the curve visits them]

Slide 23
SFC Advantages and Disadvantages
• Advantages:
  – Simple, fast, inexpensive.
  – Maintains geometric locality of objects in processors.
  – Linear ordering of objects may improve cache performance.
• Disadvantages:
  – No explicit control of communication costs.
  – Can generate disconnected subdomains.
  – Often lower quality partitions than RCB.
  – Geometric coordinates needed.

Slide 24
SFC Repartitioning
• Implicitly incremental.
• Small changes in data result in small movement of cuts in linear ordering.
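The three steps of the SFC algorithm on Slide 22 can be illustrated in 2-D with a Hilbert curve. The slides do not say which curve is used, so Hilbert is an assumption here, as are unit object weights and integer cell coordinates on a power-of-two grid. `hilbert_index` follows the widely published bit-manipulation conversion, and `sfc_partition` is a hypothetical helper.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve on an n x n grid (n a power of 2)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def sfc_partition(cells, n, nparts):
    """Order cells along the curve, then cut the 1-D ordering into equal pieces."""
    order = sorted(range(len(cells)), key=lambda i: hilbert_index(n, *cells[i]))
    part = [0] * len(cells)
    for rank, i in enumerate(order):
        part[i] = rank * nparts // len(cells)
    return part


cells = [(x, y) for x in range(4) for y in range(4)]
part = sfc_partition(cells, 4, 4)
for y in reversed(range(4)):
    print(" ".join(str(part[cells.index((x, y))]) for x in range(4)))
```

Because consecutive positions on the curve are always adjacent cells, each contiguous piece of the 1-D ordering stays geometrically compact. On this 4x4 grid each of the four parts is one 2x2 quadrant.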
