
Partitioning and Load Balancing



  1. Title slide: Tutorial: Partitioning, Load Balancing and the Zoltan Toolkit
     Erik Boman and Karen Devine, Discrete Algorithms and Math Dept., Sandia National Laboratories, NM
     CSCAPES Institute
     SciDAC Tutorial, MIT, June 2007
     Sandia is a multiprogram laboratory operated by Sandia Corporation, a Lockheed Martin Company, for the United States Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

     Slide 2: Outline
     Part 1:
     • Partitioning and load balancing
       – "Owner computes" approach
     • Static vs. dynamic partitioning
     • Models and algorithms
       – Geometric (RCB, SFC)
       – Graph & hypergraph
     Part 2:
     • Zoltan
       – Capabilities
       – How to get it, configure, build
       – How to use Zoltan with your application

     Slide 3: Parallel Computing in CS&E
     • Parallel Computing Challenge
       – Scientific simulations critical to modern science.
         • Models grow in size, higher fidelity/resolution.
         • Simulations must be done on parallel computers.
       – Clusters with 64-256 nodes are widely available.
       – High-performance computers have 100,000+ processors.
     • How can we use such machines efficiently?

     Slide 4: Parallel Computing Approaches
     • We focus on distributed memory systems.
       – Two common approaches:
         • Master-slave
           – A "master" processor is a global synchronization point, hands out work to the slaves.
         • Data decomposition + "Owner computes":
           – The data is distributed among the processors.
           – The owner performs all computation on its data.
           – Data distribution defines work assignment.
           – Data dependencies among data items owned by different processors incur communication.
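The "owner computes" rule from Slide 4 can be illustrated with a small serial sketch. It assumes a 1-D array, a 3-point averaging stencil, and a contiguous block distribution; the function names are hypothetical and nothing here comes from the tutorial itself. Each item is updated only by its owner, and every neighbor value held by a different owner is counted as a remote read, which is the data dependency that would cost communication on a real machine.

```python
# Sketch of "owner computes" for a 1-D, 3-point averaging stencil.
# The block distribution and all names here are illustrative assumptions.

def block_owner(i, n, nprocs):
    """Owner of item i when n items are split into nprocs contiguous blocks."""
    return i * nprocs // n

def owner_computes_step(values, nprocs):
    n = len(values)
    new_values = list(values)
    remote_reads = [0] * nprocs          # off-processor values each owner needs
    for i in range(n):
        me = block_owner(i, n, nprocs)   # only the owner updates item i
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in neighbors:
            if block_owner(j, n, nprocs) != me:
                remote_reads[me] += 1    # dependency that would need a message
        stencil = [values[i]] + [values[j] for j in neighbors]
        new_values[i] = sum(stencil) / len(stencil)
    return new_values, remote_reads

values = [float(i) for i in range(12)]
new_values, remote_reads = owner_computes_step(values, nprocs=3)
print(remote_reads)
```

With 12 items on 3 processors, only the items at block boundaries read remote data, so the middle processor has twice the remote reads of the two end processors.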

  2. Slide 5: Partitioning and Load Balancing
     • Assignment of application data to processors for parallel computation.
     • Applied to grid points, elements, matrix rows, particles, ….
     [Figure: Partition of an unstructured finite element mesh for three processors]

     Slide 6: Partitioning Goals
     • Minimize total execution time by…
       – Minimizing processor idle time.
         • Load balance data and work.
       – Keeping inter-processor communication low.
         • Reduce total volume, max volume.
         • Reduce number of messages.

     Slide 7: "Simple" Example (1)
     • Finite difference method.
       – Assign equal numbers of grid points to processors.
       – Keep amount of data communicated small.
     7x5 grid, 5-point stencil, 4 processors

     Slide 8: "Simple" Example (2)
     • Finite difference method.
       – Assign equal numbers of grid points to processors.
       – Keep amount of data communicated small.
     First 35/4 points to processor 0; next 35/4 points to processor 1; etc.
       3 3 3 3 3 3 3
       2 2 2 2 2 2 3
       1 1 1 1 2 2 2
       0 0 1 1 1 1 1
       0 0 0 0 0 0 0
     Max Data Comm: 14
     Total Volume: 42
     Max Nbor Proc: 2
     Max Imbalance: 3%
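The figures quoted for the linear assignment can be checked with a short script. This is a sketch under an assumed counting convention (one unit of volume for each grid point a processor must send to each neighboring processor); the slides do not state their convention, and the function names are hypothetical. Under that assumption the script gives loads of 9, 9, 9, 8, a maximum per-processor volume of 14, a total volume of 42, at most 2 neighboring processors, and an imbalance of about 3%.

```python
# Sketch: partition quality metrics for the 7x5 grid with a 5-point stencil.
# Counting convention (an assumption): one unit of volume for each
# (grid point, neighboring processor) pair that must be sent.

NX, NY, NPROCS = 7, 5, 4

def linear_partition(nx, ny, nprocs):
    """Give the first N/P points (row by row) to processor 0, the next to 1, etc."""
    n = nx * ny
    return {(x, y): (y * nx + x) * nprocs // n
            for y in range(ny) for x in range(nx)}

def metrics(part, nprocs):
    loads = [0] * nprocs
    sends = [set() for _ in range(nprocs)]   # (point, destination) pairs
    for (x, y), p in part.items():
        loads[p] += 1
        for nbr in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            q = part.get(nbr)                # None outside the grid
            if q is not None and q != p:
                sends[p].add(((x, y), q))
    volumes = [len(s) for s in sends]
    max_nbors = max(len({q for _, q in s}) for s in sends)
    imbalance = max(loads) / (sum(loads) / nprocs) - 1.0
    return {"loads": loads, "max_comm": max(volumes),
            "total_volume": sum(volumes), "max_nbors": max_nbors,
            "imbalance": imbalance}

result = metrics(linear_partition(NX, NY, NPROCS), NPROCS)
print(result)
```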

  3. Slide 9: "Simple" Example (3)
     • Finite difference method.
       – Assign equal numbers of grid points to processors.
       – Keep amount of data communicated small.
     Two-dimensional structured grid partition
       1 1 1 1 2 2 2
       1 1 1 1 2 2 2
       0 0 0 0 3 3 3
       0 0 0 0 3 3 3
       0 0 0 0 3 3 3
     Max Data Comm: 7
     Total Volume: 26
     Max Nbor Proc: 2
     Max Imbalance: 37%

     Slide 10: "Simple" Example (4)
     • Finite difference method.
       – Assign equal numbers of grid points to processors.
       – Keep amount of data communicated small.
     One-dimensional striped partition
       0 0 1 1 2 2 3
       0 0 1 1 2 2 3
       0 0 1 1 2 2 3
       0 0 1 1 2 2 3
       0 0 1 1 2 2 3
     Max Data Comm: 10
     Total Volume: 30
     Max Nbor Proc: 2
     Max Imbalance: 14%

     Slide 11: Static Partitioning
     [Flow: Initialize Application → Partition Data → Distribute Data → Compute Solutions → Output & End]
     • Static partitioning in an application:
       – Data partition is computed.
       – Data are distributed according to partition map.
       – Application computes.
     • Ideal partition:
       – Processor idle time is minimized.
       – Inter-processor communication costs are kept low.

     Slide 12: Dynamic Applications
     • Characteristics:
       – Work per processor is unpredictable or changes during a computation; and/or
       – Locality of objects changes during computations.
       – Dynamic redistribution of work is needed during computation.
     • Example: adaptive mesh refinement (AMR) methods
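The trade-off between the two layouts in Examples (3) and (4) can be reproduced from the grids as printed. The sketch below is an assumption-laden check, not the authors' code: it counts volume per (grid point, neighboring processor) pair with a 5-point stencil and reports only the load per processor, the maximum per-processor volume, and the imbalance rounded to a whole percent. Total volume is left out because it is more sensitive to the counting convention, which the slides do not spell out.

```python
# Sketch: compare the two layouts of the 7x5 grid from the slides.
# Volume is counted per (grid point, neighboring processor) pair, which is
# an assumed convention; the slides do not state theirs.

BLOCK_2D = ["1111222", "1111222", "0000333", "0000333", "0000333"]
STRIPED_1D = ["0011223"] * 5

def evaluate(rows, nprocs=4):
    part = {(x, y): int(c) for y, row in enumerate(rows) for x, c in enumerate(row)}
    loads = [0] * nprocs
    sends = [set() for _ in range(nprocs)]   # (point, destination) pairs
    for (x, y), p in part.items():
        loads[p] += 1
        for nbr in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)):
            q = part.get(nbr)
            if q is not None and q != p:
                sends[p].add(((x, y), q))
    max_comm = max(len(s) for s in sends)
    imbalance_pct = round(100 * (max(loads) * nprocs / sum(loads) - 1))
    return loads, max_comm, imbalance_pct

for name, rows in (("2-D block", BLOCK_2D), ("1-D striped", STRIPED_1D)):
    loads, max_comm, imbalance_pct = evaluate(rows)
    print(f"{name}: loads={loads} max comm={max_comm} imbalance={imbalance_pct}%")
```

The block layout lowers the maximum volume (7 versus 10) at the price of a much less even load (37% versus 14%), which is the tension the partitioning goals describe.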

  4. Slide 13: Dynamic Repartitioning (a.k.a. Dynamic Load Balancing)
     [Flow: Initialize Application → Partition Data → Redistribute Data → Compute Solutions & Adapt → Output & End]
     • Dynamic repartitioning (load balancing) in an application:
       – Data partition is computed.
       – Data are distributed according to partition map.
       – Application computes and, perhaps, adapts.
       – Process repeats until the application is done.
     • Ideal partition:
       – Processor idle time is minimized.
       – Inter-processor communication costs are kept low.
       – Cost to redistribute data is also kept low.

     Slide 14: Static vs. Dynamic: Usage and Implementation
     • Static:
       – Pre-processor to application.
       – Can be implemented serially.
       – May be slow, expensive.
       – File-based interface acceptable.
       – No consideration of existing decomposition required.
     • Dynamic:
       – Must run side-by-side with application.
       – Must be implemented in parallel.
       – Must be fast, scalable.
       – Library application interface required.
       – Should be easy to use.
       – Incremental algorithms preferred.
         • Small changes in input result in small changes in partitions.
         • Explicit or implicit incrementality acceptable.

     Slide 15: Two Types of Models/Algorithms
     • Geometric
       – Computations are tied to a geometric domain.
       – Coordinates for data items are available.
       – Geometric locality is loosely correlated to data dependencies.
     • Combinatorial (topological)
       – No geometry.
       – Connectivity among data items is known.
         • Represent as graph or hypergraph.

     Slide 16: Recursive Coordinate Geometric Bisection (RCB)
     • Developed by Berger & Bokhari (1987) for AMR.
       – Independently discovered by others.
     • Idea:
       – Divide work into two equal parts using a cutting plane orthogonal to a coordinate axis.
       – Recursively cut the resulting subdomains.
     [Figure: domain split by a 1st cut, then 2nd cuts, then 3rd cuts]
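The RCB idea on Slide 16 can be sketched in a few lines of serial Python. The assumptions are mine: every point carries equal work, each cut is made along the coordinate with the largest extent, and the number of parts does not exceed the number of points. The function name `rcb` is hypothetical, and this is not Zoltan's implementation, which runs in parallel and supports weights.

```python
# Sketch of recursive coordinate bisection on unweighted points.
# Assumptions: equal work per point, cuts along the longest extent,
# and nparts <= number of points.

def rcb(points, nparts, ids=None, first_part=0, out=None):
    """Return a list mapping each point index to a part number."""
    if ids is None:
        ids = list(range(len(points)))
        out = [None] * len(points)
    if nparts == 1:
        for i in ids:
            out[i] = first_part
        return out
    # Cut orthogonal to the coordinate axis with the largest extent.
    dims = range(len(points[0]))
    axis = max(dims, key=lambda d: max(points[i][d] for i in ids)
                                   - min(points[i][d] for i in ids))
    ids = sorted(ids, key=lambda i: points[i][axis])
    # Split the work in proportion to the number of parts on each side.
    left_parts = nparts // 2
    cut = len(ids) * left_parts // nparts
    rcb(points, left_parts, ids[:cut], first_part, out)
    rcb(points, nparts - left_parts, ids[cut:], first_part + left_parts, out)
    return out

points = [(x, y) for y in range(4) for x in range(8)]
parts = rcb(points, 4)
print([parts.count(p) for p in range(4)])
```

Splitting in proportion to the parts on each side lets the same routine handle part counts that are not powers of two.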

  5. Slide 17: RCB Advantages and Disadvantages
     • Advantages:
       – Conceptually simple; fast and inexpensive.
       – Regular subdomains.
         • Can be used for structured or unstructured applications.
         • All processors can inexpensively know entire decomposition.
       – Effective when connectivity info is not available.
     • Disadvantages:
       – No explicit control of communication costs.
       – Can generate disconnected subdomains.
       – Mediocre partition quality.
       – Geometric coordinates needed.

     Slide 18: RCB Repartitioning
     • Implicitly incremental.
     • Small changes in data result in small movement of cuts.

     Slide 19: Applications of RCB
     • Particle Simulations
     • Adaptive Mesh Refinement
     • Crash Simulations and Contact Detection [Figure: snapshots at 1.6 ms and 3.2 ms]
     • Parallel Volume Rendering

     Slide 20: Variations on RCB: RIB
     • Recursive Inertial Bisection
       – Simon, Taylor, et al., 1991
       – Cutting planes orthogonal to principal axes of geometry.
       – Not incremental.
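A single RIB cut can be sketched as follows, assuming 2-D points with unit weights. The principal axis is taken from the closed-form orientation of the 2x2 covariance matrix, and the cut is placed at the median projection onto that axis. The function name `inertial_bisect` is hypothetical. A full RIB would apply this step recursively to each half, as RCB does with coordinate cuts.

```python
import math

# Sketch of one recursive inertial bisection (RIB) step in 2-D.
# Assumptions: unit weights, principal axis from the 2x2 covariance matrix.

def inertial_bisect(points):
    """Split points in half with a cut orthogonal to their principal axis."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    sxx = sum((x - cx) ** 2 for x, _ in points)
    syy = sum((y - cy) ** 2 for _, y in points)
    sxy = sum((x - cx) * (y - cy) for x, y in points)
    # Direction of largest spread (dominant eigenvector of the covariance).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    # Project onto that axis and cut at the median projection.
    order = sorted(range(n), key=lambda i: (points[i][0] - cx) * ux
                                           + (points[i][1] - cy) * uy)
    half = n // 2
    return order[:half], order[half:], (ux, uy)

# Points spread along the diagonal y = x, where no coordinate axis is a natural cut direction.
points = [(float(t), float(t) + 0.1 * (-1) ** t) for t in range(10)]
low, high, axis = inertial_bisect(points)
print(sorted(low), sorted(high), axis)
```

For these points the computed axis lies close to the diagonal, so the cut separates the first five points from the last five.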

  6. Slide 21: Space-Filling Curve Partitioning (SFC)
     • Developed by Peano, 1890.
     • Space-Filling Curve:
       – Mapping from R^3 to R^1 that completely fills a domain.
       – Applied recursively to obtain desired granularity.
     • Used for partitioning by …
       – Warren and Salmon, 1993, gravitational simulations.
       – Pilkington and Baden, 1994, smoothed particle hydrodynamics.
       – Patra and Oden, 1995, adaptive mesh refinement.

     Slide 22: SFC Algorithm
     • Run space-filling curve through domain.
     • Order objects according to position on curve.
     • Perform 1-D partition of curve.
     [Figure: three panels showing objects numbered 1 to 20 along the curve]

     Slide 23: SFC Advantages and Disadvantages
     • Advantages:
       – Simple, fast, inexpensive.
       – Maintains geometric locality of objects in processors.
       – Linear ordering of objects may improve cache performance.
     • Disadvantages:
       – No explicit control of communication costs.
       – Can generate disconnected subdomains.
       – Often lower quality partitions than RCB.
       – Geometric coordinates needed.

     Slide 24: SFC Repartitioning
     • Implicitly incremental.
     • Small changes in data result in small movement of cuts in linear ordering.
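The three steps of the SFC algorithm (run the curve, order the objects, cut the 1-D ordering) can be sketched on a small grid. The choice of a 2-D Hilbert curve is an assumption, since the slides do not name a specific curve. The index computation is the widely used bitwise formulation, the grid size must be a power of two, and the names `hilbert_index` and `sfc_partition` are hypothetical.

```python
# Sketch of space-filling-curve partitioning on an n x n grid of cells,
# using a 2-D Hilbert curve (an assumed choice; the slides do not name one).

def hilbert_index(n, x, y):
    """Position of cell (x, y) along a Hilbert curve; n must be a power of two."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                      # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

def sfc_partition(cells, n, nparts):
    """Order cells along the curve, then cut the 1-D ordering into equal pieces."""
    order = sorted(cells, key=lambda c: hilbert_index(n, c[0], c[1]))
    return {cell: rank * nparts // len(order) for rank, cell in enumerate(order)}

N = 8
cells = [(x, y) for x in range(N) for y in range(N)]
part = sfc_partition(cells, N, 4)
sizes = [sum(1 for p in part.values() if p == k) for k in range(4)]
print(sizes)
```

Because consecutive positions on the curve are adjacent cells, each contiguous piece of the ordering stays geometrically compact. If an object moves slightly, only the cut positions in the ordering shift, which is the implicit incrementality noted on Slide 24.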
