SLIDE 1

Partitioning Spatially Located Load with Rectangles: Algorithms and Simulations

Erik Saule, Erdeniz Ozgun Bas, Umit V. Catalyurek

Department of Biomedical Informatics, The Ohio State University {esaule,erdeniz,umit}@bmi.osu.edu

Frejus 2010

Erik Saule Ohio State University, Biomedical Informatics HPC Lab http://bmi.osu.edu/hpc 2D partitioning :: 1 / 32

SLIDE 2

A load distribution problem

Load matrix

In parallel computing, the load can be spatially located. The computation should be distributed accordingly.

Applications

Particle-in-Cell (stencil). Sparse Matrices. Direct Volume Rendering.

Metrics

Load balance. Communication. Stability.

SLIDE 3

Outline

1. Introduction

2. Preliminaries: Notation, In One Dimension, Simulation Setting

3. Rectilinear Partitioning: Nicol’s Algorithm

4. Jagged Partitioning: PxQ Jagged Partitioning, m-way Jagged Partitioning

5. Hierarchical Bisection: Recursive Bisection, Dynamic Programming

6. Final Thoughts: Summing Up, Conclusion and Perspective

SLIDE 4

The Rectangular Partitioning Problem

Definition

Let A be an n1 × n2 matrix of non-negative values. The problem is to partition the rectangle [1, 1] × [n1, n2] into a set S of m rectangles, where r = [x, y] × [x′, y′] denotes the rectangle with corners (x, y) and (x′, y′). The load of rectangle r is $L(r) = \sum_{x \le i \le x',\, y \le j \le y'} A[i][j]$. The problem is to minimize $L_{max} = \max_{r \in S} L(r)$.

Prefix Sum

Algorithms are rarely interested in the value of a particular element, but rather in the load of a rectangle. The matrix is given as a 2D prefix sum array Pr such that $Pr[i][j] = \sum_{i' \le i,\, j' \le j} A[i'][j']$. By convention, Pr[0][j] = Pr[i][0] = 0. The load of rectangle r = [x, y] × [x′, y′] can then be computed in constant time as $L(r) = Pr[x'][y'] + Pr[x-1][y-1] - Pr[x'][y-1] - Pr[x-1][y']$.
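The prefix-sum trick can be sketched in Python as follows, assuming integer loads stored as a 0-indexed list of rows (rows play the role of i and x, columns the role of j and y). `prefix_sum_2d` and `rect_load` are hypothetical helper names, not code from the talk.

```python
from typing import List

Matrix = List[List[int]]


def prefix_sum_2d(a: Matrix) -> Matrix:
    """Return Pr with Pr[i][j] = sum of A[i'][j'] for i' <= i, j' <= j (1-indexed).

    `a` is a 0-indexed list of rows; the extra row 0 and column 0 of Pr are zero,
    matching the convention Pr[0][j] = Pr[i][0] = 0.
    """
    n1, n2 = len(a), len(a[0])
    pr = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            # Inclusion-exclusion on the three neighbouring prefixes.
            pr[i][j] = a[i - 1][j - 1] + pr[i - 1][j] + pr[i][j - 1] - pr[i - 1][j - 1]
    return pr


def rect_load(pr: Matrix, x: int, y: int, xp: int, yp: int) -> int:
    """L(r) for r = [x, y] x [x', y'] (1-indexed, inclusive corners), in O(1)."""
    return pr[xp][yp] + pr[x - 1][y - 1] - pr[xp][y - 1] - pr[x - 1][yp]


a = [[1, 2, 3],
     [4, 5, 6]]
pr = prefix_sum_2d(a)
print(rect_load(pr, 1, 1, 2, 3))  # whole matrix: 21
print(rect_load(pr, 1, 2, 2, 3))  # columns 2..3 of both rows: 2+3+5+6 = 16
```

Building Pr costs O(n1 n2) once; every rectangle query afterwards is four lookups.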

SLIDE 5

In One Dimension

Heuristic: Direct Cut [MP97]

Greedily set the first interval at the first i such that $\sum_{i' \le i} A[i'] \ge \frac{\sum_{i'} A[i']}{m}$. Complexity: $O(m \log \frac{n}{m})$. Guarantee: $L_{max}(DC) \le \frac{\sum_{i'} A[i']}{m} + \max_i A[i]$.
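A minimal Python sketch of Direct Cut, assuming integer loads and applying the slide's rule to every interval: cut k is placed at the first i whose prefix sum reaches k/m of the total load. This generalization to all m intervals is our assumption, not necessarily the exact variant of [MP97]; `direct_cut` and `interval_loads` are hypothetical names.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List


def direct_cut(a: List[int], m: int) -> List[int]:
    """Return the m interval end points (1-indexed, inclusive) of a non-empty array.

    Assumption: cut k is the first i whose prefix sum reaches k/m of the total.
    """
    prefix = list(accumulate(a))  # prefix[i - 1] = A[1] + ... + A[i]
    total = prefix[-1]
    cuts = []
    for k in range(1, m + 1):
        # Smallest integer threshold t with t >= k * total / m (exact, no floats).
        threshold = -(-k * total // m)
        cuts.append(bisect_left(prefix, threshold) + 1)  # binary search: O(log n)
    cuts[-1] = len(a)  # the last interval always extends to the end of the array
    return cuts


def interval_loads(a: List[int], cuts: List[int]) -> List[int]:
    """Load of each interval ]previous cut, cut] (empty intervals have load 0)."""
    loads, start = [], 0
    for end in cuts:
        loads.append(sum(a[start:end]))
        start = end
    return loads


a = [3, 1, 4, 1, 5, 9, 2, 6]
cuts = direct_cut(a, 3)         # [5, 6, 8]
print(interval_loads(a, cuts))  # [14, 9, 8]; guarantee: 14 <= 31/3 + 9
```

With precomputed prefix sums, each cut here is one binary search over the whole array, i.e. O(m log n) in total; the slide states the tighter O(m log(n/m)) bound.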

Optimal: Nicol’s algorithm [Nic94] (improved by [PA04])

Use Probe(B), which tries to build a solution of value at most B: it greedily loads each processor, in turn, with the largest interval whose load does not exceed B. The algorithm exploits the property that there exists an optimal solution whose first interval [1, i] is either the smallest such that Probe(L([1, i])) is true or the largest such that Probe(L([1, i])) is false. Complexity: $O((m \log \frac{n}{m})^2)$.

Note: it works for more than load matrices: any interval load function that is non-decreasing by inclusion will do.
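The Probe(B) primitive can be sketched in Python as below. To keep the example short, the optimal value is found by bisection over integer bottleneck values rather than by Nicol's search over candidate interval loads: for integer loads it returns the same optimum, but not with the same complexity. `probe` and `optimal_bottleneck` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def probe(prefix: List[int], m: int, bound: int) -> bool:
    """Probe(B): greedily give each of the m processors the longest next interval
    whose load is at most `bound`; succeed if the whole array gets covered.

    prefix[i] = A[1] + ... + A[i], prefix[0] = 0 (non-decreasing since A >= 0).
    """
    n = len(prefix) - 1
    start = 0
    for _ in range(m):
        # Largest end with prefix[end] - prefix[start] <= bound (binary search).
        start = bisect_right(prefix, prefix[start] + bound) - 1
        if start == n:
            return True
    return False


def optimal_bottleneck(a: List[int], m: int) -> int:
    """Smallest feasible B, by bisection over integer loads (not Nicol's search)."""
    prefix = [0, *accumulate(a)]
    lo, hi = max(a), prefix[-1]  # some interval holds the largest element; one holds all
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(prefix, m, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


print(optimal_bottleneck([3, 1, 4, 1, 5, 9, 2, 6], 3))  # 14
```

The greedy probe is valid because loading each processor as much as possible never hurts: if some partition with bottleneck at most B exists, each greedy end point lies at or beyond the corresponding end point of that partition.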

SLIDE 6

Simulation Setting

Classes (some inspired by [MS96])

Processors

Simulations are performed with different numbers of processors: most square numbers up to 10,000.

Metric

The presented metric is the load imbalance: $\frac{L_{max}}{\sum_{i,j} A[i][j] / m} - 1$.
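The metric can be computed from the loads of the m parts as sketched below; `load_imbalance` is a hypothetical helper, and it assumes the part loads sum to the total load of A (as they do for a partition).

```python
from fractions import Fraction
from typing import Sequence


def load_imbalance(part_loads: Sequence[int]) -> Fraction:
    """Lmax / (total / m) - 1, with m = number of parts and total = sum of their loads.

    0 means perfect balance; 1/4 means the most loaded processor carries 25% more
    than the average. Fractions keep the result exact.
    """
    m = len(part_loads)
    total = sum(part_loads)
    return Fraction(max(part_loads) * m, total) - 1


print(load_imbalance([14, 9, 8]))    # 11/31
print(load_imbalance([5, 5, 5, 5]))  # 0
```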

SLIDE 8

Rectilinear Partitioning

SLIDE 9

Known results on rectilinear partitioning

NP-Complete [GM96], and there is no $(2 - \epsilon)$-approximation algorithm (unless P = NP).

[Nic94]: a $\Theta(m)$-approximation algorithm based on iterative refinement. $O(n_1 n_2)$ iterations, each in $O(Q (P \log \frac{n_1}{P})^2 + P (Q \log \frac{n_2}{Q})^2)$.

[AHM01] (refinement of [Nic94]): a $\Theta(m^{1/4})$-approximation algorithm for square matrices.

[KMS97]: a 120-approximation algorithm of complexity $O(n_1 n_2)$.

[GIK02]: a 4-approximation algorithm (from rectangle stabbing) of complexity $O(\log(\sum_{i,j} A[i][j]) \, n_1^{10} n_2^{10})$ (heavy linear programming).

[MS05]: a $(4 + \epsilon)$-approximation algorithm that runs in $O((n_1 + n_2 + PQ) P \log(n_1 n_2))$.

SLIDE 21

Nicol’s Rectilinear Algorithm [Nic94]

PxQ rectilinear partitioning

Sum the columns to make a 1D instance. Partition it into P parts, which gives a Px1 rectilinear partitioning. Sum the rows in each part, and build a 1D instance in which the load of an interval is its maximum load over the P parts. Partition it into Q parts, which gives a PxQ rectilinear partitioning. Ignore the row partition and iterate, alternating dimensions, as long as the solution improves.
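A compact Python sketch of this iterative refinement follows. Assumptions: the first 1D instance is built from full-width row loads, the 1D steps use a greedy probe with bisection on integer loads instead of Nicol's 1D algorithm, and all names (`partition_1d`, `nicol_rectilinear`, ...) are hypothetical. The interval cost of each later step is the maximum load over the current parts, which is non-decreasing by inclusion, so the probe remains valid.

```python
from itertools import accumulate
from typing import Callable, List, Optional, Tuple

Cost = Callable[[int, int], int]  # cost(s, e) = load of the half-open interval [s, e)


def partition_1d(n: int, parts: int, cost: Cost) -> List[int]:
    """Optimal split of [0, n) into `parts` intervals for a cost that is
    non-decreasing by inclusion; returns the exclusive end of each interval."""

    def probe(bound: int) -> Optional[List[int]]:
        ends, start = [], 0
        for _ in range(parts):
            end = start
            while end < n and cost(start, end + 1) <= bound:
                end += 1  # extend greedily while the load stays within the bound
            ends.append(end)
            start = end
        return ends if start == n else None

    lo, hi = 0, cost(0, n)  # a single interval covering everything is feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return probe(lo)


def intervals(ends: List[int]) -> List[Tuple[int, int]]:
    return list(zip([0, *ends[:-1]], ends))


def nicol_rectilinear(a: List[List[int]], p: int, q: int):
    """Return (Lmax, row ends, column ends) of a PxQ rectilinear partition."""
    n1, n2 = len(a), len(a[0])
    pr = [[0] * (n2 + 1)]  # pr[i][j] = load of rows [0, i) x columns [0, j)
    for row in a:
        pr.append([up + left for up, left in zip(pr[-1], [0, *accumulate(row)])])

    def rect(r0: int, r1: int, c0: int, c1: int) -> int:
        return pr[r1][c1] - pr[r0][c1] - pr[r1][c0] + pr[r0][c0]

    def col_cost(s: int, e: int) -> int:  # worst load of columns [s, e) over row stripes
        return max(rect(r0, r1, s, e) for r0, r1 in intervals(rows))

    def row_cost(s: int, e: int) -> int:  # worst load of rows [s, e) over column stripes
        return max(rect(s, e, c0, c1) for c0, c1 in intervals(cols))

    rows = partition_1d(n1, p, lambda s, e: rect(s, e, 0, n2))  # Px1 partition
    best = None
    while True:
        cols = partition_1d(n2, q, col_cost)
        rows = partition_1d(n1, p, row_cost)  # drop the old rows, recompute them
        lmax = max(rect(r0, r1, c0, c1)
                   for r0, r1 in intervals(rows) for c0, c1 in intervals(cols))
        if best is not None and lmax >= best[0]:
            return best  # no improvement: keep the previous solution
        best = (lmax, rows, cols)


print(nicol_rectilinear([[1, 2], [3, 4]], 2, 2))  # (4, [1, 2], [1, 2])
```

Each 1D step is optimal for the partition it is given, so Lmax never increases from one step to the next, and the loop stops at the first iteration without strict improvement.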

SLIDE 22

Nicol’s Rectilinear Algorithm [Nic94]

PxQ rectilinear partitioning

Complexity: $O(n_1 n_2)$ iterations (around 10 in practice); one iteration costs $O(Q (P \log \frac{n_1}{P})^2 + P (Q \log \frac{n_2}{Q})^2)$.

SLIDE 24

PxQ Jagged Partitioning

SLIDE 30

PxQ heuristic

PxQ Jagged Partitioning

Sum on columns to generate a 1D problem and partition it into P parts (stripes). For the first stripe, sum on rows and partition the result into Q parts. Treat all stripes the same way. Complexity: $O((P \log \frac{n_1}{P})^2 + P (Q \log \frac{n_2}{Q})^2)$.
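A Python sketch of the heuristic, assuming integer loads and using an optimal 1D partitioner (greedy probe plus bisection on integer B, standing in for the 1D algorithm used on the slides). Parts are half-open (row range, column range) tuples; `optimal_cuts` and `pxq_jagged` are hypothetical names.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Part = Tuple[int, int, int, int]  # rows [r0, r1) x columns [c0, c1)


def optimal_cuts(loads: List[int], k: int) -> List[int]:
    """Optimal split of a 1D load array into k intervals (exclusive end points)."""
    prefix = [0, *accumulate(loads)]

    def probe(bound: int):
        ends, start = [], 0
        for _ in range(k):
            start = bisect_right(prefix, prefix[start] + bound) - 1
            ends.append(start)
        return ends if start == len(loads) else None

    lo, hi = max(loads), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return probe(lo)


def pxq_jagged(a: List[List[int]], p: int, q: int) -> List[Part]:
    """Heuristic PxQ jagged partitioning: P row stripes, then Q parts per stripe."""
    n2 = len(a[0])
    parts: List[Part] = []
    r0 = 0
    for r1 in optimal_cuts([sum(row) for row in a], p):
        # Per-column loads restricted to the rows of this stripe.
        col_loads = [sum(a[i][j] for i in range(r0, r1)) for j in range(n2)]
        c0 = 0
        for c1 in optimal_cuts(col_loads, q):
            parts.append((r0, r1, c0, c1))
            c0 = c1
        r0 = r1
    return parts


print(pxq_jagged([[1] * 4 for _ in range(4)], 2, 2))
# [(0, 2, 0, 2), (0, 2, 2, 4), (2, 4, 0, 2), (2, 4, 2, 4)]
```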

SLIDE 33

How good is that ?

Theorem

If there are no zeros in the array, the heuristic P × Q-way partitioning is a $(1 + \Delta \frac{P}{n_1})(1 + \Delta \frac{Q}{n_2})$-approximation algorithm, where $\Delta = \frac{\max A}{\min A}$, $P < n_1$ and $Q < n_2$.

Proof.

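The one-dimensional guarantee (upper bound) $L_{max}(DC) \le \frac{\sum_{i'} A[i']}{m} + \max_i A[i]$ can be rewritten as $L_{max}(DC) \le \frac{\sum_i A[i]}{m}(1 + \Delta \frac{m}{n})$, since $\max_i A[i] \le \Delta \min_i A[i] \le \Delta \frac{\sum_i A[i]}{n}$.

It bounds the load of a stripe: $Load_{stripe} \le \frac{\sum_{i,j} A[i][j]}{P}(1 + \Delta \frac{P}{n_1})$.

And finally the load of a processor: $L_{max} \le \frac{\sum_{i,j} A[i][j]}{PQ}(1 + \Delta \frac{P}{n_1})(1 + \Delta \frac{Q}{n_2})$. Since any solution has $L_{max} \ge \frac{\sum_{i,j} A[i][j]}{m}$ with $m = PQ$, the ratio follows.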

Theorem

The approximation ratio is minimized by $P = \sqrt{m \frac{n_1}{n_2}}$.
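The second theorem can be checked numerically: the sketch below evaluates the ratio of the first theorem with Q = m/P and compares the closed-form minimizer with a brute-force search over integer P. The function names and the sample values (n1 = n2 = 512, m = 1024, Δ = 2) are illustrative assumptions; a real partition also needs integer P and Q.

```python
import math


def pxq_ratio(p: float, m: int, n1: int, n2: int, delta: float) -> float:
    """Approximation ratio (1 + Δ P/n1)(1 + Δ Q/n2) of heuristic PxQ, with Q = m/P."""
    q = m / p
    return (1 + delta * p / n1) * (1 + delta * q / n2)


def best_p(m: int, n1: int, n2: int) -> float:
    """Real-valued minimizer: the derivative Δ/n1 - Δ m/(P^2 n2) vanishes at
    P^2 = m n1 / n2."""
    return math.sqrt(m * n1 / n2)


print(best_p(1024, 512, 512))  # 32.0
# Brute force over integer P (2 < P < 512 keeps Q < n2 and P < n1) agrees:
print(min(range(3, 512), key=lambda p: pxq_ratio(p, 1024, 512, 512, 2.0)))  # 32
```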

SLIDE 35

An optimal PxQ jagged partitioning

A Dynamic Programming Formulation

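$L_{max}(n_1, P) = \min_{1 \le k < n_1} \max(L_{max}(k-1, P-1), 1D(k, n_1, Q))$, with $L_{max}(0, P) = 0$ and $L_{max}(n_1, 0) = +\infty$ for all $n_1 \ge 1$, where $1D(k, n_1, Q)$ is the optimal load of rows k to n1 taken as one stripe and cut into Q parts. This gives $O(n_1 m)$ Lmax functions and $O(n_1^2)$ 1D functions.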

For a 512x512 matrix and 1000 processors, that’s 512,000 + 262,144 values. With 64-bit values, that’s about 6MB.

Not all values need to be stored

Binary search on k. Lower bound/Upper bound on Lmax and 1D. Tree pruning.
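A memoized Python sketch of the recurrence, returning only the optimal Lmax. Unlike the slide, it lets k range up to the last row so that a stripe may be a single row, and it implements none of the pruning listed above. The 1D subproblem uses a greedy probe with bisection on integer loads; all names are hypothetical.

```python
import math
from bisect import bisect_right
from functools import lru_cache
from itertools import accumulate
from typing import List


def opt_1d(loads: List[int], k: int) -> int:
    """Optimal bottleneck of a 1D load array cut into k intervals (probe + bisection)."""
    prefix = [0, *accumulate(loads)]

    def feasible(bound: int) -> bool:
        start = 0
        for _ in range(k):
            start = bisect_right(prefix, prefix[start] + bound) - 1
        return start == len(loads)

    lo, hi = max(loads), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def optimal_pxq_jagged(a: List[List[int]], p: int, q: int) -> int:
    """Lmax of an optimal PxQ jagged partition, by memoized dynamic programming."""
    n2 = len(a[0])
    col_prefix = [[0] * n2]  # col_prefix[r][j] = a[0][j] + ... + a[r-1][j]
    for row in a:
        col_prefix.append([s + v for s, v in zip(col_prefix[-1], row)])

    @lru_cache(maxsize=None)
    def one_d(k: int, r: int) -> int:
        """1D(k, r, Q): rows k..r (1-indexed, inclusive) as one stripe cut into q parts."""
        return opt_1d([col_prefix[r][j] - col_prefix[k - 1][j] for j in range(n2)], q)

    @lru_cache(maxsize=None)
    def lmax(r: int, parts: int) -> float:
        if r == 0:
            return 0  # no rows left: unused stripes cost nothing
        if parts == 0:
            return math.inf  # rows left but no stripe to hold them
        return min(max(lmax(k - 1, parts - 1), one_d(k, r)) for k in range(1, r + 1))

    return lmax(len(a), p)


print(optimal_pxq_jagged([[1, 2], [3, 4], [5, 6]], 2, 2))  # 6
```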

SLIDE 36

Performance of PxQ jagged Partitioning

[Figure: load imbalance (log scale) vs. number of processors (up to 10,000) at iteration 30000; series: Nicol, Heuristic PxQ Jagged, Optimal PxQ Jagged.]

SLIDE 37

m-way Jagged Partitioning

SLIDE 39

m-way jagged partitioning heuristic

Algorithm

Cut in P stripes. Distribute the processors among the stripes proportionally to each stripe’s load: $alloc_j = \left\lceil \frac{load_j}{\sum_{i,j} A[i][j]} (m - P) \right\rceil$.
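A Python sketch of the m-way heuristic, assuming strictly positive integer loads (as in the theorem that follows) and m > P. The integer ceiling is our reading of the allocation formula; since each stripe receives fewer than (m - P) load_j / total + 1 processors, at most m processors are used in total. The names are hypothetical, and the 1D partitioner (greedy probe plus bisection) stands in for the one used on the slides.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Part = Tuple[int, int, int, int]  # rows [r0, r1) x columns [c0, c1)


def optimal_cuts(loads: List[int], k: int) -> List[int]:
    """Optimal split of a 1D load array into k intervals (exclusive end points)."""
    prefix = [0, *accumulate(loads)]

    def probe(bound: int):
        ends, start = [], 0
        for _ in range(k):
            start = bisect_right(prefix, prefix[start] + bound) - 1
            ends.append(start)
        return ends if start == len(loads) else None

    lo, hi = max(loads), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if probe(mid) is None:
            lo = mid + 1
        else:
            hi = mid
    return probe(lo)


def mway_jagged(a: List[List[int]], m: int, p: int) -> List[Part]:
    """P row stripes, then ceil((m - P) * load_j / total) column parts in stripe j."""
    assert m > p and all(v > 0 for row in a for v in row), "sketch assumes m > P, A > 0"
    n2, total = len(a[0]), sum(map(sum, a))
    parts: List[Part] = []
    r0 = 0
    for r1 in optimal_cuts([sum(row) for row in a], p):
        if r1 == r0:  # empty stripe (fewer than P stripes were needed)
            continue
        col_loads = [sum(a[i][j] for i in range(r0, r1)) for j in range(n2)]
        alloc = -(-(m - p) * sum(col_loads) // total)  # integer ceiling, >= 1 here
        c0 = 0
        for c1 in optimal_cuts(col_loads, alloc):
            parts.append((r0, r1, c0, c1))
            c0 = c1
        r0 = r1
    return parts
```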

Theorem

If there are no zeros in A, the approximation ratio of the described algorithm is $\frac{m}{m-P}(1 + \frac{\Delta}{n_2}) + \frac{m \Delta}{P n_2}(1 + \frac{\Delta P}{n_1})$.

Proof.

Same kind of proof as for the heuristic PxQ jagged partitioning. Recall that the guarantee of the heuristic PxQ jagged partitioning was $(1 + \Delta \frac{P}{n_1})(1 + \Delta \frac{Q}{n_2})$: m-way is better for large values of m.
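The closing remark can be illustrated by evaluating both guarantees. This is an illustration under assumptions: the PxQ bound uses P = sqrt(m n1/n2) from the previous slide, the m-way bound is minimized over integer P < min(m, n1), and the instance (n1 = n2 = 512, Δ = 2) is made up; the function names are hypothetical.

```python
import math


def pxq_bound(m: int, n1: int, n2: int, delta: float) -> float:
    """(1 + Δ P/n1)(1 + Δ Q/n2) at the real minimizer P = sqrt(m n1 / n2), Q = m/P."""
    p = math.sqrt(m * n1 / n2)
    return (1 + delta * p / n1) * (1 + delta * (m / p) / n2)


def mway_bound(m: int, p: int, n1: int, n2: int, delta: float) -> float:
    """m/(m-P) (1 + Δ/n2) + mΔ/(P n2) (1 + Δ P/n1)."""
    return m / (m - p) * (1 + delta / n2) + m * delta / (p * n2) * (1 + delta * p / n1)


def best_mway_bound(m: int, n1: int, n2: int, delta: float) -> float:
    return min(mway_bound(m, p, n1, n2, delta) for p in range(1, min(m, n1)))


for m in (16, 10000):
    print(m, round(pxq_bound(m, 512, 512, 2.0), 3), round(best_mway_bound(m, 512, 512, 2.0), 3))
```

On this made-up instance the PxQ guarantee is the smaller one for m = 16, and the m-way guarantee is the smaller one for m = 10,000.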

SLIDE 40

An optimal m-way partitioning

A Dynamic Programming Formulation

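$L_{max}(n_1, m) = \min_{1 \le k < n_1,\, 1 \le x \le m} \max(L_{max}(k-1, m-x), 1D(k, n_1, x))$, with $L_{max}(0, m) = 0$ and $L_{max}(n_1, 0) = +\infty$ for all $n_1 \ge 1$, where $1D(k, n_1, x)$ is the optimal load of rows k to n1 taken as one stripe and cut into x parts. This gives $O(n_1 m)$ Lmax functions and $O(n_1^2 m)$ 1D functions.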

The same kind of optimizations apply. For a 512x512 matrix on 1,000 processors, that’s 512,000 + 262,144,000 values; with 64-bit values, about 2GB (and it takes 30 minutes).
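A memoized Python sketch of this recurrence, returning only the optimal Lmax. Here k ranges up to the last row (so a stripe may be a single row), no pruning is attempted, the 1D subproblem uses a greedy probe with bisection on integer loads, and the names are hypothetical. With O(n1 m) states and O(n1 m) choices each, it is only usable on small matrices.

```python
import math
from bisect import bisect_right
from functools import lru_cache
from itertools import accumulate
from typing import List


def opt_1d(loads: List[int], k: int) -> int:
    """Optimal bottleneck of a 1D load array cut into k intervals (probe + bisection)."""
    prefix = [0, *accumulate(loads)]

    def feasible(bound: int) -> bool:
        start = 0
        for _ in range(k):
            start = bisect_right(prefix, prefix[start] + bound) - 1
        return start == len(loads)

    lo, hi = max(loads), prefix[-1]
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def optimal_mway_jagged(a: List[List[int]], m: int) -> int:
    """Lmax of an optimal m-way jagged partition, by memoized dynamic programming."""
    n2 = len(a[0])
    col_prefix = [[0] * n2]  # col_prefix[r][j] = a[0][j] + ... + a[r-1][j]
    for row in a:
        col_prefix.append([s + v for s, v in zip(col_prefix[-1], row)])

    @lru_cache(maxsize=None)
    def one_d(k: int, r: int, x: int) -> int:
        """1D(k, r, x): rows k..r (1-indexed, inclusive) as one stripe cut into x parts."""
        return opt_1d([col_prefix[r][j] - col_prefix[k - 1][j] for j in range(n2)], x)

    @lru_cache(maxsize=None)
    def lmax(r: int, procs: int) -> float:
        if r == 0:
            return 0
        if procs == 0:
            return math.inf
        # Last stripe = rows k..r with x processors; the rest gets procs - x.
        return min(max(lmax(k - 1, procs - x), one_d(k, r, x))
                   for k in range(1, r + 1) for x in range(1, procs + 1))

    return lmax(len(a), m)


print(optimal_mway_jagged([[1] * 4 for _ in range(4)], 3))  # 6
```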

SLIDE 41

Performance of m-way

[Figure: load imbalance (log scale) vs. number of processors (up to 3000) at iteration 30000; series: Nicol, Heuristic PxQ Jagged, Heuristic m-way Jagged, Optimal m-way Jagged.]

SLIDE 43

Hierarchical Bisection Partitioning

SLIDE 49

Recursive Bisection [BB87]

Algorithm

Given m processors and a rectangle to partition: cut the rectangle so as to balance the load evenly, allocate half of the processors to each side, and recurse on both sides. At each step, cut along the dimension that balances the load best. Complexity: $O(m \log \max(n_1, n_2))$.
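A Python sketch of recursive bisection, under two assumptions of ours: for odd m the processors are split as ⌊m/2⌋ and ⌈m/2⌉ and the cut balances load per processor rather than total load, and ties between dimensions go to rows. Because the left load grows with the cut position, each cut is found by binary search, in line with the slide's complexity. The names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate
from typing import Callable, List, Tuple

Part = Tuple[int, int, int, int]  # rows [r0, r1) x columns [c0, c1)


def best_cut(load_upto: Callable[[int], int], lo: int, hi: int,
             total: int, j: int, k: int) -> Tuple[int, Fraction]:
    """Cut c in [lo, hi] minimizing max(left/j, right/(k - j)), left = load_upto(c)."""
    a, b = lo, hi
    while a < b:  # first c with left/j >= right/(k - j), i.e. left * k >= total * j
        mid = (a + b) // 2
        if load_upto(mid) * k >= total * j:
            b = mid
        else:
            a = mid + 1

    def score(c: int) -> Fraction:
        left = load_upto(c)
        return max(Fraction(left, j), Fraction(total - left, k - j))

    c = min((max(lo, a - 1), a), key=score)  # the optimum sits at the crossing point
    return c, score(c)


def recursive_bisection(a: List[List[int]], m: int) -> List[Part]:
    n1, n2 = len(a), len(a[0])
    pr = [[0] * (n2 + 1)]  # pr[i][j] = load of rows [0, i) x columns [0, j)
    for row in a:
        pr.append([up + left for up, left in zip(pr[-1], [0, *accumulate(row)])])

    def rect(r0: int, r1: int, c0: int, c1: int) -> int:
        return pr[r1][c1] - pr[r0][c1] - pr[r1][c0] + pr[r0][c0]

    parts: List[Part] = []

    def split(r0: int, r1: int, c0: int, c1: int, k: int) -> None:
        if k == 1:
            parts.append((r0, r1, c0, c1))
            return
        j = k // 2  # "half" of the processors; the other side gets k - j
        total = rect(r0, r1, c0, c1)
        row_cut, row_score = best_cut(lambda c: rect(r0, c, c0, c1), r0, r1, total, j, k)
        col_cut, col_score = best_cut(lambda c: rect(r0, r1, c0, c), c0, c1, total, j, k)
        if row_score <= col_score:  # keep the dimension that balances the load best
            split(r0, row_cut, c0, c1, j)
            split(row_cut, r1, c0, c1, k - j)
        else:
            split(r0, r1, c0, col_cut, j)
            split(r0, r1, col_cut, c1, k - j)

    split(0, n1, 0, n2, m)
    return parts


print(recursive_bisection([[1] * 4 for _ in range(4)], 4))
# [(0, 1, 0, 4), (1, 2, 0, 4), (2, 3, 0, 4), (3, 4, 0, 4)]
```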

SLIDE 50

Performance of Recursive Bisection

[Figure: load imbalance (log scale) vs. number of processors (up to 10,000) at iteration 30000; series: Nicol, Heuristic PxQ Jagged, Heuristic m-way Jagged, Recursive Bisection.]

SLIDE 52

An Optimal Hierarchical Bisection Algorithm

A Dynamic Programming Formulation

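$L_{max}(x_1, x_2, y_1, y_2, m) = \min_j \min\big(\min_x \max(L_{max}(x_1, x, y_1, y_2, j), L_{max}(x+1, x_2, y_1, y_2, m-j)),\ \min_y \max(L_{max}(x_1, x_2, y_1, y, j), L_{max}(x_1, x_2, y+1, y_2, m-j))\big)$, with $L_{max}(x_1, x_2, y_1, y_2, 1) = L(x_1, x_2, y_1, y_2)$. This gives $O(n_1^2 n_2^2 m)$ Lmax functions.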

For a 512x512 matrix and 1000 processors, that’s 68,719,476,736,000 values. With 64-bit values, that’s about 550TB.
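The memory figures on the dynamic-programming slides can be reproduced with a few lines of arithmetic, assuming one 64-bit value per table entry and taking the O(·) counts literally (no constant factors). The function names are hypothetical.

```python
BYTES_PER_VALUE = 8  # one 64-bit value per table entry (assumption: no other overhead)


def pxq_values(n1: int, m: int) -> int:
    return n1 * m + n1 ** 2       # Lmax table + 1D table


def mway_values(n1: int, m: int) -> int:
    return n1 * m + n1 ** 2 * m   # Lmax table + 1D table


def hierarchical_values(n1: int, n2: int, m: int) -> int:
    return n1 ** 2 * n2 ** 2 * m  # one entry per Lmax(x1, x2, y1, y2, m)


for name, values in [("PxQ jagged", pxq_values(512, 1000)),
                     ("m-way jagged", mway_values(512, 1000)),
                     ("hierarchical", hierarchical_values(512, 512, 1000))]:
    print(f"{name}: {values:,} values, {values * BYTES_PER_VALUE:.3e} bytes")
```

The last line gives about 5.5 × 10^14 bytes, i.e. roughly 550 TB (500 TiB).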

The Relaxed Hierarchical Heuristic

Build the solution according to $L_{max}(x_1, x_2, y_1, y_2, m) = \min_j \min\big(\min_x \max(\frac{L(x_1, x, y_1, y_2)}{j}, \frac{L(x+1, x_2, y_1, y_2)}{m-j}),\ \min_y \max(\frac{L(x_1, x_2, y_1, y)}{j}, \frac{L(x_1, x_2, y+1, y_2)}{m-j})\big)$
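A Python sketch of the relaxed heuristic: at each level it picks the processor split j and the cut that minimize the larger of the two average loads, then recurses on both sides. Because each side's load is monotone in the cut position, the best cut for a given j is found by binary search. Function names and the tie-breaking rule (the first best split found wins) are assumptions.

```python
from fractions import Fraction
from itertools import accumulate
from typing import Callable, List, Tuple

Part = Tuple[int, int, int, int]  # rows [r0, r1) x columns [c0, c1)


def best_cut(load_upto: Callable[[int], int], lo: int, hi: int,
             total: int, j: int, k: int) -> Tuple[int, Fraction]:
    """Cut c in [lo, hi] minimizing max(left/j, right/(k - j)), left = load_upto(c)."""
    a, b = lo, hi
    while a < b:  # first c with left/j >= right/(k - j), i.e. left * k >= total * j
        mid = (a + b) // 2
        if load_upto(mid) * k >= total * j:
            b = mid
        else:
            a = mid + 1

    def score(c: int) -> Fraction:
        left = load_upto(c)
        return max(Fraction(left, j), Fraction(total - left, k - j))

    c = min((max(lo, a - 1), a), key=score)
    return c, score(c)


def relaxed_hierarchical(a: List[List[int]], m: int) -> List[Part]:
    n1, n2 = len(a), len(a[0])
    pr = [[0] * (n2 + 1)]  # pr[i][j] = load of rows [0, i) x columns [0, j)
    for row in a:
        pr.append([up + left for up, left in zip(pr[-1], [0, *accumulate(row)])])

    def rect(r0: int, r1: int, c0: int, c1: int) -> int:
        return pr[r1][c1] - pr[r0][c1] - pr[r1][c0] + pr[r0][c0]

    parts: List[Part] = []

    def split(r0: int, r1: int, c0: int, c1: int, k: int) -> None:
        if k == 1:
            parts.append((r0, r1, c0, c1))
            return
        total = rect(r0, r1, c0, c1)
        best = None  # (score, j, axis, cut); the first minimum found is kept
        for j in range(1, k):
            for axis, lo, hi, upto in (("row", r0, r1, lambda c: rect(r0, c, c0, c1)),
                                       ("col", c0, c1, lambda c: rect(r0, r1, c0, c))):
                cut, score = best_cut(upto, lo, hi, total, j, k)
                if best is None or score < best[0]:
                    best = (score, j, axis, cut)
        _, j, axis, cut = best
        if axis == "row":
            split(r0, cut, c0, c1, j)
            split(cut, r1, c0, c1, k - j)
        else:
            split(r0, r1, c0, cut, j)
            split(r0, r1, cut, c1, k - j)

    split(0, n1, 0, n2, m)
    return parts


print(relaxed_hierarchical([[1] * 4 for _ in range(4)], 3))
# [(0, 1, 0, 4), (1, 4, 0, 2), (1, 4, 2, 4)]
```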

SLIDE 53

Performance of Relaxed Hierarchical

[Figure: load imbalance (log scale) vs. number of processors (up to 10,000) at iteration 30000; series: Nicol, Heuristic PxQ Jagged, Heuristic m-way Jagged, Recursive Bisection, Relaxed Hierarchical.]

SLIDE 56

More General?

Recursively Defined Partitioning

Most of them can be solved in polynomial time by dynamic programming.

Arbitrary Rectangles

NP-Complete, with a $\frac{5}{4}$ non-approximability result [KMP98].

There is a known 2-approximation of complexity O(n1n2 + m log n1n2) which heavily relies on linear programming [Pal06].

SLIDE 57

Performance Over the Execution

[Figure: load imbalance (log scale) vs. simulation iteration (5000 to 35000) on 6400 processors; series: Nicol, Heuristic PxQ Jagged, Heuristic m-way Jagged, Recursive Bisection, Relaxed Hierarchical.]

SLIDE 58

Relaxed Hierarchical Might Be Unstable

[Figure: load imbalance (log scale) vs. simulation iteration (5000 to 35000) on 400 processors; series: Nicol, Heuristic PxQ Jagged, Heuristic m-way Jagged, Recursive Bisection, Relaxed Hierarchical.]

SLIDE 59

Conclusion and Perspective

Conclusion

Proposed new classes of partitioning. Proved that most recursively defined classes are polynomial. Proposed two new well-founded heuristics that outperform the state-of-the-art algorithm. Theoretically analyzed two heuristics.

Perspective

Better m-way jagged partitioning algorithm. Integration into real physics simulation codes. Inclusion of communication models.

SLIDE 60

Thank you

Collaborators

Thanks to H. Karimabadi, A. Majumdar, Y.A. Omelchenko and K.B. Quest, collaborators of the Petaapps NSF OCI-0904802 grant, for providing the particle-in-cell dataset.

More information

Contact: esaule@bmi.osu.edu. Visit: http://bmi.osu.edu/hpc/

Research at HPC lab is funded by

SLIDE 61

[AHM01] Bengt Aspvall, Magnús M. Halldórsson, and Fredrik Manne. Approximations for the general block distribution of a matrix. Theor. Comput. Sci., 262(1-2):145–160, 2001.

[BB87] Marsha Berger and Shahid Bokhari. A partitioning strategy for nonuniform problems on multiprocessors. IEEE Transactions on Computers, C-36(5):570–580, 1987.

[GIK02] Daya Ram Gaur, Toshihide Ibaraki, and Ramesh Krishnamurti. Constant ratio approximation algorithms for the rectangle stabbing problem and the rectilinear partitioning problem. J. Algorithms, 43(1):138–152, 2002.

[GM96] Michelangelo Grigni and Fredrik Manne. On the complexity of the generalized block distribution. In IRREGULAR ’96: Proceedings of the Third International Workshop on Parallel Algorithms for Irregularly Structured Problems, pages 319–326, London, UK, 1996. Springer-Verlag.

[KMP98] S. Khanna, S. Muthukrishnan, and M. Paterson. On approximating rectangle tiling and packing. In Proceedings of the 9th SODA, pages 384–393, 1998.

[KMS97] Sanjeev Khanna, S. Muthukrishnan, and Steven Skiena. Efficient array partitioning. In ICALP ’97: Proceedings of the 24th International Colloquium on Automata, Languages and Programming, pages 616–626, London, UK, 1997. Springer-Verlag.

[MP97] Serge Miguet and Jean-Marc Pierson. Heuristics for 1D rectilinear partitioning as a low cost and high quality answer to dynamic load balancing. In HPCN Europe ’97: Proceedings of the International Conference and Exhibition on High-Performance Computing and Networking, pages 550–564, London, UK, 1997. Springer-Verlag.

[MS96] Fredrik Manne and Tor Sørevik. Partitioning an array onto a mesh of processors. In PARA ’96: Proceedings of the Third International Workshop on Applied Parallel Computing, Industrial Computation and Optimization, pages 467–477, London, UK, 1996. Springer-Verlag.

[MS05] S. Muthukrishnan and Torsten Suel. Approximation algorithms for array partitioning problems. Journal of Algorithms, 54:85–104, 2005.

[Nic94] David Nicol. Rectilinear partitioning of irregular data parallel computations. Journal of Parallel and Distributed Computing, 23:119–134, 1994.

[PA04] Ali Pinar and Cevdet Aykanat. Fast optimal load balancing algorithms for 1D partitioning. Journal of Parallel and Distributed Computing, 64:974–996, 2004.

[Pal06] K. Paluch. A new approximation algorithm for multidimensional rectangle tiling. In Proceedings of ISAAC, 2006.