Performance Results of Running Parallel Applications on the InteGrade



SLIDE 1

Introduction The 0-1 Knapsack Problem Local Alignment Problem Conclusions and Future Work

Performance Results of Running Parallel Applications on the InteGrade

Edson Norberto Cáceres, Henrique Mongelli, Leonardo Loureiro, Christiane Nishibe, Siang Wun Song

October 29, 2008

SLIDE 2

Outline

Introduction; The 0-1 Knapsack Problem; Local Alignment Problem; Conclusions and Future Work.

SLIDE 3

The BSP/CGM Model

BSP/CGM model: p processors, each with its own local memory, communicating through a network.

The algorithm alternates between:
Computation rounds: each processor computes independently.
Communication rounds: each processor sends/receives data to/from other processors.

Goals:
Obtain a linear speed-up in p.
Minimize the number of rounds.

SLIDE 4

Implementations

C and C++ languages.
LAM/MPI library.
SPMD paradigm.

6 Pentium IV 1.7 GHz nodes.
6 AMD Athlon 1.6 GHz nodes.
1 GB memory.
1 Gbit interconnection network.

SLIDE 5

Introduction (The 0-1 Knapsack Problem)

Motivation: classical combinatorial problem.
Wide range of applications.
Integer programming problem with one constraint.

Good algorithms for the 0-1 Knapsack Problem are of interest to the integer programming research area.

NP-complete problem.
Two basic approaches: Dynamic Programming (DP) and Branch-and-Bound (B&B).
DP runs in $O(nW)$ time, which is pseudo-polynomial.

Our result: a BSP/CGM algorithm with $O(p)$ communication rounds that requires communication with only a few neighbor processors.

SLIDE 6

Definition

0-1 Knapsack Problem:

$S = \{1, 2, \ldots, n\}$ is a set of $n$ distinct items.
The $i$-th item is worth $v_i$ dollars and weighs $w_i$ kilos.
$v_i$ and $w_i$ are integers.
$W$ is the integer capacity of the knapsack.

Which items should be selected in order to fill the knapsack with the most valuable load without exceeding the capacity constraint?

$$\max \left\{ \sum_{i=1}^{n} v_i z_i \;:\; \sum_{i=1}^{n} w_i z_i \le W,\ z_i \in \{0, 1\} \right\}$$
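For reference, the problem defined on this slide can be solved on one processor with the classical dynamic programming table. The sketch below is a minimal sequential version, not the authors' parallel implementation. The function name `knapsack` and the sample items are illustrative assumptions, and weights are assumed to be positive integers.

```python
def knapsack(values, weights, capacity):
    """Return (best_value, chosen_items) for the 0-1 knapsack problem.

    f[r][c] is the best value using only the first r items with capacity c.
    Items are reported with 1-based indices, as in the slide.
    """
    n = len(values)
    f = [[0] * (capacity + 1) for _ in range(n + 1)]
    for r in range(1, n + 1):
        v, w = values[r - 1], weights[r - 1]
        for c in range(capacity + 1):
            f[r][c] = f[r - 1][c]                  # z_r = 0: leave item r out
            if w <= c and f[r - 1][c - w] + v > f[r][c]:
                f[r][c] = f[r - 1][c - w] + v      # z_r = 1: take item r

    # Walk back through the table to recover which items were taken.
    chosen, c = [], capacity
    for r in range(n, 0, -1):
        if f[r][c] != f[r - 1][c]:
            chosen.append(r)
            c -= weights[r - 1]
    return f[n][capacity], sorted(chosen)


best, items = knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
print(best, items)
```

The table has $(n + 1)(W + 1)$ entries and each is filled in constant time, which is where the $O(nW)$ pseudo-polynomial bound comes from.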

SLIDE 7

The BSP/CGM Algorithm

Approach: wavefront dynamic programming (Alves et al.).
$p$ processors, each with $O(Wn/p)$ local memory.
Computing the optimal solution matrix $f$.

$S = \{1, 2, \ldots, n\}$ is the set of items.
$w$, where $w[i]$ is the weight of item $i$, is broadcast to all processors.
$v$ is divided into $p$ pieces of size $n/p$.
Each $P_i$, $1 \le i \le p$, receives the $i$-th piece of $v$, that is $v[(i-1)\frac{n}{p} + 1 \,..\, i\frac{n}{p}]$.
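The piece of $v$ assigned to each processor follows directly from the index range on this slide. A minimal sketch, assuming $p$ divides $n$ evenly; the helper name `piece_bounds` is hypothetical.

```python
def piece_bounds(i, n, p):
    """1-based inclusive bounds of the i-th piece: (i-1)*n/p + 1 .. i*n/p."""
    size = n // p          # assumption: p divides n
    return (i - 1) * size + 1, i * size


n, p = 12, 4
v = list(range(101, 101 + n))          # illustrative values v[1..n]
for i in range(1, p + 1):
    lo, hi = piece_bounds(i, n, p)
    # Python lists are 0-based, so v[lo..hi] becomes the slice v[lo-1:hi].
    print(f"P{i} receives v[{lo}..{hi}] = {v[lo - 1:hi]}")
```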

SLIDE 8

The BSP/CGM Algorithm

$P_i^k$: work of processor $P_i$ at round $k$.

$P_1$ starts computing at round 0. $P_1$ and $P_2$ can work at round 1; $P_1$, $P_2$ and $P_3$ at round 2, and so on.
After computing $f_i^k$, $P_i$ sends the boundary $R_i^k$ to $P_{i+1}$.
Using $R_i^k$, $P_{i+1}$ computes $f_{i+1}^k$.
After $p - 1$ rounds, $P_p$ receives $R_{p-1}^1$ and computes $f_p^1$.
In round $2p - 2$, $P_p$ receives $R_{p-1}^p$ and computes $f_p^p$.

Good, but load balancing is poor.
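The round structure on this slide can be tabulated with a short simulation. This is only a sketch, assuming each processor handles $p$ blocks and that $P_i$ computes its $k$-th block in round $(i - 1) + (k - 1)$, which matches $P_p$ computing its first block in round $p - 1$ and its last in round $2p - 2$. The function name `wavefront_schedule` is hypothetical.

```python
def wavefront_schedule(p):
    """Map each round number to the (processor, block) pairs active in it.

    Assumes processor i (1-based) computes its k-th block (1-based)
    in round (i - 1) + (k - 1).
    """
    rounds = {}
    for i in range(1, p + 1):
        for k in range(1, p + 1):
            rounds.setdefault((i - 1) + (k - 1), []).append((i, k))
    return rounds


p = 4
schedule = wavefront_schedule(p)
for rnd in sorted(schedule):
    active = ", ".join(f"P{i} (block {k})" for i, k in schedule[rnd])
    print(f"round {rnd}: {active}")

# Fraction of processor-rounds spent computing rather than waiting.
utilization = p * p / (p * len(schedule))
print(f"{len(schedule)} rounds, utilization {utilization:.2f}")
```

Under these assumptions the wavefront needs $2p - 1$ rounds for $p^2$ blocks, so average utilization is $p / (2p - 1)$, which is one way to quantify the load-balancing remark.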

SLIDE 9

The BSP/CGM Algorithm

Input: (1) The number $p$ of processors; (2) The number $i$ of the processor, where $1 \le i \le p$; and (3) The array $w$, the capacity $W$ of the knapsack and the subarray $v_i$ of size $n/p$.

Output: $f(r, c) = \max\{f[r-1, c - w[r]] + v[r],\ f[r-1, c]\}$, where $1 \le c \le W$ and $(i-1)\frac{n}{p} + 1 \le r \le i\frac{n}{p}$.

for $1 \le k \le p$ do
    if $i = 1$ then
        for $(k-1)\frac{W}{p} + 1 \le c \le k\frac{W}{p}$ and $1 \le r \le \frac{n}{p}$ do
            compute $f(r, c)$;
        end for
        send($R_i^k$, $P_{i+1}$);
    end if
    if $i \ne 1$ then
        receive($R_{i-1}^k$, $P_{i-1}$);
        for $(k-1)\frac{W}{p} + 1 \le c \le k\frac{W}{p}$ and $(i-1)\frac{n}{p} + 1 \le r \le i\frac{n}{p}$ do
            compute $f(r, c)$;
        end for
        if $i \ne p$ then
            send($R_i^k$, $P_{i+1}$);
        end if
    end if
end for

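[Figure: the $W \times n$ matrix $f$ divided into blocks of size $\frac{W}{p} \times \frac{n}{p}$. Each block is labelled $P_i^k$, from $P_1^0$, $P_1^1$, $P_2^1$, $P_1^2$, $P_2^2$, $P_3^2$ through $P_1^{p-1}$, $P_2^p$, $P_p^{p-1}$, $P_p^p$ and $P_p^{2p-2}$, with the boundary $R_i^k$ passed between neighboring blocks.]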
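To make the block order concrete, the sketch below replays the wavefront on a single machine and fills the knapsack table block by block. It is a simulation under stated assumptions rather than the authors' MPI code: one shared table stands in for the processors' local memories and for the boundary messages, $p$ is assumed to divide both $n$ and $W$, weights are assumed to be positive integers, and the function name `blocked_knapsack` is hypothetical.

```python
def blocked_knapsack(values, weights, capacity, p):
    """Fill the knapsack table in wavefront block order and return f[n][W].

    Processor i owns items (i-1)*n/p + 1 .. i*n/p; the capacities 1..W
    are cut into p chunks of W/p columns each.
    """
    n = len(values)
    assert n % p == 0 and capacity % p == 0
    rows, cols = n // p, capacity // p
    # Row 0 and column 0 are the all-zero boundary of the table.
    f = [[0] * (capacity + 1) for _ in range(n + 1)]
    for rnd in range(2 * p - 1):                    # rounds 0 .. 2p-2
        for i in range(1, p + 1):                   # processors in this round
            k = rnd - (i - 1) + 1                   # block of P_i in this round
            if not 1 <= k <= p:
                continue                            # P_i is idle in this round
            for r in range((i - 1) * rows + 1, i * rows + 1):
                v, w = values[r - 1], weights[r - 1]
                for c in range((k - 1) * cols + 1, k * cols + 1):
                    f[r][c] = f[r - 1][c]
                    if w <= c:
                        f[r][c] = max(f[r][c], f[r - 1][c - w] + v)
    return f[n][capacity]


print(blocked_knapsack([60, 100, 120, 30], [10, 20, 30, 5], 50, p=2))
```

Every entry a block reads lies either in the same processor's earlier capacity chunks or in the last row of the previous processor's chunks up to $k$, all of which were completed in earlier rounds. That is why only the boundary needs to travel between neighbors.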

SLIDE 10

LAM × InteGrade

p    4096 × 1024     8192 × 2048     16384 × 4096    32768 × 8192
     I      II       I      II       I      II       I      II
1    0.071  0.084    0.283  0.367    1.105  1.105    4.050  4.591
2    0.063  0.072    0.250  0.278    0.992  1.053    3.953  4.065
4    0.057  0.078    0.244  0.280    0.952  1.146    3.718  4.079
8    0.050  -        0.173  -        0.645  -        2.390  -

[Figure: Time (s) versus No. CPUs (8, 4, 2, 1) for the Cluster and Grid runs on problem sizes 4096 x 1024, 8192 x 2048, 16384 x 4096 and 32768 x 8192.]
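The goal stated earlier was a linear speed-up in $p$, so it can help to turn the measured times into speed-ups $T(1)/T(p)$. The sketch below does this for the column I times of the table on this slide, which are the only ones reported for all four processor counts. The dictionary layout and the function name `speedup` are illustrative assumptions.

```python
# Column I running times in seconds, keyed by problem size and then by p.
times = {
    "4096 x 1024":  {1: 0.071, 2: 0.063, 4: 0.057, 8: 0.050},
    "8192 x 2048":  {1: 0.283, 2: 0.250, 4: 0.244, 8: 0.173},
    "16384 x 4096": {1: 1.105, 2: 0.992, 4: 0.952, 8: 0.645},
    "32768 x 8192": {1: 4.050, 2: 3.953, 4: 3.718, 8: 2.390},
}


def speedup(series, p):
    """Speed-up T(1) / T(p) for one problem size."""
    return series[1] / series[p]


for size, series in times.items():
    row = ", ".join(f"p={p}: {speedup(series, p):.2f}" for p in sorted(series))
    print(f"{size}: {row}")
```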

SLIDE 11

Introduction (Local Alignment Problem)

Motivation: local alignment is used to determine whether two sequences of nucleotides or proteins have similar functionality or an evolutionary relationship.
Basic approach: Dynamic Programming (DP), $O(nm)$ time.
Our result: a BSP/CGM algorithm with $O(p)$ communication rounds and $O(m \times n/p)$ complexity that requires communication with only a few neighbor processors.
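The $O(nm)$ dynamic programming approach can be illustrated with a small sequential scorer in the style of Smith-Waterman. This is a sketch only: the slides do not state the scoring scheme, so the match, mismatch and gap values below are assumptions, a linear gap penalty is used for simplicity, and the function name `local_alignment_score` is hypothetical.

```python
def local_alignment_score(s1, s2, match=2, mismatch=-1, gap=-1):
    """Best local alignment score of s1 and s2 with a linear gap penalty.

    Only two rows of the (m+1) x (n+1) table are kept, since each cell
    depends on its left, upper and upper-left neighbors.
    """
    m, n = len(s1), len(s2)
    prev = [0] * (n + 1)                  # row i-1 of the table
    best = 0
    for i in range(1, m + 1):
        cur = [0] * (n + 1)               # row i, column 0 stays 0
        for j in range(1, n + 1):
            diag = prev[j - 1] + (match if s1[i - 1] == s2[j - 1] else mismatch)
            # A local alignment may restart anywhere, hence the floor at 0.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(local_alignment_score("TTACGTTT", "GGACGGG"))
```

Because each cell depends only on neighbors above and to the left, the table can be cut into blocks and filled in the same wavefront order used for the knapsack matrix.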

SLIDE 12

The BSP/CGM Algorithm

Input: (1) Sequences S1 of size m and S2 of size n; (2) Number of processors p; (3) Rank of processor i; (4) Each processor of rank i holds s1[0..m − 1] and s2[i ∗ (n/p)..(i + 1) ∗ (n/p)].
Output: Best local alignment between S1 and S2.

matrix A(m+1, blockSize+1), matrix B(m+1, blockSize+1), matrix C(m+1, blockSize+1)
blockSize ← n/p
next ← i + 1
previous ← i − 1
col ← 1
for round ← 0 to p − 1 do
    col ← col + blockSize
    if i ≠ 0 then
        receive (A[0, col..col + blockSize], previous)
        receive (B[0, col..col + blockSize], previous)
        receive (C[0, col..col + blockSize], previous)
    end if
    compute A[1..m, col..col + blockSize]
    compute B[1..m, col..col + blockSize]
    compute C[1..m, col..col + blockSize]
    if i ≠ p − 1 then
        send (A[m, col..col + blockSize], next)
        send (B[m, col..col + blockSize], next)
        send (C[m, col..col + blockSize], next)
    end if
end for

SLIDE 13

LAM × InteGrade

[Figure: Time (s), on an axis from 0 to 450, versus No. CPUs (8, 4, 2, 1) for the Cluster and Grid runs on problem sizes 4096 x 4096, 8192 x 8192, 16384 x 16384 and 32768 x 32768.]

SLIDE 14

Conclusions

BSP/CGM algorithms are suitable for grids.
BSP/CGM DP algorithms can be implemented using a wavefront strategy.

SLIDE 15

Future Work

Fault-tolerant BSP/CGM algorithms.
“Balance” the size of the messages.