Parallel Algorithms
- Examples
- Concepts & Definitions
- Analysis of Algorithms
Lemma: Any complete binary tree with $n$ leaves has
- internal nodes = $n - 1$ (i.e., $2n - 1$ total nodes)
- height = $\log_2 n$
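A quick numerical check of the lemma, as a minimal Python sketch that assumes $n$ is a power of 2 so every level of the tree is full; the helper name `tree_shape` is hypothetical. It counts the nodes level by level, halving the count on each level up to the root.

```python
def tree_shape(n):
    """Return (internal nodes, total nodes, height) of a complete binary tree with n leaves.

    Assumes n is a power of 2, so every level is full.
    """
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of 2")
    levels = [n]                      # number of nodes on each level, leaves first
    while levels[-1] > 1:
        levels.append(levels[-1] // 2)
    internal = sum(levels[1:])        # every level above the leaves
    height = len(levels) - 1          # number of edges from a leaf to the root
    return internal, sum(levels), height


print(tree_shape(8))  # (7, 15, 3): n - 1 internal, 2n - 1 total, height log2 8
```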
Given $n$ pairs $\{(x_1, d_1), (x_2, d_2), \ldots, (x_n, d_n)\}$, where each $x_i \in \{0, 1, \ldots, m-1\}$ …
Sequential algorithm: for each step $i$, read $x_i$ and …
Time = $n$ steps, Memory = $\Theta(n)$
Shared memory with $n$ processors $P_1, P_2, \ldots, P_n$
Memory = $m(2n-1)$. Think about $m$ complete binary trees $T_0, T_1, \ldots, T_{m-1}$, each with $n$ leaves numbered $1, 2, \ldots, n$, which correspond to $P_1, \ldots, P_n$.
[Figure: trees $T_0, T_1, \ldots, T_{m-1}$, each with leaves numbered $1, \ldots, n$]
Phase 1: Each processor $P_i$ will read the pair $(x_i, d_i)$ and insert it in the leaf $i$ that belongs to the tree $T_{x_i}$.
Phase 2: Each processor $P_i$ will try to move the pair $(x_i, d_i)$ higher up in its tree until it can go no higher, as follows:
If node $u$ is free, then the pair in the right child …
Since in a shared-memory parallel computer we have …
Phase 1 takes only 1 step. Phase 2 takes only $\log_2 n$ steps, because each tree has height $\log_2 n$.
Total = $\log_2 n + 1$ steps
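The two phases can be sketched in Python as a sequential simulation. This is a minimal sketch under several assumptions: $n$ is a power of 2; each tree uses a heap layout (root at index 1, children of $u$ at $2u$ and $2u+1$, leaf $i$ at index $n + i - 1$); Phase 2 runs in synchronous rounds in which every free internal node takes the pair of one child; and, because the rule after "the pair in the right child" is cut off, the tie-break used here (right child first, otherwise left child) is a hypothetical choice. The name `two_phase` is also hypothetical.

```python
def two_phase(pairs, m):
    """Simulate Phase 1 and Phase 2 on m trees T_0, ..., T_{m-1}.

    pairs[i - 1] is (x_i, d_i) held by processor P_i; each x_i is in 0..m-1.
    Returns (root of each tree, number of parallel steps).
    """
    n = len(pairs)
    if n < 1 or n & (n - 1):
        raise ValueError("this sketch assumes n is a power of 2")
    # Heap layout: index 0 unused, root at 1, leaves at n..2n-1 (2n-1 cells per tree).
    trees = [[None] * (2 * n) for _ in range(m)]

    # Phase 1 (1 step): P_i writes (x_i, d_i) into leaf i of tree T_{x_i}.
    for i, (x, d) in enumerate(pairs):      # i is 0-based here, so leaf i+1 -> index n+i
        trees[x][n + i] = (x, d)
    steps = 1

    # Phase 2 (log2 n steps): one synchronous round per tree level.
    for _ in range(n.bit_length() - 1):
        for t in range(m):
            old, new = trees[t], trees[t][:]
            for u in range(1, n):                  # internal nodes only
                if old[u] is None:                 # node u is free
                    for c in (2 * u + 1, 2 * u):   # hypothetical rule: right child first
                        if old[c] is not None:
                            new[u], new[c] = old[c], None   # the pair moves up one level
                            break
            trees[t] = new
        steps += 1

    return [tree[1] for tree in trees], steps


roots, steps = two_phase([(2, "a"), (0, "b"), (2, "c"), (1, "d")], m=3)
print(roots, steps)  # [(0, 'b'), (1, 'd'), (2, 'c')] 3
```

In the simulation the free node "pulls" the pair, which is equivalent to the processor owning that pair pushing it up. After $\log_2 n$ rounds the root of every non-empty tree $T_k$ holds one pair with key $k$, and the step count is $\log_2 n + 1$.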
Extra empty cells in the $m(2n-1)$ memory can be …
The root sends the key $K$ to its children, which subsequently send it to their children, and so on,
until it reaches the leaves, where it is compared with the keys stored there.
The leaf that contains the key sends the corresponding record up to the root through its parent and …
When a parent receives a record from one of its children, it sends the same record to its parent; otherwise it sends null.
And so on ...
$\log_2 n$ steps to send the key down the tree
… $\log_2 n + m - 1$
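A minimal Python sketch of the search for a single key, assuming $n$ is a power of 2 and that the keys stored at the leaves are distinct (so at most one leaf answers); `tree_search` is a hypothetical name. The downward pass doubles the number of nodes holding $K$ at each step, and the upward pass lets each parent forward whichever child sent a record, or `None` (null) otherwise.

```python
def tree_search(leaves, key):
    """leaves: list of (stored_key, record) pairs, one per leaf; len(leaves) is a power of 2.

    Returns (record or None, number of steps).
    """
    n = len(leaves)
    steps = 0
    # Downward pass: each node forwards the key to both of its children.
    holding = [key]
    while len(holding) < n:
        holding = [k for k in holding for _ in range(2)]
        steps += 1
    # At the leaves: compare the key with the stored key.
    level = [rec if stored == k else None for (stored, rec), k in zip(leaves, holding)]
    # Upward pass: a parent forwards a record received from a child, otherwise null.
    while len(level) > 1:
        level = [a if a is not None else b for a, b in zip(level[0::2], level[1::2])]
        steps += 1
    return level[0], steps


print(tree_search([(5, "e"), (3, "c"), (9, "i"), (1, "a")], 9))  # ('i', 4)
```

A single key thus takes $\log_2 n$ steps down and $\log_2 n$ steps up; the sketch does not model several keys flowing through the tree at once.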
Given $x_0, x_1, \ldots, x_{n-1}$, where $n$ is a …,
compute $S_k = x_0 + x_1 + \cdots + x_k$.
$S_i = x_i$
For $j = 0, 1, \ldots, \log_2 n - 1$, let
$S_i \leftarrow S_i + S_{i-2^j}$ … until $2^j = i$
$n(\log_2 n + 1)$ processors, arranged in $\log_2 n + 1$ columns and $n$ rows.
The number of columns is $\log_2 n + 1$.
Each processor does at most one step (addition). The processors in any fixed column work in parallel.
Time = $\log_2 n + 1$ additions.
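The update $S_i \leftarrow S_i + S_{i-2^j}$ can be sketched in Python with one synchronous round per value of $j$. This is a minimal sketch: the function name is hypothetical, and the guard $i \ge 2^j$ (rows with no partner keep their value in that round) is the usual convention, assumed here. Building each round's list from the previous one models all rows reading before any row writes.

```python
def prefix_sums(x):
    """Return (S, rounds) where S[k] = x[0] + ... + x[k]."""
    n = len(x)
    s = list(x)                 # initially S_i = x_i
    rounds, shift = 0, 1        # shift holds 2^j
    while shift < n:
        # every row i >= 2^j adds the value from row i - 2^j, all rows at once
        s = [s[i] + s[i - shift] if i >= shift else s[i] for i in range(n)]
        rounds += 1
        shift *= 2
    return s, rounds


print(prefix_sums([1, 2, 3, 4, 5, 6, 7, 8]))  # ([1, 3, 6, 10, 15, 21, 28, 36], 3)
```

The sketch counts only the $\log_2 n$ addition rounds (3 for $n = 8$); it does not model the circuit's column layout.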
P1 performs 13 steps, is idle for 3 steps, then performs 4 more steps.
P2 performs 11 steps continuously.
P3 performs 5 steps, then 11 more steps.
Total = 20 steps
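The total in this example is the finishing time of the last processor. A small Python check, with each processor's timeline written as alternating busy and idle segments (this encoding and the name `finish_time` are illustrative only; P3 is assumed to have no idle gap, since none is stated):

```python
def finish_time(segments):
    """segments alternate busy, idle, busy, ... step counts for one processor."""
    return sum(segments)


timelines = {
    "P1": [13, 3, 4],   # 13 busy, 3 idle, 4 busy
    "P2": [11],         # 11 busy
    "P3": [5, 0, 11],   # 5 busy, then 11 more (assumed: no idle gap)
}
running_time = max(finish_time(s) for s in timelines.values())
print(running_time)  # 20
```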
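Use $T_p$ to denote the running time with $p$ processors.
Speedup: $S(1,p) = T_1 / T_p$
Cost: $p \times T_p$
If $T_1$ is an optimal time, then, since one processor can simulate the $p$ processors in $p \times T_p$ steps,
$T_1 \le p \times T_p$, i.e., $S(1,p) = T_1 / T_p \le p$, and $p \times T_p \ge T_1$, so $T_p \ge T_1 / p$.
That is, $T_p$ cannot be better than $T_1 / p$.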
Time to add the numbers in each processor = $O(\log n)$ steps
Time to propagate = $O(\log n)$ steps
Speedup = $O(n / \log n)$
[Figure: processors $1, 2, \ldots, n/\log n$, combined by a tree of height $\log(n/\log n)$]
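A minimal Python sketch of this scheme, assuming $n$ is a power of 2 and using groups of $\log_2 n$ numbers, one group per processor (so about $n/\log_2 n$ processors); `parallel_sum` is a hypothetical name. It returns the sum together with the number of steps in each phase.

```python
import math


def parallel_sum(xs):
    """Return (total, local_steps, tree_steps) for summing xs with about n / log2 n processors."""
    n = len(xs)
    g = max(1, int(math.log2(n)))              # group size: log2 n numbers per processor
    groups = [xs[k:k + g] for k in range(0, n, g)]
    # Phase 1: every processor adds its own group sequentially (all groups in parallel).
    partial = [sum(grp) for grp in groups]
    local_steps = max(len(grp) for grp in groups) - 1
    # Phase 2: propagate partial sums up a binary tree, halving their number each step.
    tree_steps = 0
    while len(partial) > 1:
        partial = [sum(partial[k:k + 2]) for k in range(0, len(partial), 2)]
        tree_steps += 1
    return partial[0], local_steps, tree_steps


print(parallel_sum(list(range(1, 17))))  # (136, 3, 2)
```

With $n = 16$, a sequential sum needs 15 additions, while here the two phases take 3 + 2 = 5 steps on 4 processors.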
$P_i$ compares …
$P_i$ finds $x$ in …
$P_i$ holds …
Theorem: If a certain problem can be solved with $p$ processors in $t_p$ time and with $q$ processors in $t_q$ time, where $q < p$, then
$t_p \le t_q \le t_p + p\,t_p / q$
That is, when the number of CPUs decreases from $p$ to $q$, the running time can slow down by a factor of $(1 + p/q)$ in the worst case.
Or, when the number of CPUs increases from $q$ to $p$, the running time can be reduced by a factor of $1/(1 + p/q)$ in the best case.
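Proof sketch: each of the $t_p$ steps of the $p$-processor algorithm can be carried out by $q$ processors in $\lceil p/q \rceil \le p/q + 1$ steps, so $t_q \le t_p + p\,t_p / q$.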
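The simulation behind the slowdown bound can be sketched in Python: a step in which $k$ of the $p$ processors are active is replayed by $q$ processors in $\lceil k/q \rceil$ rounds. The input format (a list of per-step activity counts) and the name `slowed_time` are assumptions made for illustration.

```python
import math


def slowed_time(active_per_step, q):
    """Time for q processors to replay a p-processor schedule step by step."""
    return sum(math.ceil(k / q) for k in active_per_step)


p, q = 6, 4
schedule = [6, 6, 6, 6]            # t_p = 4 steps, all 6 processors active in each
t_p = len(schedule)
t_q = slowed_time(schedule, q)
print(t_q, t_p + p * t_p / q)      # 8 10.0
```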
$T_1 / T_p = S(1,p)$
Algorithms with the same running time (on the same models) but with fewer processors are preferred (less expensive).
Sometimes optimal times and speedups can be achieved with a certain number of processors.
A minimum number of processors may be required for computations to succeed.
The slowdown and speedup theorems show that the number of processors is important.
Certain computational models may not accommodate the required number of processors (e.g., a perfect square or a prime number).
In combinational circuits each CPU is used at most once. That gives an upper bound on the time.
Running time = 8
Cost = 8 × 6 = 48
Work = 6 + 4 + 4 + 8 + 5 + 5 = 32
RT ≤ Work ≤ Cost
[Figure: schedule of processors $P_1, \ldots, P_6$ over time steps 1 to 8]
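The three measures in this example can be recomputed from the per-processor step counts. In the sketch below the counts are taken from the Work sum; which count belongs to which $P_i$ is not recoverable from the figure, so they are listed without labels, and the running time is taken from the example rather than derived.

```python
busy_steps = [6, 4, 4, 8, 5, 5]   # steps performed by each of the 6 processors
p = len(busy_steps)
running_time = 8                  # from the example (depends on idle gaps, not just counts)
cost = running_time * p           # every processor is charged for the whole run
work = sum(busy_steps)            # only the steps actually performed
assert running_time <= work <= cost
print(cost, work)                 # 48 32
```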
This follows from the speedup theorem, which says that the running time can be reduced by at most a factor of $1/p$.
$\Omega(n^2)$
Efficiency $= T_1 / (p \times T_p)$, where
$T_1$ is the running time of the best known sequential algorithm, and
$T_p$ is the running time of the parallel algorithm that uses $p$ processors.
If $T_1$ is optimal, …
… $T_p$ to the running time of the … $T_1$.
… $T_p$ to $T_1$.
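Speedup $S(1,p) = T_1 / T_p$ and the ratio $T_1 / (p \times T_p)$ (efficiency) written as small Python helpers. The numbers in the usage lines are hypothetical ($T_1 = 15$, $T_p = 5$, $p = 4$) and only illustrate the bound $S(1,p) \le p$, equivalently efficiency at most 1, which holds when $T_1$ is optimal.

```python
def speedup(t1, tp):
    """S(1,p) = T_1 / T_p."""
    return t1 / tp


def efficiency(t1, tp, p):
    """T_1 / (p * T_p): the speedup divided by the number of processors."""
    return t1 / (p * tp)


t1, tp, p = 15, 5, 4                 # hypothetical running times
print(speedup(t1, tp), efficiency(t1, tp, p))  # 3.0 0.75
assert speedup(t1, tp) <= p          # speedup cannot exceed p if T_1 is optimal
```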