SLIDE 1

Analytical Modeling of Parallel Programs (Chapter 5)

Alexandre David B2-206

SLIDE 2

Topic Overview

Sources of overhead in parallel programs.
Performance metrics for parallel systems.
Effect of granularity on performance.
Scalability of parallel systems.
Minimum execution time and minimum cost-optimal execution time.
Asymptotic analysis of parallel programs.
Other scalability metrics.

SLIDE 3

Analytical Modeling – Basics

A sequential algorithm is evaluated by its runtime as a function of its input size: O(f(n)), Ω(f(n)), Θ(f(n)).

The asymptotic runtime is independent of the platform: analysis "to within a constant factor".

A parallel algorithm has more parameters. Which ones?

SLIDE 4

Analytical Modeling – Basics

A parallel algorithm is evaluated by its runtime as a function of the input size, the number of processors, and the communication parameters.

Which performance measures? Compare to which (serial) baseline?

SLIDE 5

Sources of Overhead in Parallel Programs

Overheads: wasted computation, communication, idling, contention.

Sources: inter-process interaction, load imbalance, dependencies.

SLIDE 6

Performance Metrics for Parallel Systems

Execution time = time elapsed between the beginning and the end of execution on a sequential computer; on a parallel computer, between the start of the first processor and the finish of the last processor.

SLIDE 7

Performance Metrics for Parallel Systems

Total parallel overhead:
Total time collectively spent by all processing elements = pTP.
Time spent doing useful work (serial time) = TS.
Overhead function: TO = pTP − TS.
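As a quick illustration (a minimal sketch in Python; the numbers are hypothetical, not from the slides):

```python
def overhead(p, t_parallel, t_serial):
    """Overhead function T_O = p*T_P - T_S."""
    return p * t_parallel - t_serial

# Hypothetical measurements: 4 processing elements, T_P = 30 s, T_S = 100 s.
print(overhead(4, 30.0, 100.0))  # 20.0 -> 20 s spent collectively on overhead
```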

SLIDE 8

Performance Metrics for Parallel Systems

What is the benefit of parallelism? Speedup, of course... let's define it.

Speedup S = TS/TP. Example: compute the sum of n elements. Serial algorithm: Θ(n). Parallel algorithm: Θ(log n). Speedup = Θ(n/log n).

The baseline (TS) is the best sequential algorithm available.
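A minimal sketch of the Θ(log n) parallel sum, simulated sequentially (the function name is mine; each iteration of the while loop stands for one parallel step, and n is assumed to be a power of two):

```python
def parallel_sum(a):
    """Tree sum of n numbers: Θ(log n) parallel steps, simulated sequentially."""
    a = list(a)
    stride = 1
    while stride < len(a):  # one parallel step per iteration
        # On a real machine, all these pairwise additions happen concurrently.
        for i in range(0, len(a), 2 * stride):
            a[i] += a[i + stride]
        stride *= 2
    return a[0]

print(parallel_sum(range(16)))  # 120, after log2(16) = 4 parallel steps
```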

SLIDE 9

Speedup

Theoretically, speedup can never exceed p. If S > p, then you have found a better sequential algorithm... Best case: TP = TS/p.

In practice, super-linear speedup is observed. How?

The serial algorithm does more work? Cache effects. Exploratory decompositions.

SLIDE 10

Speedup – Example

Depth-first search: 1 processing element: 14tc. 2 processing elements: 5tc. Speedup: 14/5 = 2.8 > 2, i.e., super-linear.

SLIDE 11

Performance Metrics

Efficiency E = S/p: measures the fraction of time spent doing useful work. Previous sum example: E = Θ(1/log n).

Cost C = pTP, a.k.a. work or processor-time product. Note: E = TS/C. Cost-optimal if E is a constant.
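The three metrics in one place (a hedged sketch; the function name and example numbers are mine):

```python
def metrics(t_serial, t_parallel, p):
    """Return (speedup, efficiency, cost) from measured times."""
    speedup = t_serial / t_parallel   # S = T_S / T_P
    efficiency = speedup / p          # E = S/p = T_S / (p*T_P)
    cost = p * t_parallel             # C = p*T_P, and E = T_S / C
    return speedup, efficiency, cost

print(metrics(100.0, 30.0, 4))  # (3.33..., 0.83..., 120.0)
```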

SLIDE 12

Effect of Granularity on Performance

Scaling down: using fewer processing elements than the maximum possible.

Naïve way to scale down: assign the work of n/p processing elements to every processing element.
Computation increases by a factor of n/p. Communication grows by at most n/p.

If a parallel system with n processing elements is cost-optimal, then it is still cost-optimal with p. If it is not cost-optimal, it may still not be cost-optimal after the granularity increase.

SLIDE 13

Adding n Numbers – Bad Way

[Figure: adding 16 numbers (0–15) on 4 processing elements the naïve way; communication steps 1–3 shown.]

SLIDE 14

Adding n Numbers – Bad Way

[Figure: after the first step, the processing elements hold the pairwise sums 0+1, 2+3, ..., 14+15.]

SLIDE 15

Adding n Numbers – Bad Way

[Figure: after the remaining steps, the processing elements hold the block sums 0+1+2+3, ..., 12+13+14+15.]

Bad way: T = Θ((n/p) log p).

SLIDE 16

Adding n Numbers – Good Way

[Figure: each processing element locally adds its block of n/p numbers in steps 1–3.]

SLIDE 17

Adding n Numbers – Good Way

[Figure: the local block sums 0+1+2+3, ..., 12+13+14+15 are then combined across processing elements.]

Much less communication: T = Θ(n/p + log p).
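A minimal sketch of the good way (assumed setup: p blocks of n/p numbers each, p a power of two; the function name is mine). The local sums cost Θ(n/p) with no communication, then a tree combine takes Θ(log p) steps:

```python
def good_way(blocks):
    """Sum n numbers given as p blocks: local sums, then a log p tree combine."""
    partial = [sum(b) for b in blocks]       # Θ(n/p) local work, no communication
    stride = 1
    while stride < len(partial):             # Θ(log p) combining steps
        for i in range(0, len(partial), 2 * stride):
            partial[i] += partial[i + stride]
        stride *= 2
    return partial[0]

nums = list(range(16))
print(good_way([nums[i:i + 4] for i in range(0, 16, 4)]))  # 120
```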

SLIDE 18

Scalability of Parallel Systems

In practice, we develop and test on small systems with small problems.

Problem: what happens for real, large problems on large systems? It is difficult to extrapolate results.

SLIDE 19

Problem with Extrapolation

SLIDE 20

Scaling Characteristics of Parallel Programs

Rewrite efficiency (E): What does it tell us?

$$E = \frac{S}{p} = \frac{T_S}{pT_P}, \qquad pT_P = T_S + T_O \;\Longrightarrow\; E = \frac{1}{1 + T_O/T_S}$$

Efficiency depends only on the ratio of overhead to useful work.

SLIDE 21

Example: Adding Numbers

$$T_P = \frac{n}{p} + 2\log p \;\Longrightarrow\; S = \frac{n}{\frac{n}{p} + 2\log p} \;\Longrightarrow\; E = \frac{S}{p} = \frac{1}{1 + \frac{2p\log p}{n}}$$
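A quick numeric check of that efficiency formula (a sketch; base-2 logs and n = 1024 are my assumptions):

```python
import math

def efficiency(n, p):
    """E = 1 / (1 + 2*p*log2(p)/n) for the adding-n-numbers system."""
    return 1.0 / (1.0 + 2.0 * p * math.log2(p) / n)

for p in (4, 8, 16, 32):
    print(p, round(efficiency(1024, p), 3))
# 4 0.985, 8 0.955, 16 0.889, 32 0.762: with n fixed, E falls as p grows.
```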

SLIDE 22

Speedup

SLIDE 23

Scalable Parallel System

A scalable parallel system can maintain constant efficiency when increasing both the number of processors and the problem size.

In many cases TO = f(TS, p) grows sub-linearly with TS. It can then be possible to increase p and TS together and keep E constant.

Scalability measures the ability to increase the speedup as a function of p.

SLIDE 24

Cost-Optimality

Cost-optimal parallel systems have efficiency Θ(1).

So scalability and cost-optimality are linked.

Adding-numbers example: it becomes cost-optimal when n = Ω(p log p).
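A quick check of the claim, substituting n = p log p into the efficiency formula above:

$$n = p\log p \;\Longrightarrow\; E = \frac{1}{1 + \frac{2p\log p}{p\log p}} = \frac{1}{3} = \Theta(1)$$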
SLIDE 25

Scalable System

Efficiency can be kept constant when the number of processors and the problem size both increase.

At which rate should the problem size increase with the number of processors? The rate determines the degree of scalability.

In complexity theory, problem size = size of the input. Here it is the number of basic operations needed to solve the problem, noted W.

SLIDE 26

Rewrite Formulas

Parallel execution time, speedup, efficiency.
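Rewritten in terms of W and TO (taking TS = W, since W counts basic operations), the three formulas should read:

$$T_P = \frac{W + T_O(W,p)}{p}, \qquad S = \frac{W}{T_P} = \frac{Wp}{W + T_O(W,p)}, \qquad E = \frac{S}{p} = \frac{1}{1 + T_O(W,p)/W}$$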

SLIDE 27

Isoefficiency Function

For scalable systems, efficiency can be kept constant if TO/W is kept constant.

For a target efficiency E, keeping TO/W constant gives the isoefficiency function:

$$E = \frac{1}{1 + T_O/W} \;\Longrightarrow\; \frac{T_O}{W} = \frac{1-E}{E} \;\Longrightarrow\; W = \frac{E}{1-E}\,T_O(W,p) = K\,T_O(W,p)$$

SLIDE 28

Example

Adding numbers: we saw that TO = 2p log p, so we get W = K · 2p log p.

If we increase p to p′, the problem size must be increased by a factor of (p′ log p′)/(p log p) to keep the same efficiency.

Increase p by p′/p; increase n by (p′ log p′)/(p log p).
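A small sketch of that scaling rule (the function name is mine; K and the factor 2 cancel in the ratio):

```python
import math

def required_growth(p1, p2):
    """Factor by which W = K*2p*log p must grow when p1 -> p2 to keep E fixed."""
    return (p2 * math.log2(p2)) / (p1 * math.log2(p1))

print(required_growth(4, 16))  # 8.0: 4x the processors needs 8x the work
```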

SLIDE 29

Example

Isoefficiency = Θ(p^3).

SLIDE 30

Why?

After an isoefficiency analysis, we can test our parallel program with few processors and then predict what will happen on larger systems.

SLIDE 31

Link to Cost-Optimality

A parallel system is cost-optimal iff pTP = Θ(W).

Equivalently, a parallel system is cost-optimal iff its overhead (TO) does not exceed (asymptotically) the problem size.
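The equivalence is one substitution, using pTP = TS + TO = W + TO:

$$pT_P = W + T_O = \Theta(W) \iff T_O = O(W)$$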

SLIDE 32

Lower Bounds

For a problem consisting of W units of work, at most p ≤ W processors can be used optimally.

W = Ω(p) is the lower bound on the problem size.

For a degree of concurrency C(W): p ≤ C(W), and C(W) = Θ(W) for optimality (necessary condition).

SLIDE 33

Example

Gaussian elimination: W = Θ(n^3).

But the n variables are eliminated consecutively, each elimination taking Θ(n^2) operations → C(W) = O(n^2) = O(W^(2/3)).

Using all the processors: p = Θ(C(W)) → W = Ω(p^(3/2)).
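The last inversion spelled out:

$$p \le C(W) = O\!\left(W^{2/3}\right) \;\Longrightarrow\; W^{2/3} = \Omega(p) \;\Longrightarrow\; W = \Omega\!\left(p^{3/2}\right)$$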

SLIDE 34

Minimum Execution Time

Viewing TP as a function of p, we want its minimum: find p0 such that dTP/dp = 0.

Adding n numbers: TP = n/p + 2 log p → p0 = n/2 → TP_min = 2 log n.

Fastest, but not necessarily cost-optimal.
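The derivation for the adding-numbers case (treating d(log p)/dp as 1/p, i.e., dropping the constant ln 2, with base-2 logs):

$$\frac{dT_P}{dp} = -\frac{n}{p^2} + \frac{2}{p} = 0 \;\Longrightarrow\; p_0 = \frac{n}{2}, \qquad T_P^{\min} = \frac{n}{n/2} + 2\log\frac{n}{2} = 2\log n$$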

SLIDE 35

Cost-Optimal Minimum Execution Time

If we solve cost-optimally, what is the minimum execution time?

We saw that if the isoefficiency function is Θ(f(p)), then a problem of size W can be solved cost-optimally only if p = O(f⁻¹(W)).

Cost-optimal system: TP = Θ(W/p) → TP_cost_opt = Ω(W/f⁻¹(W)).

SLIDE 36

Example: Adding Numbers

Isoefficiency function: f(p) = Θ(p log p).

W = n = f(p) = p log p → log n = log p + log log p ≈ log p, so approximately p = n/log n = f⁻¹(n).

TP_cost_opt = Ω(W/f⁻¹(W)) = Ω(n/(n/log n)) = Ω(log n).

TP = Θ(n/p + log p) = Θ(log n + log(n/log n)) = Θ(2 log n − log log n) = Θ(log n).

For this example, TP_cost_opt = Θ(TP_min).

SLIDE 37

Remark

If p0 > C(W), then the value p0 is meaningless: TP_min is obtained for p = C(W).

SLIDE 38

Asymptotic Analysis of Parallel Programs

SLIDE 39

Other Scalability Metrics

Scaled speedup: speedup when the problem size increases linearly as a function of p.

Motivation: constraints, such as memory, that grow linearly with p.

Time-constrained and memory-constrained scaling.