Toward Understanding Heterogeneity in Computing


slide-1
SLIDE 1

Toward Understanding Heterogeneity in Computing

Arnold L. Rosenberg, Ron C. Chiang
Department of Electrical and Computer Engineering
Colorado State University, Fort Collins, CO, USA
{rsnbrg, ron.chiang}@colostate.edu

slide-2
SLIDE 2

2

Motivation

  • Goal

– to increase our understanding of heterogeneity in computing platforms

slide-3
SLIDE 3

3

Motivation

  • Goal

– to increase our understanding of heterogeneity in computing platforms

  • Heterogeneous computing platforms

– different computing speeds

slide-4
SLIDE 4

4

Motivation

  • Goal

– to increase our understanding of heterogeneity in computing platforms

  • Heterogeneous computing platforms

– different computing speeds

– architecturally balanced

slide-5
SLIDE 5

5

“Understanding” Heterogeneity

Suppose we have

  • n+1 computers:

– the server C0

– a “cluster” C comprising n computers, C1, …, Cn

  • Heterogeneity profile of C: $\langle \rho_1, \ldots, \rho_n \rangle$

– Ci can complete one unit of work in time $\rho_i$

– $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_n$

slide-6
SLIDE 6

6

The Cluster-Exploitation Problem (CEP)

  • C0 must complete as many units of work as possible on cluster C within a given lifespan of L time units

slide-7
SLIDE 7

7

The Cluster-Exploitation Problem (CEP)

  • C0 must complete as many units of work as possible on cluster C within a given lifespan of L time units

  • A worksharing protocol

– a schedule that solves the CEP

slide-8
SLIDE 8

8

Architectural Parameters

Fixed communication cost (negligible over a long lifespan)

– setup time $\sigma$

– latency $\lambda$

slide-9
SLIDE 9

9

Architectural Parameters and Sample Values

Common parameters:

– transmission rate $\tau$ (e.g. 1 μsec / work unit)

– output-to-input length ratio $\delta$ (= 1)

For computer i,

– packaging rate $\pi_i$ (e.g. 10 μsec / work unit)

– unpackaging rate $\pi_i$ (e.g. 10 μsec / work unit)

– workload $w_i$ (work units)

slide-10
SLIDE 10

10

Worksharing Protocols

[Figure: timeline for C0, C1, …, Cn. C0 sends work $w_1$ to C1, taking time $(\pi + \tau)w_1$.]

slide-11
SLIDE 11

11

Worksharing Protocols

[Figure: timeline for C0, C1, …, Cn. C0 sends work $w_n$ to Cn, taking time $(\pi + \tau)w_n$; C1 processes for time $(1 + \pi)\rho_1 w_1$.]

slide-12
SLIDE 12

12

Worksharing Protocols

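[Figure: timeline for C0, C1, …, Cn. C1 returns results $\delta w_1$, taking time $(\pi\rho_1 + \tau)\delta w_1$; Cn processes for time $(1 + \pi)\rho_n w_n$.]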

slide-13
SLIDE 13

13

Worksharing Protocols

[Figure: timeline for C0, C1, …, Cn. Cn returns results $\delta w_n$, taking time $(\pi\rho_n + \tau)\delta w_n$.]

slide-14
SLIDE 14

14

The FIFO Protocol

[Figure (NOT TO SCALE): timeline of the FIFO protocol for C0 and C1, C2, C3.

– C0 sends work to C1, then to C2, then to C3, taking times $(\pi + \tau)w_1$, $(\pi + \tau)w_2$, $(\pi + \tau)w_3$

– each Ci waits, then processes for time $(1 + \pi)\rho_i w_i$

– each Ci then returns its results, taking time $(\pi\rho_i + \tau)\delta w_i$]

slide-15
SLIDE 15

15

The FIFO Protocol is Optimal

  • Theorem [Adler-Gong-Rosenberg]

Over any sufficiently long lifespan L, for any heterogeneous cluster C — no matter what its heterogeneity profile:

– FIFO worksharing protocols provide optimal solutions to the cluster-exploitation problem

– C is equally productive under every FIFO protocol, i.e., under all startup orderings

slide-16
SLIDE 16

16

The Work-Production of FIFO

Let

$$X = \sum_{i=1}^{n} \frac{1}{(1 + \pi + \pi\delta)\rho_i + (\pi + \tau)} \prod_{j=1}^{i-1} \left( 1 - \frac{\pi + \tau - \tau\delta}{(1 + \pi + \pi\delta)\rho_j + (\pi + \tau)} \right)$$

slide-17
SLIDE 17

17

The Work-Production of FIFO

Let

$$X = \sum_{i=1}^{n} \frac{1}{(1 + \pi + \pi\delta)\rho_i + (\pi + \tau)} \prod_{j=1}^{i-1} \left( 1 - \frac{\pi + \tau - \tau\delta}{(1 + \pi + \pi\delta)\rho_j + (\pi + \tau)} \right)$$

Then,

$$W = \frac{1}{\tau\delta + 1/X} \cdot L$$

slide-18
SLIDE 18

18

The Work-Production of FIFO

Let

$$X = \sum_{i=1}^{n} \frac{1}{(1 + \pi + \pi\delta)\rho_i + (\pi + \tau)} \prod_{j=1}^{i-1} \left( 1 - \frac{\pi + \tau - \tau\delta}{(1 + \pi + \pi\delta)\rho_j + (\pi + \tau)} \right)$$

To simplify, let $A = \pi + \tau$ and $B = 1 + \pi + \pi\delta$. Then

$$X = \sum_{i=1}^{n} \frac{1}{B\rho_i + A} \prod_{j=1}^{i-1} \frac{B\rho_j + \tau\delta}{B\rho_j + A}$$
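The simplified form of X and the work-production formula can be evaluated with a few lines of code. The following is a minimal sketch, not the authors' implementation: the function names `x_value` and `work_production`, the example profile, and the parameter values are all assumptions chosen for illustration.

```python
from math import isclose


def x_value(profile, tau, pi, delta):
    """Evaluate X for a profile <rho_1, ..., rho_n>, with A = pi + tau and B = 1 + pi + pi*delta."""
    a = pi + tau
    b = 1.0 + pi + pi * delta
    total = 0.0
    prefix = 1.0  # running product over j < i of (B*rho_j + tau*delta) / (B*rho_j + A)
    for rho in profile:
        total += prefix / (b * rho + a)
        prefix *= (b * rho + tau * delta) / (b * rho + a)
    return total


def work_production(profile, lifespan, tau, pi, delta):
    """W = L / (tau*delta + 1/X)."""
    return lifespan / (tau * delta + 1.0 / x_value(profile, tau, pi, delta))


# Illustrative parameter values only (assumed for this sketch).
TAU, PI, DELTA = 1e-6, 1e-5, 1.0
profile = [1.0, 0.5, 0.25]

# X should not depend on the order in which the computers are listed.
x_forward = x_value(profile, TAU, PI, DELTA)
x_reversed = x_value(profile[::-1], TAU, PI, DELTA)
print(isclose(x_forward, x_reversed, rel_tol=1e-9))

# A cluster of faster computers (smaller rho) should complete more work.
w_slow = work_production([1.0, 1.0], 3600.0, TAU, PI, DELTA)
w_fast = work_production([0.5, 0.5], 3600.0, TAU, PI, DELTA)
print(w_fast > w_slow)
```

The order-independence check mirrors the theorem's statement that C is equally productive under all startup orderings.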

slide-19
SLIDE 19

19

On Comparing Heterogeneity Profiles

  • For any cluster C with heterogeneity profile $P = \langle \rho_1, \ldots, \rho_n \rangle$

slide-20
SLIDE 20

20

On Comparing Heterogeneity Profiles

  • For any cluster C with heterogeneity profile $P = \langle \rho_1, \ldots, \rho_n \rangle$

  • C’s homogeneous-equivalent computing rate (HECR) is

$$\rho_C = \max \{ \rho : X(P^{(\rho)}) \ge X(P) \}$$

where $P^{(\rho)} = \langle \rho, \ldots, \rho \rangle$
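Since the HECR is defined as the largest ρ whose homogeneous profile is at least as productive as P, it can be approximated numerically. The sketch below uses bisection; this is an assumed method, not one stated on the slides. It relies on X decreasing as ρ grows, which holds when π + τ > τδ. The names `x_value` and `hecr` and the parameter values are illustrative assumptions.

```python
def x_value(profile, tau, pi, delta):
    """X with A = pi + tau and B = 1 + pi + pi*delta."""
    a = pi + tau
    b = 1.0 + pi + pi * delta
    total, prefix = 0.0, 1.0
    for rho in profile:
        total += prefix / (b * rho + a)
        prefix *= (b * rho + tau * delta) / (b * rho + a)
    return total


def hecr(profile, tau, pi, delta, steps=100):
    """Approximate max{rho : X(<rho, ..., rho>) >= X(profile)} by bisection."""
    target = x_value(profile, tau, pi, delta)
    n = len(profile)
    # The answer lies between the fastest and the slowest rho in the profile.
    lo, hi = min(profile), max(profile)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if x_value([mid] * n, tau, pi, delta) >= target:
            lo = mid  # still at least as productive: a larger rho may also qualify
        else:
            hi = mid
    return lo


# Illustrative parameter values only (assumed for this sketch).
TAU, PI, DELTA = 1e-6, 1e-5, 1.0

n = 8
evenly_spread = [(n - i + 1) / n for i in range(1, n + 1)]  # 8/8, 7/8, ..., 1/8
reciprocal = [1.0 / i for i in range(1, n + 1)]             # 1/1, 1/2, ..., 1/8
print(hecr(evenly_spread, TAU, PI, DELTA))
print(hecr(reciprocal, TAU, PI, DELTA))
```

The printed values depend on the assumed parameters, so they need not reproduce the HECR figures reported in the talk.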

slide-21
SLIDE 21

21

Heterogeneity Profiles

Profile 1: $\rho_i = \frac{n - i + 1}{n}$, which spreads evenly in a range; when n = 8: $\langle 8/8, 7/8, 6/8, \ldots, 1/8 \rangle$

Number of Computers    8        16       32
HECR                   0.362    0.297    0.251

Recall: faster cluster has smaller HECR value

slide-22
SLIDE 22

22

Heterogeneity Profiles

Profile 2: $\rho_i = \frac{1}{i}$; when n = 8: $\langle 1/1, 1/2, 1/3, \ldots, 1/8 \rangle$

Number of Computers    8        16       32
HECR                   0.216    0.116    0.061

slide-23
SLIDE 23

23

  • Avg. Speed vs. Std-Dev of Speed

Randomly generate 100 profiles for each combination

[Figure: HECR (axis 0.1 to 0.8) for 8 computers, grouped by Avg. Speed = 0.75, 0.5, 0.25, with one series each for Std-Dev = 0.2, 0.1, 0.05]

slide-24
SLIDE 24

24

  • Avg. Speed vs. Std-Dev of Speed

8 computers’ HECR

Avg. Speed    Std-Dev 0.2    Std-Dev 0.1    Std-Dev 0.05
0.75          0.681          0.735          0.759
0.5           0.411          0.482          0.501
0.25          0.113          0.208          0.239

The probability that these two groups have the same mean is $2 \times 10^{-10}$

slide-25
SLIDE 25

25

  • Avg. Speed vs. Std-Dev of Speed

8 computers’ HECR

Avg. Speed    Std-Dev 0.2    Std-Dev 0.1    Std-Dev 0.05
0.75          0.681          0.735          0.759
0.5           0.411          0.482          0.501
0.25          0.113          0.208          0.239

Trials with 16, 32 computers show similar pattern
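An experiment of this shape can be sketched as follows. The slides do not say how the 100 profiles per combination were drawn, so the sampling rule here (a normal distribution over the ρ-values, with non-positive draws redrawn) is purely an assumption, as are the function names, the seed, and the parameter values. The sketch also assumes that "Avg. Speed" refers to the mean of the ρ-values.

```python
import random
from statistics import fmean


def x_value(profile, tau, pi, delta):
    """X with A = pi + tau and B = 1 + pi + pi*delta."""
    a = pi + tau
    b = 1.0 + pi + pi * delta
    total, prefix = 0.0, 1.0
    for rho in profile:
        total += prefix / (b * rho + a)
        prefix *= (b * rho + tau * delta) / (b * rho + a)
    return total


def hecr(profile, tau, pi, delta, steps=100):
    """Bisection for max{rho : X(<rho, ..., rho>) >= X(profile)}."""
    target = x_value(profile, tau, pi, delta)
    n = len(profile)
    lo, hi = min(profile), max(profile)
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if x_value([mid] * n, tau, pi, delta) >= target:
            lo = mid
        else:
            hi = mid
    return lo


def random_profile(n, mean, std_dev, rng):
    """Draw n positive rho-values from a normal distribution (assumed sampling rule)."""
    values = []
    while len(values) < n:
        rho = rng.gauss(mean, std_dev)
        if rho > 0.0:  # rho is a time per work unit, so redraw non-positive values
            values.append(rho)
    return values


def mean_hecr(n, mean, std_dev, trials, seed, tau=1e-6, pi=1e-5, delta=1.0):
    """Average HECR over `trials` random profiles with the given mean and std-dev."""
    rng = random.Random(seed)
    samples = [hecr(random_profile(n, mean, std_dev, rng), tau, pi, delta)
               for _ in range(trials)]
    return fmean(samples)


for std_dev in (0.2, 0.1, 0.05):
    print(std_dev, round(mean_hecr(8, 0.5, std_dev, trials=100, seed=0), 3))
```

Because the sampling rule and parameters are assumed, the output illustrates the procedure only and is not expected to match the table.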

slide-26
SLIDE 26

26

Speeding Up Clusters Optimally under FIFO Protocols

  • Which one computer should you speed up, if you can speed up only one?

slide-27
SLIDE 27

27

Speeding Up Clusters Optimally under FIFO Protocols

  • Which one computer should you speed up, if you can speed up only one?

  • We study two variants of this question
slide-28
SLIDE 28

28

Speeding Up Clusters Optimally under FIFO Protocols

For convenience,

  • let cluster C have heterogeneity profile $P = \langle \rho_1, \ldots, \rho_n \rangle$, where $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_n$

  • let $i$ and $j > i$ be two computer indices

slide-29
SLIDE 29

29

Fixed and Proportional Speed-up

  • Fixed-speedup scenario
  • by a fixed amount $\phi < \rho_n$

$$P^{(i)} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i - \phi, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$

$$P^{(j)} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j - \phi, \rho_{j+1}, \ldots, \rho_n \rangle$$

slide-30
SLIDE 30

30

Fixed and Proportional Speed-up

  • Fixed-speedup scenario (by a fixed amount $\phi < \rho_n$)

$$P^{(i)} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i - \phi, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$

$$P^{(j)} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j - \phi, \rho_{j+1}, \ldots, \rho_n \rangle$$

  • Proportional-speedup scenario
  • by a relative amount $\psi < 1$

$$P^{[i]} = \langle \rho_1, \ldots, \rho_{i-1}, \psi\rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$

$$P^{[j]} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_{j-1}, \psi\rho_j, \rho_{j+1}, \ldots, \rho_n \rangle$$

slide-31
SLIDE 31

31

Proposition for Fixed-Speedup

  • Under the fixed-speedup scenario, the most advantageous single computer to speed up is C’s fastest computer
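The proposition can be checked numerically on a small cluster. The sketch below is an illustration under assumptions: the helper names and parameter values are invented for this example. The profile ⟨1, 1/2, 1/3, 1/4⟩ and φ = 1/16 are the values that appear in the fixed-speedup figure later in the talk.

```python
def x_value(profile, tau, pi, delta):
    """X with A = pi + tau and B = 1 + pi + pi*delta."""
    a = pi + tau
    b = 1.0 + pi + pi * delta
    total, prefix = 0.0, 1.0
    for rho in profile:
        total += prefix / (b * rho + a)
        prefix *= (b * rho + tau * delta) / (b * rho + a)
    return total


def work(profile, tau, pi, delta, lifespan=1.0):
    """W = L / (tau*delta + 1/X); the lifespan cancels in any ratio of two W values."""
    return lifespan / (tau * delta + 1.0 / x_value(profile, tau, pi, delta))


def fixed_speedup_ratio(profile, k, phi, tau, pi, delta):
    """Work ratio after reducing rho of the computer at 0-based index k by phi."""
    sped_up = list(profile)
    sped_up[k] -= phi
    return work(sped_up, tau, pi, delta) / work(profile, tau, pi, delta)


# Illustrative parameter values only (assumed for this sketch).
TAU, PI, DELTA = 1e-6, 1e-5, 1.0
PROFILE = [1.0, 1 / 2, 1 / 3, 1 / 4]  # sorted slowest to fastest
PHI = 1 / 16

ratios = [fixed_speedup_ratio(PROFILE, k, PHI, TAU, PI, DELTA)
          for k in range(len(PROFILE))]
best = max(range(len(PROFILE)), key=ratios.__getitem__)
print(ratios)
print("best computer to speed up (1-based):", best + 1)
```

With the profile sorted from slowest to fastest, the best index is the last one, i.e. the fastest computer, in line with the proposition.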

slide-32
SLIDE 32

32

Terms for following figures

  • Recall: work production $W = \frac{1}{\tau\delta + 1/X} \cdot L$

  • Work ratio

– the ratio of work production after speedup to work production before speedup

  • Speedup computer

– the single computer that is sped up

slide-33
SLIDE 33

33

Fixed-Speedup Scenario

[Figure: work ratio (axis 0.9 to 1.5) against speedup computer (1 to 4), for the profiles ⟨1, 1/2, 1/3, 1/4⟩ and ⟨1/2, 1/4, 1/6, 1/8⟩, with $\phi = 1/16$]

slide-34
SLIDE 34

34

Proposition for Proportional-Speedup

  • If $\psi\rho_i\rho_j > A\tau\delta / B^2$

– speeding up $C_j$ (faster) is better

  • If $\psi\rho_i\rho_j < A\tau\delta / B^2$

– speeding up $C_i$ (slower) is better

(Recall: $A = \pi + \tau$, $B = 1 + \pi + \pi\delta$, and $\rho_i > \rho_j$)
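A short sketch of where the threshold $A\tau\delta/B^2$ comes from. This derivation is worked out here from the definition of X rather than taken from the slides, and it assumes $A \ne \tau\delta$. Write $r(\rho) = \frac{B\rho + \tau\delta}{B\rho + A}$. Since $\frac{1}{B\rho_i + A} = \frac{1 - r(\rho_i)}{A - \tau\delta}$, the sum defining X telescopes:

$$X = \frac{1 - \prod_{k=1}^{n} r(\rho_k)}{A - \tau\delta}$$

This form is symmetric in the $\rho_k$, consistent with the cluster being equally productive under all startup orderings. The profiles $P^{[i]}$ and $P^{[j]}$ differ only in the factors for $C_i$ and $C_j$, and

$$r(\psi\rho_i)\, r(\rho_j) - r(\rho_i)\, r(\psi\rho_j) = \frac{B (1 - \psi)(\rho_i - \rho_j)(A - \tau\delta)\left(B^2 \psi\rho_i\rho_j - A\tau\delta\right)}{(B\psi\rho_i + A)(B\rho_j + A)(B\rho_i + A)(B\psi\rho_j + A)}$$

Dividing by $A - \tau\delta$, the sign of $X(P^{[j]}) - X(P^{[i]})$ is the sign of $B^2 \psi\rho_i\rho_j - A\tau\delta$ (using $\psi < 1$ and $\rho_i > \rho_j$). So speeding up the faster computer $C_j$ yields the larger X exactly when $\psi\rho_i\rho_j > A\tau\delta / B^2$.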

slide-35
SLIDE 35

35

Proposition for Proportional-Speedup

  • If $\psi\rho_i\rho_j > A\tau\delta / B^2 = 1.1 \times 10^{-5}$

– speeding up $C_j$ (faster) is better

  • If $\psi\rho_i\rho_j < A\tau\delta / B^2 = 1.1 \times 10^{-5}$

– speeding up $C_i$ (slower) is better

Parameter                                    Rate
A                                            11 μsecond / work unit
B with coarse (1 sec / task) tasks           1.000011 second / work unit

(Recall: $A = \pi + \tau$, $B = 1 + \pi + \pi\delta$, and $\rho_i > \rho_j$)

slide-36
SLIDE 36

36

Proposition for Proportional-Speedup

  • If $\psi\rho_i\rho_j > A\tau\delta / B^2 = 1.1 \times 10^{-5}$

– speeding up $C_j$ (faster) is better

  • If $\psi\rho_i\rho_j < A\tau\delta / B^2 = 1.1 \times 10^{-5}$

– speeding up $C_i$ (slower) is better

(Recall: $A = \pi + \tau$, $B = 1 + \pi + \pi\delta$, and $\rho_i > \rho_j$)

That is, it is more advantageous to speed up the faster computer unless either both computers are already “very fast” or the speedup factor is “very large.”
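The two regimes can be illustrated on a two-computer cluster by comparing the threshold test with a direct evaluation of X. This is a sketch under stated assumptions: the function names are invented for this example, and the parameter values are deliberately exaggerated (far larger than the sample values in the talk) so that both sides of the threshold are easy to hit.

```python
def x_value(profile, tau, pi, delta):
    """X with A = pi + tau and B = 1 + pi + pi*delta."""
    a = pi + tau
    b = 1.0 + pi + pi * delta
    total, prefix = 0.0, 1.0
    for rho in profile:
        total += prefix / (b * rho + a)
        prefix *= (b * rho + tau * delta) / (b * rho + a)
    return total


def threshold(tau, pi, delta):
    """A * tau * delta / B^2."""
    a = pi + tau
    b = 1.0 + pi + pi * delta
    return a * tau * delta / (b * b)


def faster_is_better(rho_i, rho_j, psi, tau, pi, delta):
    """Threshold test from the proposition, for rho_i > rho_j and 0 < psi < 1."""
    return psi * rho_i * rho_j > threshold(tau, pi, delta)


def faster_is_better_direct(rho_i, rho_j, psi, tau, pi, delta):
    """Same question answered by evaluating X for both speedup choices."""
    x_speed_slower = x_value([psi * rho_i, rho_j], tau, pi, delta)
    x_speed_faster = x_value([rho_i, psi * rho_j], tau, pi, delta)
    return x_speed_faster > x_speed_slower


# Exaggerated, purely illustrative parameter values (assumed for this sketch).
TAU, PI, DELTA = 0.1, 0.2, 1.0
PSI = 0.5

for rho_i, rho_j in [(1.0, 0.5), (0.1, 0.05)]:
    print((rho_i, rho_j),
          faster_is_better(rho_i, rho_j, PSI, TAU, PI, DELTA),
          faster_is_better_direct(rho_i, rho_j, PSI, TAU, PI, DELTA))
```

In the first pair the product ψρ_iρ_j lies above the threshold, in the second pair below it, and the direct comparison of X agrees with the threshold test in both cases.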

slide-37
SLIDE 37

37

Proportional-Speedup in Action

[Figure sequence, Rounds 1 to 16 (slides 37 to 52): $\rho$ (axis 0.2 to 1.2) against speedup computer (1 to 4), one frame per round]

slide-53
SLIDE 53

53

Proportional-Speedup in Action

  • When all computers are very fast

– It is more advantageous to speed up the slower one

slide-54
SLIDE 54

54

Proportional-Speedup in Action

[Figure sequence, Rounds 17 to 21 (slides 54 to 58): $\rho$ (axis 0.02 to 0.08) against speedup computer (1 to 4), one frame per round]

slide-59
SLIDE 59

59

Summary

  • Two ways to measure computing power

– the X function

– the HECR value

slide-60
SLIDE 60

60

Summary

  • Two ways to measure computing power

– the X function

– the HECR value

  • Standard deviation influences work production

slide-61
SLIDE 61

61

Summary

  • Two ways to measure computing power

– the X function

– the HECR value

  • Standard deviation influences work production

  • Speeding up a fast computer in a cluster is almost always more advantageous than speeding up a slower one

slide-62
SLIDE 62

Thank you

Questions?

slide-63
SLIDE 63

63

HECR values

Number of Computers    8        16       32
Profile 1              0.362    0.297    0.251
Profile 2              0.216    0.116    0.061

Profile 1: $\rho_i = \frac{n - i + 1}{n}$

Profile 2: $\rho_i = \frac{1}{i}$

Recall: faster cluster has smaller HECR value

slide-64
SLIDE 64

64

  • Avg. Speed vs. Std-Dev of Speed

16 computers’ HECR

Avg. Speed    Std-Dev 0.2    Std-Dev 0.1    Std-Dev 0.05
0.75          0.671          0.723          0.768
0.5           0.385          0.475          0.502
0.25          0.110          0.194          0.239

slide-65
SLIDE 65

65

  • Avg. Speed vs. Std-Dev of Speed

32 computers’ HECR

Avg. Speed    Std-Dev 0.2    Std-Dev 0.1    Std-Dev 0.05
0.75          0.669          0.742          0.782
0.5           0.380          0.478          0.502
0.25          0.115          0.197          0.239