Toward Understanding Heterogeneity in Computing Arnold L. Rosenberg - - PowerPoint PPT Presentation

slide-1
SLIDE 1

Toward Understanding Heterogeneity in Computing

Arnold L. Rosenberg and Ron C. Chiang
Electrical & Computer Engineering, Colorado State University, Fort Collins, CO 80523, USA

slide-5
SLIDE 5

Heterogeneity in Computing

One encounters heterogeneity in virtually all modern computing systems:

  • Computers in clusters/grids differ in power (node-heterogeneity).
  • Computers intercommunicate across varied networks (link-heterogeneity).

We focus on node-heterogeneity.

slide-8
SLIDE 8

“Big” Questions about Heterogeneity

Heterogeneity complicates the efficient use of multicomputer platforms, but can it enhance their performance?

How does one study this question rigorously?

slide-14
SLIDE 14

Detailed Questions about Heterogeneity

  • What makes one cluster more powerful than another?
  • Are you better off with one super-fast computer and many “average” ones, or with all computers “moderately” fast?
  • If you could “speed up” just one computer, which one would you choose: the fastest one or the slowest one?

slide-20
SLIDE 20

A Formal Framework for Studying the Questions

Cluster $\mathcal{C}$ has computers $C_1, C_2, \ldots, C_n$. Computer $C_i$ completes one unit of work in $\rho_i$ time units. $\mathcal{C}$’s heterogeneity profile is $P_{\mathcal{C}} = \langle \rho_1, \rho_2, \ldots, \rho_n \rangle$.

One finds a solution to the Cluster-Exploitation Problem (the search for a schedule that maximizes $\mathcal{C}$’s rate of completing work) in

  • M. Adler, Y. Gong, A. L. Rosenberg (2008). On “exploiting” node-heterogeneous clusters optimally. Theory of Computing Systems 42, 465–487.

The optimal schedule for $\mathcal{C}$ depends only on $P_{\mathcal{C}}$. The work completed under this schedule is our measure of $\mathcal{C}$’s “power”.

slide-21
SLIDE 21

A Formal Framework for Studying the Questions

$\mathcal{C}$’s “power” is the work completed under the optimal solution to the Cluster-Exploitation Problem. The expression for this work is complicated, so we also measure $\mathcal{C}$’s “power” by its HECR (Homogeneous Equivalent Computing Rate).

slide-22
SLIDE 22

A Formal Framework for Studying the Questions

$\mathcal{C}$’s HECR (Homogeneous Equivalent Computing Rate) is the computing rate $\rho(\mathcal{C})$ such that the homogeneous cluster with profile $\langle \rho(\mathcal{C}), \rho(\mathcal{C}), \ldots, \rho(\mathcal{C}) \rangle$ ($n$ entries) completes work at the same rate as $\mathcal{C}$.
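The HECR definition amounts to a one-dimensional root-finding problem, so it can be sketched generically. The sketch assumes only that the cluster's work rate decreases as ρ-values grow. Under that assumption the matching homogeneous ρ lies between the fastest and slowest $\rho_i$, and bisection finds it. The work expression of Adler, Gong, and Rosenberg is not reproduced here. `proxy_rate`, which just sums the computers' individual rates $1/\rho_i$ and ignores communication, is a hypothetical stand-in, and `hecr` is an illustrative name.

```python
from typing import Callable, Sequence


def proxy_rate(profile: Sequence[float]) -> float:
    # HYPOTHETICAL stand-in for the cluster's work rate: computer C_i finishes
    # 1/rho_i units of work per time unit and nothing else is modeled (no
    # communication costs). This is NOT the Adler-Gong-Rosenberg work expression.
    return sum(1.0 / rho for rho in profile)


def hecr(profile: Sequence[float],
         rate: Callable[[Sequence[float]], float] = proxy_rate,
         tol: float = 1e-12) -> float:
    """Return rho such that the homogeneous cluster <rho, ..., rho>, with the
    same number of computers, has the same work rate as `profile`.

    Assumes `rate` decreases as rho-values grow, so the answer lies between
    min(profile) and max(profile); bisection narrows that interval.
    """
    n = len(profile)
    target = rate(profile)
    lo, hi = min(profile), max(profile)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rate([mid] * n) > target:
            lo = mid  # homogeneous cluster still too powerful: raise rho
        else:
            hi = mid  # homogeneous cluster no more powerful: lower rho
    return (lo + hi) / 2


# Usage: the 4-computer profile <1, 1/2, 1/3, 1/4> from the additive-speedup example.
print(hecr([1, 1/2, 1/3, 1/4]))  # approximately 0.4
```

Under this proxy the HECR reduces to $n / \sum_i 1/\rho_i$, the harmonic mean of the ρ-values, which is why the example gives $4/10$. Passing the real work function as `rate` would change the numbers but not the bisection.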

slide-23
SLIDE 23

ON TO OUR QUESTIONS!

slide-24
SLIDE 24

Which ONE Computer Should You Speed UP?

slide-28
SLIDE 28

Which Computer to Speed Up: Additive Speedup

Speeding up computer $C_i$ additively by the amount $\phi$ replaces the profile $P_{\mathcal{C}} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_n \rangle$ by the profile $\langle \rho_1, \ldots, \rho_{i-1}, \rho_i - \phi, \rho_{i+1}, \ldots, \rho_n \rangle$. Say that $0 < \phi < \min_i \rho_i$, so every $C_i$ can be sped up.

Theorem. Under the additive-speedup scenario, the most advantageous single computer to speed up is $\mathcal{C}$’s fastest computer.

Example. Initial profile $\langle 1, 1/2, 1/3, 1/4 \rangle$ (so $C_4$ is the fastest computer); speedup amount $\phi = 1/16$.

  i   Profile after speeding up C_i   Work ratio (OLD ÷ NEW)
  1   15/16, 1/2, 1/3, 1/4            1.008
  2   1, 7/16, 1/3, 1/4               1.014
  3   1, 1/2, 13/48, 1/4              1.034
  4   1, 1/2, 1/3, 3/16               1.159

Speeding up the fastest computer, $C_4$, gives the largest work ratio.

Intuition: more bang for the buck. A fixed reduction $\phi$ is a larger fraction of a small $\rho_i$ than of a large one.
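The profile arithmetic in the table can be checked mechanically. The sketch below (function names are hypothetical) applies $\rho_i \mapsto \rho_i - \phi$ with exact fractions and reproduces the four sped-up profiles. It also computes a deliberately simple proxy for “bang for the buck”: the increase $\phi / (\rho_i(\rho_i - \phi))$ in computer $C_i$'s own rate $1/\rho_i$, which is largest for the smallest $\rho_i$. The proxy ignores the rest of the cluster model, so it does not reproduce the table's work ratios; it only illustrates the intuition.

```python
from fractions import Fraction as F


def additive_speedup(profile, i, phi):
    """Return a copy of `profile` with computer i (0-based) sped up
    additively, rho_i -> rho_i - phi; requires 0 < phi < min(profile)."""
    if not 0 < phi < min(profile):
        raise ValueError("need 0 < phi < min(profile)")
    new = list(profile)
    new[i] -= phi
    return new


def rate_gain_additive(rho, phi):
    """Proxy only: growth of one computer's own rate 1/rho when rho drops
    by phi, i.e. phi / (rho * (rho - phi)). Ignores the rest of the model."""
    return 1 / (rho - phi) - 1 / rho


profile = [F(1), F(1, 2), F(1, 3), F(1, 4)]
phi = F(1, 16)
for i, rho in enumerate(profile, start=1):
    new = additive_speedup(profile, i - 1, phi)
    print(i, ", ".join(map(str, new)), rate_gain_additive(rho, phi))
```

Exact fractions keep values such as 13/48 readable and avoid rounding noise when comparing profiles.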

slide-32
SLIDE 32

Which Computer to Speed Up: Multiplicative Speedup

Speeding up computer $C_i$ multiplicatively by the factor $\psi$ replaces the profile $P_{\mathcal{C}} = \langle \rho_1, \ldots, \rho_{i-1}, \rho_i, \rho_{i+1}, \ldots, \rho_n \rangle$ by the profile $\langle \rho_1, \ldots, \rho_{i-1}, \psi\rho_i, \rho_{i+1}, \ldots, \rho_n \rangle$. Say that $0 < \psi < 1$, so every $C_i$ can be sped up, though only by a finite amount. (A smaller $\psi$ means a larger speedup.)

“Theorem.” Under the multiplicative-speedup scenario, the most advantageous single computer to speed up is $\mathcal{C}$’s fastest computer, unless either

  • this computer is already “very fast”, or
  • the speedup factor $\psi$ is “very small.”
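The same kind of proxy can be applied to the multiplicative map $\rho_i \mapsto \psi\rho_i$. In the sketch below (hypothetical function names, exact fractions), reusing the profile $\langle 1, 1/2, 1/3, 1/4 \rangle$ with $\psi = 1/2$, a computer's own rate rises by $(1/\psi - 1)/\rho_i$, which is always largest for the fastest computer. Because this proxy never produces the “unless” cases, those exceptions reflect features of the full cluster model that the proxy leaves out.

```python
from fractions import Fraction as F


def multiplicative_speedup(profile, i, psi):
    """Return a copy of `profile` with computer i (0-based) sped up
    multiplicatively, rho_i -> psi * rho_i; requires 0 < psi < 1."""
    if not 0 < psi < 1:
        raise ValueError("need 0 < psi < 1")
    new = list(profile)
    new[i] *= psi
    return new


def rate_gain_multiplicative(rho, psi):
    """Proxy only: growth of one computer's own rate 1/rho under
    rho -> psi * rho, i.e. (1/psi - 1) / rho. Ignores the rest of the model."""
    return 1 / (psi * rho) - 1 / rho


profile = [F(1), F(1, 2), F(1, 3), F(1, 4)]
psi = F(1, 2)
for i, rho in enumerate(profile, start=1):
    new = multiplicative_speedup(profile, i - 1, psi)
    print(i, ", ".join(map(str, new)), rate_gain_multiplicative(rho, psi))
```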
slide-34
SLIDE 34

Which Computer to Speed Up: Multiplicative Speedup

At least one computer is not “very fast”:

  • A 4-computer cluster, homogeneous before any speedups.
  • Bar height is the ρ-value, so a lower bar is a faster computer.

Start speeding up one computer optimally, by the factor $\psi = 1/2$ . . .

[Slides 35–54: figures not recoverable. Slides 35–49 are headed “At least one computer is not ‘very fast’”; slides 50–54 add a second panel headed “When all computers are very fast.”]

slide-55
SLIDE 55

What Makes Clusters Powerful? Absolute and Relative Answers

slide-59
SLIDE 59

What Makes Clusters Powerful: Variance in Computer Speeds

Say that cluster $\mathcal{C}_1$, with profile $P_1$, and cluster $\mathcal{C}_2$, with profile $P_2$, share the same mean speed.

Theorem. Say that $\mathcal{C}_1$ and $\mathcal{C}_2$ each have 2 computers. Then $\mathcal{C}_1$ outperforms $\mathcal{C}_2$ if and only if $\mathrm{VAR}(P_1) > \mathrm{VAR}(P_2)$.

Corollary. Heterogeneity can actually lend power to a cluster: if 2-computer clusters $\mathcal{C}_1$ and $\mathcal{C}_2$ share the same mean speed, and $\mathcal{C}_1$ is heterogeneous while $\mathcal{C}_2$ is homogeneous, then $\mathcal{C}_1$ outperforms $\mathcal{C}_2$ (a homogeneous profile has variance 0, a heterogeneous one has positive variance).

Unfortunately, this result does not extend to 3-computer clusters. But . . .

slide-61
SLIDE 61

What Makes Clusters Powerful: Variance in Computer Speeds

Say that cluster $\mathcal{C}_1$, with profile $P_1$, and cluster $\mathcal{C}_2$, with profile $P_2$, share the same mean speed.

Theorem. Say that $\mathcal{C}_1$ and $\mathcal{C}_2$ each have 3 computers. There exists a threshold $\theta > 0$ such that if $\mathrm{VAR}(P_1) \ge \mathrm{VAR}(P_2) + \theta$, then $\mathcal{C}_1$ outperforms $\mathcal{C}_2$.

Based on simulations, this result appears to extend to larger clusters.
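The 2-computer theorem is a decision rule that can be written down directly, without the underlying work formula. The sketch below (hypothetical names) assumes the mean and VAR are taken over the profile's ρ-values; if “speed” on the slides means $1/\rho$, apply it to the reciprocals instead. It uses the population variance, which for equal-size clusters orders profiles exactly as the sample variance does. It deliberately rejects 3-computer profiles, where the theorem guarantees a win only once the variance gap reaches an unspecified threshold $\theta$.

```python
from fractions import Fraction as F
from statistics import mean, pvariance


def predicted_2computer_winner(p1, p2):
    """Apply the 2-computer theorem: of two 2-computer profiles with equal
    means, the one with larger variance outperforms the other.
    Returns 1, 2, or None (equal variance: the profiles agree up to order,
    so neither outperforms the other)."""
    if len(p1) != 2 or len(p2) != 2:
        raise ValueError("theorem is stated for 2-computer clusters only")
    if mean(p1) != mean(p2):
        raise ValueError("theorem assumes equal means")
    v1, v2 = pvariance(p1), pvariance(p2)
    if v1 > v2:
        return 1
    if v2 > v1:
        return 2
    return None


# Heterogeneous <1/4, 3/4> vs homogeneous <1/2, 1/2>: same mean 1/2.
print(predicted_2computer_winner([F(1, 4), F(3, 4)], [F(1, 2), F(1, 2)]))  # prints 1
```

The example matches the Corollary: the heterogeneous profile has variance 1/16, the homogeneous one variance 0.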