Voltage Overscaling Algorithms for Energy-Efficient Workflow Computations With Timing Errors


SLIDE 1

1/27 Introduction Theoretical Approach Simulations Conclusion

Voltage Overscaling Algorithms for Energy-Efficient Workflow Computations With Timing Errors

Aurélien Cavelan (1), Yves Robert (1, 2), Hongyang Sun (1) and Frédéric Vivien (1)

  1. ENS Lyon & INRIA, France
  2. University of Tennessee Knoxville, USA

aurelien.cavelan@ens-lyon.fr
Green Days – Toulouse, March 17, 2015

SLIDE 2

Outline

1. Introduction
2. Theoretical Approach
3. Simulations
4. Conclusion

SLIDE 5

Dynamic Power Consumption

One can use Dynamic Voltage and Frequency Scaling (DVFS) to reduce power consumption.

$\text{Power} = \alpha f V^2$

  • $\alpha$: the effective capacitance
  • $f$: the frequency
  • $V$: the operating voltage

⇒ Voltage has a quadratic impact on the dynamic power.
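The quadratic dependence on voltage can be illustrated with a minimal sketch. The function name and the numeric values below are illustrative assumptions, not figures from the talk.

```python
def dynamic_power(alpha: float, f: float, v: float) -> float:
    """Dynamic power P = alpha * f * V^2."""
    return alpha * f * v ** 2

# Illustrative values only (assumed, not taken from the slides).
alpha, f = 1.0, 1.0
p_high = dynamic_power(alpha, f, 1.5)
p_low = dynamic_power(alpha, f, 1.2)

# Lowering V from 1.5 to 1.2 (a 20% reduction) scales power by (1.2 / 1.5)^2.
ratio = p_low / p_high
print(f"power ratio: {ratio:.2f}")
```

At a fixed frequency, a 20% voltage reduction leaves roughly 64% of the dynamic power in this toy setting.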

SLIDE 9

The Easy Way

We target energy consumption only, not time. For any frequency value, there is a threshold voltage:

1. Find the frequency that minimizes energy consumption.
2. Decrease the voltage to the threshold voltage.

Can we do better?

SLIDE 10

Threshold Voltage

[Plot: failure probability of one operation (y-axis, ticks 0.1 to 0.9) against voltage (x-axis, ticks 1.14 to 1.70), annotated with the threshold voltage, the safety margin, and the env. margin.]

Figure: Set of voltages of an FPGA multiplier block and the associated error probabilities measured on random inputs at 90 MHz and 27 °C.

SLIDE 13

Timing Errors

Definition: The results of some logic gates could be used before their output signals reach their final values.

  • Occur when $V_{dd} < V_{th}$
  • Deterministic but unpredictable
  • Induce Silent Data Corruptions (SDC)
  • Unlike lightning, timing errors always strike twice
  • Silent errors are detected only when the corrupt data is activated

SLIDE 15

Two Approaches

Near-Threshold Computing ($V_{dd} \approx V_{th}$)

  • Used in NTC circuits (hardware)
  • Almost safe
  • Great energy savings

Voltage Overscaling ($V_{dd} < V_{th}$)

  • Even more energy savings
  • Purely software-based approach
  • Generate timing errors
  • Require a verification mechanism
  • Require probabilities of failure for the platform

SLIDE 16

Question

Is it possible to obtain the (correct) result of a computation for a lower energy budget than that of the best DVFS / NTC solution?

SLIDE 22

Model Assumptions

Consider a task and a set of voltages $\mathcal{V}$:

Voltages             $V_1$   $V_2$   ...   $V_m = V_{th}$
P($V_\ell$-fail)     $p_1$   $p_2$   ...   $p_m = 0$
Cost                 $c_1$   $c_2$   ...   $c_m$

Remember: timing errors always strike twice.

  • When an error strikes, a higher voltage must be used
  • Switching from voltage $V_\ell$ to $V_h$ incurs a cost $o_{\ell,h}$
  • Execution at $V_{th}$ always succeeds

How do we compute the probability of failure? What is the optimal sequence of voltages?

SLIDE 25

Property of Timing Errors

1. Given an operation and an input $I$, there exists a threshold voltage $V_{th}(I)$:

  • $V < V_{th}(I)$ will always lead to an incorrect result
  • $V \ge V_{th}(I)$ will always lead to a successful execution

2. Given an operation and a voltage $V \in \mathcal{V}$:

  • $\mathcal{I}$ denotes the set of all possible inputs
  • $I_f(V) \subseteq \mathcal{I}$ is the set of inputs that will fail at voltage $V$
  • Failure probability is computed as $p_V = |I_f(V)|/|\mathcal{I}|$
  • For any two voltages $V_1 \ge V_2$, we have $I_f(V_1) \subseteq I_f(V_2)$

$$P(V_\ell\text{-fail} \mid V_0 V_1 \cdots V_{\ell-1}\text{-fail}) = \frac{|I_f(V_\ell)|/|\mathcal{I}|}{|I_f(V_{\ell-1})|/|\mathcal{I}|} = \frac{p_\ell}{p_{\ell-1}}$$
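The nested-set argument can be checked on a toy example. This is only a sketch: the per-input thresholds and the helper names are hypothetical, chosen to make the inclusion of failing sets visible.

```python
def conditional_failure(p_prev: float, p_cur: float) -> float:
    """P(V_l fails | V_{l-1} failed) = p_l / p_{l-1}, with V_{l-1} < V_l."""
    if not 0.0 < p_prev <= 1.0 or not 0.0 <= p_cur <= p_prev:
        raise ValueError("need 0 <= p_cur <= p_prev <= 1 and p_prev > 0")
    return p_cur / p_prev

# Hypothetical toy model: each of five inputs has its own threshold V_th(I).
thresholds = [1.20, 1.25, 1.30, 1.30, 1.40]

def failing_inputs(v: float) -> set:
    """Indices of the inputs that fail at voltage v (those with v < V_th(I))."""
    return {i for i, vth in enumerate(thresholds) if v < vth}

low, high = failing_inputs(1.22), failing_inputs(1.32)
nested = high <= low                         # I_f(1.32) is a subset of I_f(1.22)
p_low = len(low) / len(thresholds)           # 4 of 5 inputs fail at 1.22
p_high = len(high) / len(thresholds)         # 1 of 5 inputs fails at 1.32
cond = conditional_failure(p_low, p_high)    # ratio of the two probabilities
```

Because every input that fails at the higher voltage also fails at the lower one, the conditional probability reduces to the ratio of the two unconditional probabilities.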

SLIDE 28

Energy Consumption of a Single Task

Consider a sequence $L$ of $k$ voltages $V_1 < V_2 < \cdots < V_k = V_{th}$. Execution starts at voltage $V_1$:

1. In case of success, return the result!
2. In case of failure, go to the next (higher) voltage.

$$E(L) = c_1 + p_1 \left( o_{1,2} + c_2 + \frac{p_2}{p_1} \left( o_{2,3} + c_3 + \cdots \frac{p_{k-1}}{p_{k-2}} (o_{k-1,k} + c_k) \right) \right)$$

$$= c_1 + p_1 (o_{1,2} + c_2) + p_2 (o_{2,3} + c_3) + \cdots + p_{k-1} (o_{k-1,k} + c_k)$$

We generalize:

$$E(L) = c_1 + \sum_{\ell=2}^{k} p_{\ell-1} (o_{\ell-1,\ell} + c_\ell) \quad (1)$$
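Equation (1) translates directly into a short function. The sketch below uses 0-based lists, and the example probabilities, costs and switching matrices are made-up values for illustration only.

```python
def expected_energy(p, c, o):
    """E(L) = c_1 + sum_{l=2..k} p_{l-1} * (o_{l-1,l} + c_l), with 0-based lists.

    p[l] is the failure probability at the l-th voltage of the sequence,
    c[l] its execution cost, and o[i][j] the cost of switching from i to j.
    """
    energy = c[0]
    for l in range(1, len(c)):
        energy += p[l - 1] * (o[l - 1][l] + c[l])
    return energy

# Illustrative data (assumed): three voltages, the last one never fails.
p = [0.5, 0.1, 0.0]
c = [1.0, 2.0, 3.0]
no_switch = [[0.0] * 3 for _ in range(3)]
flat_switch = [[0.0 if i == j else 0.5 for j in range(3)] for i in range(3)]

e_free = expected_energy(p, c, no_switch)    # 1 + 0.5*2 + 0.1*3
e_cost = expected_energy(p, c, flat_switch)  # 1 + 0.5*2.5 + 0.1*3.5
```

Each retry is weighted by the unconditional probability that the previous voltage failed, which is why the conditional ratios of the nested form cancel.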

SLIDE 30

Optimal Sequence of Voltages

Theorem: To minimize the expected energy consumption for a single task, the optimal sequence of voltages to execute the task with a preset voltage $V_p \in \mathcal{V}$ of the system can be obtained by dynamic programming with complexity $O(k^2)$.

$$E(L^*_s) = c_s + \min_{s < \ell \le k} \left\{ E(L^*_\ell) - c_\ell + p_s (o_{s,\ell} + c_\ell) \right\} \quad (2)$$

and the optimal sequence starting with $V_s$ is $L^*_s = \langle V_s, L^*_{\ell'} \rangle$, where

$$\ell' = \arg\min_{s < \ell \le k} \left\{ E(L^*_\ell) + p_s o_{s,\ell} + (p_s - 1) c_\ell \right\}.$$

The dynamic program is initialized with $E(L^*_k) = c_k$ and $L^*_k = \langle V_k \rangle$.
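Recurrence (2) can be evaluated backwards from the threshold voltage. The following is a minimal sketch under assumptions: indices are 0-based, the switch from the preset voltage $V_p$ to the first voltage is not modelled, and the example data are invented.

```python
def optimal_sequences(p, c, o):
    """Evaluate recurrence (2) for every starting index s in O(k^2).

    Returns (best, sequence) where best[s] is E(L*_s) and sequence(s)
    lists the voltage indices of L*_s.
    """
    k = len(c)
    best = [0.0] * k
    nxt = [None] * k                       # successor of s in L*_s
    best[k - 1] = c[k - 1]                 # initialization: E(L*_k) = c_k
    for s in range(k - 2, -1, -1):
        candidates = [
            (best[l] - c[l] + p[s] * (o[s][l] + c[l]), l)
            for l in range(s + 1, k)
        ]
        tail, nxt[s] = min(candidates)
        best[s] = c[s] + tail

    def sequence(s):
        seq = [s]
        while nxt[seq[-1]] is not None:
            seq.append(nxt[seq[-1]])
        return seq

    return best, sequence

# Illustrative data (assumed): three voltages, no switching cost.
p = [0.5, 0.1, 0.0]
c = [1.0, 1.5, 3.0]
o = [[0.0] * 3 for _ in range(3)]
best, sequence = optimal_sequences(p, c, o)
start = min(range(len(c)), key=lambda s: best[s])   # cheapest starting voltage
```

In this toy instance, starting at the middle voltage is cheaper than starting at the lowest one, because the lowest voltage fails half of the time.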

SLIDE 33

Chain of Tasks

Without switching cost: optimal sequence for one task can be used to execute each task.

With switching cost:

  • After execution of a task, platform is left at voltage $V_e$
  • Optimal sequence starts at voltage $V_s$
  • Additional switching cost $o_{e,s}$ must be paid
  • Algorithm for one task is no longer optimal

Theorem: To minimize the expected energy consumption for a linear chain of tasks, the optimal sequence of voltages to execute each task, given the terminating voltage of its preceding task (or given the preset voltage $V_p$ of the system for the first task), can be obtained by dynamic programming with complexity $O(nk^2)$.
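One possible way to organize such a dynamic program is sketched below. It is not taken from the slides: it assumes identical tasks, independent failures across tasks, free and perfect error detection, and that the platform stays at the voltage where a task succeeded. The names `G`, `X` and `Y` are introduced here purely for illustration.

```python
def chain_expected_energy(n, p, c, o, preset):
    """Minimum expected energy for a chain of n identical tasks (sketch).

    G[e] is the minimum expected energy of the remaining tasks when the
    platform currently sits at voltage index e. Each of the n rounds costs
    O(k^2), so the whole computation is O(n k^2).
    """
    k = len(c)
    G = [0.0] * k                          # nothing remains after the last task
    for _ in range(n):                     # tasks handled from last to first
        # X[s]: cost of one attempt at s plus the rest of the chain on success.
        X = [c[s] + G[s] for s in range(k)]
        # Y[s]: correction for the probability p[s] that the attempt fails.
        Y = [0.0] * k                      # Y[k-1] = 0 because p[k-1] = 0
        for s in range(k - 2, -1, -1):
            retry = min(p[s] * (o[s][l] + X[l]) + Y[l] for l in range(s + 1, k))
            Y[s] = -p[s] * G[s] + retry
        G = [min(o[e][s] + X[s] + Y[s] for s in range(k)) for e in range(k)]
    return G[preset]

# Illustrative data (assumed): three voltages, threshold voltage at index 2.
p = [0.5, 0.1, 0.0]
c = [1.0, 1.5, 3.0]
free = [[0.0] * 3 for _ in range(3)]
paid = [[0.2 * abs(i - j) for j in range(3)] for i in range(3)]

one_task = chain_expected_energy(1, p, c, free, preset=2)
three_free = chain_expected_energy(3, p, c, free, preset=2)
three_paid = chain_expected_energy(3, p, c, paid, preset=2)
```

With zero switching cost the sketch reduces to repeating the single-task optimum for each task, which matches the first statement of the slide; with a positive switching cost the expected energy can only grow.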

SLIDE 36

Blocked Matrix-Matrix Multiplication

Consider the blocked matrix multiplication $C = A \times B$.

Application Workflow

    for i = 1 to ⌈m/b⌉ do
        for j = 1 to ⌈m/b⌉ do
            for k = 1 to ⌈m/b⌉ do
                C_{i,j} ← C_{i,j} + A_{i,k} × B_{k,j}

  • m denotes the matrix size
  • b denotes the block size

ABFT can be used to add per-block verification.
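The workflow above can be written as a small runnable sketch. It uses plain Python lists to stay self-contained; the function name and the returned task counter are illustrative additions.

```python
from math import ceil

def blocked_matmul(A, B, b):
    """Multiply two m-by-m matrices block by block; return (C, task count)."""
    m = len(A)
    nb = ceil(m / b)                       # number of blocks per dimension
    C = [[0.0] * m for _ in range(m)]
    tasks = 0
    for i in range(nb):
        rows = range(i * b, min((i + 1) * b, m))
        for j in range(nb):
            cols = range(j * b, min((j + 1) * b, m))
            for k in range(nb):
                inner = range(k * b, min((k + 1) * b, m))
                # One task of the workflow: C_ij <- C_ij + A_ik x B_kj.
                for r in rows:
                    for q in cols:
                        C[r][q] += sum(A[r][t] * B[t][q] for t in inner)
                tasks += 1
    return C, tasks

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
C, tasks = blocked_matmul(A, I3, b=2)      # ceil(3/2)^3 block updates
```

Each innermost block update is one unit of work that could be verified separately, which is where per-block verification would attach.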

SLIDE 38

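Algorithm-Based Fault Tolerance (ABFT)

Let $e^T = [1, 1, \cdots, 1]$. We define

$$A_c := \begin{pmatrix} A \\ e^T A \end{pmatrix}, \quad B_r := \begin{pmatrix} B & Be \end{pmatrix}, \quad C_f := \begin{pmatrix} C & Ce \\ e^T C & e^T C e \end{pmatrix},$$

where $A_c$ is the column checksum matrix, $B_r$ is the row checksum matrix and $C_f$ is the full checksum matrix.

$$A_c \times B_r = \begin{pmatrix} A \\ e^T A \end{pmatrix} \times \begin{pmatrix} B & Be \end{pmatrix} = \begin{pmatrix} AB & ABe \\ e^T AB & e^T ABe \end{pmatrix} = \begin{pmatrix} C & Ce \\ e^T C & e^T Ce \end{pmatrix} = C_f$$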
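The checksum identity can be exercised on a small example. This is a sketch in plain Python with hypothetical helper names; it only detects an inconsistency and does not attempt correction.

```python
def matmul(X, Y):
    """Plain matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def column_checksum(A):
    """Append the row e^T A of column sums to A."""
    return [row[:] for row in A] + [[sum(col) for col in zip(*A)]]

def row_checksum(B):
    """Append the column B e of row sums to B."""
    return [row + [sum(row)] for row in B]

def checksums_hold(Cf, tol=1e-9):
    """Check that the last column and last row of Cf match the sums of the rest."""
    data_rows = Cf[:-1]
    rows_ok = all(abs(sum(row[:-1]) - row[-1]) <= tol for row in Cf)
    cols_ok = all(
        abs(sum(r[j] for r in data_rows) - Cf[-1][j]) <= tol
        for j in range(len(Cf[-1]))
    )
    return rows_ok and cols_ok

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
Cf = matmul(column_checksum(A), row_checksum(B))   # full checksum matrix
clean = checksums_hold(Cf)

corrupted = [row[:] for row in Cf]
corrupted[0][0] += 1                                # simulate a silent error
detected = not checksums_hold(corrupted)
```

A single corrupted entry breaks one row checksum and one column checksum, which is also what makes locating that entry possible.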

SLIDE 40

Matrix Parameters

Consider the matrix multiplication as a chain of $n = \lceil m/b \rceil^3$ tasks.

Time to Execute one Task

$t = \tau \cdot w/\eta$

  • $\tau = 1/f$: time to do one cycle
  • $\eta = 0.8$: peak performance

$w = b(b + 1)^2 + \sigma$

  • $\sigma = 83$: initialization overhead
  • $(b + 1)$: ABFT overhead

Failure Probabilities

Consider a set of voltages $\mathcal{V}$. For any voltage $V_\ell \in \mathcal{V}$:

$p_\ell = 1 - (1 - p^{(1)}_\ell/\gamma)^w$

  • $\gamma = \frac{\text{silent errors}}{\text{timing errors}}$
  • $p^{(1)}_\ell$: probability of timing error for one random operation
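The task parameters can be computed with a few helper functions. This is a sketch: the function names are hypothetical, the value 83 for σ is read literally from the slide, and the per-operation probability used in the example is an arbitrary illustrative number.

```python
from math import ceil

def num_tasks(m: int, b: int) -> int:
    """n = ceil(m / b)^3 tasks in the chain."""
    return ceil(m / b) ** 3

def task_work(b: int, sigma: int = 83) -> int:
    """w = b * (b + 1)^2 + sigma operations per task."""
    return b * (b + 1) ** 2 + sigma

def task_time(b: int, f: float, eta: float = 0.8, sigma: int = 83) -> float:
    """t = tau * w / eta with tau = 1 / f."""
    return (1.0 / f) * task_work(b, sigma) / eta

def task_failure_probability(p1: float, b: int, gamma: float = 1.0,
                             sigma: int = 83) -> float:
    """p_l = 1 - (1 - p1 / gamma)^w for a per-operation probability p1."""
    return 1.0 - (1.0 - p1 / gamma) ** task_work(b, sigma)

# Illustrative per-operation error probability (assumed, not from the slides).
p1 = 1e-6
p_small_block = task_failure_probability(p1, b=16)
p_large_block = task_failure_probability(p1, b=64)
```

Larger blocks perform more operations per task, so for the same per-operation probability the task-level failure probability grows with b.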

SLIDE 43

Platform Settings

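From [1], for an FPGA at f = 90 MHz and 27 °C:

  • Set of voltages
  • Timing error probabilities

Dynamic Power Consumption

$P(V, f) = \alpha f V^2$. We assume (wlog) $\alpha f = 1$.

Voltage Switching Cost

$$o_{\ell,h} = \begin{cases} 0, & \text{if } \ell = h \\ \beta \cdot \frac{|V_\ell - V_h|}{V_k - V_1}, & \text{otherwise} \end{cases} \qquad \beta = o_{1,k}$$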
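The switching-cost definition can be turned into a cost matrix as follows. This is a sketch: the voltage list and the value of β are illustrative assumptions, and the helper name is hypothetical.

```python
def switching_cost_matrix(voltages, beta):
    """o[l][h] = 0 if l == h, else beta * |V_l - V_h| / (V_k - V_1).

    `voltages` must be sorted in increasing order, so that V_1 is the
    first entry and V_k (the threshold voltage) is the last one.
    """
    span = voltages[-1] - voltages[0]
    k = len(voltages)
    return [
        [0.0 if l == h else beta * abs(voltages[l] - voltages[h]) / span
         for h in range(k)]
        for l in range(k)
    ]

# Illustrative voltage set and beta (assumed values).
voltages = [1.30, 1.34, 1.38, 1.42, 1.46, 1.50, 1.54]
beta = 2.0
o = switching_cost_matrix(voltages, beta)
full_swing = o[0][-1]          # switching across the whole range costs beta
```

The cost is linear in the voltage difference and normalized so that the largest possible switch, from $V_1$ to $V_k$, costs exactly β.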

SLIDE 44

Algorithms

  • N-Voltage: Baseline algorithm that applies NTC and always uses the threshold voltage.
  • DP1-detect & DP1-correct: Optimal dynamic programming algorithms for a single task.
  • DPn-detect & DPn-correct: Optimal dynamic programming algorithms for a chain of tasks.

The detect algorithms use ABFT for error detection. The correct algorithms use ABFT for detection and correction.

SLIDE 45

Probabilities of failure

[Left plot, "One operation": failure probability $p_\ell$ against voltage $V_\ell$ (x-axis ticks 1.18 to 1.54).]
[Right plot: failure probability $p_\ell$ against voltage $V_\ell$ (x-axis ticks 1.3 to 1.54) for b = 16, 32, 64, 128, 256 and 1024.]

Figure: Failure probabilities for one operation and for one task under different block sizes and voltages.

SLIDE 46

Simulations (without switching cost)

[Left plot: normalized expected energy against block size b (50 to 250) for DPn-detect, DPn-correct, DP1-detect, DP1-correct and N-Voltage.]
[Right plot: normalized expected energy against block size b (50 to 250) for γ = 1, 10, 100, 1000 and 10000, and for N-Voltage.]

Figure: Impact of b and γ on the expected energy consumption for zero voltage switching cost. Only the results for the DPn-correct algorithm are shown.

SLIDE 47

Simulations (with switching cost)

[Left plot: normalized expected energy against block size b (50 to 250) for DPn-detect, DPn-correct, DP1-detect, DP1-correct and N-Voltage.]
[Right plot: normalized expected energy against switching cost $\beta = V_k^2 \cdot \tau x^3/\eta$ (x-axis ticks 20 to 120) for the same five algorithms.]

Figure: Impact of b and β on the expected energy consumption. The voltage switching cost is equivalent to the energy consumed to multiply two 32 × 32 matrices at threshold voltage without overhead.

SLIDE 51

Conclusion

We use dynamic voltage overscaling to reduce power consumption.

Summary

  • Software-based approach
  • Model for timing errors
  • Optimal polynomial-time solution for a chain of tasks
  • Simulations on matrix multiplication using ABFT

Original problem and encouraging results; needs further research.

Future Work

  • Algorithms for other task graphs
  • Additional simulations, emulations and experiments

SLIDE 52

Questions

Thanks! Questions?

SLIDE 53

Bibliography

[1] D. Ernst, S. Das, S. Lee, D. Blaauw, T. Austin, T. Mudge, N. S. Kim, and K. Flautner. Razor: circuit-level correction of timing errors for low-power operation. IEEE Micro, 24(6):10–20, 2004.