Energy-Efficient Fault Tolerance In Chip Multiprocessors Using Critical Value Forwarding

SLIDE 1

Energy-Efficient Fault Tolerance In Chip Multiprocessors Using Critical Value Forwarding

  • P. Subramanyan (1)
  • V. Singh (1)
  • K. K. Saluja (2)
  • E. Larsson (3)

(1) Indian Institute of Science, Bangalore, India
(2) University of Wisconsin-Madison, Madison, WI, USA
(3) Linköping University, Linköping, Sweden

29 June 2010

Redundant Execution Using Critical Value Forwarding DSN 2010 29 June 2010 1 / 43

SLIDE 2

Outline

1 Introduction
  • Motivation
  • Related Work

2 RECVF Design
  • Overview
  • Design Options
  • DVFS in the Trailing Core

3 Evaluation
  • Methodology
  • Results

4 Conclusion

SLIDE 4

Introduction Motivation

The Reliability Problem

Moore’s law is expected to apply for the next 10 years, giving us smaller and faster devices with reduced power. But there is a downside:

  • Smaller devices make ICs more susceptible to transient faults
  • Wearout and drift effects are now more prominent: negative bias temperature instability (NBTI), electromigration (EM), hot carrier injection (HCI), etc.
  • Increased process variations

The upshot of decreased reliability is the need for architectural mechanisms for fault tolerance.

SLIDE 5

Introduction Motivation

Requirements of a Reliability Solution

Traditional fault-tolerant systems are targeted at mainframes or specially designed processors. Fault-tolerant systems for the commodity market have different requirements.

Reliability mechanisms need to have low cost

  • low performance overhead
  • low energy overhead
  • small area overhead

Mechanism must be configurable at runtime

  • Switched off for users who do not require reliability
  • Switched off for applications that are inherently resilient

Transparent to software

  • i.e., must work with existing software

SLIDE 6

Introduction Motivation

Why is Power/Energy Important?

Power and peak temperature are key performance limiters in CMPs [1]

  • Since the power budget for a chip is fixed, decreasing the power for a single core increases available power and hence performance of other cores [2][3]

Decreasing operating temperatures leads to a significant increase in device reliability [4]

  • Decreasing temperature from 105 °C to 66 °C increased GOI median time-to-breakdown by a factor of 9; NBTI degradation decreased by 29%, equivalent to an eight-fold increase in lifetime

[1] Isci et al., MICRO ’06
[2] Greskamp et al., HPCA ’10
[3] Intel Nehalem
[4] Parulkar et al., SELSE ’08

SLIDE 7

Introduction Related Work

Execution Assistance (1/2)

Figure: Leader-follower block diagram. Labels: inputs, leading core, trailing core, results, predictions, output comparison, error.

Two streams of execution in a leader-follower configuration

Leader assists execution of follower by forwarding results

Forwarded values used as predictions in the follower

  • potentially more accurate than “traditional” predictors
  • help speed up the follower

[AR-SMT, FTCS ’99], [DIVA, MICRO ’99], [SRT, ISCA ’00]

SLIDE 8

Introduction Related Work

Execution Assistance (2/2)

Classification of Mechanisms for Execution Assistance

1 Forwarding all values [a]

  • Highest speedup but also requires highest bandwidth
  • Suited for components of a single core or adjacent cores

2 Forwarding loads and branches [b]

  • Eliminates branch mispredictions and data cache misses
  • Solves the problem of input incoherence
  • Still requires considerable bandwidth

3 Forwarding only branches [c]

4 Forwarding critical values

[a] AR-SMT [FTCS ’99], DIVA [MICRO ’99], Slipstream∗ [ASPLOS ’00], Madan et al. [TPDS ’07]
[b] SRT [ISCA ’00], CRT [ISCA ’02], SRTR [ISCA ’02], CRTR [ISCA ’03], SpecIV [HPCA ’08], etc.
[c] Paceline [PACT ’08], PVA [PACT ’05], Circuit Pruning [MICRO ’07], Decoupled Performance Correctness Architecture [MICRO ’08], etc.

SLIDE 9

Introduction Related Work

Chip-level Redundant Threading (CRT)

Execute a logical thread as two physical threads

Load Value Queue (LVQ)

  • both threads see the same memory state
  • trailing thread does not suffer data cache misses

Branch outcome queue (BOQ)

  • prevents trailing thread from mis-speculating

Modified store buffer

  • ensures that stores are identical across threads

[Mukherjee et al., ISCA 2002]

SLIDE 10

Introduction Related Work

Parallelized Verification Architecture (PVA)

Idea: Split up the verification among two cores and operate the two cores at half voltage-frequency levels.

  • Energy vs. voltage is superlinear

Uses three cores for execution of a single thread

Trailing threads have to consult leading thread caches

  • Higher performance overhead with increasing latency

Limited voltage scaling increases energy consumption

Increase in L2 power

[Rashid et al., PACT 2005]

SLIDE 12

RECVF Design Overview

Overview

Figure: System block diagram. Labels: logical threads 1 to 3, cores core0 to core7, shared bus interconnect, shared L2.

One logical thread is executed on two cores

Cores exchange information via shared bus interconnect

Cores designated as leading and trailing cores

  • leading core assists execution of the trailing core by forwarding critical values

SLIDE 13

RECVF Design Overview

Critical Value Forwarding (1/2)

Critical Value Forwarding: The leading core identifies instructions on the critical path and forwards the results of these instructions to the trailing core.

Breaks data dependence chains in the trailing core

  • Dependent instructions can execute “early”
  • Creates a cascade effect that speeds up trailing core

Forward only a few values having the most effect on performance

  • Bandwidth is limited even in CMPs

SLIDE 14

RECVF Design Overview

Critical Value Identification (1/3)

The mechanism for critical value identification is as follows:

  • Observe execution of instructions through the processor pipeline
  • Based on predefined marking criteria, instructions are marked critical
  • At instruction commit, values of marked instructions are forwarded

The idea of observing events in the processor to detect critical instructions is from [Tune et al., HPCA 2001].

SLIDE 15

RECVF Design Overview

Critical Value Identification (2/3)

Heuristic     Marking criterion
ROBStall      Unexecuted instruction at head of ROB
InstQHead     Unexecuted instruction at head of instruction queue
InstQHFree    Instruction producing a value freeing another instruction at head of instruction queue
FreedN        Instruction frees at least N instructions
FanoutN       Instruction produces a value used by at least N other in-flight instructions
EveryN        Every Nth instruction
AllBJ         All Branch/Jump Instructions
MispredBJ     Mispredicted Branch/Jump Instructions
All           All values are forwarded
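As an illustration of one marking criterion, the FanoutN rule can be sketched in software. This is only a sketch under stated assumptions: the in-flight window is modeled as a list of (destination, sources) tuples in program order, and the name mark_fanout_n and this data layout are hypothetical, not taken from the slides. Real hardware would presumably track consumers incrementally rather than rescanning a window.

```python
from collections import Counter

def mark_fanout_n(window, n):
    """Return indices of instructions whose result is read by at least n
    later in-flight instructions.

    window: list of (dest_reg, [src_regs]) in program order
            (hypothetical layout chosen for this sketch).
    """
    consumers = Counter()   # producer index -> number of in-flight consumers
    last_writer = {}        # register name -> index of its most recent producer
    for idx, (dest, srcs) in enumerate(window):
        for src in set(srcs):               # count a consumer once per source register
            if src in last_writer:
                consumers[last_writer[src]] += 1
        if dest is not None:
            last_writer[dest] = idx
    return [idx for idx in range(len(window)) if consumers[idx] >= n]

# Example window: instruction 0 feeds instructions 1 and 2.
window = [
    ("r1", []),            # 0: produces r1
    ("r2", ["r1"]),        # 1: reads r1
    ("r3", ["r1", "r2"]),  # 2: reads r1 and r2
    ("r4", ["r3"]),        # 3: reads r3
]
critical = mark_fanout_n(window, 2)   # Fanout2: only instruction 0 qualifies
```

Lowering N marks more instructions critical, which matches the bandwidth versus speedup trade-off the heuristics are meant to explore.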

SLIDE 16

RECVF Design Overview

Critical Value Identification (3/3)

Special Handling of Branch/Jump Instructions

Mispredicted branch instructions are marked as critical

  • branch direction mispredictions as well as BTB and RAS misses are all considered “mispredicted” branches

The outcomes (i.e., branch targets) of these are forwarded

This scheme eliminates most mispredictions in the trailing core at the cost of forwarding the outcomes of a small fraction of branch/jump instructions.

SLIDE 17

RECVF Design Overview

Critical Value Forwarding: Microarchitectural Support

Microarchitectural structures

Branch Outcome Queue (BOQ)

  • Holds forwarded branch outcome and index in trailing core
  • Consulted along with branch predictor
  • Branch outcome, if available, overrides branch predictor

Instruction Result Queue (IRQ)

  • Holds results and index of forwarded instructions
  • Accessed at the time of instruction dispatch
  • If value is available in IRQ:
      • IRQ value written to destination physical register
      • Dependent instructions can now execute using forwarded value
      • Destination is written-to again after execution completes
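The IRQ behaviour listed above can be sketched as a small software model. This is an illustrative sketch, not the hardware design: the class InstructionResultQueue, the functions dispatch and complete, and the dictionary-based register file are all hypothetical names and simplifications introduced here.

```python
class InstructionResultQueue:
    """Hypothetical model of the IRQ: maps a dynamic instruction index
    to the result forwarded by the leading core."""

    def __init__(self):
        self._entries = {}

    def push(self, index, value):
        self._entries[index] = value

    def pop(self, index):
        # Forwarded value, or None if nothing was forwarded for this instruction.
        return self._entries.pop(index, None)


def dispatch(index, dest_reg, regfile, ready, irq):
    """At dispatch, write a forwarded value early so dependents can issue."""
    forwarded = irq.pop(index)
    if forwarded is not None:
        regfile[dest_reg] = forwarded
        ready.add(dest_reg)
    return forwarded is not None


def complete(dest_reg, computed, regfile, ready):
    """After execution, the destination is written again with the computed result."""
    regfile[dest_reg] = computed
    ready.add(dest_reg)


irq = InstructionResultQueue()
irq.push(7, 42)                       # leading core forwarded 42 for instruction 7
regfile, ready = {}, set()
early = dispatch(7, "p3", regfile, ready, irq)    # forwarded value is used early
missed = dispatch(8, "p4", regfile, ready, irq)   # nothing forwarded for instruction 8
complete("p3", 41, regfile, ready)    # the trailing core's own result overwrites it
```

The second write in complete is the point of the last bullet: the forwarded value only acts as a prediction, and the trailing core's own result is what ends up in the register.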

SLIDE 18

RECVF Design Overview

Block Diagram

Figure: Block diagram. Labels: Fetch, BPred, BOQ, Decode, Rename, Issue, IRQ, ROB, LSQ, FUs, D-cache, Reg File, WB, Retire, Fingerprint, Critical Value Identification Heuristic, to trailing core.

SLIDE 19

RECVF Design Design Options

Options for Input Replication: FLR

Full Load Replication (FLR): All load values are replicated, as in SRT, CRT, etc.

  • only the leading core accesses the memory hierarchy
  • all load values are forwarded to the trailing core
  • the trailing core uses these values without verification

This option requires higher bandwidth but enables a simpler design that also consumes less energy.

SLIDE 20

RECVF Design Design Options

Options for Input Replication: PLR

Partial Load Replication (PLR): Only loads that may participate in data races are replicated; other load instructions are fully re-executed in the trailing core.

  • Track unverified lines in the data cache (unverified bit)
  • Track lines obtained from cache-to-cache transfers (C2C bit)
  • Loads from unverified+C2C lines are not re-executed in the trailing core
      • Approx. 93% of loads in SPLASH2 were redundantly executed

Higher fault coverage at the cost of complexity and energy.
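The PLR decision can be sketched as a predicate over the two per-line bits. This sketch assumes that "unverified+C2C" means both bits are set on the line being read; the slides do not spell this out, and the names CacheLine and must_replicate_load are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class CacheLine:
    unverified: bool = False   # the "unverified bit" from the slide
    c2c: bool = False          # the "C2C bit": line came from a cache-to-cache transfer

def must_replicate_load(line: CacheLine) -> bool:
    """True if the load value should be forwarded from the leading core
    instead of being re-executed in the trailing core.

    Assumption: "unverified+C2C" is read as "both bits set".
    """
    return line.unverified and line.c2c

forwarded = must_replicate_load(CacheLine(unverified=True, c2c=True))
reexecuted = not must_replicate_load(CacheLine(unverified=True, c2c=False))
```

Under this reading, every load for which the predicate is false is re-executed redundantly in the trailing core, which is what gives PLR its higher fault coverage.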

SLIDE 21

RECVF Design Design Options

Fault Detection

Fault Detection

Register updates, branch targets, load and store addresses, and store values are all hashed using a CRC code to generate a fingerprint

Cores compute and exchange fingerprints to detect errors [Smolens et al., ASPLOS ’04]

Cores synchronize when comparing fingerprints.

  • i.e., leading core cannot commit instructions during exchange

Fingerprint comparison has to be done before I/O operations

Fingerprint comparison has to be done if the cache overflows
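Fingerprinting can be illustrated with a running CRC over the stream of committed updates. This is a minimal sketch: the slides say only "CRC code", so the CRC-32 from Python's zlib module is a stand-in, and the class name Fingerprint, the 64-bit packing, and the helper fingerprint_of are assumptions made for this example.

```python
import struct
import zlib

class Fingerprint:
    """Running hash of committed architectural updates (CRC-32 as a stand-in)."""

    def __init__(self):
        self.crc = 0

    def update(self, value):
        # Fold one update, treated as an unsigned 64-bit value, into the running CRC.
        data = struct.pack("<Q", value & 0xFFFFFFFFFFFFFFFF)
        self.crc = zlib.crc32(data, self.crc)


def fingerprint_of(updates):
    fp = Fingerprint()
    for value in updates:
        fp.update(value)
    return fp.crc


leading = fingerprint_of([1, 2, 3])
trailing_ok = fingerprint_of([1, 2, 3])
trailing_bad = fingerprint_of([1, 2, 7])   # one corrupted update (single bit flip)

match = leading == trailing_ok
mismatch = leading != trailing_bad
```

Exchanging one short fingerprint per comparison interval, rather than every update, is what keeps the comparison traffic between the cores low.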

SLIDE 22

RECVF Design Design Options

Handling Erroneous Forwarded Values

Under a single-error assumption, forwarding of an erroneous value from the leading core to the trailing core will result in a fingerprint mismatch.

Assume that instruction i_x in the leading core forwards an incorrect value for instruction i'_x in the trailing core.

1 Let i_x be the earliest instruction that forwards an erroneous value
2 Therefore, i'_x’s input operands in the trailing core are correct
3 When i'_x is executed, it will produce the correct result
4 Fingerprint in the trailing core uses i'_x’s result
5 Fingerprint in the leading core uses i_x’s result
6 Two different results imply that fingerprints computed will be different

SLIDE 23

RECVF Design Design Options

Fault Isolation and Checkpointing

L1 Cache Modifications

Two new bits introduced per cache line

  • Unverified bit tracks unverified cache lines
  • C2C bit tracks cache-to-cache transferred lines

Unverified lines cannot be written back

  • Overflow forces a fingerprint comparison

Enable fault isolation and checkpointing of memory state

Checkpointing

  • Checkpoints taken after fingerprint comparison succeeds
  • Restore from checkpoint in case of error

SLIDE 24

RECVF Design DVFS in the Trailing Core

DVFS in the Trailing Core

Idea: Run the trailing core at a lower voltage-frequency level than the leading core. Because of critical value forwarding, there won’t be a performance penalty while we save power in the trailing core.

P = ACV²f, so reducing voltage and frequency results in a superlinear decrease in energy.

We explore two DVFS algorithms for the trailing core.

1 QSize-DVFS algorithm
2 IPC-DVFS algorithm
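The scaling argument behind P = ACV²f can be worked through explicitly. The derivation below is a sketch that assumes voltage scales in proportion to frequency by a factor s (0 < s ≤ 1) and that the work is a fixed number of cycles. Both are idealizations introduced here, and the symbols s, T, and E are not from the slides.

```latex
\begin{align*}
P  &= A C V^{2} f \\
P' &= A C (sV)^{2} (sf) = s^{3} P
      && \text{(dynamic power at the scaled operating point)} \\
T' &= T / s
      && \text{(same cycle count at frequency } sf\text{)} \\
E' &= P' T' = s^{3} P \cdot \frac{T}{s} = s^{2} E
      && \text{(energy for the same work)} \\
s = \tfrac{1}{2} &: \quad P' = \tfrac{1}{8} P, \qquad E' = \tfrac{1}{4} E
\end{align*}
```

Under these assumptions, halving the trailing core's voltage and frequency cuts its dynamic power by a factor of 8 and the energy for the same work by a factor of 4. When voltage cannot scale as far as frequency, the saving is smaller.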

SLIDE 25

RECVF Design DVFS in the Trailing Core

QSize-DVFS Algorithm (1/2)

Occupancy of IRQ and BOQ can track program phase changes:

  • Occupancy of BOQ and IRQ is a measure of how far “behind” trailing core execution is compared to the leading core
  • Growing queues → trailing core is slower than optimal
  • Shrinking queues → trailing core is faster than optimal

This suggests a simple algorithm that changes the frequency based on two statically defined thresholds for the queue sizes.

SLIDE 26

RECVF Design DVFS in the Trailing Core

QSize-DVFS Algorithm (2/2)

QSize-DVFS Algorithm

After every T_s seconds:

1 If size(IRQ) > T_ih or size(BOQ) > T_bh:

  • Increase frequency and voltage

2 Else if size(IRQ) < T_il or size(BOQ) < T_bl:

  • Decrease frequency and voltage

3 Else do nothing

T_ih and T_bh are the “high” thresholds. T_il and T_bl are the “low” thresholds. The algorithm tries to keep the queue sizes in between the low and high thresholds.
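The QSize-DVFS steps above can be sketched as a single decision function. This sketch assumes a discrete table of DVFS levels indexed from 0 (slowest) upward and a one-level change per decision. Neither detail is stated on the slide, and the function name, parameter names, and threshold values are hypothetical.

```python
def qsize_dvfs_step(level, irq_size, boq_size,
                    t_ih, t_il, t_bh, t_bl, num_levels):
    """One QSize-DVFS decision, taken every T_s seconds.

    Returns the new DVFS level index (0 = lowest voltage/frequency).
    """
    if irq_size > t_ih or boq_size > t_bh:
        # Queues are filling up: the trailing core is too slow, so step up.
        return min(level + 1, num_levels - 1)
    if irq_size < t_il or boq_size < t_bl:
        # Queues are draining: the trailing core is faster than needed, so step down.
        return max(level - 1, 0)
    return level   # both queues are within their thresholds


# Example with illustrative (made-up) thresholds and 4 DVFS levels.
params = dict(t_ih=48, t_il=16, t_bh=24, t_bl=8, num_levels=4)
sped_up = qsize_dvfs_step(1, irq_size=60, boq_size=10, **params)
slowed = qsize_dvfs_step(1, irq_size=20, boq_size=4, **params)
held = qsize_dvfs_step(1, irq_size=30, boq_size=12, **params)
```

As on the slide, the "high" test is evaluated first, so a full IRQ raises the level even if the BOQ is below its low threshold.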

SLIDE 27

RECVF Design DVFS in the Trailing Core

IPC-DVFS Algorithm (1/2)

Idea: The ratio of the IPCs of the leading and trailing cores can be used to set the frequency of the trailing core.

Example: If, for a certain program, the IPC of the leading core is 1.0 while the IPC of the trailing core is 2.0, then the trailing core can operate at 1/2 the frequency of the leading core.

SLIDE 28

RECVF Design DVFS in the Trailing Core

IPC-DVFS Algorithm (2/2)

IPC-DVFS Algorithm

After every T_s seconds:

1 I_l = IPC of the leading core for the last T_s seconds
2 I_t = IPC of the trailing core for the last T_s seconds
3 Frequency of the trailing core is set so that the ratio of trailing to leading core frequency is nearest to I_l/I_t
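The IPC-DVFS selection step can be sketched as picking, from a set of supported frequencies, the one whose ratio to the leading core's frequency is closest to I_l/I_t. The function name, the list of available frequencies, and the fallback when the trailing IPC is zero are assumptions made for this sketch.

```python
def ipc_dvfs_frequency(ipc_leading, ipc_trailing, f_leading, available):
    """Choose the trailing-core frequency for the next interval.

    available: iterable of supported trailing-core frequencies, in the
               same unit as f_leading (hypothetical DVFS table).
    """
    if ipc_trailing <= 0:
        # Assumption: with no useful measurement, fall back to the fastest level.
        return max(available)
    target_ratio = ipc_leading / ipc_trailing   # desired f_trailing / f_leading
    return min(available, key=lambda f: abs(f / f_leading - target_ratio))


# The slide's example: leading IPC 1.0, trailing IPC 2.0, so half frequency.
levels_ghz = [0.75, 1.5, 2.25, 3.0]
chosen = ipc_dvfs_frequency(1.0, 2.0, f_leading=3.0, available=levels_ghz)
```

With the example numbers the target ratio is 0.5, so the 1.5 GHz level is selected for a 3.0 GHz leading core.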

SLIDE 30

Evaluation Methodology

Simulation Methodology

SESC execution-driven simulator

Power model based on Wattch and CACTI

Workload

  • SPEC CPU 2000
  • Single SimPoint of length 1 billion instructions

Compared with two previous proposals

1 PVA: Parallelized Verification Architecture [5]
2 CRT: Chip-level Redundant Threading [6]

Each core is a 4-way out-of-order superscalar processor

Two different memory hierarchy/interconnect configurations

1 Shared L2 with short interconnect latencies
2 Private L2 with long interconnect latencies

[5] Rashid et al., PACT ’05
[6] Mukherjee et al., ISCA ’02

SLIDE 31

Evaluation Results

Results: Critical Value Identification Heuristics

Speedup = speedup of trailing thread over leading thread

Figure: Critical value identification heuristics with PLR. Plots speedup (axis 0.0 to 2.0) and bandwidth in transmitted values per cycle (axis 0.0 to 0.9) for the heuristics, in x-axis order: allBJ, mispredBJ, instQHFree, instQHead, freed3, every8, fanout3, robStall, every4, freed2, fanout2, all.

SLIDE 32

Evaluation Results

Results: Critical Value Identification Heuristics

Speedup = speedup of trailing thread over leading thread

Figure: Critical value identification heuristics with FLR. Plots speedup (axis 0.0 to 3.0) and bandwidth in transmitted values per cycle (axis 0.0 to 0.9) for the heuristics, in x-axis order: loadsOnly, instQHFree, instQHead, freed3, fanout3, every8, robStall, freed2, fanout2, every4, all.

SLIDE 33

Evaluation Results

Shared L2 Configuration: IPC

Normalized IPC = IPC of fault-tolerant execution / IPC of non-redundant execution

Figure: Normalized IPC (axis 0.0 to 1.2) per benchmark (bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, ammp, applu, art, apsi, equake, mesa, mgrid, sixtrack, swim, wupwise, gmean) for PVA, CRT, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 35

Evaluation Results

Shared L2 Configuration: Energy

Normalized energy = Energy of fault-tolerant execution / Energy of non-redundant execution

Figure: Normalized energy (axis 0.0 to 2.0) per benchmark (bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, ammp, applu, art, apsi, equake, mesa, mgrid, sixtrack, swim, wupwise, gmean) for PVA, CRT, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 37

Evaluation Results

Private L2 Configuration: IPC

Normalized IPC = IPC of fault-tolerant execution / IPC of non-redundant execution

Figure: Normalized IPC (axis 0.0 to 1.2) per benchmark (bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, ammp, applu, art, apsi, equake, mesa, mgrid, sixtrack, swim, wupwise, gmean) for PVA, CRT, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 39

Evaluation Results

Private L2 Configuration: Energy

Normalized energy = Energy of fault-tolerant execution / Energy of non-redundant execution

Figure: Normalized energy (axis 0.0 to 2.5) per benchmark (bzip2, crafty, gap, gcc, gzip, mcf, parser, twolf, vortex, vpr, ammp, applu, art, apsi, equake, mesa, mgrid, sixtrack, swim, wupwise, gmean) for PVA, CRT, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 41

Evaluation Results

Energy Consumption Breakdown

Figure: Energy consumption breakdown. Normalized energy (axis 0.0 to 2.5) split into leading core, trailing core(s), L2 dynamic, and L2 leakage for PVA, CRT, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 42

Evaluation Results

Effect of Coarse-Grained and Slower DVFS

Figure: Impact of higher-latency and coarse-grained DVFS. Plots normalized IPC (axis 0.75 to 1.05) and normalized energy (axis 1.10 to 1.35) for QSize and IPC at 0.1/1 µs, 1/10 µs, 10/100 µs, and 0.1/1 ms.

The notation IPC: 0.1/1 µs means that the latency of changing DVFS levels is 0.1 µs, and DVFS levels are updated every 1 µs. IPC refers to the IPC-DVFS algorithm while QSize refers to the QSize-DVFS algorithm.

SLIDE 43

Evaluation Results

Impact of Limited Voltage Scaling

Figure: Impact of limited voltage scaling. Normalized energy (axis 1.2 to 1.8) under Baseline (0.6-1.0 V) and Limited Scaling (0.7-1.0 V) for PVA, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 44

Evaluation Results

Interconnect Bandwidth

Figure: Interconnect bandwidth requirements. Transmitted values per cycle (axis 0.00 to 0.45) for PVA, CRT, PLR+Fanout2+QSize, PLR+Fanout2+IPC, FLR+Fanout2+QSize, and FLR+Fanout2+IPC.

SLIDE 46

Conclusion Conclusion

Summary of Results

RECVF has a low performance overhead

  • 1.2% for the shared L2 configuration
  • 3.9% for the private L2 configuration
  • Better performance than both PVA and CRT

Low energy overhead

  • Shared L2: 1.26× non-fault-tolerant execution
  • Private L2: 1.45× non-fault-tolerant execution
  • Lower energy consumption than both PVA and CRT

Lowest bandwidth requirements (PLR + RECVF)

Effective even with conservative DVFS implementations

SLIDE 47

Conclusion Conclusion

Thank you for your attention! Questions?
