Bubble Razor: An Architecture-Independent Approach to Timing-Error Detection and Correction



SLIDE 1

Bubble Razor

An Architecture-Independent Approach to Timing-Error Detection and Correction

Matthew Fojtik, David Fick, Yejoong Kim, Nathaniel Pinckney, David Harris, David Blaauw, Dennis Sylvester
mfojtik@umich.edu
Electrical Engineering & Computer Science Department
The University of Michigan, Ann Arbor

SLIDE 2

Outline

  • Issues with Prior Razor
  • Bubble Razor Algorithm
  • Circuitry and Implementation
  • Area Overhead Tradeoffs
  • Test Chip Results

SLIDE 3

Timing Margins

[Figure: clock period compared with actual circuit delay; margins for Voltage, Process, Aging, Temperature, and Data; the gap is lost performance/energy]

Margins for uncertainty:

  • Process Variation
  • Temperature Variation
  • Voltage Variation
  • Aging Effects

Associated Costs:

  • Lost performance
  • Lost energy
  • Tester time (tradeoff)

SLIDE 4

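[Table: margin sources addressed by each technique. Columns: Process (Global, Local); Ambient (Global Slow, Global Fast, Local Slow, Local Fast); Data]

  • Table Lookup: 2 of 7 sources
  • Table & Sensors: 3 of 7 sources
  • Canary Circuit: 2 of 7 sources
  • Razor Designs: all 7 sources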

Eliminating Margins

  • Always Correct: Tables, Canaries
  • Detect and Correct: Razor Style

[Figure: Razor flip-flop with Main DFF (D, Q, CLK), Shadow Latch (DCLK), and Error output]

  • S. Das et al. [VLSI 2005]

SLIDE 5

Speculation Window and Hold Time

[Figure: DFF A and DFF B clocked by CLK A and CLK B, with the Speculation Window marked]

Speculation window linked to minimum delay constraint (hold time)

SLIDE 6

Architectural Invasiveness

[Figure: pipeline stages IF, ID, EX, MEM, WB, CHK]

  • Razor I Style – All Flops Reload Previous Values
  • Razor II Style – Check Stage and Architectural Replay
  • S. Das et al. [VLSI 2005]
  • D. Blaauw et al. [ISSCC 2008]

[Figure: pipeline stages IF, ID, EX, MEM, WB]

  • K. Bowman et al. [ISSCC 2008]
  • Requires Designer Effort
  • RTL written with Razor in mind

SLIDE 7

Fundamentals of Bubble Razor

  • Two-Phase Latch Timing
  • Automatically convert Flip-Flop based design
  • Time Borrowing as Correction Mechanism
  • Does not modify design architecture
  • Does not require reloading / replaying instructions
  • Local Correction (Bubbles)
  • Break requirement of stalling entire chip at once

SLIDE 8

Two Phase Latch Razor Timing

[Figure: latches LD A and LD B clocked by CLK A and CLK B]

  • Larger Speculation Window
  • Minimum delay constraint the same as conventional design

SLIDE 9

Time Borrowing as Error Correction

[Figure: a DFF pair replaced by latches (LD) with TD blocks; waveforms of latch gate G (closed/open), data D, and Error]

Bubble Razor – Switch to Latches, Borrow Time

  • No Hold Time Issues
  • Architecture Agnostic
  • Push-button approach
  • No metastability on datapath
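
The time-borrowing idea above can be illustrated with a small timing model. This is a minimal sketch, not the authors' circuit: it assumes an ideal two-phase latch pipeline in which latch i opens at i times the phase length and stays transparent for one phase, and it ignores the delay of the error propagation circuits. The function name `classify_arrivals` and the picosecond values are hypothetical.

```python
def classify_arrivals(stage_delays_ps, phase_ps):
    """Classify data arrival at each latch of an idealized two-phase latch pipeline.

    Assumption (not from the slides): latch i opens at i * phase_ps and
    closes one phase later; data launches at time 0.
    """
    results = []
    depart = 0  # data launches when the first latch opens
    for i, delay in enumerate(stage_delays_ps, start=1):
        opens = i * phase_ps        # receiving latch becomes transparent
        closes = opens + phase_ps   # receiving latch closes one phase later
        arrive = depart + delay
        if arrive <= opens:
            status = "on time"
        elif arrive <= closes:
            status = "borrowed"     # late, but still captured: flagged as an error
        else:
            status = "missed"       # arrived after the latch closed
        results.append((arrive, status))
        # A transparent latch passes late data straight through.
        depart = max(arrive, opens)
    return results


timeline = classify_arrivals([900, 1300, 600], phase_ps=1000)
for arrive, status in timeline:
    print(arrive, status)
```

In this example the slow middle stage arrives 300 ps after its latch opens, so it borrows time from the next stage, whose slack absorbs it. The sketch does not model the local stall that Bubble Razor adds to pay back the borrowed time.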

SLIDE 10

Stalling Locally with Bubbles

[Figure: latches 1 to 8 shown over time, titled "Stalling the Clock Locally"; final caption: "Eventually it all resolves"]

  • With flops, all registers hold data
  • With latches, half registers hold bubbles
  • Every latch stalls exactly once
  • Communication only between neighbors

Figure captions:

  • Not immediately overwritten
  • Yellow stalls
  • Yellow tells downstream no new data exists
  • Yellow tells Red to stall
  • Red tells Purple to stall
  • Yellow takes off again
  • Purple tells Blue to stall
  • Blue tells Green to stall

SLIDE 11

Timing of Clock Waveforms

[Figure: clock waveforms for latches 1 to 4 over phases 1 to 10. Annotations: Should Arrive; Timing violation; Give time to Recover; Prevent Double Sampling inst1; Prevent Losing inst2; Prevent Losing inst3]

SLIDE 13

Timing of Clock Waveforms

[Figure: clock waveforms for latches 1 to 10 over phases 1 to 10]

SLIDE 14

Timing of Clock Waveforms

[Figure: clock waveforms for latches 1 to 10 over phases 1 to 10. Annotations: Timing violation; Stall Neighbors; Stall 3]

SLIDE 15

The Required Circuitry

[Figure: four latch stages, each with blocks labeled CG, TD, and B]

SLIDE 16

Error Detection And OR Circuitry

[Figure: error detection circuit with three TD blocks]

SLIDE 17

Clock Gate Control Logic

  • A cluster stalls and sends bubbles to all neighbors if both conditions hold:
  • (1) It was told to stall by a neighboring cluster
  • (2) It did not stall in the previous cycle
  • This is equivalent to sending bubbles only to its “other” neighbors

[Figure: clock gate control logic, blocks labeled CG and B]
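
The stall rule above can be simulated on a small ring of clusters. This is a minimal sketch under stated assumptions, not the authors' control logic: neighboring clusters are assumed to be of opposite clock phase, so a cluster's "previous cycle" is taken to be two phases earlier, and the function name `propagate_bubbles` and the eight-cluster ring are hypothetical.

```python
def propagate_bubbles(neighbors, first_stall):
    """Return {cluster: [phases in which it stalled]} after one timing error.

    Assumptions (not from the slides): adjacent clusters alternate clock
    phase, and "previous cycle" for a cluster means two phases earlier.
    """
    stalls = {cluster: [] for cluster in neighbors}
    stalls[first_stall].append(0)
    senders = {first_stall}
    phase = 0
    while senders:
        phase += 1
        # Every cluster that stalled last phase tells all of its neighbors.
        told = {n for s in senders for n in neighbors[s]}
        senders = set()
        for cluster in told:
            # Ignore the bubble if this cluster stalled in its previous cycle.
            if phase - 2 not in stalls[cluster]:
                stalls[cluster].append(phase)
                senders.add(cluster)
    return stalls


# Hypothetical ring of 8 clusters, each connected to its two neighbors.
ring = {i: [(i - 1) % 8, (i + 1) % 8] for i in range(8)}
stalls = propagate_bubbles(ring, first_stall=0)
for cluster, phases in sorted(stalls.items()):
    print(cluster, phases)
```

In this model the bubbles travel both ways around the ring and cancel where they meet, so every cluster stalls exactly once and only neighbor-to-neighbor signals are needed.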

SLIDE 18

Clustering with hMETIS

  • Widely used hypergraph partitioning program, hMETIS
  • Clusters must only contain members with the same phase
  • Create two graphs, and partition independently
  • Connected in hMETIS graph if transitively connected in circuit
  • Edge weight = number of latches that form transitive connection

[Figure: example circuit with latches 1 to 6 and the resulting graph with edge weights of 1 and 2]
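
One way to read the graph construction above is sketched below. It is an interpretation, not the authors' script: it assumes two same-phase latches are "transitively connected" when they share an opposite-phase neighbor, and it counts those shared neighbors as the edge weight. The function name `same_phase_graphs` and the example circuit are hypothetical, and the call to hMETIS itself is left out.

```python
from collections import defaultdict
from itertools import combinations


def same_phase_graphs(phase, links):
    """Build one weighted graph per clock phase from latch-to-latch links.

    phase: {latch: "pos" or "neg"}; links: iterable of (latch, latch) pairs.
    Assumption: two same-phase latches are connected when they share an
    opposite-phase neighbor; the weight counts those shared neighbors.
    """
    nbrs = defaultdict(set)
    for a, b in links:
        nbrs[a].add(b)
        nbrs[b].add(a)

    graphs = {"pos": defaultdict(int), "neg": defaultdict(int)}
    for middle, adjacent in nbrs.items():
        # Latches on the far side of `middle` all have the opposite phase.
        others = sorted(n for n in adjacent if phase[n] != phase[middle])
        for a, b in combinations(others, 2):
            graphs[phase[a]][(a, b)] += 1
    return {name: dict(edges) for name, edges in graphs.items()}


# Hypothetical circuit: latches 1-3 are positive phase, 4-5 negative phase.
phase = {1: "pos", 2: "pos", 3: "pos", 4: "neg", 5: "neg"}
links = [(1, 4), (2, 4), (1, 5), (2, 5), (3, 5)]
graphs = same_phase_graphs(phase, links)
print(graphs["pos"])
print(graphs["neg"])
```

Each of the two resulting graphs would then be written out in the partitioner's input format and partitioned independently, as the slide describes.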

SLIDE 19

Clustering Results

  • Tradeoff between sizes of OR gates

  • Combining errors
  • Combining bubbles
  • 100 negative clusters
  • 70 positive clusters

SLIDE 20

Two Port Memory Boundary Approach

Must fit edge triggered memory into stalling algorithm

SLIDE 21

“Managing” the Synthesis/APR Tools

  • Want balanced pipelines, no time borrowing
  • Model razor latches as flip flops
  • Dynamic OR always followed by latch
  • Model dynamic OR as static
  • Model latch as flip flop (captures when latch closes)
  • Use regular ICG cells
  • Can use conventional clock tree synthesis
  • Final design appears to be relatively “normal”
  • Flip-flop based design with clock gating
  • Everything is timing constrained
  • “Razorization” process is entirely automated
  • Synthesis and netlist transformation scripts

SLIDE 22

Retiming And Number of Latches

  • Retiming can increase the number of latches
  • Results in area overhead

SLIDE 23

Area Overhead of Latch Transformation

SLIDE 24

Speculation Window Size

  • Full Clock Phase (100%) Minus Delay of Error Propagation Circuits
  • Maximum allowed by technique
  • Number / Location of Latches with Error Checking
  • Maximum slowdown that does not result in unchecked error

[Figure: Speculation Window]

SLIDE 25

Where Error Checking is Needed

  • If circuit delay suddenly becomes 130% of its nominal value, all timing errors will be detected before the circuit fails

[Figure: latches A, B, C, D with a 30% Speculation Window; Delay at PoFF and Delay at Worst for each stage (values shown: 15%, 20%, 26%, 50%, 65%, 91%, 156%); events Leave B, Arrive C, Arrive D; each stage is tested against ">50?"]

SLIDE 26

Path Distribution for Cortex-M3

[Figure: path distributions for Positive Latches, Negative Latches, All Latches, and Flip Flops]

SLIDE 27

Area Increase from Error Checking

[Figure: area increase from error checking, annotated "20% Area Overhead" and "30% Timing Speculation"]

SLIDE 28

Implementation on ARM Cortex-M3

SLIDE 29

Characterizing Throughput / Energy

  • Operating Point Set for Worst Case Operation
  • 85°C
  • 10% Supply Droop
  • 2σ Process
  • 5% Safety Margin
  • 200 MHz at 1.0 V

SLIDE 30

Gains from Bubble Razor

SLIDE 31

Gains from Bubble Razor

SLIDE 32

Bubble Razor Results

[Figure: result panels labeled Slow, Average, and Fast]

SLIDE 33

Bubble Razor Results

Throughput:

  Operating point   Frequency   Throughput
  Worst Case        200 MHz     8.5 FFT/ms
  First Failure     333 MHz     14.2 FFT/ms
  Optimum           425 MHz     17.3 FFT/ms

Energy:

  Operating point   Supply    Energy
  Worst Case        1.0 V     3.08 μJ/FFT
  First Failure     0.775 V   1.42 μJ/FFT
  Optimum           0.725 V   1.18 μJ/FFT
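
The gains implied by these measurements can be checked with a few lines of arithmetic. This is a small sketch using only the numbers on this slide; the helper names `percent_gain` and `percent_saving` are hypothetical.

```python
def percent_gain(baseline, improved):
    """Relative increase over the baseline, in percent."""
    return 100.0 * (improved - baseline) / baseline


def percent_saving(baseline, improved):
    """Relative reduction from the baseline, in percent."""
    return 100.0 * (baseline - improved) / baseline


# Values from the slide: worst-case margined point versus the optimum point.
throughput_fft_per_ms = {"worst_case": 8.5, "first_failure": 14.2, "optimum": 17.3}
energy_uj_per_fft = {"worst_case": 3.08, "first_failure": 1.42, "optimum": 1.18}

throughput_gain = percent_gain(
    throughput_fft_per_ms["worst_case"], throughput_fft_per_ms["optimum"]
)
energy_saving = percent_saving(
    energy_uj_per_fft["worst_case"], energy_uj_per_fft["optimum"]
)

print(f"throughput gain: {throughput_gain:.1f}%")
print(f"energy saving:   {energy_saving:.1f}%")
```

The optimum point works out to roughly double the worst-case throughput and a bit over 60% less energy per FFT.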

SLIDE 34

Conclusion

  • First Razor style implementation on a complete, commercial processor (ARM Cortex-M3).
  • Proposed two-phase latch based Razor technique
  • Novel local replay algorithm
  • Demonstrated automated nature of technique
  • Successfully implemented and fabricated in 45nm
  • 60% energy efficiency or 100% throughput increase over worst case margining