
SLIDE 1

Would Error Correction Provide a Benefit in Classical Computers?

5 Nov 2013, INTRIQ
Thomas Szkopek, Department of Electrical and Computer Engineering

SLIDE 2

Acknowledgements

Vwani Roychowdhury, UCLA
Eli Yablonovitch (provocateur), UC Berkeley
John Damoulakis, USC/ISI
Dimitri Antoniadis, MIT

SLIDE 3

system reliability

ENIAC, 1946: 17,468 vacuum tubes, mean time between faults ~2 days

IBM BlueGene/L, 2006 (Lawrence Livermore National Laboratory): 131,072 processors, mean time between faults ~6 days

SLIDE 4

system reliability

“[with] current state-of-the-art fault-tolerance strategy, checkpoint/restart, for a 1 PFlop/s system… a computational job that could complete in 100 hours in a failure-free environment will actually take 251 hours”

“While several [high-end computing] vendors are looking to address reliability at the hardware level, the costs are proving to be staggeringly high in both money and power.”

DeBardeleben et al., High-End Computing Resilience: Analysis of Issues Facing the HEC Community and Path-Forward for Research and Development, Los Alamos National Laboratory, 2010, http://institute.lanl.gov/resilience/docs/

let’s look at the hardware level!

SLIDE 5

error correction: memory and communications

[Diagram: transmitter (write) with reliable encoding → channel (memory), an identity operation subject to errors → receiver (read) with reliable decoding & error correction]

reliable encoding, decoding and error correcting hardware
efficient, complex codes are used

SLIDE 6

error correction: computation

[Diagram: encoder with reliable encoding → logic unit performing encoded logic, subject to errors → decoder with reliable decoding & error correction]

reliable encoding, decoding and error correcting hardware
logic performed in code space (e.g. Reed-Muller codes)
D. Pradhan & S. Reddy, IEEE Trans. Comp. 21, 1331 (1972).
however, it is likely that all hardware is equally (un)reliable

SLIDE 7

error correction: computation

[Diagram: alternating error correction and logic stages, with errors entering at every stage]

errors occur in all hardware
never decode bits or they will be corrupted, in other words:

all operations must be performed in protected code space!

SLIDE 8

protecting 1 bit : repetition

repetition code:
“0” = 0 0 0 0 0
“1” = 1 1 1 1 1

error correction by majority vote:
0 0 0 1 0 → 0 0 0 0 0
1 1 0 1 1 → 1 1 1 1 1
0 1 0 1 1 → 1 1 1 1 1
0 1 0 0 1 → 0 0 0 0 0

J. von Neumann, Lectures on Probabilistic Logics and the Synthesis of Reliable Organisms from Unreliable Components, 1952.

single bit flip: $p$
logical bit flip: $P = 20p^3 + \ldots$

[Plot: logical error rate $P = 20p^3$ versus physical error rate $p$]

SLIDE 9

protecting 1 bit

[Diagram: network of majority (MAJ) gates]

If majority gates are error-free, then the majority voting process is error-free if <50% of input bits are in error.

MAJ = majority vote
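The claim above can be checked exhaustively for a small code. The following is a minimal sketch, assuming a perfect (error-free) voter and the 5-bit repetition code from the previous slide; the function names are illustrative and not taken from the talk.

```python
from itertools import combinations

def majority(bits):
    """Return the majority value (0 or 1) of an odd-length sequence of bits."""
    return 1 if sum(bits) > len(bits) // 2 else 0

def decode(word):
    """Error-correct a repetition codeword: reset every bit to the majority value."""
    return [majority(word)] * len(word)

def corrects_all_minority_errors(n=5):
    """Try every pattern of fewer than n/2 bit flips on both codewords (n odd)."""
    for logical in (0, 1):
        codeword = [logical] * n
        for weight in range(n // 2 + 1):
            for flipped in combinations(range(n), weight):
                noisy = [b ^ 1 if i in flipped else b
                         for i, b in enumerate(codeword)]
                if decode(noisy) != codeword:
                    return False
    return True

if __name__ == "__main__":
    for word in ([0, 0, 0, 1, 0], [1, 1, 0, 1, 1], [0, 1, 0, 1, 1], [0, 1, 0, 0, 1]):
        print(word, "->", decode(word))
    print("all minority error patterns corrected:", corrects_all_minority_errors(5))
```

The check only covers the idealized case in which the voter itself never fails.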

SLIDE 10

protecting 1 bit

[Photo: President Harry S. Truman]

[Diagram: network of majority (MAJ) gates, each with error probability p]

If majority gates are error-free, then the majority voting process is error-free if <50% of input bits are in error.

If majority gates are error-prone, then the majority voting process is error-prone.

MAJ = majority vote

error with probability p
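A short calculation illustrates why a faulty voter limits the benefit of voting. The sketch below assumes a simple fault model that is not stated on the slide: the gate computes the correct majority and then flips its own output with probability `p_gate`, independently of the inputs.

```python
from math import comb

def ideal_vote_error(q, n=3):
    """Probability that a perfect n-input majority vote is wrong when each
    input is independently wrong with probability q (n odd)."""
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(n // 2 + 1, n + 1))

def noisy_vote_error(q, p_gate, n=3):
    """Same vote, but the gate flips its own output with probability p_gate
    (assumed fault model)."""
    ideal = ideal_vote_error(q, n)
    return ideal * (1 - p_gate) + (1 - ideal) * p_gate

if __name__ == "__main__":
    p_gate = 1e-3  # hypothetical gate error rate
    for q in (1e-1, 1e-2, 1e-3, 0.0):
        print(q, ideal_vote_error(q), noisy_vote_error(q, p_gate))
```

Under this model the output error never drops below the gate error itself, however clean the inputs are, which is the motivation for a fault-tolerant arrangement of the gates.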

SLIDE 11

fault tolerant architecture

[Diagram: majority gate M; error correction; concatenation: copy the bits ×3, majority vote ×3]

Triplicate repetition code and fault-tolerant majority

PO Boykin, VP Roychowdhury, Proc. Int. Conf. Dep. Sys. Net. 2005.

SLIDE 12

fault tolerant architecture

error per majority gate: $p$

error with $L$ concatenations: $P \le \varepsilon \left( \frac{p}{\varepsilon} \right)^{2^L}$, with $\varepsilon \sim \frac{1}{108}$

bits with $L$ concatenations: $N = 9^L$ (with ancillae)

error rate versus bits: $P \le \varepsilon \left( \frac{p}{\varepsilon} \right)^{N^{\log 2 / \log 9}}$

PO Boykin, VP Roychowdhury, Proc. Int. Conf. Dep. Sys. Net. 2005.
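The two forms of the bound are the same expression, since $N = 9^L$ gives $2^L = N^{\log 2 / \log 9}$. The sketch below evaluates both; the gate error rate `p = 1e-3` is a hypothetical value chosen to lie below the quoted threshold of 1/108.

```python
import math

EPSILON = 1 / 108  # threshold value quoted on the slide

def logical_error_bound(p, levels, eps=EPSILON):
    """Upper bound P <= eps * (p / eps)**(2**levels) after `levels` concatenations."""
    return eps * (p / eps) ** (2 ** levels)

def bits_required(levels):
    """Bits (including ancillae) used by `levels` concatenations: N = 9**levels."""
    return 9 ** levels

def logical_error_bound_from_bits(p, n_bits, eps=EPSILON):
    """The same bound written in terms of N, using 2**L = N**(log 2 / log 9)."""
    return eps * (p / eps) ** (n_bits ** (math.log(2) / math.log(9)))

if __name__ == "__main__":
    p = 1e-3  # hypothetical gate error rate, below threshold
    for levels in range(5):
        n = bits_required(levels)
        print(levels, n, logical_error_bound(p, levels),
              logical_error_bound_from_bits(p, n))
```

Because the exponent grows only as $N^{0.315}$, each further reduction in the bound costs a ninefold increase in bits.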

SLIDE 13

protecting more than 1 bit?

Can universal logic operations be performed in code space other than repetition codes? (difficulty lies in the parity bits)
Unknown. Best result is with an evolving Reed-Muller (RM) code space.

Is the overhead prohibitive? Unknown.

[Diagram: alternating error correction and logic stages]

SLIDE 14

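what about device physics?

[Circuit diagram: complementary transistor inverter between supply rails +V/2 and −V/2, with p-channel conductance Gp, n-channel conductance Gn, capacitances C, input charge Nin, and output charge Nout; each transistor has a source, a drain, and a gate-source voltage VGS]

complementary transistor inverter:
$N_{in}$ = input charge
$N_{out}$ = output charge
$N = CV/e$ = maximum charge
$G_n$ = n-channel conductance
$G_p$ = p-channel conductance

Assume sub-threshold conductance / thermionic emission through channels:

$G_p = G_0 \exp\left( +eV_{GS} / k_B T \right)$

$G_n = G_0 \exp\left( -eV_{GS} / k_B T \right)$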

SLIDE 15


CNT inverter

Ph. Avouris, et al., Physica B 323 (2002) 6–14

Si nanowire inverter

D. Wang, et al., Small 2 (2006) 1153-8

complementary logic

ZnO nanowire inverter

S. Roy, et al., Nanotech 21 (2010) 245306

[Circuit diagram: complementary transistor inverter]

SLIDE 16

complementary logic

[Plot: inverter transfer characteristic, Nout versus Nin, both axes spanning −N/2 to +N/2]

metal-insulator transition in transistor channels:

$N_{out} = \frac{N}{2} \cdot \frac{G_p - G_n}{G_p + G_n} = -\frac{N}{2} \tanh\left( \frac{N_{in}}{k_B T C / e^2} \right)$

information theoretic perspective:
single charge -- physical bit
total charge -- logical bit
signal restoration -- majority vote
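The signal-restoration reading of the tanh transfer characteristic can be illustrated numerically. This is a sketch only: the capacitance and voltage are the 45nm-node figures from the CMOS table later in the deck, T = 300 K is an assumption, and chaining two identical inverters is an illustrative choice rather than a circuit from the talk.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

def inverter_output(n_in, n_max, capacitance, temperature=300.0):
    """Output charge number: Nout = -(N/2) * tanh(Nin / (kB T C / e^2))."""
    thermal_scale = K_B * temperature * capacitance / E_CHARGE**2
    return -(n_max / 2) * math.tanh(n_in / thermal_scale)

def restore(n_in, n_max, capacitance, stages=2, temperature=300.0):
    """Pass a charge level through `stages` identical inverters in series."""
    level = n_in
    for _ in range(stages):
        level = inverter_output(level, n_max, capacitance, temperature)
    return level

if __name__ == "__main__":
    capacitance = 19.7e-18                    # F
    voltage = 0.97                            # V
    n_max = capacitance * voltage / E_CHARGE  # N = CV/e
    for n_in in (-60, -10, -2, 2, 10, 60):
        print(n_in, round(restore(n_in, n_max, capacitance), 2))
```

With these numbers, even a weak input of a couple of electrons is driven back toward ±N/2 after two stages, which is the sense in which the inverter acts like a majority vote over the charge.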

SLIDE 17


complementary logic

universal NAND gate:

SLIDE 18

complementary logic

[Plot: inverter transfer characteristic, Nout versus Nin, both axes spanning −N/2 to +N/2, with noise margins NM and the input charge distribution p(Nin) of width δq]

$\delta q^2 = k_B T C$

$\delta q^2 = k_B T C + T(\delta q^2)$

Local noise dominates when: $\delta q / e \ll N_M$

Growth of charge fluctuations / error is suppressed by transistor error correction.
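To get a feel for the size of the thermal charge fluctuation, the sketch below evaluates $\sqrt{k_B T C}$ in units of the electron charge. The capacitances are the gate capacitances from the CMOS table later in the deck, and T = 300 K is an assumption.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C

def thermal_charge_noise_electrons(capacitance, temperature=300.0):
    """RMS thermal charge fluctuation sqrt(kB * T * C), expressed in electrons."""
    return math.sqrt(K_B * temperature * capacitance) / E_CHARGE

if __name__ == "__main__":
    for cg_attofarad in (19.7, 10.0, 4.0):
        noise = thermal_charge_noise_electrons(cg_attofarad * 1e-18)
        print(f"C = {cg_attofarad} aF: dq/e = {noise:.2f}")
```

For these capacitances the fluctuation is on the order of one electron, to be compared with a noise margin measured in tens of electrons.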

SLIDE 19

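complementary logic

[Plot: inverter transfer characteristic, Nout versus Nin, both axes spanning −N/2 to +N/2, with noise margins NM, the input charge distribution p(Nin) with $\delta q^2 = k_B T C$, and the region of logical error]

Probability of logical error:

$P \approx \frac{1}{2} \left( \frac{2}{\pi N \ln(1/\varepsilon)} \right)^{1/2} \varepsilon^N, \qquad \varepsilon = \exp(-eV / 8 k_B T)$

Error scales as an ideal majority vote of N electrons with an error p per electron:

$p = \frac{\varepsilon^2}{4}$

$P = \binom{N}{N/2} p^{N/2} \approx \left( \frac{2}{\pi N} \right)^{1/2} (4p)^{N/2}$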
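The formulas on this slide can be compared numerically. The sketch below evaluates the binomial expression, its Stirling approximation, and the transistor result with $p = \varepsilon^2/4$; the values of N and eV/kBT in the demo are illustrative choices, not results from the talk.

```python
import math

def majority_vote_error(n, p):
    """Leading-order ideal majority-vote error: C(N, N/2) * p**(N/2), N even."""
    return math.comb(n, n // 2) * p ** (n // 2)

def majority_vote_error_stirling(n, p):
    """Stirling form of the same quantity: sqrt(2 / (pi N)) * (4p)**(N/2)."""
    return math.sqrt(2 / (math.pi * n)) * (4 * p) ** (n / 2)

def transistor_logic_error(n, ev_over_kt):
    """P ~ (1/2) * sqrt(2 / (pi N ln(1/eps))) * eps**N, eps = exp(-eV / (8 kB T))."""
    eps = math.exp(-ev_over_kt / 8)
    return 0.5 * math.sqrt(2 / (math.pi * n * math.log(1 / eps))) * eps ** n

if __name__ == "__main__":
    ev_over_kt = 970 / 26            # illustrative: 0.97 eV at kB*T = 26 meV
    eps = math.exp(-ev_over_kt / 8)
    p = eps ** 2 / 4                 # equivalent per-electron error
    for n in (10, 34, 68):
        print(n, majority_vote_error(n, p),
              majority_vote_error_stirling(n, p),
              transistor_logic_error(n, ev_over_kt))
```

With $p = \varepsilon^2/4$ the transistor expression and the ideal majority vote share the factor $\varepsilon^N = (4p)^{N/2}$ and differ only by the slowly varying prefactor $1 / (2\sqrt{\ln(1/\varepsilon)})$.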

SLIDE 20

reliability and redundancy

Logical error rate $P$ for $N$ particles, given an error rate $p$ per particle:

1-bit architecture: $P \le \varepsilon_T \left( \frac{p}{\varepsilon_T} \right)^{N^{\log 2 / \log 9}}$ (sub-exponential suppression in N)

ideal majority vote: $P \approx \left( \frac{2}{\pi N} \right)^{1/2} (4p)^{N/2}$ (exponential suppression in N)

transistor logic circuit, $p \sim \exp(-eV / k_B T)$: $P \approx \left( \frac{1}{\pi N \ln(1/4p)} \right)^{1/2} (4p)^{N/2}$ (exponential suppression in N)

ballistic gates, $p \sim (\delta r / r)^2$: $P \approx \left( \frac{2}{\pi N} \right)^{1/2} (4p)^{N/2}$ (exponential suppression in N)

T. Szkopek et al., PRL 106, 176801 (2011).

SLIDE 21

classical computing with spin

magnetic moments

[Diagram: two spins J1 and J2 separated by distance r; spin placement accurate to within δr]

interaction: $V \sim \mu^2 / r^3$

interaction error: $\delta V \sim V \cdot \delta r / r$

rotation for distinguishable states: $\varphi = V \cdot t / \hbar = \pi$

rotation error: $\delta\varphi \sim \pi \cdot \delta r / r$

probability of erroneous spin flip!

SLIDE 22

classical computing with spin

spin 1/2

$\delta\varphi \sim \pi \cdot \delta r / r$

Probability of error: $p \sim \frac{1}{4} \delta\varphi^2$

spin j = N × 1/2

Probability of error: $p \sim \left( \frac{2}{\pi N} \right)^{1/2} \delta\varphi^N$

SLIDE 23

classical computing with spin

N × spin 1/2

Probability of single error: $p \sim \frac{1}{4} \delta\varphi^2$

Majority vote on N spins: $P = \binom{N}{N/2} p^{N/2} \sim \left( \frac{2}{\pi N} \right)^{1/2} \delta\varphi^N$
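The scaling relations for spins can be put into numbers. This is a sketch under the slide's order-of-magnitude estimates ($\delta\varphi \sim \pi\,\delta r / r$, $p \sim \delta\varphi^2/4$); the placement tolerance `dr_over_r = 0.01` is a hypothetical value, and the expressions are only meaningful for $\delta\varphi < 1$.

```python
import math

def spin_flip_probability(dr_over_r):
    """Single spin-1/2 error: p ~ (1/4) * dphi**2 with dphi ~ pi * dr / r."""
    dphi = math.pi * dr_over_r
    return 0.25 * dphi ** 2

def majority_of_spins_error(n, dr_over_r):
    """Majority vote on N spins: P ~ sqrt(2 / (pi N)) * dphi**N."""
    dphi = math.pi * dr_over_r
    return math.sqrt(2 / (math.pi * n)) * dphi ** n

if __name__ == "__main__":
    dr_over_r = 0.01  # hypothetical 1% placement tolerance
    print("single spin:", spin_flip_probability(dr_over_r))
    for n in (2, 10, 30):
        print(n, majority_of_spins_error(n, dr_over_r))
```

Since $(4p)^{N/2} = \delta\varphi^N$, this is the same exponential suppression in N as the ideal majority vote, with the per-particle error set by geometry rather than by temperature.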

SLIDE 24

reliability and redundancy

Logical error rate $P$ for $N$ particles, given an error rate $p$ per particle:

1-bit architecture: $P \le \varepsilon_T \left( \frac{p}{\varepsilon_T} \right)^{N^{\log 2 / \log 9}}$ (sub-exponential suppression in N)

ideal majority vote: $P \approx \left( \frac{2}{\pi N} \right)^{1/2} (4p)^{N/2}$ (exponential suppression in N)

transistor logic circuit, $p \sim \exp(-eV / k_B T)$: $P \approx \left( \frac{1}{\pi N \ln(1/4p)} \right)^{1/2} (4p)^{N/2}$ (exponential suppression in N)

ballistic gates, $p \sim (\delta r / r)^2$: $P \approx \left( \frac{2}{\pi N} \right)^{1/2} (4p)^{N/2}$ (exponential suppression in N)

T. Szkopek et al., PRL 106, 176801 (2011).

SLIDE 25

CMOS

| | 45nm node (2010) | 21nm node (2015) | 11.9nm node (2020) |
|---|---|---|---|
| L, gate length [nm] | 27 | 17 | 10.7 |
| Cg, gate capacitance [aF] | 19.7 | 10.0 | 4.0 |
| V, operating voltage [V] | 0.97 | 0.81 | 0.68 |
| N, electrons per inverter gate | 240 | 100 | 34 |
| N, electrons per NAND gate | 480 | 200 | 68 |
| M, transistors/chip | 2.2×10^9 | 8.8×10^9 | 35×10^9 |
| f, clock freq. [GHz] | 5.9 | 8.5 | 12.4 |
| P, error probability at 1000 FITs | 2×10^−29 | 4×10^−30 | 4×10^−31 |
| P, error probability at 1 fault/year | 2×10^−27 | 4×10^−28 | 7×10^−29 |

International Technology Roadmap for Semiconductors, 2009 edition.

[Figure: Intel 45nm strained-Si transistor, with source, drain, and gate labeled]
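The table does not say how the electron counts and error budgets were derived. The sketch below tries one plausible reading, and every modelling step in it is an assumption rather than the author's stated method: N = Cg·V/e per transistor with two transistors per inverter, and one opportunity for failure per transistor per clock cycle. Under those assumptions the results land on the same order of magnitude as the tabulated values.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C
HOURS_PER_YEAR = 8766.0     # 365.25 days
FIT_PER_HOUR = 1e-9         # 1 FIT = one failure per 10^9 hours

# Values copied from the table: gate capacitance [F], operating voltage [V],
# transistors per chip, clock frequency [Hz].
NODES = {
    "45nm": (19.7e-18, 0.97, 2.2e9, 5.9e9),
    "21nm": (10.0e-18, 0.81, 8.8e9, 8.5e9),
    "11.9nm": (4.0e-18, 0.68, 35e9, 12.4e9),
}

def electrons_per_inverter(cg, v):
    """Assumed model: N = Cg * V / e per transistor, two transistors per inverter."""
    return 2 * cg * v / E_CHARGE

def error_budget(failures_per_hour, m, f):
    """Assumed model: each of the m transistors can fail once per clock cycle."""
    return failures_per_hour / (m * f * 3600.0)

if __name__ == "__main__":
    for node, (cg, v, m, f) in NODES.items():
        print(node,
              round(electrons_per_inverter(cg, v)),
              error_budget(1000 * FIT_PER_HOUR, m, f),
              error_budget(1 / HOURS_PER_YEAR, m, f))
```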

SLIDE 26

error rate comparison

[Plot: error rate, log10 P, versus electron number N (10 to 80); curves: sub-threshold complementary logic at 0.97V, 0.81V, and 0.68V; architecture, thermal limit]

T. Szkopek et al PRL 106, 176801 (2011).

structural disorder in transistor structures will increase error rates

1 electron, eV/kBT = 970meV/26meV

N = 30 at eV = 1.00 eV is equivalent to N = 3000 at eV = 10 meV
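The stated equivalence follows from the dominant factor $\varepsilon^N = \exp(-N \cdot eV / 8 k_B T)$ in the logical error probability, which depends only on the product N·eV. A minimal check, assuming kBT = 26 meV as on the slide and ignoring the slowly varying prefactor:

```python
import math

KT_EV = 0.026  # kB * T in eV (26 meV, as quoted on the slide)

def log10_exponential_factor(n, ev):
    """log10 of eps**N with eps = exp(-eV / (8 kB T)), the dominant factor in P."""
    return -n * ev / (8 * KT_EV) / math.log(10)

if __name__ == "__main__":
    print(log10_exponential_factor(30, 1.00))
    print(log10_exponential_factor(3000, 0.010))
```

Both calls give the same exponent, so trading a hundredfold lower energy per electron for a hundredfold more electrons leaves the leading-order error unchanged.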

SLIDE 27

conclusions

physics of transistors provides protection against logical errors

for 1-bit protection, it is better to prevent errors than to correct errors

error correction with multiple-bit code protection is an open problem

[Plot: error rate, log10 P, versus electron number N (10 to 80); curves: sub-threshold complementary logic at 0.97V, 0.81V, and 0.68V; architecture, thermal limit]

SLIDE 28

questions

is quantum error prevention preferable to quantum error correction?

can quantum error thresholds exceed classical error thresholds?
(is the quantum world more forgiving than the classical?)

are there more efficient classical codes that permit universal computation within code space?

SLIDE 29

thank you for your attention
