Information Processing in the Presence of Variability and Defects of Nanoscale



SLIDE 1

Information Processing in the Presence of Variability and Defects of Nanoscale

Sandip Tiwari
st222@cornell.edu

Today's information processing systems are designed to be totally predictable and reproducible, and are designed hierarchically to be manageable. Plenty of room for inefficiencies.

What can we do within this model?
Are there other opportunities at the hardware-software wall?

Complexity of scales connecting nanoscale to terascale

Random and non-random variability and defects

Acknowledgements: Arvind Kumar, Ravi Nair, Chris Liu, Ravishankar Sundararaman, Joshua Rubin, Howard Davidson, Jae Yoon Kim, Wei Min Chan, Moon Kyung Kim …

Power and Heat

Rapid changes in fields -> excess charged particle energy loss to medium -> heat

Quantities labeled in the heat-removal relation:
  • energy per operation
  • activity factor
  • cross-section area of heat removal
  • density of heat removal
  • time constant of energy-consuming operation

3D Heat Spreading: Logic regions
1D Heat Spreading: Array regions

Tiwari_04_2009_Korea_NSF_Workshop.ppt – Apr 28, 2009

10 nm λ: 7.5x10^12 λ^2 in 1 inch^2 => 10^12 devices/chip

  • S. Tiwari et al., IEEE NMDC (2006)
SLIDE 2

Hierarchy and Data Movement: Energy

Hierarchy: System Scale, Circuit Scale, Interconnects, Device Network, Device

A 16 nm processor! You have heard plenty about the device issues here in the morning

  • - The corollary of the discussion is power
  • - this is also true for communications

RoadRunner Supercomputer

  • 1 Petaflops
  • 6562 Dual-core AMD Opteron chips
  • 12240 Cell chips (used in Sony Playstation 3)
  • ~15 Tera transistors
  • 98 Terabytes of memory (~800 Terabits)
  • 278 racks
  • 2.35 MW of power

Element               Energy
32b integer op        0.35 pJ
64b floating op       7 pJ
Instruction exec      210 pJ
32b 16K RAM read      11 pJ
32b across 1mm        5 pJ
32b across 20mm       100 pJ
32b off chip          320 pJ

Source: W. Dally (2008)

Moving data is expensive; LVDS mitigates only partly
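To make the point about data movement concrete, the following minimal sketch expresses each energy figure quoted above (attributed to W. Dally, 2008) as a multiple of a 32b integer operation. The dictionary keys and the function name `cost_ratio` are illustrative choices, not part of the original slides.

```python
# Energy per element in picojoules, copied from the figures quoted on this slide.
ENERGY_PJ = {
    "32b integer op": 0.35,
    "64b floating op": 7.0,
    "instruction exec": 210.0,
    "32b 16K RAM read": 11.0,
    "32b across 1mm": 5.0,
    "32b across 20mm": 100.0,
    "32b off chip": 320.0,
}


def cost_ratio(element, baseline="32b integer op"):
    """Return how many baseline operations cost the same energy as `element`."""
    return ENERGY_PJ[element] / ENERGY_PJ[baseline]


if __name__ == "__main__":
    # Print every element relative to the cheapest arithmetic operation.
    for name in ENERGY_PJ:
        print(f"{name:>18}: {cost_ratio(name):8.1f} x integer op")
```

With these numbers, sending 32 bits off chip costs several hundred times the energy of the integer operation itself, which is the sense in which moving data is expensive.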

Computation Problems

Most computation problems are inexact

  • Speech and video
  • Recognition
  • Machine learning
  • Data compression
  • …

Decision making

  • Inexact inputs
  • Inexact model
  • Example: The Current Economic Crisis
  • Limited resources for decision making

FFTs, GPUs, ALUs, Compression Engines, Transform Engines, Neurons, Analog, … in coprocessing with Exact Computing.

SLIDE 3

Self-Assembly

  • W. M. Chan et al., unpublished

PS(68k)-PMMA(33.5k) HCP BCP film on PS-r-PMMA brush

Process Variability, Defects, Activation Energies

CNT: 20% diameter control, chirality?

Energy and Defect Rates

Time-evolution AFM snapshots of self-assembly front propagation

  • C. Harrison et al., Europhysics Letters (2005)
SLIDE 4

Fundamental Limits: Energy per Operation

[Plot: Energy per Operation (J), from 10^-22 to 10^-12, versus Feature Size (nm), from 1 to 100. Marked levels: Memory Read; 10000 e-, 1000 e-, 100 e-, 10 e-; Paramagnetic Stability Limit; Limit in dissipative information processing (bits deleted).]

  • S. Tiwari et al., IEEE NMDC (2006)
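For a sense of scale at the bottom of the energy axis, the sketch below evaluates the thermal energy kT and the Landauer bound kT ln 2 per erased bit. Two assumptions are made here that the slide does not state: that the "limit in dissipative information processing (bits deleted)" refers to the Landauer bound, and that the temperature is 300 K. The function names are illustrative.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)


def thermal_energy(temperature_k=300.0):
    """Return kT in joules. The 300 K default is an assumption."""
    return K_B * temperature_k


def landauer_limit(temperature_k=300.0):
    """Return kT ln 2, the minimum energy dissipated per erased bit."""
    return thermal_energy(temperature_k) * math.log(2)


if __name__ == "__main__":
    print(f"kT at 300 K      : {thermal_energy():.3e} J")
    print(f"kT ln 2 at 300 K : {landauer_limit():.3e} J")
    # Multiples of kT, for comparison with switching energies quoted in kT.
    for multiple in (100, 1000):
        print(f"{multiple:>4} kT          : {multiple * thermal_energy():.3e} J")
```

At room temperature the bound sits in the 10^-21 J decade, which is inside the range covered by the energy axis of the plot.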

Implications of Hierarchy - Communication

  • 1. When M parts are connected in series, the system fails when one component does.
  • 2. If N components, w work per component over time T, and in communication to N-1 others

Work produced in time T by individual component
Two-Component Communications
  • a: rate at which individual component works
  • b: rate at which transmitting to others and rate at which receiving from others

Total work output:
Rate of work:
If become large

SLIDE 5

Large System Robustness

Survival of a single component:
If m parts must concurrently survive:

[Plot: number of surviving parts (axis marks n0, ne) versus time (axis marks t0, te), with regions labeled "fab + burn-in" and "design life".]
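The expressions for this slide did not survive in the text, so the following is only a minimal sketch of the standard series-reliability argument, assuming that parts fail independently and share the same survival probability. The function names are hypothetical.

```python
def system_survival(p_component, m):
    """Probability that all m independent parts survive (series system)."""
    if not 0.0 <= p_component <= 1.0:
        raise ValueError("p_component must be a probability")
    return p_component ** m


def required_component_survival(p_system, m):
    """Per-part survival probability needed so that m parts survive together
    with probability p_system (inverse of system_survival)."""
    if not 0.0 < p_system <= 1.0:
        raise ValueError("p_system must be in (0, 1]")
    return p_system ** (1.0 / m)


if __name__ == "__main__":
    # Even very reliable parts give a fragile system when m is large.
    for m in (10, 1_000, 1_000_000):
        print(m, system_survival(0.999999, m))
```

Under these assumptions the system survival probability falls geometrically with m, which is why concurrent survival of a very large number of parts is so demanding.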

Working with Defects & Power Penalty

Rent's Rule: T = k N^p

  • T - the average no. of external terminals (pins) in a subcircuit or partition
  • N - the number of modules in the subcircuit
  • k - Rent's constant (average no. of pins per module)
  • p - Rent's exponent (0 < p < 1)

If logic blocks defective: Naccessible ~ ((1-d)N)^p

If wiring defective, the number of testable logic blocks: Naccessible ~ (1-d) N^p - a considerably more serious problem
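As a small worked example, the sketch below evaluates Rent's rule in its usual form, T = k N^p, together with the logic-defect expression ((1-d)N)^p quoted above. The value k = 4 is a hypothetical choice for illustration; N = 10^6 and p = 0.5 are the parameters that appear on the following slide. The wiring-defect expression is left out because its exponent placement is not certain from the text.

```python
def rent_terminals(n_modules, k, p):
    """Rent's rule: average number of external terminals T = k * N**p."""
    if not 0.0 < p < 1.0:
        raise ValueError("Rent's exponent must satisfy 0 < p < 1")
    return k * n_modules ** p


def accessible_with_logic_defects(n_modules, d, p):
    """Scaling quoted on the slide for defective logic blocks: ((1 - d) * N)**p."""
    if not 0.0 <= d <= 1.0:
        raise ValueError("defect rate d must be a probability")
    return ((1.0 - d) * n_modules) ** p


if __name__ == "__main__":
    n, p = 10 ** 6, 0.5
    print("terminals, k=4 (assumed):", rent_terminals(n, k=4, p=p))
    for d in (0.0, 1e-3, 1e-2):
        print(f"d={d:g}:", accessible_with_logic_defects(n, d, p))
```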

SLIDE 6

Defects: Configurability Penalty on Power

  • A. Kumar et al., DFT, 280 (2004)

Defects limit testability and limit the usable devices and interconnects

[Plot: Percentage of modules tested (0 to 100) versus Number of iterations (0 to 2000), for N = 1x10^6 and p = 0.5, with curves for dINT = 10^-2, 10^-3, 10^-4, 10^-5, and 0.]

[Plot: Factor increase in power dissipation (0 to 12) versus Number of modules (10^4 to 10^6), for dINT = 0.001, with curves for p = 0.5, 0.6, 0.7.]

[Diagram labels: Interconnect Defects, Number of iterations, Defect rate, Power penalty.]

Optimum size of maximum functional unit that is testable and configurable

Redundancy

[Plot, Nikolic et al.: Allowable probability of failure versus Redundancy; 10^12 devices assumed.]

  • Using R-fold modular redundancy
  • NAND multiplexing
  • Reconfiguration (using knowledge of faulty devices)

Defects place severe constraints
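To illustrate what R-fold modular redundancy buys, here is a minimal sketch of majority voting over R independent copies of a unit. It assumes an odd R, independent failures, and a perfect voter; it is a textbook simplification, not the analysis of Nikolic et al. The function name is hypothetical.

```python
from math import comb


def majority_vote_reliability(p_unit, r):
    """Probability that a majority of r independent copies are correct,
    given each copy is correct with probability p_unit (perfect voter)."""
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd integer")
    if not 0.0 <= p_unit <= 1.0:
        raise ValueError("p_unit must be a probability")
    needed = r // 2 + 1  # smallest number of correct copies that wins the vote
    return sum(
        comb(r, k) * p_unit ** k * (1.0 - p_unit) ** (r - k)
        for k in range(needed, r + 1)
    )


if __name__ == "__main__":
    for r in (1, 3, 5, 7):
        print(r, majority_vote_reliability(0.9, r))
```

Redundancy of this kind helps only when each copy is already more likely right than wrong; below that point voting makes the result worse, and every added copy costs area and power.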

SLIDE 7

Networks: Information Flow and Robustness

  • Internet (Burch-Cheswick)
  • Brain small-world network (O. Sporns)
  • Email Communications (Adamic-Adar)
  • Protein Interactions (Jeong)

Hub & Spoke arrangements: fixed degree at various length scales, the most likely candidates for robustness

  • A mixture of high-degree and low-degree nodes.
  • A mixture of long-range and short-range links.

Adaptation

DGFET Multiplier: Booth Multiplier

[Block diagram labels, in order: inputs A<7:0> and B<7:0>; Modified Booth Encoder; partial products PP0<15:0>, PP1<13:0>, PP2<11:0>, PP3<9:0>, PP4<7:0>; Wallace Tree Adder; CLA with two <15:0> inputs; Final Product<15:0>. Insets: an encoder; a 4-2 compressor.]

Number of bits    8
Power             7.72 mW
Frequency         ≈500 MHz
Device            100 nm
VDD               1 V

Nearly x10 reduction in power without sacrificing high speed. Compact and dense.

  • W. M. Chan (2007)
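The arithmetic behind the block diagram can be sketched in software. The following is a minimal radix-4 (modified Booth) recoding for unsigned 8-bit operands, which yields five partial products, matching the count PP0 to PP4 on the slide. Treating the operands as unsigned is an assumption, the function and table names are hypothetical, and the code models only the arithmetic, not the DGFET circuit or its Wallace tree and CLA stages.

```python
# Radix-4 Booth digit for each 3-bit window (b[2i+1], b[2i], b[2i-1]).
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_partial_products(a, b, bits=8):
    """Return the shifted partial products of a * b for unsigned operands."""
    limit = 1 << bits
    if not (0 <= a < limit and 0 <= b < limit):
        raise ValueError("operands must fit in the given number of bits")
    padded = b << 1  # append the implicit b[-1] = 0 below the LSB
    products = []
    # bits // 2 + 1 windows: the extra one covers the zero-extended top bits.
    for i in range(bits // 2 + 1):
        window = (padded >> (2 * i)) & 0b111
        products.append((BOOTH_DIGIT[window] * a) << (2 * i))
    return products


if __name__ == "__main__":
    pps = booth_partial_products(200, 123)
    print(pps, sum(pps), 200 * 123)
```

Each partial product is one of 0, ±a, or ±2a shifted by an even number of bit positions, so the encoder halves the number of rows the adder tree must sum compared with a bit-by-bit multiplier.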
SLIDE 8

In the Nanoscale Limit

10 nm λ: 7.5x10^12 λ^2 in 1 inch^2 => 10^12 devices/chip

Energy In/Power In
  • 100kT to 1000kT
  • <1% devices switching at any time (ignoring standby power, …)

Energy Out/Power Out
  • 25 to 150 W/cm^2
  • 1 ps to 1 ns

  • A sea of compute element resources
  • Power and bandwidth are the scarce resources
  • Use the compute resources for different efficient appliances
  • Turn on, connect, and use only those that are necessary for the task

A multi-tasking chip from a sea of resources
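The numbers on this slide can be combined into a rough power-density estimate. The sketch below assumes 10^12 devices on 1 inch^2, 1% of them active, and that each active device dissipates its switching energy once per switching time; the 300 K temperature and the continuous-switching model are assumptions added here, and the function name is hypothetical.

```python
K_B = 1.380649e-23          # Boltzmann constant in J/K (exact SI value)
INCH2_IN_CM2 = 2.54 ** 2    # one square inch expressed in cm^2


def power_density_w_per_cm2(n_devices, active_fraction, energy_kt,
                            switch_time_s, temperature_k=300.0,
                            area_cm2=INCH2_IN_CM2):
    """Dissipated power per unit area if every active device spends
    energy_kt * kT once per switch_time_s (a deliberately crude model)."""
    energy_j = energy_kt * K_B * temperature_k
    power_w = n_devices * active_fraction * energy_j / switch_time_s
    return power_w / area_cm2


if __name__ == "__main__":
    # Corner cases of the ranges quoted on the slide.
    for energy_kt in (100, 1000):
        for switch_time_s in (1e-9, 1e-12):
            density = power_density_w_per_cm2(1e12, 0.01, energy_kt, switch_time_s)
            print(f"{energy_kt:>5} kT, {switch_time_s:.0e} s: {density:10.2f} W/cm^2")
```

Comparing the printed corner cases with the 25 to 150 W/cm^2 heat-removal figure shows how quickly the budget is exhausted as switching gets faster, even with only 1% of devices active.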

Computing Inexactly

Algorithms:

  • Tolerate errors in hardware
  • Probabilistic approaches
  • …

Architecture:

  • Merge software and hardware through an interface
  • Allow specification of precision tolerance
  • Allow specification of incoherence tolerance
  • …

Implementation:

  • Build in dynamic testing and configuring
  • Implement large functions that compromise solution exactness
  • Devices and circuits that address this within power and variability limits: memory, new ideas that merge functions and break boundaries efficiently
  • …