Information Processing in the Presence of Variability and Defects

  1. Information Processing in the Presence of Variability and Defects of Nanoscale
     Sandip Tiwari (st222@cornell.edu)
     Today's information processing systems are designed to be totally predictable and reproducible, and they are designed hierarchically to be manageable. This leaves plenty of room for inefficiencies.
     • What can we do within this model?
     • Are there other opportunities at the hardware-software wall?
     • Random and non-random variability and defects
     • Complexity of scales connecting nanoscale to terascale
     Acknowledgements: Arvind Kumar, Ravi Nair, Chris Liu, Ravishankar Sundararaman, Joshua Rubin, Howard Davidson, Jae Yoon Kim, Wei Min Chan, Moon Kyung Kim …

     Power and Heat
     Rapid changes in fields -> excess charged-particle energy lost to the medium -> heat.
     $\text{heat-removal density} \propto \dfrac{\text{activity factor} \times \text{energy per operation}}{\text{cross-section area of heat removal} \times \text{time constant of energy-consuming operation}}$
     • 1D heat spreading: array regions
     • 3D heat spreading: logic regions
     7.5x10^12 λ² in 1 inch² => 10^12 devices/chip at λ = 10 nm
     S. Tiwari et al., IEEE NMDC (2006)
     Tiwari_04_2009_Korea_NSF_Workshop.ppt – Apr 28, 2009

  2. Hierarchy and Data Movement: Energy
     [Figure: hierarchy of scales: device, circuit, interconnects, network, system]
     You have heard plenty about the device issues here this morning. The corollary of that discussion is power, and this is also true for communications.
     A 16 nm processor! (Source: W. Dally, 2008)
     Element              Energy
     32b integer op       0.35 pJ
     64b floating op      7 pJ
     Instruction exec     210 pJ
     32b 16K RAM read     11 pJ
     32b across 1mm       5 pJ
     32b across 20mm      100 pJ
     32b off chip         320 pJ
     RoadRunner supercomputer:
     • 1 petaflops
     • 6562 dual-core AMD Opteron chips
     • 12240 Cell chips (used in the Sony PlayStation 3)
     • ~15 tera transistors
     • 98 terabytes of memory (~800 terabits)
     • 278 racks
     • 2.35 MW of power
     Moving data is expensive; LVDS mitigates this only partly.

     Computation Problems
     • Most computation problems are inexact:
       • Speech and video
       • Recognition
       • Machine learning
       • Data compression
       • …
     • Decision making (example: the current economic crisis):
       • Inexact inputs
       • Inexact model
       • Limited resources for decision making
     • FFTs, GPUs, ALUs, compression engines, transform engines, neurons, analog, … in coprocessing with exact computing
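To make "moving data is expensive" concrete, here is a minimal sketch that reuses the per-element energies listed on this slide (W. Dally, 2008). The cost model is an assumption for illustration only: one 64b floating op whose two 64b operands each make a single trip over a given route, with each 64b word counted as two 32b transfers. The helper names are hypothetical; the RoadRunner figure is simply 2.35 MW divided by 1 petaflops.

```python
# Energy per element in picojoules, as listed on the slide (source: W. Dally, 2008).
ENERGY_PJ = {
    "32b integer op": 0.35,
    "64b floating op": 7.0,
    "instruction exec": 210.0,
    "32b 16K RAM read": 11.0,
    "32b across 1mm": 5.0,
    "32b across 20mm": 100.0,
    "32b off chip": 320.0,
}


def flop_with_operands_pj(route: str) -> float:
    """Hypothetical cost model: one 64b floating op plus moving its two 64b
    operands over `route`, each 64b word counted as two 32b transfers."""
    transfers_32b = 2 * 2  # two operands x two 32b halves each
    return ENERGY_PJ["64b floating op"] + transfers_32b * ENERGY_PJ[route]


def roadrunner_pj_per_flop(power_w: float = 2.35e6, flops_per_s: float = 1e15) -> float:
    """System-level average energy per floating-point operation, in pJ."""
    return power_w / flops_per_s * 1e12


if __name__ == "__main__":
    for route in ("32b across 1mm", "32b across 20mm", "32b off chip"):
        print(route, flop_with_operands_pj(route), "pJ")
    print("RoadRunner:", roadrunner_pj_per_flop(), "pJ per flop")
```

Under these assumptions the arithmetic itself (7 pJ) is a small fraction of the cost once operands cross 20 mm (407 pJ total) or leave the chip (1287 pJ), and RoadRunner's system-level average comes to roughly 2350 pJ per flop.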

  3. Process Variability
     [Figure: self-assembly of a PS(68k)-PMMA(33.5k) HCP BCP film on a PS-r-PMMA brush; W. M. Chan et al., unpublished]
     • Defects
     • Activation energies
     • CNT: 20% diameter control; chirality?

     Energy and Defect Rates
     [Figure: time-evolution AFM snapshots of self-assembly front propagation; C. Harrison et al., Europhysics Letters (2005)]

  4. Fundamental Limits: Energy per Operation
     [Figure: energy per operation (J, 10^-22 to 10^-12) vs. feature size (1 to 100 nm) for memory reads using 10, 100, 1000, and 10000 electrons, with the paramagnetic stability limit and the limit in dissipative information processing (bits deleted); S. Tiwari et al., IEEE NMDC (2006)]

     Implications of Hierarchy: Communication
     1. When M parts are connected in series, the system fails when any one component does.
     2. Consider N components, each producing work w over time T and communicating with the N-1 others:
        • Work produced in time T by an individual component: …
        • [Figure: two-component communications]
        • a: rate at which an individual component works
        • b: rate at which it transmits to others, and rate at which it receives from others
        • Total work output: …
        • Rate of work: …
        • If … become large: …
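Point 1 (and, on the next slide, the requirement that m parts survive concurrently) is the product rule for failures. A minimal sketch, assuming the parts fail independently and all share the same survival probability, which the slide does not state; the function names are hypothetical.

```python
import math


def series_survival(p_part: float, m: int) -> float:
    """P(all m parts survive), assuming independent parts that each survive with p_part."""
    if not 0.0 <= p_part <= 1.0:
        raise ValueError("p_part must be a probability")
    return p_part ** m


def allowed_part_failure(target_survival: float, m: int) -> float:
    """Largest per-part failure probability q with (1 - q)**m >= target_survival.

    Computed as -expm1(log(target) / m) to avoid cancellation when q is tiny.
    """
    return -math.expm1(math.log(target_survival) / m)


if __name__ == "__main__":
    print(series_survival(0.99, 100))          # 100 parts at 99% each
    print(allowed_part_failure(0.9, 10**12))   # 10^12 devices, 90% system survival
```

With the 10^12 devices per chip quoted in these slides and a 90% target for the whole system, each device may fail with probability of only about 1e-13, which is why the later slides turn to testing, reconfiguration, and redundancy.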

  5. Large System Robustness
     Survival of a single component through design, fab + burn-in, and life.
     [Figure: number of surviving parts vs. time, starting from n_0 at t_0]
     If m parts must concurrently survive: …

     Working with Defects & Power Penalty
     Rent's rule: $T = k N^p$, where
     • T: average number of external terminals (pins) in a subcircuit or partition
     • N: number of modules in the subcircuit
     • k: Rent's constant (average number of pins per module)
     • p: Rent's exponent (0 < p < 1)
     With defect rate d:
     • If logic blocks are defective: $N_{\text{accessible}} \sim ((1-d)N)^p$
     • If wiring is defective, the number of testable logic blocks: $N_{\text{accessible}} \sim (1-d)^{N^p}$, a considerably more serious problem
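A quick numerical comparison of the two defect cases, as a minimal sketch. It reads the wiring-defect expression as $(1-d)^{N^p}$; the superscripts are lost in the extracted slide text, so this reading is an assumption. Each case is reported relative to its defect-free value, and the parameters N = 10^6, p = 0.5, d = 10^-3 are borrowed from the figure on the next slide. The function names and the Rent's constant k = 4 in the check are hypothetical.

```python
def rent_terminals(n_modules: float, k: float, p: float) -> float:
    """Rent's rule T = k * N**p: average number of external terminals of a partition."""
    return k * n_modules ** p


def logic_defect_factor(d: float, p: float) -> float:
    """((1-d)N)^p relative to its defect-free value N^p, i.e. (1-d)^p."""
    return (1.0 - d) ** p


def wiring_defect_factor(d: float, n_modules: float, p: float) -> float:
    """(1-d)^(N^p): assumed reading of the wiring-defect expression (1 when d = 0)."""
    return (1.0 - d) ** (n_modules ** p)


if __name__ == "__main__":
    n, p, d = 1e6, 0.5, 1e-3
    print("T for k=4:", rent_terminals(n, 4.0, p))
    print("logic defects:", logic_defect_factor(d, p))
    print("wiring defects:", wiring_defect_factor(d, n, p))
```

Under this reading, a 0.1% logic-defect rate barely changes accessibility (a factor of about 0.9995), while the same rate in the wiring cuts it to roughly 37%, consistent with the slide's remark that wiring defects are the more serious problem.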

  6. Defects: Configurability Penalty on Power
     [Figure: percentage of modules tested vs. number of iterations (0 to 2000) for N = 1x10^6, p = 0.5, and interconnect defect rates d_INT = 0, 10^-5, 10^-4, 10^-3, 10^-2; A. Kumar et al., DFT, 280 (2004)]
     Defects limit testability and limit the usable devices and interconnects.
     [Figure: factor increase in power dissipation vs. number of modules (10^4 to 10^6) for p = 0.5, 0.6, 0.7, with interconnect defect rate d_INT = 0.001]
     Power penalty vs. defect rate: there is an optimum size for the maximum functional unit that is testable and configurable.

     Redundancy (Nikolic et al.; 10^12 devices assumed)
     [Figure: allowable probability of failure using R-fold modular redundancy, NAND multiplexing, and reconfiguration (using knowledge of faulty devices)]
     Defects place severe constraints.
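The simplest of the redundancy schemes listed above, R-fold modular redundancy, can be sketched as majority voting over R copies of a module. This minimal sketch assumes a perfect voter and independent module failures, idealizations the slide does not state; NAND multiplexing and reconfiguration are not modeled, and the function name is hypothetical.

```python
from math import comb


def nmr_reliability(p: float, r: int) -> float:
    """Probability that an R-fold modular-redundant unit is correct, assuming
    a perfect majority voter and independent modules each correct with p."""
    if r < 1 or r % 2 == 0:
        raise ValueError("r must be a positive odd integer")
    majority = r // 2 + 1
    # Sum over every outcome in which at least a majority of the r copies is correct.
    return sum(comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(majority, r + 1))


if __name__ == "__main__":
    for p in (0.9, 0.4):
        print(p, [round(nmr_reliability(p, r), 4) for r in (1, 3, 5)])
```

Redundancy only helps when each module is already more likely right than wrong: at p = 0.9, triple redundancy raises reliability to 0.972, but at p = 0.4 it lowers it to 0.352.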

  7. Networks: Information Flow and Robustness
     [Figures: Internet (Burch-Cheswick); email communications (Adamic-Adar); brain small-world network (O. Sporns); protein interactions (Jeong)]
     Hub-and-spoke arrangements with fixed degree at various length scales are the most likely candidates for robustness:
     • A mixture of high-degree and low-degree nodes
     • A mixture of long-range and short-range links

     Adaptation: DGFET Booth Multiplier
     [Figure: modified Booth multiplier with inputs A<7:0> and B<7:0>; modified Booth encoder (number of encoder bits: 8) producing partial products PP0 to PP4 (<7:0>, <9:0>, <11:0>, <13:0>, <15:0>); Wallace tree adder built from 4-2 compressors; CLA producing the final product <15:0>]
     • Power 7.72 mW; frequency ≈ 500 MHz; 100 nm devices; V_DD = 1 V
     • Nearly 10x reduction in power without sacrificing high speed
     • Compact and dense
     W. M. Chan (2007)
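The slide gives only the multiplier's block diagram. As a minimal sketch of what the modified Booth encoder does, here is radix-4 Booth recoding of an 8-bit multiplier, assuming the operand is unsigned; under that assumption the recoding yields five partial products, which is consistent with PP0 to PP4 in the figure. The DGFET circuit techniques behind the power reduction are not modeled, a plain sum stands in for the Wallace tree and CLA, and the function names are hypothetical.

```python
def booth_radix4_digits(b: int, width: int = 8) -> list[int]:
    """Radix-4 (modified Booth) recoding of an unsigned `width`-bit multiplier.

    Digit d_i = -2*b[2i+1] + b[2i] + b[2i-1] lies in {-2, -1, 0, 1, 2},
    with b[-1] = 0 and b zero-extended above bit width-1.
    """
    if width % 2 or not 0 <= b < (1 << width):
        raise ValueError("width must be even and b must fit in width bits")

    def bit(i: int) -> int:
        return (b >> i) & 1 if i >= 0 else 0

    # width // 2 + 1 digits: the extra digit absorbs the zero-extension of an unsigned b.
    return [-2 * bit(2 * i + 1) + bit(2 * i) + bit(2 * i - 1) for i in range(width // 2 + 1)]


def booth_partial_products(a: int, b: int, width: int = 8) -> list[int]:
    """Signed, already-shifted partial products PP_i = d_i * a * 4**i."""
    return [d * a * 4**i for i, d in enumerate(booth_radix4_digits(b, width))]


def booth_multiply(a: int, b: int, width: int = 8) -> int:
    # In the hardware, a Wallace tree of 4-2 compressors and a carry-lookahead
    # adder sum the partial products; a plain sum stands in for them here.
    return sum(booth_partial_products(a, b, width))


if __name__ == "__main__":
    print(booth_radix4_digits(255))
    print(booth_partial_products(100, 255), booth_multiply(100, 255))
```

Each digit needs only a shift and an optional negation of A, so the encoder replaces eight shifted copies of A with five signed partial products, which is where the smaller adder tree comes from.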

  8. In the Nanoscale Limit
     7.5x10^12 λ² in 1 inch² => 10^12 devices/chip at λ = 10 nm
     • Energy in/power in: 100kT to 1000kT; <1% of devices switching at any time (ignoring standby power, …)
     • Energy out/power out: 25 to 150 W/cm²
     • 1 ps to 1 ns
     A sea of compute-element resources: power and bandwidth are the scarce resources.
     • Use the compute resources for different efficient appliances.
     • Turn on, connect, and use only those that are necessary for the task.
     • A multi-tasking chip from a sea of resources.

     Computing Inexactly
     • Algorithms: tolerate errors in hardware; probabilistic approaches; …
     • Architecture: merge software and hardware through an interface; allow specification of precision tolerance; allow specification of incoherence tolerance; …
     • Implementation: build in dynamic testing and configuring; implement large functions that compromise solution exactness.
     • Devices and circuits that address this within power and variability limits: memory, new ideas that merge functions and break boundaries efficiently, …
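The nanoscale-limit numbers can be checked with back-of-envelope arithmetic. This minimal sketch makes several assumptions the slide does not spell out: a temperature of 300 K, a 1 inch² die (6.4516 cm²), activity at the 1% upper bound, and each active device performing one operation per period t_op. All names are hypothetical.

```python
# Hypothetical back-of-envelope model; every parameter below is an assumption.
K_B = 1.380649e-23       # Boltzmann constant, J/K (exact SI value)
INCH2_IN_CM2 = 6.4516    # 1 inch^2 expressed in cm^2 (exact)


def op_energy_j(energy_kt: float, temp_k: float = 300.0) -> float:
    """Energy of one operation, given as a multiple of kT."""
    return energy_kt * K_B * temp_k


def power_density_w_per_cm2(devices: float, active_fraction: float, energy_kt: float,
                            t_op_s: float, area_cm2: float = INCH2_IN_CM2,
                            temp_k: float = 300.0) -> float:
    """Average power density if every active device does one operation per t_op_s."""
    return devices * active_fraction * op_energy_j(energy_kt, temp_k) / t_op_s / area_cm2


def min_op_time_s(budget_w_per_cm2: float, devices: float, active_fraction: float,
                  energy_kt: float, area_cm2: float = INCH2_IN_CM2,
                  temp_k: float = 300.0) -> float:
    """Shortest t_op that keeps the power density within a heat-removal budget."""
    energy_per_period_j = devices * active_fraction * op_energy_j(energy_kt, temp_k)
    return energy_per_period_j / (budget_w_per_cm2 * area_cm2)


if __name__ == "__main__":
    for t_op in (1e-9, 1e-12):
        print(t_op, power_density_w_per_cm2(1e12, 0.01, 1000, t_op), "W/cm^2")
    print(min_op_time_s(150, 1e12, 0.01, 1000), "s")
```

Under these assumptions, 1000kT operations every 1 ns give about 6.4 W/cm², well inside the 25 to 150 W/cm² range, while the same activity at 1 ps would need about 6400 W/cm²; a 150 W/cm² budget allows operations no faster than roughly 43 ps. This is one way to read the slide's point that power, not device count, is the scarce resource.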
