Dependability Issues Due to Scaling Towards Nanometer Size Devices

Dependability Issues Due to Scaling Towards Nanometer Size Devices: Aggressive and Adaptive Mitigation Techniques Maybe Key for Solution
Arun K. Somani
Dependable Computing and Networking Laboratory
Department of Electrical and Computer Engineering
Iowa State University, Ames, IA, 50011
arun@iastate.edu

Technology Scaling
- Every 30% downscaling of technology node
  - Transistor density doubles
  - Gate delay reduces 30%
  - Operating frequency improves 43%
  - Active power consumption halves
  - 65% energy savings
- Frequency scaling inhibited with recent generations
  - Low power requirements
  - Process variations
  - Reliability concerns
- High speed, low leakage requirements
  - Determines the choice of supply and threshold voltages
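The percentages on this slide follow from first-order scaling arithmetic. The sketch below assumes classical constant-field scaling, in which linear dimensions, capacitance, and supply voltage all shrink by the same factor s = 0.7; the function name and dictionary keys are illustrative and do not come from the slides.

```python
def scaling_effects(s=0.7):
    """First-order effects of shrinking a technology node by factor s.

    Assumes constant-field scaling: dimensions, capacitance C, and
    supply voltage V all scale by s (an assumption, not from the slides).
    """
    density = 1.0 / (s * s)       # transistors per unit area
    gate_delay = s                # delay relative to the old node
    frequency = 1.0 / s           # frequency relative to the old node
    energy = s * s * s            # switching energy ~ C * V^2
    power = energy * frequency    # active power ~ C * V^2 * f
    return {
        "density": density,
        "gate_delay": gate_delay,
        "frequency": frequency,
        "energy": energy,
        "power": power,
    }


if __name__ == "__main__":
    for name, value in scaling_effects().items():
        print(f"{name}: {value:.3f}")
```

With s = 0.7 the density ratio is about 2.04, frequency rises by about 43%, active power falls to about 0.49 of its old value, and switching energy falls to about 0.34, which is roughly the 65% energy saving quoted above.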

How Is the Progress Holding Up?
- Drives semiconductor performance
- Enables newer technologies
Source: Intel

A Few Things Are Here to Stay
- Leakage power in MOSFETs
  - Sufficient overdrive required for high-speed switching
  - Lower V_T leads to more leakage
- Gate leakage
  - Tunneling current through the gate dielectric
  - High-k dielectrics used in 45 nm technology to arrest gate leakage
- Process variations increase with scaling
  - Random and systematic variations in delay, power, and yield
  - Delay varies with V_t, L_eff, V_dd, and T
- Thermal variation

Temperature Variations Original Source: Anirudh Devgan, IBM Research

Challenges for Future Manufacturing
- Ultimate limit: 0.3 nm (the distance between silicon atoms)
- Various barriers seen over time
  - Overcome with changes in material and process technology
- Degradation of performance with downscaling
  - Interconnect delay increases with the resistance and capacitance of narrow, dense metal lines
- Higher power consumption will continue to be a problem
- Unaffordable manufacturing cost for smaller sizes
  - Semiconductor companies are moving towards a fab-lite model
- Time to yield and time-to-market with newer technologies are becoming longer

What to Look Forward To?
- Error tolerance rather than avoidance
  - Built-in fault tolerance for all designs
  - Selective replication instead of full-scale redundancy
- Design adaptability
  - Key for low-overhead solutions
- Design optimizations
  - Dynamic schemes
  - Possible through speculation

Reliable Overclocking (Aggressive Designs)
- Typically, the clock period is determined by the maximum delay from A to B, which depends on the physical implementation, the operating environment, and temperature and supply voltage variations
- Traditionally, worst-case delays are assumed
  - Result: an overly conservative clock period
- Pipelined processor
  - The longest (slowest) stage limits the period of the entire pipeline
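The last point can be illustrated with a small sketch: under worst-case design, the pipeline clock period equals the largest worst-case stage delay, and every other stage carries unused slack. The stage names and delay values (in picoseconds) are hypothetical and chosen only for illustration.

```python
def pipeline_clock_period(worst_case_stage_delays):
    """Return (slowest stage, clock period, per-stage slack).

    worst_case_stage_delays maps a stage name to its worst-case delay.
    The clock period is set by the slowest stage.
    """
    slowest = max(worst_case_stage_delays, key=worst_case_stage_delays.get)
    period = worst_case_stage_delays[slowest]
    slack = {stage: period - delay
             for stage, delay in worst_case_stage_delays.items()}
    return slowest, period, slack


# Hypothetical five-stage pipeline, delays in picoseconds.
stages = {"fetch": 800, "decode": 650, "execute": 1000,
          "memory": 900, "writeback": 500}

if __name__ == "__main__":
    slowest, period, slack = pipeline_clock_period(stages)
    print(slowest, period, slack)
```

The slack values show how much of each cycle is idle in the faster stages, which is the margin that better-than-worst-case designs try to reclaim.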

Reliable Overclocking (Aggressive Designs) (Contd.)
- Problem to address in the nanometer design space
  - Provide high performance by exploiting PVT variations
  - Enhance system dependability with low-cost solutions
- Clock beyond the worst-case delay, relying on data-dependent delays
  - Timing errors may occur at overclocked speeds
- Aggressive, but reliable, design methodologies employ timing error detection and recovery schemes
  - Razor [MICRO'03], Sprite [DSN'07]
  - Performance improvement of 15-20% with error rates below 1%
  - Safety-critical systems and real-time constraints supported

Why Past Solutions Are Not Acceptable
- Traditional techniques
  - TMR solutions incur high cost and performance penalty
  - Dual latching dynamic optimization uses less area
  - False positives and high penalty for error recovery are concerns
- Static power vs. dynamic power
  - Both are comparable for today's technology
  - Thus logic replication is not a viable alternative

Offering More Design Features with Added Redundancy
- Soft Error Mitigation, SEM [DSN'09]
  - Circuit-level speculation, local recovery, no false positives, high fault coverage (like TMR, tolerates both SEU and SET)
  - No performance overhead; operating frequency f_sys set by 1/t_pd
- Soft and Timing Error Mitigation, STEM [DSN'09]
  - Like SEM, but also detects and corrects timing errors
  - Can be deployed in aggressive system designs
  - Timing speculation, like overclocking [DSN'07] and DVS [MICRO'03]

Design Constraints
$\Delta_1 = T_2 - T_1 \ge T_{PW}$ (5)
$\Delta_2 = T_3 - T_2 \ge T_{PW}$ (6)
$T_{CD} \ge \Delta_1 + \Delta_2$ (7)
$T + \Delta_1 \ge T_{PD}$ (8)
where
$T_{CD}$ = Contamination delay of the logic circuit
$T_{PD}$ = Propagation delay of the logic circuit
$T_{PW}$ = Expected soft error/noise pulse width
$\Delta_1$ = Phase shift between CLK1 and CLK2
$\Delta_2$ = Phase shift between CLK2 and CLK3
$T$ = Clock period
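A minimal sketch of how constraints (5) to (8) could be checked for a candidate clocking scheme. It assumes each constraint is a lower bound (phase shifts at least the pulse width, contamination delay at least the total phase shift, and clock period plus the first phase shift at least the propagation delay). The function name, argument names, and sample values in picoseconds are hypothetical.

```python
def check_sem_constraints(t1, t2, t3, t_pw, t_cd, t_pd, period):
    """Evaluate constraints (5)-(8), each read as a lower bound (assumption).

    t1, t2, t3: clock edge times of CLK1, CLK2, CLK3
    t_pw: expected soft error/noise pulse width
    t_cd, t_pd: contamination and propagation delay of the logic
    period: clock period T
    """
    d1 = t2 - t1  # phase shift between CLK1 and CLK2
    d2 = t3 - t2  # phase shift between CLK2 and CLK3
    return {
        "(5)": d1 >= t_pw,
        "(6)": d2 >= t_pw,
        "(7)": t_cd >= d1 + d2,
        "(8)": period + d1 >= t_pd,
    }


if __name__ == "__main__":
    # Hypothetical values in picoseconds.
    result = check_sem_constraints(t1=0, t2=100, t3=200, t_pw=100,
                                   t_cd=250, t_pd=1000, period=950)
    print(result)
```

Integer picosecond values are used so that the comparisons are exact and not subject to floating-point rounding.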

Dynamic Frequency Scaling
- Clock frequency is scaled while satisfying the error rate constraint
- Limits of DFS
  $T_{CD} \ge \Delta_2$ (9)
  $\Delta_2 - \Delta_1 \ge T_{PW}$ (10)
  $T_{MIN} + \Delta_1 \ge T_{PD}$ (11)
  - F_MIN (minimum possible frequency): set by worst-case design settings
  - F_MAX (maximum possible frequency): as shown in equation (11)
where
$T_{CD}$ = Contamination delay of the logic circuit
$T_{PD}$ = Propagation delay of the logic circuit
$T_{PW}$ = Expected soft error/noise pulse width
$\Delta_1$ = Phase shift between CLK1 and CLK2
$\Delta_2$ = Phase shift between CLK2 and CLK3
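The frequency range available to DFS can be sketched numerically. This example assumes equation (11) is a lower bound on the clock period, so the shortest usable period is the propagation delay minus the first phase shift, and it assumes the worst-case design clocks at a period equal to the propagation delay. The function name and the sample values are hypothetical.

```python
def dfs_frequency_limits(t_pd_ps, d1_ps):
    """Return (f_min_ghz, f_max_ghz) for dynamic frequency scaling.

    Assumptions (not stated in the slides):
      - worst-case design uses a clock period equal to t_pd_ps
      - equation (11) is read as T_MIN + D1 >= T_PD
    Times are in picoseconds; frequencies are in GHz (1000 / period_ps).
    """
    t_min_ps = t_pd_ps - d1_ps       # shortest period allowed by (11)
    f_min_ghz = 1000.0 / t_pd_ps     # worst-case (slowest) setting
    f_max_ghz = 1000.0 / t_min_ps    # fastest setting allowed by (11)
    return f_min_ghz, f_max_ghz


if __name__ == "__main__":
    # Hypothetical: 1000 ps propagation delay, 200 ps phase shift.
    print(dfs_frequency_limits(1000, 200))
```

Under these assumptions, a larger first phase shift widens the usable frequency range, but constraints (9) and (10) limit how large the phase shifts can be.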

Pipeline Design
- Using STEM
  - Input clocks are constrained to provide fault tolerance
  - Extra buffer stage to ensure only "gold" data goes to memory
- Stage error signal: generated from the error signal in that stage
- Global error signal is generated from all stages
- Error rates are monitored and used by the clock unit
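How a clock unit might use the monitored error rate can be sketched as a simple step controller. This is an illustrative assumption, not the control scheme from the slides: the function name, the fixed step size, and the target error rate are all hypothetical.

```python
def next_frequency(freq, error_rate, target_error_rate, step, f_min, f_max):
    """Pick the next clock frequency from the monitored error rate.

    Hypothetical policy: step down when the error rate exceeds the
    target, otherwise step up, always staying within [f_min, f_max].
    """
    if error_rate > target_error_rate:
        return max(f_min, freq - step)
    return min(f_max, freq + step)


if __name__ == "__main__":
    freq = 1.0  # GHz, hypothetical starting point
    for observed in (0.001, 0.002, 0.02, 0.004):
        freq = next_frequency(freq, observed, target_error_rate=0.01,
                              step=0.05, f_min=1.0, f_max=1.25)
        print(f"error rate {observed:.3f} -> {freq:.2f} GHz")
```

A real controller would likely average the error rate over a sampling window before acting; the single-sample decision here keeps the sketch short.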

Performance Analysis
- Limiting factor for frequency scaling
  - With frequency scaling, the number of input combinations resulting in delays greater than the new clock period increases
- Overclocking improves performance only if
  $N \times t_{ov} + n \times N \times k \times t_{ov} < N \times t_{wc}$
  which gives
  $k < (t_{wc} - t_{ov}) / (n \times t_{ov})$
- Notation
  - $t_{wc}$: worst-case clock period
  - $t_{ov}$: overclocked clock period
  - $n$: number of cycles to recover
  - $N$: total cycles required
  - $k$: error rate
- For STEM cells
  - With a 15% increase in frequency, the error rate needs to be > 5.76% to yield no performance improvement
  - For error rates < 1%, a 2.6% increase in frequency is required to compensate for the penalty paid for error correction
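The break-even condition can be worked through numerically. Writing the frequency ratio as s = t_wc / t_ov, the bound on k becomes k < (s - 1) / n. The sketch below uses n = 2.6 recovery cycles, a value inferred here because it reproduces both STEM figures quoted above; the slides do not state n, and the function names are illustrative.

```python
def break_even_error_rate(speedup, recovery_cycles):
    """Largest error rate k at which overclocking still breaks even.

    speedup is t_wc / t_ov (for example 1.15 for a 15% frequency increase).
    Derived from N*t_ov + n*N*k*t_ov < N*t_wc, i.e. k < (speedup - 1) / n.
    """
    return (speedup - 1.0) / recovery_cycles


def min_speedup_for_error_rate(error_rate, recovery_cycles):
    """Smallest frequency ratio that pays for a given error rate."""
    return 1.0 + recovery_cycles * error_rate


if __name__ == "__main__":
    n = 2.6  # assumed recovery penalty in cycles (inferred, not from the slides)
    print(f"break-even k at +15%: {break_even_error_rate(1.15, n):.4f}")
    print(f"speedup needed at k = 1%: {min_speedup_for_error_rate(0.01, n):.3f}")
```

With n = 2.6, a 15% frequency increase breaks even at an error rate of about 5.77%, and a 1% error rate requires a frequency ratio of 1.026, in line with the 5.76% and 2.6% figures on the slide.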

Three Interdependent Concerns
- Performance
  - Device scaling
  - Architectural innovations
  - Better-than-worst-case designs
- Dependability
  - Soft errors, silicon defects
  - Fault mitigation techniques
- Power consumption
  - Low power design
  - Adaptive control mechanisms
- All managed through aggressive design methodology
