SDCTune: A Model for Predicting the SDC Proneness of an Application - PowerPoint PPT Presentation

SDCTune: A Model for Predicting the SDC Proneness of an Application for Configurable Protection. Qining Lu, Karthik Pattabiraman (University of British Columbia, UBC); Jude Rivers, Meeta Gupta (IBM Research, T.J. Watson)

Motivation: Transient Errors
Particle strikes, temperature, and similar effects cause transient hardware faults. Transient hardware errors (aka soft errors) increase as feature sizes shrink (source: Feng et al., ASPLOS 2010).

Motivation: Application-level Techniques
Errors arise at the device/circuit level and may propagate up through the architectural and operating system levels to the application level, but only a fraction of circuit-level errors actually impact the application. It is therefore more economical to deploy protection techniques at the application level, where only the impactful errors need to be handled.

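Motivation: Silent Data Corruption (SDC)
When a fault occurs during application execution, it may never be activated (benign). Once an error is activated, the program may crash or hang, or it may finish. A finished program either produces the correct output (error masked) or a wrong output: a silent data corruption (SDC), the focus of this paper. Example: Bfs, whose results are lost.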
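The outcome categories above can be sketched as a small classifier that compares one faulty run against a fault-free (golden) run. This is a minimal illustration, not SDCTune's tooling; the names `Outcome` and `classify` and the crash/timeout flags are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    BENIGN = "benign"  # fault masked: program finished with the correct output
    SDC = "sdc"        # program finished, but the output is wrong
    CRASH = "crash"    # program terminated abnormally
    HANG = "hang"      # program did not finish within its time budget


def classify(golden_output, faulty_output, crashed=False, timed_out=False):
    """Classify one fault injection run against the golden run (hypothetical helper)."""
    if timed_out:
        return Outcome.HANG
    if crashed:
        return Outcome.CRASH
    if faulty_output == golden_output:
        return Outcome.BENIGN
    return Outcome.SDC
```

For example, `classify([0, 1, 2], [0, 1, 5])` yields `Outcome.SDC`: the run finished without any symptom, yet its output differs, which is exactly what makes SDCs hard to detect.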

Our Goals
• Detect silent data corruption (SDC)
• High coverage with low overhead
• Configurable protection overhead
Approach: selectively protect the most SDC-prone variables in the program.

Traditional Approaches vs. Our Approach
Traditional: thousands of fault injection runs of the application; the few runs that lead to SDCs identify the instructions to protect/duplicate.
• Time consuming (runs the application thousands of times)
• Variables to protect must be chosen manually
Ours: static and dynamic analysis of the program code, together with a performance overhead budget, selects the variables to protect/duplicate.
• Saves time (the dynamic analysis runs the application only once)
• Automatically chooses the variables to protect subject to the performance overhead budget

Fault Model
• Single bit-flip faults
• One fault per run
• Errors in registers and execution units
• Only program data that is visible at the architectural level is affected
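Under this fault model, one run corrupts one architectural value by flipping exactly one bit. A minimal sketch, assuming 32-bit register values; `flip_bit` and `inject_random_bit_flip` are illustrative names, not part of any tool named in the slides.

```python
import random


def flip_bit(value: int, bit: int, width: int = 32) -> int:
    """Flip one bit of a width-bit register value (single bit-flip fault)."""
    if not 0 <= bit < width:
        raise ValueError("bit index out of range")
    mask = (1 << width) - 1
    return (value ^ (1 << bit)) & mask


def inject_random_bit_flip(value: int, width: int = 32, rng=random):
    """One fault per run: choose a single bit uniformly at random and flip it."""
    bit = rng.randrange(width)
    return flip_bit(value, bit, width), bit
```

Passing a seeded `random.Random` as `rng` makes a campaign reproducible, which helps when re-running an SDC-causing injection for manual inspection.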

• Motivation and Goal
• Approach
• Evaluation and Results
• Conclusion

Overall Approach
• Step 1 (initial study): perform fault injections to understand the SDC characteristics of code constructs
• Step 2 (heuristics): identify code regions prone to SDC-causing faults
• Step 3 (SDCTune): build the model and apply protection

Initial Study: Goals
• Run initial fault injection experiments to understand the reasons for SDC failures
• Use the results to formulate heuristics for selective protection
• Manually inspect why SDCs occur:
  • Highly executed instructions cover most SDCs
  • Not all highly executed instructions should be protected
• Find common patterns for developing heuristics

Initial Study: Method
• Performed using LLFI, a high-level fault injector validated for SDC-causing errors [DSN'14]
• Compile time: LLFI instruments the program's IR code with function calls, guided by a fault injection instruction/register selector, and produces a profiling executable and a fault injection executable
• Runtime: at each candidate instruction, the injector decides whether to inject; if yes, the custom fault injector corrupts the value, otherwise execution continues with the next instruction
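The profile-then-inject flow can be imitated in a few lines. This sketch is not LLFI (which works on LLVM IR); it assumes a toy workload whose add result passes through a hook, standing in for the instrumented function call. `FaultInjector`, `toy_program`, and `profile` are hypothetical names.

```python
class FaultInjector:
    """Hypothetical runtime hook called on every dynamic instance of one target
    instruction; it flips a single bit at exactly one chosen instance."""

    def __init__(self, target_instance, bit, width=32):
        self.target_instance = target_instance  # 1-based dynamic instance to corrupt
        self.bit = bit
        self.width = width
        self.count = 0

    def __call__(self, value):
        self.count += 1
        if self.count == self.target_instance:  # "Inject?" -> yes
            return (value ^ (1 << self.bit)) & ((1 << self.width) - 1)
        return value  # "Inject?" -> no: continue with the next instruction


def toy_program(n, hook):
    """Toy workload: the result of the add instruction passes through `hook`."""
    acc = 0
    for i in range(n):
        acc = hook(acc + i)
    return acc


def profile(program, n):
    """Profiling run: count the dynamic instances of the target instruction."""
    count = 0

    def hook(value):
        nonlocal count
        count += 1
        return value

    program(n, hook)
    return count
```

A campaign would call `profile(toy_program, 10)` once, pick a `target_instance` uniformly from 1 to that count, and run `toy_program(10, FaultInjector(target, bit))`; comparing against the fault-free result of 45 then classifies the run.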

Initial Study: Findings
• The SDC proneness of an instruction depends on:
  • Fault propagation along its data dependency chain
  • The SDC proneness of the end point of that chain
• End points of data dependency chains:
  • Store operations
  • Comparison operations
We therefore need heuristics for fault propagation, store operations, and comparison operations.

Heuristics: Fault Propagation
HP1: The SDC proneness of an instruction decreases if its result is used by fault-masking or crash-prone instructions.
Example: a fault corrupts some bits of a variable; a trunc operation discards the corrupted bits, so the result variable is correct, the fault is masked, and the output is correct.
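The trunc example can be checked exhaustively for one value. A minimal sketch, assuming a truncation from 32 to 8 bits (the slide does not give the widths); `trunc_i32_to_i8` and `masked_bits` are illustrative names.

```python
def trunc_i32_to_i8(x: int) -> int:
    """Model a truncation from 32 to 8 bits: only the low 8 bits survive."""
    return x & 0xFF


def masked_bits(original: int, width: int = 32) -> list[int]:
    """Return the bit positions whose single-bit flip is masked by the trunc."""
    golden = trunc_i32_to_i8(original)
    return [b for b in range(width)
            if trunc_i32_to_i8(original ^ (1 << b)) == golden]


print(len(masked_bits(0x1234)))  # 24: flips in bits 8..31 never reach the result
```

Under this assumption, 24 of the 32 possible flips in the truncated operand are masked regardless of its value, which is why HP1 lowers the SDC proneness of instructions that feed such masking operations.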

Heuristics: Store Operations
HS1: Stored values of type Addr NoCmp generally have low SDC proneness.
HS2: Stored values of type Addr Cmp have higher SDC proneness than Addr NoCmp.
(More heuristics in the paper.)

Heuristics: Comparison Operations
HC1: The nesting depth of loops affects the SDC proneness of the loops' comparison operations. For example, the SDC proneness of "nHeap>1" is higher than that of "weight[tmp]<weight[heap[zz>>1]]".
(More heuristics in the paper.)

SDCTune: Building the Model
• Classification
  • Different types of usage are usually independent of each other
  • Classify stored values and comparison values according to the heuristic features observed earlier
• Regression
  • Within the same usage type, the SDC rate may correlate gradually with several features
  • Apply linear regression within each classified group
In total, the model uses 52 features.
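The classify-then-regress structure can be sketched as one least-squares line per class. This is a deliberately reduced illustration: it uses a single numeric feature per group instead of the model's 52 features and classification tree, and the class labels and sample values are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # constant feature: predict the group mean
        return 0.0, my
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def build_model(samples):
    """samples: (class_label, feature, observed_sdc_rate) triples.
    Classification step: group by label; regression step: one line per group."""
    groups = defaultdict(lambda: ([], []))
    for label, x, y in samples:
        groups[label][0].append(x)
        groups[label][1].append(y)
    return {label: fit_line(xs, ys) for label, (xs, ys) in groups.items()}


def predict(model, label, x):
    slope, intercept = model[label]
    return slope * x + intercept
```

Fitting separately per class reflects the slide's premise that usage types are independent, so a trend learned for, say, stores need not distort the fit for comparisons.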

SDCTune: Example Model
Example: tree structure for store operations (figure).

SDCTune: Selection Algorithm
Inputs are the application source code, representative inputs, and a performance overhead budget. The compiler produces IR; SDCTune and the selection algorithm then choose the data variables or locations to protect, which are protected by backward slice replication.
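Backward slice replication needs the set of instructions that a protected value depends on. A minimal sketch, assuming a data dependency graph given as a plain dictionary; it follows only data dependencies (no control or memory dependencies), and the instruction names are hypothetical.

```python
def backward_slice(deps, target):
    """Return every instruction the target transitively depends on, itself included.
    deps maps each instruction to the instructions whose results it uses."""
    seen = set()
    stack = [target]
    while stack:
        inst = stack.pop()
        if inst in seen:
            continue
        seen.add(inst)
        stack.extend(deps.get(inst, ()))
    return seen


deps = {
    "store": ["add"],
    "add": ["load_a", "mul"],
    "mul": ["load_b"],
    "cmp": ["load_c"],  # not on the store's slice
}
print(sorted(backward_slice(deps, "store")))
```

Replicating this slice and comparing the duplicate against the original just before the store is one way such protection could be placed; the `cmp` chain is left alone because the store does not depend on it.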

SDCTune: Optimizations
• Add instructions to the protection set to save checkers
• Move checkers out of the loop body
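Moving a checker out of a loop can be illustrated on a duplicated accumulator: both copies are still updated every iteration, but they are compared once after the loop rather than once per iteration. This is a sketch of the idea only; the `fault_at` parameter is a hypothetical hook that simulates a bit flip in the primary copy. A presumable trade-off is detection latency, since a corrupted value used inside the loop acts before the single check runs.

```python
def sum_with_loop_check(xs):
    """Baseline: checker inside the loop body runs once per iteration."""
    acc = shadow = 0
    for x in xs:
        acc += x
        shadow += x
        if acc != shadow:
            raise RuntimeError("SDC detected")
    return acc


def sum_with_hoisted_check(xs, fault_at=None):
    """Optimized: duplicated computation per iteration, single checker after the loop."""
    acc = shadow = 0
    for i, x in enumerate(xs):
        acc += x
        shadow += x  # duplicated computation still runs every iteration
        if i == fault_at:
            acc ^= 1  # simulated single bit flip in the primary copy
    if acc != shadow:  # the corruption persists in acc, so one check suffices
        raise RuntimeError("SDC detected")
    return acc
```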

Evaluation: Work Flow
• Training phase: features are extracted from the training programs based on heuristic knowledge, and random fault injection measures the SDC rate of each instruction, $P(SDC \mid I)$. Together they train the regression predictor.
• Testing and usage phase: features are extracted from the testing programs, and the predictor estimates $P(SDC \mid I)$ for each instruction. For a given overhead bound, the optimal selection picks the set of instructions $S$ that maximizes the estimated $\sum_{I \in S} P(SDC \mid I)\,P(I)$ subject to $\sum_{I \in S} P(I)$ staying within the bound.
• The estimated coverage is compared with the actual SDC coverage measured by fault injection on the testing programs.
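The selection step reads as a 0/1 knapsack: each instruction's weight is its share of execution, $P(I)$, and its value is $P(SDC \mid I)\,P(I)$. A minimal dynamic-programming sketch, assuming $P(I)$ is represented by integer dynamic execution counts so the budget can be indexed; the tuple layout and names are hypothetical, and the slides do not say which solver SDCTune uses.

```python
def select_instructions(candidates, budget):
    """0/1 knapsack over candidate instructions.
    candidates: (name, dyn_count, p_sdc) where dyn_count is proportional to P(I)
    and p_sdc is the predicted P(SDC|I); budget: max total dyn_count to duplicate.
    Returns (best estimated value, chosen names)."""
    best = [(0.0, ())] * (budget + 1)  # best[w]: optimum with total weight <= w
    for name, weight, p_sdc in candidates:
        value = weight * p_sdc  # proportional to P(SDC|I) * P(I)
        for w in range(budget, weight - 1, -1):  # descending: each item used once
            cand = best[w - weight][0] + value
            if cand > best[w][0]:
                best[w] = (cand, best[w - weight][1] + (name,))
    return best[budget]


cands = [("a", 5, 0.9), ("b", 4, 0.5), ("c", 3, 0.8)]
print(select_instructions(cands, 7))
print(select_instructions(cands, 8))
```

With a budget of 7, the single heavy but SDC-prone instruction `a` beats the pair `b`+`c`; raising the budget to 8 lets `a` and `c` fit together. Dividing the chosen value by the total over all candidates gives an estimated coverage for that bound.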


Evaluation: Benchmarks
Training programs:
• IS (NAS): integer sorting
• Gzip (SPEC): compression
• Bzip2 (SPEC): compression
• Swaptions (PARSEC): pricing a portfolio of swaptions
• Water (SPLASH2): molecular dynamics
• CG (NAS): conjugate gradient
Testing programs:
• Lbm (Parboil): fluid dynamics
• LU (SPLASH2): linear algebra
• Ocean (SPLASH2): large-scale ocean movements
• Bfs (Parboil): breadth-first search
• Mcf (SPEC): combinatorial optimization
• Libquantum (SPEC): quantum computing

Evaluation: Experiments
• Estimate overall SDC rates using SDCTune and compare them with fault injection experiments
  • Measure the correlation between predicted and actual rates
• Measure the SDC coverage of detectors inserted using SDCTune for different overhead bounds
  • Consider 10%, 20%, and 30% performance overheads
• Compare performance overhead and efficiency with full duplication and hot-path duplication
  • Efficiency = SDC coverage / performance overhead

Results: Overall SDC Rates
Programs | Rank correlation | P-value
Training programs | 0.9714 | 0.00694
Testing programs | 0.8286 | 0.0125
[Figure: rank of overall SDC rates by SDCTune estimation (y-axis) vs. rank of overall SDC rates by fault injection experiment (x-axis), for training and testing programs.]
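A rank correlation compares the ordering of programs by estimated SDC rate with their ordering by fault injection. The slides do not name the statistic; this sketch assumes Spearman's rank correlation, $\rho = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$, and that there are no ties. The function names are illustrative.

```python
def ranks(values):
    """1-based ranks of values (assumes no ties)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r


def spearman_rho(xs, ys):
    """Spearman rank correlation for tie-free data."""
    n = len(xs)
    rx, ry = ranks(xs), ranks(ys)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

Only the ordering matters: for six programs, swapping two adjacent ranks gives $\sum d_i^2 = 2$ and $\rho = 1 - 12/210 \approx 0.943$, so a model can score well here even if its absolute SDC estimates are off, which fits the conclusion's claim about predicting *relative* SDC rates.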

Results: SDC Coverage
Overhead bound | Training programs coverage | Testing programs coverage
10% | 44.8% | 39%
20% | 78.6% | 63.7%
30% | 86.8% | 74.9%
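Applying the efficiency definition from the experiments slide (SDC coverage divided by performance overhead) to the coverage numbers above is simple arithmetic. This sketch computes only that raw ratio; the detection efficiency results are reported in normalized form, and the normalization baseline is not given on these slides, so the values printed here are not expected to match them.

```python
# (overhead bound %, SDC coverage %) copied from the SDC coverage results
TRAINING = [(10, 44.8), (20, 78.6), (30, 86.8)]
TESTING = [(10, 39.0), (20, 63.7), (30, 74.9)]


def efficiency(coverage: float, overhead: float) -> float:
    """Detection efficiency = SDC coverage / performance overhead (unnormalized)."""
    return coverage / overhead


for name, rows in (("training", TRAINING), ("testing", TESTING)):
    for overhead, coverage in rows:
        print(f"{name} {overhead}%: {efficiency(coverage, overhead):.2f}")
```

The raw ratio falls as the bound grows (for training, 4.48 at 10%), matching the downward trend of the normalized values: the first instructions selected are the most SDC-prone per unit of overhead, so each further percent of overhead buys less coverage.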

Results: Full Duplication Overheads
Full duplication and hot-path duplication (top 10% of paths) both have high overheads: from 53.7% to 73.6% for full duplication, and from 43.5% to 57.6% for hot-path duplication.

Results: Detection Efficiency
Normalized detection efficiency:
Overhead bound | Training programs | Testing programs
10% | 2.38 | 2.87
20% | 2.09 | 2.34
30% | 1.54 | 1.84

Conclusion and Future Work
• Configurable protection techniques for SDC failures are needed as transient fault rates increase
• We identify heuristics that estimate the SDC proneness of program variables from static and dynamic features
• The SDCTune model guides configurable SDC protection:
  • Accurate at predicting the relative SDC rates of applications
  • Much better detection efficiency than full duplication
• Future work:
  • Improve the model's accuracy using auto-tuning
  • Use symptom-based detectors for protection
http://blogs.ubc.ca/karthik/
