

EPVF: AN ENHANCED PROGRAM VULNERABILITY FACTOR METHODOLOGY FOR CROSS-LAYER RESILIENCE ANALYSIS
Bo Fang, Qining Lu, Karthik Pattabiraman, Matei Ripeanu and Sudhanva Gurumurthi*, The University of British Columbia, Canada


  1. EPVF: AN ENHANCED PROGRAM VULNERABILITY FACTOR METHODOLOGY FOR CROSS-LAYER RESILIENCE ANALYSIS. Bo Fang†, Qining Lu†, Karthik Pattabiraman†, Matei Ripeanu† and Sudhanva Gurumurthi*. † The University of British Columbia, Canada. * Cloud Innovation Lab, IBM, USA.

  2. What are we facing? SoC soft error trends: the overall FIT rate per SoC is increasing [DATE 2014, Chandra, AMD]. [Figure: SoC SER FIT rate per node, split into memory SER and logic SER.]

  3. Why Software-based Fault Tolerance? [Diagram: system layers, from top to bottom: Application Level, Operating System Level, Architectural Level, Hardware Device/Circuit Level; faults at the bottom surface as impactful errors at the top.] Hardware-based techniques. Software-based techniques: more cost-effective.

  4. Mitigating Silent Data Corruption (SDC): Key to Error Resilience. [Diagram: a fault becomes an error, which ends in a crash, a hang, an SDC (incorrect output), or a benign outcome (normal execution).]

  5. Error Resilience Estimation: Accuracy vs Cost. [Plot: accuracy against cost, with the goal marked.] FI: high resource consumption, low predictive power. AVF/PVF [HPCA2010, MICRO2003]: conservative estimation of error resilience.

  6. Identifying SDC-causing Bits. AVF/PVF: identify Architecturally Correct Execution (ACE) bits [MICRO03, HPCA10]. [Diagram: total bits for execution contain the ACE bits, which split into SDC-causing bits and crash-causing bits.] e(nhanced)PVF: a methodology that distinguishes crash-causing bits from ACE bits.

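  7. PVF Analysis [Sridharan, HPCA10]. Example trace: R1 = LD R2; R4 = ADD R1, R3; R5 = ADD R6*4, R7; ST R4, R5; R8 = LD R2. [Figure: dependence graph of the trace with nodes R1 to R8, ADDR1, ADDR2 and operations LD, ADD, * and ST.] $\text{ACE Bits} = \sum_{i} \text{Bits in } R_i$ (over the ACE registers); $\text{Total Bits} = \sum_{i} \text{Bits in } R_i$ (over all registers); $\text{PVF} = \frac{\text{ACE Bits}}{\text{Total Bits}} = 88.9\%$.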
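As a rough illustration of the PVF ratio on this slide, the sketch below encodes the five-instruction trace, treats the store as the only program-visible output, and marks as ACE every register on its backward slice. The `DEFS` encoding, the 32-bit register width, and the helper names are assumptions made for illustration. The slide's 88.9% is computed over its own graph, so the ratio produced here is not expected to match it.

```python
from collections import deque

# Hypothetical encoding of the slide's trace:
# each destination register maps to the registers it is computed from.
DEFS = {
    "R1": ["R2"],        # R1 = LD R2
    "R4": ["R1", "R3"],  # R4 = ADD R1, R3
    "R5": ["R6", "R7"],  # R5 = ADD R6*4, R7
    "R8": ["R2"],        # R8 = LD R2 (result is never used)
}
OUTPUT_USES = ["R4", "R5"]  # ST R4, R5 writes program-visible state
REG_BITS = 32               # assumed register width


def ace_registers(defs, roots):
    """Backward slice: every register the output transitively depends on."""
    seen, work = set(), deque(roots)
    while work:
        reg = work.popleft()
        if reg in seen:
            continue
        seen.add(reg)
        work.extend(defs.get(reg, []))
    return seen


def pvf(defs, roots, bits=REG_BITS):
    """ACE bits divided by total bits over all registers in the trace."""
    all_regs = set(defs) | {s for srcs in defs.values() for s in srcs} | set(roots)
    ace = ace_registers(defs, roots)
    return (len(ace) * bits) / (len(all_regs) * bits)


print(sorted(ace_registers(DEFS, OUTPUT_USES)), pvf(DEFS, OUTPUT_USES))
```

In this encoding only R8 falls outside the backward slice of the store, so it is the one register counted as un-ACE.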

  8. Our Approach: ePVF. Source of crashes: segmentation faults (99% of crashes are due to segfaults). Direct crash-causing bits: crash model. Indirect crash-causing bits: propagation model. [Figures: the dependence graph of the example trace; chart "Source of crashes", segfaults vs. others.]

  9. Overall methodology. Pipeline: Obtaining Program Trace → PVF: Identify ACE bits → Crash Model → Propagation Model. Crash model: identify bits that cause a program to make an invalid memory access and crash. Propagation model: identify bits on the backward slice of bits that directly cause crashes.

  10. Crash model. Determines the bits that cause an out-of-bound memory access; applied on every memory instruction. Example trace: R1 = LD R2; R4 = ADD R1, R3; R5 = ADD R6*4, R7; ST R4, R5; R8 = LD R2. For R1 = LD R2, the access is valid only if R2 ∈ [addr_min, addr_max], with the bounds taken from OS info (vma_start, vma_end, ESP). R2 = 01110001010010…
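A minimal sketch of the check described on this slide, assuming the valid address space is given as a list of inclusive `(start, end)` ranges (standing in for the vma_start/vma_end pairs) and that faults are single-bit flips in a 32-bit address register. The function name, the segment layout, and the example address are hypothetical.

```python
def crash_causing_bits(addr, segments, width=32):
    """Bit positions whose single flip pushes addr outside every valid segment."""
    def valid(a):
        return any(lo <= a <= hi for lo, hi in segments)

    return [bit for bit in range(width) if not valid(addr ^ (1 << bit))]


# Hypothetical layout: one 4 KiB mapping, with the accessed address at its start.
SEGMENTS = [(0x1000, 0x1FFF)]
bits = crash_causing_bits(0x1000, SEGMENTS)
print(bits)
```

Flips in the low 12 bits keep the address inside the mapping, so only the higher bits are reported as crash-causing in this example.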

  11. Propagation model. Identifies all possible bits that can affect the bits identified by the crash model. Example trace: R1 = LD R2; R4 = ADD R1, R3; R5 = ADD R6*4, R7; ST R4, R5; R8 = LD R2. The crash model gives min(R5) and max(R5) for ST R4, R5; propagating backward through R5 = R6*4 + R7: max(R6) = (max(R5) - R7)/4; min(R6) = (min(R5) - R7)/4; max(R7) = max(R5) - R6*4; min(R7) = min(R5) - R6*4.
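The four range equations on this slide can be sketched as a single backward step through `R5 = R6*4 + R7`. The function name and the example values are hypothetical, and rounding the divided bounds inward (ceiling for the minimum, floor for the maximum) is an assumption; the slide writes a plain division by 4.

```python
def ceil_div(a, b):
    """Integer ceiling division for a positive divisor."""
    return -((-a) // b)


def propagate_scaled_add(r5_min, r5_max, r6, r7, scale=4):
    """Back-propagate the valid range of R5 = R6*scale + R7 to each operand,
    holding the other operand at its traced value."""
    r6_range = (ceil_div(r5_min - r7, scale), (r5_max - r7) // scale)
    r7_range = (r5_min - r6 * scale, r5_max - r6 * scale)
    return r6_range, r7_range


# Hypothetical values: R5 must stay in [0x1000, 0x1FFF]; traced R6 = 0x10, R7 = 0x1000.
r6_range, r7_range = propagate_scaled_add(0x1000, 0x1FFF, r6=0x10, r7=0x1000)
print(r6_range, r7_range)
```

Any bit flip that moves R6 or R7 outside its propagated range would push R5 out of bounds, which is how bits far from the memory instruction end up classified as crash-causing.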

  12. Overall ePVF methodology. Pipeline: Obtaining Program Trace → PVF: Identify ACE bits → Crash Model → Propagation Model → ePVF: bits that potentially lead to SDCs.

  13. Experimental setup. Scientific benchmarks: 8 from Rodinia [IISWC 09]; Matrix Multiplication; LULESH, a DOE proxy app [IPDPS 2013]. Fault model: LLFI [DSN 14]; 3,000 runs per benchmark.

  14. Evaluation. RQ1: Accuracy of the models. RQ2: Effectiveness of the ePVF methodology. RQ3: Performance. [Diagram: total bits for execution, ACE bits, SDC-causing bits, crash-causing bits.]

  15. RQ1: Accuracy of the models. Recall: run FI experiments; for each crash trial, pick the flipped bit and check whether the model contains that bit. Precision: randomly pick a bit from the models, flip that exact bit during the execution, and check if a crash occurs. [Plots: recall of the model and precision of the model, axes from 50% to 100%.] Our models achieve on average 89% recall and 92% precision.
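The two measurements on this slide can be sketched as follows, assuming each bit is identified by a `(dynamic instruction index, bit position)` pair. All data below is made up for illustration, and the `crashes_when_flipped` callback stands in for an actual fault-injection run.

```python
def recall(observed_crash_bits, model_bits):
    """Fraction of bits that crashed under fault injection and that the model also predicts."""
    hits = sum(1 for bit in observed_crash_bits if bit in model_bits)
    return hits / len(observed_crash_bits)


def precision(sampled_model_bits, crashes_when_flipped):
    """Fraction of model-predicted bits that really crash when flipped."""
    hits = sum(1 for bit in sampled_model_bits if crashes_when_flipped(bit))
    return hits / len(sampled_model_bits)


# Hypothetical data, not taken from the slides.
MODEL = {(3, 31), (3, 30), (7, 29), (9, 28)}      # bits the models predict
FI_CRASHES = [(3, 31), (7, 29), (5, 12), (9, 28)]  # flipped bits of crash trials
ACTUALLY_CRASH = {(3, 31), (7, 29), (9, 28)}       # ground truth for re-injection

r = recall(FI_CRASHES, MODEL)
p = precision(sorted(MODEL), lambda bit: bit in ACTUALLY_CRASH)
print(r, p)
```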

  16. RQ1: Accuracy of the Models. On average, the ePVF methodology identifies crash-causing bits accurately 90% of the time. [Diagram: total bits for execution, ACE bits, SDC-causing bits, crash-causing bits.]

  17. RQ2: Effectiveness of the ePVF. SDC estimate using PVF analysis, ePVF analysis and fault injection. [Chart: PVF value, ePVF value and SDC rate from FI, axis from 0% to 100%.] ePVF significantly tightens the upper bound of estimated SDCs, by 61% on average.

  18. ePVF-informed Duplication. Rank instructions based on their ePVF value. ePVF value per instruction = $\frac{\text{ACE bits} - \text{crash-causing bits}}{\text{ACE bits}}$. The higher the ePVF value, the higher the chance of leading to SDCs. Duplicate highly-ranked ePVF instructions. 30% more SDC coverage than hot-path duplication for the same performance overhead.
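One way the ranking step could look, as a sketch only: it computes the per-instruction ratio as read from this slide, sorts instructions by it, and greedily picks instructions until an overhead budget is spent. The `Instr` record, the use of dynamic execution counts as a stand-in for duplication cost, the greedy selection, and all numbers are assumptions; the slides do not specify how the overhead budget is enforced.

```python
from dataclasses import dataclass


@dataclass
class Instr:
    name: str
    ace_bits: int
    crash_bits: int
    dyn_count: int  # dynamic executions, assumed proxy for duplication cost


def epvf_value(instr):
    """(ACE bits - crash-causing bits) / ACE bits, or 0.0 when there are no ACE bits."""
    if instr.ace_bits == 0:
        return 0.0
    return (instr.ace_bits - instr.crash_bits) / instr.ace_bits


def select_for_duplication(instrs, budget):
    """Pick instructions in descending ePVF order while the cost budget allows."""
    chosen, spent = [], 0
    for instr in sorted(instrs, key=epvf_value, reverse=True):
        if spent + instr.dyn_count <= budget:
            chosen.append(instr.name)
            spent += instr.dyn_count
    return chosen


# Hypothetical instruction profile.
PROFILE = [
    Instr("add", 64, 0, 100),
    Instr("load", 64, 40, 100),
    Instr("mul", 32, 8, 50),
    Instr("store", 64, 60, 10),
]
picked = select_for_duplication(PROFILE, budget=160)
print(picked)
```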

  19. RQ3: Performance. Modeling time ranges from 30 s (lavaMD) to ~4 hours (pathfinder), depending on the size of the DDG and hence the number of dynamic instructions. Optimization (sampling and extrapolation): the intuition is that scientific applications usually have repetitive behaviors. [Chart: predicted ePVF vs. computed ePVF.] ePVF values extrapolated from 10% of the graph differ from the computed values by less than 1% on average.
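The sampling intuition can be illustrated with a deliberately simple stand-in: compute the ratio on the first 10% of a trace and use it as the estimate for the whole trace. This prefix estimate is an assumption chosen for brevity and is not claimed to be the extrapolation used by the authors. The record layout and the whole-program ratio (ACE bits minus crash-causing bits, over total bits) are likewise assumptions.

```python
def epvf(records):
    """records: (ace_bits, crash_bits, total_bits) per dynamic instruction."""
    ace = sum(r[0] for r in records)
    crash = sum(r[1] for r in records)
    total = sum(r[2] for r in records)
    return (ace - crash) / total


def sampled_epvf(records, fraction=0.10):
    """Estimate the ratio from a prefix of the trace only."""
    n = max(1, round(len(records) * fraction))
    return epvf(records[:n])


# Hypothetical repetitive trace: the same 4-instruction loop body, 250 times.
BODY = [(32, 0, 32), (32, 20, 32), (0, 0, 32), (32, 8, 32)]
TRACE = BODY * 250

full = epvf(TRACE)
estimate = sampled_epvf(TRACE)
print(full, estimate)
```

On a perfectly periodic trace like this one the prefix estimate matches the full computation; real programs are only approximately repetitive, which is where the reported sub-1% average difference comes in.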

  20. Conclusion. ePVF removes the crash-causing bits from PVF to get a more accurate estimate of the SDC rate, using a crash model that predicts direct crash-causing bits and a propagation model that identifies bits that lead to direct crash-causing bits. Implementation with the LLVM compiler. Drives selective protection of SDC-causing instructions. Email: bof@ece.ubc.ca. Code: https://github.com/flyree/enhancedPVF
