
Architectural Methods to Understand Soft Errors/Process Variations



  1. Architectural Methods to Understand Soft Errors/Process Variations in DSN 2012
     Jun YAO, Nara Institute of Science and Technology
     yaojun@is.naist.jp

  2. Brief Introduction of DSN’12
     • DSN’12
       ◦ The 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks
       ◦ Two symposia combined in one conference
         • PDS: Performance and Dependability Symposium (performance, dependability, and security)
         • DCCS: Dependable Computing and Communications Symposium (dependability and security)
     Nara Institute of Science & Technology, YAO, Jun, 2012/8/28, SERConf 2012 @ Fukuoka

  3. Fields of Papers
     • DSN/PDS
       ◦ 24 papers accepted, at an acceptance rate of 30%.
       ◦ 3 of them relate to processor architecture or lower levels (12.5%).
     • DSN/DCCS
       ◦ 27 papers accepted, at an acceptance rate of 17.3% (for comparison, ISCA 2012: 18%).
     • Impression:
       ◦ Far more software than hardware papers.
       ◦ Security and availability topics are preferred.
       ◦ Should try PDS (rejected by DCCS).

  4. Papers of Interest
     • Understanding Soft Error Propagation Using Efficient Vulnerability-Driven Fault Injection (PDS)
       ◦ Xin Xu and Man-Lap Li @ George Washington Univ.
       ◦ doi: 10.1109/DSN.2012.6263923
       ◦ Purpose:
         • Inject errors efficiently during simulation/validation
         • Understand the outcomes of error injection

  5. Soft Errors in Microprocessors
     • Causes: particle strikes, radiation…
     • Consequences:
       ◦ Abort: system crash or hang, or abnormal application exit
       ◦ Silent data corruption (SDC): wrong application output although the application does not abort
       ◦ Masked: the fault is not visible (> 90% of faults)
         ✔ Lowers the cost of detection & protection
         ✕ Bad for evaluation: most injections produce no observable error
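As a rough illustration of the three outcome classes, the sketch below labels one fault-injection run by comparing it with a fault-free (golden) run. It assumes a hypothetical harness that records whether the run crashed or timed out (hang), plus its exit code and output; treating any nonzero exit code as an abnormal exit is also an assumption. `RunResult` and `classify_outcome` are illustrative names, not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    ABORT = "abort"    # crash, hang, or abnormal exit
    SDC = "sdc"        # finished normally, but the output is wrong
    MASKED = "masked"  # the fault had no visible effect


@dataclass
class RunResult:
    # Hypothetical record produced by a fault-injection harness.
    crashed: bool
    timed_out: bool
    exit_code: int
    output: bytes


def classify_outcome(golden_output: bytes, run: RunResult) -> Outcome:
    """Compare one faulty run against the golden (fault-free) output."""
    if run.crashed or run.timed_out or run.exit_code != 0:
        return Outcome.ABORT
    if run.output != golden_output:
        return Outcome.SDC
    return Outcome.MASKED


golden = b"42\n"
print(classify_outcome(golden, RunResult(False, False, 0, b"42\n")))  # Outcome.MASKED
print(classify_outcome(golden, RunResult(False, False, 0, b"43\n")))  # Outcome.SDC
print(classify_outcome(golden, RunResult(True, False, -11, b"")))     # Outcome.ABORT
```

Checking for abort first matters: a crashed run usually has truncated output, and it should not be counted as SDC.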

  6. Some Data from Yao’s Research
     • DMR processor: DARA
       [Figure: the two DARA pipelines (IF, ID, RR, EX, MA, WB) with per-stage comparison results (==, =?, !=)]
     • Alpha source ➜ fault injection rate: 0.58 FF/sec in DARA 0.46
     • Architectural vulnerability factor (AVF)
       ◦ out = in & 0x3: out is sensitive to in[1:0] only.
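The `out = in & 0x3` example can be checked mechanically: flip each input bit once and see whether the output changes. This minimal sketch assumes a 32-bit input. The fraction of sensitive bits is only a crude per-operation proxy for masking, not a full AVF computation (which also accounts for how long state remains required for correct execution). `sensitive_bits` is a hypothetical helper.

```python
WIDTH = 32   # assumed input width
MASK = 0x3


def op(x: int) -> int:
    """The operation from the slide: out = in & 0x3."""
    return x & MASK


def sensitive_bits(f, value: int, width: int = WIDTH) -> list:
    """Input bit positions whose single-bit flip changes f(value)."""
    golden = f(value)
    return [b for b in range(width) if f(value ^ (1 << b)) != golden]


bits = sensitive_bits(op, 0xDEADBEEF)
print(bits)               # [0, 1]
print(len(bits) / WIDTH)  # 0.0625
```

Only 2 of the 32 input bits matter here regardless of the input value, so injecting into in[31:2] would be wasted effort for this operation.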

  7. Vulnerability-Driven Fault Injection
     • Goal: inject fewer errors into values that will be masked
       ◦ The same number of injections then yields more erroneous results.
     • Approach:
       ◦ Guide error injection with vulnerability analysis
     • Results:
       ◦ Increases error occurrence by 59%.

  8. Vulnerability Analysis (VA): Obtaining the CriticalFault Injection Space
     • Instruction trace:
         Load [R1] ➜ R2
         Add 0x1, R2 ➜ R3
         Move R4 ➜ R3
         Load [R5] ➜ R2
     • First-level dynamically dead (FDD) instruction
       ◦ The Add above: its result (R3) is overwritten by the Move before it is ever read.
     • Transitively dynamically dead (TDD) instruction
       ◦ Result generated but consumed only by dead instructions, e.g. the first Load (R2 is read only by the dead Add, then overwritten).
     • Remove these instructions to obtain the critical fault injection space.
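One way to find dead instructions in a trace like the one above is a single backward pass that tracks, per register, whether its current value will be overwritten before any read (FDD) or before any read by a live instruction (TDD). This is a minimal sketch, not the paper's implementation. It assumes register-only dataflow, treats stores and other side-effecting instructions as always live, and conservatively assumes every register is live at the end of the trace. `Instr` and `classify_dead` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Instr:
    text: str
    srcs: Tuple[str, ...]       # registers read, including address registers
    dest: Optional[str]         # register written, or None
    side_effect: bool = False   # stores, branches, ...: always treated as live


def classify_dead(trace: List[Instr]) -> List[str]:
    """Label each dynamic instruction as 'live', 'FDD', or 'TDD'.

    Backward pass over the trace; every register is conservatively
    assumed to be live at the end of the trace.
    """
    killed = set()  # regs whose current value is overwritten before any live read
    unread = set()  # regs whose current value is overwritten before any read at all
    labels = [""] * len(trace)
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        dead = (ins.dest is not None and not ins.side_effect
                and ins.dest in killed)
        if not dead:
            labels[i] = "live"
        elif ins.dest in unread:
            labels[i] = "FDD"  # result is never read
        else:
            labels[i] = "TDD"  # result is read only by dead instructions
        if ins.dest is not None:
            # This write ends the lifetime of any older value in dest.
            killed.add(ins.dest)
            unread.add(ins.dest)
        # Reads happen before the write: any read counts for `unread`,
        # but only reads by live instructions count for `killed`.
        unread.difference_update(ins.srcs)
        if not dead:
            killed.difference_update(ins.srcs)
    return labels


trace = [
    Instr("Load [R1] -> R2", ("R1",), "R2"),
    Instr("Add 0x1, R2 -> R3", ("R2",), "R3"),
    Instr("Move R4 -> R3", ("R4",), "R3"),
    Instr("Load [R5] -> R2", ("R5",), "R2"),
]
print(classify_dead(trace))  # ['TDD', 'FDD', 'live', 'live']
```

A single backward pass suffices because deadness of an instruction depends only on what happens later in the trace.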

  9. Guided Error Injection Flow
     1. Collect the instruction trace.
     2. Generate the (reduced) injection map.
     3. Simulate: inject errors randomly, guided by the map.
     4. Analyze the results (is the error visible?).
     [Figure: 49%, 29%]
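Steps 2 and 3 might look roughly like this: enumerate (dynamic instruction, result bit) pairs, skip instructions that the vulnerability analysis marked dead, and sample injection sites uniformly from what remains. Uniform sampling with replacement, the per-instruction result widths, and the function names are assumptions for illustration only.

```python
import random
from typing import List, Set, Tuple

Site = Tuple[int, int]  # (dynamic instruction index, bit position in its result)


def build_injection_map(dest_widths: List[int], dead: Set[int]) -> List[Site]:
    """Step 2: enumerate injection sites, skipping instructions found dead by VA.

    dest_widths[i] is the result width of dynamic instruction i
    (0 for instructions without a register result, e.g. stores).
    """
    return [
        (i, bit)
        for i, width in enumerate(dest_widths)
        if i not in dead
        for bit in range(width)
    ]


def sample_sites(injection_map: List[Site], n: int, seed: int = 0) -> List[Site]:
    """Step 3: draw n injection sites uniformly at random, with replacement."""
    rng = random.Random(seed)
    return [rng.choice(injection_map) for _ in range(n)]


widths = [64, 64, 64, 64]  # hypothetical: the four trace instructions, 64-bit results
dead = {0, 1}              # the dead Load and Add from the VA example
imap = build_injection_map(widths, dead)
print(len(imap))           # 128, versus 256 sites without guidance
print(sample_sites(imap, 3, seed=1))
```

Because dead instructions never enter the map, every sampled site lands on a value the program may still consume; a fixed seed keeps a campaign reproducible.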

  10. Overall Error Injection Result
     • CriticalFault yields 18% more error occurrences on average.
     • SDC errors increase under guided injection.

  11. More Interesting Results
     • Injections classified into three categories:
       ◦ Faulty control: T = a > 0; if (T) goto Loop_exit;
       ◦ Faulty address: LD [R1] ➜ R2; ST R2 ➜ [R3];
       ◦ Faulty data: a = b + c; etc.
     • Two kinds of exploration:
       ◦ How a soft error propagates inside the processor
       ◦ How long it remains a problem
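A plausible way to assign a fault to one of these categories is by how the corrupted value is first consumed: as a branch condition, as a memory address, or as ordinary data. The sketch below assumes exactly that rule, plus hypothetical operand-role labels (`cond`, `addr`, `data`) attached by a trace analyzer; the paper's actual classification may differ.

```python
from enum import Enum
from typing import List, Optional, Tuple


class FaultCategory(Enum):
    CONTROL = "faulty control"  # value decides a branch
    ADDRESS = "faulty address"  # value is used as a memory address
    DATA = "faulty data"        # value is ordinary data


# Hypothetical operand-role labels a trace analyzer might attach.
ROLE_TO_CATEGORY = {
    "cond": FaultCategory.CONTROL,
    "addr": FaultCategory.ADDRESS,
    "data": FaultCategory.DATA,
}

# trace[k] = (destination register or None, [(source register, role), ...])
Trace = List[Tuple[Optional[str], List[Tuple[str, str]]]]


def categorize_fault(trace: Trace, inject_at: int) -> Optional[FaultCategory]:
    """Categorize a fault in the result of trace[inject_at] by its first use.

    Returns None if nothing in the trace reads the corrupted value
    before it is overwritten.
    """
    reg = trace[inject_at][0]
    if reg is None:
        return None
    for dest, reads in trace[inject_at + 1:]:
        for src, role in reads:  # reads happen before the write
            if src == reg:
                return ROLE_TO_CATEGORY[role]
        if dest == reg:
            return None
    return None


trace: Trace = [
    ("R1", []),                                # 0: hypothetical pointer producer
    ("R2", [("R1", "addr")]),                  # 1: LD [R1] -> R2
    (None, [("R2", "data"), ("R3", "addr")]),  # 2: ST R2 -> [R3]
    ("T", [("a", "data")]),                    # 3: T = a > 0
    (None, [("T", "cond")]),                   # 4: if (T) goto Loop_exit
]
print(categorize_fault(trace, 0))  # FaultCategory.ADDRESS
print(categorize_fault(trace, 1))  # FaultCategory.DATA
print(categorize_fault(trace, 3))  # FaultCategory.CONTROL
```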

  12. Faulty Control/Address
     • 60% of control faults are not visible in the final program results. ➜ Quite different from what I expected.
     • 90% of address faults result in ABORT:
       ◦ Protecting addresses is highly necessary.
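A toy model hints at one mechanism behind both observations. A comparison collapses a 32-bit value into a single bit, so most bit flips in its input are logically masked. By contrast, an address with a high bit flipped tends to point into unmapped memory. The 32-bit width, the 1024-word mapped region, and the chosen values are all hypothetical, and the toy's fractions are not comparable to the 60% and 90% measured in the paper.

```python
WIDTH = 32           # hypothetical register width
MEM_WORDS = 1 << 10  # hypothetical: only addresses 0..1023 are mapped


def to_signed(x: int, width: int = WIDTH) -> int:
    """Interpret the low `width` bits of x as a two's-complement integer."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def branch_flip_masked_fraction(a: int) -> float:
    """Fraction of single-bit flips in `a` that leave the outcome of (a > 0) unchanged."""
    golden = to_signed(a) > 0
    masked = sum((to_signed(a ^ (1 << b)) > 0) == golden for b in range(WIDTH))
    return masked / WIDTH


def address_flip_abort_fraction(addr: int) -> float:
    """Fraction of single-bit flips in `addr` that land outside mapped memory."""
    aborts = sum((addr ^ (1 << b)) >= MEM_WORDS for b in range(WIDTH))
    return aborts / WIDTH


print(branch_flip_masked_fraction(5))    # 0.96875 (31 of 32 flips masked)
print(address_flip_abort_fraction(100))  # 0.6875 (22 of 32 flips abort)
```

For a = 5, only flipping the sign bit changes the branch outcome. For the address, any flip above bit 9 leaves the mapped region, which in a real system would typically raise a memory fault.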

  13. Faulty Data
     • The outcomes of data faults resemble those of all faults combined
       ◦ Faulty data can lead to every possible outcome (abort, SDC, masked).

  14. Lifetime of an Injected Error
     [Figure: at the erroneous branch, execution leaves the correct path for a wrong path]
     • In abort cases, the control path diverges within 100 instructions.
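The divergence distance can be measured by comparing the committed PC stream of the faulty run against that of the golden run, starting at the injection point. This sketch assumes the simulator logs one PC per committed instruction for both runs. A run that ends early (for example, on an abort) counts as diverging where its log stops. `divergence_distance` is a hypothetical helper.

```python
from itertools import zip_longest
from typing import Optional, Sequence


def divergence_distance(golden_pcs: Sequence[int], faulty_pcs: Sequence[int],
                        inject_at: int) -> Optional[int]:
    """Committed instructions, counted from the injection point, before the PCs differ.

    Returns None if the faulty run never leaves the correct path. If one log
    ends early, zip_longest pads it with None, which counts as a divergence.
    """
    pairs = zip_longest(golden_pcs[inject_at:], faulty_pcs[inject_at:])
    for k, (g, f) in enumerate(pairs):
        if g != f:
            return k
    return None


golden = [0x100, 0x104, 0x108, 0x10C, 0x110, 0x114]
faulty = [0x100, 0x104, 0x108, 0x10C, 0x200, 0x204]  # e.g. a flipped branch outcome
print(divergence_distance(golden, faulty, inject_at=1))  # 3
```

Collecting this distance over all abort runs gives the distribution behind a statement such as "diverges within 100 instructions".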

  15. Conclusion
     • Gives a way to reduce the error injection space
     • Shows how different instruction types respond to error injection
     • My impression: balancing cost and behavior is important
       ◦ The cost of redundancy is always high
       ◦ But without redundancy, we cannot trace error occurrence.

  16. Papers of Interest
     • VARIUS-NTV: A Microarchitectural Model to Capture the Increased Sensitivity of Manycores to Process Variations at Near-Threshold Voltages (DCCS)
       ◦ Ulya R. Karpuzcu, Krishna B. Kolluru, Nam Sung Kim, and Josep Torrellas @ UIUC
       ◦ doi: 10.1109/DSN.2012.6263951
       ◦ Purpose:
         • Model process variation
         • Estimate its impact on manycore architectures at near-threshold voltage (NTV)

  17. Approaches
     • Extends the existing VARIUS model with NTV support
     • Download at:
       ◦ http://iacoma.cs.uiuc.edu/varius/ntv/varius NTV.html

  18. Evaluation Setup
     • 288-core chip
       ◦ 36 clusters, 8 cores per cluster
       ◦ Core: single-issue, in-order
     • 11 nm process

  19. Results
     • Variations in Vddmin

  20. Variation of Frequency
     • Sub-threshold zone
       ◦ 2.3x
     • NTV zone
       ◦ 3.7x difference
     • Please download it and try it.
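Why the frequency spread grows at low supply voltage can be illustrated with a small Monte Carlo toy. It draws one threshold voltage per core, computes each core's frequency from an EKV-style drive-current interpolation (which stays smooth from sub- to super-threshold), and compares the fastest and slowest cores. This is not the VARIUS-NTV model. Only the 288-core count comes from the slides; every other parameter is a hypothetical value chosen for illustration, so the printed ratios should not be compared with the 2.3x and 3.7x above.

```python
import math
import random
from typing import Sequence

N_CORES = 288      # chip size from the evaluation setup
VTH_MEAN = 0.30    # V, hypothetical mean threshold voltage
VTH_SIGMA = 0.03   # V, hypothetical per-core Vth standard deviation
SLOPE_N = 1.5      # hypothetical subthreshold slope factor
V_T = 0.0259       # thermal voltage kT/q at about 300 K, in volts


def drive_current(vdd: float, vth: float) -> float:
    """EKV-style interpolation, in arbitrary units, valid across operating regions."""
    x = (vdd - vth) / (2 * SLOPE_N * V_T)
    return math.log1p(math.exp(x)) ** 2


def frequency(vdd: float, vth: float) -> float:
    """Gate delay ~ C * Vdd / I, so frequency ~ I / Vdd (constant C dropped)."""
    return drive_current(vdd, vth) / vdd


def max_min_ratio(vdd: float, vths: Sequence[float]) -> float:
    """Ratio of the fastest core's frequency to the slowest core's."""
    freqs = [frequency(vdd, v) for v in vths]
    return max(freqs) / min(freqs)


rng = random.Random(42)
vths = [rng.gauss(VTH_MEAN, VTH_SIGMA) for _ in range(N_CORES)]
for vdd in (0.9, 0.55, 0.4):  # from super-threshold down toward near-threshold
    print(f"Vdd={vdd:.2f} V  fmax/fmin = {max_min_ratio(vdd, vths):.2f}")
```

The same Vth samples are reused at every supply voltage, so the growing ratio comes purely from the current's stronger dependence on the gate overdrive (Vdd - Vth) as Vdd approaches Vth.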

  21. The End
