Machine Learning-based Anomaly Detection for Post-silicon Bug Diagnosis

  1. Machine Learning-based Anomaly Detection for Post-silicon Bug Diagnosis. Andrew DeOrio, Qingkun Li, Matthew Burgess and Valeria Bertacco. University of Michigan, University of Illinois.

  2. Verification trends. Source: Wilson Research Group and Mentor Graphics, 2010 Functional Verification Study.

  3. Increasing post-silicon validation. Design and pre-silicon verification effort vs. post-silicon validation effort. Source: Bob Barton, Intel, invited talk at GSRC.

  4. Post-silicon validation. The flow runs pre-silicon, then post-silicon, then product; the goal is to locate the bug. Post-silicon trade-offs: fast prototypes, but poor observability; high coverage, but slow off-chip transfer; tests the full system, but results are noisy; finds deep bugs, but they are intermittent.

  5. Post-silicon and credit cards. The same test yields many different results, making the bug difficult to locate; likewise, the same credit card appears in many different transactions, making fraud difficult to locate. [Illustration: an x86 instruction trace (pushl %ebp, movl ...) with the bug's location unknown.]

  6. Post-silicon and credit cards. Compare a failing test against the same test's other runs to find an anomalous time and location, just as a new transaction is compared against the same card's history to flag an anomaly.

  7. Post-silicon and credit cards. A clustering algorithm pinpoints the anomalous time and location within the same test. Its training data consists of positive (passing) examples; unknown examples are then checked against them. Each example is a vector of per-signal features, e.g. signal A: time@1=1, signal B: time@1=2, and so on for the remaining signals.
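
The talk does not define exactly how per-window features such as time@1 are computed; below is a minimal Python sketch, assuming each feature is a single summary statistic of one signal's trace within a fixed-size time window (both the window size and the chosen statistic are assumptions).

    import numpy as np

    def extract_features(trace, window=256):
        # Split one signal's 0/1 trace into fixed-size time windows and
        # summarize each window with one feature value; here the value is
        # the number of cycles the signal is asserted (an assumed statistic).
        n_windows = len(trace) // window
        windows = np.asarray(trace[:n_windows * window]).reshape(n_windows, window)
        return windows.sum(axis=1)  # one feature value per window: time@1, time@2, ...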

  8. Learning clusters. For one test and the 1st time window, the clustering algorithm takes the feature values of the passing examples, plotted as signal A's feature value against signal B's, and learns clusters.
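
A minimal sketch of this learning step, substituting scikit-learn's KMeans for the talk's unnamed clustering algorithm; the toy feature matrix, the number of clusters, and the radius rule are all assumptions.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical training matrix: one row per passing run, one column per
    # signal's feature value in the 1st time window (signal A, signal B).
    X_pass = np.array([[1.0, 2.0], [2.0, 1.0], [1.2, 1.8], [1.8, 1.2]])

    # Learn clusters from passing (positive) examples only.
    model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X_pass)

    # Take each cluster's radius as the distance to its farthest member,
    # defining the hyper-sphere that "normal" behavior falls inside.
    dists = np.linalg.norm(X_pass - model.cluster_centers_[model.labels_], axis=1)
    radii = np.array([dists[model.labels_ == c].max() for c in range(model.n_clusters)])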

  9. Searching for anomalies. For the same test and the 1st time window, the feature values of unknown examples are added after clustering; examples that fall inside the learned clusters indicate no bug.

  10. Searching for anomalies. In the 2nd time window, unknown examples fall outside the clusters; a bug is found when the number of anomalies exceeds a threshold.
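
Continuing the sketch above and reusing model and radii, an unknown example counts as an anomaly when it falls outside the hyper-sphere of its nearest learned cluster; the example data and the threshold value are assumptions.

    # Hypothetical feature values observed in the 2nd time window.
    X_unknown = np.array([[1.1, 1.9], [4.0, 4.5], [5.0, 0.2]])

    labels = model.predict(X_unknown)  # nearest learned cluster per example
    dists = np.linalg.norm(X_unknown - model.cluster_centers_[labels], axis=1)
    n_anomalies = int((dists > radii[labels]).sum())  # outside the nearest sphere

    ANOMALY_THRESHOLD = 1  # assumed value
    bug_found = n_anomalies > ANOMALY_THRESHOLD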

  11. Clustering in X,000 dimensions. Each signal is a dimension, so circular clusters become hyper-spheres and high dimensionality becomes a challenge. In practice: cap the number of signals in one clustering set at 500; group signals by module(s), 100-500 signals per group; apply clustering to each group independently (sketched below).
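
A sketch of that grouping step; the dotted module-prefix naming convention and the rule for splitting oversized groups are assumptions, and only the 500-signal cap comes from the slide.

    from collections import defaultdict

    MAX_SIGNALS = 500  # cap on signals per clustering set, from the slide

    def group_by_module(signal_names, max_size=MAX_SIGNALS):
        # Group signals by module prefix (assumed to be the text before the
        # first '.', e.g. 'pcx.gnt0' belongs to module 'pcx'), then split any
        # group over the cap so each clustering run stays tractable.
        groups = defaultdict(list)
        for name in signal_names:
            groups[name.split(".")[0]].append(name)
        capped = []
        for module, names in groups.items():
            for i in range(0, len(names), max_size):
                capped.append((module, names[i:i + max_size]))
        return capped  # clustering is then applied to each group independently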

  12. Experimental setup. 10 testcases were run on the hardware with 100 random seeds each (variable memory delay, random crossbar traffic), giving 1000 passing runs as training data; 10 injected bugs, each run with 10 seeds per testcase, gave 1000 buggy runs as unknown data. 41,743 top-level control signals were monitored. Example bugs: a functional bug in the PCX, an electrical error in the Xbar.

  13. Bug injection.
      Bug          Description
      PCX_gnt SA   Stuck-at in PCX grant
      Xbar elect   Electrical error in crossbar
      BR fxn       Functional bug in branch logic
      MMU fxn      Functional bug in memory controller
      PCX_atm SA   Stuck-at in PCX atomic grant
      PCX fxn      Functional bug in PCX
      XBar combo   Combined electrical errors in Xbar/PCX
      MCU combo    Combined electrical errors in mem/PCX
      MMU combo    Combined functional bugs in MMU/PCX
      EXU elect    Electrical error in execute unit

  14. Bug detection on OpenSPARC T2. [Chart: percentage of testcases (0-100%) per outcome: exact signal detected; bug detected on other signals; bug signal not observable; no bug effect; false negative; false positive.] Overall, 9/10 bugs were caught.

  15. Bug signal vs. noise. More training data yields higher accuracy.

  16. Conclusions. Machine learning automatically localizes a bug in time and location; it leverages a statistical approach to tolerate noise; and it is effective for a variety of bugs (functional, electrical and manufacturing), localizing them to 336 cycles and 347 signals on average.
