
Scent Intensification for Testing & Debugging - Rui Abreu



  1. Scent Intensification for Testing & Debugging Rui Abreu

  2. Economic Relevance of [Embedded] Software
  • Exponential increase in LOC
  • Despite thorough design and testing, fault density remains constant
  • Typically 5-15 bugs / KLOC, 75 min / bug ➤ $4K / KLOC
  • Development cost $15-30K / KLOC ➤ 15-25% diagnostic cost
  • Residual defects cost the US $60B/year [NIST 2002]
  • An estimated 20% of that is due to fault diagnosis (downtime, labor)

  3. The birth of debugging: your guess?

  4. 1840: software errors are mentioned in Ada Byron’s notes on Charles Babbage’s Analytical Engine.

  5. 1947: the first actual bug, and actual debugging, by Admiral Grace Hopper’s associates working on the Mark II computer at Harvard University.

  6. 1962: the UNIVAC 1100’s FLIT (Fault Localization by Interpretive Testing).

  7. 1981: Weiser’s breakthrough paper. Input: source code and a program point.

  8. 1986: Stallman’s GDB. Input: faulty program and 1 failed test case.

  9. 1988: Korel and Laski’s dynamic slicing, followed by Agrawal’s work (1993). Input: source code and a failed test case.

  10. DDD. Input: faulty program and a failed test case.

  11. Delta Debugging. Input: faulty program, 1 failed and 1 passed test case.

  12. Statistical Debugging. Input: faulty program and a test suite.

  13. 2007: EzUnit.

  14. 2009: VIDA.

  15. 2011/12: (the tool added at this point of the timeline is shown only as an image on the slide).

  16. A survey paper covering this timeline is also under review at TSE, with more than 300 works cited.

  17. Focus of this talk
  • Techniques that take program spectra (abstractions of program traces) into account
  • Spectrum-based Fault Localization (SFL)
  • Statistical vs. reasoning-based approaches
  • Lightweight, scalable

  18. SFL: Principle (1). SFL integrates well with testing: a test suite (t1..t5) is run against a program of 12 components, and for each test we record which components it touches and whether the test passes or fails.

  19. SFL: Principle (2). Test t1 fails: each component it touched has its "touched, fail" count incremented.

  20. SFL: Principle (3). Test t2 also fails: the "touched, fail" counts of the components it touched are incremented.

  21. SFL: Principle (4). Test t3 passes: the components it touched have their "touched, pass" counts incremented.

  22. SFL: Principle (5). Test t4 fails: the "touched, fail" counts are updated again.

  23. SFL: Principle (6). Test t5 passes, completing the activity matrix. Final counts for components 1..12: touched in failing tests = 3 1 1 0 1 3 3 2 3 1 3 3; touched in passing tests = 2 0 0 2 1 2 2 0 2 1 0 0.

  24. SFL: Principle (7). Components are ranked according to the likelihood of causing the detected errors (a small sketch of this bookkeeping follows below).
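A minimal Java sketch of the bookkeeping walked through in slides 18-24; the class and method names (HitSpectra, countSpectra) are illustrative and not taken from any of the cited tools.

    import java.util.Arrays;

    public class HitSpectra {

        // A[t][c] is true when test t touched component c; error[t] is true when test t failed.
        // For every component, count how often it was touched by failing tests (n11)
        // and by passing tests (n10); these counters feed the similarity coefficients
        // used to rank components (slide 24 onwards).
        static int[][] countSpectra(boolean[][] A, boolean[] error) {
            int numComponents = A[0].length;
            int[] n11 = new int[numComponents]; // touched, test failed
            int[] n10 = new int[numComponents]; // touched, test passed
            for (int t = 0; t < A.length; t++) {
                for (int c = 0; c < numComponents; c++) {
                    if (!A[t][c]) continue;                 // not touched by this test
                    if (error[t]) n11[c]++; else n10[c]++;
                }
            }
            return new int[][] { n11, n10 };
        }

        public static void main(String[] args) {
            // Toy suite: 3 components, tests t1 (fail) and t2 (pass).
            boolean[][] A = { { true, false, true }, { true, true, false } };
            boolean[] error = { true, false };
            int[][] counts = countSpectra(A, error);
            System.out.println("touched, fail: " + Arrays.toString(counts[0])); // [1, 0, 1]
            System.out.println("touched, pass: " + Arrays.toString(counts[1])); // [1, 1, 0]
        }
    }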

  25. Program and test suite (t1..t6), with a suspiciousness score computed per statement (the seeded fault is marked):

    class Triangle {
        static int type(int a, int b, int c) {
            int type = SCALENE;                             // 0.09998
            if ( (a == b) && (b == c) )                     // 0.09998
                type = EQUILATERAL;                         // 0.10001
            else if ( (a*a) == ((b*b) + (c*c)) )            // 0.09999
                type = RIGHT;                               // 0.10001
            else if ( (a == b) || (b == a) ) /* FAULT */    // 0.10000
                type = ISOSCELES;                           // 0.10001
            return type;                                    // 0.09998
        }
        static double area(int a, int b, int c) {
            double s = (a+b+c)/2.0;                         // 0.10000
            return Math.sqrt(s*(s-a)*(s-b)*(s-c));          // 0.10000
        }
        ...
    }

  26. Suspiciousness score
  • Each component (row) is ranked according to its similarity to the error vector
  • Many similarity coefficients exist
  • Ochiai similarity is equivalent to the cosine of the angle between two vectors in an n-dimensional space
  Abreu, R., Zoeteweij, P., Golsteijn, R., & van Gemund, A. J. (2009). A practical evaluation of spectrum-based fault localization. Journal of Systems and Software, 82(11), 1780-1792.
  Lucia, L., Lo, D., Jiang, L., Thung, F., & Budi, A. (2014). Extended comprehensive study of association measures for fault localization. Journal of Software: Evolution and Process, 26(2).
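For concreteness, the Ochiai coefficient as commonly defined in the SFL literature cited above is n11 / sqrt(totalFailed * (n11 + n10)), where n11 counts the failing tests that touch the component, n10 the passing tests that touch it, and totalFailed the failing tests in the suite. A minimal Java sketch (class and method names are illustrative):

    public class Ochiai {

        // n11: failing tests that touch the component
        // n10: passing tests that touch the component
        // totalFailed: number of failing tests in the suite
        static double suspiciousness(int n11, int n10, int totalFailed) {
            double denom = Math.sqrt((double) totalFailed * (n11 + n10));
            return denom == 0 ? 0.0 : n11 / denom; // components never touched score 0
        }

        public static void main(String[] args) {
            // A component touched by 3 of 3 failing tests and by 1 passing test:
            System.out.println(suspiciousness(3, 1, 3)); // ~0.866
        }
    }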

  27. Diagnostic Performance (Cd = 4: the faulty statement is the 4th one to be inspected)
  Rank | Suspicious Statement                         | Line | Suspiciousness
  1    | type = EQUILATERAL;                          | 3    | 0.10001
  2    | type = RIGHT;                                | 5    | 0.10001
  3    | type = ISOSCELES;                            | 7    | 0.10001
  4    | else if ( (a == b) || (b == a) ) /* FAULT */ | 6    | 0.10000
  5    | double s = (a+b+c)/2.0;                      | 9    | 0.10000
  6    | return Math.sqrt(s*(s-a)*(s-b)*(s-c));       | 10   | 0.10000
  7    | else if ( (a*a) == ((b*b) + (c*c)) )         | 4    | 0.09999
  8    | int type = SCALENE;                          | 1    | 0.09998
  9    | if ( (a == b) && (b == c) )                  | 2    | 0.09998
  10   | return type;                                 | 8    | 0.09998
  R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09

  28. Can we do better?
  • Statistics-based SFL does not reason in terms of multiple faults

     c1 c2 c3   P/F
      1  0  0   1 (F)
      0  1  0   1 (F)
      1  0  1   1 (F)
      0  1  1   1 (F)
      1  1  0   0 (P)

  Diagnostic report = < c3, c1, c2 >
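To see why the statistical ranking puts c3 first, apply for instance the Ochiai coefficient from slide 26 to this matrix (4 failing runs, 1 passing run): c1 and c2 are each touched by 2 failing runs and 1 passing run, giving 2 / sqrt(4 * 3) ≈ 0.58, while c3 is touched by 2 failing runs and no passing run, giving 2 / sqrt(4 * 2) ≈ 0.71. The non-faulty c3 therefore outranks the two actual faults c1 and c2, which is exactly the multiple-fault blind spot the reasoning-based approach on the next slides addresses.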

  29. Reasoning-based Approach
  • Barinel is a reasoning-based approach
  • Integrates the best of model-based diagnosis with spectra

     c1 c2 c3   P/F
      1  0  0   1 (F)
      0  1  0   1 (F)
      1  0  1   1 (F)
      0  1  1   1 (F)
      1  1  0   0 (P)

  From the first failing run (1 0 0): c1 must be faulty; neither c2 nor c3 can be a single fault; {c2, c3} cannot be a double fault.

  30. Reasoning-based Approach
  • Barinel is a reasoning-based approach
  • Integrates the best of model-based diagnosis with spectra

     c1 c2 c3   P/F
      1  0  0   1 (F)
      0  1  0   1 (F)
      1  0  1   1 (F)
      0  1  1   1 (F)
      1  1  0   0 (P)

  From the second failing run (0 1 0): c2 must be faulty; neither c1 nor c3 can be a single fault; {c1, c3} cannot be a double fault.

  31. Reasoning-based Approach
  • Barinel is a spectrum-based reasoning approach
  • Integrates the best of model-based diagnosis with spectra

     c1 c2 c3   P/F
      1  0  0   1 (F)
      0  1  0   1 (F)
      1  0  1   1 (F)
      0  1  1   1 (F)
      1  1  0   0 (P)

  Summary: c1 and c2 are both faulty, but neither is a single fault; {c1, c2} can be a double fault, while neither {c1, c3} nor {c2, c3} can; so {c1, c2} is the only diagnosis possible (subsuming the triple fault {c1, c2, c3}).

  32. Spectrum-based reasoning
  1. Generate the sets of components that explain the observed erroneous behavior
     • Equivalent to computing a minimal hitting set (Staccato / MHS2**; see the sketch below)
     • Given the failed executions
  2. Rank candidates according to their probability of being the true fault explanation ➤ Bayes’ rule
     • Given both passed and failed executions
  R. Abreu, P. Zoeteweij, and A. J. van Gemund, “Spectrum-Based Multiple Fault Localization”, ASE ’09
  ** https://github.com/npcardoso/MHS2 (citable via https://zenodo.org/record/10037) ➤ contribute to the project; send pull requests; email us!
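To illustrate step 1, here is a brute-force Java sketch of minimal-hitting-set computation over the failing-run spectra. It is not the Staccato/MHS2 algorithm referenced above (which scales far better), and the class and method names are illustrative.

    import java.util.*;

    public class MinimalHittingSets {

        // conflicts: for each failing run, the set of component indices it touched.
        // Returns all minimal hitting sets, i.e. the minimal diagnosis candidates.
        static List<Set<Integer>> minimalHittingSets(List<Set<Integer>> conflicts, int numComponents) {
            List<Set<Integer>> minimal = new ArrayList<>();
            // Enumerate candidate subsets (as bitmasks) by increasing size.
            List<Integer> masks = new ArrayList<>();
            for (int m = 1; m < (1 << numComponents); m++) masks.add(m);
            masks.sort(Comparator.comparingInt(Integer::bitCount));

            for (int mask : masks) {
                Set<Integer> candidate = toSet(mask);
                if (!hitsAll(candidate, conflicts)) continue;
                // Skip candidates that contain a smaller, already-found hitting set.
                boolean subsumed = false;
                for (Set<Integer> found : minimal) {
                    if (candidate.containsAll(found)) { subsumed = true; break; }
                }
                if (!subsumed) minimal.add(candidate);
            }
            return minimal;
        }

        // A candidate explains every failing run iff it shares a component with each conflict.
        static boolean hitsAll(Set<Integer> candidate, List<Set<Integer>> conflicts) {
            for (Set<Integer> conflict : conflicts) {
                if (Collections.disjoint(candidate, conflict)) return false;
            }
            return true;
        }

        static Set<Integer> toSet(int mask) {
            Set<Integer> s = new HashSet<>();
            for (int i = 0; mask != 0; i++, mask >>= 1) if ((mask & 1) == 1) s.add(i + 1);
            return s;
        }

        public static void main(String[] args) {
            // The example from slides 28-31: the failing runs touched {c1}, {c2}, {c1,c3}, {c2,c3}.
            List<Set<Integer>> conflicts = Arrays.asList(
                    new HashSet<>(Arrays.asList(1)),
                    new HashSet<>(Arrays.asList(2)),
                    new HashSet<>(Arrays.asList(1, 3)),
                    new HashSet<>(Arrays.asList(2, 3)));
            System.out.println(minimalHittingSets(conflicts, 3)); // prints [[1, 2]]
        }
    }

On the matrix from slides 28-31 the only minimal hitting set is {c1, c2}, matching the summary on slide 31; step 2 (Bayes’ rule) would then rank such candidates using the passed and failed executions.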

  33. Diagnostic Performance: plot of the percentage of faulty versions (y-axis, 0-100%) against effort, i.e. the percentage of the program to be examined to find the fault (x-axis, 0-100%), with the worst and ideal techniques shown as bounds.

  34. The same diagnostic-performance plot comparing techniques: Intersection, Union, NN, DD, Tarantula, Ochiai, Sober, CrossTab, PPDG, and Barinel (percentage of faulty versions vs. effort, i.e. percentage of the program to be examined to find the fault).

  35. No similarity coefficient is statistically significantly better!

  36. How good are we?
  • The best performing techniques still require inspecting about 10% of the code...
  • 100 LOC ➤ 10 LOC
  • 10,000 LOC ➤ 1,000 LOC
  • 1,000,000 LOC ➤ 100,000 LOC

  37. Case Studies (NXP)
  Case             | To Inspect          | Out of / Previously
  Load Problem     | 2 logical threads   | 315
  Teletext Lock-Up | 2 blocks            | 60K
  NVM corrupt      | 96 blocks, 10 files | 150K, 1.8K
  Scrolling Bug    | 5 blocks            | 150K
  Invisible Pages  | 12 blocks           | 150K
  Tuner Problem    | 2 files             | 1.8K
  Zapping Crash    | 1 run (15 mins)     | 1 day (developer)
  Wrong Audio      | 1 run (15 mins)     | ½ day (expert)

  38. Hmm...
  • Are we properly quantifying diagnostic accuracy? We compare techniques based on the rankings, assuming perfect bug understanding.
  • Are we providing an ecosystem that offers these techniques?

  39. Human Studies: Parnin and Orso observed that there is a lack of human studies! (ISSTA ’11)

  40. Crowbar (previously known as GZoltar): http://www.crowbar.io

  41. Visualizations
