SLIDE 1

Applying Classification Techniques to Remotely-Collected Program Execution Data

Alessandro Orso
Georgia Institute of Technology

Murali Haran
Penn State University

Alan Karr, Ashish Sanil
National Institute of Statistical Sciences

Adam Porter
University of Maryland

Presented at ESEC-FSE, September 2005.

This work was supported in part by NSF awards CCF-0205118 to NISS, CCR-0098158 and CCR-0205265 to the University of Maryland, and CCR-0205422, CCR-0306372, and CCR-0209322 to Georgia Tech.
SLIDE 2

Testing & Analysis after Deployment

[Diagram: program P deployed to many users in the field, each sending field data back for SE tasks.]

| Work | SE Task | Field Data |
| --- | --- | --- |
| [Pavlopoulou99] | Test adequacy | Residual coverage data |
| [Hilbert00] | Usability testing | GUI interactions |
| [Dickinson01] | Failure classification | Caller/callee profiles |
| [Bowring02] | Coverage analysis | Partial coverage data |
| [Orso03] | Impact analysis | Dynamic slices |
| [Liblit05] | Fault localization | Various profiles (returns, …) |

SLIDE 3

Tradeoffs of T&A after Deployment

  • In-house

(+) Complete control (measurements, reruns, …)
(-) Small fraction of behaviors

  • In the field

(+) All (exercised) behaviors
(-) Little control

  • Only partial measures, no reruns, …
  • In particular, no oracles
  • Currently, mostly crashes
SLIDE 4

Our Goal

Provide a technique for automatically identifying failures

  • Mainly, in the field
  • Useful in-house too
  • e.g., with automatically generated test cases, which lack oracles
SLIDE 5

Overview

  • Motivation and Goal
  • General Approach
  • Empirical Studies
  • Conclusion and Future Work
SLIDE 6

Overview

  • Motivation and Goal
  • General Approach
  • Empirical Studies
  • Conclusion and Future Work
SLIDE 7

Background: Classification Techniques

Classification -> Supervised learning -> Machine learning

Many existing techniques (logistic regression, neural networks, tree-based classifiers, SVMs, …)

[Diagram: training and classification. Objects obj 1 … obj n with known labels (x, y, z) feed a learning algorithm, which builds a model; the model then predicts a label for a new object obj i. In this work the objects are executions, the predictors are execution data, the labels are pass/fail, and the learning algorithm is random forests.]

SLIDE 8

Background: Random Forests Classifiers

  • Tree-based classifiers
  • Partition the predictor space into hyper-rectangular regions
  • Regions are assigned a label

(+) Easy to interpret
(-) Unstable

  • Random forests [Breiman01]
  • Integrate many (500) tree classifiers
  • Classification via a voting scheme

(+) Easy to interpret
(+) Stable

[Diagram: example classification tree over predictors size and time; splits such as size ≥ 14.5, size ≥ 8.5, time ≤ 111, and time > 55 route an execution, e.g., (size=10, time=80), to a pass or fail leaf.]
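As a concrete illustration (mine, not from the talk), here is a minimal random-forest sketch in Python with scikit-learn, using 500 trees as on the slide; the predictors size and time come from the example tree above, and the toy data is made up.

```python
# Minimal sketch, assuming scikit-learn; not the authors' actual tooling.
from sklearn.ensemble import RandomForestClassifier

# Toy training data: one row per execution, columns are the example
# predictors from the slide's tree (size, time).
X_train = [[10, 80], [16, 120], [7, 40], [15, 60], [9, 115]]
y_train = ["pass", "fail", "pass", "fail", "fail"]  # known outcomes

# 500 trees, as on the slide; each tree partitions the predictor space
# into hyper-rectangles, and the forest classifies by majority vote.
forest = RandomForestClassifier(n_estimators=500, random_state=0)
forest.fit(X_train, y_train)

print(forest.predict([[10, 80]]))  # predicted label for a new execution
```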

SLIDE 9

Our Approach

Some critical open issues

  • What data should we collect?
  • What tradeoffs exist between different types of data?
  • How reliable/generalizable are the statistical analyses?

[Diagram: two phases. Training (in-house): an instrumentor produces P inst from P; running P inst on test cases yields runtime execution data plus pass/fail labels, which form the training set fed to the learning algorithm to build the model (random forest). Classification (in the field): users run P inst, and the classifier applies the model to their runtime execution data to produce predicted pass/fail labels.]
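To make the two phases concrete, here is a hedged sketch (all function and file names are hypothetical; assumes scikit-learn and joblib are available):

```python
# Hedged sketch of the two-phase pipeline; names are hypothetical.
from joblib import dump, load
from sklearn.ensemble import RandomForestClassifier

def train_in_house(execution_data, labels, model_path="model.joblib"):
    """Training (in-house): fit a random forest on labeled runs."""
    forest = RandomForestClassifier(n_estimators=500, random_state=0)
    forest.fit(execution_data, labels)  # labels are "pass"/"fail"
    dump(forest, model_path)            # ship the model alongside P inst

def classify_in_field(execution_data, model_path="model.joblib"):
    """Classification (in the field): predict labels for unlabeled runs."""
    forest = load(model_path)
    return forest.predict(execution_data)
```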

SLIDE 10

Specific Research Questions

RQ1: Can we reliably classify program outcomes using execution data?

RQ2: If so, what type of execution data should we collect?

RQ3: How can we reduce runtime data-collection overhead while still producing accurate and reliable classifications?

⇒ Set of exploratory studies

SLIDE 11

Overview

  • Motivation and Goal
  • General Approach
  • Empirical Studies
  • Conclusion and Future Work
SLIDE 12

Experimental Setup (I)

Subject program

  • JABA bytecode analysis library
  • 60 KLOC, 400 classes, 3000 methods
  • 19 single-fault versions (“golden version” + 1 real fault)

Training set

  • 707 test cases (7 drivers applied to 101 input programs)
  • Collected various kinds of execution data (e.g., counts for throws, catch blocks, basic blocks, branches, methods, call edges, …); a conceptual sketch of count collection follows
  • “Golden version” used to label passing/failing runs
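The slides do not show how the counts were collected (on JABA this was Java instrumentation); as a purely conceptual stand-in, this Python sketch counts method executions with the standard profiling hook:

```python
# Conceptual stand-in for the instrumentation (Python instead of the
# Java instrumentation actually used); counts method entries.
import sys
from collections import Counter

method_counts = Counter()

def _profiler(frame, event, arg):
    if event == "call":  # one increment per method (function) entry
        code = frame.f_code
        method_counts[f"{code.co_filename}:{code.co_name}"] += 1

sys.setprofile(_profiler)
# ... run the program under test here ...
sys.setprofile(None)

print(method_counts.most_common(5))  # most frequently executed methods
```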
SLIDE 13

Experimental Setup (II)

[Diagram: the ideal setting. In-house, the training set feeds the learning algorithm, which builds the model (random forest); in the field, the classifier applies the model to users’ runs to produce the predicted outcome (pass/fail).]

Ideal setting, but

  • Expensive
  • Difficult to get enough data points
  • Oracle problem

=> Simulate users’ runs

[Diagram: the simulation. The training set is split, with 2/3 of the runs used for training and the remaining 1/3 standing in for users’ runs; accuracy is measured as the classification error (misclassification rate) on the held-out third.]
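For concreteness, here is a sketch of that simulation (my reconstruction, not the authors’ code; assumes scikit-learn):

```python
# Sketch of the simulation; X holds execution-data vectors, y the
# pass/fail labels obtained from the golden version.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

def misclassification_rate(X, y, seed=0):
    # 2/3 of the runs train the model; the held-out 1/3 simulates users' runs.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=1 / 3, random_state=seed)
    forest = RandomForestClassifier(n_estimators=500, random_state=seed)
    forest.fit(X_train, y_train)
    predicted = forest.predict(X_test)
    # Fraction of held-out runs whose pass/fail label is predicted wrongly.
    return sum(p != t for p, t in zip(predicted, y_test)) / len(y_test)
```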

SLIDE 14

RQ1 & RQ2: Can We Classify at All? How?

  • RQ1: Can we reliably classify program outcomes using execution data?
  • RQ2: Assuming we can classify program outcomes, what type of execution data should we collect?
  • We first considered a specific kind of execution data: basic-block counts (~20K), a simple measure intuitively related to faults
  • Results: classification error estimates were always almost 0!
  • But time overhead was ~15%, and the data volume was not negligible

=> Consider other kinds of execution data

[Diagram: a matrix of basic-block counts, one row per execution (exec i, …), each row labeled pass/fail.]

SLIDE 15

RQ1 & RQ2: Can We Classify at All? How?

  • We considered other kinds of execution data:
  • Basic-block counts yielded almost perfect predictors => richer data not considered
  • Counts for throws, catch blocks, methods, and call edges
  • Results
  • Throw and catch-block counts are poor predictors
  • Method counts produced nearly perfect models
  • As accurate as block counts, but much cheaper to collect: 3,000 methods vs. 20,000 blocks (overhead < 5%)
  • Branch and call-edge counts are equally accurate, but more costly than method counts

Preliminary conclusion (1): Possible to classify program runs; method counts provided high accuracy at low cost

SLIDE 16

RQ3: Can We Collect Less Information?

  • Method-count models used between 2 and 7 method counts. Great for instrumentation, but…
  • Two alternative hypotheses:
  • Few methods are relevant -> must choose specific methods well
  • Many, redundant methods -> method selection less important
  • To investigate, we performed 100 random samplings (sketched after this slide’s conclusion)
  • Took 10% random samples of the method counts and rebuilt the models
  • Models were excellent 90% of the time
  • Evidence that many method counts are good predictors

Preliminary conclusion (2): the “failure signal” is spread across many entities rather than localized to a few => estimates can be based on very little data, collected with negligible overhead
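A hedged sketch of the sampling study (my reconstruction; the 0.05 error threshold for an “excellent” model is an assumption, and misclassification_rate is the helper sketched earlier):

```python
# Sketch of the 10% random-sampling study; assumes NumPy plus the
# misclassification_rate helper from the earlier sketch. The 0.05
# threshold for an "excellent" model is an assumption.
import numpy as np

def sampling_study(X, y, n_trials=100, fraction=0.10, threshold=0.05):
    X = np.asarray(X)
    n_methods = X.shape[1]
    rng = np.random.default_rng(0)
    excellent = 0
    for trial in range(n_trials):
        # Keep a random 10% of the method-count columns and rebuild.
        cols = rng.choice(n_methods, size=max(1, int(n_methods * fraction)),
                          replace=False)
        if misclassification_rate(X[:, cols], y, seed=trial) < threshold:
            excellent += 1
    return excellent / n_trials  # fraction of trials with an excellent model
```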

SLIDE 17

Validity of the Analysis

Two main issues to consider

  • Multiplicity
  • Generality
SLIDE 18

Statistical Issues -- Multiplicity

When the # of predictors far exceeds the # of data points, the likelihood of finding spurious relationships increases

  • i.e., random relationships confused for real ones

We took two steps to address the problem

  • Considered method counts (the smallest number of predictors)
  • Conducted a study (sketched after this slide) in which we
  • Randomly permuted the method counts
  • Took a 10% random sample of the method counts and rebuilt the models (100 times)

=> Never found good models based on this data

Preliminary conclusion (3): Results are unlikely to be due to random chance

[Diagram: two executions-by-methods count matrices, the original and one in which each execution’s counts have been randomly permuted (e.g., the row 21 8 69 4 … becomes 69 8 4 21 …).]
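A hedged sketch of that permutation check (my reconstruction; the slide’s figure suggests counts are shuffled within each execution’s row, so that detail is inferred):

```python
# Sketch of the permutation study; assumes NumPy plus the
# misclassification_rate helper from the earlier sketch. Shuffling each
# execution's counts destroys any real count-to-method relationship, so
# good models on permuted data would indicate spurious relationships.
import numpy as np

def permutation_study(X, y, n_trials=100, threshold=0.05):
    X = np.asarray(X)
    rng = np.random.default_rng(0)
    # Permute counts within each row, per the slide's figure.
    X_perm = np.array([rng.permutation(row) for row in X])
    good = 0
    for trial in range(n_trials):
        cols = rng.choice(X.shape[1], size=max(1, X.shape[1] // 10),
                          replace=False)
        if misclassification_rate(X_perm[:, cols], y, seed=trial) < threshold:
            good += 1
    return good  # the slide reports never finding good models on such data
```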

SLIDE 19

Statistical Issues -- Generality

Classifiers for 1 specific bug are useful, but…

  • We would like models that encode “correct behavior” for the application in general
  • Looked for predictors that worked in general

⇒ Found 11 excellent predictors for all versions

Programs typically contain more than 1 bug

  • Applied our approach to 6 multi-bug versions
  • Models had error rates below 2% in most cases

Preliminary conclusion (4): Results promising w.r.t. generality (but need to investigate further)

SLIDE 20

Overview

  • Motivation and Goal
  • General Approach
  • Empirical Studies
  • Conclusion and Future Work
SLIDE 21

Summary

  • Possible to classify program outcomes using execution data
  • Method counts gave high accuracy at low cost
  • Estimates can be computed from very little data, collected with negligible overhead
  • Our results are unlikely to be due to random chance and are promising in terms of generality
  • But these are still preliminary results, and we need to investigate further

SLIDE 22

Future Work

  • Multiple faults
  • Investigate the relationship between predictors and failures
  • Investigate the relationship between predictors and faults
  • Conduct further experiments with system(s) in actual use