
Human Competitiveness of Genetic Programming for Spectrum Based Fault Localisation
Shin Yoo (1), Xiaoyuan Xie (2), Fei-Ching Kuo (3), Tsong Yueh Chen (3), Mark Harman (4)
1: KAIST, Republic of Korea
2: Wuhan University, China
3: Swinburne University of Technology, Australia
4: University College London/Facebook, UK
HUMIES@GECCO 2017

Automated Debugging
• Debugging is hard for humans: we increasingly have to work on, and with, large code bases written by others.
• Debugging is hard for machines: automated repair techniques rely heavily on automated fault localisation.

Spectrum Based Fault Localisation
[Diagram: Program + Tests → Spectrum → Risk Evaluation Formula, e.g. $e_f - \frac{e_p}{e_p + n_p + 1}$ → Ranking]
Higher ranking = Fewer statements to check

Spectrum Based Fault Localisation
Test results: t1 = P, t2 = F, t3 = F

Structural element   e_p  e_f  n_p  n_f  Tarantula  Rank
s1                   1    0    0    2    0.00       9
s2                   1    0    0    2    0.00       9
s3                   1    0    0    2    0.00       9
s4                   1    0    0    2    0.00       9
s5                   1    0    0    2    0.00       9
s6                   1    1    0    1    0.33       4
s7 (faulty)          0    2    1    0    1.00       1
s8                   1    1    0    1    0.33       4
s9                   1    2    0    0    0.50       2

$\text{Tarantula} = \dfrac{\frac{e_f}{e_f + n_f}}{\frac{e_p}{e_p + n_p} + \frac{e_f}{e_f + n_f}}$
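The Tarantula column and the ranks in the example can be reproduced with a short script. This is a minimal sketch, not the authors' tooling. Two things are assumptions: the slide text does not say which failing test covers s6 and s8 (t2 and t3 are picked arbitrarily here, which leaves the spectrum unchanged), and ties are resolved by giving every tied statement the worst rank in its group, which matches the ranks 9, 4, 1, 2 shown.

```python
# Test outcomes from the example: one passing test, two failing tests.
RESULTS = {"t1": "P", "t2": "F", "t3": "F"}

# Which tests execute each statement.
# ASSUMPTION: the choice of failing test for s6 and s8 is arbitrary.
COVERAGE = {
    "s1": {"t1"}, "s2": {"t1"}, "s3": {"t1"}, "s4": {"t1"}, "s5": {"t1"},
    "s6": {"t1", "t2"},
    "s7": {"t2", "t3"},  # the faulty statement
    "s8": {"t1", "t3"},
    "s9": {"t1", "t2", "t3"},
}


def spectrum(covered, results):
    """Return (e_p, e_f, n_p, n_f) for one statement."""
    e_p = sum(1 for t, r in results.items() if r == "P" and t in covered)
    e_f = sum(1 for t, r in results.items() if r == "F" and t in covered)
    n_p = sum(1 for t, r in results.items() if r == "P" and t not in covered)
    n_f = sum(1 for t, r in results.items() if r == "F" and t not in covered)
    return e_p, e_f, n_p, n_f


def tarantula(e_p, e_f, n_p, n_f):
    """Tarantula risk score; zero denominators are mapped to 0.0 (an assumption)."""
    fail_ratio = e_f / (e_f + n_f) if e_f + n_f else 0.0
    pass_ratio = e_p / (e_p + n_p) if e_p + n_p else 0.0
    denom = pass_ratio + fail_ratio
    return fail_ratio / denom if denom else 0.0


def worst_case_ranks(scores):
    """Rank = number of statements scoring at least as high (ties get the worst rank)."""
    return {s: sum(1 for v in scores.values() if v >= x) for s, x in scores.items()}


scores = {s: tarantula(*spectrum(c, RESULTS)) for s, c in COVERAGE.items()}
ranks = worst_case_ranks(scores)

for s in COVERAGE:
    print(s, spectrum(COVERAGE[s], RESULTS), f"{scores[s]:.2f}", ranks[s])
```

The faulty statement s7 receives the highest score, so a developer following this ranking inspects it first.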

(Empirical) State of the Art (circa 2012)
Over 30 formulæ in the literature, manually developed over a decade’s time: none guaranteed to perform best for all types of faults
[Figure: collage of published risk evaluation formulæ, each built from $e_f$, $e_p$, $n_f$ and $n_p$]

Evolving Formulæ
[Diagram: Program + Tests → Spectrum → Risk Evaluation Formula → Ranking. Program and spectrum pairs form the Training Data; GP takes the place of the hand-written formula, e.g. $e_f - \frac{e_p}{e_p + n_p + 1}$, and the resulting ranking is used as the Fitness (minimise).]
Evolved formulæ include: $e_f^2 (2 e_p + 2 e_f + 3 n_p)$, $e_f^2 (e_f^2 + \sqrt{n_p})$, ...
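The fitness step can be illustrated with a small evaluator that scores a candidate formula by how high it ranks the known faulty statement on the training data. This is a sketch under stated assumptions rather than the authors' setup: the slide only says that fitness is minimised, so the measure used here (worst-case rank of the faulty statement divided by the number of statements, averaged over training items) is hypothetical, as are the names `TRAINING`, `fitness`, `evolved_a` and `evolved_b`. A full GP run would additionally generate, mutate and recombine expression trees; only the evaluation is shown.

```python
import math

# One training item: a spectrum with (e_p, e_f, n_p, n_f) per statement,
# plus the name of the statement known to be faulty.
# ASSUMPTION: toy data, four statements from a single program.
TRAINING = [
    (
        {
            "s1": (1, 0, 0, 2),
            "s6": (1, 1, 0, 1),
            "s7": (0, 2, 1, 0),
            "s9": (1, 2, 0, 0),
        },
        "s7",
    ),
]


def evolved_a(e_p, e_f, n_p, n_f):
    # First evolved formula shown on the slide.
    return e_f ** 2 * (2 * e_p + 2 * e_f + 3 * n_p)


def evolved_b(e_p, e_f, n_p, n_f):
    # Second evolved formula shown on the slide.
    return e_f ** 2 * (e_f ** 2 + math.sqrt(n_p))


def fitness(formula, training):
    """Average fraction of statements inspected before reaching the fault (lower is better)."""
    total = 0.0
    for spec, faulty in training:
        scores = {s: formula(*row) for s, row in spec.items()}
        # Worst-case rank: every statement scoring at least as high is inspected.
        rank = sum(1 for v in scores.values() if v >= scores[faulty])
        total += rank / len(spec)
    return total / len(training)


print(fitness(evolved_a, TRAINING))
print(fitness(evolved_b, TRAINING))
# A deliberately poor candidate that rewards passing executions only.
print(fitness(lambda e_p, e_f, n_p, n_f: e_p, TRAINING))
```

On this toy item both evolved formulas place the faulty statement first, while the poor candidate places it last, so selection pressure would favour the former.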

Our Claims
• GP-evolved SBFL formulas are provably better than many human designs.
• We proved that no human can surpass what GP evolved, ever.
• GP has transformed the future research on fault localisation.

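Crash Course into Our Proof System
[Diagram: the statement rankings produced by Formula X and Formula Y, each split into $S_B$, $S_F$ and $S_A$: the statements ranked before, tied with, and after the faulty statement]
To show that Y dominates X, we show that: $S_B^Y \subseteq S_B^X \wedge S_A^X \subseteq S_A^Y$ (assuming that we break ties in the $S_F$ sets consistently)
Equivalence is defined as: $X \leftrightarrow Y \iff X \rightarrow Y \wedge Y \rightarrow X$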
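The subset condition can be checked mechanically on a single spectrum. The sketch below is only an illustration: the proof covers every program and test suite, whereas this code inspects one concrete example. The helper names `partition` and `dominates_on` are hypothetical, the spectrum is the nine-statement example from the Tarantula slide, and the two formulas compared (Naish2 and Wong1) are taken from the formula table later in the deck.

```python
# Spectrum per statement as (e_p, e_f, n_p, n_f); s7 is the faulty statement.
SPEC = {f"s{i}": (1, 0, 0, 2) for i in range(1, 6)}
SPEC.update({
    "s6": (1, 1, 0, 1),
    "s7": (0, 2, 1, 0),
    "s8": (1, 1, 0, 1),
    "s9": (1, 2, 0, 0),
})


def naish2(e_p, e_f, n_p, n_f):
    return e_f - e_p / (e_p + n_p + 1)


def wong1(e_p, e_f, n_p, n_f):
    return e_f


def partition(formula, spec, faulty):
    """Split statements into S_B (scored higher), S_F (tied), S_A (scored lower)."""
    scores = {s: formula(*row) for s, row in spec.items()}
    f = scores[faulty]
    before = {s for s, v in scores.items() if v > f}
    tied = {s for s, v in scores.items() if v == f}
    after = {s for s, v in scores.items() if v < f}
    return before, tied, after


def dominates_on(formula_y, formula_x, spec, faulty):
    """True if S_B^Y is a subset of S_B^X and S_A^X is a subset of S_A^Y on this spectrum."""
    b_y, _, a_y = partition(formula_y, spec, faulty)
    b_x, _, a_x = partition(formula_x, spec, faulty)
    return b_y <= b_x and a_x <= a_y


print(dominates_on(naish2, wong1, SPEC, "s7"))
print(dominates_on(wong1, naish2, SPEC, "s7"))
```

Here Wong1 leaves s9 tied with the faulty statement while Naish2 pushes it below, so the condition holds in one direction only on this example.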

Crash Course into Our Proof System
• Maximal Groups: a set of formulas that are equivalent to each other, but are strictly better than some others
• Previous work theoretically proved the existence of maximal groups with respect to the space of known formulas:
  • ER1 (contains 2 manually designed formulas) and ER5 (contains 3 manually designed formulas)

GP’s Human Competitiveness
• GP expanded the known maximal groups:
  • GP added one additional formula to ER1
  • GP founded three new maximal groups, each containing one GP-evolved formula

Group   Name           Formula expression
ER1'    Naish1         $\begin{cases} -1 & \text{if } e_f < F \\ P - e_p & \text{if } e_f = F \end{cases}$
ER1'    Naish2         $e_f - \frac{e_p}{e_p + n_p + 1}$
ER1'    GP13           $e_f \left(1 + \frac{1}{2 e_p + e_f}\right)$
ER5     Wong1          $e_f$
ER5     Russel & Rao   $\frac{e_f}{e_f + n_f + e_p + n_p}$
ER5     Binary         $\begin{cases} 0 & \text{if } e_f < F \\ 1 & \text{if } e_f = F \end{cases}$
GP02    GP02           $2 (e_f + \sqrt{n_p}) + \sqrt{e_p}$
GP03    GP03           $\sqrt{\left| e_f^2 - \sqrt{e_p} \right|}$
GP19    GP19           $e_f \sqrt{\left| e_p - e_f + n_f - n_p \right|}$
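To make the table concrete, the formulas can be transcribed and applied to the nine-statement example spectrum from the Tarantula slide. This is a hedged sketch: $P$ and $F$ are taken to be the total numbers of passing and failing tests, the guard in `gp13` for a statement executed by no test is an assumption the slide does not address, and `worst_case_rank` is a hypothetical helper. One example cannot demonstrate equivalence or maximality; it only shows the formulas in use.

```python
import math

# Spectrum per statement as (e_p, e_f, n_p, n_f); s7 is the faulty statement.
SPEC = {f"s{i}": (1, 0, 0, 2) for i in range(1, 6)}
SPEC.update({
    "s6": (1, 1, 0, 1),
    "s7": (0, 2, 1, 0),
    "s8": (1, 1, 0, 1),
    "s9": (1, 2, 0, 0),
})


def naish1(e_p, e_f, n_p, n_f):
    P, F = e_p + n_p, e_f + n_f  # totals of passing and failing tests
    return P - e_p if e_f == F else -1


def naish2(e_p, e_f, n_p, n_f):
    return e_f - e_p / (e_p + n_p + 1)


def gp13(e_p, e_f, n_p, n_f):
    denom = 2 * e_p + e_f
    # ASSUMPTION: score 0.0 when no test executes the statement.
    return e_f * (1 + 1 / denom) if denom else 0.0


def wong1(e_p, e_f, n_p, n_f):
    return e_f


def russel_rao(e_p, e_f, n_p, n_f):
    return e_f / (e_f + n_f + e_p + n_p)


def binary(e_p, e_f, n_p, n_f):
    return 1 if e_f == e_f + n_f else 0


def gp02(e_p, e_f, n_p, n_f):
    return 2 * (e_f + math.sqrt(n_p)) + math.sqrt(e_p)


def gp03(e_p, e_f, n_p, n_f):
    return math.sqrt(abs(e_f ** 2 - math.sqrt(e_p)))


def gp19(e_p, e_f, n_p, n_f):
    return e_f * math.sqrt(abs(e_p - e_f + n_f - n_p))


def worst_case_rank(formula, spec, faulty):
    """Number of statements scoring at least as high as the faulty one."""
    scores = {s: formula(*row) for s, row in spec.items()}
    return sum(1 for v in scores.values() if v >= scores[faulty])


FORMULAS = (naish1, naish2, gp13, wong1, russel_rao, binary, gp02, gp03, gp19)
RANKS = {f.__name__: worst_case_rank(f, SPEC, "s7") for f in FORMULAS}
print(RANKS)
```

On this spectrum the ER1' formulas and the three GP groups rank the faulty statement first, while the ER5 formulas leave it tied with s9.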

GP’s Human Competitiveness
• We have proved that there is no greatest formula (i.e. one that outperforms all maximals):
  • GP evolved the best possible formula.
  • No future human endeavour can surpass GP’s results.

GP’s Influence on Future Research
• Manually designing SBFL formulae is no longer productive.
• We need richer information than program spectrum: GP can deal with increased complexity better than humans.
• GP continues to produce state-of-the-art localisation results, outperforming SVMs (ISSTA 2017).

Automated Debugging: Debugging is hard.
GP’s Human Competitiveness: GP-evolved fault localisation techniques were provably better than over a decade’s manual work.
