SLIDE 1

CREST Open Workshop #41

Application of Information Theory to Fault Localisation

“How I learnt to stop worrying and love the bomb entropy” by Shin Yoo

H(X) = −Σ_{x∈X} p(x) log₂ p(x)

SLIDE 2

This talk is…

❖ Not a theoretical masterclass on application of Shannon Entropy to software engineering, unfortunately

❖ Rather a story of a clueless software engineer who learnt to appreciate the power of information theory

SLIDE 3

The Problem Domain

❖ Fault Localisation: given observations from test execution (which include both passing and failing test cases), identify where the faulty statement lies.

SLIDE 4

Spectra-Based Fault Localisation

Program + Tests → Spectrum → Formula (Suspiciousness) → Ranking

Op2 = ef − ep/(ep + np + 1)

Higher ranking = fewer statements to check

SLIDE 5

Spectra-Based Fault Localisation

Structural Element | ep | ef | np | nf | Tarantula | Rank
s1 | 1 | 0 | 0 | 2 | 0.00 | 9
s2 | 1 | 0 | 0 | 2 | 0.00 | 9
s3 | 1 | 0 | 0 | 2 | 0.00 | 9
s4 | 1 | 0 | 0 | 2 | 0.00 | 9
s5 | 1 | 0 | 0 | 2 | 0.00 | 9
s6 | 1 | 1 | 0 | 1 | 0.33 | 4
s7 (faulty) | 0 | 2 | 1 | 0 | 1.00 | 1
s8 | 1 | 1 | 0 | 1 | 0.33 | 4
s9 | 1 | 2 | 0 | 0 | 0.50 | 2

Result: t1 = P, t2 = F, t3 = F

Tarantula = (ef/(ef + nf)) / (ep/(ep + np) + ef/(ef + nf))
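As a minimal sketch, the Tarantula scores on this slide can be reproduced directly from the (ep, ef, np, nf) spectrum counts. The function name is illustrative, and scoring a statement as 0 when no failing test executes it is an assumption made to avoid division by zero:

```python
# Reproduce the Tarantula scores on this slide from the spectrum counts.
# ep/ef: passing/failing tests that execute the statement;
# np/nf: passing/failing tests that do not.

def tarantula(ep, ef, np_, nf):
    # Convention assumed here: a statement never executed by a
    # failing test gets suspiciousness 0.
    if ef == 0:
        return 0.0
    fail_ratio = ef / (ef + nf)
    pass_ratio = ep / (ep + np_)
    return fail_ratio / (pass_ratio + fail_ratio)

# Spectrum counts (ep, ef, np, nf) for a few rows; s7 is the faulty statement.
spectra = {
    "s1": (1, 0, 0, 2), "s6": (1, 1, 0, 1),
    "s7": (0, 2, 1, 0), "s9": (1, 2, 0, 0),
}
for s, counts in spectra.items():
    print(s, round(tarantula(*counts), 2))
```

Running this recovers the 0.00, 0.33, 1.00, and 0.50 values in the Tarantula column, with the faulty statement s7 scoring highest.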

SLIDE 6

How do we evaluate these?

ef / (ef + nf + ep)

2ef / (2ef + nf + ep)

2ef / (ef + nf + ep)

ef / (nf + ep)

(1/2) · (ef/(ef + nf) + ef/(ef + ep))

ef / (ef + nf + ep + np)

(ef + np − nf − ep) / (ef + nf + ep + np)

(ef + np) / (ef + nf + ep + np)

2(ef + np) / (2(ef + np) + ep + nf)

ef / (ef + np + 2(ep + nf))

(ef + np) / (nf + ep)

Op1 = −1 if nf > 0, np otherwise

Op2 = ef − ep/(ep + np + 1)

Tarantula = (ef/(ef + nf)) / (ep/(ep + np) + ef/(ef + nf))

Jaccard = ef / (ef + nf + ep)

Ochiai = ef / √((ef + nf) · (ef + ep))

AMPLE = | ef/(ef + nf) − ep/(ep + np) |

Wong1 = ef

Wong2 = ef − ep

Wong3 = ef − h, where h = ep if ep ≤ 2; 2 + 0.1(ep − 2) if 2 < ep ≤ 10; 2.8 + 0.001(ep − 10) if ep > 10
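A few of the named formulas above are simple enough to sketch directly as functions of the spectrum counts. The function names are illustrative, not from the talk, and the zero-denominator guard in Ochiai is an assumption:

```python
import math

# Sketches of some of the suspiciousness formulas, as functions of
# the spectrum counts (ep, ef, np, nf).

def op1(ep, ef, np_, nf):
    return -1 if nf > 0 else np_

def op2(ep, ef, np_, nf):
    return ef - ep / (ep + np_ + 1)

def ochiai(ep, ef, np_, nf):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom > 0 else 0.0  # guard is an assumption

def wong3(ep, ef, np_, nf):
    # Piecewise penalty h grows with the number of passing executions.
    if ep <= 2:
        h = ep
    elif ep <= 10:
        h = 2 + 0.1 * (ep - 2)
    else:
        h = 2.8 + 0.001 * (ep - 10)
    return ef - h

# Example: the spectrum of the faulty statement s7 from the earlier slide.
print(op2(0, 2, 1, 0))     # 2.0
print(ochiai(0, 2, 1, 0))  # 1.0
```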

SLIDE 7

Expense Metric

❖ Assumes that the developer checks the ranking from top to bottom

❖ The higher the faulty statement is ranked, the earlier the fault is found

[Figure: the statement rankings produced by Formula X and Formula Y, each divided into statements ranked before the fault (S^B), the faulty statement itself (S^F), and statements ranked after it (S^A)]

E(τ, p, b) = (ranking of b according to τ) / (number of statements in p) × 100
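The metric above can be sketched in a few lines. This simplified version ignores tie-breaking between equally suspicious statements, and the function name is illustrative:

```python
# A minimal sketch of the Expense metric: the rank of the faulty
# statement as a percentage of the program size. Tie-breaking between
# equally suspicious statements is ignored in this simplification.

def expense(ranking, faulty, n_statements):
    # ranking: statement IDs sorted from most to least suspicious
    rank = ranking.index(faulty) + 1  # 1-based rank of the fault
    return rank / n_statements * 100

# Using the ranking from the earlier table (s7 ranked 1st out of 9):
ranking = ["s7", "s9", "s6", "s8", "s1", "s2", "s3", "s4", "s5"]
print(expense(ranking, "s7", 9))  # the fault is found after checking 1 of 9
```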

SLIDE 8

Does every test execution help you?

❖ When a statement is executed by a failing test, we suspect it more; by a passing test, we suspect it less.

❖ Ideally, we want the failing test to only execute the faulty statement, which is not possible of course.

❖ Practically, we want the subset of test runs that gives us the most distinguishing power, and we want this as early as possible.

SLIDE 9

What is the information gain of executing one more test?

SLIDE 10

P_Ti(B(sj)) = τ(sj|Ti) / Σ_{j=1}^{m} τ(sj|Ti)

Convert suspiciousness into probability

H_Ti(S) = −Σ_{j=1}^{m} P_Ti(B(sj)) · log P_Ti(B(sj))

Compute the Shannon Entropy of Fault Locality

P_{Ti+1}(B(sj)) = P_{Ti+1}(B(sj) | F(t_{i+1})) · α + P_{Ti+1}(B(sj) | ¬F(t_{i+1})) · (1 − α)

Assuming the failure rate observed so far (α), compute the lookahead probability. We can predict the information gain of a test case!
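The first two steps (normalise suspiciousness into a probability distribution over statements, then take its Shannon entropy) can be sketched as follows. The function name and the choice of natural logarithm are illustrative assumptions:

```python
import math

# Sketch: normalise suspiciousness scores into a probability
# distribution over statements, then compute its Shannon entropy.
# Lower entropy means the suspiciousness is concentrated on fewer
# statements, i.e. more useful localisation.

def locality_entropy(scores):
    total = sum(scores)
    probs = [s / total for s in scores]
    return -sum(p * math.log(p) for p in probs if p > 0)

# A flat distribution has maximal entropy (ln 4 for 4 statements)...
print(locality_entropy([1.0, 1.0, 1.0, 1.0]))
# ...while one that singles out a statement has much lower entropy.
print(locality_entropy([0.97, 0.01, 0.01, 0.01]))
```

Running a candidate test through the lookahead formula above and picking the one that reduces this entropy most is the core idea of predicting information gain.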

SLIDE 11

[Figure: four plots of suspiciousness and Expense reduction against the percentage of executed tests, for grep v3 (F_KP_3), flex v5 (F_JR_2), flex v5 (F_AA_4), and sed v2 (F_AG_19); each compares FLINT, TCP, and Random test orderings, with Expense reduction shown for FLINT and Greedy]
SLIDE 12

Lessons Learned #1

❖ Probabilistic view works! Even when there are some wrinkles in your formulations.

❖ Software artefacts tend to exhibit continuity (e.g. the coverage of a test case does not change dramatically between versions). This helps point 1.

SLIDE 13

Problem Solved…?

❖ Various empirical studies at first established partial rankings between formulas.

❖ Then a theoretical study proved the dominance relationships between formulas and their performance in the Expense metric.

SLIDE 14

But then machines arrived.

Aside: we also automatically evolved formulas using GP, which we then proved cannot be bettered by humans. So technically machines arrived twice.

SLIDE 15

Machine-Based Evaluation

❖ Qi et al. took a backward approach

❖ Use suspiciousness scores as weights to mutate program states until Genetic Programming can repair the fault.

❖ The better the localisation, the quicker the repair will be found.

SLIDE 16

Strange Results

❖ Theory says the Jaccard formula is worse than Op2.

❖ But machines found it much easier to repair programs when using the localisation from Jaccard.

❖ Why?

SLIDE 17

Abstraction destroys Information

❖ The Expense metric assumes linear consumption of the result (i.e. the developer checks statements following the ranking).

❖ GP consumes raw suspiciousness numbers, which is a much richer source of information.

Same ranking, completely different amount of information.

SLIDE 18

New Evaluation Metric

❖ Following the way we predicted information yield, we should be able to describe the true fault locality as a probability distribution.

❖ Subsequently, measure the cross-entropy between the true distribution and the one generated by any technique.

D_KL(P_L ∥ P_τ) = Σ_i P_L(si) · ln(P_L(si) / P_τ(si))

The true distribution P_L comes from an ideal technique L that can always pinpoint the faulty statement sf:

P_L(si) = 1 if si = sf, and ε otherwise (0 < ε ≪ 1, si ∈ S, si ≠ sf)

P_τ(si) = τ(si) / Σ_{i=1}^{n} τ(si), (1 ≤ i ≤ n), converts the suspiciousness scores given by any technique into a probability distribution.

Locality Information Loss (LIL), defined with the Kullback–Leibler divergence.
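A sketch of LIL under the definitions above. The function name and the particular value of ε are illustrative choices, not fixed by the metric:

```python
import math

# Sketch of Locality Information Loss: KL divergence from the ideal
# locality distribution P_L (a spike at the faulty statement, a small
# epsilon elsewhere) to the distribution P_tau induced by a technique's
# suspiciousness scores. The epsilon value is an illustrative choice.

def lil(scores, faulty_idx, eps=1e-3):
    n = len(scores)
    # Ideal distribution: ~1 at the fault, eps elsewhere, then normalised.
    p_l = [eps] * n
    p_l[faulty_idx] = 1.0
    z = sum(p_l)
    p_l = [p / z for p in p_l]
    # Technique's distribution: normalised suspiciousness scores.
    total = sum(scores)
    p_tau = [s / total for s in scores]
    return sum(pl * math.log(pl / pt) for pl, pt in zip(p_l, p_tau))

# A technique that concentrates suspiciousness on the fault loses
# little information; a flat one loses more.
print(lil([0.9, 0.05, 0.05], 0) < lil([1.0, 1.0, 1.0], 0))  # True
```

Note that, unlike Expense, this comparison is sensitive to the raw score distribution, not just the ranking, which is exactly the point of the metric.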

SLIDE 19

Worth a thousand words.

[Figure: suspiciousness across executed statements, with the faulty statement marked, for Jaccard (LIL=4.92), MUSE (LIL=0.40), Ochiai (LIL=5.96), and Op2 (LIL=7.34)]

SLIDE 20

Lessons Learned #2

❖ Entropy measures are much richer than simply counting something: they give you a holistic view.

❖ Cross-entropy is a vastly underused tool in software engineering in general.

SLIDE 21

Spectra-Based Fault Localisation

Program + Tests → Spectrum → Formula (Suspiciousness) → Ranking

Op2 = ef − ep/(ep + np + 1)

Higher ranking = fewer statements to check

SLIDE 22

[Figure: suspiciousness and Expense reduction against the percentage of executed tests for grep v3 (F_KP_3), comparing FLINT, TCP, and Random test orderings, with Expense reduction shown for FLINT and Greedy]