CREST Open Workshop #41
Application of Information Theory to Fault Localisation
or: “How I learnt to stop worrying and love the entropy” by Shin Yoo
H(X) = − Σ_{x ∈ X} p(x) · log2 p(x)
❖ This talk is not a theoretical masterclass on the application of Shannon's information theory
❖ Rather, it is the story of a clueless software engineer who learnt to stop worrying and love information theory
❖ Fault Localisation: given a program, a test suite, and a record of its execution (which includes both passing and failing test cases), identify where the faulty statement lies.
ef − ep / (ep + np + 1)

Program + Tests → Spectrum → Formula (Suspiciousness) → Ranking

Higher ranking = fewer statements to check
Structural Element | Tarantula | Rank
s1 | 0.00 | 9
s2 | 0.00 | 9
s3 | 0.00 | 9
s4 | 0.00 | 9
s5 | 0.00 | 9
s6 | 0.33 | 4
s7 (faulty) | 1.00 | 1
s8 | 0.33 | 4
s9 | 0.50 | 2
Test results: t1 = P, t2 = F, t3 = F
( ef/(ef+nf) ) / ( ep/(ep+np) + ef/(ef+nf) )
2ef / (2ef + nf + ep)
2ef / (ef + nf + ep)
ef / (nf + ep)
(1/2) · ( ef/(ef+nf) + ef/(ef+ep) )
ef / (ef + nf + ep + np)
(ef + np − nf − ep) / (ef + nf + ep + np)
(ef + np) / (ef + nf + ep + np)
2(ef + np) / (2(ef + np) + ep + nf)
ef / (ef + np + 2(ep + nf))
(ef + np) / (nf + ep)
Op1 = −1 if nf > 0, np otherwise
Op2 = ef − ep / (ep + np + 1)
Tarantula = ( ef/(ef+nf) ) / ( ep/(ep+np) + ef/(ef+nf) )
Jaccard = ef / (ef + nf + ep)
Ochiai = ef / sqrt( (ef + nf) · (ef + ep) )
AMPLE = | ef/(ef+nf) − ep/(ep+np) |
Wong1 = ef
Wong2 = ef − ep
Wong3 = ef − h, where h = ep if ep ≤ 2; 2 + 0.1(ep − 2) if 2 < ep ≤ 10; 2.8 + 0.001(ep − 10) if ep > 10
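To make these concrete, here is a minimal Python sketch of three of the formulas above (Tarantula, Ochiai, Op2), computed from the four spectrum counts. The function names and the guard clauses for zero denominators are my own additions, not part of the slides.

```python
import math

# Spectrum counts per statement:
#   ef / nf: failing tests that do / do not execute the statement
#   ep / np: passing tests that do / do not execute the statement

def tarantula(ef, nf, ep, np):
    if ef == 0:
        return 0.0  # never executed by a failing test
    fail_ratio = ef / (ef + nf)
    pass_ratio = ep / (ep + np) if (ep + np) > 0 else 0.0
    return fail_ratio / (fail_ratio + pass_ratio)

def ochiai(ef, nf, ep, np):
    denom = math.sqrt((ef + nf) * (ef + ep))
    return ef / denom if denom > 0 else 0.0

def op2(ef, nf, ep, np):
    return ef - ep / (ep + np + 1)
```

A statement with ef = 1, nf = 1, ep = 1, np = 0 gets Tarantula = (1/2) / (1/2 + 1) ≈ 0.33, matching the worked example above.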
❖ Assumes that the developer checks the ranking from top to bottom
❖ The higher the faulty statement is ranked, the earlier the fault is found
[Figure: statement rankings produced by Formula X and Formula Y; the faulty statement SF appears at a different rank position relative to SA and SB under each formula]
E(τ, p, b) = (ranking of b according to τ / number of statements in p) × 100
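As a sketch (the function and its tie-breaking rule are my own choices, not stated on the slide), the Expense metric can be computed from a list of suspiciousness scores, taking the worst-case rank under ties:

```python
def expense(scores, faulty_index):
    # Worst-case rank of the faulty statement: every statement whose
    # score is at least as high must be checked before it.
    rank = sum(1 for s in scores if s >= scores[faulty_index])
    # E = rank / program size * 100
    return rank / len(scores) * 100
```

For example, if the faulty statement is ranked first out of four statements, Expense is 25.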
❖ When a statement is executed by a failing test, we become more suspicious of it
❖ Ideally, we want the failing test to only execute the faulty statement
❖ Practically, we want the subset of test runs that gives us the most information about where the fault lies
Convert suspiciousness into probability:

PTi(B(sj)) = τ(sj|Ti) / Σ_{j=1}^{m} τ(sj|Ti)

Compute the Shannon entropy:

HTi(S) = − Σ_{j=1}^{m} PTi(B(sj)) · log PTi(B(sj))
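These two steps are a few lines of Python; this is a minimal sketch with names of my own choosing:

```python
import math

def to_probabilities(scores):
    # P_Ti(B(sj)) = tau(sj|Ti) / sum_j tau(sj|Ti)
    total = sum(scores)
    return [s / total for s in scores]

def shannon_entropy(probs):
    # H_Ti(S) = -sum_j P log P, with 0·log 0 taken as 0
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A uniform distribution over m statements gives the maximum entropy log m; entropy falls as suspiciousness concentrates on fewer statements.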
PTi+1(B(sj)) = PTi+1(B(sj)|F(ti+1)) · α + PTi+1(B(sj)|¬F(ti+1)) · (1 − α)

Taking α to be the failure rate observed so far, compute the lookahead probability: we can predict the information gain of a test case!
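A sketch of the lookahead step (function and argument names are mine): mix the two conditional posteriors with the observed failure rate α, then take the entropy of the mixture to predict the information yield of running ti+1.

```python
import math

def lookahead_entropy(p_if_fail, p_if_pass, alpha):
    # P_{Ti+1}(B(sj)) = alpha * P(.|F(ti+1)) + (1 - alpha) * P(.|not F(ti+1))
    mixed = [alpha * pf + (1 - alpha) * pp
             for pf, pp in zip(p_if_fail, p_if_pass)]
    # Entropy of the predicted distribution
    return -sum(p * math.log(p) for p in mixed if p > 0)
```

Ranking candidate tests by their predicted entropy reduction is the essence of entropy-driven test prioritisation.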
[Plots: Expense Reduction against Percentage of Executed Tests for FLINT, TCP, and Random, on grep v3 (F_KP_3), flex v5 (F_JR_2 and F_AA_4), and sed v2 (F_AG_19)]
❖ The probabilistic view works! Even when there are some imperfections in the underlying suspiciousness scores
❖ Software artefacts tend to exhibit continuity (e.g. successive versions and executions behave similarly)
❖ Various empirical studies established partial rankings between FL formulas
❖ Then a theoretical study proved the dominance relationships between formulas
Aside: we also automatically evolved formulas using GP, which we then proved cannot be bettered by humans. So technically machines arrived twice.
❖ Qi et al. took a backward approach
❖ Use suspiciousness scores as weights to mutate program states until Genetic Programming can repair the fault
❖ The better the localisation, the quicker the repair will be found
❖ Theory says the Jaccard formula is worse than Op2
❖ But machines found it much easier to repair programs guided by Jaccard
❖ Why?
❖ The Expense metric assumes linear consumption of the result (i.e. the developer checks statements following the ranking)
❖ GP consumes raw suspiciousness numbers, which are a much richer source of information
❖ Same ranking, completely different amount of information
❖ Following the way we predicted information yield, we should be able to describe the true fault locality as a probability distribution
❖ Subsequently, measure the cross-entropy between the true distribution and the one generated by any technique
Locality Information Loss (LIL), defined with the Kullback–Leibler divergence:

DKL(PL || Pτ) = Σ_i PL(si) · ln( PL(si) / Pτ(si) )

The ideal technique L always pinpoints the faulty statement sf:

PL(si) = 1 if si = sf; ε otherwise (0 < ε ≪ 1, si ∈ S, si ≠ sf)

Pτ converts the suspiciousness scores given by any technique τ into a probability distribution:

Pτ(si) = τ(si) / Σ_{i=1}^{n} τ(si), (1 ≤ i ≤ n)
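A minimal sketch of LIL in Python (the function name, the default ε, and the flooring of Pτ to avoid log 0 are my own choices):

```python
import math

def lil(tau_scores, faulty_index, eps=1e-6):
    # P_L: near-degenerate "ideal" distribution concentrated on s_f
    n = len(tau_scores)
    p_l = [eps] * n
    p_l[faulty_index] = 1.0
    # P_tau: normalised suspiciousness, floored at eps to avoid log(0)
    total = sum(tau_scores)
    p_tau = [max(t / total, eps) for t in tau_scores]
    # D_KL(P_L || P_tau)
    return sum(pl * math.log(pl / pt) for pl, pt in zip(p_l, p_tau))
```

A technique that puts all suspiciousness on the faulty statement scores near 0; one that spreads suspiciousness uniformly scores higher (more information lost).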
[Plots: suspiciousness over executed statements, faulty statement marked: MUSE (LIL=0.40), Jaccard (LIL=4.92), Ochiai (LIL=5.96), Op2 (LIL=7.34)]
❖ Entropy measures are much richer than simply counting rank positions
❖ Cross-entropy is a vastly underused tool in software engineering