Comparative Causality: Explaining the Differences Between Executions
William N. Sumner, Xiangyu Zhang
{wsumner,xyzhang}@cs.purdue.edu
ICSE 2013, 22 May 2013
Background
Debugging requires understanding how a program behaves.
- Which statements are buggy and how?
- How does a bug/fault lead to a failure?
- What might possible fixes be?
These questions need an explanation for a bug.
Background
Explaining a bug (fault → failure)
    inventory = [("Shoes", 5), ("Hats", 0), ("Ties", 1)]
    bought = 0
    for (item, available) in inventory:
        if bought < 3 and available >= 0:   # BUG: should be >
            buy(item)
            bought += 1
    print("Items bought:", bought)

Should print "Items bought: 2"
Failure: prints "Items bought: 3"
Background
Explaining a bug (fault → failure)
Program
        inv = [(S,5); (H,0); (T,1)]
        bt = 0
    1)  for (itm, av) in inv:
    2)      if bt < 3 and av >= 0:
    3)          buy(itm)
    4)          bt += 1
    5)  print bt

Program Trace
    1)  for (itm, av) = (S,5):
    2)      if bt < 3 and av >= 0:
    3)          buy(itm)
    4)          bt += 1
    1)  for (itm, av) = (H,0):
    2)      if bt < 3 and av >= 0:
    3)          buy(itm)
    4)          bt += 1
    1)  for (itm, av) = (T,1):
    2)      if bt < 3 and av >= 0:
    3)          buy(itm)
    4)          bt += 1
    5)  print bt
Background
Explaining a bug (fault → failure)
Trace
    1)  for (itm, av) = (S,5):
    2)      if bt < 3 and av >= 0:
    3)          buy(itm)
    4)          bt += 1
    1)  for (itm, av) = (H,0):
    2)      if bt < 3 and av >= 0:        [A]
    3)          buy(itm)
    4)          bt += 1                   [B]
    1)  for (itm, av) = (T,1):
    2)      if bt < 3 and av >= 0:
    3)          buy(itm)
    4)          bt += 1                   [C]
    5)  print bt                          [D]

A faulty branch is taken at A,
so bt is given the faulty value 2 at B,
so bt is given the faulty value 3 at C,
so '3' is printed erroneously at D.

A → B → C → D is an explanation.
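The chain A → B → C → D can be observed mechanically by recording a trace. The sketch below is illustrative only: the `run` and `first_difference` helpers are hypothetical names, and it compares the faulty check against the intended one, which is an assumption made for brevity (the talk itself compares two executions of the same program).

```python
def run(inventory, in_stock):
    """Execute the purchase loop and record (item, event, value) triples."""
    trace = []
    bought = 0
    for item, available in inventory:
        taken = bought < 3 and in_stock(available)
        trace.append((item, "branch", taken))
        if taken:
            bought += 1
            trace.append((item, "bought", bought))
    trace.append((None, "print", bought))
    return trace


def first_difference(a, b):
    """Return (index, left, right) for the first disagreeing position, or None."""
    for index, (left, right) in enumerate(zip(a, b)):
        if left != right:
            return index, left, right
    return None


INVENTORY = [("Shoes", 5), ("Hats", 0), ("Ties", 1)]

buggy = run(INVENTORY, lambda available: available >= 0)    # fault from the slide
intended = run(INVENTORY, lambda available: available > 0)  # intended check

print(first_difference(buggy, intended))  # the branch for Hats (point A)
print(buggy[-1], intended[-1])            # the erroneous and expected prints
```

The first differing event is the branch on Hats, and every later difference (the counts 2 and 3, then the printed value) follows from it.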
Existing Approaches
Dynamic Slicing [Agrawal PLAN90]
- Too large & unwieldy in practice
State Replacement
- What faulty state can reproduce a failure?
– Cause Effect Chains [Zeller FSE02]
– Causal Paths [Sumner FASE09,FSE10]
Causal Paths [FASE09, FSE10]
Reproduce the failure in a correct run:
    Buggy:    x = 5   y = 4   z = 3
    Correct:  x = 5   y = 2   z = 1
Trial: copy differing state from the buggy run into the correct run and check whether the failure appears.
- Candidate sets: {y=4, z=3}, {y=4}, {z=3}
- Blame smallest set possible
    Trial:    x = 5   y = 4   z = 1   (correct run with y = 4)
- y=4 is responsible

Causal Paths [FASE09, FSE10]
Why was y=4?
    Buggy:    a = 1   b = 2   c = 1
    Correct:  a = 1   b = 1   c = 0
    Trial:    a = 1   b = 2   c = 1   (correct run with b = 2, c = 1) gives y = 4
- Proceed until no differences

Causal Paths [FASE09, FSE10]
    y = 4   b = 2   c = 1
    Line 23   Line 42
An explanation of the bug!
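One way to picture "blame smallest set possible" is a search over subsets of the differing variables, smallest first. This is only a sketch: the exhaustive enumeration, the `smallest_blamed_set` helper, and the `fails_downstream` oracle are assumptions for illustration, not the search procedure of the cited papers.

```python
from itertools import combinations


def smallest_blamed_set(buggy_state, correct_state, reproduces_failure):
    """Return the smallest set of buggy values that, patched into the
    correct state, makes the (hypothetical) oracle report the failure."""
    differing = sorted(
        name for name in buggy_state
        if buggy_state[name] != correct_state.get(name)
    )
    for size in range(1, len(differing) + 1):
        for names in combinations(differing, size):
            trial = dict(correct_state)                      # start from the correct run
            trial.update({n: buggy_state[n] for n in names})  # patch in buggy values
            if reproduces_failure(trial):
                return {n: buggy_state[n] for n in names}
    return None


buggy_state = {"x": 5, "y": 4, "z": 3}
correct_state = {"x": 5, "y": 2, "z": 1}


def fails_downstream(state):
    # Hypothetical oracle: assume the rest of the run fails whenever y is 4.
    return state["y"] == 4


blamed = smallest_blamed_set(buggy_state, correct_state, fails_downstream)
print(blamed)
```

With the states from the slide and this assumed oracle, the search stops at the single-variable set for `y`, matching "y=4 is responsible".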
Causal Paths [FASE09, FSE10]
Problems?
- Identifying where to compare. [Sumner FASE09]
- Identifying & replacing state. [Sumner FSE10]
- Replacing state confounds the explanation.
– Arbitrary replacement can yield arbitrary results.
- Not helpful with execution omission
– A failure isn't just bad behavior, it is missing good behavior
Confounding
Which state should we blame? Recall:
    Buggy:  x = 5   y = 4   z = 3
    Trial:  x = 5   y = 4   z = 1   (correct run with y = 4 patched in)
What does this patched run even mean?
Example – Altered Meaning
    1) x ← input()
    2) y ← input()
    3) z ← input()
    4) if y>1 & z<6:
    5)     y ← 5
    6) else: y ← y+1
    7) print(y)

    Correct         Buggy           Trial (correct run with y ← 2)
    x ← 1           x ← 0           x ← 1
    y ← 1           y ← 2           y ← 2
    z ← 3           z ← 6           z ← 3
    if False:       if False:       if True:
                                        y ← 5
    else:           else:
        y ← 2           y ← 3
    print(2)        print(3)        print(5)

What should we blame here?
- New control flow unlike original runs
- Occurs in a large portion of real bugs
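The altered control flow is easy to reproduce. The following sketch transcribes the seven-statement example into a Python function (the name `program` and the returned tuple are choices made here for illustration) and runs the correct inputs, the buggy inputs, and the patched trial.

```python
def program(x, y, z):
    """Statements 4-7 of the example; returns (branch taken at 4, printed value)."""
    branch = y > 1 and z < 6
    if branch:
        y = 5
    else:
        y = y + 1
    return branch, y


correct = program(x=1, y=1, z=3)
buggy = program(x=0, y=2, z=6)
trial = program(x=1, y=2, z=3)   # correct run with the buggy y patched in

print(correct, buggy, trial)
```

Both original runs take the else branch, but the trial takes the then branch and prints a value that neither run produced, so the trial says nothing reliable about y.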
Confounding of Explanations
Behavior not found in original executions:
- includes irrelevant information
- excludes necessary information
Solution: Dual Slicing
- Identify & extract execution differences relevant to the failure
– Those that differ across executions
- Run trials on the extracted program
Dual Slicing
- A slice of two executions at once
– Includes dependences that differ across executions
– Skips dependences that are the same
    1) x ← 1           1) x ← 0
    2) y ← 1           2) y ← 1
    3) print(x+y)      3) print(x+y)
[Figure: dependence graphs over statements 3, 2, 1 of the two executions and the resulting dual slice]
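A minimal sketch of the idea on the three-statement example follows. It assumes each statement executes once in both runs, so statements can be matched by number; the `trace` and `dual_slice` helpers are hypothetical, and real executions would need a proper alignment of statement instances.

```python
def trace(x_value):
    """Trace of: 1) x <- x_value  2) y <- 1  3) print(x+y).
    Maps each statement to the value it produced and the statements it used."""
    x, y = x_value, 1
    return {
        1: {"value": x, "uses": []},
        2: {"value": y, "uses": []},
        3: {"value": x + y, "uses": [1, 2]},
    }


def dual_slice(run_a, run_b, criterion):
    """Walk dependences backwards from the criterion in both runs at once,
    keeping only statements whose value or dependences differ."""
    kept, worklist = set(), [criterion]
    while worklist:
        stmt = worklist.pop()
        if stmt in kept:
            continue
        a, b = run_a.get(stmt), run_b.get(stmt)
        if a == b:
            continue                     # identical in both runs: skipped
        kept.add(stmt)
        for entry in (a, b):
            if entry is not None:
                worklist.extend(entry["uses"])
    return kept


print(sorted(dual_slice(trace(1), trace(0), criterion=3)))
```

Statement 2 is skipped because y is the same in both runs, while statements 1 and 3 are kept because x and the printed sum differ.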
Dual Slicing
- Identify differences affecting the failure
    1) x ← input()
    2) y ← input()
    3) z ← input()
    4) if y>1 & z<6:
    5)     y ← 5
    6) else: y ← y+1
    7) print(y)

    Correct         Buggy
    x ← 1           x ← 0
    y ← 1           y ← 2
    z ← 3           z ← 6
    if False:       if False:
    else:           else:
        y ← 2           y ← 3
    print(2)        print(3)

Dual slice: statements 7, 6, 2
Extract:
    2) y ← input()
    6) y ← y+1
    7) print(y)
Example – Extracted Meaning
    2) y ← input()
    6) y ← y+1
    7) print(y)

    Correct         Buggy           Trial (correct run with y ← 2)
    y ← 1           y ← 2           y ← 2
    y ← 2           y ← 3           y ← 3
    print(2)        print(3)        print(3)

Trial can now correctly blame y
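The contrast between a trial on the full program and a trial on the extracted slice can be sketched as below. The function names `full_program` and `extracted_program` are hypothetical, and the extracted version is simply statements 2, 6, and 7 written out by hand.

```python
def full_program(x, y, z):
    """The original seven-statement example; returns the printed value."""
    if y > 1 and z < 6:
        y = 5
    else:
        y = y + 1
    return y


def extracted_program(y):
    """Only the dual slice: 2) y <- input(); 6) y <- y+1; 7) print(y)."""
    return y + 1


buggy_output = full_program(x=0, y=2, z=6)

# Patch the buggy y into the correct run (x=1, y=1, z=3) in both settings.
full_trial = full_program(x=1, y=2, z=3)
extracted_trial = extracted_program(y=2)

print(buggy_output, full_trial, extracted_trial)
```

Only the extracted trial reproduces the buggy output, because the branch that introduced new behavior is not part of the slice.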
Data Confounding
- Control flow is not the only source of confounding
    1) x ← [0,1,2,3]
    2) y ← input()
    3) z ← input()
    4) x[z] ← 5
    5) print(x[y])

    Correct          Buggy            Trial (correct run with y ← 2)
    x ← …            x ← …            x ← …
    y ← 1            y ← 2            y ← 2
    z ← 2            z ← 3            z ← 2
    x[2] ← 5         x[3] ← 5         x[2] ← 5
    print(x[1])      print(x[2])      print(x[2])
    prints 1         prints 2         prints 5

What should we blame here?
- Either new control flow or new data flow can cause confounding.
- Removing them is crucial.
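The data-flow case can be checked the same way. This sketch transcribes the five-statement array example into a Python function (the name `program` is chosen here for illustration) and runs the two original inputs and the patched trial.

```python
def program(y, z):
    """1) x <- [0,1,2,3]  4) x[z] <- 5  5) print(x[y]); returns the printed value."""
    x = [0, 1, 2, 3]
    x[z] = 5
    return x[y]


correct = program(y=1, z=2)
buggy = program(y=2, z=3)
trial = program(y=2, z=2)   # correct run with the buggy y patched in

print(correct, buggy, trial)
```

In the trial, the read at statement 5 now depends on the write at statement 4, a data dependence that exists in neither original run.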
Execution Omission
- A failure is not just incorrect behavior, it is missing correct behavior.
– Also known as execution omission
– Cannot be explained by reproducing faulty behavior
Execution Omission
    1) x ← input()
    2) y ← input()
    3) if x > 3:
    4)     print(y)
    5) print('*')

    Correct          Buggy
    x ← 4            x ← 0
    y ← 5            y ← 2
    if True:         if False:
        print(5)
    print('*')       print('*')

What should we blame here?

Execution Omission
Trial (correct run with x ← 0):
    x ← 0   y ← 5   if False:   print('*')
- x alone reproduces the failure!
- Does x alone explain the bug?
– Can you fix the bug by only fixing x?
We can run a symmetric trial to find out!

Execution Omission
Symmetric trial (buggy run with x ← 4):
    x ← 4   y ← 2   if True: print(2)   print('*')
- Fixing x does not fix the missing behavior!
- x alone does not explain the bug.

Execution Omission
What if we try both x and y?
Symmetric trial (buggy run with x ← 4, y ← 5):
    x ← 4   y ← 5   if True: print(5)   print('*')
- Fixing x and y together can fix the bug
Symmetric trials at each step explain the missed behavior, too.
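The two directions of a symmetric trial can be sketched as follows. The helper names (`patched`, `symmetric_trial`) and the choice to compare complete outputs are assumptions for illustration rather than the tool's actual interface.

```python
def program(x, y):
    """1-5 of the example; returns the list of printed values."""
    out = []
    if x > 3:
        out.append(y)
    out.append("*")
    return out


CORRECT = {"x": 4, "y": 5}
BUGGY = {"x": 0, "y": 2}


def patched(base, donor, names):
    """Copy the named variables from donor into a copy of base."""
    state = dict(base)
    for name in names:
        state[name] = donor[name]
    return state


def symmetric_trial(names):
    """Return (reproduces failure in correct run, repairs buggy run)."""
    reproduces = program(**patched(CORRECT, BUGGY, names)) == program(**BUGGY)
    repairs = program(**patched(BUGGY, CORRECT, names)) == program(**CORRECT)
    return reproduces, repairs


print(symmetric_trial(["x"]))
print(symmetric_trial(["x", "y"]))
```

Patching x alone reproduces the failure but does not repair the buggy run, whereas patching x and y together succeeds in both directions, which is the situation the slides describe.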
Comparative Causality
Explaining a failure:
- Reproducing the failure is not enough.
- Requires explaining why both executions differ from each other
- Dual slicing ensures
– That we compare behaviors from the two executions
- Symmetric comparison explains
– Why the buggy execution did something wrong
– Why the buggy execution didn't do something right
Real Results
- Implemented with LLVM
- 20 KLOC
- Automatically explains bugs in C programs.
– 20 KLOC to 400 KLOC
– 400K to 2.24M instructions
Real Results
                   ---------- CC ----------   ---------- Old ---------
Program   Size   Trials  Time  Stmts  Root   Trials  Time  Stmts  Root   Precision  Recall
find      73k        15    12      6     X     1260   253      -
gnuplot   144k       33    44     10     X      469   141     10     X           1       1
gnuplot   139k      323   200     48     X      208    51      -
gnuplot   134k      337   961    129     -     1888   950    121     -        0.97    0.91
gnuplot   134k      130   140     33     -     3012   931     38     -        0.87       1
grep      12k       186   114     62     -     1012  8263     23     -        0.96    0.35
grep      12k       327   156     69     -     1734   183     32     -           1    0.46
grep      12k        78    49     27     X     1546   168     23     -        0.96    0.81
make      30k        62   342     27     X      543   416     17     -           1    0.63
tar       20k         8    22      3     X      221    50      3     X           1       1
tar       24k       125   124     48     X      332   110      -
tar       20k       121    53     20     X      296    66      -
tar       20k        28    43     10     X     2117   439      5     -           1     0.5
tar       21k        87    80     23     X      709   165     15     X        0.73    0.48
tar       21k        15    22      4     X     1283   228      4     X           1       1
Avg.                125   157     34           1109   828     20              0.22    0.26
Precise reasoning is more efficient! Average time: 3 min (CC) vs. 14 min (Old)
Average trials: 125 (CC) vs. 1100 (Old)
Average explanation size: 35 stmts (CC) vs. 20 stmts (Old)
Bugs explained: 73% (CC) vs. 27% (Old)
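The headline figures can be recomputed from the results table. The sketch below assumes that an X in a "Root" column marks an explained bug and that the "Time" columns are in seconds; neither is stated on the slides, but both assumptions reproduce the quoted numbers.

```python
# One entry per table row, in table order (find, gnuplot x4, grep x3, make, tar x6).
# True where the "Root" column shows an X (assumed to mean "explained").
cc_root = [True, True, True, False, False, False, False, True,
           True, True, True, True, True, True, True]
old_root = [False, True, False, False, False, False, False, False,
            False, True, False, False, False, True, True]

# "Time" columns, assumed to be seconds.
cc_time = [12, 44, 200, 961, 140, 114, 156, 49, 342, 22, 124, 53, 43, 80, 22]
old_time = [253, 141, 51, 950, 931, 8263, 183, 168, 416, 50, 110, 66, 439, 165, 228]


def percent(flags):
    """Share of True entries, rounded to a whole percentage."""
    return round(100 * sum(flags) / len(flags))


def mean_minutes(seconds):
    """Average duration in whole minutes."""
    return round(sum(seconds) / len(seconds) / 60)


print(percent(cc_root), percent(old_root))
print(mean_minutes(cc_time), mean_minutes(old_time))
```

Under these assumptions the table yields 11 of 15 bugs explained for CC against 4 of 15 for the older approach, and average times of roughly 3 and 14 minutes.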
Real Results
Program   Size   #Inst   CC Stmts   Slicing Stmts
find      73k    481K           6             185
gnuplot   144k   461K          10             148
gnuplot   139k   1.34M         48             464
gnuplot   134k   2.18M        129             368
gnuplot   134k   2.19M         33             237
grep      12k    415K          62             153
grep      12k    434K          69             109
grep      12k    466K          27              95
make      30k    2.24M         27              38
tar       20k    1.22M          3               3
tar       24k    962K          48              61
tar       20k    1.11M         20            1239
tar       20k    1.12M         10            1270
tar       21k    1.26M         23              25
tar       21k    1.19M          4             557
Avg.                           34             330

35 stmts (CC) vs. 330 stmts (slicing): more than an order of magnitude
Explaining a Bug (from tar)
    int read_header()
    1)  name ← input()
    int extract_dir()
    2)  status ← mkdir(name)
    3)  if status
    4)      if !is_dir(name):
    5)          error()
    6)  return status
    void extract_archive()
    7)  status = extract_dir(name)
    8)  if status:
    9)      undo_last_backup()

Symptom:
Nothing extracted from archive when backing up existing files.
Why?

Differences between the two executions, traced back from the symptom:
    8)  extract / undo backup
    7)  status = -1 / status = 0
    6)  status = 0 / status = -1
    2)  status = -1 / status = 0
    1)  name="dir" / name="dir2"

Because an existing directory sets a failure status that undoes extraction.
Fix (else branch for statement 4):
            else: status = 0
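The faulty status propagation can be imitated with a small model. Everything here is a simplification made for illustration: the file system is a dictionary, `mkdir` is modeled as failing whenever the path already exists, and the `fixed` flag stands in for the else branch shown on the slide; none of this is tar's actual code.

```python
def extract_dir(name, fs, fixed):
    """Model of extract_dir(): returns 0 on success, -1 on failure."""
    if name in fs:
        status = -1              # modeled mkdir fails: the path already exists
    else:
        fs[name] = "dir"
        status = 0
    if status:
        if fs.get(name) != "dir":
            raise FileExistsError(name)   # stands in for error()
        elif fixed:
            status = 0           # the slide's fix: an existing directory is fine
    return status


def extract_archive(name, fs, fixed=False):
    """Model of extract_archive(): reports which action is taken."""
    status = extract_dir(name, dict(fs), fixed)   # work on a copy of the model
    return "undo backup" if status else "extract"


existing = {"dir": "dir"}        # assumption: a directory named "dir" already exists

print(extract_archive("dir", existing))
print(extract_archive("dir2", existing))
print(extract_archive("dir", existing, fixed=True))
```

In this model the run on the existing directory undoes the backup, the run on a new name extracts normally, and the run with the else branch extracts in both cases.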
Limitations
- Needs a deterministic, reproducible failure
- Depends on the correct run
- Requires a model of external state, I/O
Related Work
- Delta debugging [Zeller FSE'02]
- Fault Localization [Jones ASE'05, Liblit PLDI'04]
- Tests for Localization [Artzi ICSE'10, Rößler ISSTA'12]
- Dynamic slicing [Zhang PLDI'04]
- Constraint Comparison [Qi FSE'09]
- Identifying repair candidates [Chandra ICSE'11, Jeffrey ISSTA'08]
Future Work
Comparative Causality is a general framework for explaining why executions differ.
- Debugging
- Program understanding
- Reverse engineering
Conclusions
- Failures can be explained by explaining why executions differ.
- Executing the dual slice guards against the effects of confounding from state replacement.
- Symmetric causality testing explains
– Observed incorrect behaviors
– Missing correct behaviors
- Comparative Causality yields precise and