Program Verification via Machine Learning
Aditya V. Nori
Programming Languages and Tools group Microsoft Research India
Joint work with Alex Aiken, Rahul Sharma (Stanford University)
Software validation problem
I hope this version still interoperates with
I hope some hacker cannot steal all my money, publish all my email on the web! I hope it doesn’t crash! I hope it can handle my peak transaction load!
a test that violates the assertion
If we view testing as a “black-box” activity, Dijkstra is right!
After executing many tests, we still don’t know if there is another test that can violate the assertion
C#, Java …)
for some input I such that the assertion is violated?
1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert(y == 0);

pc = 2 ⇒ x = y
pc = 3 ⇒ x = y
pc = 4 ⇒ x = y
pc = 5 ⇒ x = y
pc = 6 ⇒ x = 0 ∧ y = 0
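The annotated program above can be exercised by testing. The sketch below is a Python port of the six-line program (an assumption, since the slide uses C-like pseudocode); the nondeterministic `while (*)` is modelled by a seeded coin flip, and `run_once` and `invariant` are hypothetical helper names. It only checks the invariant x = y on the states that the tests happen to reach, so it is evidence rather than a proof.

```python
import random

def run_once(rng, max_iters=50):
    """Execute the two-loop program once and record (pc, x, y) at each visit."""
    states = []
    x = y = 0                                    # line 1
    while True:
        states.append((2, x, y))                 # loop head at line 2
        if rng.random() < 0.5 or x >= max_iters: # models the nondeterministic `while (*)`
            break
        states.append((3, x, y))
        x += 1                                   # line 3
        y += 1
    while True:
        states.append((4, x, y))                 # loop head at line 4
        if x == 0:
            break
        states.append((5, x, y))
        x -= 1                                   # line 5
        y -= 1
    states.append((6, x, y))                     # line 6, just before the assert
    return states

def invariant(pc, x, y):
    """Candidate invariant: x == y at lines 2-5, and x == y == 0 at line 6."""
    if pc == 6:
        return x == 0 and y == 0
    return x == y

rng = random.Random(0)
violations = [s for _ in range(1000) for s in run_once(rng) if not invariant(*s)]
print("violations:", len(violations))
```

No violation on these runs says nothing about runs that were not tried, which is the gap the rest of the talk addresses.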
1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

gcd(int x, int y)
{
  assume(x>0 && y>0);
  while (x != y) {
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}
Safety
Is the assertion satisfied for all possible inputs?
Termination
Does gcd terminate for all inputs x, y?
Safety
Termination
program
Guess Check
Analyze data to infer 𝜌
using PA
generate 𝑢
1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert(y == 0);

pc = 2 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 3 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 4 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 5 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 6 ⇒ x = 0 ∧ y = 0
{A} c {B}: executing a command c in a state satisfying A leads to a state that satisfies B, or c does not terminate.
A is the precondition; B is the postcondition.
{y ≤ x} z := x; z := z + 1 {y < z} (a Hoare triple, or Hoare assertion)
1. Run program starting from all states satisfying A
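The example triple (precondition y ≤ x, command z := x; z := z + 1, postcondition y < z) can be checked by brute force on a small range, in the spirit of "run the program from all states satisfying the precondition". This is a minimal Python sketch; the bounded range and the helper name `holds_on_range` are assumptions, and a bounded check is not a proof over all integers.

```python
from itertools import product

def pre(s):
    """Precondition: y <= x."""
    return s["y"] <= s["x"]

def command(s):
    """The command z := x; z := z + 1, applied to a copy of the state."""
    t = dict(s)
    t["z"] = t["x"]
    t["z"] = t["z"] + 1
    return t

def post(s):
    """Postcondition: y < z."""
    return s["y"] < s["z"]

def holds_on_range(pre, command, post, lo=-5, hi=5):
    """True if every state in the range satisfying pre ends in a state satisfying post."""
    for x, y, z in product(range(lo, hi + 1), repeat=3):
        s = {"x": x, "y": y, "z": z}
        if pre(s) and not post(command(s)):
            return False
    return True

print(holds_on_range(pre, command, post))
```

Weakening the precondition to y ≤ x + 1 makes the check fail (take y = x + 1), which shows the precondition is doing real work.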
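$$\frac{\vdash A \quad \vdash B}{\vdash A \land B} \qquad \frac{\vdash A \Rightarrow B \quad \vdash A}{\vdash B}$$

$$\frac{\vdash \{A\}\; c_1\; \{B\} \quad \vdash \{B\}\; c_2\; \{C\}}{\vdash \{A\}\; c_1; c_2\; \{C\}}$$

$$\frac{\vdash \{A \land b\}\; c_1\; \{B\} \quad \vdash \{A \land \lnot b\}\; c_2\; \{B\}}{\vdash \{A\}\; \mathit{if}\ b\ \mathit{then}\ c_1\ \mathit{else}\ c_2\; \{B\}}$$

$$\frac{\vdash \{A \land b\}\; c\; \{A\}}{\vdash \{A\}\; \mathit{while}\ b\ \mathit{do}\ c\; \{A \land \lnot b\}}$$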
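The while rule can be tried on the second loop of the running example (`while (x != 0) x--; y--;`) with candidate invariant x = y. This is a minimal Python sketch over a bounded range of integers (an assumption; the rule itself quantifies over all states), and `check_while_rule` is a hypothetical helper name.

```python
def guard(x, y):
    """Loop condition b of the second loop."""
    return x != 0

def body(x, y):
    """Loop body c: x--; y--."""
    return x - 1, y - 1

def check_while_rule(inv, lo=-6, hi=6):
    """Return (preserved, exit_ok) for a candidate invariant on a bounded range.

    preserved: the premise {inv and b} c {inv} holds on every state in range.
    exit_ok:   inv and not b (the rule's postcondition) implies y == 0.
    """
    states = [(x, y) for x in range(lo, hi + 1) for y in range(lo, hi + 1)]
    preserved = all(inv(*body(x, y)) for x, y in states if inv(x, y) and guard(x, y))
    exit_ok = all(y == 0 for x, y in states if inv(x, y) and not guard(x, y))
    return preserved, exit_ok

print(check_while_rule(lambda x, y: x == y))
print(check_while_rule(lambda x, y: x >= y))
```

The second candidate, x ≥ y, is preserved by the body but too weak at loop exit (x = 0, y = −1 satisfies it), so an invariant must be both inductive and strong enough to imply the assertion.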
1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert(y == 0);

pc = 2 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 3 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 4 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 5 ⇒ 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1
pc = 6 ⇒ x = 0 ∧ y = 0
A: … ite(b, x = x1 + 1 ∧ y = y1 + 1, x = x1 ∧ y = y1)
B: … y2 = y − 1, x2 = x ∧ y2 = y) ∧ x2 = 0 ∧ y2 ≠ 0
1: x = y = 0;
2: if (*)
3:   x++; y++;
4:
5: if (x != 0)
6:   x--; y--;
7: if (x == 0)
8:   assert (y == 0);
Pudlák[‘97], McMillan[‘05], …)
classifier
Training data Training
Interpolants as Classifiers. Sharma, Nori, Aiken. Computer-Aided Verification (CAV 2012)
program
Classifier Check
𝜌 𝑢
counterexamples to 𝑢
1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

ite(b, x = x1 + 1 ∧ y = y1 + 1, x = x1 ∧ y = y1)
… y2 = y − 1, x2 = x ∧ y2 = y) ∧ x2 = 0 ∧ y2 ≠ 0
1: x = y = 0;
2: if (*)
3:   x++; y++;
4:
5: if (x != 0)
6:   x--; y--;
7: if (x == 0)
8:   assert (y == 0);

ite(b, x = x1 + 1 ∧ y = y1 + 1, x = x1 ∧ y = y1)
… y2 = y − 1, x2 = x ∧ y2 = y) ∧ x2 = 0 ∧ y2 ≠ 0
[Plot: x-y plane with sample points (0,0) and (1,1)]
Basic(A, B)
  vars := common variables of A and B;
  Add Samples(vars, A) to X+;
  Add Samples(vars, B) to X−;
  sep := BinaryClassifier(X+, X−);
  h := ContainingPred(sep, X+);
  return h
[Plot: x-y plane]
[Plot: x-y plane with sample points (0,0), (1,1), (0,1), and (1,0)]
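The four sample points illustrate why a single linear separator is not enough here. The Python sketch below assumes (0,0) and (1,1) are the positive samples and (0,1) and (1,0) the negative ones (a labelling inferred from the order in which the points appear). It searches small integer half-spaces a·x + b·y + c ≥ 0 and then checks the band 2y ≤ 2x + 1 ∧ 2y ≥ 2x − 1 used as the invariant on the earlier slides.

```python
from itertools import product

positives = [(0, 0), (1, 1)]   # assumed: samples on the x == y side
negatives = [(0, 1), (1, 0)]   # assumed: samples that must be excluded

def separates(pred):
    """True if pred holds on every positive and on no negative sample."""
    return all(pred(p) for p in positives) and not any(pred(n) for n in negatives)

def half_space(a, b, c):
    """The predicate a*x + b*y + c >= 0."""
    return lambda p: a * p[0] + b * p[1] + c >= 0

# Exhaustive search over small integer coefficients finds no single half-space.
single = [
    (a, b, c)
    for a, b, c in product(range(-3, 4), repeat=3)
    if separates(half_space(a, b, c))
]

def band(p):
    """Conjunction of two half-spaces: 2y <= 2x + 1 and 2y >= 2x - 1."""
    x, y = p
    return 2 * y <= 2 * x + 1 and 2 * y >= 2 * x - 1

print(len(single), separates(band))
```

The search comes back empty for any coefficient range, because the points form an XOR pattern; a conjunction of two half-spaces is the smallest linear shape that works.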
Interpolant(A, B)
  (X+, X−) = Init(A, B)
  while (true) {
    H = BCI(X+, X−)        // find candidate interpolant
    if (SAT(A ∧ ¬H))       // checks A ⇒ H
      Add satisfying assignment s to X+ and continue;
    if (SAT(B ∧ H))        // checks H ∧ B = ⊥
      Add satisfying assignment s to X− and continue;
    break;                 // exit if interpolant found
  }
  return H;

Theorem: Interpolant(A, B) terminates only if output H is an interpolant between A and B
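The guess-and-check loop above can be sketched in Python under strong simplifying assumptions, none of which come from the talk: the formulas A and B are toy predicates over two integer variables, the SAT queries are replaced by enumeration over a bounded grid (`find`), and the binary classifier is a plain perceptron (`learn`) rather than whatever classifier the authors used. All function names are hypothetical.

```python
from itertools import product

GRID = list(product(range(-5, 6), repeat=2))   # bounded search space (assumption)

def A(p):
    """Toy formula A: y >= x + 2."""
    x, y = p
    return y >= x + 2

def B(p):
    """Toy formula B: y <= x - 1 (inconsistent with A)."""
    x, y = p
    return y <= x - 1

def find(pred):
    """Stand-in for a SAT query: first grid point satisfying pred, else None."""
    return next((p for p in GRID if pred(p)), None)

def learn(pos, neg, max_epochs=10_000):
    """Perceptron: returns (w1, w2, b) with w.p + b > 0 on pos and < 0 on neg."""
    w = [0, 0, 0]
    data = [(p, 1) for p in pos] + [(p, -1) for p in neg]
    for _ in range(max_epochs):
        mistakes = 0
        for (x, y), label in data:
            if label * (w[0] * x + w[1] * y + w[2]) <= 0:
                w = [w[0] + label * x, w[1] + label * y, w[2] + label]
                mistakes += 1
        if mistakes == 0:
            return tuple(w)
    raise RuntimeError("samples not linearly separated within max_epochs")

def interpolant():
    """Guess a half-space from samples, check it, and add counterexamples."""
    pos, neg = [find(A)], [find(B)]
    while True:
        w = learn(pos, neg)
        H = lambda p, w=w: w[0] * p[0] + w[1] * p[1] + w[2] > 0
        s = find(lambda p: A(p) and not H(p))   # is A => H violated?
        if s is not None:
            pos.append(s)
            continue
        s = find(lambda p: B(p) and H(p))       # is H and B satisfiable?
        if s is not None:
            neg.append(s)
            continue
        return w, H

w, H = interpolant()
print("candidate half-space coefficients:", w)
```

On exit both checks have failed to find a counterexample, so H contains every grid point of A and excludes every grid point of B, mirroring the theorem: the loop can only stop with an interpolant (here, relative to the grid).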
Trace = <1,2,3,4,5>
A = symExec(<1,2,3>)
B = symExec(<4>)
A ≡ x = 4 sin² z ∧ y = 4 cos² z
B ≡ x = 2 ∧ y ≠ 2
x + y = 4

void foo() {
1: z = nondet();
2: x = 4 * sin(z) * sin(z);
3: y = 4 * cos(z) * cos(z);
4: assert(x != 2 || y == 2);
}

void foo() {
1: assume(x+y == 4);
2: assert(x != 2 || y == 2);
}
| Program | LOC | Interpolant | #Tests | Time (s) |
|---|---|---|---|---|
| f1a | 20 | x = y | 12 | 0.017 |
| ex1 | 22 | x + 2y ≥ 0 | 13 | 0.019 |
| f2 | 18 | 3x ≥ y | 13 | 0.021 |
| nec1 | 17 | x ≤ 8 | 19 | 0.015 |
| nec2 | 22 | x < y | 12 | 0.014 |
| nec3 | 15 | y ≤ 9 | 11 | 0.014 |
| nec4 | 22 | x = y | 20 | 0.019 |
| nec5 | 9 | s ≥ 0 | 11 | 0.013 |
| pldi08 | 10 | x < 0 ∨ y > 0 | 17 | 0.02 |
| fse06 | 8 | y ≥ 0 ∧ x ≥ 0 | 11 | 0.014 |
program
PAC Learner Check
𝜌 𝑢
Program Verification as Learning Geometric Concepts. Sharma, Gupta, Hariharan, Aiken, Nori. Static Analysis Symposium (SAS 2013)
A Data Driven Approach for Algebraic Loop Invariants. Sharma, Gupta, Hariharan, Aiken, Liang, Nori. European Symposium on Programming (ESOP 2013)
Termination proofs from tests. Nori, Sharma. Foundations of Software Engineering (FSE 2013)
program
Regression Check
𝜌 𝑢
safety checker
gcd(int x, int y)
{
  assume(x>0 && y>0);
  while (x != y) {
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}

(x, y) ∈ {(1,2), (2,1), (1,3), (3,1)}
gcd(int x, int y)
{
  assume(x>0 && y>0);
  // instrumented code
  a = x; b = y; c = 0;
  while (x != y) {
    // instrumented code
    c = c+1;
    writeLog(a, b, c, x, y);
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}
s.t. Aw ≥ C
gcd(int x, int y)
{
  assume(x>0 && y>0);
  a = x; b = y; c = 0;
  while (x != y) {
    // annotation
    c = c+1;
    assert(c <= a+b-2);
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}
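The instrumentation and the candidate bound can be tried out directly. Below is a Python port of the instrumented gcd (an assumption; the slides use C-like code) in which the log is returned as a list of rows instead of being written by `writeLog`. The names `gcd_logged` and `bound_holds` are hypothetical. It collects data from the four test inputs and checks the bound c ≤ a + b − 2 on every logged row.

```python
def gcd_logged(x, y):
    """Instrumented gcd: returns (result, rows), one row (a, b, c, x, y) per iteration."""
    assert x > 0 and y > 0          # plays the role of assume(x>0 && y>0)
    a, b, c = x, y, 0               # a, b remember the inputs; c counts iterations
    rows = []
    while x != y:
        c += 1
        rows.append((a, b, c, x, y))
        if x > y:
            x = x - y
        if y > x:
            y = y - x
    return x, rows

def bound_holds(rows):
    """Check the candidate loop bound c <= a + b - 2 on every logged row."""
    return all(c <= a + b - 2 for (a, b, c, _, _) in rows)

tests = [(1, 2), (2, 1), (1, 3), (3, 1)]
log = [row for (x, y) in tests for row in gcd_logged(x, y)[1]]
print(len(log), bound_holds(log))
```

Testing the bound on more inputs only raises confidence; in the talk's workflow the annotated program is handed to a safety checker so that the assertion is established for all inputs.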
quadprog(Aᵀ * A, −Aᵀ * C, −A, −C)
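One way to read these four arguments, assuming the call follows MATLAB's `quadprog(H, f, A, b)` convention (minimize ½ wᵀHw + fᵀw subject to Aw ≤ b), is as a least-squares fit of the logged data constrained so that the fitted bound never falls below an observed value. The objective below is an inference from the arguments, not something stated on the slide; here A is the data matrix, C the vector of observed counter values, and w the unknown coefficients.

```latex
\begin{aligned}
\min_{w}\ \tfrac12 \lVert A w - C \rVert^2
  &= \min_{w}\ \tfrac12\, w^{T} (A^{T} A)\, w \;-\; (A^{T} C)^{T} w \;+\; \tfrac12\, C^{T} C \\
\text{s.t.}\quad A w \ge C
  &\iff -A\, w \le -C
\end{aligned}
```

Dropping the constant term ½ CᵀC gives H = AᵀA and f = −AᵀC, and the flipped inequality gives the constraint matrix −A and right-hand side −C, matching the call term by term.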
[Charts: Octanal distribution, Driver distribution, Poly distribution]
| Driver | LOC | #Loops | Guess (s) | Check (s) | TpT (s) |
|---|---|---|---|---|---|
| kbfiltr | 0.9K | 2 | 0.001 | 8.8 | 8.8 |
| diskperf | 2.3K | 4 | 0.001 | 41.8 | 41.8 |
| fakemodem | 3.1K | 3 | 0.001 | 2841.7 | 2841.7 |
| serenum | 5.3K | 17 | 0.04 | 2081.3 | 2081.3 |
| flpydisk | 6K | 24 | 0.04 | 305.4 | 305.4 |
| kbdclass | 6.5K | 16 | 0.05 | 1822.3 | 1822.4 |