Program Verification via Machine Learning Aditya V. Nori - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Program Verification via Machine Learning

Aditya V. Nori

Programming Languages and Tools group, Microsoft Research India

Joint work with Alex Aiken, Rahul Sharma (Stanford University)

slide-2
SLIDE 2

Software validation problem

Does the software work?

  • I hope this version still interoperates with other software!
  • I hope some hacker cannot steal all my money or publish all my email on the web!
  • I hope it doesn't crash!
  • I hope it can handle my peak transaction load!

slide-3
SLIDE 3
slide-4
SLIDE 4

Possible solution: Testing

  • The “old-fashioned” and practical method of validating software
  • Generate test inputs and see if we can find a test that violates the assertion

slide-5
SLIDE 5

What’s wrong with testing?

If we view testing as a “black-box” activity, Dijkstra is right: “Program testing can be used to show the presence of bugs, but never to show their absence!”

After executing many tests, we still don’t know if there is another test that can violate the assertion

slide-6
SLIDE 6


Program verification

The algorithmic discovery of properties of a program by inspection of the source text

  • Manna and Pnueli, “Algorithmic Verification”

Also known as: static analysis, static program analysis, formal methods, …

slide-7
SLIDE 7

The problem

  • Given
    • a sequential program $P$ with input $I$ (say, written in C, C#, Java, …)
    • an assertion “assert(e)” (or a set of assertions)
  • Questions
    • Bug: does there exist an execution of the program $P$ for some input $I$ such that the assertion is violated?
    • Proof: does the assertion hold for all possible inputs?

slide-8
SLIDE 8

Proving correctness

1: x = y = 0;
2: while (*)
3:   { x++; y++; }
4: while (x != 0)
5:   { x--; y--; }
6: assert(y == 0);

$pc = 2 \Rightarrow x = y$
$pc = 3 \Rightarrow x = y$
$pc = 4 \Rightarrow x = y$
$pc = 5 \Rightarrow x = y$
$pc = 6 \Rightarrow x = 0 \land y = 0$
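The annotations above are inductive invariants: they hold when control first reaches pc = 2 (after x = y = 0), each loop body preserves them, and at pc = 6 they imply the assertion. The C sketch below spot-checks those obligations over a finite box of integer states. The helper names and the box size are illustrative assumptions, and a bounded check like this is testing, not a proof.

```c
#include <stdbool.h>

/* Candidate invariant at pc = 2..5 (illustrative helper). */
static bool inv(long x, long y) { return x == y; }

/* Spot-checks the invariant obligations on every state in [-bound, bound]^2.
   Returns false as soon as one obligation fails; true means "no
   counterexample in the box", not "proved". */
bool check_invariant_bounded(long bound) {
    if (!inv(0, 0)) return false;              /* initiation: x = y = 0 */
    for (long x = -bound; x <= bound; x++) {
        for (long y = -bound; y <= bound; y++) {
            if (!inv(x, y)) continue;          /* only states inside the invariant */
            if (!inv(x + 1, y + 1)) return false;            /* body at line 3 */
            if (x != 0 && !inv(x - 1, y - 1)) return false;  /* body at line 5 */
            if (x == 0 && y != 0) return false;  /* loop exit: assert(y == 0) */
        }
    }
    return true;
}
```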

slide-9
SLIDE 9

Program verification

1: x = y = 0;
2: while (*)
3:   { x++; y++; }
4: while (x != 0)
5:   { x--; y--; }
6: assert(y == 0);

int gcd(int x, int y)
{
  assume(x>0 && y>0);
  while (x != y) {
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}

Safety

Is the assertion satisfied for all possible inputs?

Termination

Does gcd terminate for all inputs $x, y$?

slide-10
SLIDE 10

Current state of affairs

Safety

  • SLAM, Yogi (device drivers)
  • ASTREE (avionics software)
  • Technology: predicate abstraction, abstract interpretation …

Termination

  • Terminator (device drivers)
  • O,P, LR, LF …
  • Technology: abstract interpretation, transition invariants, ranking functions …

slide-11
SLIDE 11

Question

  • Most applications come with test suites, primarily used for regression or random testing
  • Can we use these test suites for proving program correctness?

slide-12
SLIDE 12

This talk

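[Figure: program → Guess ⇄ Check; Guess proposes a candidate $\pi$, Check returns tests $t$]

  • Guess: analyze data to infer $\pi$
  • Check: validate $\pi$ using PA, and use failures to generate new tests $t$
  • Proving safety: Guess = classification, $\pi$ = loop invariant
  • Proving termination: Guess = regression, $\pi$ = loop bound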
slide-13
SLIDE 13

Proving correctness

1: x = y = 0;
2: while (*)
3:   { x++; y++; }
4: while (x != 0)
5:   { x--; y--; }
6: assert(y == 0);

Invariants

$pc = 2 \Rightarrow 2y \le 2x + 1 \land 2y \ge 2x - 1$
$pc = 3 \Rightarrow 2y \le 2x + 1 \land 2y \ge 2x - 1$
$pc = 4 \Rightarrow 2y \le 2x + 1 \land 2y \ge 2x - 1$
$pc = 5 \Rightarrow 2y \le 2x + 1 \land 2y \ge 2x - 1$
$pc = 6 \Rightarrow x = 0 \land y = 0$

slide-14
SLIDE 14

Partial correctness of programs

  • $\{A\}\,c\,\{B\}$: executing a command $c$ in a state satisfying $A$ leads to a state that satisfies $B$, or $c$ does not terminate; $A$ is the precondition and $B$ is the postcondition
  • Example: $\{y \le x\}\ z := x;\ z := z + 1\ \{y < z\}$ is a Hoare triple (or Hoare assertion)

slide-15
SLIDE 15

Checking assertions

  • Deciding $\{A\}\,c\,\{B\}$:
    1. Run the program starting from all states satisfying $A$
    2. Check that each final state satisfies $B$
  • Is this possible?
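For the example triple $\{y \le x\}\ z := x;\ z := z + 1\ \{y < z\}$, the two-step procedure can be carried out literally once the state space is cut down to a finite box. The C sketch below enumerates every state in $[-bound, bound]^2$ that satisfies the precondition, runs the command, and checks the postcondition. The function names are illustrative assumptions. Over unbounded integers the same enumeration never finishes, which is the obstacle the question points at.

```c
#include <stdbool.h>

/* {A} c {B} with A: y <= x, c: z := x; z := z + 1, B: y < z. */
static bool pre(int x, int y)  { return y <= x; }
static bool post(int y, int z) { return y < z; }
static int  run(int x)         { int z = x; z = z + 1; return z; }

/* Decides the triple by running c from every state in [-bound, bound]^2
   that satisfies A (y is unchanged by c, so only z is computed). */
bool triple_holds_on_box(int bound) {
    for (int x = -bound; x <= bound; x++)
        for (int y = -bound; y <= bound; y++)
            if (pre(x, y) && !post(y, run(x)))
                return false;          /* a final state violates B */
    return true;
}
```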
slide-16
SLIDE 16

Derivations

  • $\vdash A$: when we can prove assertion $A$
  • $\vdash \{A\}\,c\,\{B\}$: when we can prove (derive) the assertion $\{A\}\,c\,\{B\}$

slide-17
SLIDE 17

Derivation rules for assertions

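Natural deduction style axioms:

\[ \frac{\vdash A \quad \vdash B}{\vdash A \land B} \qquad \frac{\vdash A \Rightarrow B \quad \vdash A}{\vdash B} \qquad \ldots \]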

slide-18
SLIDE 18

Derivation rules for Hoare triples

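  • $\vdash \{A\}\,c\,\{B\}$: if this can be derived using the derivation rules
  • One derivation rule for each command in the language
  • Together with a rule of consequence:

\[ \frac{\vdash A' \Rightarrow A \quad \vdash \{A\}\,c\,\{B\} \quad \vdash B \Rightarrow B'}{\vdash \{A'\}\,c\,\{B'\}} \]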

slide-19
SLIDE 19

Derivation rules for Hoare logic

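\[ \vdash \{A\}\,\mathbf{skip}\,\{A\} \]

\[ \frac{\vdash \{A\}\,c_1\,\{B\} \quad \vdash \{B\}\,c_2\,\{C\}}{\vdash \{A\}\,c_1; c_2\,\{C\}} \]

\[ \frac{\vdash \{A \land b\}\,c_1\,\{B\} \quad \vdash \{A \land \lnot b\}\,c_2\,\{B\}}{\vdash \{A\}\,\mathbf{if}\ b\ \mathbf{then}\ c_1\ \mathbf{else}\ c_2\,\{B\}} \]

\[ \frac{\vdash \{A \land b\}\,c\,\{A\}}{\vdash \{A\}\,\mathbf{while}\ b\ \mathbf{do}\ c\,\{A \land \lnot b\}} \]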
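As a worked instance, the second loop of the running example, written $W$ below, can be handled by the while rule with invariant $x = y$, followed by the rule of consequence. The premise about the loop body needs an assignment rule, which these slides do not list, so it is taken as given in this sketch.

```latex
\[
\frac{\vdash \{x = y \land x \neq 0\}\, x := x - 1;\ y := y - 1\, \{x = y\}}
     {\vdash \{x = y\}\, W\, \{x = y \land \lnot(x \neq 0)\}}
\qquad
W \equiv \mathbf{while}\ x \neq 0\ \mathbf{do}\ (x := x - 1;\ y := y - 1)
\]
\[
\frac{\vdash x = y \Rightarrow x = y \qquad
      \vdash \{x = y\}\, W\, \{x = y \land \lnot(x \neq 0)\} \qquad
      \vdash x = y \land \lnot(x \neq 0) \Rightarrow y = 0}
     {\vdash \{x = y\}\, W\, \{y = 0\}}
\]
```

The final triple is exactly what the assertion at line 6 needs, given that $x = y$ holds when the second loop is reached.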

slide-20
SLIDE 20


slide-21
SLIDE 21

Example …

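  • $A \equiv x_1 = 0 \land y_1 = 0 \land \mathit{ite}(b,\ x = x_1 + 1 \land y = y_1 + 1,\ x = x_1 \land y = y_1)$
  • $B \equiv \mathit{ite}(x \neq 0,\ x_2 = x - 1 \land y_2 = y - 1,\ x_2 = x \land y_2 = y) \land x_2 = 0 \land y_2 \neq 0$

$A$ encodes lines 1-3 and $B$ encodes lines 5-8 of the loop-unrolled program:

1: x = y = 0;
2: if (*)
3:   { x++; y++; }
4:
5: if (x != 0)
6:   { x--; y--; }
7: if (x == 0)
8:   assert(y == 0);

  • $A \land B = \bot$
  • $I(x, y) \equiv 2y \le 2x + 1 \land 2y \ge 2x - 1$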
slide-22
SLIDE 22

Interpolants (simple invariants)

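  • $A = x \ge y$
  • $B = y \ge x + 1$
  • $I = 2x + 1 \ge 2y$

[Figure: $A$, $B$ and $I$ in the $(x, y)$ plane]

  • $A \Rightarrow I$
  • $I \land B = \bot$
  • $\mathit{vars}(I) \subseteq \mathit{vars}(A) \cap \mathit{vars}(B)$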
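The first two interpolant conditions for this $A$, $B$ and $I$ can be spot-checked mechanically; the third holds syntactically, since all three formulas mention only $x$ and $y$. The C sketch below evaluates the conditions at every integer point of a finite box (function names are illustrative assumptions). Over the integers, $I$ is in fact equivalent to $x \ge y$, since $2y \le 2x + 1$ forces $y \le x$.

```c
#include <stdbool.h>

static bool in_A(long x, long y) { return x >= y; }
static bool in_B(long x, long y) { return y >= x + 1; }
static bool in_I(long x, long y) { return 2 * x + 1 >= 2 * y; }

/* Evaluates "A implies I" and "I and B is unsatisfiable" at every integer
   point of [-bound, bound]^2. True means no violation was found in the box. */
bool is_interpolant_on_box(long bound) {
    for (long x = -bound; x <= bound; x++)
        for (long y = -bound; y <= bound; y++) {
            if (in_A(x, y) && !in_I(x, y)) return false;  /* breaks A => I       */
            if (in_I(x, y) && in_B(x, y))  return false;  /* breaks I && B = false */
        }
    return true;
}
```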
slide-23
SLIDE 23

Existing work

  • Interpolants used in tools: BLAST, IMPACT …
  • Based on symbolic techniques
    • Interpolants from proofs (Krajíček ['97], Pudlák ['97], McMillan ['05], …)
    • Interpolants from constraint solving (Rybalchenko et al. ['07])

slide-24
SLIDE 24

Interpolants (simple invariants)

  • $A = x \ge y$
  • $B = y \ge x + 1$
  • $I = 2x + 1 \ge 2y$

[Figure: the $(x, y)$ plane, with $I$ drawn as a classifier separating $A$ from $B$]

slide-25
SLIDE 25

Binary classification

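  • Input: a set of points $X$ with labels $l \in \{+1, -1\}$
  • Goal: find a classifier $C : X \to \{\mathit{true}, \mathit{false}\}$ such that:
    • $C(a) = \mathit{true}$ for all $a \in X$ with $\mathit{label}(a) = +1$, and
    • $C(b) = \mathit{false}$ for all $b \in X$ with $\mathit{label}(b) = -1$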
slide-26
SLIDE 26

Binary classification


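[Figure: training data → training → classifier $C$]

Also, $C$ should be predictive: it should classify points outside the training data correctly as well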
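One simple way to produce a linear classifier that meets both conditions is the classic perceptron, shown below in C as a stand-in learner. It finds some separating line $C(p) = (w \cdot p + b > 0)$ whenever the training data is linearly separable, but it does not maximize the margin, so it gives no guarantee of being predictive; a maximum-margin learner such as an SVM is the usual choice for that. The types and function names are illustrative assumptions.

```c
#include <stdbool.h>

typedef struct { double x, y; int label; } Point;   /* label: +1 or -1 */
typedef struct { double w0, w1, b; } Linear;         /* C(p) = w0*x + w1*y + b > 0 */

static bool classify(Linear c, double x, double y) {
    return c.w0 * x + c.w1 * y + c.b > 0;
}

/* Perceptron: sweep the data and, on every misclassified point, move the
   line toward it (w += label * p, b += label). Stops after a sweep with no
   mistakes, so the result meets both conditions on the training data.
   Returns false if the data is not separated within max_epochs sweeps. */
bool train_perceptron(const Point *pts, int n, int max_epochs, Linear *out) {
    Linear c = {0, 0, 0};
    for (int epoch = 0; epoch < max_epochs; epoch++) {
        bool mistake = false;
        for (int i = 0; i < n; i++) {
            if (classify(c, pts[i].x, pts[i].y) != (pts[i].label > 0)) {
                c.w0 += pts[i].label * pts[i].x;
                c.w1 += pts[i].label * pts[i].y;
                c.b  += pts[i].label;
                mistake = true;
            }
        }
        if (!mistake) { *out = c; return true; }
    }
    return false;
}
```

On the points (0,0) and (1,1) labelled +1 and (0,1) labelled -1, it converges after three sweeps to $x - y + 1 > 0$.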

slide-27
SLIDE 27

Interpolants as classifiers

Interpolants as Classifiers. Sharma, Nori, Aiken. Computer-Aided Verification (CAV 2012)

[Figure: program → Classifier ⇄ Check; the classifier proposes $\pi$, Check returns tests $t$]

  • $A \Rightarrow \pi$?
  • $\pi \land B = \bot$?
  • If a check fails, add counterexamples to $t$

slide-28
SLIDE 28

Example

1: x = y = 0;
2: while (*)
3:   { x++; y++; }
4: while (x != 0)
5:   { x--; y--; }
6: assert(y == 0);

slide-29
SLIDE 29

Example

  • $A \equiv x_1 = 0 \land y_1 = 0 \land \mathit{ite}(b,\ x = x_1 + 1 \land y = y_1 + 1,\ x = x_1 \land y = y_1)$
  • $B \equiv \mathit{ite}(x \neq 0,\ x_2 = x - 1 \land y_2 = y - 1,\ x_2 = x \land y_2 = y) \land x_2 = 0 \land y_2 \neq 0$
  • $A \land B = \bot$, $I(x, y) \equiv x = y$

1: x = y = 0;
2: if (*)
3:   { x++; y++; }
4:
5: if (x != 0)
6:   { x--; y--; }
7: if (x == 0)
8:   assert(y == 0);

slide-30
SLIDE 30

Example

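  • $A \equiv x_1 = 0 \land y_1 = 0 \land \mathit{ite}(b,\ x = x_1 + 1 \land y = y_1 + 1,\ x = x_1 \land y = y_1)$
  • $B \equiv \mathit{ite}(x \neq 0,\ x_2 = x - 1 \land y_2 = y - 1,\ x_2 = x \land y_2 = y) \land x_2 = 0 \land y_2 \neq 0$
  • $I(x, y) \equiv 2y \le 2x + 1$

[Figure: samples (0,0) and (1,1) in the $(x, y)$ plane]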

slide-31
SLIDE 31

The Basic algorithm

Basic(A, B)
  vars := common variables of A and B;
  add Samples(vars, A) to X+;
  add Samples(vars, B) to X−;
  sep := BinaryClassifier(X+, X−);
  h := ContainingPred(sep, X+);
  return h

slide-32
SLIDE 32

Problems with Basic

1. The data is not linearly separable
2. The candidate interpolant might not be an interpolant

[Figure: samples in the $(x, y)$ plane]

slide-33
SLIDE 33

No separating inequality?

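  • For each $x \in X^-$: $h_x = \mathit{BC}(X^+, \{x\})$
  • Return $\bigwedge_{x \in X^-} h_x$

[Figure: samples (0,0) and (1,1) from $A$, (0,1) and (1,0) from $B$, in the $(x, y)$ plane]

$I \equiv 2y \le 2x + 1 \land 2y \ge 2x - 1$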
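Assuming BC returns the maximum-margin (hard-margin SVM) separator, the two half-spaces in $I$ can be derived by hand. For a single negative point, that separator is the perpendicular bisector of the segment from the point to the nearest point of the convex hull of $X^+ = \{(0,0), (1,1)\}$.

```latex
\[
\begin{aligned}
&(0,1):\ \text{nearest point of } \operatorname{conv}\{(0,0),(1,1)\} \text{ is } (\tfrac12,\tfrac12);
  \quad \text{bisector through } (\tfrac14,\tfrac34) \text{ with normal } (1,-1): \\
&\qquad x - y + \tfrac12 = 0 \;\Rightarrow\; h_{(0,1)} \equiv x - y + \tfrac12 \ge 0 \equiv 2y \le 2x + 1 \\
&(1,0):\ \text{nearest point is again } (\tfrac12,\tfrac12);
  \quad \text{bisector through } (\tfrac34,\tfrac14) \text{ with normal } (-1,1): \\
&\qquad y - x + \tfrac12 = 0 \;\Rightarrow\; h_{(1,0)} \equiv y - x + \tfrac12 \ge 0 \equiv 2y \ge 2x - 1 \\
&I \equiv h_{(0,1)} \land h_{(1,0)} \equiv 2y \le 2x + 1 \land 2y \ge 2x - 1
  \qquad (\text{over } \mathbb{Z}:\ x = y)
\end{aligned}
\]
```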

slide-34
SLIDE 34

Candidate is not an interpolant?

Interpolant(A, B)
  (X+, X−) = Init(A, B)
  while (true) {
    H = BCI(X+, X−)        // find candidate interpolant
    if (SAT(A ∧ ¬H))       // A ⇒ H fails; s is the satisfying assignment
      add s to X+ and continue;
    if (SAT(B ∧ H))        // H ∧ B = ⊥ fails; s is the satisfying assignment
      add s to X− and continue;
    break;                 // exit if interpolant found
  }
  return H;

Theorem: Interpolant(A, B) terminates only if the output H is an interpolant between A and B
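The guess-and-check loop can be sketched end to end in C on the interpolant example $A = x \ge y$, $B = y \ge x + 1$. This is a toy under stated assumptions: a perceptron stands in for BCI and returns a single half-space rather than a conjunction, and SAT is replaced by brute-force search over a finite box, so "interpolant" here means "separates A from B on the box". Each counterexample is misclassified by the current candidate and is therefore a new sample, so on a finite box the loop must stop. All names are illustrative.

```c
#include <stdbool.h>

#define BOX 8              /* checks range over [-BOX, BOX]^2 (assumption) */
#define MAX_SAMPLES 512

/* Toy instance from the interpolant slides: A = x >= y, B = y >= x + 1. */
static bool in_A(int x, int y) { return x >= y; }
static bool in_B(int x, int y) { return y >= x + 1; }

typedef struct { int x, y, label; } Sample;   /* label +1: from A, -1: from B */
typedef struct { long w0, w1, b; } Half;      /* H(x, y) = w0*x + w1*y + b > 0 */

static bool in_H(Half h, int x, int y) { return h.w0 * x + h.w1 * y + h.b > 0; }

/* Guess: perceptron over all samples so far (stand-in for BCI). */
static bool guess(const Sample *s, int n, Half *out) {
    Half h = {0, 0, 0};
    for (int epoch = 0; epoch < 10000; epoch++) {
        bool mistake = false;
        for (int i = 0; i < n; i++) {
            if (in_H(h, s[i].x, s[i].y) != (s[i].label > 0)) {
                h.w0 += s[i].label * s[i].x;
                h.w1 += s[i].label * s[i].y;
                h.b  += s[i].label;
                mistake = true;
            }
        }
        if (!mistake) { *out = h; return true; }
    }
    return false;
}

/* Check: brute-force stand-in for SAT(A && !H) when check_a is true,
   otherwise for SAT(B && H). On success, (*mx, *my) is the model s. */
static bool find_model(Half h, bool check_a, int *mx, int *my) {
    for (int x = -BOX; x <= BOX; x++)
        for (int y = -BOX; y <= BOX; y++) {
            bool sat = check_a ? (in_A(x, y) && !in_H(h, x, y))
                               : (in_B(x, y) && in_H(h, x, y));
            if (sat) { *mx = x; *my = y; return true; }
        }
    return false;
}

/* Guess-and-check loop; on success *out separates A from B on the box. */
bool find_interpolant(Half *out) {
    Sample s[MAX_SAMPLES];
    int n = 0, mx, my;
    s[n++] = (Sample){0, 0, +1};               /* Init: one model of A ... */
    s[n++] = (Sample){0, 1, -1};               /* ... and one model of B   */
    while (n < MAX_SAMPLES) {
        Half h;
        if (!guess(s, n, &h)) return false;
        if (find_model(h, true, &mx, &my))  { s[n++] = (Sample){mx, my, +1}; continue; }
        if (find_model(h, false, &mx, &my)) { s[n++] = (Sample){mx, my, -1}; continue; }
        *out = h;                              /* both checks passed: exit */
        return true;
    }
    return false;
}
```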

slide-35
SLIDE 35

Handling superficial non-linearities

$\mathit{Trace} = \langle 1, 2, 3, 4, 5 \rangle$, $A = \mathit{symExec}(\langle 1, 2, 3 \rangle)$, $B = \mathit{symExec}(\langle 4 \rangle)$

  • $A \equiv x = 4\sin^2 z \land y = 4\cos^2 z$
  • $B \equiv x = 2 \land y \neq 2$
  • Interpolant: $x + y = 4$

void foo() {
1:   z = nondet();
2:   x = 4 * sin(z) * sin(z);
3:   y = 4 * cos(z) * cos(z);
4:   assert(x != 2 || y == 2);
}

void foo() {
1:   assume(x + y == 4);
2:   assert(x != 2 || y == 2);
}
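Taking $A$ from lines 2-3 of the first foo, i.e. $x = 4\sin^2 z \land y = 4\cos^2 z$, and $B$ as the negated assertion $x = 2 \land y \neq 2$, a two-line derivation confirms that $x + y = 4$ meets the interpolant conditions; it also mentions only $x$ and $y$, the variables shared by $A$ and $B$. The trigonometric non-linearity disappears through $\sin^2 z + \cos^2 z = 1$, which is why it is only superficial.

```latex
\[
A \;\Rightarrow\; x + y = 4\sin^2 z + 4\cos^2 z = 4\,(\sin^2 z + \cos^2 z) = 4
\]
\[
(x + y = 4) \land B \;\equiv\; x + y = 4 \land x = 2 \land y \neq 2
  \;\Rightarrow\; y = 2 \land y \neq 2 \;\equiv\; \bot
\]
```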

slide-36
SLIDE 36

Evaluation

Program   LOC   Interpolant       #Tests   Time (s)
f1a        20   x = y                 12   0.017
ex1        22   x + 2y ≥ 0            13   0.019
f2         18   3x ≥ y                13   0.021
nec1       17   x ≤ 8                 19   0.015
nec2       22   x < y                 12   0.014
nec3       15   y ≤ 9                 11   0.014
nec4       22   x = y                 20   0.019
nec5        9   s ≥ 0                 11   0.013
pldi08     10   x < 0 ∨ y > 0         17   0.02
fse06       8   y ≥ 0 ∧ x ≥ 0         11   0.014

slide-37
SLIDE 37

Interpolant not conjunctive?

[Figure: program → PAC Learner ⇄ Check; the learner proposes $\pi$, Check returns tests $t$]

Program Verification as Learning Geometric Concepts. Sharma, Gupta, Hariharan, Aiken, Nori. Static Analysis Symposium (SAS 2013)

A Data Driven Approach for Algebraic Loop Invariants. Sharma, Gupta, Hariharan, Aiken, Liang, Nori. European Symposium on Programming (ESOP 2013)

slide-38
SLIDE 38

Proving termination

Termination proofs from tests. Nori, Sharma. Foundations of Software Engineering (FSE 2013)

[Figure: program → Regression ⇄ Check; regression proposes $\pi$, Check returns tests $t$]

  • Guess a loop bound $\pi$
  • Check the loop bound $\pi$ with a safety checker
slide-39
SLIDE 39

Example: GCD

int gcd(int x, int y)
{
  assume(x>0 && y>0);
  while (x != y) {
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}

slide-40
SLIDE 40

Example: Instrumented GCD

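  • Inputs: $(x, y) \in \{(1, 2), (2, 1), (1, 3), (3, 1)\}$
  • $A = \begin{pmatrix} 1 & 2 \\ 2 & 1 \\ 1 & 3 \\ 1 & 3 \\ 3 & 1 \\ 3 & 1 \end{pmatrix}$ (columns $a$, $b$), $C = \begin{pmatrix} 1 \\ 1 \\ 1 \\ 2 \\ 1 \\ 2 \end{pmatrix}$ (column $c$)
  • Find $c \approx w_1 a + w_2 b + w_3$ (linear regression)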

int gcd(int x, int y)
{
  assume(x>0 && y>0);
  // instrumented code
  int a = x, b = y, c = 0;
  while (x != y) {
    // instrumented code
    c = c+1;
    writeLog(a, b, c, x, y);
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}
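To see exactly what data the regression receives, the instrumented loop can be run on the four inputs. In the C sketch below, writeLog is replaced by a hypothetical append to a caller-supplied buffer (the names Row, gcd_logged and collect_rows are illustrative), and only $(a, b, c)$ is kept from each log entry.

```c
/* One logged row: only a, b, c are kept for the regression. */
typedef struct { int a, b, c; } Row;

/* Instrumented gcd from the slide; writeLog is replaced by appending to
   a caller-supplied buffer (an illustrative stand-in). Assumes x, y > 0. */
static int gcd_logged(int x, int y, Row *rows, int *n) {
    int a = x, b = y, c = 0;            /* instrumented code */
    while (x != y) {
        c = c + 1;                      /* count loop iterations */
        rows[(*n)++] = (Row){a, b, c};  /* writeLog(a, b, c, x, y) */
        if (x > y) x = x - y;
        if (y > x) y = y - x;
    }
    return x;
}

/* Runs the four test inputs from the slide; returns the number of rows. */
int collect_rows(Row *rows) {
    static const int inputs[4][2] = {{1, 2}, {2, 1}, {1, 3}, {3, 1}};
    int n = 0;
    for (int i = 0; i < 4; i++)
        gcd_logged(inputs[i][0], inputs[i][1], rows, &n);
    return n;
}
```

The six rows come out as (1,2,1), (2,1,1), (1,3,1), (1,3,2), (3,1,1), (3,1,2): the first two components give the rows of $A$ and the last gives $C$.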

slide-41
SLIDE 41

Linear regression

  • $\min_{w} \sum_i (w_1 a_i + w_2 b_i + w_3 - c_i)^2$
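The unconstrained fit can be computed directly from the normal equations $(M^\top M)\,w = M^\top c$, where each row of $M$ is $(a_i, b_i, 1)$. The C sketch below (illustrative names; Gauss-Jordan elimination on the 3×3 system) returns $w \approx (0.5, 0.5, -0.5)$ on the six logged rows, i.e. $c \approx 0.5a + 0.5b - 0.5$. At $(a, b) = (1, 3)$ that predicts 1.5 iterations while the log records 2, so an unconstrained fit can under-estimate and is not a valid bound by itself.

```c
/* Absolute value without <math.h>. */
static double absd(double v) { return v < 0 ? -v : v; }

/* Least squares for c ~ w[0]*a + w[1]*b + w[2]: builds the normal equations
   (M^T M) w = M^T c with rows (a_i, b_i, 1), then solves them by Gauss-Jordan
   elimination with partial pivoting. Returns 0 on success, -1 if singular. */
int least_squares(const double a[], const double b[], const double c[],
                  int n, double w[3]) {
    double G[3][4] = {{0}};                 /* augmented [M^T M | M^T c] */
    for (int i = 0; i < n; i++) {
        const double row[3] = {a[i], b[i], 1.0};
        for (int j = 0; j < 3; j++) {
            for (int k = 0; k < 3; k++) G[j][k] += row[j] * row[k];
            G[j][3] += row[j] * c[i];
        }
    }
    for (int col = 0; col < 3; col++) {
        int piv = col;                      /* pick the largest pivot */
        for (int r = col + 1; r < 3; r++)
            if (absd(G[r][col]) > absd(G[piv][col])) piv = r;
        if (absd(G[piv][col]) < 1e-12) return -1;
        for (int k = 0; k < 4; k++) {       /* swap pivot row into place */
            double t = G[col][k]; G[col][k] = G[piv][k]; G[piv][k] = t;
        }
        for (int r = 0; r < 3; r++) {       /* clear this column in other rows */
            if (r == col) continue;
            double f = G[r][col] / G[col][col];
            for (int k = col; k < 4; k++) G[r][k] -= f * G[col][k];
        }
    }
    for (int j = 0; j < 3; j++) w[j] = G[j][3] / G[j][j];
    return 0;
}
```

Calling it with a = {1, 2, 1, 1, 3, 3}, b = {2, 1, 3, 3, 1, 1} and c = {1, 1, 1, 2, 1, 2} reproduces the logged GCD data.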
slide-42
SLIDE 42

Quadratic programming

  • $\min_{w} \sum_i (w_1 a_i + w_2 b_i + w_3 - c_i)^2$ s.t. $A w \ge C$
  • Guess is $\tau(a, b) = a + b - 2$
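The guess can be checked row by row against the six logged rows: with $w = (1, 1, -2)$, every row satisfies $\tau(a_i, b_i) \ge c_i$, which is the constraint $A w \ge C$, and the squared error is 2.

```latex
\[
\begin{array}{cc|c|c}
a & b & c & \tau(a,b) = a + b - 2 \\ \hline
1 & 2 & 1 & 1 \\
2 & 1 & 1 & 1 \\
1 & 3 & 1 & 2 \\
1 & 3 & 2 & 2 \\
3 & 1 & 1 & 2 \\
3 & 1 & 2 & 2
\end{array}
\qquad
\sum_i \bigl(\tau(a_i, b_i) - c_i\bigr)^2 = 0 + 0 + 1 + 0 + 1 + 0 = 2
\]
```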
slide-43
SLIDE 43

Example: Annotated GCD

  • Check with a safety checker

int gcd(int x, int y)
{
  assume(x>0 && y>0);
  int a = x, b = y, c = 0;
  while (x != y) {
    // annotation
    c = c+1;
    assert(c <= a+b-2);
    if (x > y) x = x-y;
    if (y > x) y = y-x;
  }
  return x;
}
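Before handing the annotated program to a safety checker, the annotation can be spot-checked cheaply by running it. The C sketch below (illustrative names) executes the annotated loop for every input $1 \le x, y \le n$ and reports whether $c \le a + b - k$ ever fails. With $k = 2$ it should hold: each iteration lowers $x + y$ by at least 1, and $x + y \ge 3$ whenever $x \ne y$ with both positive. This is bounded testing only, not the proof a safety checker provides.

```c
#include <stdbool.h>

/* Runs the annotated gcd loop on one input and reports whether the bound
   c <= a + b - k held at every iteration (k = 2 is the slide's annotation). */
static bool bound_holds(int x, int y, int k) {
    int a = x, b = y, c = 0;
    while (x != y) {
        c = c + 1;
        if (c > a + b - k) return false;   /* assert(c <= a+b-k) would fail */
        if (x > y) x = x - y;
        if (y > x) y = y - x;
    }
    return true;
}

/* Tests every input with 1 <= x, y <= n (the assume). Bounded testing only. */
bool bound_holds_upto(int n, int k) {
    for (int x = 1; x <= n; x++)
        for (int y = 1; y <= n; y++)
            if (!bound_holds(x, y, k)) return false;
    return true;
}
```

With $k = 3$ the input (1, 2) already fails: the loop runs once, and $1 > 1 + 2 - 3$.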

slide-44
SLIDE 44

Evaluation – Micro-benchmarks

  • Guess implemented in MATLAB: quadprog(A' * A, -A' * C, -A, -C)
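MATLAB's quadprog(H, f, A_in, b_in) minimizes $\tfrac12 x^\top H x + f^\top x$ subject to $A_{\mathrm{in}}\, x \le b_{\mathrm{in}}$. Expanding the squared residual shows why the call encodes the constrained regression; this assumes the matrix $A$ passed in carries a column of ones for the intercept $w_3$, which the matrix on the instrumented-GCD slide does not show. The constant $\tfrac12 C^\top C$ does not affect the minimizer.

```latex
\[
\tfrac12 \lVert A w - C \rVert^2
  = \tfrac12\, w^\top (A^\top A)\, w + (-A^\top C)^\top w + \tfrac12\, C^\top C,
\qquad
A w \ge C \iff (-A)\, w \le -C
\]
\[
\Rightarrow\quad H = A^\top A,\quad f = -A^\top C,\quad A_{\mathrm{in}} = -A,\quad b_{\mathrm{in}} = -C
\]
```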

  • Benchmarks: Octanal distribution, Driver distribution, Poly distribution
  • TpT works on 15% more benchmarks!
slide-45
SLIDE 45

Evaluation – Device drivers

Driver      LOC    #Loops   Guess (s)   Check (s)   TpT (s)
kbfiltr     0.9K        2       0.001         8.8       8.8
diskperf    2.3K        4       0.001        41.8      41.8
fakemodem   3.1K        3       0.001      2841.7    2841.7
serenum     5.3K       17        0.04      2081.3    2081.3
flpydisk    6K         24        0.04       305.4     305.4
kbdclass    6.5K       16        0.05      1822.3    1822.4

slide-46
SLIDE 46

Summary

  • Guess-and-Check: a data-driven approach to program verification
  • Simple machine learning algorithms used to solve hard verification problems
  • Data-Driven Program Analysis