Program verification via Machine learning. Aditya V. Nori, Programming Languages & Tools. PowerPoint PPT Presentation.


SLIDE 1

Program verification via Machine learning

Aditya V. Nori Programming Languages & Tools group Microsoft Research India

Joint work with Rahul Sharma, Alex Aiken (Stanford University)

SLIDE 2

Program verification

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   while (x != y) {
5:     if (x > y) x = x-y;
6:     if (y > x) y = y-x;
7:   }
8:   return x;
9: }

Question: Is the assertion satisfied for all possible inputs?

Question: Does gcd terminate for all inputs x, y?
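Before the slides answer these questions with verification, both programs can at least be exercised by tests. A minimal Python transliteration (the function names and sampled inputs are our own illustration; the nondeterministic `while (*)` is modeled by an explicit iteration count):

```python
def program1(n):
    # First loop: "while (*)" modeled by n nondeterministic iterations.
    x = y = 0
    for _ in range(n):
        x += 1; y += 1
    # Second loop drains x, decrementing y in lockstep.
    while x != 0:
        x -= 1; y -= 1
    assert y == 0  # the assertion from the slide
    return y

def gcd(x, y):
    assert x > 0 and y > 0  # models assume(x>0 && y>0)
    while x != y:
        if x > y: x = x - y
        if y > x: y = y - x
    return x

# Testing exercises finitely many inputs; proving requires covering them all.
for n in range(100):
    program1(n)
print(gcd(12, 18))  # prints 6
```

Tests like these build confidence but never settle either question for all inputs, which is exactly the gap the talk addresses.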

SLIDE 3

Current state of affairs

  • Precision
  • Scalability
  • Testing is still the dominant technique for establishing software quality

SLIDE 4

Question …

  • Most applications are associated with test suites, primarily used for regression or fuzz testing
  • Can we use these test suites profitably for proving program correctness?

SLIDE 5

Here’s the plan …

  • Guess: analyse data from tests in order to infer a candidate invariant (use ML techniques)
  • Check: validate the candidate invariant using sound program analysis techniques
  • If check succeeds, then we have a proof!
  • If check fails, use the failure to generate more data and repeat guess+check
  • Why is this nice?
    • Program analysis is not so good at guessing invariants
    • Program analysis is good at checking invariants
    • Able to make use of data generated from programs and existing ML algorithms for analysis

[Figure: a program feeds the Guess/Check loop; Guess proposes a candidate invariant and Check either confirms it or returns a counterexample]

SLIDE 6

Instantiations of Guess

  • Classification

Interpolants as Classifiers. Sharma, N., Aiken. Computer-Aided Verification (CAV 2012)

Program Verification as Learning Geometric Concepts. Sharma, Gupta, Hariharan, Aiken, N. Submitted

  • Linear algebra

A Data Driven Approach for Algebraic Loop Invariants. Sharma, Gupta, Hariharan, Aiken, N. European Symposium on Programming (ESOP 2013)

  • Regression

Termination proofs from tests. N, Sharma. submitted

SLIDE 7

Interpolants

  • An interpolant for a pair of formulas A, B s.t. A ∧ B = ⊥ is a formula I satisfying:
    • A ⇒ I
    • I ∧ B = ⊥
    • vars(I) ⊆ vars(A) ∩ vars(B)
  • An interpolant is a "simple" proof
SLIDE 8

Example

  • A = x ≥ y
  • B = y ≥ x + 1
  • I = 2x + 1 ≥ 2y

[Figure: the regions A and B in the x, y plane, separated by the line for I]
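In this example the interpolant conditions can be sanity-checked by brute force: with A = x ≥ y, B = y ≥ x + 1, and I = 2x + 1 ≥ 2y, each condition is tested over a finite grid (a sketch only; the grid bounds are arbitrary and a grid check is not a proof over all integers):

```python
# Brute-force sanity check of the interpolant conditions on a finite grid
# (the grid bounds are arbitrary; this is evidence, not a proof).
A = lambda x, y: x >= y           # formula A
B = lambda x, y: y >= x + 1       # formula B
I = lambda x, y: 2*x + 1 >= 2*y   # candidate interpolant I

grid = [(x, y) for x in range(-10, 11) for y in range(-10, 11)]
assert not any(A(x, y) and B(x, y) for x, y in grid)  # A ∧ B = ⊥
assert all(I(x, y) for x, y in grid if A(x, y))       # A ⇒ I
assert not any(I(x, y) and B(x, y) for x, y in grid)  # I ∧ B = ⊥
print("all three interpolant conditions hold on the grid")
```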

SLIDE 9

Binary classification

  • Input: a set of points X with labels in {+1, -1}
  • Goal: find a classifier C: X → {true, false} such that:
    • C(a) = true, ∀a ∈ X . label(a) = +1, and
    • C(b) = false, ∀b ∈ X . label(b) = -1
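A minimal classifier of this kind can be sketched with the classic perceptron algorithm, used here as a simpler stand-in for the SVMs that appear later in the talk (the sample points and labels below are invented for illustration):

```python
# Perceptron: learn w, bias such that sign(w.x + bias) matches each label.
# A simple stand-in for the SVM-based classifiers used in the talk.
def perceptron(points, labels, epochs=100):
    w, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        done = True
        for (x, y), lab in zip(points, labels):
            if lab * (w[0]*x + w[1]*y + bias) <= 0:  # misclassified point
                w[0] += lab * x; w[1] += lab * y; bias += lab
                done = False
        if done:  # a full pass with no mistakes: data is separated
            break
    return w, bias

# Hypothetical sample: "+" points on the line x = y, "-" points above it.
pts  = [(0, 0), (1, 1), (2, 2), (0, 1), (1, 2)]
labs = [+1, +1, +1, -1, -1]
w, b = perceptron(pts, labs)
C = lambda x, y: w[0]*x + w[1]*y + b > 0   # classifier C: X -> {true, false}
assert all(C(x, y) == (lab > 0) for (x, y), lab in zip(pts, labs))
```

With these points the learned separator is a line just above x = y, mirroring how the talk's SVM produces half-space predicates such as 2y ≤ 2x + 1.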
SLIDE 10

Verification & Machine-learning

  • Interpolant: separates formula A from formula B
  • Classifier: separates positive examples from negative examples

Is there a connection?

SLIDE 11

Yes!

  • Main result: view interpolants as classifiers which distinguish "+" examples from "-" examples
  • Use state-of-the-art classification algorithms (SVMs) for computing invariants
  • SVMs are predictive → generalized predicates for verification
SLIDE 12

Verification & Machine-learning

Unroll the loops
  • Find interpolants
  • Get general proofs (loop invariants)

Get positive and negative examples
  • Find a classifier
  • This is a predicate which generalizes to test data

SLIDE 13

Example

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

SLIDE 14

Example …

1: x = y = 0;
2: while (*)
3:   x++; y++;
4: while (x != 0)
5:   x--; y--;
6: assert (y == 0);

  • A ≡ x1 = 0 ∧ y1 = 0 ∧ ite(b, x = x1 + 1 ∧ y = y1 + 1, x = x1 ∧ y = y1)
  • B ≡ ite(x ≠ 0, x2 = x - 1 ∧ y2 = y - 1, x2 = x ∧ y2 = y) ∧ x2 = 0 ∧ y2 ≠ 0
  • A ∧ B = ⊥
  • I(x, y) ≡ x = y

SLIDE 15

Example

[Figure: points (0,0) and (1,1) labeled "+" in the x, y plane]

  • A ≡ x1 = 0 ∧ y1 = 0 ∧ ite(b, x = x1 + 1 ∧ y = y1 + 1, x = x1 ∧ y = y1)
  • B ≡ ite(x ≠ 0, x2 = x - 1 ∧ y2 = y - 1, x2 = x ∧ y2 = y) ∧ x2 = 0 ∧ y2 ≠ 0
  • I1 ≡ 2y ≤ 2x + 1

SLIDE 16

Example

[Figure: points (0,0) and (1,1) labeled "+" in the x, y plane]

Interpolant!

  • A ≡ x1 = 0 ∧ y1 = 0 ∧ ite(b, x = x1 + 1 ∧ y = y1 + 1, x = x1 ∧ y = y1)
  • B ≡ ite(x ≠ 0, x2 = x - 1 ∧ y2 = y - 1, x2 = x ∧ y2 = y) ∧ x2 = 0 ∧ y2 ≠ 0
  • I2 ≡ 2y ≤ 2x + 1 ∧ 2y ≥ 2x - 1

SLIDE 17

The algorithm

π½π‘œπ‘’π‘“π‘ π‘žπ‘π‘šπ‘π‘œπ‘’(𝐡, 𝐢) (π‘Œ+, π‘Œβˆ’) = π½π‘œπ‘—π‘’(𝐡, 𝐢) while(true) { 𝐼 = π‘‡π‘Šπ‘π½(π‘Œ+, π‘Œβˆ’) Find candidate interpolant if (π‘‡π΅π‘ˆ 𝐡 ∧ ¬𝐼 ) 𝐡 β‡’ 𝐽 Add 𝑑 to π‘Œ+and continue; if (π‘‡π΅π‘ˆ 𝐢 ∧ ¬𝐼 ) 𝐽 ∧ 𝐢 =βŠ₯ Add 𝑑 to π‘Œβˆ’and continue; break; Exit if interpolant found } return 𝐼; Theorem: π½π‘œπ‘’π‘“π‘ π‘žπ‘π‘šπ‘π‘œπ‘’(𝐡, 𝐢) terminates only if

  • utput 𝐼 is an interpolant between 𝐡 and 𝐢
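The guess+check loop can be illustrated on a toy one-dimensional problem, with exhaustive search over a small finite domain standing in for the SAT queries and a midpoint threshold standing in for the SVM step (the domain, formulas, and helper names are our own simplification):

```python
# Toy instance of the Guess (classifier) + Check (counterexample search)
# loop, assuming a finite integer domain so the checker can be brute force.
DOMAIN = range(-20, 21)
A = lambda x: x <= 2          # formula A
B = lambda x: x >= 5          # formula B

def svm_like_guess(pos, neg):
    # Midpoint threshold between the classes (stand-in for a real SVM).
    t = (max(pos) + min(neg)) / 2
    return lambda x: x <= t

def interpolant(A, B):
    pos, neg = [0], [10]                      # Init: seed samples
    while True:
        I = svm_like_guess(pos, neg)          # guess a candidate I
        cex = next((x for x in DOMAIN if A(x) and not I(x)), None)
        if cex is not None:                   # A => I fails
            pos.append(cex); continue
        cex = next((x for x in DOMAIN if B(x) and I(x)), None)
        if cex is not None:                   # I and B overlap
            neg.append(cex); continue
        return I                              # both checks pass

I = interpolant(A, B)
assert all(I(x) for x in DOMAIN if A(x))
assert not any(I(x) and B(x) for x in DOMAIN)
```

Each failed check contributes a counterexample to the sample sets, so the next guess is trained on strictly more data, which is exactly the loop on this slide.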
SLIDE 18

Evaluation

  • 1000 lines of C++
  • LIBSVM for SVM queries
  • Z3 theorem prover
SLIDE 19

Proving termination

  • For every loop, guess a bound on the number of iterations
  • Check the bound with a safety checker
SLIDE 20

Example: GCD

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   while (x != y) {
5:     if (x > y) x = x-y;
6:     if (y > x) y = y-x;
7:   }
8:   return x;
9: }

SLIDE 21

Example: Instrumented GCD

  • Inputs (x, y) = { (1,2), (2,1), (1,3), (3,1) }
  • Data, one row per logged loop iteration (columns 1, a, b; observed counts c):

        [ 1 1 2 ]       [ 1 ]
        [ 1 2 1 ]       [ 1 ]
    A = [ 1 1 3 ]   c = [ 1 ]
        [ 1 1 3 ]       [ 2 ]
        [ 1 3 1 ]       [ 1 ]
        [ 1 3 1 ]       [ 2 ]

  • Find c ≈ w1 a + w2 b + w3 (linear regression)

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   // instrumented code
5:   a = x; b = y; c = 0;
6:   while (x != y) {
7:     // instrumented code
8:     c = c+1;
9:     writeLog(a, b, c, x, y);
10:    if (x > y) x = x-y;
11:    if (y > x) y = y-x;
12:   }
13:   return x;
14: }
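A Python transliteration of the instrumented gcd reproduces the logged data shown on this slide (gcd_logged and the tuple-based log are our own; the slide's writeLog also records x and y, which are dropped here):

```python
# Instrumented gcd (mirrors the slide's writeLog); each loop iteration
# appends one (a, b, c) row to the log.
def gcd_logged(x, y, log):
    assert x > 0 and y > 0
    a, b, c = x, y, 0          # instrumented state
    while x != y:
        c += 1
        log.append((a, b, c))  # writeLog(a, b, c, x, y), keeping (a, b, c)
        if x > y: x = x - y
        if y > x: y = y - x
    return x

log = []
for x, y in [(1, 2), (2, 1), (1, 3), (3, 1)]:
    gcd_logged(x, y, log)

A = [(1, a, b) for a, b, c in log]  # data matrix rows (1, a, b)
c = [row[2] for row in log]         # observed iteration counts
print(A)  # rows (1,1,2), (1,2,1), (1,1,3), (1,1,3), (1,3,1), (1,3,1)
print(c)  # counts 1, 1, 1, 2, 1, 2
```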

SLIDE 22

Linear regression

  • min Σj (w1 aj + w2 bj + w3 - cj)²
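On the logged gcd data, this least-squares problem is small enough to solve without any library, via the 3x3 normal equations (rows holds one (a, b, c) triple per logged iteration; solve3 is our own helper):

```python
# Ordinary least squares for c ~ w1*a + w2*b + w3 on the logged gcd data,
# solved via the 3x3 normal equations (A^T A) w = A^T c.
rows = [(1, 2, 1), (2, 1, 1), (1, 3, 1), (1, 3, 2), (3, 1, 1), (3, 1, 2)]

def solve3(M, v):
    # Gauss-Jordan elimination with partial pivoting for a 3x3 system.
    M = [row[:] + [v[i]] for i, row in enumerate(M)]
    for i in range(3):
        p = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(3):
            if r != i:
                f = M[r][i] / M[i][i]
                M[r] = [x - f * y for x, y in zip(M[r], M[i])]
    return [M[i][3] / M[i][i] for i in range(3)]

feats = [(a, b, 1) for a, b, _ in rows]   # features (a, b, 1)
obs = [c for _, _, c in rows]             # observed counts
AtA = [[sum(f[i] * f[j] for f in feats) for j in range(3)] for i in range(3)]
Atc = [sum(f[i] * y for f, y in zip(feats, obs)) for i in range(3)]
w1, w2, w3 = solve3(AtA, Atc)
print(round(w1, 3), round(w2, 3), round(w3, 3))  # 0.5 0.5 -0.5
```

The unconstrained fit, c ≈ (a + b - 1)/2, passes through the middle of the data and underestimates the runs with c = 2, which is why a constrained fit is needed next.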
SLIDE 23

Quadratic programming

  • min Σj (w1 aj + w2 bj + w3 - cj)²
    s.t. A w ≥ c
  • Guess is the bound a + b - 2
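The constrained fit must lie on or above every observed count, so the guessed bound a + b - 2 can be checked directly against the log (rows are the (a, b, c) triples from the instrumented runs):

```python
# The guessed bound must over-approximate every logged iteration count c,
# which is what the QP constraint A w >= c enforces on the fit.
rows = [(1, 2, 1), (2, 1, 1), (1, 3, 1), (1, 3, 2), (3, 1, 1), (3, 1, 2)]
bound = lambda a, b: a + b - 2
assert all(bound(a, b) >= c for a, b, c in rows)
# Tight on some runs: for (a, b, c) = (1, 3, 2) we get 1 + 3 - 2 == 2.
print("the guess a + b - 2 dominates all observed counts")
```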
SLIDE 24

Example: Annotated GCD

  • Check with a safety checker
  • Free invariant to aid the checker:
    c ≤ a + b - x - y ∧ x > 0 ∧ y > 0
  • Corrective measures
    • Sound rounding for polynomials with integer coefficients
    • Partitioning of tests for discovering disjunctive loop bounds

1: gcd(int x, int y)
2: {
3:   assume(x>0 && y>0);
4:   a = x; b = y; c = 0;
5:   while (x != y) {
6:     // annotation
7:     free_invariant(c <= a+b-x-y);
8:     // annotation
9:     assert(c <= a+b-2);
10:    if (x > y) x = x-y;
11:    if (y > x) y = y-x;
12:   }
13:   return x;
14: }
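The two annotations can also be checked dynamically before handing them to the safety checker: a Python version of the annotated gcd (our own transliteration, with the counter update c = c + 1 made explicit) asserts both at the loop head across a range of inputs:

```python
# Check the free invariant and the asserted bound dynamically at the loop
# head (our transliteration; the counter update c = c + 1 is made explicit).
def gcd_checked(x, y):
    assert x > 0 and y > 0          # assume(x>0 && y>0)
    a, b, c = x, y, 0
    while x != y:
        assert c <= a + b - x - y   # free_invariant(c <= a+b-x-y)
        assert c <= a + b - 2       # assert(c <= a+b-2)
        c += 1
        if x > y: x = x - y
        if y > x: y = y - x
    return x

for x in range(1, 30):
    for y in range(1, 30):
        gcd_checked(x, y)
print("both annotations hold on all tested inputs")
```

Since x + y shrinks by at least 1 per iteration while c grows by exactly 1, the free invariant is inductive, and with x, y ≥ 1 it implies the asserted bound.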

SLIDE 25

Evaluation

SLIDE 26

Summary

  • Classification based algorithms can be used for computing proofs in program verification
  • Follow-up work on using techniques from linear algebra and PAC learning for scalable proofs
  • Proving program termination via linear regression

Data Driven Program Analysis