SLIDE 1

Discovering Relational Specifications

by Calvin Smith, Gabriel Ferns, Aws Albarghouthi

Presenter: Muqsit Azeem, TRDDC, Pune
July 21, 2018
Formal Methods Update Meeting, BITS Pilani, Goa Campus

Muqsit Azeem TRDDC, Pune | July 21, 2018 1 / 40

SLIDE 3

What are we interested in?

Formal specifications of library functions

Problems:

  • code unavailable
  • large code

  • partial behavior of these functions
  • discover a rich class of specifications

SLIDE 4

Problem

Given a function f and a dataset D, a partial picture of the i/o behavior of f, perhaps collected through random testing:

What can we learn about the function f by simply analyzing the dataset D?

SLIDE 5

Example 1

f:
  i1    i2    r
  1     2     3
  3     4     7
  5     6     11
  4     3     7
  ...   ...   ...

Specification: f is commutative

SLIDE 6

Example 2

f:
  i1    r
  1     1
  7     7
  −10   10

Specification: f(x) = |x|

SLIDE 7

Example 2

f:
  i1    r
  1     1
  7     7
  −10   10

Specification: f(x) = x

SLIDE 8

D-restricted assignment (σD)

f:
  i1    r
  1     1
  7     7
  −10   10

Specification: f(x) = x

A D-restricted assignment assigns each variable of the specification to a constant that appears in the dataset.

  • σD = {x → 1} is a D-restricted assignment to f(x) = x
  • σD = {x → 2} is not a D-restricted assignment, because f is not defined for 2 in the given dataset
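The definition of a D-restricted assignment can be illustrated with a minimal Python sketch. The helper name `d_restricted_assignments` and the choice to draw values from the inputs on which f is defined are assumptions made for illustration; the slides do not show how Bach enumerates assignments.

```python
from itertools import product

# Dataset D for the unary function f, as (input, output) rows from the slide.
D = [(1, 1), (7, 7), (-10, 10)]

def d_restricted_assignments(variables, dataset):
    """Yield every assignment that maps each variable to an input value
    on which f is defined in the dataset (hypothetical helper)."""
    inputs = sorted({i for i, _ in dataset})
    for values in product(inputs, repeat=len(variables)):
        yield dict(zip(variables, values))

assignments = list(d_restricted_assignments(["x"], D))
print(assignments)
```

Under this reading, {x → 1} is among the generated assignments and {x → 2} is not, matching the two cases on the slide.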

SLIDE 9

Example 2

f:
  i1    r
  1     1
  7     7
  −10   10

Specification: f(x) = x

positive evidence

The D-restricted assignments that satisfy the specification. Here the positive evidence is {{x → 1}, {x → 7}}.

SLIDE 10

Example 2

f:
  i1    r
  1     1
  7     7
  −10   10

Specification: f(x) = x

negative evidence

The D-restricted assignments that do not satisfy the specification. Here the negative evidence is {{x → −10}}.

SLIDE 11

What does it mean for a specification to explain a dataset?

  • if any negative evidence exists, the specification is considered inconsistent with the data
  • otherwise, the specification is considered more likely to be true, depending on a measure of the positive evidence that is available for it
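As a rough illustration of how positive and negative evidence decide consistency, the sketch below splits the Example 2 data against two candidate specifications. The names `split_evidence`, `identity`, and `absolute` are hypothetical, and using the count of positive assignments as the "measure" is an assumption; the slides do not say which measure Bach uses.

```python
# Example 2 data: f's observed input -> output pairs.
D = {1: 1, 7: 7, -10: 10}

def split_evidence(spec, table):
    """Partition the D-restricted assignments of x into positive and
    negative evidence for `spec` (hypothetical helper)."""
    positive, negative = [], []
    for x in sorted(table):
        (positive if spec(x, table) else negative).append({"x": x})
    return positive, negative

identity = lambda x, table: table[x] == x        # f(x) = x
absolute = lambda x, table: table[x] == abs(x)   # f(x) = |x|

for name, spec in [("f(x) = x", identity), ("f(x) = |x|", absolute)]:
    pos, neg = split_evidence(spec, D)
    consistent = not neg          # any negative evidence rules the spec out
    print(name, "consistent:", consistent, "support:", len(pos))
```

With this data, f(x) = x is rejected because of {x → −10}, while f(x) = |x| has no negative evidence and is supported by all three rows.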

SLIDE 12

Want to learn specifications

commutativity

f(x, y) = z ⇔ f(y, x) = z

transitivity

g(x, y) = t ∧ g(y, z) = t ⇒ g(x, z) = t

sin is periodic with period 2π

∃k. x = 2πk + y ⇒ (sin(x) = z ⇔ sin(y) = z)

rotating a shape by a multiple of 2π does not change the shape

∃k. x = 2πk ⇒ rotate(y, x) = y

SLIDE 14

Example 3

f:
  i1    i2    r
  1     2     3
  3     4     7
  5     6     11
  4     3     7
  ...   ...   ...

Specification: f(x, y) = z ⇔ f(y, x) = z

positive and negative evidence

  • positive evidence is {{x → 3, y → 4}, {x → 4, y → 3}}
  • no negative evidence
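The evidence for commutativity can be reproduced with a short Python sketch over the four visible rows of the table. The function name `commutativity_evidence` is hypothetical, and treating a row whose mirrored call was never observed as no evidence either way is an assumption consistent with the evidence sets listed on the slide.

```python
# Visible rows of the Example 3 table: (i1, i2) -> r.
F = {(1, 2): 3, (3, 4): 7, (5, 6): 11, (4, 3): 7}

def commutativity_evidence(table):
    """Return (positive, negative) evidence for f(x, y) = z <=> f(y, x) = z."""
    positive, negative = [], []
    for (x, y), out in sorted(table.items()):
        if (y, x) not in table:
            continue  # f(y, x) was never observed: no evidence either way
        target = positive if table[(y, x)] == out else negative
        target.append({"x": x, "y": y})
    return positive, negative

pos, neg = commutativity_evidence(F)
print("positive:", pos)
print("negative:", neg)
```

Rows such as f(1, 2) = 3 contribute nothing because f(2, 1) does not appear in the data, which is why only the (3, 4) and (4, 3) pair shows up as positive evidence.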

SLIDE 16

Example 4

concat:
  i1    i2    r
  a     b     ab
  a     ε     a
  ε     a     a
  b     ε     b
  ...   ...   ...

len:
  i1    r
  a     1
  ε     0
  b     1
  ab    2
  ...   ...

Specification: len(concat(x, y)) = z ⇔ len(x) = z

positive and negative evidence

  • positive evidence is {{x → a, y → ε}, {x → b, y → ε}}
  • negative evidence is {{x → a, y → b}, {x → ε, y → a}}

SLIDE 17

Example 4

Add a constraint to weaken the specification by finding a formula G such that:

  • for all negative evidence, G is unsatisfiable
  • for some positive evidence, G is satisfiable

Then G ⇒ (len(concat(x, y)) = z ⇔ len(x) = z) has some positive evidence and no negative evidence.

y = ε ⇒ (len(concat(x, y)) = z ⇔ len(x) = z)
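The guard condition stated above (false on every negative assignment, true on at least one positive assignment) can be sketched as a filter over candidate guards. The candidate set and the name `abduce` are assumptions for illustration; the evidence sets are the ones listed for Example 4.

```python
EPS = ""  # the empty string ε

# Evidence for len(concat(x, y)) = z <=> len(x) = z, from Example 4.
positive = [{"x": "a", "y": EPS}, {"x": "b", "y": EPS}]
negative = [{"x": "a", "y": "b"}, {"x": EPS, "y": "a"}]

# Hypothetical candidate guards, each a predicate over an assignment.
candidate_guards = {
    "x = ε": lambda s: s["x"] == EPS,
    "y = ε": lambda s: s["y"] == EPS,
    "x = y": lambda s: s["x"] == s["y"],
}

def abduce(guards, pos, neg):
    """Keep guards that hold on no negative assignment and on at least
    one positive assignment."""
    return [name for name, g in guards.items()
            if not any(g(s) for s in neg) and any(g(s) for s in pos)]

print(abduce(candidate_guards, positive, negative))
```

Of the three candidates, only y = ε survives: x = ε holds on a negative assignment, and x = y holds on no positive one.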

SLIDE 19

Bach

A technique for discovering likely specifications from data generated for a number of standard libraries.

Discovers a rich array of specifications by combining insights from program synthesis and databases.

SLIDE 21

Specification

Consider a specification as a formula over an interpreted theory.

Specification (F):

∀V. G ⇒ (Ψ ⇔ Φ) or ∀V. G ⇒ (Ψ ⇒ Φ),

where Ψ = ∧i ψi and Φ = ∧j φj

  • V: set of variables
  • G: a formula over an interpreted set of predicates and function symbols
  • each ψi is an atom of the form t = o (analogously, each φj)
  • t is a nested function application over V and a finite set of uninterpreted functions {f1, ..., fn}
  • o ∈ V

E.g. ∀x, y. x > 0 ⇒ (f(g(x)) = y ⇔ g(f(x)) = y)
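One way to picture this specification shape is as a small data structure. The sketch below is an illustrative Python model, not Bach's actual representation (Bach is implemented in OCaml and its internal types are not shown in the slides); the class names `App`, `Atom`, and `Spec` are hypothetical, and the guard is kept as plain text for brevity.

```python
from dataclasses import dataclass
from typing import Tuple, Union

@dataclass(frozen=True)
class App:
    """Application of an uninterpreted function to argument terms."""
    fn: str
    args: Tuple["Term", ...]

Term = Union[str, App]  # a variable name or a nested application

@dataclass(frozen=True)
class Atom:
    """An atom t = o, where o is a variable."""
    term: Term
    out: str

@dataclass(frozen=True)
class Spec:
    guard: str             # interpreted formula G, kept as text here
    lhs: Tuple[Atom, ...]  # conjunction Ψ
    rhs: Tuple[Atom, ...]  # conjunction Φ
    connective: str        # "⇔" or "⇒"

def show_term(t):
    if isinstance(t, str):
        return t
    return f"{t.fn}({', '.join(show_term(a) for a in t.args)})"

def show(spec):
    conj = lambda atoms: " ∧ ".join(
        f"{show_term(a.term)} = {a.out}" for a in atoms)
    return f"{spec.guard} ⇒ ({conj(spec.lhs)} {spec.connective} {conj(spec.rhs)})"

# The slide's example: x > 0 ⇒ (f(g(x)) = y ⇔ g(f(x)) = y)
example = Spec("x > 0",
               (Atom(App("f", (App("g", ("x",)),)), "y"),),
               (Atom(App("g", (App("f", ("x",)),)), "y"),),
               "⇔")
print(show(example))
```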

SLIDE 23

Searching for specifications: Specification Induction

  • iteratively constructs specifications by traversing the set of programs, and the connections between them, in order from smallest to largest, based on a set of rules
  • this is a form of enumerative synthesis
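The smallest-to-largest traversal can be illustrated with a generic size-ordered term enumerator. This is a textbook-style sketch of enumerative synthesis, not Bach's induction rules, which the slides do not spell out; the function names and the signature {f, g} are assumptions.

```python
from itertools import product

def terms_of_size(size, variables, arities, cache):
    """Return all terms built from exactly `size` symbols."""
    if size in cache:
        return cache[size]
    if size == 1:
        result = list(variables)
    else:
        result = []
        for fn, arity in arities.items():
            # Distribute the remaining size - 1 symbols over the arguments.
            for sizes in product(range(1, size), repeat=arity):
                if sum(sizes) != size - 1:
                    continue
                pools = [terms_of_size(s, variables, arities, cache)
                         for s in sizes]
                for args in product(*pools):
                    result.append(f"{fn}({', '.join(args)})")
    cache[size] = result
    return result

def enumerate_terms(max_size, variables, arities):
    """Yield terms in order of increasing size."""
    cache = {}
    for size in range(1, max_size + 1):
        yield from terms_of_size(size, variables, arities, cache)

terms = list(enumerate_terms(3, ["x"], {"f": 1, "g": 1}))
print(terms)
```

Pairs of enumerated terms, such as f(g(x)) and g(f(x)), are the kind of candidates that could then be related by ⇔ or ⇒ and checked against the data.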

SLIDE 24

Specification preference

Given Ψ, Φ:

  • learn Ψ ⇔ Φ
  • if that fails, learn either Ψ ⇒ Φ or Φ ⇒ Ψ
  • if no implication can be learned, Bach resorts to abduction

SLIDE 25

Guard abduction

Bach solves a number of abduction problems to learn a guard G:

  • G ⇒ (Ψ ⇔ Φ)
  • G ⇒ (Ψ ⇒ Φ)
  • G ⇒ (Φ ⇒ Ψ)

Each provided predicate is instantiated with every combination of variables. E.g. if a > b is provided and vars(F) = {x, y}, abduction will use x > y and y > x.
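The instantiation step can be sketched in a few lines of Python. The template syntax and the name `instantiate` are assumptions; choosing ordered selections of distinct variables is inferred from the slide's example, which lists x > y and y > x but not x > x.

```python
from itertools import permutations

def instantiate(template, params, variables):
    """Substitute every ordered choice of distinct variables for the
    predicate's parameters (hypothetical helper)."""
    return [template.format(**dict(zip(params, choice)))
            for choice in permutations(variables, len(params))]

guards = instantiate("{a} > {b}", ["a", "b"], ["x", "y"])
print(guards)
```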

SLIDE 26

Specification Preference: Example

h1:
  i1    r
  1     true
  2     false
  3     true
  ...   ...

h2:
  i1    r
  1     true
  2     true
  3     true
  ...   ...

Ψ: h1(x) = p, Φ: h2(x) = p, where p ∈ {true, false}

Specification: h1(x) = p ⇔ h2(x) = p

SLIDE 28

(⇒) h1(x) = p ⇒ h2(x) = p

Negative evidence

{x → 2, p → false}

SLIDE 30

(⇐) h2(x) = p ⇒ h1(x) = p

Negative evidence

{x → 2, p → true}

SLIDE 32

Guard abduction

  • G ⇒ (h1(x) = p ⇔ h2(x) = p)
  • G ⇒ (h1(x) = p ⇒ h2(x) = p)
  • G ⇒ (h2(x) = p ⇒ h1(x) = p)

Learned specification

p = true ⇒ (h1(x) = p ⇒ h2(x) = p)
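The h1/h2 example can be replayed in Python to see why the unguarded forms fail and the guarded implication survives. This sketch uses only the three visible rows of each table; the helper `negative_evidence` and the choice to range p over {true, false} are assumptions for illustration.

```python
# Visible rows of the two tables.
H1 = {1: True, 2: False, 3: True}
H2 = {1: True, 2: True, 3: True}

def negative_evidence(holds):
    """Assignments (x, p) drawn from the data on which `holds` is false."""
    return [{"x": x, "p": p}
            for x in sorted(H1) for p in (True, False)
            if not holds(x, p)]

iff = lambda x, p: (H1[x] == p) == (H2[x] == p)   # h1(x)=p <=> h2(x)=p
fwd = lambda x, p: (H1[x] != p) or (H2[x] == p)   # h1(x)=p  => h2(x)=p
bwd = lambda x, p: (H2[x] != p) or (H1[x] == p)   # h2(x)=p  => h1(x)=p
guarded_fwd = lambda x, p: (p is not True) or fwd(x, p)  # p = true => fwd

for name, spec in [("iff", iff), ("fwd", fwd), ("bwd", bwd),
                   ("p = true => fwd", guarded_fwd)]:
    print(name, negative_evidence(spec))
```

The forward implication fails only at {x → 2, p → false} and the backward one only at {x → 2, p → true}, so restricting the forward implication to p = true removes all negative evidence.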

SLIDE 33

Specification Consistency Verification

How to efficiently verify the consistency of the specification with the dataset?

SLIDE 34

How to efficiently verify the consistency?

  • model the positive and negative evidence of a formula F and dataset D as a union of conjunctive queries (UCQ)
  • evaluating the query should return the positive and negative evidence
  • formulating the check as database query evaluation allows us to leverage efficient, highly engineered database engines and Datalog solvers
  • the query is typically small and the data is large

SLIDE 36

Encoding Specifications

Specification (F):

∀x̄. Ψ ⇔ Φ,

where Ψ = ∧i ψi and Φ = ∧j φj

  • for each variable x ∈ x̄ in formula F, create a Datalog variable Xx ∈ X̄, i.e. x ∈ x̄ ⇒ Xx ∈ X̄
  • for each n-ary function f, create an (n + 1)-ary relation Rf

SLIDE 38

Encoding Specifications: Example

f:
  i1    i2    r
  1     2     3
  3     4     7
  5     6     11
  4     3     7
  ...   ...   ...

Specification: f(x, y) = z ⇔ f(y, x) = z

positive and negative evidence

  • positive evidence: P(X̄) ← Rf(Xx, Xy, O), Rf(Xy, Xx, O′), O = O′
  • negative evidence: N(X̄) ← Rf(Xx, Xy, O), Rf(Xy, Xx, O′), O ≠ O′
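The two rules are conjunctive queries, so they can also be evaluated as a relational self-join. The sketch below uses Python's built-in sqlite3 module purely for illustration; it is a stand-in for the Datalog evaluation described in the talk, and the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The binary function f becomes a ternary relation Rf(i1, i2, r).
conn.execute("CREATE TABLE Rf (i1 INTEGER, i2 INTEGER, r INTEGER)")
conn.executemany("INSERT INTO Rf VALUES (?, ?, ?)",
                 [(1, 2, 3), (3, 4, 7), (5, 6, 11), (4, 3, 7)])

# Self-join mirroring the two body atoms Rf(Xx, Xy, O), Rf(Xy, Xx, O').
QUERY = """
SELECT a.i1, a.i2
FROM Rf AS a JOIN Rf AS b ON a.i1 = b.i2 AND a.i2 = b.i1
WHERE a.r {op} b.r
ORDER BY a.i1, a.i2
"""

positive = conn.execute(QUERY.format(op="=")).fetchall()   # O = O'
negative = conn.execute(QUERY.format(op="<>")).fetchall()  # O ≠ O'
print("positive:", positive)
print("negative:", negative)
```

The join only matches rows whose mirrored call is present in the data, so unobserved pairs contribute to neither result set.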

SLIDE 39

Components of Bach

[Figure: diagram with three components, Specification Induction, Specification Consistency Verification, and Guard Abduction. Inputs: (f1, D1), (f2, D2), ..., (fn, Dn). Labels: candidate specifications ∀x, y. φ ⇔ ψ, ∀x, y. φ ⇒ ψ, ...; Specification (S); +ve/-ve evidence; Refined Specification (G ⇒ S).]

SLIDE 40

Implementation

Implemented in OCaml. Input:

  • a signature of simply typed functions
  • i/o data for each function
  • a set of predicates to compute guards

Uses the Souffle Datalog engine to compute +ve and -ve evidence.

SLIDE 41

Exploratory Evaluation

Targeted a set of 9 Python libraries. Each benchmark consists of:

  • a finite set of function signatures
  • a set of predicates
  • a dataset of 1000 randomly sampled executions for each function

Figure: List of benchmarks; number of functions is in parentheses

SLIDE 42

z3 specifications

The z3 benchmark contains functions from a subset of Python’s z3 API.

Learned specifications for z3:

  • p = true ⇒ (valid(x) = p ⇒ sat(x) = p)
  • and(x, y) = z ⇔ and(y, x) = z
  • valid(x) = p ∧ valid(y) = p ⇒ valid(and(x, y)) = p

SLIDE 43

strings specifications

The strings benchmark contains the typical set of functions for manipulating strings.

Learned specifications for strings:

  • lstrip(x) = y ⇒ lstrip(y) = y
  • p = true ⇒ (prefix(x, x) = p)
  • concat(y, reverse(y)) = x ⇒ reverse(x) = x
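These three properties can be spot-checked against Python's own string operations. The mapping of the benchmark's function names to `str.lstrip`, `str.startswith`, concatenation with `+`, and slicing for reversal is an assumption, and the sample strings are made up for illustration.

```python
# Hypothetical sample inputs; the benchmark's actual data is not shown.
samples = ["  ab", "ab  ", "", " a b ", "xyz"]

def reverse(s):
    return s[::-1]

# lstrip(x) = y  =>  lstrip(y) = y   (lstrip is idempotent)
lstrip_idempotent = all(s.lstrip().lstrip() == s.lstrip() for s in samples)

# p = true  =>  prefix(x, x) = p   (every string is a prefix of itself)
prefix_reflexive = all(s.startswith(s) for s in samples)

# concat(y, reverse(y)) = x  =>  reverse(x) = x   (x is a palindrome)
palindrome = all(reverse(y + reverse(y)) == y + reverse(y) for y in samples)

print(lstrip_idempotent, prefix_reflexive, palindrome)
```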

SLIDE 44

trig specifications

The trig benchmark contains trigonometric functions from Python’s math module.

Learned specifications for trig:

  • ∃k. x = 2πk + y ⇒ (sin(x) = z ⇔ sin(y) = z)
  • arcsin(z) = x ⇒ sin(x) = z
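Both properties can be checked numerically with Python's math module. Because floating-point results are rarely bit-identical, this sketch compares with a tolerance; the tolerance value and helper names are assumptions, and the slides do not say how Bach compares floating-point outputs.

```python
import math

def sin_periodic_holds(y, k, tol=1e-9):
    """Check sin(2*pi*k + y) against sin(y) up to a tolerance."""
    x = 2 * math.pi * k + y
    return math.isclose(math.sin(x), math.sin(y), abs_tol=tol)

def arcsin_inverse_holds(z, tol=1e-9):
    """Check that arcsin(z) = x implies sin(x) = z, up to a tolerance."""
    return math.isclose(math.sin(math.asin(z)), z, abs_tol=tol)

periodic_ok = all(sin_periodic_holds(y, k)
                  for y in (0.0, 0.5, -1.2) for k in range(-3, 4))
inverse_ok = all(arcsin_inverse_holds(z) for z in (-1.0, -0.3, 0.0, 0.7, 1.0))
print(periodic_ok, inverse_ok)
```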

SLIDE 45

geometry specifications

The geometry benchmark contains functions from sympy’s geometry module.

Learned specifications for geometry:

  • b = true ⇒ (encl(x, y) = b ∧ encl_pt(y, p) = true ⇒ encl_pt(x, p) = true)
  • ∃k. x = 2πk ⇒ rotate(y, x) = y

SLIDE 46

Empirical Evaluation: Scalability

Figure: with more data, Bach checks fewer specifications in the same amount of time

SLIDE 47

Empirical Evaluation: Error Analysis

Figure: Average correctness results

(T1 is the type-1 error, T2 is the type-2 error, Size is the number of specifications produced)

SLIDE 48

Empirical Evaluation: Error Analysis

Figure: Worst-case benchmark’s error rates with respect to number of observations

SLIDE 49

Empirical Evaluation: Error Analysis

Figure: Best-case benchmark’s error rates with respect to number of observations

SLIDE 50

Conclusion

  • a technique for learning relational specifications from i/o data
  • learns specifications that correlate different executions of multiple functions
  • novel idea combining program synthesis and databases
  • learns interesting specifications of real-world libraries
  • useful in program verification and development tasks

SLIDE 51

Questions?