slide-1
SLIDE 1

Signature Inference for Functional Property Discovery

or: How never to come up with tests manually anymore(*)

Tom Sydney Kerckhove

FP Complete https://cs-syd.eu/ https://github.com/NorfairKing https://fpcomplete.com

2018-02-22

slide-2
SLIDE 2

Motivation

Writing correct software is hard for humans.

slide-3
SLIDE 3

Unit Testing

sort [4, 1, 6] == [1, 4, 6]

slide-5
SLIDE 5

Property Testing

forAll arbitrary $ \ls -> isSorted (sort ls)

slide-8
SLIDE 8

Property Discovery

forAll arbitrary $ \ls -> isSorted (sort ls)

slide-9
SLIDE 9

Property Discovery with QuickSpec

slide-10
SLIDE 10

Example Code

module MySort where

mySort :: Ord a => [a] -> [a]
mySort [] = []
mySort (x:xs) = insert (mySort xs)
  where
    insert [] = [x]
    insert (y:ys)
      | x <= y = x : y : ys
      | otherwise = y : insert ys

myIsSorted :: Ord a => [a] -> Bool
myIsSorted [] = True
myIsSorted [_] = True
myIsSorted (x:y:ls) = x <= y && myIsSorted (y : ls)

slide-13
SLIDE 13

Property Discovery using QuickSpec

== Signature ==
True :: Bool
(<=) :: Ord a => a -> a -> Bool
(:) :: a -> [a] -> [a]
mySort :: Ord a => [a] -> [a]
myIsSorted :: Ord a => [a] -> Bool

== Laws ==
1. y <= y = True
2. y <= True = True
3. True <= x = x
4. myIsSorted (mySort xs) = True
5. mySort (mySort xs) = mySort xs
6. xs <= mySort xs = myIsSorted xs
7. mySort xs <= xs = True
8. myIsSorted (y : (y : xs)) = myIsSorted (y : xs)
9. mySort (y : mySort xs) = mySort (y : xs)
slide-15
SLIDE 15

QuickSpec Code

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE FlexibleContexts #-}

module MySortQuickSpec where

import Control.Monad
import MySort
import QuickSpec

main :: IO ()
main =
  void $
  quickSpec
    signature
      { constants =
          [ constant "True" (True :: Bool)
          , constant "<=" (mkDict (<=) :: Dict (Ord A) -> A -> A -> Bool)
          , constant ":" ((:) :: A -> [A] -> [A])
          , constant "mySort" (mkDict mySort :: Dict (Ord A) -> [A] -> [A])
          , constant "myIsSorted" (mkDict myIsSorted :: Dict (Ord A) -> [A] -> Bool)
          ]
      }

mkDict :: (c => a) -> Dict c -> a
mkDict x Dict = x

slide-16
SLIDE 16

Problems with QuickSpec: Monomorphisation

Only for monomorphic functions:

constant "filter" (filter :: (A -> Bool) -> [A] -> [A])

slide-17
SLIDE 17

Problems with QuickSpec: Code

Programmer has to write code for all functions of interest.

15 lines of subject code.
33 lines of QuickSpec code.

slide-18
SLIDE 18

Problems with QuickSpec: Speed

Dumb version of the QuickSpec approach:

1. Generate all possible terms
2. Generate all possible equations (tuples) of terms
3. Type check them to make sure the equation makes sense
4. Check that the input can be generated and the output compared for equality
5. Run QuickCheck to see if the equation holds
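
The five steps above can be illustrated with a toy sketch. It is not QuickSpec's implementation: the term language (`Var`, `Rev`, `Srt`), the fixed `samples` list standing in for QuickCheck's random inputs, and all function names are assumptions made for this example. Because every term has type `[Int] -> [Int]`, steps 3 and 4 are trivially satisfied here.

```haskell
import Data.List (sort)

-- A deliberately tiny term language over a single variable xs :: [Int].
-- (Hypothetical; chosen only to keep the sketch short.)
data Term
  = Var          -- xs
  | Rev Term     -- reverse t
  | Srt Term     -- sort t
  deriving (Eq, Show)

-- Step 1: all terms up to a given nesting depth.
terms :: Int -> [Term]
terms 0 = [Var]
terms n = Var : concat [[Rev t, Srt t] | t <- terms (n - 1)]

-- Interpret a term as a function on lists.
eval :: Term -> [Int] -> [Int]
eval Var xs = xs
eval (Rev t) xs = reverse (eval t xs)
eval (Srt t) xs = sort (eval t xs)

-- Step 2: all candidate equations, each unordered pair of terms once.
candidates :: Int -> [(Term, Term)]
candidates n = [(l, r) | (i, l) <- indexed, (j, r) <- indexed, i > j]
  where
    indexed = zip [0 :: Int ..] (terms n)

-- Step 5: fixed sample inputs stand in for QuickCheck's random generation.
samples :: [[Int]]
samples = [[], [1], [2, 1], [3, 1, 2], [1, 1, 0], [5, 4, 3, 2, 1]]

holds :: (Term, Term) -> Bool
holds (l, r) = all (\xs -> eval l xs == eval r xs) samples

-- Keep only the candidate equations that survive testing.
discover :: Int -> [(Term, Term)]
discover = filter holds . candidates

main :: IO ()
main = mapM_ print (discover 2)
```

Even in this toy setting the number of candidates grows quadratically in the number of terms, and the number of terms grows exponentially in the depth, which is the cost the slide is pointing at.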
slide-19
SLIDE 19

Property Discovery with EasySpec

slide-20
SLIDE 20

Step 1: Automation

slide-21
SLIDE 21

Signatures

{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE FlexibleContexts #-}

module MySortQuickSpec where

import Control.Monad
import MySort
import QuickSpec

main :: IO ()
main =
  void $
  quickSpec
    signature
      { constants =
          [ constant "True" (True :: Bool)
          , constant "<=" (mkDict (<=) :: Dict (Ord A) -> A -> A -> Bool)
          , constant ":" ((:) :: A -> [A] -> [A])
          , constant "mySort" (mkDict mySort :: Dict (Ord A) -> [A] -> [A])
          , constant "myIsSorted" (mkDict myIsSorted :: Dict (Ord A) -> [A] -> Bool)
          ]
      }

mkDict :: (c => a) -> Dict c -> a
mkDict x Dict = x

slide-23
SLIDE 23

A QuickSpec Signature

data Signature = Signature
  { functions :: [Function]
  , [...]
  , background :: [Prop]
  , [...]
  }

quickSpec :: Signature -> IO Signature

slide-28
SLIDE 28

Signature Expression Generation

filter :: (a -> Bool) -> [a] -> [a]

filter :: (A -> Bool) -> [A] -> [A]

function "filter" (filter :: (A -> Bool) -> [A] -> [A])

signature { constants = [...] }
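
The monomorphisation step above can be sketched on a small type representation. This is an illustrative assumption, not EasySpec's code: the `Type` data type, `monomorphise`, `render` and `signatureEntry` are hypothetical names, and class constraints such as `Ord a` (which the slides handle with `Dict`) are left out.

```haskell
import Data.List (nub)

-- A minimal representation of Haskell types (hypothetical, for this sketch).
data Type
  = TVar String      -- type variable, e.g. a
  | TCon String      -- type constructor, e.g. Bool
  | TList Type       -- [t]
  | TFun Type Type   -- t1 -> t2
  deriving (Eq, Show)

-- Distinct type variables in order of first appearance.
typeVars :: Type -> [String]
typeVars (TVar v) = [v]
typeVars (TCon _) = []
typeVars (TList t) = typeVars t
typeVars (TFun a b) = nub (typeVars a ++ typeVars b)

-- Replace each type variable with a placeholder type named A, B, C, ...
monomorphise :: Type -> Type
monomorphise t = go t
  where
    table = zip (typeVars t) (map (: []) ['A' ..])
    go (TVar v) = maybe (TVar v) TCon (lookup v table)
    go (TCon c) = TCon c
    go (TList x) = TList (go x)
    go (TFun a b) = TFun (go a) (go b)

-- Pretty-print a type, parenthesising function types in argument position.
render :: Type -> String
render (TVar v) = v
render (TCon c) = c
render (TList t) = "[" ++ render t ++ "]"
render (TFun a@(TFun _ _) b) = "(" ++ render a ++ ") -> " ++ render b
render (TFun a b) = render a ++ " -> " ++ render b

-- Build the source text of one signature entry, in the shape shown on the slide.
signatureEntry :: String -> Type -> String
signatureEntry name ty =
  "function \"" ++ name ++ "\" (" ++ name ++ " :: " ++ render (monomorphise ty) ++ ")"

filterType :: Type
filterType =
  TFun (TFun (TVar "a") (TCon "Bool")) (TFun (TList (TVar "a")) (TList (TVar "a")))

main :: IO ()
main = putStrLn (signatureEntry "filter" filterType)
```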

slide-30
SLIDE 30

Current Situation

$ cat Reverse.hs
{-# LANGUAGE NoImplicitPrelude #-}
module Reverse where

import Data.List (reverse, sort)

$ easyspec discover Reverse.hs
reverse (reverse xs) = xs
sort (reverse xs) = sort xs

slide-31
SLIDE 31

Automated, but still slow

[Plot: runtime (seconds, log scale) against scope-size (functions)]

slide-32
SLIDE 32

Definition: Property

Example: reverse (reverse ls) = ls

Short for: (\ls -> reverse (reverse ls)) = (\ls -> ls)

In general: (f :: A -> B) = (g :: A -> B) for some A and B with

instance Arbitrary A
instance Eq B
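
A minimal sketch of this definition: a property is an equation between two functions, checked pointwise. To stay free of dependencies, the sketch replaces the `Arbitrary A` requirement with an explicit list of inputs; `agreeOn` and `sampleLists` are names invented for this example.

```haskell
-- Check (f = g) pointwise on a finite list of inputs.
-- The Eq b constraint mirrors the "instance Eq B" requirement;
-- the explicit input list stands in for "instance Arbitrary A".
agreeOn :: Eq b => [a] -> (a -> b) -> (a -> b) -> Bool
agreeOn inputs f g = all (\x -> f x == g x) inputs

-- Hand-picked inputs (an assumption; QuickCheck would generate these randomly).
sampleLists :: [[Int]]
sampleLists = [[], [1], [1, 2], [3, 1, 2], [2, 2, 1]]

main :: IO ()
main = do
  -- reverse (reverse ls) = ls holds on every sample.
  print (agreeOn sampleLists (\ls -> reverse (reverse ls)) (\ls -> ls))
  -- reverse ls = ls is refuted by [1, 2].
  print (agreeOn sampleLists reverse id)
```

Passing on a finite sample is evidence for an equation, not a proof of it, which is equally true of the randomised testing the slides rely on.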

slide-34
SLIDE 34

Why is this slow?

1. Maximum size of the discovered properties
2. Size of the signature
slide-35
SLIDE 35

Idea

slide-36
SLIDE 36

Critical Insight

We are not interested in the entire codebase. We are interested in a relatively small amount of code.

slide-37
SLIDE 37

Reducing the Size of the Signature

inferSignature
  :: [Function] -- Focus functions
  -> [Function] -- Functions in scope
  -> [Function] -- Chosen functions
slide-39
SLIDE 39

Full Background and Empty Background

inferFullBackground _ scope = scope
inferEmptyBackground focus _ = focus

[Plot: runtime (seconds) against scope-size (# functions), by strategy: empty-background, full-background]

slide-40
SLIDE 40

Full Background and Empty Background

inferFullBackground _ scope = scope
inferEmptyBackground focus _ = focus

[Boxplot for relevant-equations (# equations), more is better: empty-background, full-background]

slide-42
SLIDE 42

Syntactic Similarity: Name

inferSyntacticSimilarityName [focus] scope =
  take 5 $ sortOn (\sf -> distance (name focus) (name sf)) scope

[Plot: runtime (seconds) against scope-size (# functions), by strategy: full-background, syntactical-similarity-name-5]

slide-43
SLIDE 43

Syntactic Similarity: Name

inferSyntacticSimilarityName [focus] scope =
  take 5 $ sortOn (\sf -> distance (name focus) (name sf)) scope

[Boxplot for relevant-equations (# equations), more is better: full-background, syntactical-similarity-name-5]
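
The name-similarity strategy can be tried out in a self-contained sketch. The slides do not define `distance`, so the metric below (difference of character counts) is an assumption, as are the stripped-down `Function` type, the extra size parameter `i`, and the fallback for a focus that is not a single function.

```haskell
import Data.List (nub, sortOn)

-- Hypothetical stand-in for a function in scope: only its name is kept.
newtype Function = Function { name :: String }
  deriving (Eq, Show)

-- Assumed metric: for every character, the absolute difference between
-- the number of times it occurs in each name, summed over all characters.
distance :: String -> String -> Int
distance a b = sum [abs (count c a - count c b) | c <- nub (a ++ b)]
  where
    count c = length . filter (== c)

-- Keep the i functions in scope whose names are closest to the focus name.
inferSyntacticSimilarityName :: Int -> [Function] -> [Function] -> [Function]
inferSyntacticSimilarityName i [focus] scope =
  take i $ sortOn (\sf -> distance (name focus) (name sf)) scope
-- Fallback chosen for this sketch only: leave the scope untouched.
inferSyntacticSimilarityName _ _ scope = scope

exampleScope :: [Function]
exampleScope = map Function ["sort", "reverse", "sortBy", "not", "insert", "isSorted"]

main :: IO ()
main = mapM_ (putStrLn . name) (inferSyntacticSimilarityName 3 [Function "sort"] exampleScope)
```

Swapping in a different `distance` changes which functions survive, which is the kind of variation the similarity strategies on these slides explore.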

slide-45
SLIDE 45

Syntactic Similarity: Implementation

inferSyntacticSimilaritySymbols i [focus] scope =
  take i $ sortOn (\sf -> distance (symbols focus) (symbols sf)) scope

[Plot: runtime (seconds) against scope-size (# functions), by strategy: full-background, syntactical-similarity-symbols-5]

slide-46
SLIDE 46

Syntactic Similarity: Implementation

inferSyntacticSimilaritySymbols i [focus] scope =
  take i $ sortOn (\sf -> distance (symbols focus) (symbols sf)) scope

[Boxplot for relevant-equations (# equations), more is better: full-background, syntactical-similarity-symbols-5]

slide-48
SLIDE 48

Syntactic Similarity: Type

inferSyntacticSimilarityType i [focus] scope =
  take i $ sortOn (\sf -> distance (getTypeParts focus) (getTypeParts sf)) scope

[Plot: runtime (seconds) against scope-size (# functions), by strategy: full-background, syntactical-similarity-type-5]

slide-49
SLIDE 49

Syntactic Similarity: Type

inferSyntacticSimilarityType i [focus] scope =
  take i $ sortOn (\sf -> distance (getTypeParts focus) (getTypeParts sf)) scope

[Boxplot for relevant-equations (# equations), more is better: full-background, syntactical-similarity-type-5]

slide-50
SLIDE 50

Other Things we Tried

1. Similarity using a different metric: edit distance
2. Unions of the previous strategies
slide-51
SLIDE 51

Breakthrough

[Histogram of the number of different functions in an equation: relative # of cases against number of different functions (1 to 5)]

slide-52
SLIDE 52

Idea

slide-53
SLIDE 53

We can run QuickSpec more than once!
slide-56
SLIDE 56

Inferred Signature

Combine the results of multiple runs: [Signature]

Use previous results as background properties: Forest Signature

Share previous runs: DAG Signature

slide-57
SLIDE 57

Chunks

chunks :: SignatureInferenceStrategy

> chunks
>   [sort :: Ord a => [a] -> [a]]
>   [reverse :: [a] -> [a], id :: a -> a]

[sort]
[sort, reverse]
[sort, id]
[sort, not]
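
Read off the example output, chunks produces one signature containing only the focus, plus one signature per function in scope paired with the focus. A minimal sketch of that enumeration follows, with functions reduced to their names. It only builds the list of signatures and does not model how the runs feed each other as background; the `String`-based types are an assumption.

```haskell
-- Functions are represented by their names only (an assumption for this sketch).
type Name = String

-- One signature with just the focus, then one per scope function
-- that is not already part of the focus.
chunks :: [Name] -> [Name] -> [[Name]]
chunks focus scope = focus : [focus ++ [s] | s <- scope, s `notElem` focus]

main :: IO ()
main = mapM_ print (chunks ["sort"] ["reverse", "id", "not"])
```

Each resulting signature holds at most two functions, so every individual QuickSpec run stays small no matter how large the scope is.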

slide-58
SLIDE 58

The Runtime of Chunks

[Plot: runtime (seconds) against scope-size (# functions), by strategy: chunks, full-background]

slide-59
SLIDE 59

The Outcome of Chunks: Relevant equations

[Boxplot for relevant-equations (# equations), more is better: chunks, full-background]

slide-60
SLIDE 60

Why does chunks find more relevant equations?

[Boxplot for equations (# equations), more is better: chunks, full-background]

slide-63
SLIDE 63

Why does chunks find more relevant equations?

Scope:
a = (+ 1)
b = (+ 2)
c = (+ 3)
d = (+ 4)

Full background:
a (a x) = b x
a (b x) = c x
a (c x) = d x

Relevant to d:
a (c x) = d x

Chunks for d:
b (b x) = d x
a (a (a (a x))) = d x

All relevant
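
The example can be checked mechanically. The sketch below encodes each equation as a pair of function-name chains, confirms that every listed equation holds on a range of integers, and filters for those mentioning the focus `d`. The encoding, the helper names, and the reading of "relevant" as "mentions the focus function" are assumptions of this sketch.

```haskell
import Data.Maybe (fromMaybe)

type Name = String

-- An equation between two compositions, outermost function first:
-- (["a", "c"], ["d"]) stands for a (c x) = d x.
type Equation = ([Name], [Name])

scope :: [(Name, Int -> Int)]
scope = [("a", (+ 1)), ("b", (+ 2)), ("c", (+ 3)), ("d", (+ 4))]

-- Apply a chain of named functions to an argument; unknown names act as id.
apply :: [Name] -> Int -> Int
apply names x = foldr (\n acc -> fromMaybe id (lookup n scope) acc) x names

-- Test an equation on a fixed range of inputs.
holds :: Equation -> Bool
holds (l, r) = all (\x -> apply l x == apply r x) [-20 .. 20]

-- Assumed reading of "relevant": the equation mentions the focus function.
relevantTo :: Name -> [Equation] -> [Equation]
relevantTo focus = filter (\(l, r) -> focus `elem` (l ++ r))

fullBackgroundEqs :: [Equation]
fullBackgroundEqs = [(["a", "a"], ["b"]), (["a", "b"], ["c"]), (["a", "c"], ["d"])]

chunksEqs :: [Equation]
chunksEqs = [(["b", "b"], ["d"]), (["a", "a", "a", "a"], ["d"])]

main :: IO ()
main = do
  print (all holds (fullBackgroundEqs ++ chunksEqs))
  print (relevantTo "d" fullBackgroundEqs)
  print (relevantTo "d" chunksEqs)
```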

slide-64
SLIDE 64

Inferred Signature

type SignatureInferenceStrategy
  = [Function] -> [Function] -> InferredSignature

type InferredSignature
  = DAG ([(Signature, [Equation])] -> Signature)

slide-65
SLIDE 65

Inferred Signature

type SignatureInferenceStrategy
  = [Function] -> [Function] -> InferM ()

data InferM a where
  InferPure :: a -> InferM a
  InferFmap :: (a -> b) -> InferM a -> InferM b
  InferApp :: InferM (a -> b) -> InferM a -> InferM b
  InferBind :: InferM a -> (a -> InferM b) -> InferM b
  InferFrom
    :: Signature
    -> [OptiToken]
    -> InferM (OptiToken, [Equation])
slide-66
SLIDE 66

Chunks Plus

chunksPlus :: SignatureInferenceStrategy

> chunksPlus
>   [sort :: Ord a => [a] -> [a]]
>   [reverse :: [a] -> [a], id :: a -> a]

[sort]
[sort, reverse]
[sort, id]
[sort, not]
[sort, reverse, id]
[sort, id, not]
[sort, not, reverse]
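
Going by the example output, chunks plus extends chunks with one signature for the focus together with every pair of scope functions. A minimal sketch of that enumeration, with functions reduced to their names, is given below. It is an assumption based on the listed output rather than on EasySpec's source, and it lists the same sets as the slide in a slightly different order.

```haskell
import Data.List (tails)

-- Functions are represented by their names only (an assumption for this sketch).
type Name = String

-- The focus alone, the focus with each scope function,
-- and the focus with each unordered pair of scope functions.
chunksPlus :: [Name] -> [Name] -> [[Name]]
chunksPlus focus scope =
  [focus]
    ++ [focus ++ [s] | s <- others]
    ++ [focus ++ [s, t] | (s : rest) <- tails others, t <- rest]
  where
    others = filter (`notElem` focus) scope

main :: IO ()
main = mapM_ print (chunksPlus ["sort"] ["reverse", "id", "not"])
```

For a scope of n functions this yields 1 + n + n(n-1)/2 signatures, so the number of runs grows quadratically while each run stays at three functions or fewer.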

slide-67
SLIDE 67

The runtime of chunks plus

[Plot: runtime (seconds) against scope-size (# functions), by strategy: chunks-plus, full-background]

slide-68
SLIDE 68

The outcome of chunks plus: Relevant equations

[Boxplot for relevant-equations (# equations), more is better: chunks-plus, full-background]

slide-69
SLIDE 69

Neat

$ time stack exec easyspec \
    -- discover MySort.hs MySort.mySort

xs <= mySort xs = myIsSorted xs
mySort xs <= xs = True
myIsSorted (mySort xs) = True
mySort (mySort xs) = mySort xs
3.61s user 1.14s system 193% cpu 2.450 total

slide-70
SLIDE 70

Composing Strategies

type Reducing = [Function] -> [Function] -> [Function]
type Drilling = [Function] -> [Function] -> InferM ()

slide-71
SLIDE 71

Composing Strategies

composeReducings :: Reducing -> Reducing -> Reducing
composeReducings r1 r2 focus = r2 focus . r1 focus

composeDrillings :: Drilling -> Drilling -> Drilling
composeDrillings d1 d2 focus scope = do
  d1 focus scope
  d2 focus scope

composeReducingWithDrilling :: Reducing -> Drilling -> Drilling
composeReducingWithDrilling r d focus scope = d focus $ r focus scope
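
To see `composeReducings` in action without the rest of EasySpec, the sketch below reuses its definition from the slide with a stripped-down `Function` type. The two reducings, `dropFocus` and `shortestNames`, are hypothetical and exist only to show the order in which the scope is narrowed.

```haskell
import Data.List (sortOn)

-- Hypothetical stand-in for a function in scope: only its name is kept.
newtype Function = Function { name :: String }
  deriving (Eq, Show)

type Reducing = [Function] -> [Function] -> [Function]

-- As on the slide: r1 narrows the scope first, then r2 narrows the result.
composeReducings :: Reducing -> Reducing -> Reducing
composeReducings r1 r2 focus = r2 focus . r1 focus

-- Hypothetical reducing: remove the focus functions from the scope.
dropFocus :: Reducing
dropFocus focus = filter (`notElem` focus)

-- Hypothetical reducing: keep the i functions with the shortest names.
shortestNames :: Int -> Reducing
shortestNames i _ = take i . sortOn (length . name)

main :: IO ()
main =
  mapM_ (putStrLn . name) $
    composeReducings dropFocus (shortestNames 2) [Function "sort"] $
      map Function ["sort", "reverse", "id", "not"]
```

Because both reducings receive the same focus, composition only threads the shrinking scope from one to the next.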

slide-72
SLIDE 72

The runtime of chunks plus composed with reducings

[Plot: runtime (seconds) against scope-size (# functions), by strategy: chunks-plus-similarity-name-5, chunks-plus-similarity-symbols-5, chunks-plus-similarity-type-5, chunks-plus-type-reachability-7]

slide-73
SLIDE 73

The outcome of chunks plus composed with reducings: Relevant equations

[Boxplot for relevant-equations (# equations), more is better: chunks-plus-similarity-name-5, chunks-plus-similarity-symbols-5, chunks-plus-similarity-type-5, chunks-plus-type-reachability-7, full-background]

slide-74
SLIDE 74

All strategies

[Boxplot for relevant-equations (# equations), more is better: chunks, chunks-plus, chunks-plus-reachability-name-5-7, chunks-plus-reachability-symbols-5-7, chunks-plus-reachability-type-5-7, chunks-plus-similarity-name-5, chunks-plus-similarity-symbols-5, chunks-plus-similarity-type-5, chunks-plus-type-reachability-7, chunks-similarity-name-5, chunks-similarity-symbols-5, chunks-similarity-type-5, chunks-type-reachability-7, empty-background, full-background, iterative-chunks-4-2, syntactical-similarity-name-5, syntactical-similarity-symbols-5, syntactical-similarity-type-5, type-reachability-7]

slide-82
SLIDE 82

Great promise, but ...

1. Only works for functions in scope whose type is in scope too.
2. Crashes on partial functions.
3. Only works with built-in instances.
4. Data has to have an Arbitrary instance in scope.
5. Does not play with CPP.
6. Does not play well with higher-kinded type variables.

All technical problems, not theoretical problems!

slide-86
SLIDE 86

Further Research

1. Can we go faster?
2. Which constants do we choose for built-in types?
3. Can we apply this to effectful code?
4. Relative importance of equations