Fast Synthesis of Fast Collections

slide-1
SLIDE 1

Fast Synthesis of Fast Collections

Calvin Loncaric, Emina Torlak, Michael D. Ernst
University of Washington

slide-7
SLIDE 7

Data structures are everywhere

Lists, maps, and sets solve many problems

What if I need a custom data structure?

slide-11
SLIDE 11

Cozy synthesizes collections

[Figure: Cozy pipeline with Specification, Inductive Synthesizer, Verifier, Outline, three Rep. boxes, and three Impl. boxes]

  • Correct by construction
  • Specifications orders of magnitude shorter than implementations, synthesized in < 90 seconds
  • Equivalent performance to human-written code
slide-15
SLIDE 15

Myria Analytics Storage

[Figure: Request 1 and Request 2 plotted against a time axis]

Goal: efficient retrieval of entries for a particular request ID in a particular timespan

slide-18
SLIDE 18

Myria Analytics Storage

class AnalyticsLog {

  // Insert an entry into the data structure
  void log(Entry e)

  // Retrieve entries
  Iterator<Entry> getEntries(
    int queryId,
    int subqueryId,
    int fragmentId,
    long start,
    long end)

}

slide-19
SLIDE 19

Myria Analytics Storage

Specification:

Entry has:
  queryId : Int,
  subqueryId : Int,
  fragmentId : Int,
  start, end : Long,
  …

getEntries: all e where
  e.queryId = queryId and
  e.subqueryId = subqueryId and
  e.fragmentId = fragmentId and
  e.end >= start and
  e.start <= end

class AnalyticsLog {
 
 void log(Entry e)
 
 Iterator<Entry> getEntries(
 int queryId,
 int subqueryId,
 int fragmentId,
 long start,
 long end)
 
 }

slide-20
SLIDE 20

Cozy synthesizes collections

Specification:

Entry has:
  field1 : Type1,
  field2 : Type2,
  …

retrieveA: all e where
  condition
retrieveB: all e where
  condition

  ↓ Cozy

class Structure {

  void add(Entry e)
  void remove(Entry e)
  void update(Entry e, …)

  Iterator<Entry> retrieveA(…)
  Iterator<Entry> retrieveB(…)

}

slide-23
SLIDE 23

Trivial Solution

retrieve: all e where
  P(e, input)

List<Entry> data;

Iterator<Entry> retrieve(input) {
  for e in data:
    if P(e, input):
      yield e
}

There has to be a better way!
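For concreteness, the trivial solution can be written out for the Myria getEntries specification. This is only a sketch: the Entry class, the LinearScanLog name, and the matches helper are hypothetical and not taken from Myria or from Cozy's output. It shows why the approach is unattractive: every call scans the whole list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical entry type; field names follow the getEntries specification.
final class Entry {
    final int queryId, subqueryId, fragmentId;
    final long start, end;

    Entry(int queryId, int subqueryId, int fragmentId, long start, long end) {
        this.queryId = queryId;
        this.subqueryId = subqueryId;
        this.fragmentId = fragmentId;
        this.start = start;
        this.end = end;
    }
}

// Trivial solution: one flat list, predicate tested on every entry.
final class LinearScanLog {
    private final List<Entry> data = new ArrayList<>();

    void log(Entry e) {
        data.add(e);
    }

    // P(e, input): same IDs, and the entry's timespan overlaps [start, end].
    static boolean matches(Entry e, int queryId, int subqueryId, int fragmentId,
                           long start, long end) {
        return e.queryId == queryId
            && e.subqueryId == subqueryId
            && e.fragmentId == fragmentId
            && e.end >= start
            && e.start <= end;
    }

    // Visits all n entries on every call, however few of them match.
    Iterator<Entry> getEntries(int queryId, int subqueryId, int fragmentId,
                               long start, long end) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : data) {
            if (matches(e, queryId, subqueryId, fragmentId, start, end)) {
                out.add(e);
            }
        }
        return out.iterator();
    }
}
```

The sketch collects matches into a list instead of yielding lazily; that keeps the code short and does not change the O(n) cost per retrieval.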

slide-25
SLIDE 25

In the quest for a good solution, the search space of “all possible programs” is simply too large

[Figure: Specification → Implementation through a single synthesis algorithm is intractable; Specification → ? → Implementation splits the problem into two tractable steps]

Specification:

Entry has:
  field1, field2, …
retrieveA: all e where
  condition
retrieveB: all e where
  condition

Implementation:

void add(Entry e)
void remove(Entry e)
void update(Entry e, …)

Iterator retrieveA(…)
Iterator retrieveB(…)

slide-27
SLIDE 27

[Figure: Specification → Implementation through a single synthesis algorithm is intractable; Specification → Outline and Outline → Implementation are each tractable]

Outline:

  • specific enough to describe asymptotic performance
  • general enough to encode a data structure succinctly

Specification:

Entry has:
  field1, field2, …
retrieveA: all e where
  condition
retrieveB: all e where
  condition

Implementation:

void add(Entry e)
void remove(Entry e)
void update(Entry e, …)

Iterator retrieveA(…)
Iterator retrieveB(…)

slide-33
SLIDE 33
Outlines

Plans for retrieving entries

  • All ( )
  • HashLookup ( outline, field = var )
  • BinarySearch ( outline, field > var )
  • Concat ( outline1, outline2 )
  • Filter ( outline, predicate )
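One way to read the five outline forms is as a small expression tree. The sketch below is a hypothetical encoding (class names, the eval method, and the keep helper are assumptions, not Cozy's internal representation). Its eval method gives only the reference meaning of a plan, that is, which entries it yields; HashLookup, BinarySearch, and Filter all select entries here and would differ only in how an implementation finds them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical encoding of the five outline forms; eval gives the entries a plan yields.
abstract class Outline<E> {
    abstract List<E> eval(List<E> all);

    static <E> List<E> keep(List<E> in, Predicate<E> p) {
        List<E> out = new ArrayList<>();
        for (E e : in) {
            if (p.test(e)) out.add(e);
        }
        return out;
    }
}

// All(): every entry in the collection.
final class All<E> extends Outline<E> {
    List<E> eval(List<E> all) { return new ArrayList<>(all); }
}

// HashLookup(outline, field = var)
final class HashLookup<E, K> extends Outline<E> {
    final Outline<E> source; final Function<E, K> field; final K var;
    HashLookup(Outline<E> source, Function<E, K> field, K var) {
        this.source = source; this.field = field; this.var = var;
    }
    List<E> eval(List<E> all) {
        return keep(source.eval(all), e -> field.apply(e).equals(var));
    }
}

// BinarySearch(outline, field > var)
final class BinarySearch<E, K extends Comparable<K>> extends Outline<E> {
    final Outline<E> source; final Function<E, K> field; final K var;
    BinarySearch(Outline<E> source, Function<E, K> field, K var) {
        this.source = source; this.field = field; this.var = var;
    }
    List<E> eval(List<E> all) {
        return keep(source.eval(all), e -> field.apply(e).compareTo(var) > 0);
    }
}

// Concat(outline1, outline2)
final class Concat<E> extends Outline<E> {
    final Outline<E> left, right;
    Concat(Outline<E> left, Outline<E> right) { this.left = left; this.right = right; }
    List<E> eval(List<E> all) {
        List<E> out = new ArrayList<>(left.eval(all));
        out.addAll(right.eval(all));
        return out;
    }
}

// Filter(outline, predicate)
final class Filter<E> extends Outline<E> {
    final Outline<E> source; final Predicate<E> predicate;
    Filter(Outline<E> source, Predicate<E> predicate) {
        this.source = source; this.predicate = predicate;
    }
    List<E> eval(List<E> all) { return keep(source.eval(all), predicate); }
}
```

Outlines nest: the first argument of HashLookup, BinarySearch, and Filter is itself an outline, which is what lets a plan such as Filter(HashLookup(…), …) be expressed.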

slide-35
SLIDE 35

Outlines → Implementations

[Figure: Cozy pipeline with Specification, Inductive Synthesizer, Verifier, Outline, three Rep. boxes, and three Impl. boxes]

slide-38
SLIDE 38

Outlines → Implementations

HashLookup (
  All(),
  e.queryId = q )

class Structure {

  T data;

  Iterator<Entry>
  retrieve(q) { … }

}

slide-39
SLIDE 39

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  T data;

  Iterator<Entry>
  retrieve(q) { … }

}

slide-42
SLIDE 42

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<K,V> data;

  Iterator<Entry>
  retrieve(q) { … }

}

slide-49
SLIDE 49

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<int,V> data;

  Iterator<Entry>
  retrieve(q) { … }

}

Choices for V:

  • V = ArrayList<Entry>
  • V = LinkedList<Entry>

slide-52
SLIDE 52

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<int,V> data;

  Iterator<Entry>
  retrieve(q)
  { v = data.get(q);
    return v.iterator(); }

  // add, remove, update

}
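The derivation above ends in a hash map from queryId to a bucket of entries. A hand-written Java sketch of that representation might look as follows, assuming V = ArrayList<Entry>. The Entry class, the updateQueryId method (standing in for the elided update parameters), and the handling of missing keys are assumptions for illustration, not Cozy's generated code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Hypothetical entry type with only the indexed field.
final class Entry {
    int queryId;   // mutable so the hypothetical update below can change it

    Entry(int queryId) { this.queryId = queryId; }
}

// HashLookup(data, e.queryId = q) realised as HMap<int,V> with V = ArrayList<Entry>.
final class Structure {
    private final Map<Integer, List<Entry>> data = new HashMap<>();

    void add(Entry e) {
        data.computeIfAbsent(e.queryId, k -> new ArrayList<>()).add(e);
    }

    void remove(Entry e) {
        List<Entry> bucket = data.get(e.queryId);
        if (bucket == null) return;
        bucket.remove(e);                        // removes by identity; Entry has no equals override
        if (bucket.isEmpty()) data.remove(e.queryId);
    }

    // An update to the indexed field must move the entry to its new bucket.
    void updateQueryId(Entry e, int newQueryId) {
        remove(e);
        e.queryId = newQueryId;
        add(e);
    }

    Iterator<Entry> retrieve(int q) {
        List<Entry> v = data.get(q);
        if (v == null) return Collections.emptyIterator();   // no bucket for this key
        return v.iterator();
    }
}
```

Unlike the two-line retrieve body on the slide, the sketch checks for a missing key before calling iterator(). The update method shows why add, remove, and update have to be produced together with retrieve: each must keep the index consistent.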

slide-54
SLIDE 54

Specification → Outline

[Figure: Cozy pipeline with Specification, Inductive Synthesizer, Verifier, Outline, three Rep. boxes, and three Impl. boxes]

slide-57
SLIDE 57

Specification → Outline

CEGIS

[Figure: the Inductive Synthesizer sends a candidate to the Verifier; the Verifier returns a counterexample, or a certification of correctness]

  • Inductive Synthesizer: remembers all examples; only reasons about examples collected thus far.
  • Verifier: must ensure the outline is correct for all possible inputs and all possible data structure states.

∀I ∀S, out = { e | e ∈ S ∧ P(I, e) }

retrieve: all e where
  e.queryId = q and …
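The CEGIS loop can be sketched in a few lines. This is a toy under stated assumptions: inputs and entries are small ints, candidates are a fixed, ordered list of predicates, and the verifier is an exhaustive check over a bounded domain standing in for the real verifier. The Cegis class and all names in it are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class Cegis {
    // A predicate over (input I, entry e), both modelled as small ints.
    interface Pred { boolean test(int input, int entry); }

    // One remembered example: an (input, entry) pair and the specified answer.
    static final class Example {
        final int input, entry;
        final boolean expected;
        Example(int input, int entry, boolean expected) {
            this.input = input; this.entry = entry; this.expected = expected;
        }
    }

    // Inductive synthesizer: index of the first candidate that agrees with
    // every example collected so far, or -1 if none does.
    static int synthesize(List<Pred> candidates, List<Example> examples) {
        for (int c = 0; c < candidates.size(); c++) {
            boolean ok = true;
            for (Example x : examples) {
                if (candidates.get(c).test(x.input, x.entry) != x.expected) {
                    ok = false;
                    break;
                }
            }
            if (ok) return c;
        }
        return -1;
    }

    // Verifier stand-in: exhaustive check on [0, bound) x [0, bound).
    // Returns a counterexample, or null when the candidate matches the spec there.
    static Example verify(Pred spec, Pred candidate, int bound) {
        for (int i = 0; i < bound; i++) {
            for (int e = 0; e < bound; e++) {
                if (spec.test(i, e) != candidate.test(i, e)) {
                    return new Example(i, e, spec.test(i, e));
                }
            }
        }
        return null;
    }

    // CEGIS loop: propose, verify, remember the counterexample, repeat.
    static int run(Pred spec, List<Pred> candidates, int bound) {
        List<Example> examples = new ArrayList<>();
        while (true) {
            int c = synthesize(candidates, examples);
            if (c < 0) return -1;                       // no candidate fits the examples
            Example cex = verify(spec, candidates.get(c), bound);
            if (cex == null) return c;                  // certified on the bounded domain
            examples.add(cex);                          // rules out the rejected candidate
        }
    }
}
```

Each counterexample eliminates at least the candidate that produced it, so the loop terminates on a finite candidate list. The bounded check only certifies correctness on the enumerated domain, which is weaker than the "all possible inputs and all possible data structure states" guarantee the verifier has to give.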

slide-64
SLIDE 64

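Cost Model

HashLookup (
  All(),
  e.queryId = q )

Filter (
  All(),
  e.queryId = q )

[Figure residue: cost annotations O(1), O(n), O(1), O(1) attached to the two outlines, with a ">" comparison between them; the Filter outline is the O(n) one]

Cozy prefers outlines with lower cost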
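A cost comparison of this kind can be illustrated with a deliberately simple model. Everything below is an assumption made for the sketch (the Plan classes, the size and cost formulas, the distinctKeys and selectivity parameters); the slides do not spell out Cozy's actual cost function. The model only reproduces the qualitative point that a hash lookup over All() is constant-time while a filter over All() is linear.

```java
// Hypothetical cost model: size(n) estimates how many entries a plan yields from a
// collection of n entries, cost(n) estimates the work needed to reach them.
abstract class Plan {
    abstract double size(double n);
    abstract double cost(double n);
}

final class AllPlan extends Plan {
    double size(double n) { return n; }
    double cost(double n) { return 1.0; }   // handing out the whole collection is constant
}

final class HashLookupPlan extends Plan {
    final Plan source;
    final double distinctKeys;   // assumed number of distinct key values

    HashLookupPlan(Plan source, double distinctKeys) {
        this.source = source;
        this.distinctKeys = distinctKeys;
    }

    double size(double n) { return source.size(n) / distinctKeys; }
    double cost(double n) { return source.cost(n) + 1.0; }   // one expected-constant probe
}

final class FilterPlan extends Plan {
    final Plan source;
    final double selectivity;    // assumed fraction of entries that pass the predicate

    FilterPlan(Plan source, double selectivity) {
        this.source = source;
        this.selectivity = selectivity;
    }

    double size(double n) { return source.size(n) * selectivity; }
    // The predicate is tested on every entry the source yields.
    double cost(double n) { return source.cost(n) + source.size(n); }
}

final class CostModel {
    // Prefer the plan with the lower estimated cost for a collection of n entries.
    static Plan cheaper(Plan a, Plan b, double n) {
        return a.cost(n) <= b.cost(n) ? a : b;
    }
}
```

Under these assumptions, Filter over a HashLookup costs the probe plus one bucket's worth of predicate tests, which is why pushing equality conditions into a lookup and leaving the rest to a filter tends to score well.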

slide-69
SLIDE 69

Inductive Synthesis

Enumerative search

size 1:
  All

size 2:
  HashLookup(All, x=y)
  BinarySearch(All, x>y)
  Filter(All, x=y)

Concat(HashLookup(…),…) vs Concat(Filter(…),…)

slide-71
SLIDE 71

Inductive Synthesis

Enumerative search

size 1:
  All

size 2:
  HashLookup(All, x=y)
  BinarySearch(All, x>y)
  Filter(All, x=y)
  …

size 3:
  HashLookup( HashLookup(…), a=b)
  Filter( BinarySearch(…), x<y)
  Filter( HashLookup(…), p=q)

Filter( HashLookup(…), p=q): correct on all current examples
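Enumeration by size can be sketched as follows. To keep the example short it enumerates outline shapes only, written as strings, and leaves out the predicates (x=y, x>y, …) that a real search would also have to enumerate. The OutlineEnumerator class is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class OutlineEnumerator {
    // Operators that wrap exactly one smaller outline.
    static final String[] UNARY = {"HashLookup", "BinarySearch", "Filter"};

    // Returns bySize, where bySize.get(s) holds every outline shape with exactly
    // s nodes. Index 0 is an empty placeholder so that indices equal sizes.
    static List<List<String>> enumerate(int maxSize) {
        List<List<String>> bySize = new ArrayList<>();
        bySize.add(new ArrayList<String>());
        for (int size = 1; size <= maxSize; size++) {
            List<String> current = new ArrayList<>();
            if (size == 1) {
                current.add("All");
            } else {
                // A unary operator on top of any outline that is one node smaller.
                for (String op : UNARY) {
                    for (String child : bySize.get(size - 1)) {
                        current.add(op + "(" + child + ")");
                    }
                }
                // Concat uses one node; its two children share the remaining size - 1.
                for (int left = 1; left <= size - 2; left++) {
                    for (String l : bySize.get(left)) {
                        for (String r : bySize.get(size - 1 - left)) {
                            current.add("Concat(" + l + ", " + r + ")");
                        }
                    }
                }
            }
            bySize.add(current);
        }
        return bySize;
    }
}
```

Because larger outlines are built only from smaller ones already in the table, the search reaches small candidates first, and a candidate such as Filter(HashLookup(…)) appears at size 3, as on the slide.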

slide-76
SLIDE 76

Outline Verification

Specification:

Entry has:
  queryId : Int,
  subqueryId : Int,
  …

retrieve: all e where
  e.queryId = q and …

Specification predicate P:
  { e | e ∈ S ∧ P(I, e) }

Outline:

HashLookup(
  All(),
  e.queryId = q)

Representative predicate Q: e.queryId = q
  { e | e ∈ S ∧ Q(I, e) }

slide-79
SLIDE 79

Outline Verification

{ e | e ∈ S ∧ P(I, e) }  =?  { e | e ∈ S ∧ Q(I, e) }

yes if and only if for all I, e: P(I, e) = Q(I, e)

equivalence can be checked with an SMT solver

slide-84
SLIDE 84

Evaluation

  • Improve correctness
  • Save programmer effort
  • Match performance

slide-90
SLIDE 90

Case studies

  • ZTopo: tile cache. Tracks map tiles in a least-recently-used cache
  • Bullet: volume tree. Stores axis-aligned bounding boxes for fast collision detection
  • Myria: analytics. Analytics data indexed by timespan and by request ID
  • Sat4j: variable metadata. Tracks information about each variable in the formula

[Bug counts shown on the slide: 11 bugs, 15 bugs, 7 bugs; the extracted text does not say which projects they belong to]

slide-91
SLIDE 91

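Specifications vs. Implementations

[Bar chart: lines of code, Original vs. Spec, for Myria, ZTopo, Sat4j, and Bullet. Values shown: 11, 292, 25, 22, 23, 1383, 2582, 269; their pairing with projects is not recoverable from the extracted text]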

slide-92
SLIDE 92

Synthesis Time

[Bar chart: synthesis time in seconds (axis ticks 30, 60, 90) for Myria, ZTopo, Sat4j, and Bullet, split into Outline Synthesis and Auto-Tuning]

slide-99
SLIDE 99

Performance

[Charts: Original vs. Synthesized, one per project]

  • Myria: original implementation has worst-case linear time
  • Sat4j: small overhead; performance dominated by other factors
  • Bullet: binary search tree vs. space partitioning tree
  • ZTopo: data structures are nearly identical

slide-103
SLIDE 103

Related Work

  • J. Earley: “High level iterators and a method for automatically designing data structure representation” (1974)
    • Hard-coded rewrite rules
  • S. Agrawal et al.: “Automated selection of materialized views and indexes in SQL databases” (2000)
    • Enumerate possible views & indexes based on query syntax and use the planner to decide which ones to keep
  • P. Hawkins et al.: “Data representation synthesis” (2011)
    • Enumerate representations and use a planner to implement retrieval operations; conjunctions of equalities only

slide-104
SLIDE 104

http://cozy.uwplse.org

  • Implementation outlines make the problem tractable
  • Synthesis completes in under 90 seconds
  • Cozy generates correct code, and matches handwritten implementation performance

Special thanks to: Michael Ernst, Emina Torlak; also Haoming Liu & Daniel Perelman

[Figure: Cozy pipeline with Specification, Inductive Synthesizer, Verifier, Outline, three Rep. boxes, and three Impl. boxes]