Fast Synthesis of Fast Collections Calvin Loncaric Emina Torlak - - PowerPoint PPT Presentation

SLIDE 1

Fast Synthesis of Fast Collections

Calvin Loncaric Emina Torlak Michael D. Ernst University of Washington

SLIDE 2

Data structures are everywhere

Lists, maps, and sets solve many problems.
What if I need a custom data structure?

SLIDE 3

Cozy synthesizes collections

[Figure: Cozy pipeline: Specification → Outline (inductive synthesizer + verifier) → representations (Rep.) and implementations (Impl.)]

  • Correct by construction
  • Specifications orders of magnitude shorter than implementations, synthesized in < 90 seconds
  • Equivalent performance to human-written code

SLIDE 4

Myria Analytics Storage

[Figure: log entries for Request 1 and Request 2 laid out along a time axis]

Goal: efficient retrieval of entries for a particular request ID in a particular timespan

SLIDE 5

Myria Analytics Storage

class AnalyticsLog {

  // Insert an entry into the data structure
  void log(Entry e)

  // Retrieve entries
  Iterator<Entry> getEntries(
      int queryId,
      int subqueryId,
      int fragmentId,
      long start,
      long end)

}

SLIDE 6

Myria Analytics Storage

Specification:

Entry has:
  queryId : Int,
  subqueryId : Int,
  fragmentId : Int,
  start, end : Long,
  …

getEntries: all e where
  e.queryId = queryId and
  e.subqueryId = subqueryId and
  e.fragmentId = fragmentId and
  e.end >= start and
  e.start <= end

class AnalyticsLog {

  void log(Entry e)

  Iterator<Entry> getEntries(
      int queryId,
      int subqueryId,
      int fragmentId,
      long start,
      long end)

}

SLIDE 7

Cozy synthesizes collections

Specification:

Entry has:
  field1 : Type1,
  field2 : Type2,
  …

retrieveA: all e where
  condition

retrieveB: all e where
  condition

  ↓ Cozy

class Structure {

  void add(Entry e)
  void remove(Entry e)
  void update(Entry e, …)

  Iterator<Entry> retrieveA(…)
  Iterator<Entry> retrieveB(…)

}

SLIDE 8

Trivial Solution

List<Entry> data;

// Scan every stored entry and keep those that satisfy P
Iterator<Entry> retrieve(Input input) {
  return data.stream()
             .filter(e -> P(e, input))
             .iterator();
}

retrieve: all e where
  P(e, input)


There has to be a better way!
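A minimal runnable sketch of the trivial solution applied to the Myria getEntries specification from slide 6. The `Entry` record and the `ListAnalyticsLog` class name are hypothetical; the only representation is a plain `ArrayList`, so every query scans all n entries.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical Entry type holding the fields named in the getEntries specification.
record Entry(int queryId, int subqueryId, int fragmentId, long start, long end) {}

// Trivial solution: keep every entry in one list and scan it on each query.
class ListAnalyticsLog {
    private final List<Entry> data = new ArrayList<>();

    // Insert an entry: amortized O(1) append.
    void log(Entry e) {
        data.add(e);
    }

    // Retrieve entries: O(n), since every stored entry is tested against the predicate.
    Iterator<Entry> getEntries(int queryId, int subqueryId, int fragmentId,
                               long start, long end) {
        List<Entry> out = new ArrayList<>();
        for (Entry e : data) {
            if (e.queryId() == queryId
                    && e.subqueryId() == subqueryId
                    && e.fragmentId() == fragmentId
                    && e.end() >= start      // entry does not end before the window opens
                    && e.start() <= end) {   // entry does not start after the window closes
                out.add(e);
            }
        }
        return out.iterator();
    }
}
```

Together, the last two conjuncts select entries whose interval [e.start, e.end] overlaps [start, end]. The code is obviously correct; the problem is cost, because every call touches all n entries.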

SLIDE 9

In the quest for a good solution, the search space of “all possible programs” is simply too large.

Specification → Implementation: intractable for a direct synthesis algorithm

Introduce an intermediate outline:
  Specification → Outline (tractable?)
  Outline → Implementation (tractable)

An outline is specific enough to describe asymptotic performance, yet general enough to encode a data structure succinctly.

Entry has:
  field1, field2, …
retrieveA: all e where
  condition
retrieveB: all e where
  condition

void add(Entry e)
void remove(Entry e)
void update(Entry e, …)

Iterator retrieveA(…)
Iterator retrieveB(…)

SLIDE 10

Outlines

Plans for retrieving entries:

  • All ( )
  • HashLookup ( outline, field = var )
  • BinarySearch ( outline, field > var )
  • Concat ( outline1, outline2 )
  • Filter ( outline, predicate )
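The outline grammar above can be sketched as a small Java AST with an interpreter that runs a plan over a list of stored entries. This is a hypothetical illustration, not Cozy's code: fields are passed as `Function` objects, `var` is fixed to a constant (named `value` here), and HashLookup and BinarySearch are interpreted as plain filters, so the sketch captures *what* each plan returns, not how fast it returns it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

// Hypothetical encoding of outlines over entries of type E.
interface Outline<E> {
    // Returns the entries this plan retrieves from the stored data.
    List<E> run(List<E> data);
}

// All(): every stored entry.
record All<E>() implements Outline<E> {
    public List<E> run(List<E> data) { return new ArrayList<>(data); }
}

// HashLookup(outline, field = var): entries of the inner plan whose field equals value.
record HashLookup<E, K>(Outline<E> inner, Function<E, K> field, K value) implements Outline<E> {
    public List<E> run(List<E> data) {
        List<E> out = new ArrayList<>();
        for (E e : inner.run(data)) if (field.apply(e).equals(value)) out.add(e);
        return out;
    }
}

// BinarySearch(outline, field > var): entries of the inner plan whose field exceeds value.
record BinarySearch<E>(Outline<E> inner, Function<E, Long> field, long value) implements Outline<E> {
    public List<E> run(List<E> data) {
        List<E> out = new ArrayList<>();
        for (E e : inner.run(data)) if (field.apply(e) > value) out.add(e);
        return out;
    }
}

// Concat(outline1, outline2): results of the first plan followed by those of the second.
record Concat<E>(Outline<E> first, Outline<E> second) implements Outline<E> {
    public List<E> run(List<E> data) {
        List<E> out = new ArrayList<>(first.run(data));
        out.addAll(second.run(data));
        return out;
    }
}

// Filter(outline, predicate): entries of the inner plan that satisfy the predicate.
record Filter<E>(Outline<E> inner, Predicate<E> predicate) implements Outline<E> {
    public List<E> run(List<E> data) {
        List<E> out = new ArrayList<>();
        for (E e : inner.run(data)) if (predicate.test(e)) out.add(e);
        return out;
    }
}
```

Because outlines compose, a plan such as Filter(HashLookup(All(), …), …) is just nested constructor calls; a real implementation would back HashLookup with a hash map and BinarySearch with a sorted structure instead of re-scanning.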

SLIDE 11

Outlines → Implementations

[Figure: Cozy pipeline, highlighting the Outline → Implementation step (representations and implementations)]

SLIDE 12

Outlines → Implementations

HashLookup (
  All(),
  e.queryId = q )

class Structure {

  T data;

  Iterator<Entry> retrieve(q) { … }

}

SLIDE 13

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  T data;

  Iterator<Entry> retrieve(q) { … }

}

SLIDE 14

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<K,V> data;

  Iterator<Entry> retrieve(q) { … }

}

SLIDE 15

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<int,V> data;

  Iterator<Entry> retrieve(q) { … }

}

Candidates for V:  V = ArrayList<Entry>  or  V = LinkedList<Entry>

SLIDE 16

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<int,V> data;

  Iterator<Entry> retrieve(q) { … }

}

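SLIDE 17

Outlines → Implementations

HashLookup (
  data,
  e.queryId = q )

class Structure {

  HMap<int,V> data;

  Iterator<Entry> retrieve(q) {
    V v = data.get(q);
    // No entries for q: return an empty iterator instead of dereferencing null
    return v == null ? Collections.emptyIterator() : v.iterator();
  }

  // add, remove, update are implemented too, keeping data in sync
}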
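A runnable Java sketch of where these slides end up, under stated assumptions: `HMap` is taken to be `java.util.HashMap`, the hypothetical `Entry` record carries only `queryId` and a payload, and the bucket type `V` is chosen by passing `ArrayList::new` or `LinkedList::new`, the two candidates from slide 15. The `add` and `remove` bodies show one plausible way to keep the map in sync; Cozy's generated code may differ.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical entry type: the indexed field plus an opaque payload.
record Entry(int queryId, String payload) {}

// Implementation of HashLookup(data, e.queryId = q) with a pluggable bucket type V.
class Structure {
    private final Map<Integer, List<Entry>> data = new HashMap<>();
    private final Supplier<List<Entry>> newBucket;  // e.g. ArrayList::new or LinkedList::new

    Structure(Supplier<List<Entry>> newBucket) {
        this.newBucket = newBucket;
    }

    // Expected O(1) bucket lookup, then iteration over the matching entries only.
    Iterator<Entry> retrieve(int q) {
        List<Entry> v = data.get(q);
        return v == null ? Collections.emptyIterator() : v.iterator();
    }

    // add places the entry in the bucket for its key, creating the bucket on demand.
    void add(Entry e) {
        data.computeIfAbsent(e.queryId(), k -> newBucket.get()).add(e);
    }

    // remove drops the entry and discards a bucket once it becomes empty.
    void remove(Entry e) {
        List<Entry> v = data.get(e.queryId());
        if (v != null && v.remove(e) && v.isEmpty()) {
            data.remove(e.queryId());
        }
    }
}
```

An `update` that changes `queryId` could, in this sketch, be written as `remove` of the old entry followed by `add` of the new one. Which bucket type wins is a constant-factor question, which is why it can be left to tuning rather than to the outline.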

SLIDE 18

Specification → Outline

[Figure: Cozy pipeline, highlighting the Specification → Outline step (inductive synthesizer and verifier)]

SLIDE 19

Specification → Outline

CEGIS (counterexample-guided inductive synthesis)

  Inductive Synthesizer → candidate → Verifier
  Verifier → counterexample → Inductive Synthesizer (or: certification of correctness)

  • Inductive synthesizer: remembers all examples; only reasons about the examples collected thus far.
  • Verifier: must ensure the outline is correct for all possible inputs and all possible data structure states:

      ∀I ∀S, out = { e | e ∈ S ∧ P(I, e) }

    where out is the set of entries the outline returns for input I on state S.

retrieve: all e where
  e.queryId = q and …
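A toy CEGIS loop in Java, to make the synthesizer/verifier split concrete. Everything here is a hypothetical stand-in: entries have two small-integer fields, the input is one integer q, candidate outlines are represented only by the predicate they compute, and the verifier checks a tiny bounded domain exhaustively where Cozy would call an SMT solver. Checking single entries rather than whole states relies on the per-entry equivalence stated on slide 23.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical setup: entries with two small-integer fields, input q is one integer.
record Entry(int queryId, int subqueryId) {}

interface Pred { boolean test(int q, Entry e); }

// A candidate "outline", represented only by the predicate it computes.
record Candidate(String name, Pred pred) {}

// One example: for input q, entry e should (or should not) be returned.
record Example(int q, Entry e, boolean expected) {}

class Cegis {
    static final int DOMAIN = 3;  // field and input values 0..2, small enough to enumerate

    // Inductive synthesizer: first candidate that agrees with every example seen so far.
    static Optional<Candidate> synthesize(List<Candidate> space, List<Example> examples) {
        for (Candidate c : space) {
            boolean ok = true;
            for (Example x : examples) {
                if (c.pred().test(x.q(), x.e()) != x.expected()) { ok = false; break; }
            }
            if (ok) return Optional.of(c);
        }
        return Optional.empty();
    }

    // Verifier: searches the bounded domain for an input where candidate and spec disagree.
    static Optional<Example> verify(Pred spec, Candidate c) {
        for (int q = 0; q < DOMAIN; q++) {
            for (int a = 0; a < DOMAIN; a++) {
                for (int b = 0; b < DOMAIN; b++) {
                    Entry e = new Entry(a, b);
                    boolean want = spec.test(q, e);
                    if (c.pred().test(q, e) != want) return Optional.of(new Example(q, e, want));
                }
            }
        }
        return Optional.empty();
    }

    // The loop: propose, verify, and on failure remember the counterexample and retry.
    static Optional<Candidate> run(Pred spec, List<Candidate> space) {
        List<Example> examples = new ArrayList<>();
        while (true) {
            Optional<Candidate> c = synthesize(space, examples);
            if (c.isEmpty()) return Optional.empty();  // no candidate fits the examples
            Optional<Example> cex = verify(spec, c.get());
            if (cex.isEmpty()) return c;                // certified on the whole domain
            examples.add(cex.get());                    // counterexample rules c out
        }
    }
}
```

The loop terminates because each counterexample disagrees with the current candidate, so that candidate is never proposed again and the finite space shrinks every round.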

SLIDE 20

Cost Model

HashLookup(All(), e.queryId = q):  O(1)
Filter(All(), e.queryId = q):      O(n)

cost(Filter(All(), e.queryId = q)) > cost(HashLookup(All(), e.queryId = q))

Cozy prefers outlines with lower cost
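A hypothetical numeric cost model that reproduces the comparison above. The formulas and the fixed selectivity are assumptions made for illustration, not Cozy's actual model: each plan estimates the work to run it over n stored entries and the number of entries it returns, and the cheapest of several equally correct plans is kept.

```java
import java.util.Comparator;
import java.util.List;

// Hypothetical cost model: each plan estimates its work and its result size for n entries.
interface Plan {
    double SELECTIVITY = 0.01;  // assumed fraction of entries each predicate keeps
    double cost(double n);      // estimated work to run the plan over n stored entries
    double size(double n);      // estimated number of entries the plan returns
}

record All() implements Plan {
    public double cost(double n) { return 1; }  // hands back the stored collection
    public double size(double n) { return n; }
}

record HashLookup(Plan inner) implements Plan {
    public double cost(double n) { return inner.cost(n) + 1; }  // one expected-O(1) probe
    public double size(double n) { return inner.size(n) * SELECTIVITY; }
}

record BinarySearch(Plan inner) implements Plan {
    public double cost(double n) {  // O(log size) search over the inner result
        return inner.cost(n) + Math.log(Math.max(inner.size(n), 2)) / Math.log(2);
    }
    public double size(double n) { return inner.size(n) * SELECTIVITY; }
}

record Filter(Plan inner) implements Plan {
    public double cost(double n) { return inner.cost(n) + inner.size(n); }  // tests every entry
    public double size(double n) { return inner.size(n) * SELECTIVITY; }
}

record Concat(Plan first, Plan second) implements Plan {
    public double cost(double n) { return first.cost(n) + second.cost(n); }
    public double size(double n) { return first.size(n) + second.size(n); }
}

class CostModel {
    // Among plans assumed to be equally correct, pick the one with the lowest estimated cost.
    static Plan cheapest(List<Plan> plans, double n) {
        return plans.stream().min(Comparator.comparingDouble(p -> p.cost(n))).orElseThrow();
    }
}
```

With n = 1000, HashLookup(All()) costs 2 and Filter(All()) costs 1001 in this model, matching the O(1) versus O(n) ordering on the slide.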

SLIDE 21

Inductive Synthesis

Enumerative search

size 1:  All
size 2:  HashLookup(All, x=y)   BinarySearch(All, x>y)   Filter(All, x=y)   …
size 3:  HashLookup(HashLookup(…), a=b)   Filter(BinarySearch(…), x<y)   Filter(HashLookup(…), p=q)   …

Filter(HashLookup(…), p=q): correct on all current examples

Concat(HashLookup(…), …) vs. Concat(Filter(…), …)
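A small Java sketch of enumerative search by size, under hypothetical assumptions: entries have two integer fields `a` and `b`, the input is one integer q, every node compares one field with q (`=` or `>`), Concat is left out for brevity, and size counts nodes. Outlines are grown one node at a time, and the first one that reproduces every example's expected output is returned, so the result is a smallest consistent outline.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.ToIntFunction;

// Hypothetical entry with two integer fields; the query input is a single integer q.
record Entry(int a, int b) {}

// One enumerated outline: a printable name plus a way to run it on (q, stored entries).
interface Outline {
    String name();
    List<Entry> run(int q, List<Entry> data);
}

record All() implements Outline {
    public String name() { return "All"; }
    public List<Entry> run(int q, List<Entry> data) { return data; }
}

// One-child node: keeps the child's entries whose field is = q (eq) or > q (!eq).
record Step(String kind, Outline child, String field, boolean eq) implements Outline {
    public String name() {
        return kind + "(" + child.name() + ", " + field + (eq ? " = q" : " > q") + ")";
    }
    public List<Entry> run(int q, List<Entry> data) {
        ToIntFunction<Entry> f = field.equals("a") ? Entry::a : Entry::b;
        List<Entry> out = new ArrayList<>();
        for (Entry e : child.run(q, data)) {
            int v = f.applyAsInt(e);
            if (eq ? v == q : v > q) out.add(e);
        }
        return out;
    }
}

// An example: for input q and stored data, the retrieval must return exactly expected.
record Example(int q, List<Entry> data, List<Entry> expected) {}

record Kind(String name, boolean eq) {}

class Enumerate {
    static final List<Kind> KINDS = List.of(new Kind("HashLookup", true),
            new Kind("BinarySearch", false), new Kind("Filter", true), new Kind("Filter", false));
    static final List<String> FIELDS = List.of("a", "b");

    static boolean correctOnAll(Outline o, List<Example> examples) {
        for (Example x : examples) {
            if (!o.run(x.q(), x.data()).equals(x.expected())) return false;
        }
        return true;
    }

    // Checks all outlines of size 1, then 2, ..., returning the first correct one.
    static Optional<Outline> search(List<Example> examples, int maxSize) {
        List<Outline> level = List.of(new All());
        for (int size = 1; size <= maxSize; size++) {
            for (Outline o : level) {
                if (correctOnAll(o, examples)) return Optional.of(o);
            }
            List<Outline> next = new ArrayList<>();  // extend every outline by one node
            for (Outline o : level) {
                for (Kind k : KINDS) {
                    for (String field : FIELDS) next.add(new Step(k.name(), o, field, k.eq()));
                }
            }
            level = next;
        }
        return Optional.empty();
    }
}
```

In this sketch HashLookup and Filter with `=` return the same entries; only a cost model, like the one on the previous slide, would tell them apart when several outlines pass the examples.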

SLIDE 22

Outline Verification

Specification:

Entry has:
  queryId : Int,
  subqueryId : Int,
  …
retrieve: all e where
  e.queryId = q and …

Outline:

HashLookup(
  All(),
  e.queryId = q)

Representative predicate of the outline:  Q = (e.queryId = q)
Outline returns:          { e | e ∈ S ∧ Q(I, e) }
Specification requires:   { e | e ∈ S ∧ P(I, e) }
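A Java sketch of computing an outline's representative predicate, under stated assumptions: each HashLookup or Filter node is taken to contribute its condition as a conjunct of its child's predicate, All contributes `true`, and the equivalence check enumerates a tiny bounded domain as a stand-in for the SMT query. All class names are hypothetical.

```java
import java.util.function.BiPredicate;

// Hypothetical setup: entries have two small-integer fields; the input I is an integer q.
record Entry(int queryId, int subqueryId) {}

interface Outline {
    // Representative predicate Q(I, e): true iff the outline returns e for input I.
    BiPredicate<Integer, Entry> rep();
}

// All(): returns every entry, so its predicate is always true.
record All() implements Outline {
    public BiPredicate<Integer, Entry> rep() { return (q, e) -> true; }
}

// A lookup or filter node keeps the child's entries that also satisfy its condition.
record HashLookup(Outline child, BiPredicate<Integer, Entry> cond) implements Outline {
    public BiPredicate<Integer, Entry> rep() { return child.rep().and(cond); }
}

record Filter(Outline child, BiPredicate<Integer, Entry> cond) implements Outline {
    public BiPredicate<Integer, Entry> rep() { return child.rep().and(cond); }
}

class OutlineCheck {
    static final int DOMAIN = 4;  // check values 0..3 exhaustively (stand-in for an SMT query)

    // Compares P and Q entry by entry, which suffices by the equivalence on slide 23.
    static boolean equivalent(BiPredicate<Integer, Entry> spec, BiPredicate<Integer, Entry> rep) {
        for (int q = 0; q < DOMAIN; q++) {
            for (int a = 0; a < DOMAIN; a++) {
                for (int b = 0; b < DOMAIN; b++) {
                    Entry e = new Entry(a, b);
                    if (spec.test(q, e) != rep.test(q, e)) return false;  // counterexample (q, e)
                }
            }
        }
        return true;
    }
}
```

For HashLookup(All(), e.queryId = q) the representative predicate reduces to `true ∧ e.queryId = q`, which matches the specification predicate e.queryId = q.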

SLIDE 23

Outline Verification

{ e | e ∈ S ∧ P(I, e) }  =?  { e | e ∈ S ∧ Q(I, e) }

Yes, if and only if for all I, e: P(I, e) = Q(I, e).
This equivalence can be checked with an SMT solver.
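A short derivation of the "if and only if" above, using the slides' symbols (I an input, S a data structure state, P the specification predicate, Q the outline's representative predicate). It shows why the check never has to reason about whole states:

```latex
\forall I\,\forall S:\;
  \{\, e \mid e \in S \wedge P(I,e) \,\} = \{\, e \mid e \in S \wedge Q(I,e) \,\}
\iff
\forall I\,\forall e:\; P(I,e) \Leftrightarrow Q(I,e)
```

(⇐) If P(I, e) ⇔ Q(I, e) for every e, both comprehensions keep exactly the same elements of any S. (⇒) Fix I and e and take the singleton state S = {e}: the left set is {e} exactly when P(I, e) holds and the right set is {e} exactly when Q(I, e) holds, so equal sets force P(I, e) ⇔ Q(I, e). The quantifier over states S therefore disappears, leaving a solver query over one input and one entry.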

SLIDE 24

Evaluation

  • Improve correctness
  • Save programmer effort
  • Match performance

SLIDE 25

Case studies

  • Myria (analytics): analytics data indexed by timespan and by request ID
  • ZTopo (tile cache): tracks map tiles in a least-recently-used cache
  • Bullet (volume tree): stores axis-aligned bounding boxes for fast collision detection
  • Sat4j (variable metadata): tracks information about each variable in the formula

11 bugs   15 bugs   7 bugs

SLIDE 26

Specifications vs. Implementations

[Figure: lines of code, original implementation vs. Cozy specification, for Myria, ZTopo, Sat4j, and Bullet. Specifications: 11 to 25 lines; original implementations: 269 to 2582 lines.]

SLIDE 27

Synthesis Time

[Figure: synthesis time in seconds for Myria, ZTopo, Sat4j, and Bullet, split into outline synthesis and auto-tuning; y-axis ticks at 30, 60, and 90 s]

SLIDE 28

Performance

[Figure: performance of the original vs. synthesized implementations for Myria, ZTopo, Sat4j, and Bullet]

  • Myria: original implementation has worst-case linear time
  • Sat4j: small overhead; performance dominated by other factors
  • Bullet: binary search tree vs. space partitioning tree
  • ZTopo: data structures are nearly identical

SLIDE 29

Related Work

  • J. Earley: “High level iterators and a method for automatically designing data structure representation” (1974)
      • Hard-coded rewrite rules
  • S. Agrawal et al.: “Automated selection of materialized views and indexes in SQL databases” (2000)
      • Enumerate possible views & indexes based on query syntax and use the planner to decide which ones to keep
  • P. Hawkins et al.: “Data representation synthesis” (2011)
      • Enumerate representations and use a planner to implement retrieval operations; conjunctions of equalities only

SLIDE 30

http://cozy.uwplse.org

  • Implementation outlines make the problem tractable
  • Synthesis completes in < 90 seconds
  • Cozy generates correct code and matches the performance of handwritten implementations

Special thanks to: Michael Ernst, Emina Torlak
also Haoming Liu & Daniel Perelman

[Figure: Cozy pipeline]