slide-1
SLIDE 1

Datafun

a functional query language

Michael Arntzenius
daekharel@gmail.com
http://www.rntz.net/datafun

Strange Loop, September 2017
Recurse Center, March 2018

slide-2
SLIDE 2

Early stage work

slide-3
SLIDE 3

What if programming languages were more like query languages?

slide-4
SLIDE 4
  • 1. What’s a functional query language?
  • 2. From Datalog to Datafun
  • 3. Incremental Datafun
slide-5
SLIDE 5

SQL

Parent      Child
Arathorn    Aragorn
Drogo       Frodo
Eärwen      Galadriel
Finarfin    Galadriel
...         ...

SELECT parent FROM parentage WHERE child = "Galadriel"

slide-6
SLIDE 6

Tables as sets

Parent      Child
Arathorn    Aragorn
Drogo       Frodo
Eärwen      Galadriel
Finarfin    Galadriel
...         ...

=

// set of (parent, child) pairs
{ (Arathorn, Aragorn)
, (Drogo, Frodo)
, (Eärwen, Galadriel)
, (Finarfin, Galadriel)
... }

slide-8
SLIDE 8

Tuples and sets are just datatypes!

If tables are sets, what are queries?

slide-10
SLIDE 10

Queries as set comprehensions

SELECT parent FROM parentage WHERE child = "Galadriel"

⇒

{ parent | (parent, child) in parentage, child = "Galadriel" }
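
For comparison, the same query can be sketched as a Python set comprehension, whose syntax is close to the notation above. This is only an illustration: Python is not Datafun, and the `parentage` value below is an assumption that holds just the four rows visible in the table.

```python
# The table as a set of (parent, child) pairs.
# Assumption: only the four rows shown on the slide; the elided rows are left out.
parentage = {
    ("Arathorn", "Aragorn"),
    ("Drogo", "Frodo"),
    ("Eärwen", "Galadriel"),
    ("Finarfin", "Galadriel"),
}

# SELECT parent FROM parentage WHERE child = "Galadriel"
parents_of_galadriel = {
    parent for (parent, child) in parentage if child == "Galadriel"
}

print(sorted(parents_of_galadriel))
```

The filter in the comprehension plays the role of the WHERE clause, and the result is itself an ordinary set value.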

slide-11
SLIDE 11

Queries as set comprehensions: finding siblings

SELECT DISTINCT A.child, B.child
FROM parentage A INNER JOIN parentage B
  ON A.parent = B.parent
WHERE A.child <> B.child

⇒

{ (a,b) | (parent, a) in parentage
        , (parent, b) in parentage
        , not (a = b) }
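
A rough Python rendering of the sibling query follows. In the comprehension above, reusing the variable `parent` in both generators expresses the join condition; Python has no such pattern matching, so the sketch compares the two parents explicitly. The `("Finarfin", "Finrod")` row is an assumption added only so that the result is non-empty; it does not appear on the slides.

```python
# (parent, child) pairs. The Finrod row is an illustrative assumption,
# added so that at least one parent has two children.
parentage = {
    ("Arathorn", "Aragorn"),
    ("Drogo", "Frodo"),
    ("Eärwen", "Galadriel"),
    ("Finarfin", "Galadriel"),
    ("Finarfin", "Finrod"),
}

# Self-join on parent, keeping pairs of distinct children.
siblings = {
    (a, b)
    for (parent_a, a) in parentage
    for (parent_b, b) in parentage
    if parent_a == parent_b and a != b
}

print(sorted(siblings))
```

Because the result is a set, duplicates disappear on their own, which is the job DISTINCT does in the SQL version.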

slide-13
SLIDE 13

Recipe for a functional query language

  • 1. Take a functional language
  • 2. Add sets and set comprehensions
  • 3. ... done?
slide-14
SLIDE 14

But can it go fast?

slide-15
SLIDE 15

Loop reordering

{ ... | x in EXPR1, y in EXPR2 }

=?

{ ... | y in EXPR2, x in EXPR1 }

slide-16
SLIDE 16

Loop reordering

{ ... | x in EXPR1, y in EXPR2 }

≠

{ ... | y in EXPR2, x in EXPR1 }

  • 1. Side-effects
  • 2. Nontermination
slide-17
SLIDE 17

Loop reordering

{ print x | x in {"hello"}, y in {0,1} }

≠

{ print x | y in {0,1}, x in {"hello"} }

  • 1. Side-effects
  • 2. Nontermination
slide-18
SLIDE 18

Loop reordering

{ ... | x in {}, y in ∞-loop }  ⇒  {}

≠

{ ... | y in ∞-loop, x in {} }  ⇒  ∞-loop

  • 1. Side-effects
  • 2. Nontermination
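
Both obstacles can be observed in an impure language. The Python sketch below is an illustration under stated assumptions, not Datafun code: `note` is a hypothetical impure body that records the order of evaluation, and `diverge` raises an exception as a stand-in for an infinite loop so that the example terminates.

```python
trace = []

def note(x, y):
    """Impure comprehension body: logs each visited pair as a side effect."""
    trace.append((x, y))
    return (x, y)

xs, ys = ["a", "b"], [0, 1]

# 1. Side effects: both orders build the same set, but the log differs.
trace.clear()
result_xy = {note(x, y) for x in xs for y in ys}
trace_xy = list(trace)

trace.clear()
result_yx = {note(x, y) for y in ys for x in xs}
trace_yx = list(trace)

# 2. Nontermination: an exception stands in for an infinite loop.
def diverge():
    raise RuntimeError("stand-in for an infinite loop")

# The outer source is empty, so the inner source is never evaluated.
empty_first = {(x, y) for x in [] for y in diverge()}

# After reordering, the diverging source is evaluated first.
try:
    {(x, y) for y in diverge() for x in []}
    diverged = False
except RuntimeError:
    diverged = True

print(result_xy == result_yx, trace_xy == trace_yx, empty_first, diverged)
```

In a pure, total language neither difference can arise, which is what makes the reordering safe for an optimizer.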
slide-19
SLIDE 19

Recipe for a functional query language, v2

  • 1. Take a pure, total functional language
  • 2. Add sets and set comprehensions
  • 3. Optimize!
slide-21
SLIDE 21

What have we gained?

  • Can factor out repeated patterns with higher-order functions
  • Sets are just ordinary values
  • Sets, bags, lists: choose your container semantics!

At what cost?

  • Implementation complexity: GC, closures, nested sets, optimizing comprehensions...
  • Re-inventing the wheel: persistence, transactions, replication...

slide-22
SLIDE 22
  • 1. What’s a functional query language?
  • 2. From Datalog to Datafun
  • 3. Incremental Datafun
slide-23
SLIDE 23

Parent      Child
Arathorn    Aragorn
Drogo       Frodo
Eärwen      Galadriel
Finarfin    Galadriel
...         ...

Is Eärendil one of Aragorn’s ancestors?

slide-24
SLIDE 24

Datalog in a nutshell

X is Z’s ancestor if X is Z’s parent.
X is Z’s ancestor if X is Y’s parent and Y is Z’s ancestor.

slide-25
SLIDE 25

Datalog in a nutshell

ancestor(X, Z) if parent(X, Z).
ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z).

slide-26
SLIDE 26

Datalog in a nutshell

ancestor(X, Z) :- parent(X, Z).
ancestor(X, Z) :- parent(X, Y), ancestor(Y, Z).

slide-27
SLIDE 27

Datalog is deductive: it chases rules to their logical conclusions. Can we capture this feature functionally?

slide-28
SLIDE 28

Procedure:

  • 1. Pick a rule.
  • 2. Find facts satisfying its premises.
  • 3. Add its conclusion to the known facts.

Rules:
  ancestor(X,Z) :- parent(X,Z).
  ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).

Facts:
  parent(Idril, Eärendil).
  parent(Eärendil, Elros).

slide-31
SLIDE 31

Procedure:

  • 1. Pick a rule.
  • 2. Find facts satisfying its premises.
  • 3. Add its conclusion to the known facts.

Rules:
  ancestor(X,Z) :- parent(X,Z).
  ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).

Facts:
  parent(Idril, Eärendil).
  parent(Eärendil, Elros).
  ancestor(Idril, Eärendil).    (new!)

slide-33
SLIDE 33

Procedure:

  • 1. Pick a rule.
  • 2. Find facts satisfying its premises.
  • 3. Add its conclusion to the known facts.

Rules:
  ancestor(X,Z) :- parent(X,Z).
  ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).

Facts:
  parent(Idril, Eärendil).
  parent(Eärendil, Elros).
  ancestor(Idril, Eärendil).
  ancestor(Eärendil, Elros).    (new!)

slide-35
SLIDE 35

Procedure:

  • 1. Pick a rule.
  • 2. Find facts satisfying its premises.
  • 3. Add its conclusion to the known facts.

Rules:
  ancestor(X,Z) :- parent(X,Z).
  ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).

Facts:
  parent(Idril, Eärendil).
  parent(Eärendil, Elros).
  ancestor(Idril, Eärendil).
  ancestor(Eärendil, Elros).
  ancestor(Idril, Elros).       (new!)
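
The three-step procedure can be sketched in Python as a loop that adds one derived fact at a time. This is a minimal illustration rather than how a Datalog engine is actually implemented; the helper `derivable` and the use of `min` to choose the next fact are assumptions, and the order in which facts are added may differ from the order shown on the slides.

```python
parent = {("Idril", "Eärendil"), ("Eärendil", "Elros")}

def derivable(ancestor):
    """Every conclusion the two rules yield from the facts known so far."""
    # ancestor(X,Z) :- parent(X,Z).
    rule1 = set(parent)
    # ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).
    rule2 = {(x, z) for (x, y) in parent for (y2, z) in ancestor if y == y2}
    return rule1 | rule2

ancestor = set()
steps = []
while True:
    new_facts = derivable(ancestor) - ancestor
    if not new_facts:
        break                      # nothing new can be deduced
    fact = min(new_facts)          # any choice works; min() keeps runs deterministic
    ancestor.add(fact)             # add the conclusion to the known facts
    steps.append(fact)

print(len(steps), sorted(ancestor))
```

Whichever fact is picked at each step, the loop ends with the same three `ancestor` facts.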

slide-37
SLIDE 37

Repeatedly apply a set of rules until nothing changes

slide-38
SLIDE 38

Repeatedly apply a function until nothing changes

slide-39
SLIDE 39

Repeatedly apply a function until its output equals its input

slide-41
SLIDE 41

Repeatedly apply a function until its output equals its input,
i.e. until it reaches a fixed point

fix x = ... function of x ...

slide-42
SLIDE 42

// Datalog
ancestor(X,Z) :- parent(X,Z).
ancestor(X,Z) :- parent(X,Y), ancestor(Y,Z).

// Datafun
fix ancestor = parent
             ∪ {(x,z) | (x,y) in parent
                      , (y,z) in ancestor}
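
To make the `fix` form concrete, here is a small Python sketch that iterates a function from the empty set until its output equals its input. The helper name `fix` mirrors the Datafun keyword, but the implementation is an assumption for illustration. Termination relies on the function being monotone over a finite set of possible facts, which this sketch does not check.

```python
def fix(f):
    """Apply f repeatedly, starting from the empty set, until a fixed point."""
    x = frozenset()
    while True:
        nxt = frozenset(f(x))
        if nxt == x:               # output equals input: fixed point reached
            return x
        x = nxt

parent = frozenset({("Idril", "Eärendil"), ("Eärendil", "Elros")})

# fix ancestor = parent ∪ {(x,z) | (x,y) in parent, (y,z) in ancestor}
ancestor = fix(
    lambda anc: parent
    | {(x, z) for (x, y) in parent for (y2, z) in anc if y == y2}
)

print(sorted(ancestor))
```

The two Datalog rules become the two operands of the union, and the recursive mention of `ancestor` becomes the argument of the function being iterated.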

slide-49
SLIDE 49

Repeatedly applying:
  X → parent ∪ {(x,z) | (x,y) in parent, (y,z) in X}

Where:
  parent = {(Idril, Eärendil), (Eärendil, Elros)}

Steps:
  ∅
  → parent ∪ {(x,z) | (x,y) in parent, (y,z) in ∅}
    = parent
  → parent ∪ {(x,z) | (x,y) in parent, (y,z) in parent}
    = {(Idril, Eärendil), (Eärendil, Elros), (Idril, Elros)}

slide-50
SLIDE 50

But can it go fast?

slide-51
SLIDE 51
  • 1. What’s a functional query language?
  • 2. From Datalog to Datafun
  • 3. Incremental Datafun
slide-54
SLIDE 54

Three problems

  • 1. View maintenance:
       How do we update a cached query efficiently after a mutation?
  • 2. Seminaïve evaluation in Datalog:
       How do we avoid re-deducing facts we already know?
  • 3. Incremental computation:
       How do we efficiently recompute a function as its inputs change?

slide-56
SLIDE 56

“A Theory of Changes for Higher-Order Languages: Incrementalizing λ-calculi by Static Differentiation”

[PLDI 2014]

by Yufei Cai, Paolo G Giarrusso, Tillmann Rendel, and Klaus Ostermann

slide-63
SLIDE 63

Static differentiation

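Every type $A$ has a type of changes, $\Delta A$.

  $\Delta \mathbb{N} = \mathbb{Z}$
  $\Delta (A \times B) = \Delta A \times \Delta B$

Every type also gets an operator $\oplus_A : A \to \Delta A \to A$.

  $x \oplus_{\mathbb{N}} dx = x + dx$
  $(x, y) \oplus_{A \times B} (dx, dy) = (x \oplus_A dx,\ y \oplus_B dy)$

A function $f : A \to B$ gets a derivative, $\delta f : A \to \Delta A \to \Delta B$.

  $f(x) = x^2$
  $\delta f(x)(dx) = 2x \cdot dx + dx^2$
  $f(x) + \delta f(x)(dx) = x^2 + 2x \cdot dx + dx^2 = (x + dx)^2$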
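
The definitions above can be checked numerically with a short Python sketch. The function names (`oplus_nat`, `oplus_pair`, `f`, `df`) are hypothetical and chosen only for this illustration; they are not taken from the paper or from Datafun. The property being tested is that applying the derivative's output as a change to `f(x)` gives the same result as recomputing `f` on the changed input.

```python
def oplus_nat(x, dx):
    """Apply an integer change dx to a natural number x."""
    result = x + dx
    if result < 0:
        raise ValueError("change would leave the natural numbers")
    return result

def oplus_pair(oplus_a, oplus_b):
    """Build the update operator for pairs from those of the components."""
    def oplus(p, dp):
        (x, y), (dx, dy) = p, dp
        return (oplus_a(x, dx), oplus_b(y, dy))
    return oplus

def f(x):
    return x * x

def df(x, dx):
    """Derivative of f: the change in f(x) when x changes by dx."""
    return 2 * x * dx + dx * dx

# f(x) updated by df(x, dx) should equal f recomputed on the updated input.
all_ok = all(
    oplus_nat(f(x), df(x, dx)) == f(oplus_nat(x, dx))
    for x in range(6)
    for dx in range(-x, 6)
)

print(all_ok, oplus_pair(oplus_nat, oplus_nat)((1, 2), (3, -1)))
```

The check covers negative changes as well, as long as the updated input stays a natural number.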

slide-64
SLIDE 64

We’ve extended this technique to handle all of Datafun!

(As of about three weeks ago.)

slide-67
SLIDE 67

Finding fixed points faster with derivatives

The naïve way to find fixed points looks like this:

  $\emptyset \to f(\emptyset) \to f^2(\emptyset) \to f^3(\emptyset) \to \cdots$

$f^i(\emptyset)$ and $f^{i+1}(\emptyset)$ overlap a lot.
Computing $f^{i+1}(\emptyset)$ from $f^i(\emptyset)$ does a lot of recomputation.

What if we computed only what changed between iterations?

slide-68
SLIDE 68

  $x_0 = \emptyset$
  $dx_0 = f(\emptyset)$
  $x_{i+1} = x_i \cup dx_i$
  $dx_{i+1} = \delta f(x_i)(dx_i)$

Theorem: $x_i = f^i(\emptyset)$
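
The recurrence can be tried out on the ancestor example with the Python sketch below. It is an illustration under assumptions, not Datafun's implementation: the derivative `df` is written by hand for this one function, and the loop stops once the pending change adds nothing new. For this `f`, a change `dxs` to the input can only contribute pairs obtained by joining `parent` with `dxs`, so the derivative never looks at the old input.

```python
parent = frozenset({("Idril", "Eärendil"), ("Eärendil", "Elros")})

def f(xs):
    """One round of the ancestor rules applied to the set xs."""
    return parent | {(x, z) for (x, y) in parent for (y2, z) in xs if y == y2}

def df(xs, dxs):
    """Hand-written derivative of f: what adding dxs to xs can contribute.
    xs is unused because f joins parent with its argument only once."""
    return frozenset((x, z) for (x, y) in parent for (y2, z) in dxs if y == y2)

def seminaive():
    x, dx = frozenset(), f(frozenset())     # x_0 and dx_0
    history = [x]
    while not dx <= x:                      # stop when dx adds nothing new
        x, dx = x | dx, df(x, dx)           # x_{i+1} and dx_{i+1}, from old x, dx
        history.append(x)
    return x, history

ancestor, history = seminaive()

# Compare each x_i with the naive iterate f^i(∅).
naive = [frozenset()]
for _ in range(len(history) - 1):
    naive.append(f(naive[-1]))

print(sorted(ancestor), history == naive)
```

Each round joins `parent` only against the facts found in the previous round, rather than against everything known so far.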

slide-69
SLIDE 69

Takeaways

  • 1. Set comprehensions = queries
  • 2. Fixed points = recursive queries (like Datalog)
  • 3. Incremental computation = faster fixed points
  • 4. Datafun has all three!*

* In theory.

slide-70
SLIDE 70

Michael Arntzenius
daekharel@gmail.com
@arntzenius

rntz.net/datafun