The Bigλ Project (Aws Albarghouthi, Calvin Smith; University of Wisconsin-Madison)



SLIDE 1

The Bigλ Project

Aws Albarghouthi, Calvin Smith
University of Wisconsin-Madison

SLIDE 2
SLIDE 3

[Diagram: input data (i1, i2, i3) → map (m(i1), m(i2), m(i3), …) → shuffle → reduce → output data]
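The dataflow in the diagram can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Bigλ's runtime: the function name `map_reduce`, the keyed mapper, and the word-length example are all assumptions made for the sketch.

```python
from collections import defaultdict
from functools import reduce


def map_reduce(inputs, mapper, reducer):
    """Run a tiny in-memory map -> shuffle -> reduce pipeline.

    mapper:  item -> (key, value)
    reducer: (value, value) -> value, applied within each key group
    """
    # map: apply the mapper to every input item independently
    mapped = [mapper(item) for item in inputs]

    # shuffle: group the mapped values by key
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)

    # reduce: fold the values of each group into a single output value
    return {key: reduce(reducer, values) for key, values in groups.items()}


# Hypothetical usage: total word length, grouped by first letter.
words = ["map", "reduce", "merge", "run"]
result = map_reduce(words, lambda w: (w[0], len(w)), lambda a, b: a + b)
print(result)  # {'m': 8, 'r': 9}
```

On a cluster the map and reduce calls would run on different machines, which is why the order in which values reach a reducer is not fixed.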

SLIDE 4

Bigλ: Analyses from Examples

Synthesize data-parallel programs from input/output examples

[Example figure: inputs i1, i2 and the expected output]

[PLDI16]

SLIDE 5

Challenges

Non-determinism: generate proven-deterministic solutions
Variety of domains: parameterize by extensible APIs
Sparse search space: syntactically restrict to data-parallel programs

SLIDE 6

Bias search heavily towards data-parallel programs

Higher-order sketches

Bigλ uses 8 templates, gathered from reference implementations

SLIDE 7

Bias search heavily towards data-parallel programs

Higher-order sketches

e.g.

map x . reduce x
flatmap x . reduce x . apply x
map x . reduceByKey x . filter x

Bigλ uses 8 templates, gathered from reference implementations
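One way to picture a higher-order sketch is as a function whose parameters are the holes (the x's above). The Python below is a minimal sketch under that reading, not Bigλ's actual representation. The helper names `compose`, `sketch_map_reduce` and `sketch_flatmap_reduce_apply` are hypothetical, and the `.` notation is assumed to read left to right.

```python
from functools import reduce


def compose(*stages):
    """Left-to-right composition, mirroring the `a . b . c` notation."""
    def pipeline(data):
        for stage in stages:
            data = stage(data)
        return data
    return pipeline


# Each sketch takes the functions that fill its holes and returns a program.
def sketch_map_reduce(m, r):
    return compose(lambda xs: [m(x) for x in xs],
                   lambda xs: reduce(r, xs))


def sketch_flatmap_reduce_apply(m, r, f):
    return compose(lambda xs: [y for x in xs for y in m(x)],
                   lambda xs: reduce(r, xs),
                   f)


# Hypothetical fill: count the words across all lines, then double the count.
program = sketch_flatmap_reduce_apply(
    m=lambda line: [1 for _ in line.split()],
    r=lambda a, b: a + b,
    f=lambda n: n * 2,
)
print(program(["a b", "c d e"]))  # 10
```

With the outer shape fixed by the sketch, a synthesizer only has to search for the small functions that fill the holes.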

SLIDE 8

Who uses the most #hashtags?

@Alice: “Hello AAIP #aaip #germany”
@Bob: “Coffee machine refilled yet? #caffeine #java #4thcup #zzz”
@Claire: “Torn between wine cellar and seminar #wine #seminar #zzz”

SLIDE 9

@Alice: “Hello AAIP #aaip #germany”
@Bob: “Coffee machine refilled yet? #caffeine #java #4thcup #zzz”
@Claire: “Torn between wine cellar and seminar #wine #seminar #zzz”

Who uses the most #hashtags?

2, 4, 3 … must be @Bob!

Output: @Bob

SLIDE 10

let p = map m . reduce r . apply f

@Alice: “Hello AAIP #aaip #germany”
@Bob: “Coffee machine refilled yet? #caffeine #java #4thcup #zzz”
@Claire: “Torn between wine cellar and seminar #wine #seminar #zzz”

SLIDE 11

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))

@Alice: “Hello AAIP #aaip #germany”
@Bob: “Coffee machine refilled yet? #caffeine #java #4thcup #zzz”
@Claire: “Torn between wine cellar and seminar #wine #seminar #zzz”

{2, @Alice} {4, @Bob} {3, @Claire}
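The mapper m can be written out in Python to check the three pairs. This is a sketch under two assumptions that the slides do not state: a tweet is modelled as an (author, text) pair, and a hashtag is any whitespace-separated token starting with `#`.

```python
def is_hashtag(word):
    # Assumption: a hashtag is any token that starts with '#'.
    return word.startswith("#")


def author(tweet):
    # Assumption: a tweet is an (author, text) pair.
    return tweet[0]


def m(tweet):
    """Mapper from the slide: (number of hashtags, author)."""
    words = tweet[1].split()
    return (len(list(filter(is_hashtag, words))), author(tweet))


tweets = [
    ("@Alice", "Hello AAIP #aaip #germany"),
    ("@Bob", "Coffee machine refilled yet? #caffeine #java #4thcup #zzz"),
    ("@Claire", "Torn between wine cellar and seminar #wine #seminar #zzz"),
]
pairs = [m(t) for t in tweets]
print(pairs)  # [(2, '@Alice'), (4, '@Bob'), (3, '@Claire')]
```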

SLIDE 12

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))

{2, @Alice} {4, @Bob} {3, @Claire}

@Alice: “Hello AAIP #aaip #germany”
@Bob: “Coffee machine refilled yet? #caffeine #java #4thcup #zzz”
@Claire: “Torn between wine cellar and seminar #wine #seminar #zzz”

SLIDE 13

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))
        r = λx,y. max(x, y)

{2, @Alice} {4, @Bob} {3, @Claire}

[Reduction diagram: r combines the pairs two at a time, giving {4, @Bob} and then {4, @Bob}]

SLIDE 14

{2, @Alice} {4, @Bob} {3, @Claire}

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))
        r = λx,y. max(x, y)

[Reduction diagram: r combines the pairs two at a time, giving {4, @Bob} and then {4, @Bob}]

SLIDE 15

{2, @Alice} {4, @Bob} {3, @Claire}

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))
        r = λx,y. max(x, y)

[Reduction diagram: r combines the pairs two at a time, giving {4, @Bob} and then {4, @Bob}]

SLIDE 16

{2, @Alice} {4, @Bob} {3, @Claire}

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))
        r = λx,y. max(x, y)

[Reduction diagram: a different reduction order, giving {3, @Claire} and then {4, @Bob}]

SLIDE 17

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))
        r = λx,y. max(x, y)

{4, @Bob} {2, @Alice} {3, @Claire}

[Reduction diagram: the inputs arrive in a different order, giving {3, @Claire} and then {4, @Bob}]
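The point of the reordered slides, that r = max gives the same answer whatever order the pairs arrive in, can be tested on this example. The Python below only enumerates orderings of one input, so it is a test rather than the kind of proof of determinism the Challenges slide refers to. The helper `outcomes` and the counter-example reducer `first` are hypothetical.

```python
from functools import reduce
from itertools import permutations

pairs = [(2, "@Alice"), (4, "@Bob"), (3, "@Claire")]


def outcomes(reducer, values):
    """Set of results obtained by left-folding every ordering of the values."""
    return {reduce(reducer, order) for order in permutations(values)}


# max is commutative and associative, so every ordering agrees.
print(outcomes(max, pairs))  # {(4, '@Bob')}

# Counter-example: "keep the first argument" depends on the arrival order.
first = lambda x, y: x
print(len(outcomes(first, pairs)))  # 3
```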

SLIDE 18

let p = map m . reduce r . apply f
  where m = λt. (len(filter(is_hashtag, t)), author(t))
        r = λx,y. max(x, y)
        f = λp. snd(p)

{4, @Bob} → @Bob
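Put together, the completed program can be run end to end. The Python below is a sketch of p under the assumptions that a tweet is an (author, text) pair, that hashtags are tokens starting with `#`, and that the pipeline reads left to right. Only the names m, r, f and p come from the slide.

```python
from functools import reduce


def m(tweet):
    # Assumption: a tweet is an (author, text) pair; hashtags start with '#'.
    name, text = tweet
    return (sum(1 for w in text.split() if w.startswith("#")), name)


def r(x, y):
    # Tuples compare lexicographically, so max picks the larger hashtag count
    # (ties fall back to the author name, a side effect of this encoding).
    return max(x, y)


def f(pair):
    # snd: the second component of the pair
    return pair[1]


def p(tweets):
    """map m . reduce r . apply f, read left to right."""
    return f(reduce(r, map(m, tweets)))


tweets = [
    ("@Alice", "Hello AAIP #aaip #germany"),
    ("@Bob", "Coffee machine refilled yet? #caffeine #java #4thcup #zzz"),
    ("@Claire", "Torn between wine cellar and seminar #wine #seminar #zzz"),
]
print(p(tweets))  # @Bob
```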

SLIDE 19

Synthesis modulo differential privacy? [in progress]

SLIDE 20

map m . reduce r

SLIDE 21

map m . reduce r

compute sensitivity
charge price

SLIDE 22

map m . reduce r

compute sensitivity
charge price
add noise
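The three steps can be illustrated with a differentially private count. The slides do not say which mechanism the project uses; the Python below assumes the standard Laplace mechanism and a simple epsilon budget, and the names `PrivacyBudget`, `laplace_noise` and `private_count` are hypothetical.

```python
import random


class PrivacyBudget:
    """Tracks how much epsilon remains (hypothetical accounting helper)."""

    def __init__(self, total_epsilon):
        self.remaining = total_epsilon

    def charge(self, epsilon):
        if epsilon > self.remaining:
            raise ValueError("privacy budget exhausted")
        self.remaining -= epsilon


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # is Laplace-distributed with location 0 and that scale.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, budget, rng):
    """Noisy count of the records that satisfy the predicate."""
    # compute sensitivity: one record changes a count by at most 1
    sensitivity = 1.0
    # charge price: pay epsilon out of the remaining budget
    budget.charge(epsilon)
    # map . reduce: the exact count
    true_count = sum(1 for rec in records if predicate(rec))
    # add noise: Laplace noise with scale sensitivity / epsilon
    return true_count + laplace_noise(sensitivity / epsilon, rng)


budget = PrivacyBudget(total_epsilon=1.0)
rng = random.Random(0)
noisy = private_count(range(100), lambda n: n % 2 == 0, 0.5, budget, rng)
print(budget.remaining)  # 0.5
```

A smaller epsilon costs less of the budget but widens the noise, which is the trade-off a "cheapest program" search would have to weigh.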

SLIDE 23

Key Idea

Linear type system: induce cheapest program

SLIDE 24

How can we automatically learn relational specifications? [FSE17, best paper award]

SLIDE 25

add(x, y) = z ⟺ add(y, x) = z

SLIDE 26

add

i1    i2    r
1     2     3
3     4     7
5     6     11
4     3     7
...   ...   ...

add(x, y) = z ⟺ add(y, x) = z

SLIDE 27

add

i1    i2    r
1     2     3
3     4     7
5     6     11
4     3     7
...   ...   ...

add(x, y) = z ⟺ add(y, x) = z

Unsupervised learning: learn constraints consistent with observations
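Checking a candidate constraint against a table of observations can be sketched as follows. This shows only the consistency check, not the learning algorithm from the paper; the function names and the second candidate (idempotence) are assumptions added for contrast.

```python
# Observations of add, as (i1, i2, r) rows like the table on the slide.
observations = [(1, 2, 3), (3, 4, 7), (5, 6, 11), (4, 3, 7)]


def commutative(rows):
    """Candidate spec: f(x, y) = z <=> f(y, x) = z."""
    table = {(x, y): z for x, y, z in rows}
    # Only pairs seen in both argument orders can confirm or refute the spec.
    checked = [(x, y) for (x, y) in table if (y, x) in table]
    holds = all(table[(x, y)] == table[(y, x)] for x, y in checked)
    return holds, len(checked)


def idempotent(rows):
    """Candidate spec: f(x, x) = x."""
    checked = [(x, z) for x, y, z in rows if x == y]
    return all(x == z for x, z in checked), len(checked)


print(commutative(observations))          # (True, 2)
print(idempotent(observations))           # (True, 0)
print(commutative([(5, 3, 2), (3, 5, -2)]))  # (False, 2)
```

The second number counts the rows that actually exercised the candidate: idempotence is consistent here only vacuously, so a learner would need supporting observations before reporting it.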
SLIDE 28

Exploratory evaluation

Applied technique to learn specifications of Python APIs
Used ~1000 randomly sampled inputs per function

Strings

concat(y, reverse(y)) = x ⇒ reverse(x) = x

Z3

valid(x) = p ∧ valid(y) = p ⇒ valid(and(x, y)) = p

Trig

x = y + π/2 ⇒ (sin(x) = z ⟺ cos(y) = z)
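The Strings specification can be tested on random inputs in the same spirit as the evaluation. The Python below is a sketch: `concat` and `reverse` are stand-ins for the string API, and the sampling scheme (lowercase strings of length 0 to 8, fixed seed) is an assumption.

```python
import random
import string


def concat(a, b):
    return a + b


def reverse(s):
    return s[::-1]


def check_palindrome_spec(trials=1000, seed=0):
    """Test concat(y, reverse(y)) = x  =>  reverse(x) = x on random strings."""
    rng = random.Random(seed)
    for _ in range(trials):
        length = rng.randint(0, 8)
        y = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
        x = concat(y, reverse(y))
        if reverse(x) != x:
            return False  # found a counter-example
    return True


print(check_palindrome_spec())  # True
```

Random testing can refute a candidate specification but cannot prove it.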

SLIDE 29

Other directions

Synthesis of Datalog programs (graph analytics)
Synthesis of fair decision-making programs
Active-learning-based user interaction
Proofs as programs
…