slide-1
SLIDE 1
slide-2
SLIDE 2
slide-3
SLIDE 3

slide-4
SLIDE 4
slide-5
SLIDE 5
slide-6
SLIDE 6

Query 1

people.filter{p => p.age > 18}

people
name       id  age
stephanie  1   19
dylan      2   26
mary kate  3   17

Query 2

people.join(pets, "id === owner")
      .filter(people.age > 18)

pets
name     owner
catsidy  2
gigi     3

people
name       id  age
stephanie  1   19
dylan      2   26
mary kate  3   17
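To make the expected result of Query 2 concrete, here is a minimal sketch that runs the same join and filter over plain Scala collections instead of Spark. The case classes `Person` and `Pet` and the object name `Query2Sketch` are assumptions introduced for illustration; only the rows come from the slide.

```scala
object Query2Sketch {
  // Row types are assumptions that mirror the column names on the slide.
  final case class Person(name: String, id: Int, age: Int)
  final case class Pet(name: String, owner: Int)

  val people: List[Person] = List(
    Person("stephanie", 1, 19),
    Person("dylan", 2, 26),
    Person("mary kate", 3, 17)
  )

  val pets: List[Pet] = List(Pet("catsidy", 2), Pet("gigi", 3))

  // Inner join on people.id == pets.owner (a nested loop, not a hash join).
  val joined: List[(Person, Pet)] =
    for {
      person <- people
      pet    <- pets
      if person.id == pet.owner
    } yield (person, pet)

  // Query 2 then keeps only the pairs whose owner is older than 18.
  val result: List[(Person, Pet)] =
    joined.filter { case (person, _) => person.age > 18 }
}
```

Only dylan survives both steps: stephanie owns no pet, and mary kate is 17.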

slide-7
SLIDE 7

Stages: Cache Substitution, Optimization, Physical Planning

people.filter(age > 18)

Plans shown, in slide order:

  filter { p => p.age > 18 }
    table people

  filter { p => p.age > 18 }
    table people

  filter { p => p.age > 18 }
    FileScan people

Cache:

  filter { p => p.age > 18 }
    table people

slide-8
SLIDE 8

Stages: Cache Substitution, Optimization, Physical Planning

people.join(pets, "id === owner")
      .filter(people.age > 18)

Cache:

  filter { p => p.age > 18 }
    table people

Plans shown, in slide order:

  select *
    filter people.age > 18
      join (owner, id)
        table people
        table pets

  select *
    join (owner, id)
      filter people.age > 18
        table people
      table pets

  select *
    hashjoin (owner, id)
      filter people.age > 18
        filescan people
      filescan pets
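The ordering problem on this slide can be sketched with a toy plan tree: the cached subplan (a filter directly over `people`) only appears in the query after the filter has been pushed below the join. Everything here is an illustrative assumption: the node types, `substitute`, `pushDown`, and `usesCache` are hypothetical stand-ins, not Spark's classes. The sketch also assumes that the cached filter and the query filter are already written in the same form.

```scala
object CacheOrderSketch {
  // Toy logical plan; node names are illustrative, not Spark's.
  sealed trait Plan
  final case class Table(name: String) extends Plan
  final case class Filter(table: String, cond: String, child: Plan) extends Plan
  final case class Join(on: String, left: Plan, right: Plan) extends Plan
  final case class CachedScan(original: Plan) extends Plan

  // Cache substitution: replace any subtree structurally equal to the cached plan.
  def substitute(plan: Plan, cached: Plan): Plan =
    if (plan == cached) CachedScan(cached)
    else plan match {
      case Filter(t, c, child) => Filter(t, c, substitute(child, cached))
      case Join(on, l, r)      => Join(on, substitute(l, cached), substitute(r, cached))
      case leaf                => leaf
    }

  // One optimization rule: push a filter on the left table below the join.
  def pushDown(plan: Plan): Plan = plan match {
    case Filter(t, c, Join(on, Table(name), right)) if name == t =>
      Join(on, Filter(t, c, Table(name)), right)
    case other => other
  }

  // True if some node of the plan reads from the cache.
  def usesCache(plan: Plan): Boolean = plan match {
    case CachedScan(_)       => true
    case Filter(_, _, child) => usesCache(child)
    case Join(_, l, r)       => usesCache(l) || usesCache(r)
    case Table(_)            => false
  }

  val cached: Plan = Filter("people", "age > 18", Table("people"))
  val query: Plan =
    Filter("people", "age > 18", Join("id === owner", Table("people"), Table("pets")))

  // Substituting first finds no match; optimizing first exposes the cached subtree.
  val cacheFirst: Plan    = pushDown(substitute(query, cached))
  val optimizeFirst: Plan = substitute(pushDown(query), cached)
}
```

In this sketch `usesCache(cacheFirst)` is false and `usesCache(optimizeFirst)` is true, which is the gap between the two stage orderings.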

slide-9
SLIDE 9

slide-10
SLIDE 10

slide-11
SLIDE 11

Current pipeline:
  Cache -> Optimization -> Physical Planning

Optimization-first pipeline:
  Optimization -> Cache -> Physical Planning

slide-12
SLIDE 12
slide-13
SLIDE 13

slide-14
SLIDE 14

Current pipeline:
  Cache -> Optimization -> Physical Planning

Optimization-first pipeline (slow!):
  Optimization -> Cache -> Physical Planning

Insight: not all optimizations help caching!
  Partial Optimization -> Cache -> Optimization -> Physical Planning

slide-15
SLIDE 15

  • Boolean Simplification
  • Constant Propagation
  • ID Reassignment
  • Filter Pruning
  • Object Elimination
  • Custom Rules
  • ...

slide-16
SLIDE 16
slide-17
SLIDE 17
slide-18
SLIDE 18

slide-19
SLIDE 19

UDFs are blackboxes that hide caching opportunities

  select *
    where age > 18
      table people

  select *
    { p => p.age > 18 }
      table people
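A small sketch of why the two plans above do not match: a native predicate is plain data and compares structurally, while a UDF is a function object that compares by reference only. The types `Cond`, `GreaterThan`, and `Udf` are assumptions made for this illustration and are not Spark classes.

```scala
object UdfOpacitySketch {
  final case class Person(name: String, id: Int, age: Int)

  // Two ways to express the same predicate in a plan node.
  sealed trait Cond
  final case class GreaterThan(attr: String, value: Int) extends Cond
  final case class Udf(f: Person => Boolean) extends Cond

  val nativeA: Cond = GreaterThan("age", 18)
  val nativeB: Cond = GreaterThan("age", 18)

  // Same source text, but two distinct function objects.
  val udfA: Cond = Udf(p => p.age > 18)
  val udfB: Cond = Udf(p => p.age > 18)

  // Structural equality can see inside the native predicate.
  val nativeMatches: Boolean = nativeA == nativeB // true

  // Functions only support reference equality, so the comparison fails.
  val udfMatches: Boolean = udfA == udfB // false
}
```

A cache lookup based on plan equality therefore misses any plan that contains a UDF, even when an identical computation is already cached.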

slide-20
SLIDE 20

  • Program Synthesis
  • User Annotation
  • Froid
  • Acorn

slide-21
SLIDE 21

                       Program Synthesis  User Annotation  Froid  Acorn
Correct                ✓                  ✓                ✓      ✓

slide-22
SLIDE 22

                       Program Synthesis  User Annotation  Froid  Acorn
Correct                ✓                  ✓                ✓      ✓
Transparent            ✓                  X                ✓      ✓

slide-23
SLIDE 23

                       Program Synthesis  User Annotation  Froid  Acorn
Correct                ✓                  ✓                ✓      ✓
Transparent            ✓                  X                ✓      ✓
General (Java, Scala)  ✓                  X                X      ✓

slide-24
SLIDE 24

                       Program Synthesis  User Annotation  Froid  Acorn
Correct                ✓                  ✓                ✓      ✓
Transparent            ✓                  X                ✓      ✓
General (Java, Scala)  ✓                  X                X      ✓
Fast                   X                  ✓                ✓      ✓

slide-25
SLIDE 25
  • Scala

Native Spark

slide-26
SLIDE 26

person.filter(p => p.age > 18)

Bytecode:

   1 aload_1
   2 invokeinterface
   3 dload_1
   4 ldc2_w
   5 dcmpg
   6 ifge 18
   7 iconst_1
   8 goto 10
   9 iconst_0
  10 aload_0
  11 aload_1

Three-address form:

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

slide-27
SLIDE 27

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this

slide-28
SLIDE 28

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")

slide-29
SLIDE 29

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")
d1    int            Literal(18)

slide-30
SLIDE 30

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")
d1    int            Literal(18)
If                   GreaterThan(Attribute("age"), Literal(18))

slide-31
SLIDE 31

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")
d1    int            Literal(18)
If                   GreaterThan(Attribute("age"), Literal(18))
                     cast(0) as boolean

slide-32
SLIDE 32

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")
d1    int            Literal(18)
If                   GreaterThan(Attribute("age"), Literal(18))
                     cast(1) as boolean

slide-33
SLIDE 33

  1 Person r1 := @param0
  2 double $d0 = r1.age()
  3 int $d1 = 18
  4 if $d0 > $d1
  5   goto 8
  6 boolean $zo = 0
  7 goto 9
  8 $zo = 1
  9 return $zo

Name  Type           Expression
r1    class[Person]  this
d0    double         Attribute("age")
d1    int            Literal(18)
If                   GreaterThan(Attribute("age"), Literal(18))
                     cast(1) as boolean
                     cast(0) as boolean
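The table above can be read as the result of symbolically executing the listing: each local variable is bound to an expression rather than a value, and the conditional jump becomes an `If` node whose branches are translated separately. The sketch below illustrates that idea under stated assumptions. The statement types (`Param`, `Getter`, `Const`, `IfGt`, `Goto`, `Return`), the function `translate`, and the 0-based jump targets are all hypothetical and not taken from the slides. The encoded program is a simplified listing in which the taken branch yields 1.

```scala
object SymbolicTranslationSketch {
  // Target expression tree, named after the rows of the slide's table.
  sealed trait Expr
  case object This extends Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class CastBool(child: Expr) extends Expr
  final case class If(cond: Expr, thenE: Expr, elseE: Expr) extends Expr

  // Hypothetical three-address statements; jump targets are 0-based indices.
  sealed trait Stmt
  final case class Param(v: String) extends Stmt                              // r1 := @param0
  final case class Getter(v: String, obj: String, field: String) extends Stmt // $d0 = r1.age()
  final case class Const(v: String, value: Int) extends Stmt                  // $d1 = 18
  final case class IfGt(l: String, r: String, target: Int) extends Stmt       // if l > r goto target
  final case class Goto(target: Int) extends Stmt
  final case class Return(v: String) extends Stmt                             // boolean return

  // Walk the program, binding variables to expressions instead of values.
  // Assumes loop-free code; a loop would make this recursion diverge.
  def translate(prog: Vector[Stmt]): Expr = {
    def run(pc: Int, env: Map[String, Expr]): Expr = prog(pc) match {
      case Param(v)        => run(pc + 1, env + (v -> This))
      case Getter(v, _, f) => run(pc + 1, env + (v -> Attribute(f)))
      case Const(v, n)     => run(pc + 1, env + (v -> Literal(n)))
      case IfGt(l, r, t)   => If(GreaterThan(env(l), env(r)), run(t, env), run(pc + 1, env))
      case Goto(t)         => run(t, env)
      case Return(v)       => CastBool(env(v))
    }
    run(0, Map.empty)
  }

  val program: Vector[Stmt] = Vector(
    Param("r1"),
    Getter("d0", "r1", "age"),
    Const("d1", 18),
    IfGt("d0", "d1", 6),
    Const("zo", 0),
    Goto(7),
    Const("zo", 1),
    Return("zo")
  )

  val translated: Expr = translate(program)
}
```

Here `translated` is `If(GreaterThan(Attribute("age"), Literal(18)), CastBool(Literal(1)), CastBool(Literal(0)))`, the same shape as the last rows of the table.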

slide-34
SLIDE 34

If
  GreaterThan(Attribute("age"), Literal(18))
  cast(1) as boolean
  cast(0) as boolean

person.filter(p => p.age > 18)

  select age
    filterUDF { p => p.age > 18 }
      table people

slide-35
SLIDE 35

If
  GreaterThan(Attribute("age"), Literal(18))
  cast(1) as boolean
  cast(0) as boolean

person.filter(p => p.age > 18)

  select age
    filter(If(GreaterThan("age", 18), cast 1 as boolean, cast 0 as boolean))
      table people

slide-36
SLIDE 36

person.filter(p => p.age > 18)

  select *
    filter(If(GreaterThan("age", 18), cast 1 as boolean, cast 0 as boolean))
      table people

Partial Optimizer

  select *
    filter "age" > 18
      table people

person.filter(age > 18)
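The partial-optimizer step on this slide amounts to a boolean simplification: an `If` that returns true in one branch and false in the other is just its condition. Below is a minimal sketch of such a rewrite rule. The expression types and `simplify` are assumptions for illustration, not the optimizer's actual rules, and the rule assumes the condition is never null.

```scala
object SimplifySketch {
  sealed trait Expr
  final case class Attribute(name: String) extends Expr
  final case class Literal(value: Int) extends Expr
  final case class BoolLiteral(value: Boolean) extends Expr
  final case class GreaterThan(left: Expr, right: Expr) extends Expr
  final case class Not(child: Expr) extends Expr
  final case class CastBool(child: Expr) extends Expr
  final case class If(cond: Expr, thenE: Expr, elseE: Expr) extends Expr

  // Fold constant casts, then collapse If(c, true, false) to c
  // and If(c, false, true) to Not(c).
  def simplify(e: Expr): Expr = e match {
    case CastBool(Literal(1)) => BoolLiteral(true)
    case CastBool(Literal(0)) => BoolLiteral(false)
    case If(c, t, f) =>
      (simplify(c), simplify(t), simplify(f)) match {
        case (cond, BoolLiteral(true), BoolLiteral(false)) => cond
        case (cond, BoolLiteral(false), BoolLiteral(true)) => Not(cond)
        case (cond, thenE, elseE)                          => If(cond, thenE, elseE)
      }
    case other => other
  }

  // The filter condition produced from the UDF on the previous slides.
  val fromUdf: Expr =
    If(GreaterThan(Attribute("age"), Literal(18)), CastBool(Literal(1)), CastBool(Literal(0)))

  val simplified: Expr = simplify(fromUdf)
}
```

After this rewrite the translated UDF filter is structurally identical to the native `age > 18` predicate, so a plan-equality cache lookup can match the two.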

slide-37
SLIDE 37
slide-38
SLIDE 38
slide-39
SLIDE 39
slide-40
SLIDE 40
slide-41
SLIDE 41

slide-42
SLIDE 42
slide-43
SLIDE 43