Data-Parallel Computing Meets STRIPS Erez Karpas Tomer Sagi Carmel - - PowerPoint PPT Presentation

SLIDE 1

Motivation DPPS DPPS as Planning

Data-Parallel Computing Meets STRIPS

Erez Karpas, Tomer Sagi, Carmel Domshlak, Avigdor Gal, Avi Mendelson, Moshe Tennenholtz
Technion-Microsoft Electronic-Commerce Research Center

SLIDE 2

Outline

1. Motivation
2. DPPS
3. DPPS as Planning

SLIDE 3

Data Processing — Before “Big Data”

Database Management Systems (DBMS)
- Declarative query, expressed in SQL
- Query execution plan:
  - Easy to generate from the declarative query
  - Hard to optimize
- Very limited support for user-defined functions

SLIDE 4

Data Processing — After “Big Data”

MapReduce / Hadoop / Dryad
- Low-level programming
- Only user-defined functions
- No declarative queries

SCOPE / DryadLINQ / Pig / Hive
- High-level programming
- Support for user-defined functions
- Limited declarative queries

SLIDE 6

User-Defined Functions in Declarative Queries

Including user-defined functions hinders query optimization:
- The user must specify some base plan
- The query plan optimizer does not “understand” user-defined functions, and does not know which optimizations are safe

Existing approaches:
- No optimization when a user-defined function appears in the query
- User-defined functions must have some pre-specified signature
- Static code analysis to “understand” user-defined functions

SLIDE 10

Running Example: Histogram Computation

Suppose we have a users table T with 10^9 users.
We want two histograms of T: by age and by relationship status (rls).
In SQL or similar:

SELECT T.age, COUNT(*) FROM T GROUP BY T.age;
SELECT T.rls, COUNT(*) FROM T GROUP BY T.rls;

Query Execution Plan

Agg(age, scan(T))
Agg(rls, scan(T))
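To make the cost of this base plan concrete, here is a minimal Python sketch. The toy table, its keys `age` and `rls`, and the functions `scan` and `agg` are hypothetical stand-ins for the plan operators, not an actual query engine API. Because each Agg consumes its own scan, every row of T is read twice.

```python
from collections import Counter
from typing import Dict, Iterator, List

# Hypothetical toy stand-in for the users table T (the real T holds 10^9 rows).
T: List[Dict[str, object]] = [
    {"age": 25, "rls": "single"},
    {"age": 31, "rls": "married"},
    {"age": 25, "rls": "married"},
]
rows_read = 0  # total number of rows read from T, across all scans


def scan(table: List[Dict[str, object]]) -> Iterator[Dict[str, object]]:
    """Illustrative scan operator: yields every row of the table, counting rows read."""
    global rows_read
    for row in table:
        rows_read += 1
        yield row


def agg(field: str, rows: Iterator[Dict[str, object]]) -> Counter:
    """Illustrative Agg operator: histogram (value -> count) of a single field."""
    return Counter(row[field] for row in rows)


# Base plan: Agg(age, scan(T)) and Agg(rls, scan(T)), i.e. two independent scans of T.
age_hist = agg("age", scan(T))
rls_hist = agg("rls", scan(T))
```

On the toy table this reads 6 rows; on the real T it would mean 2 × 10^9 row reads.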

SLIDE 12

Running Example: Histogram Computation (2)

Suppose we have a user-defined function, DAgg, which aggregates by two fields simultaneously.

Query Execution Plan using DAgg

DAgg(age, rls, scan(T))

The question is how to derive this execution plan automatically.
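A minimal Python sketch of what the DAgg plan buys, assuming a toy in-memory table with hypothetical keys `age` and `rls`: both histograms are built in one pass, so each row is read once. The `dagg` function is an illustrative single-pass implementation written for this example; the slides do not say how DAgg is implemented.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical toy stand-in for the users table T.
T: List[Dict[str, object]] = [
    {"age": 25, "rls": "single"},
    {"age": 31, "rls": "married"},
    {"age": 25, "rls": "married"},
]
rows_read = 0  # total number of rows read from T


def scan(table: List[Dict[str, object]]) -> Iterator[Dict[str, object]]:
    """Illustrative scan operator: yields every row of the table, counting rows read."""
    global rows_read
    for row in table:
        rows_read += 1
        yield row


def dagg(f1: str, f2: str, rows: Iterable[Dict[str, object]]) -> Tuple[Counter, Counter]:
    """Illustrative DAgg: builds the histograms of two fields in a single pass."""
    h1: Counter = Counter()
    h2: Counter = Counter()
    for row in rows:
        h1[row[f1]] += 1
        h2[row[f2]] += 1
    return h1, h2


# Plan using DAgg: DAgg(age, rls, scan(T)), i.e. a single scan of T.
age_hist, rls_hist = dagg("age", "rls", scan(T))
```

The histograms are the same as with two separate aggregations, but T is scanned only once.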

SLIDE 14

Our Contribution

- Introduce Data-Parallel Program Synthesis (DPPS), a formal framework for studying these problems
- Study the expressivity and complexity of DPPS
- Show a compilation of DPPS to AI planning

SLIDE 16

Data-Parallel Program Synthesis Framework

The framework is based on tracking data chunks.
A data chunk represents some piece of data, e.g.:
- all records of males between the ages of 18–49
- the average salary of all males between the ages of 18–49

We do not need to know the value of the data, only its description.
Each data chunk d is associated with the amount σd of memory it requires.

SLIDE 17

DPPS Task

A DPPS task consists of:
- D: a set of possible data chunks, with sizes σd
- N: a finite set of computing units, with memory capacities κn
- A: a set of possible computation primitives; each a ∈ A is described by:
  - I ⊆ D, the required input
  - O ⊆ D, the produced output
  - C : N → ℝ₀⁺, the computation cost on each processor
- T : N × D × N → ℝ₀⁺: the data transmission cost function
- s0: the initial state of the computation
- G: the goal of the computation
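One way to write the task definition down as data is sketched below, assuming Python dataclasses, a per-node lookup table for C, the goal G given as a list of (node, chunk) pairs, and free deletions; all class, field, and chunk names are hypothetical. The `plan_cost` function sums C for compute steps and T for transmit steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

Chunk = str  # a data chunk is identified by its description, not its value
Node = str


@dataclass
class Computation:
    inputs: FrozenSet[Chunk]   # I ⊆ D: required input
    outputs: FrozenSet[Chunk]  # O ⊆ D: produced output
    cost: Dict[Node, float]    # C : N → ℝ₀⁺, given here as a per-node table


@dataclass
class DPPSTask:
    sizes: Dict[Chunk, float]                            # σ_d for every d ∈ D
    capacity: Dict[Node, float]                          # κ_n for every n ∈ N
    computations: Dict[str, Computation]                 # A, keyed by a hypothetical name
    transmit_cost: Callable[[Node, Chunk, Node], float]  # T : N × D × N → ℝ₀⁺
    init: Dict[Node, FrozenSet[Chunk]]                   # s0: chunks held by each node
    goal: List[Tuple[Node, Chunk]]                       # G: assumed to be (node, chunk) pairs


def plan_cost(task: DPPSTask, plan: Sequence[Tuple[str, ...]]) -> float:
    """Sums computation and transmission costs; deletions are assumed to be free."""
    total = 0.0
    for action in plan:
        if action[0] == "compute":  # ("compute", node, computation name)
            _, node, name = action
            total += task.computations[name].cost[node]
        elif action[0] == "transmit":  # ("transmit", src, chunk, dst)
            _, src, chunk, dst = action
            total += task.transmit_cost(src, chunk, dst)
    return total


SIZES = {"rows": 100.0, "cnt": 1.0}
task = DPPSTask(
    sizes=SIZES,
    capacity={"n1": 200.0, "n2": 200.0},
    computations={"count": Computation(frozenset({"rows"}), frozenset({"cnt"}), {"n1": 2.0, "n2": 3.0})},
    transmit_cost=lambda src, chunk, dst: SIZES[chunk],  # assumption: cost equals chunk size
    init={"n1": frozenset({"rows"}), "n2": frozenset()},
    goal=[("n2", "cnt")],
)
plan = [("compute", "n1", "count"), ("transmit", "n1", "cnt", "n2")]
```

Under these made-up numbers, computing the count where the rows live and shipping the small result costs 3, while shipping the rows to n2 first and counting there costs 103.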

SLIDE 18

DPPS Task (2)

- A DPPS state specifies which processor holds which data chunks
- A solution is a sequence of actions (compute / transmit / delete data) that achieves the goal from the initial state
- The possible data chunks D and computations A may be given explicitly or described implicitly
  - If they are described implicitly, these sets may be infinite
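The state semantics can be sketched as follows, assuming a state maps each processor to the set of chunks it holds, compute adds O on the same processor when I is present there, transmit copies a chunk, and memory capacity is checked after every step. The chunk names, sizes, capacities, and helper names are made up for illustration.

```python
from typing import Dict, FrozenSet, Optional, Sequence, Tuple

State = Dict[str, FrozenSet[str]]  # processor -> data chunks it currently holds

# Hypothetical toy instance: chunk sizes (σ_d), capacities (κ_n), computations (I, O).
SIZES: Dict[str, int] = {"part1": 50, "part2": 50, "cnt1": 1, "cnt2": 1, "cnt": 1}
CAPACITY: Dict[str, int] = {"n1": 60, "n2": 60}
COMPUTATIONS: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]] = {
    "count1": (frozenset({"part1"}), frozenset({"cnt1"})),
    "count2": (frozenset({"part2"}), frozenset({"cnt2"})),
    "merge": (frozenset({"cnt1", "cnt2"}), frozenset({"cnt"})),
}


def apply(state: State, action: Tuple[str, ...]) -> Optional[State]:
    """Returns the successor state, or None if a precondition or memory capacity is violated."""
    new = dict(state)
    if action[0] == "compute":  # ("compute", node, computation)
        _, node, name = action
        inputs, outputs = COMPUTATIONS[name]
        if not inputs <= state[node]:
            return None
        new[node] = state[node] | outputs
    elif action[0] == "transmit":  # ("transmit", src, chunk, dst): copy chunk from src to dst
        _, src, chunk, dst = action
        if chunk not in state[src]:
            return None
        new[dst] = state[dst] | {chunk}
    elif action[0] == "del":  # ("del", node, chunk): free the memory used by chunk
        _, node, chunk = action
        new[node] = state[node] - {chunk}
    else:
        raise ValueError(f"unknown action {action!r}")
    # Memory constraint: the chunks held by each processor must fit in its capacity.
    if any(sum(SIZES[d] for d in chunks) > CAPACITY[n] for n, chunks in new.items()):
        return None
    return new


def is_solution(init: State, goal: Sequence[Tuple[str, str]], plan: Sequence[Tuple[str, ...]]) -> bool:
    """Checks that every step is applicable and that each (node, chunk) goal pair holds at the end."""
    state: Optional[State] = init
    for action in plan:
        state = apply(state, action)
        if state is None:
            return False
    return all(chunk in state[node] for node, chunk in goal)


INIT: State = {"n1": frozenset({"part1"}), "n2": frozenset({"part2"})}
GOAL = [("n1", "cnt")]
GOOD_PLAN = [("compute", "n1", "count1"), ("compute", "n2", "count2"),
             ("transmit", "n2", "cnt2", "n1"), ("compute", "n1", "merge")]
```

In this toy instance, transmitting the 50-unit partition part2 to n1 would overflow n1's 60-unit capacity, so a plan that starts that way is rejected even though it would be valid without memory constraints.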

SLIDE 19

DPPS Expressivity

Theorem. DPPS is at least as expressive as relational algebra (RA) with aggregation.

Proof sketch. Given a relational algebra expression, we can construct a DPPS task whose operators are the RA operators and whose data chunks are the possible RA expressions.
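The proof idea can be illustrated with a small Python sketch, assuming RA expressions are nested tuples `(operator, parameter, *children)` with base relations as strings. It covers only the subexpressions of one given expression, a restriction of the construction in the proof sketch: each subexpression becomes a data chunk, and each operator occurrence becomes a computation from its children's chunks to its own chunk. The tuple format and `ra_to_dpps` are hypothetical.

```python
from typing import List, Set, Tuple, Union

# An RA expression is a base relation name (str) or a tuple (operator, parameter, *children),
# e.g. ("select", "age>18", "T") or ("agg", "age", ("join", "uid", "T", "S")).
Expr = Union[str, Tuple]
Computation = Tuple[str, str, Tuple[Expr, ...], Expr]  # (operator, parameter, inputs, output)


def ra_to_dpps(expr: Expr) -> Tuple[Set[Expr], List[Computation]]:
    """Every subexpression becomes a data chunk; every operator occurrence becomes a
    computation whose inputs are its children's chunks and whose output is its own chunk."""
    chunks: Set[Expr] = set()
    computations: List[Computation] = []

    def visit(e: Expr) -> None:
        if e in chunks:
            return  # a shared subexpression yields one chunk and one computation
        chunks.add(e)
        if isinstance(e, tuple):
            op, param, *children = e
            for child in children:
                visit(child)
            computations.append((op, param, tuple(children), e))

    visit(expr)
    return chunks, computations


EXPR = ("agg", "age", ("select", "p", ("join", "uid", "T", "S")))
chunks, comps = ra_to_dpps(EXPR)
```

The base relations (here T and S) form the initial chunks, and the whole expression is the goal chunk.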

SLIDE 20

DPPS Complexity

Theorem. Satisficing data-parallel program synthesis is NP-hard, even when the possible data chunks are given explicitly.

Proof sketch. By reduction from SAT, exploiting the memory capacity constraints.

SLIDE 21

DPPS Complexity

Theorem. Optimal data-parallel program synthesis with a single processor is NP-hard, even if the possible data chunks are given explicitly and there are no memory constraints.

Proof sketch. By reduction from delete-free planning.

SLIDE 22

DPPS Complexity

Theorem. Optimal data-parallel program synthesis with a single data chunk is NP-hard.

Proof sketch. By reduction from the Steiner tree problem.

SLIDE 23

DPPS Complexity

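Theorem. Satisficing data-parallel program synthesis with no memory constraints can be solved in polynomial time when the possible data chunks are given explicitly.

Proof sketch. By reduction to delete-free planning: without memory constraints, deleting data is never necessary, and satisficing delete-free planning is solvable in polynomial time.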
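The polynomial-time argument can be sketched as a fixpoint computation, assuming computations are given as explicit (input, output) chunk sets and the task has no memory constraints. Since the set of (processor, chunk) facts only grows, the loop runs for at most |N|·|D| + 1 rounds. The function name and toy task are hypothetical, and the returned plan is valid but deliberately unoptimized: it copies every chunk to every processor.

```python
from typing import Dict, FrozenSet, Iterable, List, Optional, Set, Tuple

Fact = Tuple[str, str]  # (node, chunk): "node holds chunk"


def solve_without_memory(
    nodes: List[str],
    computations: Dict[str, Tuple[FrozenSet[str], FrozenSet[str]]],
    init: Iterable[Fact],
    goal: Iterable[Fact],
) -> Optional[List[Tuple[str, ...]]]:
    """Saturates the monotone set of reachable facts; returns a (non-optimized) plan or None."""
    holds: Set[Fact] = set(init)
    plan: List[Tuple[str, ...]] = []
    changed = True
    while changed:  # every productive round adds at least one new fact
        changed = False
        for n in nodes:
            for name, (inputs, outputs) in computations.items():
                if all((n, d) in holds for d in inputs) and any((n, d) not in holds for d in outputs):
                    holds.update((n, d) for d in outputs)
                    plan.append(("compute", n, name))
                    changed = True
        for src, d in list(holds):  # snapshot, since holds grows inside the loop
            for dst in nodes:
                if (dst, d) not in holds:
                    holds.add((dst, d))
                    plan.append(("transmit", src, d, dst))
                    changed = True
    return plan if set(goal) <= holds else None


NODES = ["n1", "n2"]
COMPS = {
    "count1": (frozenset({"part1"}), frozenset({"cnt1"})),
    "count2": (frozenset({"part2"}), frozenset({"cnt2"})),
    "merge": (frozenset({"cnt1", "cnt2"}), frozenset({"cnt"})),
}
INIT = [("n1", "part1"), ("n2", "part2")]
GOAL = [("n1", "cnt")]
plan = solve_without_memory(NODES, COMPS, INIT, GOAL)
```

Each appended action is applicable at the moment it is recorded, because facts are never removed; if the goal is not in the fixpoint, no plan exists.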

SLIDE 25

DPPS Compilation

When the computations and data chunks are given explicitly, compilation to planning is straightforward:
- Predicate: holds(?node, ?data)
- Actions:
  - For each computation: compute(?node, ?computation)
  - Transmission: transmit(?node, ?data, ?node2)
  - Data deletion: del(?node, ?data)
- Capacity constraints can be enforced with numeric fluents
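A minimal sketch of this compilation in Python, grounding compute, transmit, and del into STRIPS actions over atoms holds(node, data) and solving the result with breadth-first search. The toy task, the computation names, and the `ground`/`bfs` helpers are hypothetical; capacity constraints are omitted because they would require numeric fluents.

```python
from collections import deque
from itertools import product
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Set, Tuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def holds(node: str, data: str) -> str:
    """Ground atom of the predicate holds(?node, ?data)."""
    return f"holds({node}, {data})"


def ground(nodes: List[str], chunks: List[str],
           computations: Dict[str, Tuple[Set[str], Set[str]]]) -> List[Action]:
    """Grounds compute, transmit and del actions for an explicitly given DPPS task."""
    actions = []
    for n, (name, (inputs, outputs)) in product(nodes, computations.items()):
        actions.append(Action(f"compute({n}, {name})",
                              frozenset(holds(n, d) for d in inputs),
                              frozenset(holds(n, d) for d in outputs), frozenset()))
    for n, d, n2 in product(nodes, chunks, nodes):
        if n != n2:
            actions.append(Action(f"transmit({n}, {d}, {n2})", frozenset({holds(n, d)}),
                                  frozenset({holds(n2, d)}), frozenset()))
    for n, d in product(nodes, chunks):
        actions.append(Action(f"del({n}, {d})", frozenset({holds(n, d)}),
                              frozenset(), frozenset({holds(n, d)})))
    return actions


def bfs(init: FrozenSet[str], goal: FrozenSet[str], actions: List[Action]) -> Optional[List[str]]:
    """Breadth-first search over STRIPS states; returns a shortest plan or None."""
    parent: Dict[FrozenSet[str], Optional[Tuple[FrozenSet[str], str]]] = {init: None}
    queue = deque([init])
    while queue:
        s = queue.popleft()
        if goal <= s:
            plan = []
            while parent[s] is not None:  # walk back to the initial state
                s, name = parent[s]
                plan.append(name)
            return plan[::-1]
        for a in actions:
            if a.pre <= s:
                t = (s - a.delete) | a.add  # STRIPS: apply deletes, then adds
                if t not in parent:
                    parent[t] = (s, a.name)
                    queue.append(t)
    return None


NODES = ["n1", "n2"]
CHUNKS = ["part1", "part2", "cnt1", "cnt2", "cnt"]
COMPS = {"count1": ({"part1"}, {"cnt1"}), "count2": ({"part2"}, {"cnt2"}),
         "merge": ({"cnt1", "cnt2"}, {"cnt"})}
ACTIONS = ground(NODES, CHUNKS, COMPS)
plan = bfs(frozenset({holds("n1", "part1"), holds("n2", "part2")}),
           frozenset({holds("n1", "cnt")}), ACTIONS)
```

Breadth-first search returns a plan with the fewest steps (four for this toy task); it ignores the cost functions C and T, which a cost-aware planner would use instead.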

SLIDE 26

DPPS Compilation without Explicit Data

When the computations and data chunks are given implicitly, compilation is still sometimes possible.
When data chunks have a structure (e.g., expression trees), such trees can be represented using predicates.

Expression tree encoding: the expression σp(e1 × e2), a selection node n1 with predicate p above a product node n2 with children e1 and e2, is encoded as

select(n1, p, n2)
join(n2, e1, e2)
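The encoding can be sketched as a recursive function that assigns fresh identifiers n1, n2, ... to operator nodes in pre-order and emits one fact per operator, assuming expressions are nested tuples and leaves such as e1 are strings. The tuple format and the `encode` helper are hypothetical.

```python
from itertools import count
from typing import List, Tuple, Union

# Hypothetical expression format: a leaf is a string (e.g. "e1"); an operator node is
# ("select", predicate, child) or ("join", left, right).
Expr = Union[str, Tuple]


def encode(expr: Expr) -> Tuple[str, List[str]]:
    """Returns the root identifier and one fact per operator node of the expression tree."""
    facts: List[str] = []
    fresh = count(1)  # yields 1, 2, ... for operator-node identifiers, assigned in pre-order

    def visit(e: Expr) -> str:
        if isinstance(e, str):
            return e  # leaves are referred to by name
        node = f"n{next(fresh)}"
        if e[0] == "select":
            _, pred, child = e
            facts.append(f"select({node}, {pred}, {visit(child)})")
        elif e[0] == "join":
            _, left, right = e
            facts.append(f"join({node}, {visit(left)}, {visit(right)})")
        else:
            raise ValueError(f"unknown operator {e[0]!r}")
        return node

    root = visit(expr)
    return root, facts


root, facts = encode(("select", "p", ("join", "e1", "e2")))
```

Because each operator node gets its own identifier, identical subtrees at different positions remain distinct facts, which keeps the encoding a plain set of ground atoms.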

SLIDE 39

DPPS Compilation: Proof of Concept

Setup: processors n1, n2, n3, n4; initially, each processor ni holds the partition σhash(PK)=i(T) of table T (i = 1, ..., 4).

Plan (with the data chunks each step produces):
1. DAgg(n1, f1, f2, σhash(PK)=1(T)): n1 gets CNT(f1,σhash(PK)=1(T)) and CNT(f2,σhash(PK)=1(T))
2. DAgg(n2, f1, f2, σhash(PK)=2(T)): n2 gets CNT(f1,σhash(PK)=2(T)) and CNT(f2,σhash(PK)=2(T))
3. DAgg(n3, f1, f2, σhash(PK)=3(T)): n3 gets CNT(f1,σhash(PK)=3(T)) and CNT(f2,σhash(PK)=3(T))
4. DAgg(n4, f1, f2, σhash(PK)=4(T)): n4 gets CNT(f1,σhash(PK)=4(T)) and CNT(f2,σhash(PK)=4(T))
5. transmit(n2, CNT(f1,σhash(PK)=2(T)), n1)
6. transmit(n3, CNT(f1,σhash(PK)=3(T)), n1)
7. transmit(n4, CNT(f1,σhash(PK)=4(T)), n1)
8. merge(n1, f1): n1 gets CNT(f1,T)
9. transmit(n1, CNT(f2,σhash(PK)=1(T)), n2)
10. transmit(n3, CNT(f2,σhash(PK)=3(T)), n2)
11. transmit(n4, CNT(f2,σhash(PK)=4(T)), n2)
12. merge(n2, f2): n2 gets CNT(f2,T)
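This plan can be replayed in a short Python sketch to check that it reaches the goal, assuming these semantics: DAgg on ni puts both per-partition histograms on ni, transmit(src, chunk, dst) copies a chunk, and merge(n, f) needs all four partial histograms of f on n and produces CNT(f,T). The chunk strings mirror the slide notation; the `run` and `cnt` helpers are hypothetical.

```python
from typing import Dict, List, Set, Tuple

NODES = ["n1", "n2", "n3", "n4"]
PARTS = [f"σhash(PK)={i}(T)" for i in range(1, 5)]  # partition i of T starts on node ni


def cnt(f: str, data: str) -> str:
    """Name of the data chunk holding the histogram of field f over data."""
    return f"CNT({f},{data})"


def run(plan: List[Tuple[str, ...]]) -> Dict[str, Set[str]]:
    """Replays the plan from the initial state, raising ValueError on a violated precondition."""
    state: Dict[str, Set[str]] = {n: {p} for n, p in zip(NODES, PARTS)}
    for step in plan:
        if step[0] == "DAgg":  # both partial histograms, on the node holding the partition
            _, n, f1, f2, part = step
            if part not in state[n]:
                raise ValueError(f"{step}: {n} does not hold {part}")
            state[n] |= {cnt(f1, part), cnt(f2, part)}
        elif step[0] == "transmit":  # copy a chunk from src to dst
            _, src, chunk, dst = step
            if chunk not in state[src]:
                raise ValueError(f"{step}: {src} does not hold {chunk}")
            state[dst].add(chunk)
        elif step[0] == "merge":  # combine the partial histograms of field f into CNT(f,T)
            _, n, f = step
            if not all(cnt(f, p) in state[n] for p in PARTS):
                raise ValueError(f"{step}: {n} lacks a partial histogram")
            state[n].add(cnt(f, "T"))
        else:
            raise ValueError(f"unknown step {step}")
    return state


PLAN = (
    [("DAgg", n, "f1", "f2", p) for n, p in zip(NODES, PARTS)]
    + [("transmit", n, cnt("f1", p), "n1") for n, p in zip(NODES, PARTS) if n != "n1"]
    + [("merge", "n1", "f1")]
    + [("transmit", n, cnt("f2", p), "n2") for n, p in zip(NODES, PARTS) if n != "n2"]
    + [("merge", "n2", "f2")]
)
final = run(PLAN)
```

Note that the plan moves only the small partial histograms between processors; the table partitions never leave the processors that hold them.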

SLIDE 40

DPPS Compilation: Proof of Concept

[Figure: planning time (seconds) versus number of processors, one series per number of fields (2, 3, 4, 5, 6)]

- Task: histograms of F fields of a table divided across N processors
- Solved by greedy best-first search (GBFS) with the relaxed plan heuristic in Fast Downward
- The solutions found were optimal, although GBFS does not guarantee optimality
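For readers unfamiliar with this search setup, here is a toy Python sketch of greedy best-first search guided by an FF-style relaxed plan heuristic, where the relaxed plan is extracted from h_add best supporters. This is not Fast Downward or its implementation; the `Action` tuple, `h_ff`, `gbfs`, and the two-processor task are hypothetical illustrations.

```python
import heapq
from itertools import count
from typing import Dict, FrozenSet, Iterable, List, NamedTuple, Optional, Set

INF = float("inf")


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    delete: FrozenSet[str]


def h_ff(state: FrozenSet[str], goal: FrozenSet[str], actions: List[Action]) -> float:
    """Relaxed plan heuristic: ignore deletes, find h_add best supporters,
    then count the distinct actions needed to support the goal in the relaxation."""
    cost: Dict[str, float] = {f: 0 for f in state}
    supporter: Dict[str, Action] = {}
    changed = True
    while changed:  # fixpoint of h_add costs over the delete relaxation
        changed = False
        for a in actions:
            if all(p in cost for p in a.pre):
                c = 1 + sum(cost[p] for p in a.pre)
                for f in a.add:
                    if c < cost.get(f, INF):
                        cost[f], supporter[f] = c, a
                        changed = True
    if any(g not in cost for g in goal):
        return INF  # goal unreachable even when deletes are ignored
    relaxed_plan: Set[str] = set()
    stack, seen = list(goal), set()
    while stack:  # walk back from the goal through best supporters
        f = stack.pop()
        if f in seen or f in state:
            continue
        seen.add(f)
        relaxed_plan.add(supporter[f].name)
        stack.extend(supporter[f].pre)
    return len(relaxed_plan)


def gbfs(init: Iterable[str], goal: FrozenSet[str], actions: List[Action]) -> Optional[List[str]]:
    """Greedy best-first search: always expands the open state with the lowest h_ff."""
    start = frozenset(init)
    tie = count()  # tie-breaker so the heap never compares frozensets
    frontier = [(h_ff(start, goal, actions), next(tie), start)]
    parent: Dict[FrozenSet[str], Optional[tuple]] = {start: None}
    while frontier:
        _, _, s = heapq.heappop(frontier)
        if goal <= s:
            plan: List[str] = []
            while parent[s] is not None:
                s, name = parent[s]
                plan.append(name)
            return plan[::-1]
        for a in actions:
            if a.pre <= s:
                t = (s - a.delete) | a.add
                if t not in parent:
                    parent[t] = (s, a.name)
                    h = h_ff(t, goal, actions)
                    if h < INF:  # prune states from which the goal is provably unreachable
                        heapq.heappush(frontier, (h, next(tie), t))
    return None


def holds(n: str, d: str) -> str:
    return f"holds({n},{d})"


# Hypothetical toy task: two processors, one partition each; goal: full count on n1.
NODES, CHUNKS = ["n1", "n2"], ["part1", "part2", "cnt1", "cnt2", "cnt"]
COMPS = {"count1": ({"part1"}, {"cnt1"}), "count2": ({"part2"}, {"cnt2"}), "merge": ({"cnt1", "cnt2"}, {"cnt"})}
ACTIONS = [Action(f"compute({n},{c})", frozenset(holds(n, d) for d in i), frozenset(holds(n, d) for d in o), frozenset())
           for n in NODES for c, (i, o) in COMPS.items()]
ACTIONS += [Action(f"transmit({s},{d},{t})", frozenset({holds(s, d)}), frozenset({holds(t, d)}), frozenset())
            for s in NODES for t in NODES if s != t for d in CHUNKS]
INIT = frozenset({holds("n1", "part1"), holds("n2", "part2")})
GOAL = frozenset({holds("n1", "cnt")})
plan = gbfs(INIT, GOAL, ACTIONS)
```

GBFS orders the open list by heuristic value alone and ignores the cost accumulated so far, which is why it can find good plans quickly without any optimality guarantee.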

SLIDE 41

Summary

- DPPS is a flexible framework for describing data-parallel computations
- DPPS can be solved through compilation to AI planning
- We expect DPPS to lead to interesting questions in AI planning

SLIDE 42

Thank You