Robert Ikeda, Jennifer Widom, Stanford University (PowerPoint presentation)

SLIDE 1

Robert Ikeda Jennifer Widom

Stanford University

SLIDE 2

Robert Ikeda

Example

[Figure: Pipeline for sales predictions. Nodes: CustList1, CustList2, ..., CustListn-1, CustListn, Europe, USA, Dedup, Union, Predict, ItemAgg, ClothCo Items, Buying Patterns, and the output ItemVolumes]

SLIDE 3

Example

[Figure: the sales-prediction pipeline, as on Slide 2]

SLIDE 4

Example

[Figure: the sales-prediction pipeline, as on Slide 2, with this record called out]

Item        Demand
Cowboy Hat  3

?

SLIDE 5

Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Item        Demand
Cowboy Hat  3

Name      Item
Amelie    Cowboy Hat
Jacques   Cowboy Hat
Isabelle  Cowboy Hat

?

SLIDE 6

Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Name      Address
Amelie    …Paris, TX
Jacques   …Paris, TX
Isabelle  …Paris, TX

SLIDE 7

Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20 Rue D'orsel, Paris

X

Name      Address
Amelie    65, quai d'Orsay, Paris, France
Jacques   39, rue de Bretagne, Paris, France
Isabelle  20 Rue D'orsel, Paris, France

SLIDE 8

Example

[Figure: the sales-prediction pipeline, as on Slide 2, with this record called out]

Item   Demand
Beret  3

SLIDE 9

Panda

Past work tends to be…

  • 1. Either data-based or process-based
  • 2. Focused on modeling and capturing provenance
  • 3. Specific to particular application domains

Panda…

  • 1. Captures both: “data-oriented workflows”
  • 2. Also supports provenance operators and queries
  • 3. Is general-purpose

SLIDE 10

Remainder of Talk

  • Processing nodes and provenance capture
  • Provenance operations
  • Provenance queries
  • System and other issues
  • Current research

SLIDE 11

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Relational nodes: structured, well-understood operations
  • Opaque nodes
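
To make the distinction concrete, here is a minimal Python sketch, assuming rows are plain dictionaries and each input record is identified by a hypothetical `(list name, row index)` pair; none of these names come from the talk. For relational nodes such as Union and Dedup, each output row's provenance follows directly from the operator's semantics. For an opaque node whose logic the system cannot see, the sketch falls back to the conservative choice of linking every output to every input.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical record identifier: (customer-list name, row index in that list).
RecordId = Tuple[str, int]
Row = Dict[str, str]
Tagged = Tuple[Row, Set[RecordId]]  # an output row paired with its provenance


def union(lists: Dict[str, List[Row]]) -> List[Tagged]:
    """Relational union: each output row comes from exactly one input row."""
    return [(row, {(name, i)}) for name, rows in lists.items()
            for i, row in enumerate(rows)]


def dedup(rows: List[Tagged], key: str) -> List[Tagged]:
    """Relational duplicate elimination on `key`: a surviving row's provenance
    is the union of the provenance of every input row with the same key."""
    merged: Dict[str, Tagged] = {}
    for row, prov in rows:
        if row[key] in merged:
            merged[row[key]][1].update(prov)
        else:
            merged[row[key]] = (row, set(prov))
    return list(merged.values())


def opaque(func: Callable[[List[Row]], List[Row]], rows: List[Tagged]) -> List[Tagged]:
    """Opaque node: its logic is invisible to the system, so every output is
    conservatively linked to the provenance of every input (coarse-grained)."""
    all_prov: Set[RecordId] = set().union(*(prov for _, prov in rows))
    return [(out, set(all_prov)) for out in func([row for row, _ in rows])]
```

The coarse-grained fallback is always safe but can be very imprecise, which is why letting opaque nodes report finer-grained provenance themselves is attractive.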

SLIDE 12

Provenance Capture

  • Model

― Likely to be similar to the Open Provenance Model
― Support provenance at a variety of granularities

  • Interface

― Allow processing nodes to create and manipulate provenance
― For relational operations, can plug in existing provenance work
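
The interface bullet can be illustrated with a small recorder that a processing node calls to report its own provenance links. This is a sketch under assumptions: `ProvenanceRecorder`, `link`, `sources`, and `predict_node` are hypothetical names, not Panda's API, and the `("record", ...)` and `("dataset", ...)` tags stand in for the "variety of granularities" the model should support.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# A provenance element tagged with its granularity, e.g. ("record", "c1")
# or ("dataset", "BuyingPatterns"). The tagging scheme is an assumption.
Element = Tuple[str, str]


class ProvenanceRecorder:
    """Hypothetical store that processing nodes call to create provenance."""

    def __init__(self) -> None:
        self._sources: Dict[Element, Set[Element]] = defaultdict(set)

    def link(self, output: Element, inputs: Iterable[Element]) -> None:
        """Record that `output` was derived directly from each of `inputs`."""
        self._sources[output].update(inputs)

    def sources(self, output: Element) -> Set[Element]:
        """Return the recorded one-step sources of `output` (empty if none)."""
        return set(self._sources.get(output, set()))


def predict_node(customers: Dict[str, str], patterns: Dict[str, str],
                 rec: ProvenanceRecorder) -> List[Tuple[str, str]]:
    """Hypothetical opaque Predict node that reports its own provenance:
    record-level links to customers, a dataset-level link to the patterns."""
    out = []
    for cid, region in customers.items():
        pid = f"pred-{cid}"
        out.append((pid, patterns[region]))
        rec.link(("record", pid), [("record", cid), ("dataset", "BuyingPatterns")])
    return out
```

Mixing granularities this way lets a node be precise where it cheaply can (per customer) and coarse where per-record links would add little (the whole patterns table).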

SLIDE 13

Provenance Operations

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Basic operations

― Backward tracing

  • Where did the cowboy-hat record come from?

― Forward tracing

  • Which predictions did this customer contribute to?
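
Both basic operations amount to graph reachability over stored one-step lineage. The sketch below assumes lineage is kept as a dictionary from each element to the elements it was directly derived from; the identifiers in `LINEAGE` are made up for the running example. Backward tracing follows edges toward the sources; forward tracing inverts the edges and does the same.

```python
from collections import defaultdict, deque
from typing import Dict, Set

# One-step lineage: element -> elements it was directly derived from.
Lineage = Dict[str, Set[str]]

# Toy lineage for the running example; all identifiers are made up.
LINEAGE: Lineage = {
    "ItemVolumes/Cowboy Hat": {"Predict/Amelie", "Predict/Jacques"},
    "Predict/Amelie": {"Dedup/Amelie", "BuyingPatterns"},
    "Predict/Jacques": {"Dedup/Jacques", "BuyingPatterns"},
    "Dedup/Amelie": {"CustList1/Amelie", "CustList2/Amelie"},
    "Dedup/Jacques": {"CustList1/Jacques"},
}


def backward_trace(lineage: Lineage, start: str) -> Set[str]:
    """Every element that `start` transitively derives from (breadth-first)."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        for src in lineage.get(queue.popleft(), set()):
            if src not in seen:
                seen.add(src)
                queue.append(src)
    return seen


def forward_trace(lineage: Lineage, start: str) -> Set[str]:
    """Every element that transitively derives from `start`: invert the
    edges, then reuse backward tracing on the inverted graph."""
    consumers: Dict[str, Set[str]] = defaultdict(set)
    for out, srcs in lineage.items():
        for src in srcs:
            consumers[src].add(out)
    return backward_trace(consumers, start)
```

For instance, `backward_trace(LINEAGE, "ItemVolumes/Cowboy Hat")` reaches the customer-list records behind the cowboy-hat record, and `forward_trace(LINEAGE, "CustList2/Amelie")` returns the Dedup, Predict, and ItemVolumes elements that this customer record fed.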

SLIDE 14

Provenance Operations

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Examples of additional functionality

― Forward propagation

  • Update all affected predictions after customers have moved from France to Texas
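
Forward propagation can reuse forward tracing to limit work to the affected outputs. In this sketch, Predict is assumed to be a simple lookup from a customer's region to an item (the `BUYING_PATTERNS` table is invented) and ItemAgg a count per item. When customers move, only the counts of the items they contributed to before and after the move are adjusted, instead of rerunning the whole pipeline.

```python
from collections import Counter
from typing import Dict

# Hypothetical buying patterns: customer region -> predicted purchase.
BUYING_PATTERNS = {"France": "Beret", "Texas": "Cowboy Hat"}


def predict(region: str) -> str:
    return BUYING_PATTERNS[region]


def item_volumes(customers: Dict[str, str]) -> Counter:
    """Full (non-incremental) run of Predict + ItemAgg: predictions per item."""
    return Counter(predict(region) for region in customers.values())


def forward_propagate(volumes: Counter, customers: Dict[str, str],
                      updates: Dict[str, str]) -> None:
    """Push customer updates forward, touching only affected ItemVolumes entries.

    Forward tracing tells us which prediction (and hence which item count)
    each updated customer fed; only those counts are adjusted, in place.
    Assumes every updated customer already exists (no inserts or deletes).
    """
    for name, new_region in updates.items():
        old_item = predict(customers[name])
        new_item = predict(new_region)
        customers[name] = new_region
        if old_item != new_item:
            volumes[old_item] -= 1
            if volumes[old_item] == 0:
                del volumes[old_item]
            volumes[new_item] += 1
```

After the update, `volumes` matches what a full rerun of `item_volumes` on the updated customers would produce; supporting inserted or deleted customers would need extra cases that the sketch omits.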

SLIDE 15

Provenance Operations

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Examples of additional functionality

― Refresh ≈ Backward tracing + forward propagation

  • Get the latest predicted volume for cowboy hat sales (only), using the latest customer lists and buying patterns
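
For the cowboy-hat example, refresh can be read as: trace backward from one output to the inputs that could contribute to it under the latest data, then push only those inputs forward. The sketch below makes that concrete for a single count, assuming Predict is a lookup from a customer's region to an item and ItemAgg a count per item (both hypothetical simplifications, as is the function name `refresh_item`). It is one simple instance, not a general refresh algorithm.

```python
from typing import Dict, List


def refresh_item(item: str, customers: Dict[str, str],
                 patterns: Dict[str, str]) -> int:
    """Recompute the predicted volume of one item from the latest inputs only.

    customers: latest customer name -> region (hypothetical layout)
    patterns:  latest region -> predicted item (hypothetical layout)
    """
    # Backward step: which regions, and hence which customers, can feed
    # `item` under the latest buying patterns.
    regions = {region for region, predicted in patterns.items() if predicted == item}
    contributors: List[str] = [name for name, region in customers.items()
                               if region in regions]
    # Forward step: ItemAgg over just those customers is a count.
    return len(contributors)
```

As written, the backward step scans every customer; an index on region would let it touch only the relevant ones, which is where refresh saves work over a full rerun.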

SLIDE 16

Provenance Queries

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Examples

― How many people from each country contributed to the cowboy hat prediction?
― Which customer list contributed the most to the top 100 predicted items?
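
The first query mixes provenance (which customer records led to the cowboy-hat prediction) with ordinary data (each customer's country). A minimal sketch, assuming lineage is a dictionary of one-step sources and customer records are keyed by made-up ids; `contributors_by_country` is a hypothetical helper, and a provenance query language would state the same thing declaratively.

```python
from collections import Counter
from typing import Dict, Set


def contributors_by_country(output: str, lineage: Dict[str, Set[str]],
                            customers: Dict[str, Dict[str, str]]) -> Counter:
    """Count, per country, the customer records that `output` traces back to.

    lineage:   element -> ids it was directly derived from (one step)
    customers: customer record id -> attributes, including "country"
    """
    # Provenance part: transitive backward trace from the output.
    seen: Set[str] = set()
    stack = [output]
    while stack:
        for src in lineage.get(stack.pop(), set()):
            if src not in seen:
                seen.add(src)
                stack.append(src)
    # Data part: keep traced ids that are customer records, group by country.
    return Counter(customers[rid]["country"] for rid in seen if rid in customers)
```

Run against the erroneous data from the running example, a query like this would report the contributors' countries and could surface the Paris, France versus Paris, Texas mix-up directly.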

SLIDE 17

Provenance Queries

  • Examples

― How many people from each country contributed to the cowboy hat prediction?
― Which customer list contributed the most to the top 100 predicted items?

  • Seamlessly combine provenance and data
  • Compact and intuitive language
  • Amenable to optimization

SLIDE 18

System and Other Issues

  • Query-driven provenance capture
  • Eager vs. lazy computation and storage
  • Fine-grained vs. coarse-grained
  • Approximate provenance
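
The eager versus lazy trade-off is easiest to see for a single selection node. In this hypothetical sketch, `select_eager` stores each output's input position while the node runs; `trace_lazy` stores nothing and re-derives the same answer at query time by rescanning the input, which only works if that input has not changed since the run.

```python
from typing import Callable, Dict, List, Set, Tuple

Row = Dict[str, str]
Predicate = Callable[[Row], bool]


def select_eager(rows: List[Row], pred: Predicate) -> Tuple[List[Row], Dict[int, Set[int]]]:
    """Eager: while filtering, store output position -> input positions."""
    out: List[Row] = []
    lineage: Dict[int, Set[int]] = {}
    for i, row in enumerate(rows):
        if pred(row):
            lineage[len(out)] = {i}
            out.append(row)
    return out, lineage


def trace_lazy(rows: List[Row], pred: Predicate, out_pos: int) -> Set[int]:
    """Lazy: nothing was stored at run time; rescan the (unchanged) input and
    return the position of the out_pos-th row that passes the predicate."""
    matches = [i for i, row in enumerate(rows) if pred(row)]
    return {matches[out_pos]}
```

Eager capture pays in run-time overhead and storage for every output, whether or not it is ever traced; lazy capture pays only when a trace is requested, but must keep (or be able to reconstruct) the original input.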

SLIDE 19

Current Research

  • Building up basic system infrastructure
  • Refresh

― Efficiently compute the up-to-date value of selected output elements

  • Theoretical challenges

― Optimizing provenance storage vs. recomputation

SLIDE 20

System Infrastructure

  • Handles structured relational operations as well as arbitrary Python processing nodes

  • Arbitrary acyclic transformation graphs
  • Backward tracing and forward propagation
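
Running an arbitrary acyclic graph of nodes reduces to executing them in a topological order. The sketch uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+), which raises `CycleError` for cyclic graphs. The helper `run_workflow`, the wiring in `EXAMPLE_DEPS`, and the node bodies in `EXAMPLE_NODES` are hypothetical stand-ins for the sales pipeline; the extracted slides do not show its actual edges.

```python
from collections import Counter
from graphlib import CycleError, TopologicalSorter
from typing import Any, Callable, Dict, List


def run_workflow(deps: Dict[str, List[str]],
                 nodes: Dict[str, Callable[..., Any]]) -> Dict[str, Any]:
    """Run each node after all of its inputs, in a topological order.

    deps maps a node to the nodes whose outputs it consumes, in argument
    order. Raises graphlib.CycleError if the graph is not acyclic.
    """
    results: Dict[str, Any] = {}
    for name in TopologicalSorter(deps).static_order():
        results[name] = nodes[name](*(results[d] for d in deps.get(name, [])))
    return results


# Hypothetical wiring and node bodies, loosely modeled on the sales pipeline.
EXAMPLE_DEPS = {
    "Union": ["CustList1", "CustList2"],
    "Dedup": ["Union"],
    "Predict": ["Dedup", "BuyingPatterns"],
    "ItemAgg": ["Predict"],
}
EXAMPLE_NODES: Dict[str, Callable[..., Any]] = {
    "CustList1": lambda: ["Amelie", "Jacques"],
    "CustList2": lambda: ["Amelie", "Isabelle"],
    "BuyingPatterns": lambda: {"France": "Beret"},
    "Union": lambda a, b: a + b,
    "Dedup": lambda names: sorted(set(names)),
    # Every customer is assumed to be in France in this toy example.
    "Predict": lambda names, pats: [(n, pats["France"]) for n in names],
    "ItemAgg": lambda preds: Counter(item for _, item in preds),
}
```

Because every node sees only its declared inputs, the same dependency map can double as coarse-grained provenance: each node's output derives from the outputs listed in `deps`.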

SLIDE 21

Refresh

  • Problem

― Efficiently compute the up-to-date value of selected output elements

  • Challenges

― Formally defining the refresh problem
― Understanding when refresh can be done efficiently
― Supporting a wide class of transformations and workflows

SLIDE 22

Future Work

  • Most everything in this talk 

SLIDE 23

Parag Agrawal, Abhijeet Mohapatra, Raghotham Murthy, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Semih Salihoglu

SLIDE 24

Extra Slides

SLIDE 25

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with the output labeled O]

SLIDE 26

PANDA

SLIDE 27

Robert Ikeda Jennifer Widom

Stanford University

SLIDE 28

Panda’s Niche

Past work:

  • 1. Data-based or process-based
  • 2. Modeling and capturing provenance
  • 3. Specific application domains

Panda:

  • 1. Merge data-based and process-based
  • 2. Provenance operators and queries
  • 3. General-purpose

SLIDE 29

Overview of Past Work

  • 1. Data-based or process-based
  • 2. Modeling and capturing provenance
  • 3. Specific application domains

SLIDE 30

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with the output labeled O and the callout "Paris, France? Paris, Texas!"]

SLIDE 31

Running Example

[Figure: Pipeline for Sales Prediction, as on Slide 2, with the output labeled O]

SLIDE 32

Provenance Capture

  • Processing Nodes

― Relational operations
― Opaque processing

  • Requirements

― Interface
― Model

SLIDE 33

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with the callout "Paris, France? Paris, Texas!"]

SLIDE 34

Processing Nodes

  • Relational Operations

― Relational operations
― Opaque processing

  • Opaque Processing

― Interface
― Model

SLIDE 35

Provenance Queries

  • Operate over provenance and data
  • Compact and intuitive
  • Amenable to efficient planning

Considering only customers from a specific list, which items are in the highest demand?
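
This query filters outputs by their provenance before ranking them. The sketch assumes each prediction's lineage lists the customer-list records it came from, written with a made-up `"<list name>/<record>"` convention; `demand_from_list` is a hypothetical helper. A prediction counts toward the chosen list if any of its sources belongs to that list.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def demand_from_list(predictions: List[Tuple[str, str]],
                     lineage: Dict[str, Set[str]],
                     source_list: str) -> List[Tuple[str, int]]:
    """Rank items by demand, counting only predictions with a source in `source_list`.

    predictions: (prediction id, predicted item) pairs
    lineage:     prediction id -> contributing records, as "<list name>/<record>"
    """
    def from_list(pid: str) -> bool:
        return any(src.split("/", 1)[0] == source_list
                   for src in lineage.get(pid, set()))

    counts = Counter(item for pid, item in predictions if from_list(pid))
    return counts.most_common()  # highest demand first
```

The "any source" rule is one choice; requiring all of a prediction's sources to come from the list would answer a stricter version of the question.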

SLIDE 36

Provenance Queries

  • Seamlessly combine provenance and data
  • Compact and intuitive language
  • Amenable to optimization

SLIDE 37

Provenance Query Examples

  • How many people from each country contributed to the cowboy hat prediction?

  • Which customer list contributed the most to the top 100 predicted items?
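
The second query combines ranking, backward tracing, and grouping. In this sketch, lineage maps each item to its contributing customer-list records (with a made-up `"<list name>/<record>"` convention), `top_contributing_list` is a hypothetical helper, and "contributed the most" is interpreted as the number of contributing records; weighting by volume would be an equally reasonable reading.

```python
from collections import Counter
from typing import Dict, Optional, Set


def top_contributing_list(volumes: Dict[str, int], lineage: Dict[str, Set[str]],
                          k: int = 100) -> Optional[str]:
    """Customer list with the most records contributing to the top-k items.

    volumes: item -> predicted volume
    lineage: item -> contributing customer-list records, as "<list name>/<record>"
    Returns None if the top-k items have no recorded contributors.
    """
    top_items = sorted(volumes, key=lambda item: volumes[item], reverse=True)[:k]
    per_list = Counter(src.split("/", 1)[0]
                       for item in top_items for src in lineage.get(item, set()))
    best = per_list.most_common(1)
    return best[0][0] if best else None
```

Ties between lists are broken arbitrarily here; a real query language would need to define that behavior.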

SLIDE 38

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Item        Demand
Cowboy Hat  3

Name      Item
Amelie    Cowboy Hat
Jacques   Cowboy Hat
Isabelle  Cowboy Hat

Name      Address
Amelie    …Paris, TX
Jacques   …Paris, TX
Isabelle  …Paris, TX

Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20 Rue D'orsel, Paris

SLIDE 39

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20 Rue D'orsel, Paris

SLIDE 40

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20 Rue D'orsel, Paris

SLIDE 41

Running Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20 Rue D'orsel, Paris

Item   Demand
Beret  3

SLIDE 42

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2, with the output labeled O]

Relational Nodes: Structured, well-understood operations

SLIDE 43

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2, with the output labeled O]

Opaque Nodes

SLIDE 44

Predicted Uses

  • Explanation

― How was data derived?

  • Verification

― Is data erroneous or outdated?

  • Recomputation

― Can data be recomputed efficiently?
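
For the verification use, one simple check for outdated data is to record the version of each source at derivation time, alongside the provenance, and compare it with the current version. The version numbers and the helper names `stale_sources` and `is_outdated` are assumptions for illustration; this detects staleness only, not erroneous values.

```python
from typing import Dict, Set


def stale_sources(recorded: Dict[str, int], current: Dict[str, int]) -> Set[str]:
    """Sources whose version differs from (or is missing compared with) the
    version recorded when the output was derived.

    recorded: source -> version seen at derivation time (assumed to be
              captured alongside the output's provenance)
    current:  source -> latest version
    """
    return {src for src, version in recorded.items() if current.get(src) != version}


def is_outdated(recorded: Dict[str, int], current: Dict[str, int]) -> bool:
    """An output is outdated if any source it was derived from has changed."""
    return bool(stale_sources(recorded, current))
```

The recorded sources would typically come from a backward trace of the output, and the stale ones point at exactly which inputs a refresh would need to revisit.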

SLIDE 45

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2, with the output labeled O]

Relational nodes: structured, well-understood operations

SLIDE 46

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2, with the output labeled O]

Opaque nodes

SLIDE 47

Provenance Operations

  • Basic operations

― Backward tracing

  • Where did the cowboy-hat record come from?

― Forward tracing

  • Which predictions did this customer contribute to?
  • Examples of additional functionality

― Forward propagation

  • Update all affected predictions after customers move from France to Texas

― Refresh ≈ Backward tracing + forward propagation

  • Update only the cowboy hat record given updated customer lists

SLIDE 48

Provenance Operations

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Examples of additional functionality

― Forward propagation

  • Update all affected predictions after customers move from France to Texas

― Refresh ≈ Backward tracing + forward propagation

  • Update only the cowboy hat record given updated customer lists

SLIDE 49

Provenance Operations

  • Basic operations

― Backward tracing

  • Where did the cowboy-hat record come from?

― Forward tracing

  • Which predictions did this customer contribute to?
  • Examples of additional functionality

― Forward propagation

  • Update all affected predictions after customers move from France to Texas

― Refresh ≈ Backward tracing + forward propagation

  • Update only the cowboy hat record given updated customer lists

SLIDE 50

Provenance Queries

[Figure: the sales-prediction pipeline, as on Slide 2]

  • Seamlessly combine provenance and data
  • Compact and intuitive language
  • Amenable to optimization
  • Examples:

― How many people from each country contributed to the cowboy hat prediction?
― Which customer list contributed the most to the top 100 predicted items?

SLIDE 51

Provenance Queries

  • Examples:

― How many people from each country contributed to the cowboy hat prediction?
― Which customer list contributed the most to the top 100 predicted items?

  • Seamlessly combine provenance and data
  • Compact and intuitive language
  • Amenable to optimization

SLIDE 52

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2]

Relational nodes: structured, well-understood operations

SLIDE 53

Processing Nodes

[Figure: the sales-prediction pipeline, as on Slide 2]

Opaque nodes

SLIDE 54

Example

[Figure: the sales-prediction pipeline, as on Slide 2, with these records called out]

Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20 Rue D'orsel, Paris

Item   Demand
Beret  3

X