Robert Ikeda, Jennifer Widom
Stanford University
Example
[Figure. Inputs: customer lists CustList1 … CustListn (Europe, USA), ClothCo Items, Buying Patterns. Processing nodes: Dedup, Union, Predict, ItemAgg. Output: ItemVolumes.]
Pipeline for sales predictions
Item        Demand
Cowboy Hat  3
?
Name      Item
Amelie    Cowboy Hat
Jacques   Cowboy Hat
Isabelle  Cowboy Hat
?
Name      Address
Amelie    …Paris, TX
Jacques   …Paris, TX
Isabelle  …Paris, TX
Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20, rue d'Orsel, Paris
X (Paris, France, not Paris, TX)
Name      Address
Amelie    65, quai d'Orsay, Paris, France
Jacques   39, rue de Bretagne, Paris, France
Isabelle  20, rue d'Orsel, Paris, France
Item   Demand
Beret  3
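The walkthrough above can be mimicked in a few lines. This is a minimal Python sketch under stated assumptions, not Panda code: `BUYING_PATTERNS` is a hypothetical country-to-item rule standing in for the Buying Patterns input, and `predict` and `item_agg` are illustrative stand-ins for the Predict and ItemAgg nodes. It only shows how resolving the three Paris customers to the USA yields a cowboy-hat volume instead of a beret volume.

```python
from collections import Counter

# Hypothetical stand-in for the Buying Patterns input: country -> item.
BUYING_PATTERNS = {"USA": "Cowboy Hat", "France": "Beret"}


def predict(customers):
    """Toy Predict node: one predicted item per (name, country) customer."""
    return [(name, BUYING_PATTERNS[country]) for name, country in customers]


def item_agg(predictions):
    """Toy ItemAgg node: number of customers predicted to buy each item."""
    return dict(Counter(item for _, item in predictions))


# Customers as they reached Predict, with "Paris" resolved to Paris, TX.
as_resolved = [("Amelie", "USA"), ("Jacques", "USA"), ("Isabelle", "USA")]
# The same customers with their actual country.
actual = [("Amelie", "France"), ("Jacques", "France"), ("Isabelle", "France")]

print(item_agg(predict(as_resolved)))  # {'Cowboy Hat': 3}
print(item_agg(predict(actual)))       # {'Beret': 3}
```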
Panda
Past work tends to be…
- 1. Either data-based or process-based
Panda: captures both (“data-oriented workflows”)
- 2. Focused on modeling and capturing provenance
Panda: also provides provenance operators and queries
- 3. Specific to particular application domains
Panda: general-purpose
Remainder of Talk
- Processing nodes and provenance capture
- Provenance operations
- Provenance queries
- System and other issues
- Current research
Processing Nodes
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
- Relational nodes: structured, well-understood operations
- Opaque nodes
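One way to see the difference between the two kinds of node is how much provenance the system can infer on its own. A minimal Python sketch, assuming hypothetical helper names (`select_with_lineage`, `opaque_with_lineage`) and toy customer rows: a relational selection yields exact row-level lineage, while an opaque node that reports nothing can only be assumed to depend on all of its inputs.

```python
def select_with_lineage(rows, predicate):
    """Relational selection: each output row derives from exactly one input row."""
    out = []
    for i, row in enumerate(rows):
        if predicate(row):
            out.append((row, [i]))  # lineage = index of the single source row
    return out


def opaque_with_lineage(rows, fn):
    """Opaque node: with no help from the node, the safe assumption is that
    every output depends on every input (coarse-grained lineage)."""
    all_inputs = list(range(len(rows)))
    return [(row, all_inputs) for row in fn(rows)]


# Hypothetical toy rows: (name, city).
customers = [("Amelie", "Paris"), ("Pierre", "Lyon"), ("Isabelle", "Paris")]

print(select_with_lineage(customers, lambda r: r[1] == "Paris"))
# [(('Amelie', 'Paris'), [0]), (('Isabelle', 'Paris'), [2])]
print(opaque_with_lineage(customers, lambda rs: [len(rs)]))
# [(3, [0, 1, 2])]
```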
Provenance Capture
- Model
― Likely to be similar to the Open Provenance Model
― Support provenance at a variety of granularities
- Interface
― Allow processing nodes to create and manipulate provenance
― For relational operations, can plug in existing provenance work
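A capture interface of this kind might look roughly like the following. This is an illustrative sketch only: `ProvenanceRecorder`, its `record` method, and element ids such as `"l1:7"` are hypothetical names, not Panda's interface. An opaque deduplication node uses the recorder to report which input records each of its outputs came from.

```python
from collections import defaultdict


class ProvenanceRecorder:
    """Hypothetical capture interface that a processing node can call."""

    def __init__(self):
        # output element id -> ids of the input elements it was derived from
        self.links = defaultdict(set)

    def record(self, output_id, input_ids):
        self.links[output_id].update(input_ids)


def dedup(records, prov):
    """Opaque node: merge records with the same name, reporting provenance."""
    outputs = {}
    for rec_id, name in records:
        out_id = outputs.setdefault(name, f"out:{name}")
        prov.record(out_id, [rec_id])  # this output came (partly) from rec_id
    return sorted(outputs.values())


prov = ProvenanceRecorder()
result = dedup([("l1:7", "Amelie"), ("l2:3", "Amelie"), ("l2:4", "Jacques")], prov)
print(result)                            # ['out:Amelie', 'out:Jacques']
print(sorted(prov.links["out:Amelie"]))  # ['l1:7', 'l2:3']
```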
Provenance Operations
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
- Basic operations
― Backward tracing
- Where did the cowboy-hat record come from?
― Forward tracing
- Which predictions did this customer contribute to?
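Both basic operations amount to graph reachability over stored element-level lineage. A minimal sketch, assuming a hypothetical `derived_from` map from each element to its immediate sources: backward tracing follows those edges, and forward tracing follows the inverted edges.

```python
from collections import defaultdict, deque

# Hypothetical element-level lineage: element -> elements it was derived from.
derived_from = {
    "vol:Cowboy Hat": ["pred:Amelie", "pred:Jacques", "pred:Isabelle"],
    "pred:Amelie": ["cust:Amelie"],
    "pred:Jacques": ["cust:Jacques"],
    "pred:Isabelle": ["cust:Isabelle"],
    "cust:Amelie": ["list1:Amelie"],
    "cust:Jacques": ["list1:Jacques"],
    "cust:Isabelle": ["list2:Isabelle"],
}

# Inverted edges: element -> elements it contributed to.
contributed_to = defaultdict(list)
for out, srcs in derived_from.items():
    for src in srcs:
        contributed_to[src].append(out)


def reachable(start, edges):
    """Breadth-first search: every element reachable from `start` via `edges`."""
    seen, queue = set(), deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def backward_trace(element):
    return reachable(element, derived_from)


def forward_trace(element):
    return reachable(element, contributed_to)


# Where did the cowboy-hat record come from? (source-list elements only)
print(sorted(e for e in backward_trace("vol:Cowboy Hat") if e.startswith("list")))
# ['list1:Amelie', 'list1:Jacques', 'list2:Isabelle']
# Which outputs did Isabelle's source record contribute to?
print(sorted(forward_trace("list2:Isabelle")))
# ['cust:Isabelle', 'pred:Isabelle', 'vol:Cowboy Hat']
```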
- Examples of additional functionality
― Forward propagation
- Update all affected predictions after customers
have moved from France to Texas
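Forward propagation can be sketched as recomputing, in topological order, only the derived elements downstream of a change. This assumes a hypothetical `derived` table recording each element's sources and a recompute function, plus a toy `PATTERNS` rule for Predict; it uses Python's standard `graphlib` module (Python 3.9+) for the ordering.

```python
from graphlib import TopologicalSorter

PATTERNS = {"France": "Beret", "USA": "Cowboy Hat"}  # hypothetical Buying Patterns
PREDS = ["pred:Amelie", "pred:Jacques"]

values = {"cust:Amelie": "France", "cust:Jacques": "France"}  # base data
# derived element -> (source elements, function of the sources' values)
derived = {
    "pred:Amelie": (["cust:Amelie"], lambda c: PATTERNS[c]),
    "pred:Jacques": (["cust:Jacques"], lambda c: PATTERNS[c]),
    "vol:Beret": (PREDS, lambda *ps: ps.count("Beret")),
    "vol:Cowboy Hat": (PREDS, lambda *ps: ps.count("Cowboy Hat")),
}
# Topological order over elements: sources always come before what they feed.
order = list(TopologicalSorter(
    {elem: srcs for elem, (srcs, _) in derived.items()}).static_order())


def recompute(elem):
    srcs, fn = derived[elem]
    values[elem] = fn(*(values[s] for s in srcs))


for elem in order:  # initial full run
    if elem in derived:
        recompute(elem)


def forward_propagate(changes):
    """Apply base-data changes, then recompute only the affected derived elements."""
    values.update(changes)
    dirty, recomputed = set(changes), []
    for elem in order:
        if elem in derived and dirty.intersection(derived[elem][0]):
            recompute(elem)
            dirty.add(elem)
            recomputed.append(elem)
    return recomputed


changed = forward_propagate({"cust:Amelie": "USA"})  # Amelie moved to Texas
print(sorted(changed))  # ['pred:Amelie', 'vol:Beret', 'vol:Cowboy Hat']
print(values["vol:Beret"], values["vol:Cowboy Hat"])  # 1 1
```

Note that `pred:Jacques` is never recomputed, because nothing it depends on changed.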
― Refresh ≈ Backward tracing + forward propagation
- Get latest predicted volume for cowboy hat sales
(only) using latest customer lists and buying patterns
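Refresh combines the two operations: trace backward from the selected output, then recompute only that subgraph from the latest inputs. A minimal sketch under similar assumptions (hypothetical `base`, `stored`, and `derived` tables, a toy `PATTERNS` rule, Python 3.9+ for `graphlib`); it is not Panda's refresh algorithm. Outputs that were not selected, here `vol:Beret`, are deliberately left stale.

```python
from graphlib import TopologicalSorter

PATTERNS = {"France": "Beret", "USA": "Cowboy Hat"}  # hypothetical Buying Patterns
PREDS = ["pred:Amelie", "pred:Isabelle"]

# Latest base data: Isabelle's record now says France.
base = {"cust:Amelie": "USA", "cust:Isabelle": "France"}
derived = {
    "pred:Amelie": (["cust:Amelie"], lambda c: PATTERNS[c]),
    "pred:Isabelle": (["cust:Isabelle"], lambda c: PATTERNS[c]),
    "vol:Cowboy Hat": (PREDS, lambda *ps: ps.count("Cowboy Hat")),
    "vol:Beret": (PREDS, lambda *ps: ps.count("Beret")),
}
# Stored (stale) derived values from the last run, when Isabelle was in the USA.
stored = {"pred:Amelie": "Cowboy Hat", "pred:Isabelle": "Cowboy Hat",
          "vol:Cowboy Hat": 2, "vol:Beret": 0}


def refresh(target):
    """Bring only `target` (and the derived elements it depends on) up to date."""
    needed, stack = set(), [target]
    while stack:  # backward tracing: collect derived ancestors of target
        elem = stack.pop()
        if elem in derived and elem not in needed:
            needed.add(elem)
            stack.extend(derived[elem][0])
    graph = {elem: derived[elem][0] for elem in needed}
    for elem in TopologicalSorter(graph).static_order():  # forward propagation
        if elem in needed:
            srcs, fn = derived[elem]
            stored[elem] = fn(*(base[s] if s in base else stored[s] for s in srcs))
    return stored[target]


print(refresh("vol:Cowboy Hat"))  # 1
print(stored["vol:Beret"])        # 0 (not selected, so still stale)
```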
Provenance Queries
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
- Examples
― How many people from each country contributed to the cowboy hat prediction?
― Which customer list contributed the most to the top 100 predicted items?
- Seamlessly combine provenance and data
- Compact and intuitive language
- Amenable to optimization
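The first example query joins provenance with ordinary data. A minimal sketch, assuming a hypothetical `lineage` map that holds the result of a backward trace and a `customers` table with each contributor's resolved country; the query itself is then just a group-by over the traced records.

```python
from collections import Counter

# Hypothetical result of backward-tracing the cowboy-hat prediction to customers.
lineage = {"vol:Cowboy Hat": ["c1", "c2", "c3"]}
# Ordinary data about those customers, including their resolved country.
customers = {
    "c1": {"name": "Amelie", "country": "USA"},
    "c2": {"name": "Jacques", "country": "USA"},
    "c3": {"name": "Isabelle", "country": "USA"},
}


def contributors_by_country(output):
    """Count the customers behind `output`, grouped by country."""
    return Counter(customers[c]["country"] for c in lineage[output])


print(contributors_by_country("vol:Cowboy Hat"))  # Counter({'USA': 3})
```

In the running example, an answer like this one (every contributor placed in the USA) is exactly the kind of result that prompts a closer look at the Paris addresses.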
System and Other Issues
- Query-driven provenance capture
- Eager vs. lazy computation and storage
- Fine-grained vs. coarse-grained
- Approximate provenance
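The eager vs. lazy trade-off is easiest to see on an aggregation node. In this hypothetical sketch, the eager variant of an ItemAgg-like node stores each item's contributing prediction ids at run time, while the lazy variant stores nothing and rederives one item's lineage on demand by rescanning the inputs. The lazy path is only valid if those inputs have not changed since the run; the function names and data are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical Predict output: (prediction id, customer, item).
predictions = [("p1", "Amelie", "Cowboy Hat"), ("p2", "Jacques", "Cowboy Hat"),
               ("p3", "Isabelle", "Beret")]


def item_agg_eager(rows):
    """Eager: compute volumes and store every output's lineage at run time."""
    volumes, lineage = Counter(), defaultdict(list)
    for pid, _, item in rows:
        volumes[item] += 1
        lineage[item].append(pid)
    return dict(volumes), dict(lineage)


def item_agg_lazy(rows):
    """Lazy: compute volumes only; no lineage is stored."""
    return dict(Counter(item for _, _, item in rows))


def lineage_on_demand(rows, item):
    """Rederive one output's lineage at query time by rescanning the inputs.
    Correct only if the inputs have not changed since the node ran."""
    return [pid for pid, _, it in rows if it == item]


volumes, stored_lineage = item_agg_eager(predictions)
print(volumes == item_agg_lazy(predictions))         # True
print(stored_lineage["Cowboy Hat"])                  # ['p1', 'p2']
print(lineage_on_demand(predictions, "Cowboy Hat"))  # ['p1', 'p2']
```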
Current Research
- Building up basic system infrastructure
- Refresh
― Efficiently compute the up-to-date value of selected output elements
- Theoretical challenges
― Optimizing provenance storage vs. recomputation
System Infrastructure
- Handles structured relational operations as well
as arbitrary Python processing nodes
- Arbitrary acyclic transformation graphs
- Backward tracing and forward propagation
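Executing an arbitrary acyclic graph of Python nodes mostly reduces to a topological sort. A minimal sketch using the standard `graphlib.TopologicalSorter` (Python 3.9+); the `workflow` table, the node functions, and the node-level provenance record are hypothetical, not Panda's actual infrastructure.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: node -> (input nodes, Python function over their outputs).
workflow = {
    "Dedup": (["CustList1"], lambda rows: sorted(set(rows))),
    "Union": (["Dedup", "CustList2"], lambda a, b: a + b),
    "Count": (["Union"], len),
}
sources = {"CustList1": ["Amelie", "Amelie", "Jacques"], "CustList2": ["Isabelle"]}


def run(workflow, sources):
    """Run nodes in topological order, recording which nodes fed each one.
    static_order() raises graphlib.CycleError if the graph is not acyclic."""
    graph = {node: deps for node, (deps, _) in workflow.items()}
    results, provenance = dict(sources), {}
    for node in TopologicalSorter(graph).static_order():
        if node in workflow:
            deps, fn = workflow[node]
            results[node] = fn(*(results[d] for d in deps))
            provenance[node] = list(deps)  # coarse-grained, node-level lineage
    return results, provenance


results, provenance = run(workflow, sources)
print(results["Union"], results["Count"])  # ['Amelie', 'Jacques', 'Isabelle'] 3
print(provenance["Union"])                 # ['Dedup', 'CustList2']
```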
Refresh
- Problem
― Efficiently compute the up-to-date value of selected output elements
- Challenges
― Formally defining the refresh problem
― Understanding when refresh can be done efficiently
― Supporting a wide class of transformations and workflows
Future Work
- Almost everything in this talk
Parag Agrawal, Abhijeet Mohapatra, Raghotham Murthy, Aditya Parameswaran, Hyunjung Park, Alkis Polyzotis, Semih Salihoglu
Extra Slides
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg)]
Panda’s Niche
Past work:
- 1. Data-based or process-based
- 2. Modeling and capturing provenance
- 3. Specific application domains
Panda:
- 1. Merge data-based and process-based
- 2. Provenance operators and queries
- 3. General-purpose
Overview of Past Work
- 1. Data-based or process-based
- 2. Modeling and capturing provenance
- 3. Specific application domains
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg)]
Paris, France? Paris, Texas!
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg)]
Pipeline for Sales Prediction
Provenance Capture
- Processing Nodes
― Relational operations
― Opaque processing
- Requirements
― Interface
― Model
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
Paris, France? Paris, Texas!
Processing Nodes
- Relational Operations
― Relational operations
― Opaque processing
- Opaque Processing
― Interface
― Model
Provenance Queries
- Operate over provenance and data
- Compact and intuitive
- Amenable to efficient planning
Considering only customers from a specific list, which items are in the highest demand?
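This question restricts an ordinary aggregation by provenance. A minimal sketch, assuming hypothetical per-customer predictions already tagged with the customer list each customer traces back to (the kind of fact a backward trace to the inputs would supply); the names and data are illustrative only.

```python
from collections import Counter

# Hypothetical per-customer predictions: (customer, predicted item, source list).
predictions = [
    ("Amelie", "Beret", "CustList1"),
    ("Jacques", "Beret", "CustList1"),
    ("Isabelle", "Scarf", "CustList1"),
    ("Walt", "Cowboy Hat", "CustList2"),
]


def top_items_from(source_list, k=1):
    """Demand per item, counting only customers that trace back to source_list."""
    demand = Counter(item for _, item, src in predictions if src == source_list)
    return demand.most_common(k)


print(top_items_from("CustList1"))  # [('Beret', 2)]
```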
Provenance Query Examples
- How many people from each country contributed
to the cowboy hat prediction?
- Which customer list contributed the most to the
top 100 predicted items?
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
Item        Demand
Cowboy Hat  3
Name      Item
Amelie    Cowboy Hat
Jacques   Cowboy Hat
Isabelle  Cowboy Hat
Name      Address
Amelie    …Paris, TX
Jacques   …Paris, TX
Isabelle  …Paris, TX
Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20, rue d'Orsel, Paris
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20, rue d'Orsel, Paris
Running Example
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg, ItemVolumes)]
Name      Address
Amelie    65, quai d'Orsay, Paris
Jacques   39, rue de Bretagne, Paris
Isabelle  20, rue d'Orsel, Paris
Item   Demand
Beret  3
Processing Nodes
[Figure: sales-prediction pipeline (customer lists, Dedup, Union, Predict, ItemAgg)]
Relational Nodes: Structured, well-understood operations
Opaque Nodes
Predicted Uses
- Explanation
― How was data derived?
- Verification
― Is data erroneous or outdated?
- Recomputation
― Can data be recomputed efficiently?
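For verification, one simple approach (an assumption here, not a mechanism described in the talk) is to record the version of each input an output was derived from, and flag the output as outdated when any of those versions has since changed. `current_versions`, `provenance`, and `outdated_inputs` are hypothetical names.

```python
# Hypothetical current version of each input...
current_versions = {"CustList1": 4, "CustList2": 7, "BuyingPatterns": 2}
# ...and, per output, the versions of the inputs it was derived from.
provenance = {
    "vol:Cowboy Hat": {"CustList1": 3, "BuyingPatterns": 2},
    "vol:Beret": {"CustList2": 7, "BuyingPatterns": 2},
}


def outdated_inputs(output):
    """Inputs of `output` that have changed since `output` was computed."""
    return sorted(src for src, used in provenance[output].items()
                  if current_versions[src] != used)


print(outdated_inputs("vol:Cowboy Hat"))  # ['CustList1']
print(outdated_inputs("vol:Beret"))       # []
```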
Provenance Operations
- Basic operations
― Backward tracing
- Where did the cowboy-hat record come from?
― Forward tracing
- Which predictions did this customer contribute to?
- Examples of additional functionality
― Forward propagation
- Update all affected predictions after customers
move from France to Texas
― Refresh ≈ Backward tracing + forward propagation
- Update only the cowboy hat record given updated
customer lists
Provenance Queries
- Seamlessly combine provenance and data
- Compact and intuitive language
- Amenable to optimization
- Examples:
― How many people from each country contributed to the cowboy hat prediction?
― Which customer list contributed the most to the top 100 predicted items?