Differential Dataflow
McSherry, Frank D., Murray, Derek G., Isaacs, Rebecca, Isard, Michael
Chathura Kankanamge 08th November 2016
Outline
○ Motivation for Differential Dataflow
○ Key Concepts
○ Differential Dataflow in practice
iterations
○ PageRank
○ Connected components
entire state between iterations
times ~ stateless
○ Wordcount in Hadoop Online.
○ Loops
○ New Data
○ Multiple variables are a problem
○ Deals well with multiple variables
gives only a partial ordering
[Figure: the sequence 1 to 5 beside a grid of two-dimensional timestamps from (0, 0) to (2, 2)]
○ Defines how to process partially ordered data.
○ Defines state between iterations
○ Do less calculation per change
○ Converge quicker per iteration
graph
○ Ingress - adds a counter
○ Feedback - increments a counter
○ Egress - removes a counter
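The three loop operators above can be sketched as plain functions on timestamps. This is only an illustration: the tuple representation, the function names, and the choice to start the new counter at 0 (as in the later frames, where t = (0) becomes t = (0, 0)) are assumptions, not an API of any real system.

```python
from typing import Tuple

# Assumption: a timestamp is a tuple of counters, outermost first.
Timestamp = Tuple[int, ...]


def ingress(t: Timestamp) -> Timestamp:
    """Entering a loop: add a new counter, starting at 0."""
    return t + (0,)


def feedback(t: Timestamp) -> Timestamp:
    """Going around the loop once: increment the innermost counter."""
    return t[:-1] + (t[-1] + 1,)


def egress(t: Timestamp) -> Timestamp:
    """Leaving the loop: remove the innermost counter."""
    return t[:-1]


t = ingress((0,))           # (0, 0)
t = feedback(feedback(t))   # (0, 2)
print(t, egress(t))         # (0, 2) (0,)
```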
and time
[Figure: connected components example on nodes 1 to 8. Nodes 6, 7, 8 carry label 6; nodes 1, 2, 3, 4, 5 carry label 1.]
[Figure sequence: one iteration through the Labels, Edges, U, Min and O boxes]
Edges: (1, 3) (3, 1) (4, 3) (3, 4) (4, 2) (2, 4) (2, 5) (5, 2)
Self Labels: (1, 1) (2, 2) (3, 3) (4, 4) (5, 5)
Neighbour Labels: (3, 1) (3, 4) (2, 4) (1, 3) (4, 3) (4, 2) (5, 2) (2, 5)
Result after 1st Iteration: (1, 1) (3, 1) (4, 2) (2, 2) (5, 2)
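A minimal sketch of the iteration pictured above, using the edge and label data from the slide. The function name `propagate_once` and the dictionary representation are illustrative assumptions; the slide itself only shows the data before and after.

```python
def propagate_once(labels, edges):
    """One round of min-label propagation.

    labels: dict mapping node -> current label
    edges:  list of (src, dst) pairs; undirected edges appear in both directions
    """
    # Each edge carries the label of src to dst ("Neighbour Labels").
    neighbour_labels = [(dst, labels[src]) for src, dst in edges]
    # Every node also keeps its own label ("Self Labels").
    self_labels = list(labels.items())

    # Group by node and keep the minimum label seen.
    result = {}
    for node, label in neighbour_labels + self_labels:
        result[node] = min(label, result.get(node, label))
    return result


EDGES = [(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)]
LABELS = {node: node for node in range(1, 6)}

first = propagate_once(LABELS, EDGES)
print(sorted(first.items()))  # [(1, 1), (2, 2), (3, 1), (4, 2), (5, 2)]
```

Repeating the call on its own output until nothing changes gives every node in a connected component the smallest node id in that component.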
[Figure: dataflow graph for connected components. Operators: Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback, Egress; the connections between them are lettered A to J.]
constantly
feedback
into node/label tuples
Change in state at node b at t
Cumulative state at b up to t
Sum of all states at b before t
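The three labels above appear to annotate the difference definition from the Differential Dataflow paper: the change at time t is the cumulative state at t minus the sum of all changes at times strictly before t in the partial order. The sketch below applies that definition to the five-node example from this deck. The helper names (`leq`, `differences`, `propagate_once`), the use of signed multisets, and the choice of the product order on (epoch, iteration) pairs are assumptions made for illustration.

```python
from collections import Counter


def leq(s, t):
    """Product partial order: s <= t iff every coordinate of s is <= that of t."""
    return all(a <= b for a, b in zip(s, t))


def differences(states):
    """Convert cumulative states A[t] into differences dA[t], where
    dA[t] = A[t] - sum of dA[s] over all s strictly before t."""
    diffs = {}
    # Sorting tuples lexicographically visits every earlier s before t.
    for t in sorted(states):
        d = Counter(states[t])
        for s, ds in diffs.items():
            if leq(s, t):
                d.subtract(ds)
        diffs[t] = {k: v for k, v in d.items() if v != 0}
    return diffs


def propagate_once(labels, edges):
    """One synchronous round of min-label propagation."""
    result = dict(labels)
    for src, dst in edges:
        result[dst] = min(result[dst], labels[src])
    return result


BASE = [(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)]
EDGES_BY_EPOCH = {
    0: BASE,
    1: [e for e in BASE if e not in {(4, 2), (2, 4)}],  # edge 4-2 removed
}

# Cumulative (node, label) output at each (epoch, iteration).
states = {}
for epoch, edges in EDGES_BY_EPOCH.items():
    labels = {node: node for node in range(1, 6)}
    for iteration in range(3):
        labels = propagate_once(labels, edges)
        states[(epoch, iteration)] = sorted(labels.items())

diffs = differences(states)
print(diffs[(0, 1)])  # {(4, 1): 1, (4, 2): -1}
print(diffs[(1, 0)])  # {(4, 3): 1, (4, 2): -1}
print(diffs[(1, 1)])  # {(4, 2): 1, (4, 3): -1}
```

Only the tuples that differ appear at each timestamp, so an operator that receives these differences has far less to process than one that receives the full state.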
[Figure: example graph with nodes 1, 2, 3, 4, 5]
Edges, t = (0): (1, 3) (3, 1) (4, 3) (3, 4) (4, 2) (2, 4) (2, 5) (5, 2)
Labels, t = (0): (1, 1) (2, 2) (3, 3) (4, 4) (5, 5)
[Animation frames, t = (0, 0): the inputs pass through the dataflow graph (Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback, Egress)]
Edges, t = (0): (1, 3) (3, 1) (4, 3) (3, 4) (4, 2) (2, 4) (2, 5) (5, 2)
Labels, t = (0, 0): (1, 1) (2, 2) (3, 3) (4, 4) (5, 5)
Neighbour labels, t = (0, 0): (1, 3) (3, 1) (3, 4) (4, 3) (4, 2) (2, 4) (2, 5) (5, 2)
GroupBy + Min output, t = (0, 0): (1, 1) (3, 1) (4, 2) (2, 2) (5, 2)
[Animation frames, t = (0, 1)]
Fed back, t = (0, 1): (1, 1) (3, 1) (4, 2) (2, 2) (5, 2), shown next to the original labels (1, 1) (2, 2) (3, 3) (4, 4) (5, 5)
Label changes, t = (0, 1): (3, 3) replaced by (3, 1), (4, 4) replaced by (4, 2), (5, 5) replaced by (5, 2)
Cumulative Input from concat, then Groupby + Min, t = (0, 1): previous output (1, 1) (3, 1) (4, 2) (2, 2) (5, 2); new output (1, 1) (3, 1) (4, 1) (2, 2) (5, 2)
[Animation frames, t = (0, 2) to t = (0, 4)]
t = (0, 2): the change (4, 2) replaced by (4, 1) circulates; GroupBy + Min emits (2, 2) replaced by (2, 1)
t = (0, 3): the change (2, 2) replaced by (2, 1) circulates; GroupBy + Min emits (5, 2) replaced by (5, 1)
t = (0, 4): the change (5, 2) replaced by (5, 1) circulates; the last frames carry no tuples
t= (0)
Does not increment
Remove Undirected Edge
[Figure: the example graph with nodes 1, 2, 3, 4, 5]
Edges, t = (1): (4, 2) (2, 4)
[Animation frames, t = (1, 0) to t = (1, 2)]
Labels, t = (0, 0): (1, 1) (2, 2) (3, 3) (4, 4) (5, 5)
Neighbour labels, t = (1, 0): (2, 4) (4, 2)
Cumulative Input from concat: the t = (0, 0) tuples together with (2, 4) (4, 2) at t = (1, 0)
Groupby + Min, t = (1, 0): previous output (1, 1) (3, 1) (4, 2) (2, 2) (5, 2); new output (1, 1) (2, 2) (3, 1) (4, 3) (5, 2)
t = (1, 0): GroupBy + Min emits a difference on (4, 2) and (4, 3)
t = (1, 1): that difference circulates; GroupBy + Min again emits a difference on (4, 2) and (4, 3)
t = (1, 2): a difference on (2, 2) and (2, 1)
Add Undirected Edge
[Figure: the example graph with nodes 1, 2, 3, 4, 5]
Edges, t = (2): (1, 4) (4, 1)
[Animation frames, t = (2, 0) to t = (2, 1)]
Earlier inputs shown: Labels (1, 1) (2, 2) (3, 3) (4, 4) (5, 5) at t = (0, 0); edge change (4, 2) (2, 4) at t = (1)
Neighbour labels, t = (2, 0): (4, 1) (1, 4)
t = (2, 0): GroupBy + Min emits a difference on (4, 1) and (4, 3)
t = (2, 1): that difference circulates through Feedback
How do you deal with new data when you are iterating with old data?
○ Do you keep all data in memory?
○ What about intermediate calculations?
○ Incremental View Maintenance
○ Do everything from the beginning
○ Do work only in the nodes where new data came in
○ Iterative Ordering - (i1, j1) ≤ (i2, j2) iff i1 ≤ i2 and j1 ≤ j2.
○ Lexicographic Ordering - (i1, j1) ≤ (i2, j2) iff i1 < i2, or i1 = i2 and j1 ≤ j2.
○ Programmer can choose the partial ordering.
○ Only the diffs are sent around as messages
○ Nodes on both sides know the previous calculations
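The two orderings on (i, j) timestamps listed above can be compared with a short sketch. The function names are illustrative assumptions; the definitions follow the bullets directly.

```python
def iterative_leq(a, b):
    """(i1, j1) <= (i2, j2) iff i1 <= i2 and j1 <= j2."""
    return a[0] <= b[0] and a[1] <= b[1]


def lexicographic_leq(a, b):
    """(i1, j1) <= (i2, j2) iff i1 < i2, or i1 = i2 and j1 <= j2."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] <= b[1])


s, t = (0, 1), (1, 0)

# Under the iterative ordering the two timestamps are incomparable.
print(iterative_leq(s, t), iterative_leq(t, s))          # False False

# Under the lexicographic ordering every pair is comparable.
print(lexicographic_leq(s, t), lexicographic_leq(t, s))  # True False
```

With the iterative ordering, work at (1, 0) does not have to wait for (0, 1); with the lexicographic ordering it does, because (0, 1) comes first.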