Differential Dataflow McSherry, Frank D., Murray, Derek G., Isaacs, - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Differential Dataflow

McSherry, Frank D., Murray, Derek G., Isaacs, Rebecca, Isard, Michael

Chathura Kankanamge 08th November 2016

slide-2
SLIDE 2

Outline

  • Motivation for Differential Dataflow
  • Key Concepts
  • Differential Dataflow in practice
  • Discussion
slide-3
SLIDE 3

Motivation

slide-4
SLIDE 4

Traditional data parallel processing

  • Take input data in batches.
  • Process and output.
  • Highly evolved - Hadoop, Spark.
  • Mostly stateless.
slide-5
SLIDE 5

Interactive - Twitter Mention Graph

  • Used to find trending #hashtags.
  • Billions of vertices and edges.
  • Millions of updates per second (Storm).
  • Needs the low latency of streaming and the throughput of Spark.
  • Similar issue with interactive analytics.
slide-6
SLIDE 6

Loop Processing

  • Some algorithms require iterations
    ○ PageRank
    ○ Connected components
  • Usually requires transferring the entire state between iterations
  • Spark, Hadoop etc. execution times ~ stateless

slide-7
SLIDE 7

Incremental Dataflow

  • Stateful.
  • Get the differences of collections.
  • Only calculate changes.
  • Example
    ○ Wordcount in Hadoop Online.
  • Can deal with changes due to
    ○ Loops
    ○ New data
  • But NOT both!
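The incremental idea on this slide can be illustrated with a minimal word-count sketch. It is not Hadoop Online's API; `apply_batch` is a hypothetical helper that keeps the counts as state and reports only the entries a new batch changed.

```python
from collections import Counter


def apply_batch(counts: Counter, new_words: list) -> Counter:
    """Fold a batch of new words into the running counts (hypothetical helper).

    Only the words in the batch are touched; the return value holds the
    updated totals for exactly those words.
    """
    delta = Counter(new_words)
    counts.update(delta)  # stateful: add the new occurrences in place
    return Counter({word: counts[word] for word in delta})


counts = Counter()
first = apply_batch(counts, ["a", "b", "a"])
second = apply_batch(counts, ["b", "c"])
```

The second batch never looks at the word "a", which is the point of keeping state: work is proportional to the change rather than to the whole input.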
slide-8
SLIDE 8

Concepts

slide-9
SLIDE 9

Total vs Partial Ordering

  • Traditional dataflow systems expect a total ordering
    ○ Multiple variables are a problem
  • A partial ordering uses a time vector for ordering
    ○ Deals well with multiple variables
  • Partial because ordering by variable x gives only a partial ordering

[Figure: a total order 1, 2, 3, 4, 5 next to a partial order on the time vectors (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]
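A small sketch may help show why time vectors are only partially ordered. It assumes the coordinate-wise order on the nine vectors from the figure; the function names are illustrative, not taken from the paper.

```python
from itertools import product


def leq(a, b):
    """Coordinate-wise order: a <= b iff every coordinate of a is <= b's."""
    return all(x <= y for x, y in zip(a, b))


def comparable(a, b):
    return leq(a, b) or leq(b, a)


# The nine time vectors (0, 0) ... (2, 2) shown on the slide.
times = list(product(range(3), repeat=2))

# Unordered pairs where neither vector precedes the other.
# `a < b` is Python's built-in tuple comparison, used only to list each pair once.
incomparable = [(a, b) for a in times for b in times
                if a < b and not comparable(a, b)]
```

For example, (1, 0) and (0, 1) land in `incomparable`: each is ahead in one coordinate and behind in the other, so a single total order cannot describe them faithfully.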

slide-10
SLIDE 10

Total vs Partial Ordering

  • Traditional dataflow systems expect a total ordering
    ○ Multiple variables are a problem
  • A partial ordering uses a time vector for ordering
    ○ Deals well with multiple variables
  • Partial because ordering by variable x gives only a partial ordering for x

[Figure: a total order 1, 2, 3, 4, 5 next to a partial order on the time vectors (0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

slide-12
SLIDE 12

Differential Dataflow

  • Computational Model
    ○ Defines how to process partially ordered data.
    ○ Defines state between iterations.
  • Goals
    ○ Do less calculation per change.
    ○ Converge quicker per iteration.

slide-13
SLIDE 13

Timely Dataflow

  • Performs Iterative Calculations
  • Computational model with a directed graph

  • Vertices exchange messages
  • Logical Timestamps for messages
slide-14
SLIDE 14

Timely Dataflow

  • Loops denoted by
    ○ Ingress - adds a counter
    ○ Feedback - increments a counter
    ○ Egress - removes a counter
  • Pointstamps - events at a location and time
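The three loop vertices can be sketched as functions on timestamps. This is a minimal model that assumes a timestamp is a plain tuple (epoch followed by loop counters); the function names mirror the slide and are not an actual Timely Dataflow API.

```python
def ingress(ts):
    """Entering a loop appends a new counter, starting at 0."""
    return ts + (0,)


def feedback(ts):
    """Going around the loop increments the innermost counter."""
    return ts[:-1] + (ts[-1] + 1,)


def egress(ts):
    """Leaving the loop drops the innermost counter."""
    return ts[:-1]


ts = ingress((0,))      # epoch 0 enters the loop as (0, 0)
for _ in range(4):      # four trips around the loop
    ts = feedback(ts)
inside = ts             # (0, 4)
outside = egress(ts)    # back to (0,)
```

Under these assumptions the timestamps follow the pattern used later in the deck: t = (0) outside the loop, then t = (0, 0), (0, 1), and so on inside it.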

slide-15
SLIDE 15

Differential Dataflow in practice

slide-16
SLIDE 16

The Connected Graph Problem

[Figure: example graph with vertices 1 to 8]

slide-17
SLIDE 17

The Connected Graph Problem

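[Figure: the same graph with every vertex labelled by its component: vertices 6, 7 and 8 carry label 6, vertices 1 to 5 carry label 1]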

slide-18
SLIDE 18

Connected Graph with Relational Algebra

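[Figure: relational dataflow combining Labels and Edges through join, union (U) and Min]

Edges: (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)
Labels: (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)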

slide-19
SLIDE 19

Connected Graph with Relational Algebra

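[Figure: relational dataflow combining Labels and Edges through join, union (U) and Min]

Edges: (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)
Labels matched to the edges by the join: 1, 3, 4, 3, 4, 2, 2, 5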

slide-20
SLIDE 20

Connected Graph with Relational Algebra

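[Figure: relational dataflow combining Labels and Edges through join, union (U) and Min]

Neighbour labels: (3, 1), (3, 4), (2, 4), (1, 3), (4, 3), (4, 2), (5, 2), (2, 5)
Self labels: (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)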

slide-21
SLIDE 21

Connected Graph with Relational Algebra

[Figure: relational dataflow combining Labels and Edges through join, union (U) and Min]

Result after 1st iteration: (1, 1), (3, 1), (4, 2), (2, 2), (5, 2)
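One iteration of this relational plan can be sketched in plain Python. The edge and label tuples are the ones on the slides; the function `step` and the set-based representation are assumptions made for illustration.

```python
EDGES = [(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)]
LABELS = {(1, 1), (2, 2), (3, 3), (4, 4), (5, 5)}


def step(labels, edges):
    """One round of join, union and group-by-min over (node, label) tuples."""
    label_of = dict(labels)
    # Join: every edge (src, dst) proposes src's label to dst.
    neighbour_labels = {(dst, label_of[src]) for src, dst in edges}
    # Union: keep each node's own label as a candidate too.
    candidates = neighbour_labels | labels
    # Group by node and keep the minimum label.
    best = {}
    for node, label in candidates:
        best[node] = min(label, best.get(node, label))
    return set(best.items())


after_first = step(LABELS, EDGES)
```

Running `step` once gives vertices 1 and 3 the label 1 and vertices 2, 4 and 5 the label 2, which agrees with the result shown for the first iteration.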

slide-22
SLIDE 22

Connected Graph in Timely

[Figure: timely dataflow graph with vertices Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback and Egress]

  • Edges are available constantly
  • Add a counter at Ingress
  • Remove the counter at Egress
  • Increment the counter at Feedback
  • Map converts joined tuples into node/label tuples
  • Concat performs the union
slide-23
SLIDE 23

Maintaining State in Differential Dataflow

  • Change in state at node b at time t
  • Cumulative state at b up to t
  • Sum of all changes in state at b before t
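The relationship between these three quantities can be sketched as follows, assuming the change at t is the cumulative state at t minus the sum of all changes at strictly earlier times in the coordinate-wise order. The states at (0, 0), (0, 1) and (1, 0) are label collections from the connected-graph slides; the state at (1, 1) was worked out by hand for this illustration, and all function names are hypothetical.

```python
from collections import Counter


def leq(a, b):
    """Coordinate-wise partial order on time vectors."""
    return all(x <= y for x, y in zip(a, b))


def differences(states):
    """Map each time t to state(t) minus the sum of changes at times before t."""
    deltas = {}
    # Sorting tuples lexicographically visits every earlier time first.
    for t in sorted(states):
        earlier = Counter()
        for s, d in deltas.items():
            if leq(s, t):
                for key, weight in d.items():
                    earlier[key] += weight
        delta = Counter()
        for key in set(states[t]) | set(earlier):
            weight = states[t][key] - earlier[key]
            if weight:
                delta[key] = weight
        deltas[t] = delta
    return deltas


def accumulate(deltas, t):
    """Rebuild the cumulative state at t by summing changes at all times <= t."""
    total = Counter()
    for s, d in deltas.items():
        if leq(s, t):
            for key, weight in d.items():
                total[key] += weight
    return Counter({k: w for k, w in total.items() if w})


states = {
    (0, 0): Counter([(1, 1), (3, 1), (4, 2), (2, 2), (5, 2)]),
    (0, 1): Counter([(1, 1), (3, 1), (4, 1), (2, 2), (5, 2)]),
    (1, 0): Counter([(1, 1), (3, 1), (4, 3), (2, 2), (5, 2)]),
    (1, 1): Counter([(1, 1), (3, 1), (4, 1), (2, 2), (5, 2)]),  # derived by hand
}
deltas = differences(states)
```

At (1, 1) the change is +(4, 2) and -(4, 3): it cancels the overlap between the changes at (0, 1) and (1, 0), which are both earlier than (1, 1) but not comparable with each other.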

slide-24
SLIDE 24

Connected Graph

[Figure: example graph with vertices 1 to 5]

slide-25
SLIDE 25

Connected Graph in Differential

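Edges at t = (0): (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)

Labels at t = (0): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)

[Figure: dataflow graph with vertices Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback and Egress]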

slide-26
SLIDE 26

Connected Graph in Differential

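Edges at t = (0): (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)

Labels at t = (0, 0): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)

[Figure: dataflow graph with vertices Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback and Egress]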

slide-27
SLIDE 27

Connected Graph in Differential

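Edges at t = (0): (1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)

Labels at t = (0, 0): (1, 1), (2, 2), (3, 3), (4, 4), (5, 5)

[Figure: dataflow graph with vertices Labels, Edges, Ingress, Concat, Join, Map, Concat, GroupBy + Min, Feedback and Egress]

?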

slide-28
SLIDE 28

Connected Graph in Differential

t= (0, 0)

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 1 3 3 1 4 3 3 4 4 2 2 4 2 5 5 2 1 3 4 3 4 2 2 5

slide-29
SLIDE 29

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 1 3 4 2 5 3 4 3 4 2 2

t= (0, 0)

4 2 3 1 5

slide-30
SLIDE 30

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 0)

1 3 4 2 5 3 4 3 4 2 2 4 2 3 1 5 1 1 2 2 3 3 4 4 5 5

slide-31
SLIDE 31

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 0)

1 1 3 1 4 2 2 2 5 2

slide-32
SLIDE 32

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 1)

1 1 3 1 4 2 2 2 5 2

slide-33
SLIDE 33

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 1)

1 1 3 1 4 2 2 2 5 2 1 1 2 2 3 3 4 4 5 5

t= (0, 1)

slide-34
SLIDE 34

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 1)

1 1 3 1 4 2 2 2 5 2 1 1 2 2 3 3 4 4 5 5

slide-35
SLIDE 35

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 3 3 3 1 4 4 4 2 5 5 5 2

t= (0, 1)

slide-36
SLIDE 36

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 1)

3 1 3 1 3 4 4 3 4 3 4 2 4 2 5 2 3 3 4 2 4 2 5 1 5 2 2

slide-37
SLIDE 37

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 1 4 4 3 2 1 3 1 4 2 4 3 2 1 3 2

t= (0, 1)

2 5

slide-38
SLIDE 38

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 1)

4 4 4 2 2 3 3 4 4 5 1 2 4 3 1 3 3 3 4 4

slide-39
SLIDE 39

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 1)

4 2 4 1 4 1 2 4 1 2 1 3 2 3 1 2 5 2

Cumulative Input from concat

1 1 3 1 4 1 2 2 5 2

Groupby + Min

1 1 3 1 4 2 2 2 5 2 1 1 3 1 4 1 2 2 5 2

slide-40
SLIDE 40

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 2)

4 2 4 1

slide-41
SLIDE 41

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 2)

4 2 4 1

slide-42
SLIDE 42

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 2)

2 2 2 1

slide-43
SLIDE 43

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 3)

2 2 2 1

slide-44
SLIDE 44

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 3)

2 2 2 1

slide-45
SLIDE 45

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 3)

5 2 5 1

slide-46
SLIDE 46

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 4)

5 2 5 1

slide-47
SLIDE 47

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 4)

5 2 5 1

slide-48
SLIDE 48

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 4)

5 2 5 1

slide-49
SLIDE 49

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 4)

?

slide-50
SLIDE 50

Connected Graph in Differential

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (0, 4)

?

t= (0)

?

Does not increment
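The walkthrough above can be checked with a plain iterative sketch that records which labels change in each round. It recomputes every label per round, so it is not differential dataflow itself; it only illustrates why the messages shrink to a single changed label per round and then stop. The names `propagate` and `connected_components` are illustrative.

```python
EDGES = [(1, 3), (3, 1), (4, 3), (3, 4), (4, 2), (2, 4), (2, 5), (5, 2)]


def propagate(labels, edges):
    """Each node takes the minimum of its own id and its neighbours' labels."""
    new = {node: node for node in labels}
    for src, dst in edges:
        new[dst] = min(new[dst], labels[src])
    return new


def connected_components(nodes, edges):
    """Iterate to a fixed point, recording (old, new) label changes per round."""
    labels = {n: n for n in nodes}
    diffs = []
    while True:
        new = propagate(labels, edges)
        diff = {n: (labels[n], new[n]) for n in labels if new[n] != labels[n]}
        diffs.append(diff)
        if not diff:          # nothing changed: the loop has converged
            return labels, diffs
        labels = new


labels, diffs = connected_components(range(1, 6), EDGES)
```

After the first round, the rounds change only vertex 4, then vertex 2, then vertex 5, each from label 2 to label 1, and the final round changes nothing. That matches the sequence of differences shown at t = (0, 1), (0, 2), (0, 3) and the empty step at t = (0, 4).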

slide-51
SLIDE 51

Changes to Connected Graph - I

Remove the undirected edge between vertices 4 and 2

[Figure: example graph with vertices 1 to 5]

slide-52
SLIDE 52

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 2 2 4

t= (1)

slide-53
SLIDE 53

Changes to Connected Graph - I

t= (1, 0)

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

?

4 2 2 4

t= (1)

slide-54
SLIDE 54

Changes to Connected Graph - I

t= (1, 0)

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 2 2 4

t= (1) t= (0, 0)

?

t= (1, 0)

1 1 2 2 3 3 4 4 5 5

slide-55
SLIDE 55

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 2 2 4 4 2

t= (1, 0)

slide-56
SLIDE 56

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 2 4 4 2

t= (1, 0)

slide-57
SLIDE 57

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 2 4 4 2

t= (1, 0)

slide-58
SLIDE 58

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 4 2 3

t= (1, 0) Cumulative Input from concat

1 1 2 2 3 1 4 3 5 2

Groupby + Min

1 3 4 2 5 3 4 3 4 2 2 4 2 3 1 5 1 1 2 2 3 3 4 4 5 5

t=(0, 0)

2 4 4 2

t=(1, 0)

1 1 2 2 3 1 4 3 5 2 1 1 3 1 4 2 2 2 5 2

slide-59
SLIDE 59

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 4 2 3

t= (1, 1)

slide-60
SLIDE 60

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 4 2 3

t= (1, 1)

slide-61
SLIDE 61

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (1, 1)

4 3 4 3 2 3

slide-62
SLIDE 62

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (1, 1)

4 4 2 3

slide-63
SLIDE 63

Changes to Connected Graph - I

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (1, 2)

2 2 2 1

slide-64
SLIDE 64

Changes to Connected Graph - II

Add an undirected edge between vertices 1 and 4

[Figure: example graph with vertices 1 to 5]

slide-65
SLIDE 65

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 1 4 4 1

t= (2)

slide-66
SLIDE 66

Changes to Connected Graph - II

t= (1, 0)

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 4 2 2 4

t= (1) t= (0, 0)

?

t= (2, 0)

1 1 2 2 3 3 4 4 5 5 1 4 4 1

slide-67
SLIDE 67

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map 1 4 4 1 1 4

t= (2, 0)

slide-68
SLIDE 68

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 0)

4 1 1 4

slide-69
SLIDE 69

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 0)

4 1 1 4

slide-70
SLIDE 70

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 0)

4 4 1 3

slide-71
SLIDE 71

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 1)

4 4 1 3

slide-72
SLIDE 72

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 1)

4 4 1 3

slide-73
SLIDE 73

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 1)

4 1 4 1 1 3

slide-74
SLIDE 74

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 1)

1 1 1 3

slide-75
SLIDE 75

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 1)

1 1 1 3 4 3

slide-76
SLIDE 76

Changes to Connected Graph - II

Concat Join Concat GroupBy +Min Feedback Egress Labels Edges Ingress Map

t= (2, 1)

?

slide-77
SLIDE 77

Discussion

slide-78
SLIDE 78

Tradeoffs in Iterative Systems

How do you deal with new data when you are iterating with old data?

  • How much state to keep (memory)
    ○ Do you keep all data in memory?
    ○ What about intermediate calculations?
    ○ Incremental View Maintenance
  • How much work to do
    ○ Do everything from the beginning
    ○ Do work only in the nodes where new data came in

slide-79
SLIDE 79

Iterative vs Differential Dataflow

  • The flexibility of partial ordering.
    ○ Iterative ordering - (i1, j1) ≤ (i2, j2) iff i1 ≤ i2 and j1 ≤ j2.
    ○ Lexicographic ordering - (i1, j1) ≤ (i2, j2) iff i1 < i2, or i1 = i2 and j1 ≤ j2.
    ○ Programmer can choose the partial ordering.
  • Communication via diffs
    ○ Only the diffs are sent around as messages.
    ○ Nodes on both sides know the previous calculations.
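The two orderings named on this slide can be written directly from their definitions. The sketch below assumes two-component timestamps (i, j); the function names are illustrative.

```python
from itertools import product


def iterative_leq(a, b):
    """(i1, j1) <= (i2, j2) iff i1 <= i2 and j1 <= j2."""
    return a[0] <= b[0] and a[1] <= b[1]


def lexicographic_leq(a, b):
    """(i1, j1) <= (i2, j2) iff i1 < i2, or i1 == i2 and j1 <= j2."""
    return a[0] < b[0] or (a[0] == b[0] and a[1] <= b[1])


times = list(product(range(3), repeat=2))

# Every pair is ordered one way or the other lexicographically ...
lex_is_total = all(lexicographic_leq(a, b) or lexicographic_leq(b, a)
                   for a in times for b in times)

# ... and whenever the iterative ordering relates two times, so does the
# lexicographic one, but not the other way round.
lex_extends_iterative = all(lexicographic_leq(a, b)
                            for a in times for b in times
                            if iterative_leq(a, b))
```

For instance, (0, 1) precedes (1, 0) lexicographically, while the iterative ordering leaves the two unrelated. Choosing between the orderings therefore decides which earlier times a given timestamp has to account for.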

slide-80
SLIDE 80

Discussion

  • Is the memory for performance tradeoff justified?
slide-81
SLIDE 81

Thank You