It’s About Time: An Introduction to Timely Dataflow
Data Council, October ‘19
Malte Sandstede malte@clockworks.io / @MalteSandstede
clockworks Systems Group
Nikolas Göbel niko@clockworks.io / @NikolasGoebel
David Bach david@clockworks.io
Moritz Moxter moritz@clockworks.io
In collaboration with: Frank McSherry, Vasia Kalavri (ETH)
Timeliness Expressivity Consistency
Naive Stateless Processing
MapReduce
Database
[Figure: Physical Representation (P1, P2, T1); Virtual Partitions (V1, V2, V3, V4); labels: Virtualization, Repartitioning, Joins, time order, Business Logic, Reactivity, queries]
sources sinks
data exchange
w1 w2
SUM
(1, t0) (4, t1) (3, t0) DATA
SUM
(4, t1) (1, t0) (5, t1) (3, t0) DATA
A low-latency runtime for distributed cyclic dataflows
github.com/TimelyDataflow
SUM
(1, t0) (4, t1) (3, t0)
t0 t0 t2
DATA PROGRESS
t0
SUM
(1, t0) (4, t1) (3, t0)
t0 t2
DATA PROGRESS
t2
SUM
(1, t0) (4, t1) (3, t0)
t0 t2
DATA PROGRESS
t2
(4, t0)
t2
SUM
(1, t0) (4, t1) (3, t0)
t0 t2
DATA PROGRESS
t2
(4, t0)
t2
(8, t1)
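The SUM frames above can be read as an operator that buffers records per timestamp and only emits once the progress stream says a timestamp is complete. A minimal sketch in plain Rust, not Timely's API: `SumOperator`, `on_data`, and `on_progress` are hypothetical names, timestamps are assumed to be plain integers (t0 = 0, t1 = 1, ...), and the output is assumed to be a running total, which is what the `(4, t0)` and `(8, t1)` outputs on the slide suggest.

```rust
use std::collections::BTreeMap;

/// Hypothetical operator state (not a Timely type).
struct SumOperator {
    /// Partial sums for timestamps that may still receive records.
    pending: BTreeMap<u64, i64>,
    /// Running total over all completed timestamps (assumed semantics).
    total: i64,
}

impl SumOperator {
    fn new() -> Self {
        SumOperator { pending: BTreeMap::new(), total: 0 }
    }

    /// Accumulate a record; nothing is emitted yet.
    fn on_data(&mut self, value: i64, time: u64) {
        *self.pending.entry(time).or_insert(0) += value;
    }

    /// `frontier` promises that no more records with time < frontier arrive.
    /// Emits one (running total, time) pair per completed timestamp.
    fn on_progress(&mut self, frontier: u64) -> Vec<(i64, u64)> {
        // Keep times >= frontier buffered, take everything strictly earlier.
        let still_open = self.pending.split_off(&frontier);
        let closed = std::mem::replace(&mut self.pending, still_open);
        let mut out = Vec::new();
        for (time, sum) in closed {
            self.total += sum;
            out.push((self.total, time));
        }
        out
    }
}

fn main() {
    let mut op = SumOperator::new();
    op.on_data(1, 0);
    op.on_data(4, 1);
    op.on_data(3, 0);
    // Progress still at t0: t0 may receive more data, so nothing is emitted.
    println!("{:?}", op.on_progress(0));
    // Progress moves to t2: t0 and t1 are complete.
    println!("{:?}", op.on_progress(2));
}
```

The point of the sketch is the separation of the two streams: data can arrive in any order, and correctness only depends on the progress statements.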
JOIN
(1, t0) (4, t1) (3, t2)
t1 t2
CLICKSTREAM TOPIC CLICKSTREAM PROGRESS
t0 t3 t4
(2, t3) Waiting on METADATA …
t0
METADATA TOPIC METADATA PROGRESS
(MIN)
JOIN
(1, t0) (4, t1) (3, t2) CLICKSTREAM TOPIC CLICKSTREAM PROGRESS
t0
(2, t3) …
t0
METADATA TOPIC METADATA PROGRESS
t0 t1 t2 t3 t4
…
t0
JOIN
(1, t0) (4, t1) (3, t2) CLICKSTREAM TOPIC CLICKSTREAM PROGRESS
t1
(2, t3) …
t0
METADATA TOPIC METADATA PROGRESS
t0 t1 t2 t3 t4
…
t0
…
JOIN
(1, t0) (4, t1) (3, t2) CLICKSTREAM TOPIC CLICKSTREAM PROGRESS
t2
(2, t3) …
t0
METADATA TOPIC METADATA PROGRESS
t0 t1 t2 t3 t4
… …
t0
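The JOIN frames above show the operator held back by the slower input: clicks are only released up to the minimum (MIN) of the two progress values. A small sketch of that rule in plain Rust, under the assumption that timestamps are totally ordered integers; `join_frontier` and `releasable` are hypothetical helper names, not part of Timely.

```rust
/// A two-input operator can only treat a timestamp as complete
/// once both inputs have moved past it.
fn join_frontier(clickstream_progress: u64, metadata_progress: u64) -> u64 {
    clickstream_progress.min(metadata_progress)
}

/// Clicks with a timestamp strictly before the frontier can be joined and released.
fn releasable(clicks: &[(u32, u64)], frontier: u64) -> Vec<(u32, u64)> {
    clicks.iter().copied().filter(|&(_, t)| t < frontier).collect()
}

fn main() {
    // (value, timestamp) pairs as on the slides: (1, t0) (4, t1) (3, t2) (2, t3).
    let clicks = [(1, 0), (4, 1), (3, 2), (2, 3)];

    // Clickstream is at t4, but metadata is stuck at t0: everything waits.
    println!("{:?}", releasable(&clicks, join_frontier(4, 0)));

    // Metadata advances to t2: clicks at t0 and t1 can be released.
    println!("{:?}", releasable(&clicks, join_frontier(4, 2)));
}
```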
[Figure repeated: the same stack labels, now annotated "Timely"]
Timeliness Expressivity Consistency (recursive) queries
[Figure: example graph with nodes A, B, C, D, E, F]
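/// Breadth-First Search
// Start every root at distance 0.
let nodes = roots.map(|x| (x, 0));
nodes.iterate(|inner| {
    // Bring the outer collections into the iteration scope.
    let edges = edges.enter(&inner.scope());
    let nodes = nodes.enter(&inner.scope());
    // Follow one edge from every reached node, then keep one entry per node.
    inner.join_map(&edges, |_k,l,d| (*d, l+1))
         .concat(&nodes)
         .reduce(|_, s, t| t.push((*s[0].0, 1)))
})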
BFS
EDGE CHANGES REACHABLE NODES TRANSITIVE EDGES
BFS
EDGE CHANGES REACHABLE NODES TRANSITIVE EDGES
t1 t2 t0
Have to wait while transitive graph is being discovered.
t0
BFS
EDGE CHANGES REACHABLE NODES TRANSITIVE EDGES
t1 t2 t0
t1 1 t0 1
(Product Partial Order)
[Figure: pairs of input timestamps (t0 to t3) and loop counters (1 to 3)]
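The product partial order mentioned above compares timestamps coordinate by coordinate, so some pairs are simply incomparable. A minimal sketch in plain Rust; the `Time` struct and its field names `epoch` and `round` are hypothetical and chosen for illustration (Timely ships its own product timestamp type, which this does not reproduce).

```rust
/// Hypothetical two-dimensional timestamp: (input epoch, loop round).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Time {
    epoch: u64,
    round: u64,
}

impl Time {
    /// Product order: less-or-equal only if it holds in BOTH coordinates.
    fn less_equal(&self, other: &Time) -> bool {
        self.epoch <= other.epoch && self.round <= other.round
    }

    /// Two times are comparable if one is less-or-equal to the other.
    fn comparable(&self, other: &Time) -> bool {
        self.less_equal(other) || other.less_equal(self)
    }
}

fn main() {
    let a = Time { epoch: 0, round: 1 }; // (t0, 1)
    let b = Time { epoch: 1, round: 1 }; // (t1, 1)
    let c = Time { epoch: 0, round: 2 }; // (t0, 2)

    // (t0, 1) precedes both (t1, 1) and (t0, 2).
    println!("{} {}", a.less_equal(&b), a.less_equal(&c));
    // (t1, 1) and (t0, 2) are incomparable: neither has to wait for the other.
    println!("{}", b.comparable(&c));
}
```

Incomparable timestamps are what lets work for a later input epoch proceed while an earlier epoch is still iterating.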
BFS
EDGE CHANGES REACHABLE NODES TRANSITIVE EDGES
Have to start from scratch for every transaction?
Iterative, incrementalized operators for Timely
github.com/TimelyDataflow
/// BFS
let nodes = roots.map(|x| (x, 0));
nodes.iterate(|inner| {
    let edges = edges.enter(&inner.scope());
    let nodes = nodes.enter(&inner.scope());
    inner.join_map(&edges, |_k,l,d| (*d, l+1))
         .concat(&nodes)
         .reduce(|_, s, t| t.push((*s[0].0, 1)))
})

[[(bfs ?from ?to) [?from :edge ?to]]
 [(bfs ?from ?to) [?from :edge ?hop] (bfs ?hop ?to)]]
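The two `bfs` rules above can be read as a fixpoint: start from the edges, then keep applying the recursive rule until no new facts appear. The following is a naive from-scratch evaluation in plain Rust, offered only to make the rules concrete. It is not how Differential Dataflow or 3DF evaluate them (they maintain the result incrementally), and `reachable_pairs` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Naive fixpoint evaluation of the two rules:
///   (bfs ?from ?to) <- [?from :edge ?to]
///   (bfs ?from ?to) <- [?from :edge ?hop], (bfs ?hop ?to)
fn reachable_pairs(edges: &[(u32, u32)]) -> BTreeSet<(u32, u32)> {
    // Rule 1: every edge is a bfs fact.
    let mut facts: BTreeSet<(u32, u32)> = edges.iter().copied().collect();
    loop {
        let mut derived = Vec::new();
        // Rule 2: an edge (from -> hop) plus a fact (hop -> to) yields (from -> to).
        for &(from, hop) in edges {
            for &(h, to) in facts.iter() {
                if h == hop {
                    derived.push((from, to));
                }
            }
        }
        let before = facts.len();
        facts.extend(derived);
        // Stop once a full round adds nothing new.
        if facts.len() == before {
            break;
        }
    }
    facts
}

fn main() {
    let edges = [(1, 2), (2, 3)];
    println!("{:?}", reachable_pairs(&edges));
}
```

Every call recomputes the whole result, which is exactly the "start from scratch for every transaction" cost that incrementalized operators are meant to avoid.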
github.com/comnik/declarative-dataflow
[Figure repeated: the same stack labels, annotated first with "Timely", then with "Timely" and "DD+3DF"]
Timely Dataflow
(Dataflows w/ Multidimensional Progress Tracking)
Differential Dataflow
(Iterative Incrementalized Operators)
3DF
(Streaming Relational Queries)
github.com/TimelyDataflow
github.com/comnik/declarative-dataflow
Repositories
Papers
Talks
Blog Posts
clockworks
www.clockworks.io {david, malte, moritz, niko}@clockworks.io