Faucet: a user-level, modular technique for flow control in dataflow engines
Andrea Lattuada (Systems Group, ETH Zürich), Frank McSherry (Unaffiliated), Zaheer Chothia (Systems Group, ETH Zürich)
Problem: RAM exhaustion
Channels: Storm, Flink, Naiad
[Figure: a source feeds operator A, which sends tuples to operator B at a large rate; the tuples are buffered in RAM.]
Backpressure: Storm, Heron, Spark Streaming
[Figure: a source followed by operators F, G and H, with a backpressure signal. Labels: "stopped by F", "stopped by G", "deadlock".]
Similar to TCP flow control.
Backpressure: Akka Streams, Flink
[Figure: a source and a backpressure signal.]
Scopes: nested operator structure (enter, leave)
Timestamps: tuple metadata
[Figure: tuples carry timestamp (t1) before entering the scope, (t1,t2) inside it, and (t1) again after leaving.]
Progress tracking: tracks pending timestamps, e.g. (3,4) in flight
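The relationship between scopes, timestamps and progress tracking can be sketched in a few lines of Python. This is a simplified illustration, not the engine's implementation: the names `enter`, `leave` and `ProgressTracker` are hypothetical, and the tracker only counts in-flight tuples per exact timestamp, which is an assumption made to keep the example short.

```python
from collections import Counter

def enter(ts, start=0):
    """Entering a nested scope appends one coordinate to the timestamp."""
    return ts + (start,)

def leave(ts):
    """Leaving the scope drops the innermost coordinate again."""
    return ts[:-1]

class ProgressTracker:
    """Counts in-flight tuples per timestamp (hypothetical, simplified helper)."""

    def __init__(self):
        self.pending = Counter()

    def sent(self, ts, n=1):
        # n tuples with timestamp ts were put in flight.
        self.pending[ts] += n

    def processed(self, ts, n=1):
        # n tuples with timestamp ts were consumed.
        self.pending[ts] -= n
        if self.pending[ts] == 0:
            del self.pending[ts]

    def is_pending(self, ts):
        return ts in self.pending

outer = (3,)                    # timestamp (t1) outside the scope
inner = enter(outer, start=4)   # timestamp (t1, t2) inside the scope

tracker = ProgressTracker()
tracker.sent(inner, n=3)        # three tuples in flight at (3, 4)
still_pending = tracker.is_pending(inner)
tracker.processed(inner, n=3)   # all three consumed
done = not tracker.is_pending(inner)
```

Here `inner` is `(3, 4)`, matching the pending timestamp shown on the slide, and `leave(inner)` gives back `(3,)`.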
Faucet: scope, batcher, controlled subgraph, probe
[Figure: tuples with timestamp (te) enter the scope; inside it, batches carry timestamps (te,tb); tuples leave the scope with timestamp (te). The probe shows batch (te,2) as pending.]
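The interplay of batcher, controlled subgraph and probe can be illustrated with a small single-threaded simulation. This is a sketch under stated assumptions, not the authors' implementation: `run_faucet`, `batch_size`, `n_batches` and `process` are hypothetical names, the controlled subgraph is reduced to a function applied per tuple, and the probe is modelled as the list of batches not yet completed.

```python
from collections import deque

def run_faucet(tuples, batch_size, n_batches, process):
    """Release `tuples` in batches, keeping at most `n_batches` batches pending."""
    queue = deque(tuples)   # input waiting outside the controlled subgraph
    pending = deque()       # (batch id, batch) pairs the probe reports as pending
    next_batch = 0
    max_in_flight = 0       # peak number of input tuples inside the subgraph
    results = []
    while queue or pending:
        # Batcher: release new batches while the probe shows spare capacity.
        while queue and len(pending) < n_batches:
            size = min(batch_size, len(queue))
            batch = [queue.popleft() for _ in range(size)]
            pending.append((next_batch, batch))
            next_batch += 1
        max_in_flight = max(max_in_flight, sum(len(b) for _, b in pending))
        # Controlled subgraph: finish the oldest pending batch.
        _, batch = pending.popleft()
        results.extend(process(x) for x in batch)
        # Probe: that batch is no longer pending, so another may be released.
    return results, max_in_flight

results, peak = run_faucet(range(10), batch_size=3, n_batches=2,
                           process=lambda x: x * x)
```

In this simulation the number of input tuples admitted into the subgraph never exceeds `batch_size * n_batches`, which is the kind of bound the batching scheme is meant to provide; the rest of the input waits outside.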
Input graph: a11, a21, a22, a31, a32, a33
[Figure: operators propose (Pa), propose (Pb1, Pb2) and count proposals (C) build tuples of growing length: (a11); then (a11,a21), (a11,a22), (a11,a32); then (a11,a21,a31), (a11,a22,a32), (a11,a22,a33).]
[Figure: dataflow with operators propose (Pa), propose (Pb1, Pb2), count proposals (C) and intersect (I1, I2), plus a node labelled T. The tuples grow at each stage: (a11) (a12); then (a11,a21) (a11,a22) (a12,a23); then (a11,a21,a31) (a11,a22,a32) (a11,a22,a33) (a12,a23,a34) (a12,a23,a35).]
[Figure: the same tuples flowing through the dataflow, annotated "large rate".]
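The growth in tuple count from stage to stage can be reproduced with a short sketch. The extension map below is hypothetical: it is chosen only so that the output matches the tuples listed on the slides, and it stands in for the propose step without modelling the count and intersect operators.

```python
# Hypothetical extension map, picked so the stages match the slide's tuples.
EXTENSIONS = {
    "a11": ["a21", "a22"],
    "a12": ["a23"],
    "a21": ["a31"],
    "a22": ["a32", "a33"],
    "a23": ["a34", "a35"],
}

def extend(prefixes, extensions):
    """Extend every prefix by each candidate proposed for its last element."""
    return [p + (c,) for p in prefixes for c in extensions.get(p[-1], [])]

stage1 = [("a11",), ("a12",)]
stage2 = extend(stage1, EXTENSIONS)
stage3 = extend(stage2, EXTENSIONS)

# Number of tuples at each stage: every stage can emit more than it receives.
sizes = [len(stage1), len(stage2), len(stage3)]
```

With this map the stages hold 2, 3 and 5 tuples. On a real graph the fan-out per prefix can be far larger, which is how a modest input can turn into a large rate of intermediate tuples.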
Enumerate triangles in the LiveJournal dataset
Dataset: 4’847’571 nodes, 68’993’773 edges, 285’730’264 triangles
Hardware: Intel Xeon E5-2650 @ 2.00GHz, 16 physical cores, 10Gbps link
[Figure: runtime (sec) against batch size B (# tuples, 100 to 100000).]
B: batch size
Nbatches: number of batches in flight in parallel
Nbatches ≥ 2 mitigates stragglers
[Figure: total RAM (MB) against input size (tuples, 100K to 1000K), uncontrolled vs. with Faucet; 2 nodes x 4 threads.]
[Figure: the Faucet structure again: scope, batcher, controlled subgraph, probe.]