SLIDE 1

Faucet: a user-level, modular technique for flow control in dataflow engines

Andrea Lattuada, Systems Group, ETH Zürich
Frank McSherry, Unaffiliated
Zaheer Chothia, Systems Group, ETH Zürich

SLIDE 2

Problem

RAM exhaustion due to buffered intermediate results

  • no system-level general strategy

Our Solution

  • application-driven scheduling
  • 10-100x memory savings for 15-25% runtime overhead

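SLIDE 3

Source of the problem

Dataflow model

  • operators
  • channels
  • systems: Storm, Flink, Naiad

Rate imbalance

  • Nin(t, tʹ): tuples an operator consumes between t and tʹ
  • Nout(t, tʹ): tuples it produces between t and tʹ
  • flat_map(|x| [1, …, x]) turns the single input 5 into the five outputs 1, 2, 3, 4, 5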
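The rate imbalance can be made concrete with a small Python sketch. It mimics the slide's flat_map(|x| [1, …, x]) and counts tuples in and out of the operator. The names flat_map_range and rate are illustrative and do not come from the talk.

```python
def flat_map_range(batch):
    """Expand every input x into the outputs 1, ..., x."""
    return [y for x in batch for y in range(1, x + 1)]


def rate(batch):
    """Return (N_in, N_out) for one batch pushed through the operator."""
    return len(batch), len(flat_map_range(batch))


print(flat_map_range([5]))   # [1, 2, 3, 4, 5]
print(rate([5, 100, 1000]))  # (3, 1105)
```

Three input tuples become 1105 output tuples here, so a downstream operator that processes tuples at the input rate falls behind and the difference has to be buffered.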

SLIDE 4

Existing approach #1 - Source backpressure

Systems: Storm, Heron, Spark Streaming

Diagram: source, operators A and B

  • an operator with a large output rate
  • operator output is buffered in RAM, leading to RAM exhaustion
  • overloaded operators send a backpressure signal to the source

SLIDE 5

Existing approach #2 - Edge-by-edge backpressure

Systems: Akka Streams, Flink

Diagram: source, operators F, G and H

  • an overloaded operator sends a backpressure signal upstream
  • diagram labels: "stopped by F", "stopped by G"
  • similar to TCP flow control
  • can deadlock

SLIDE 6

Our approach - Faucet

  • based on Timely Dataflow’s concepts
  • control scheduling to limit intermediate results
  • no fine-grained signal
  • track completion of a batch of tuples

SLIDE 7

Foundation - Timely Dataflow’s Progress Tracking

  • Scopes: nested operator structure
  • Timestamps: tuple metadata
  • Diagram: a scope with enter and leave; tuples outside the scope carry timestamp (t1)

SLIDE 8

Foundation - Timely Dataflow’s Progress Tracking

  • Scopes: nested operator structure
  • Timestamps: tuple metadata
  • Diagram: tuples carry timestamp (t1) before enter and (t1,t2) inside the scope

SLIDE 9

Foundation - Timely Dataflow’s Progress Tracking

  • Scopes: nested operator structure
  • Timestamps: tuple metadata
  • Diagram: tuples carry timestamp (t1) before enter, (t1,t2) inside the scope, and (t1) again after leave

SLIDE 10

Foundation - Timely Dataflow’s Progress Tracking

  • Scopes: nested operator structure
  • Timestamps: tuple metadata
  • Diagram: tuples carry timestamp (t1) before enter, (t1,t2) inside the scope, and (t1) again after leave
  • Progress Tracking: tracks pending timestamps, e.g. (3,4) in flight
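A toy Python model may help fix the idea of scopes, timestamps and pending timestamps. It is not Timely Dataflow's API: the class ProgressTracker and the helpers enter and leave are hypothetical names, and real progress tracking also accounts for graph locations and a partial order on timestamps, which this sketch ignores.

```python
from collections import Counter


def enter(ts, inner):
    """Entering a scope appends one coordinate: (t1,) -> (t1, t2)."""
    return ts + (inner,)


def leave(ts):
    """Leaving a scope drops the innermost coordinate: (t1, t2) -> (t1,)."""
    return ts[:-1]


class ProgressTracker:
    """Counts in-flight tuples per timestamp (simplified assumption)."""

    def __init__(self):
        self.pending = Counter()

    def produced(self, ts, n=1):
        self.pending[ts] += n

    def consumed(self, ts, n=1):
        self.pending[ts] -= n
        if self.pending[ts] == 0:
            del self.pending[ts]  # no tuple with this timestamp remains

    def is_pending(self, ts):
        return ts in self.pending


tracker = ProgressTracker()
inner = enter((3,), 4)              # (3, 4)
tracker.produced(inner, n=2)
print(tracker.is_pending((3, 4)))   # True: (3, 4) is in flight
tracker.consumed(inner, n=2)
print(tracker.is_pending((3, 4)))   # False
print(leave(inner))                 # (3,)
```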

SLIDE 11

Faucet - Track batches of intermediate results

  • Diagram components: batcher, scope, controlled subgraph, probe

SLIDE 12

Faucet - Track batches of intermediate results

  • Diagram components: batcher, scope, controlled subgraph, probe
  • Diagram: a group of tuples with timestamp (te)

SLIDE 13

Faucet - Track batches of intermediate results

  • Diagram components: batcher, scope, controlled subgraph, probe
  • Diagram: groups of tuples with timestamps (te) and (te,tb)

SLIDE 14

Faucet - Track batches of intermediate results

  • Diagram components: batcher, scope, controlled subgraph, probe
  • Diagram: groups of tuples with timestamps (te) and (te,tb), plus one further group of tuples

SLIDE 15

Faucet - Track batches of intermediate results

  • Diagram components: batcher, scope, controlled subgraph, probe
  • Diagram: groups of tuples with timestamps (te), (te,tb) and (te,2)
  • (te,2) pending
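The batcher and probe can be sketched as a sequential Python simulation. This is a minimal sketch under stated assumptions, not the authors' implementation: run_with_faucet, expand, batch_size and n_batches are hypothetical names, the dictionary key plays the role of the batch coordinate tb in (te,tb), and "the probe reports a batch complete" is modelled as the oldest batch being drained.

```python
from collections import deque


def run_with_faucet(inputs, expand, batch_size, n_batches):
    """Push inputs through `expand` with at most `n_batches` batches pending.

    Returns (results, peak), where peak is the largest number of
    intermediate tuples buffered at any one time.
    """
    queue = deque(inputs)
    in_flight = {}   # tb -> intermediate tuples of that batch
    next_tb = 0
    results, peak = [], 0
    while queue or in_flight:
        # Batcher: release new batches only while fewer than n_batches are pending.
        while queue and len(in_flight) < n_batches:
            size = min(batch_size, len(queue))
            batch = [queue.popleft() for _ in range(size)]
            in_flight[next_tb] = [y for x in batch for y in expand(x)]
            next_tb += 1
        peak = max(peak, sum(len(v) for v in in_flight.values()))
        # Probe: the oldest batch completes, so its timestamp stops being pending.
        oldest = min(in_flight)
        results.extend(in_flight.pop(oldest))
    return results, peak


def expand(x):
    """Stand-in for the controlled subgraph: 10 outputs per input."""
    return [(x, i) for i in range(10)]


inputs = list(range(8))
_, peak_uncontrolled = run_with_faucet(inputs, expand, batch_size=8, n_batches=1)
results, peak_faucet = run_with_faucet(inputs, expand, batch_size=2, n_batches=2)
print(peak_uncontrolled, peak_faucet, len(results))  # 80 40 80
```

Running everything as one batch stands in for the uncontrolled case. Both runs produce the same 80 results, but the batched run never buffers more than batch_size x n_batches x 10 tuples.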

SLIDE 16

Example - Enumerate triangles in a directed graph

  • H. Q. Ngo, C. Ré, and A. Rudra - Generic Join
  • build result tuples by extending prefixes: (a11), (a11,a22), (a11,a22,a32)
  • input graph: a11 a21 a22 a31 a32 a33

SLIDE 17

Example - Enumerate triangles in a directed graph

  • input graph: a11 a21 a22 a31 a32 a33
  • operator labels: propose
  • operator names: Pa
  • tuples: (a11)

SLIDE 18

Example - Enumerate triangles in a directed graph

  • input graph: a11 a21 a22 a31 a32 a33
  • operator labels: propose
  • operator names: Pa
  • tuples: (a11) (a11,a21) (a11,a22) (a11,a32)

SLIDE 19

Example - Enumerate triangles in a directed graph

  • input graph: a11 a21 a22 a31 a32 a33
  • operator labels: propose, propose, count proposals
  • operator names: Pa, Pb1, Pb2, C
  • tuples: (a11) (a11,a21) (a11,a22) (a11,a32) (a11,a21,a31) (a11,a22,a32) (a11,a22,a33)

SLIDE 20

Example - Enumerate triangles in a directed graph

  • input graph: a11 a21 a22 a31 a32 a33
  • operator labels: propose, propose, count proposals, intersect
  • operator names: Pa, Pb1, Pb2, C, I1, I2, T
  • tuples: (a11) (a11,a21) (a11,a22) (a11,a32) (a11,a21,a31) (a11,a22,a32) (a11,a22,a32) (a11,a22,a33)
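Prefix extension can be sketched in a few lines of single-threaded Python. This is not the dataflow from the talk. It assumes a triangle (a, b, c) means the edges a→b, b→c and a→c all exist, and it only loosely mirrors the operator labels: the smaller candidate set "proposes" extensions and the other set "intersects". The function name enumerate_triangles is hypothetical.

```python
from collections import defaultdict


def enumerate_triangles(edges):
    """Return all (a, b, c) with edges a->b, b->c and a->c (assumed definition)."""
    out = defaultdict(set)
    for src, dst in edges:
        out[src].add(dst)

    triangles = []
    for a in sorted(out):                  # prefixes of length 1: (a)
        for b in sorted(out[a]):           # extend to (a, b)
            from_a = out[a]
            from_b = out.get(b, set())
            # Count proposals: the smaller candidate set proposes c.
            if len(from_b) <= len(from_a):
                proposer, checker = from_b, from_a
            else:
                proposer, checker = from_a, from_b
            for c in sorted(proposer):
                if c in checker:           # intersect: keep c only if both agree
                    triangles.append((a, b, c))
    return triangles


print(enumerate_triangles([(1, 2), (2, 3), (1, 3), (3, 1)]))  # [(1, 2, 3)]
```

Every prefix (a, b) can fan out into many candidates c, which is where the large intermediate state comes from when all prefixes are extended at once.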

SLIDE 21

A naïve schedule can generate large intermediate state

  • operator labels: propose, propose, count proposals, intersect
  • operator names: Pa, Pb1, Pb2, C, I1, I2, T

SLIDE 22

A naïve schedule can generate large intermediate state

  • operator labels: propose, propose, count proposals, intersect
  • operator names: Pa, Pb1, Pb2, C, I1, I2, T
  • tuples: (a11) (a12)

SLIDE 23

A naïve schedule can generate large intermediate state

  • operator labels: propose, propose, count proposals, intersect
  • operator names: Pa, Pb1, Pb2, C, I1, I2, T
  • tuples: (a11) (a12) (a11,a21) (a11,a22) (a12,a23)

SLIDE 24

A naïve schedule can generate large intermediate state

  • operator labels: propose, propose, count proposals, intersect
  • operator names: Pa, Pb1, Pb2, C, I1, I2, T
  • tuples: (a11) (a12) (a11,a21) (a11,a22) (a12,a23) (a11,a21,a31) (a11,a22,a32) (a11,a22,a33) (a12,a23,a34) (a12,a23,a35)

SLIDE 25

Faucet limits buffered intermediate results

  • large output rate
  • tuples: (a11) (a12) (a11,a21) (a11,a22) (a12,a23) (a11,a21,a31) (a11,a22,a32) (a11,a22,a33) (a12,a23,a34) (a12,a23,a35)

SLIDE 29

Evaluation - Dataset

Enumerate triangles in the LiveJournal dataset

  • 4’847’571 nodes
  • 68’993’773 edges
  • 285’730’264 triangles

Hardware

  • Intel Xeon E5-2650 @ 2.00GHz
  • 16 physical cores
  • 10Gbps link

SLIDE 30

Evaluation - Sensitivity to parameter choice

Plot: runtime (sec) against batch size (# tuples), 2 nodes x 4 threads

  • B: batch size
  • Nbatches: number of batches in-flight in parallel
  • Nbatches ≥ 2 mitigates stragglers

SLIDE 31

Evaluation

Plot: total RAM (MB) against input size (tuples), uncontrolled vs. with Faucet, 2 nodes x 4 threads

  • Memory savings: 10x to 100x
  • Runtime overhead: 15-25%

SLIDE 32

RAM is increasingly the main cost of a system

Faucet limits intermediate state

  • Memory savings: 10-100x or more
  • Overhead: 15-25%
  • Diagram components: batcher, scope, controlled subgraph, probe