Faucet: a user-level, modular technique for flow control in dataflow engines


  1. Faucet: a user-level, modular technique for flow control in dataflow engines. Andrea Lattuada (Systems Group, ETH Zürich), Frank McSherry (Unaffiliated), Zaheer Chothia (Systems Group, ETH Zürich)

  2. Problem: RAM exhaustion due to buffered intermediate results. Our solution: • no general system-level strategy • application-driven scheduling. Result: 10-100x memory savings for 15-25% runtime overhead.

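  3. Source of the problem. Dataflow model: operators connected by channels (Storm, Flink, Naiad). Rate imbalance: an operator's output count Nout(t, tʹ) compared with its input count Nin(t, tʹ) over the same interval, for example flat_map(|x| [1, …, x]), which emits x tuples for each input x. [Figure: dataflow graph of operators and channels.]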
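
The rate imbalance named on this slide can be made concrete with a small sketch. It assumes a plain Python list stands in for a stream. `flat_map` and `rate_imbalance` are hypothetical helper names for illustration, not the API of Storm, Flink, or Naiad.

```python
from typing import Callable, Iterable, List, Tuple


def flat_map(inputs: Iterable[int], f: Callable[[int], List[int]]) -> List[int]:
    """Apply f to every input tuple and concatenate the results."""
    out: List[int] = []
    for x in inputs:
        out.extend(f(x))
    return out


def rate_imbalance(inputs: List[int]) -> Tuple[int, int]:
    """Return (N_in, N_out) for the operator x -> [1, ..., x]."""
    outputs = flat_map(inputs, lambda x: list(range(1, x + 1)))
    return len(inputs), len(outputs)


if __name__ == "__main__":
    n_in, n_out = rate_imbalance([10, 100, 1000])
    print(n_in, n_out)  # 3 input tuples become 10 + 100 + 1000 = 1110 output tuples
```

If the downstream operator cannot keep up, the surplus output has to be buffered somewhere, which is the memory problem the talk addresses.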

  4. Existing approach #1 - Source backpressure. Overloaded operators send a backpressure signal to the sources. Operator output is buffered in RAM, so operators with a large output rate can still cause RAM exhaustion. Systems: Storm, Heron, Spark Streaming. [Figure: two sources feeding operators A and B.]

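  5. Existing approach #2 - Edge-by-edge backpressure. Similar to TCP flow control. An overloaded operator sends a backpressure signal upstream; operators that are stopped waiting on each other can deadlock. Systems: Akka Streams, Flink. [Figure: two sources and operators F, G, H, with labels "stopped by G" and "stopped by F".]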
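
A minimal sketch of edge-by-edge backpressure, assuming a single producer and a single consumer joined by one fixed-capacity channel. `BoundedEdge` and `simulate` are hypothetical names, and the step-based scheduling is a simplification that is not taken from Akka Streams or Flink. It shows only the stalling behaviour, not the deadlock case from the slide.

```python
from collections import deque
from typing import Deque, List, Optional, Tuple


class BoundedEdge:
    """A channel with a fixed capacity; a full channel refuses new tuples."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self.buffer: Deque[int] = deque()

    def try_send(self, item: int) -> bool:
        """Return False (backpressure) instead of buffering past capacity."""
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(item)
        return True

    def receive(self) -> Optional[int]:
        """Take the oldest buffered tuple, or None if the channel is empty."""
        return self.buffer.popleft() if self.buffer else None


def simulate(items: List[int], capacity: int, consume_every: int) -> Tuple[List[int], int]:
    """The producer offers one tuple per step; the consumer takes one tuple
    every `consume_every` steps. Returns (received tuples, stalled steps)."""
    if consume_every < 1:
        raise ValueError("consume_every must be positive")
    edge = BoundedEdge(capacity)
    todo: Deque[int] = deque(items)
    received: List[int] = []
    stalls, step = 0, 0
    while todo or edge.buffer:
        step += 1
        if todo:
            if edge.try_send(todo[0]):
                todo.popleft()
            else:
                stalls += 1  # the producer is stopped by the full edge
        if step % consume_every == 0:
            item = edge.receive()
            if item is not None:
                received.append(item)
    return received, stalls
```

With a slow consumer and a small capacity the producer stalls rather than buffering without bound. With a capacity at least as large as the input it never stalls.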

  6. Our approach - Faucet. Based on Timely Dataflow’s concepts: • no fine-grained signal • track completion of a batch of tuples. Control scheduling to limit intermediate results.

  7. Foundation - Timely Dataflow’s Progress Tracking. Scopes: nested operator structure. Timestamps: tuple metadata. [Figure: tuples with timestamp (t1) enter and leave a scope.]

  8. Foundation - Timely Dataflow’s Progress Tracking. Scopes: nested operator structure. Timestamps: tuple metadata. [Figure: inside the scope, tuples carry the extended timestamp (t1,t2).]

  9. Foundation - Timely Dataflow’s Progress Tracking. Scopes: nested operator structure. Timestamps: tuple metadata. [Figure: timestamps change from (t1) to (t1,t2) on entering the scope and back to (t1) on leaving it.]

  10. Foundation - Timely Dataflow’s Progress Tracking. Scopes: nested operator structure. Timestamps: tuple metadata. Progress tracking: tracks pending timestamps. [Figure: tuples with timestamp (3,4) are in flight inside the scope.]
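
The idea of tracking pending timestamps can be sketched as a counter of in-flight tuples per timestamp. This is a simplified, single-process illustration under stated assumptions: `ProgressTracker` and its methods are hypothetical names, and the sketch ignores the distributed protocol and the partial order on timestamps that Timely Dataflow actually handles.

```python
from collections import Counter
from typing import Tuple

Timestamp = Tuple[int, ...]


class ProgressTracker:
    """Counts tuples that are still in flight for each timestamp."""

    def __init__(self) -> None:
        self.pending: Counter = Counter()

    def produced(self, ts: Timestamp, count: int = 1) -> None:
        """Record that `count` tuples with timestamp `ts` were emitted."""
        self.pending[ts] += count

    def consumed(self, ts: Timestamp, count: int = 1) -> None:
        """Record that `count` tuples with timestamp `ts` were fully processed."""
        if self.pending[ts] < count:
            raise ValueError(f"more tuples consumed than produced for {ts}")
        self.pending[ts] -= count
        if self.pending[ts] == 0:
            del self.pending[ts]

    def is_complete(self, ts: Timestamp) -> bool:
        """A timestamp is complete once no tuple carrying it is pending."""
        return ts not in self.pending


if __name__ == "__main__":
    tracker = ProgressTracker()
    tracker.produced((3, 4), count=2)
    print(tracker.is_complete((3, 4)))  # False: two tuples are in flight
    tracker.consumed((3, 4), count=2)
    print(tracker.is_complete((3, 4)))  # True: nothing is pending for (3, 4)
```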

  11. Faucet - Track batches of intermediate results. [Figure: a scope containing a batcher, a controlled subgraph, and a probe.]

  12. Faucet - Track batches of intermediate results. [Figure: tuples with timestamp (te) arrive at the scope.]

  13. Faucet - Track batches of intermediate results. [Figure: inside the scope, tuples carry timestamp (te,tb).]

  14. Faucet - Track batches of intermediate results. [Figure: more tuples with timestamps (te) and (te,tb) flow through the controlled subgraph.]

  15. Faucet - Track batches of intermediate results. [Figure: tuples with timestamp (te,2) are pending in the controlled subgraph.]
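
A minimal simulation of the batcher and probe idea, assuming one expanding operator stands in for the controlled subgraph and a consumer drains a fixed number of tuples per step. `run_controlled` and all of its parameters are hypothetical. The batch id plays a role similar to the tb coordinate on the slides, which is an interpretation rather than something the slide text states. This is a sketch, not the authors' implementation.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


def run_controlled(
    inputs: List[int],
    expand: Callable[[int], List[int]],
    batch_size: int,
    max_in_flight: int,
    drain_per_step: int = 4,
) -> Tuple[List[int], int]:
    """Return (results, peak number of buffered intermediate tuples)."""
    if min(batch_size, max_in_flight, drain_per_step) < 1:
        raise ValueError("all parameters must be positive")
    todo: Deque[int] = deque(inputs)
    buffered: Deque[Tuple[int, int]] = deque()  # (batch id, intermediate tuple)
    outstanding: Dict[int, int] = {}  # probe view: batch id -> tuples not yet drained
    results: List[int] = []
    next_batch, peak = 0, 0

    while todo or buffered:
        # Batcher: release a new batch only while fewer than max_in_flight
        # batches are reported as incomplete.
        while todo and len(outstanding) < max_in_flight:
            batch = [todo.popleft() for _ in range(min(batch_size, len(todo)))]
            produced = [(next_batch, y) for x in batch for y in expand(x)]
            if produced:
                outstanding[next_batch] = len(produced)
                buffered.extend(produced)
            next_batch += 1
        peak = max(peak, len(buffered))

        # Downstream consumer: drains a bounded number of tuples per step.
        for _ in range(min(drain_per_step, len(buffered))):
            batch_id, y = buffered.popleft()
            results.append(y)
            outstanding[batch_id] -= 1
            if outstanding[batch_id] == 0:
                del outstanding[batch_id]  # probe: this batch has completed

    return results, peak


if __name__ == "__main__":
    data = [50] * 20
    fan_out = lambda x: list(range(x))
    _, uncontrolled_peak = run_controlled(data, fan_out, 1, max_in_flight=len(data))
    _, controlled_peak = run_controlled(data, fan_out, 1, max_in_flight=2)
    print(uncontrolled_peak, controlled_peak)  # 1000 buffered tuples versus 100
```

In this toy setting both runs produce the same results, but the controlled run never buffers more than `max_in_flight` batches' worth of intermediate tuples.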

  16. Example - Enumerate triangles in a directed graph. Generic Join (H. Q. Ngo, C. Ré, and A. Rudra): build result tuples by extending prefixes, e.g. (a11), then (a11,a22), then (a11,a22,a32). [Figure: input graph with nodes a11, a21, a22, a31, a32, a33.]

  17. Example - Enumerate triangles in a directed graph. [Figure: prefix (a11) enters the propose operator Pa.]

  18. Example - Enumerate triangles in a directed graph. [Figure: the propose operator Pa extends (a11) to (a11,a21), (a11,a22), (a11,a32).]

  19. Example - Enumerate triangles in a directed graph. [Figure: after Pa, a count operator C and propose operators Pb1 and Pb2 emit the proposals (a11,a21,a31), (a11,a22,a32), (a11,a22,a33).]

  20. Example - Enumerate triangles in a directed graph. [Figure: full pipeline of Pa (propose), C (count), Pb1 and Pb2 (propose), I1 and I2 (intersect), and T; of the proposals (a11,a21,a31), (a11,a22,a32), (a11,a22,a33), the tuple (a11,a22,a32) reaches T.]
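
The propose, count, and intersect steps can be sketched on an in-memory graph. The sketch assumes a triangle (a, b, c) means edges a→b, b→c, and a→c, which matches the tuples on the slides but is not stated there. The `triangles` function and the edge list are hypothetical, and the code runs sequentially rather than as a dataflow.

```python
from typing import Dict, List, Set, Tuple


def triangles(edges: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Enumerate (a, b, c) with edges a->b, b->c and a->c by extending prefixes."""
    out: Dict[str, Set[str]] = {}
    for src, dst in edges:
        out.setdefault(src, set()).add(dst)
        out.setdefault(dst, set())

    results: List[Tuple[str, str, str]] = []
    for a in sorted(out):
        # Propose: extend the prefix (a) to (a, b) along the out-edges of a.
        for b in sorted(out[a]):
            from_a, from_b = out[a], out[b]
            # Count: let the relation with fewer candidates propose values for c.
            if len(from_a) <= len(from_b):
                proposer, checker = from_a, from_b
            else:
                proposer, checker = from_b, from_a
            # Propose and intersect: keep candidates confirmed by the other relation.
            for c in sorted(proposer):
                if c in checker:
                    results.append((a, b, c))
    return results


if __name__ == "__main__":
    # Hypothetical edge list, chosen to be consistent with the tuples on the slides.
    example_edges = [
        ("a11", "a21"), ("a11", "a22"), ("a11", "a32"),
        ("a21", "a31"), ("a22", "a32"), ("a22", "a33"),
    ]
    print(triangles(example_edges))  # [('a11', 'a22', 'a32')]
```

Letting the smaller candidate set propose keeps the number of intermediate proposals down, which is the same concern the flow control technique addresses at the scheduling level.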

  21. A naïve schedule can generate large intermediate state. [Figure: pipeline of Pa (propose), C (count), Pb1 and Pb2 (propose), I1 and I2 (intersect), and T.]

  22. A naïve schedule can generate large intermediate state. [Figure: prefixes (a11) and (a12) enter the pipeline.]

  23. A naïve schedule can generate large intermediate state. [Figure: the prefixes are extended to (a11,a21), (a11,a22), (a12,a23) while (a11) and (a12) are still present.]

  24. A naïve schedule can generate large intermediate state. [Figure: the pipeline simultaneously holds (a11), (a12); (a11,a21), (a11,a22), (a12,a23); and (a11,a21,a31), (a11,a22,a32), (a11,a22,a33), (a12,a23,a34), (a12,a23,a35).]

  25. Faucet limits buffered intermediate results. [Figure: prefixes (a11), (a12); extensions (a11,a21), (a11,a22), (a12,a23); results (a11,a21,a31), (a11,a22,a32), (a11,a22,a33), (a12,a23,a34), (a12,a23,a35); label: large output rate.]

  26. Faucet limits buffered intermediate results. [Figure: the same tuples as on the previous slide, without the "large output rate" label.]

  29. Evaluation - Dataset. Enumerate triangles in the LiveJournal dataset: 4’847’571 nodes, 68’993’773 edges, 285’730’264 triangles. Hardware: Intel Xeon E5-2650 @ 2.00GHz, 16 physical cores, 10Gbps link.

  30. Evaluation - Sensitivity to parameter choice. Parameters: B (batch size) and Nbatches (number of batches in flight in parallel). Setup: 2 nodes x 4 threads. Nbatches ≥ 2 mitigates stragglers. [Figure: runtime (sec), 0 to 100, against batch size (# tuples), 100 to 100000.]

  31. Evaluation. Memory savings: 10x to 100x. Runtime overhead: 15-25%. Setup: 2 nodes x 4 threads. [Figure: total RAM (MB), 100 to 100000, against input size (tuples), 100K to 1000K, uncontrolled versus with Faucet.]

  32. Faucet: limits intermediate state. RAM is increasingly the main cost of a system. Memory savings: 10-100x or more. Overhead: 15-25%. [Figure: a scope containing a batcher, a controlled subgraph, and a probe.]
