# Varys: Efficient Coflow Scheduling - PowerPoint PPT Presentation

Varys: Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica (UC Berkeley). Communication is Crucial. Performance: Facebook analytics jobs spend 33% of their runtime in communication [1]. As in-memory systems proliferate, the …


  1. Varys: Efficient Coflow Scheduling. Mosharaf Chowdhury, Yuan Zhong, Ion Stoica (UC Berkeley).

  2. Communication is Crucial. Performance: Facebook analytics jobs spend 33% of their runtime in communication [1]. As in-memory systems proliferate, the network is likely to become the primary bottleneck.
     [1] Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011.

  3. Optimizing Communication Performance: Networking Approach. “Let systems figure it out.”
     Flow: a sequence of packets between two endpoints. It is the independent unit of allocation, sharing, load balancing, and/or prioritization.

  4. Optimizing Communication Performance: Systems Approach. “Let users figure it out.”

     | Framework | # Comm. Params* |
     |---|---|
     | Spark 1.0.1 | 6 |
     | Hadoop 1.0.4 | 10 |
     | YARN 2.3.0 | 20 |

     \* Lower bound. Does not include many parameters that can indirectly impact communication, e.g., the number of reducers. Also excludes control-plane communication/RPC parameters.

  5. Optimizing Communication Performance. Systems Approach: “Let users figure it out.” Networking Approach: “Let systems figure it out.”

  6. Optimizing Communication Performance. Systems Approach: “Let users figure it out.” Networking Approach: “Let systems figure it out.”

  7. Optimizing Communication Performance. Systems Approach: “Let users figure it out.” Networking Approach: “Let systems figure it out.”
     Coflow [1]: a collection of parallel flows with distributed endpoints. Each flow is independent. Completion time depends on the last flow to complete.
     [1] Coflow: A Networking Abstraction for Cluster Applications, HotNets’2012.

  8. Coflow [1]: a collection of parallel flows with distributed endpoints. Each flow is independent. Completion time depends on the last flow to complete.
     [1] Coflow: A Networking Abstraction for Cluster Applications, HotNets’2012.
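
The definition above can be illustrated with a minimal Python sketch. The `Flow` class, the endpoint names, and the finish times are hypothetical and only serve the illustration; the one property taken from the slide is that a coflow completes when its last flow completes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flow:
    src: str            # sending endpoint (hypothetical name)
    dst: str            # receiving endpoint (hypothetical name)
    finish_time: float  # time at which this flow's last byte arrives


def coflow_completion_time(flows):
    """A coflow is done only when its slowest flow is done."""
    return max(flow.finish_time for flow in flows)


# A made-up shuffle: three parallel flows between distributed endpoints.
shuffle = [
    Flow("mapper-1", "reducer-1", 2.0),
    Flow("mapper-1", "reducer-2", 3.5),
    Flow("mapper-2", "reducer-1", 1.0),
]
print(coflow_completion_time(shuffle))
```

Speeding up the flows that finish at 1.0 or 2.0 would leave the result unchanged, which is why the coflow, rather than the individual flow, is the unit worth optimizing.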

  9. How to schedule coflows …
     - #1: … for faster completion of coflows?
     - #2: … to meet more deadlines?

     [Figure: a DC fabric with ports 1, 2, …, N on each side.]

  10. Varys: enables coflows in data-intensive clusters.
      1. Simpler frameworks: zero user-side configuration using a simple coflow API.
      2. Better performance: faster and more predictable transfers through coflow scheduling.

  11. Benefits of Inter-Coflow Scheduling.
      [Figure: Coflow 1 and Coflow 2 share two links. Link 1 carries flows of 3 units and 3-ε units; Link 2 carries a flow of 6 units. Three schedules are plotted per link (L1, L2) against time, with ticks at 2, 4, and 6.]

      | Schedule | Coflow1 comp. time | Coflow2 comp. time |
      |---|---|---|
      | Fair Sharing | 6 | 6 |
      | Flow-level Prioritization [1, 2] | 6 | 6 |
      | The Optimal | 3 | 6 |

      [1] Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012.
      [2] pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.
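
The three schedules on this slide can be reproduced with a short Python sketch. Several things here are assumptions rather than facts stated on the slide: that Coflow 1 owns the 3-unit flow on Link 1 while Coflow 2 owns the 3-ε flow on Link 1 and the 6-unit flow on Link 2, that each link has unit capacity and is independent of the other, and that ε is 1/10. The policy functions are simplified stand-ins, not the cited systems.

```python
from fractions import Fraction

EPS = Fraction(1, 10)  # assumed value standing in for the slide's ε

# (coflow, link, size) on unit-capacity links; flow ownership is an assumption.
FLOWS = [
    ("C1", "L1", Fraction(3)),
    ("C2", "L1", Fraction(3) - EPS),
    ("C2", "L2", Fraction(6)),
]


def by_link(flows):
    links = {}
    for coflow, link, size in flows:
        links.setdefault(link, []).append((coflow, size))
    return links


def fair_sharing(flows_on_link):
    """Split the link equally among unfinished flows (processor sharing)."""
    finish, now, served = [], Fraction(0), Fraction(0)
    ordered = sorted(flows_on_link, key=lambda f: f[1])
    for i, (coflow, size) in enumerate(ordered):
        active = len(ordered) - i          # flows still sharing the link
        now += (size - served) * active    # time until the next flow drains
        served = size
        finish.append((coflow, now))
    return finish


def in_order(flows_on_link, key):
    """Serve one flow at a time at full link rate, in sorted order."""
    finish, now = [], Fraction(0)
    for coflow, size in sorted(flows_on_link, key=key):
        now += size
        finish.append((coflow, now))
    return finish


def completion_times(policy):
    """Coflow completion time = finish time of its last flow on any link."""
    cct = {}
    for flows_on_link in by_link(FLOWS).values():
        for coflow, t in policy(flows_on_link):
            cct[coflow] = max(cct.get(coflow, Fraction(0)), t)
    return cct


results = {
    "fair sharing": completion_times(fair_sharing),
    "smallest flow first": completion_times(
        lambda fl: in_order(fl, key=lambda f: f[1])),
    "coflow order C1, C2": completion_times(
        lambda fl: in_order(fl, key=lambda f: f[0])),
}
for name, cct in results.items():
    print(name, {c: float(t) for c, t in cct.items()})
```

Under these assumptions the first two policies finish Coflow 1 at 6-ε, which the slide shows as 6, while serving Coflow 1 ahead of Coflow 2 finishes it at 3. Coflow 2 finishes at 6 in all three cases because its 6-unit flow on Link 2 is the bottleneck.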

  12. Inter-Coflow Scheduling.
      [Figure: same example as the previous slide. Coflow 1 and Coflow 2 share Link 1 (flows of 3 units and 3-ε units) and Link 2 (a flow of 6 units). Completion times (Coflow1, Coflow2): Fair Sharing 6, 6; Flow-level Prioritization 6, 6; The Optimal 3, 6.]
      Concurrent Open Shop Scheduling [1]:
      - Tasks on independent machines
      - Examples include job scheduling and caching blocks
      - Use an ordering heuristic

      [1] A note on the complexity of the concurrent open shop problem, Journal of Scheduling, 9(4):389–396, 2006.

  13. Inter-Coflow Scheduling is NP-Hard.
      Concurrent Open Shop Scheduling with coupled resources:
      - Flows on dependent links
      - Consider ordering and matching constraints

      Characterized COSS-CR. Proved that list scheduling might not result in an optimal solution.
      [Figure: Coflow 1 and Coflow 2, with flows of 3 units and 3-ε units on Link 1 and 6 units on Link 2, drawn across a DC fabric from ingress ports (machine uplinks) to egress ports (machine downlinks).]

  14. Varys: employs a two-step algorithm to minimize coflow completion times.
      1. Ordering heuristic: keeps an ordered list of coflows to be scheduled, preempting if needed.
      2. Allocation algorithm: allocates minimum required resources to each coflow to finish in minimum time.

  15. Ordering Heuristic: SEBF (Smallest-Effective-Bottleneck-First).

      | Heuristic | Metric | C1 | C2 |
      |---|---|---|---|
      | Shortest-First | Length | 3 | 4 |
      | Narrowest-First | Width | 2 | 3 |
      | Smallest-First | Size | 5 | 12 |
      | Smallest-Effective-Bottleneck-First | Bottleneck | 5 | 4 |

      [Figure: two schedules of C1 and C2 over time on ports P1, P2, and P3. In one, C1 ends before C2; in the other, C2 ends before C1. Time-axis ticks at 4, 5, and 9.]
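
The metrics in this slide's table can be recomputed with a small Python sketch. The flow layout below is hypothetical: it was chosen only so that length, width, size, and bottleneck match the slide's numbers for C1 and C2, and it simplifies each flow to a single unit-capacity port. The scheduling function serves coflows strictly one after another per port, which is a simplification and not Varys itself.

```python
# Hypothetical flows as (port, size) pairs, chosen only so that the metrics
# match the slide's table; each port has unit capacity.
COFLOWS = {
    "C1": [("P3", 3), ("P3", 2)],
    "C2": [("P1", 4), ("P2", 4), ("P3", 4)],
}


def port_loads(flows):
    """Total data a coflow sends through each port."""
    loads = {}
    for port, size in flows:
        loads[port] = loads.get(port, 0) + size
    return loads


METRICS = {
    "length": lambda flows: max(size for _, size in flows),       # Shortest-First
    "width": lambda flows: len(flows),                            # Narrowest-First
    "size": lambda flows: sum(size for _, size in flows),         # Smallest-First
    "bottleneck": lambda flows: max(port_loads(flows).values()),  # SEBF
}


def order_by(metric):
    """Coflow names sorted by the chosen metric, smallest first."""
    return sorted(COFLOWS, key=lambda name: METRICS[metric](COFLOWS[name]))


def completion_times(order):
    """Serve coflows in order; each port handles one coflow at a time."""
    busy_until, cct = {}, {}
    for name in order:
        end = 0
        for port, load in port_loads(COFLOWS[name]).items():
            busy_until[port] = busy_until.get(port, 0) + load
            end = max(end, busy_until[port])
        cct[name] = end
    return cct


for metric in METRICS:
    order = order_by(metric)
    print(metric, order, completion_times(order))
```

With this layout, ordering by length, width, or size schedules C1 first, giving completion times of 5 and 9 (average 7). Ordering by bottleneck schedules C2 first, giving 4 and 9 (average 6.5). These values line up with the 4, 5, and 9 ticks on the slide's time axes.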

  16. Allocation Algorithm: MADD.
      - A coflow cannot finish before its very last flow.
      - Finishing flows faster than the bottleneck cannot decrease a coflow’s completion time.
      - Ensure minimum allocation to each flow for it to finish at the desired duration; for example, at the bottleneck’s completion, or at the deadline.
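
The allocation idea on this slide can be sketched in Python as follows. This is one reading of the slide, not the authors' implementation: the function name `madd_rates`, the `(ingress, egress, size)` flow tuples, and the default unit port bandwidth are all assumptions made for the example. Each flow gets just enough rate to finish at the desired duration, which defaults to the time the most loaded port needs.

```python
from collections import defaultdict


def madd_rates(flows, bandwidth=None, duration=None):
    """Return (duration, rates) so that every flow ends exactly at `duration`.

    flows:     list of (ingress_port, egress_port, size) tuples
    bandwidth: optional {("in" | "out", port): capacity}; missing ports get 1.0
    duration:  desired finish time, e.g. a deadline; defaults to the bottleneck
    """
    bandwidth = bandwidth or {}
    load = defaultdict(float)
    for ingress, egress, size in flows:
        load[("in", ingress)] += size
        load[("out", egress)] += size
    # Time the most loaded port needs to move all of this coflow's data.
    bottleneck = max(total / bandwidth.get(port, 1.0)
                     for port, total in load.items())
    if duration is None:
        duration = bottleneck
    elif duration < bottleneck:
        raise ValueError("desired duration is shorter than the bottleneck allows")
    # Minimum rate per flow: no flow finishes earlier than it has to.
    return duration, [size / duration for _, _, size in flows]


# Made-up coflow: port A sends 6 units in total and port X receives 6 units.
flows = [("A", "X", 4.0), ("A", "Y", 2.0), ("B", "X", 2.0)]
duration, rates = madd_rates(flows)
print(duration, rates)
```

In this example the bottleneck is 6 time units, and the rates on port A (4/6 and 2/6) add up to exactly its unit capacity. Passing `duration=12.0` halves every rate, which corresponds to the slide's deadline case and leaves the remaining bandwidth for other coflows.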

  17. Varys: enables frameworks to take advantage of coflow scheduling.
      1. Exposes the coflow API.
      2. Enforces through a centralized scheduler.

  18. Evaluation: a 3000-node trace-driven simulation matched against a 100-node EC2 deployment.
      1. Does it improve performance?
      2. Can it beat non-preemptive solutions?

      YES

  19. Faster Jobs.

      | | Comm. Improv. | Job Improv. |
      |---|---|---|
      | Avg. | 1.85X | 1.25X |
      | 95th | 1.74X | 1.15X |

  20. Faster Jobs.

      | Jobs | | Comm. Improv. | Job Improv. |
      |---|---|---|---|
      | All | Avg. | 1.85X | 1.25X |
      | All | 95th | 1.74X | 1.15X |
      | Comm. Heavy [1] | Avg. | 3.16X | 2.50X |
      | Comm. Heavy [1] | 95th | 3.84X | 2.94X |

      [1] 26% of jobs spend at least 50% of their duration in communication stages.

  21. Better than Non-Preemptive Solutions.
      Improvement w.r.t. FIFO [1]: Avg. 5.65X, 95th 7.70X.
      What about perpetual starvation? NO.
      [1] Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011.

  22. Four Challenges in the Context of Multipoint-to-Multipoint Coflows.
      - #1 Coflow Dependencies: multi-stage jobs; multi-wave stages.
      - #2 Unknown Flow Information: pipelining between stages; task failures and restarts.
      - #3 Decentralized Varys: master failure; low-latency analytics.

  23. #4 Theory Behind “Concurrent Open Shop Scheduling with Coupled Resources”.

  24. Varys: greedily schedules coflows without worrying about flow-level metrics.
      - Consolidates network optimization of data-intensive frameworks
      - Improves job performance by addressing the COSS-CR problem
      - Increases predictability through informed admission control

      http://varys.net/
      Mosharaf Chowdhury - @mosharaf
