# Varys: Efficient Coflow Scheduling



slide-1
SLIDE 1

Varys

Efficient Coflow Scheduling

Mosharaf Chowdhury, Yuan Zhong, Ion Stoica

UC Berkeley

slide-2
SLIDE 2

Communication is Crucial

Performance

Facebook analytics jobs spend 33% of their runtime in communication [1].

As in-memory systems proliferate, the network is likely to become the primary bottleneck.

  • 1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011

slide-3
SLIDE 3

Optimizing Communication Performance: Networking Approach

“Let systems figure it out”

Flow

  • A sequence of packets between two endpoints
  • Independent unit of allocation, sharing, load balancing, and/or prioritization

slide-4
SLIDE 4

Optimizing Communication Performance: Systems Approach

“Let users figure it out”

| Framework | # Comm. Params* |
|---|---|
| Spark 1.0.1 | 6 |
| Hadoop 1.0.4 | 10 |
| YARN 2.3.0 | 20 |

*Lower bound. Does not include many parameters that can indirectly impact communication, e.g., the number of reducers. Also excludes control-plane communication/RPC parameters.

slide-7
SLIDE 7

Optimizing Communication Performance: Networking Approach

“Let systems figure it out”

Optimizing Communication Performance: Systems Approach

“Let users figure it out”

Coflow [1]

  • A collection of parallel flows
  • Distributed endpoints
  • Each flow is independent
  • Completion time depends on the last flow to complete

  • 1. Coflow: A Networking Abstraction for Cluster Applications, HotNets’2012
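
The coflow definition above can be read as a small data structure. The following is a minimal sketch, not the Varys implementation: the class names `Flow` and `Coflow`, the field names, and the endpoint labels are all hypothetical and chosen only for illustration. It shows the one property the slide stresses, that a coflow's completion time is set by its last flow.

```python
from dataclasses import dataclass, field


@dataclass
class Flow:
    """One flow between two endpoints (illustrative names, not a real API)."""
    src: str
    dst: str
    finish_time: float  # time at which this flow's last byte arrives


@dataclass
class Coflow:
    """A collection of parallel flows with distributed endpoints."""
    name: str
    flows: list = field(default_factory=list)

    def completion_time(self) -> float:
        # The coflow is done only when its slowest flow is done.
        return max(f.finish_time for f in self.flows)


# Hypothetical shuffle between two mappers and two reducers.
shuffle = Coflow("shuffle", [
    Flow("mapper-1", "reducer-1", 2.0),
    Flow("mapper-1", "reducer-2", 3.5),
    Flow("mapper-2", "reducer-1", 1.0),
])
print(shuffle.completion_time())
```

Speeding up the two faster flows in this example would not change the result, which is why a scheduler that reasons only about individual flows can miss what the application actually cares about.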
slide-8
SLIDE 8

A collection of parallel flows! Distributed endpoints! Each flow is independent! Completion time depends

  • n the last flow to complete!

Coflow1!

  • 1. Coflow: A Networking Abstraction for Cluster Applications, HotNets’2012!
slide-9
SLIDE 9

How to schedule coflows …

  • #1 … for faster completion of coflows?
  • #2 … to meet more deadlines?

[Figure: coflows crossing a DC fabric that connects ports 1, 2, …, N on each side]

slide-10
SLIDE 10

Varys

Enables coflows in data-intensive clusters

  • 1. Simpler frameworks: zero user-side configuration using a simple coflow API
  • 2. Better performance: faster and more predictable transfers through coflow scheduling

slide-11
SLIDE 11

Benefits of Inter-Coflow Scheduling

[Figure: two links (Link 1, Link 2) carrying flows of Coflow 1 and Coflow 2 with sizes of 3, 6, and 3-ε units; three schedules for L1 and L2 plotted over time (ticks at 2, 4, 6)]

| Schedule | Coflow1 comp. time | Coflow2 comp. time |
|---|---|---|
| Fair Sharing | 6 | 6 |
| Flow-level Prioritization [1, 2] | 6 | 6 |
| The Optimal | 3 | 6 |

  • 1. Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM’2012.
  • 2. pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM’2013.
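
The completion times in the two-link example can be checked with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not code from Varys: it assumes unit-capacity links, and it assumes that Coflow 1 has a single 3-unit flow on Link 1 while Coflow 2 has a (3-ε)-unit flow on Link 1 and a 6-unit flow on Link 2, which is the placement that reproduces the numbers on the slide. All function names are hypothetical.

```python
def fair_sharing(flows):
    """Finish times on one unit-capacity link shared equally by active flows.

    flows: dict mapping coflow name -> flow size on this link.
    """
    finish, now, prev = {}, 0.0, 0.0
    pending = sorted(flows.items(), key=lambda kv: kv[1])
    for i, (name, size) in enumerate(pending):
        active = len(pending) - i        # flows still sharing the link
        now += active * (size - prev)    # time to drain the next slice
        finish[name], prev = now, size
    return finish


def in_order(flows, order):
    """Finish times on one unit-capacity link serving flows one at a time."""
    finish, now = {}, 0.0
    for name in order:
        if name in flows:
            now += flows[name]
            finish[name] = now
    return finish


def coflow_times(per_link_finish):
    """A coflow completes when its last flow on any link completes."""
    result = {}
    for finish in per_link_finish:
        for name, t in finish.items():
            result[name] = max(result.get(name, 0.0), t)
    return result


EPS = 0.01
links = [
    {"C1": 3.0, "C2": 3.0 - EPS},  # Link 1 (assumed placement)
    {"C2": 6.0},                   # Link 2 (assumed placement)
]

fair = coflow_times([fair_sharing(link) for link in links])
# Flow-level prioritization: smallest flow first on each link.
flow_prio = coflow_times(
    [in_order(link, sorted(link, key=link.get)) for link in links]
)
# Coflow-level order: all of C1 before C2 on every link.
coflow_order = coflow_times([in_order(link, ["C1", "C2"]) for link in links])

for label, times in [
    ("fair sharing", fair),
    ("flow-level prioritization", flow_prio),
    ("coflow order C1, C2", coflow_order),
]:
    print(label, {name: round(t, 2) for name, t in times.items()})
```

Under these assumptions, fair sharing and smallest-flow-first both leave Coflow 1 finishing at about 6 (exactly 6-ε), because the slightly smaller flow of Coflow 2 takes bandwidth on Link 1 even though Coflow 2 cannot finish before time 6 on Link 2 anyway. Serving Coflow 1 first brings it down to 3 without delaying Coflow 2.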

slide-12
SLIDE 12

Inter-Coflow Scheduling

[Figure: the same two-link example (flows of 3, 6, and 3-ε units) and its three schedules. Fair Sharing: Coflow1 comp. time = 6, Coflow2 comp. time = 6. Flow-level Prioritization: Coflow1 comp. time = 6, Coflow2 comp. time = 6. The Optimal: Coflow1 comp. time = 3, Coflow2 comp. time = 6]

Concurrent Open Shop Scheduling [1]

  • Tasks on independent machines
  • Examples include job scheduling and caching blocks
  • Use an ordering heuristic

  • 1. A note on the complexity of the concurrent open shop problem, Journal of Scheduling, 9(4):389–396, 2006
slide-13
SLIDE 13

Inter-Coflow Scheduling is NP-Hard

[Figure: DC fabric with ingress ports (machine uplinks) 1 to 3 and egress ports (machine downlinks) 1 to 3, alongside the two-link example with flows of 3, 6, and 3-ε units]

Concurrent Open Shop Scheduling with Coupled Resources

  • Flows on dependent links
  • Consider ordering and matching constraints

Characterized COSS-CR. Proved that list scheduling might not result in an optimal solution.

slide-14
SLIDE 14

Varys

Employs a two-step algorithm to minimize coflow completion times

  • 1. Ordering heuristic: keeps an ordered list of coflows to be scheduled, preempting if needed
  • 2. Allocation algorithm: allocates minimum required resources to each coflow to finish in minimum time

slide-15
SLIDE 15

Ordering Heuristic: SEBF

[Figure: flows of coflows C1 and C2 across ports P1 to P3, and two candidate schedules plotted over time]

|  | C1 | C2 |
|---|---|---|
| Length | 3 | 4 |
| Width | 2 | 3 |
| Size | 5 | 12 |
| Bottleneck | 5 | 4 |

Candidate orderings:

  • Shortest-First
  • Narrowest-First
  • Smallest-First
  • Smallest-Effective-Bottleneck-First (SEBF)
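
The four per-coflow metrics in the table (length, width, size, bottleneck) can be computed directly from a list of flows. The sketch below is illustrative only: the port placement of each flow is a hypothetical choice made so that the numbers match the slide, all ports are assumed to have equal capacity (so the bottleneck is simply the most heavily loaded port), and the function and variable names are not taken from Varys.

```python
from collections import defaultdict


def metrics(flows):
    """flows: list of (src_port, dst_port, size) tuples for one coflow."""
    sizes = [size for _, _, size in flows]
    per_port = defaultdict(float)
    for src, dst, size in flows:
        per_port[("in", src)] += size    # load on the sender's uplink
        per_port[("out", dst)] += size   # load on the receiver's downlink
    return {
        "length": max(sizes),                  # largest flow
        "width": len(flows),                   # number of parallel flows
        "size": sum(sizes),                    # total data
        "bottleneck": max(per_port.values()),  # most loaded port
    }


# Hypothetical port placement chosen to match the slide's table.
coflows = {
    "C1": [(1, 1, 3), (2, 1, 2)],             # both flows share egress port 1
    "C2": [(1, 2, 4), (2, 3, 4), (3, 4, 4)],  # three flows on disjoint ports
}

table = {name: metrics(flows) for name, flows in coflows.items()}

# Each heuristic sorts the coflows by a different metric.
shortest_first = sorted(table, key=lambda name: table[name]["length"])
sebf_order = sorted(table, key=lambda name: table[name]["bottleneck"])

print(table)
print("Shortest-First:", shortest_first)
print("SEBF:", sebf_order)
```

With these inputs, Shortest-First, Narrowest-First, and Smallest-First all put C1 first, while ordering by bottleneck puts C2 first, since C2 can finish in 4 time units and C1 needs at least 5 on its shared port.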

slide-16
SLIDE 16

Allocation Algorithm: MADD

A coflow cannot finish before its very last flow. Finishing flows faster than the bottleneck cannot decrease a coflow’s completion time.

Ensure minimum allocation to each flow for it to finish at the desired duration; for example, at the bottleneck’s completion, or at the deadline.
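
The allocation idea above can be sketched as follows: pick the desired duration (the bottleneck's completion time, or a deadline), then give every flow exactly the rate it needs to finish at that moment. This is a simplified illustration rather than the scheduler's actual code. It assumes every port has the same available capacity and at most one flow per (source, destination) pair, and the name `madd_rates` and its parameters are hypothetical.

```python
from collections import defaultdict


def madd_rates(flows, capacity=1.0, deadline=None):
    """Return (duration, rates) for one coflow.

    flows: list of (src_port, dst_port, size) tuples.
    capacity: bandwidth available on every port (assumed equal).
    deadline: optional desired duration; defaults to the bottleneck time.
    """
    load = defaultdict(float)
    for src, dst, size in flows:
        load[("in", src)] += size
        load[("out", dst)] += size
    bottleneck = max(load.values()) / capacity  # earliest possible finish
    duration = bottleneck if deadline is None else deadline
    if duration < bottleneck:
        raise ValueError("deadline is shorter than the bottleneck allows")
    # Every flow gets just enough rate to finish exactly at `duration`.
    rates = {(src, dst): size / duration for src, dst, size in flows}
    return duration, rates


# Two flows of 3 and 2 units that share one egress port.
flows = [(1, 1, 3.0), (2, 1, 2.0)]

duration, rates = madd_rates(flows)
print(duration, rates)

slow_duration, slow_rates = madd_rates(flows, deadline=10.0)
print(slow_duration, slow_rates)
```

In the first call both flows finish together at the bottleneck time and the shared port is fully used. In the second call the deadline is looser, so each flow gets a smaller rate and the leftover bandwidth stays available for other coflows.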

slide-17
SLIDE 17

Varys

Enables frameworks to take advantage of coflow scheduling

  • 1. Exposes the coflow API
  • 2. Enforces through a centralized scheduler
slide-18
SLIDE 18
Evaluation

A 3000-node trace-driven simulation matched against a 100-node EC2 deployment

  • 1. Does it improve performance?
  • 2. Can it beat non-preemptive solutions?

YES

slide-20
SLIDE 20

Faster Jobs

[Figure: bar chart of Comm. Improv. and Job Improv. at Avg. and 95th percentile. Values shown: 1.85X, 1.25X, 1.74X, 1.15X. Values shown for Comm. Heavy [1] jobs: 2.50X, 3.16X, 2.94X, 3.84X]

  • 1. 26% of jobs spend at least 50% of their duration in communication stages.
slide-21
SLIDE 21

Better than Non-Preemptive Solutions

[Figure: improvement w.r.t. FIFO [1] at Avg. and 95th percentile. Values shown: 5.65X, 7.70X]

What about perpetual starvation? NO

  • 1. Managing Data Transfers in Computer Clusters with Orchestra, SIGCOMM’2011
slide-22
SLIDE 22

Four Challenges in the Context of Multipoint-to-Multipoint Coflows

#1 Coflow Dependencies

  • Multi-stage jobs
  • Multi-wave stages

#2 Unknown Flow Information

  • Pipelining between stages
  • Task failures and restarts

#3 Decentralized Varys

  • Master failure
  • Low-latency analytics

slide-23
SLIDE 23

#4 Theory Behind “Concurrent Open Shop Scheduling with Coupled Resources”

slide-24
SLIDE 24

Varys

Greedily schedules coflows without worrying about flow-level metrics

  • Consolidates network optimization of data-intensive frameworks
  • Improves job performance by addressing the COSS-CR problem
  • Increases predictability through informed admission control

http://varys.net/

Mosharaf Chowdhury - @mosharaf