Sincronia: Near-Optimal Network Design for Coflows – Shijin Rajakrishnan (PowerPoint presentation)



SLIDE 1

Sincronia: Near-Optimal Network Design for Coflows

Shijin Rajakrishnan

Joint work with Saksham Agarwal, Akshay Narayan, Rachit Agarwal, David Shmoys, and Amin Vahdat

SLIDE 2

The Flow Abstraction

Traditional applications (FTP, Email, HTTP, …) care about the performance of individual flows.

The network is optimized for flow-level performance.

Good match.

SLIDE 3

Is Flow Still the Right Abstraction?

Traditional applications (FTP, Email, HTTP, …) care about the performance of individual flows, but distributed applications care about the performance of a group of flows.

The network, however, is still optimized for flow-level performance.

SLIDE 4

The Coflow Abstraction

Collection of semantically related flows [Chowdhury & Stoica, 2012]

[Figure: three coflows, each a group of semantically related flows]

Allows applications to more precisely express their performance goals

SLIDE 5
Network and Coflow Model

  • Big-switch model: the datacenter fabric is abstracted as a single switch connecting ingress ports to egress ports
  • Clairvoyant scheduler: coflow details known at arrival time:

➢ Source and destination for each flow
➢ Size of each flow
➢ Coflow weight

  • Metric – coflow completion time (CCT): the time when all flows of the coflow complete

[Figure: DC fabric modeled as a big switch, ingress ports 1, 2 to egress ports 1', 2']

Goal: Minimize Average Weighted Coflow Completion Time (CCT)
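To make the objective concrete, here is a minimal sketch of coflows in the big-switch model and of the weighted-average-CCT metric. The names (`Flow`, `Coflow`, `weighted_avg_cct`) are illustrative, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Flow:
    src: int    # ingress port
    dst: int    # egress port
    size: int   # data to transfer

@dataclass
class Coflow:
    weight: float
    flows: list

def weighted_avg_cct(coflows, completion_times):
    """completion_times[i] is when the LAST flow of coflow i finishes."""
    total_w = sum(c.weight for c in coflows)
    return sum(c.weight * t for c, t in zip(coflows, completion_times)) / total_w

# Two coflows finishing at t = 4 and t = 10, with weights 1 and 3:
c1 = Coflow(weight=1.0, flows=[Flow(0, 1, 4)])
c2 = Coflow(weight=3.0, flows=[Flow(1, 0, 10)])
avg = weighted_avg_cct([c1, c2], [4, 10])  # (1*4 + 3*10) / 4 = 8.5
```

Note that a coflow completes only when its slowest flow does, which is why optimizing flows individually can be a poor match for this metric.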

SLIDE 6

Prior Results

Impossibility results:

  • NP-hard
  • Hard to approximate within 2x

State of the art:

  • Varys [SIGCOMM '14]: a practical system (work conserving, starvation avoiding), but with no performance guarantees
  • On Scheduling Coflows [IPCO '17]: a 4-approximation in theory, but not a deployable system

Open question: a practical, near-optimal network design for coflows?

SLIDE 7

Sincronia: Two Key Results

1. Guarantees a 4-approximation for (weighted) average CCT

2. Given a set of coflows and a "right" ordering, ANY per-flow rate allocation mechanism that is work-conserving and order-preserving produces average CCT within 4x of optimal

  • Per-flow rate allocation is irrelevant
  • Transport-layer agnostic

SLIDE 8

Sincronia – Near-Optimal Network Design

Like On Scheduling Coflows [IPCO '17], Sincronia guarantees a 4-approximation; like Varys [SIGCOMM '14], it is a practical system: work conserving, starvation avoiding, and able to run on existing transport layers.

Also outperforms the state of the art across evaluated workloads.

SLIDE 9
Sincronia Design

  • Algorithm – BSSI

▪ Bottleneck, Select, Scale, Iterate
▪ SRPT-first style algorithm

  • Priorities set from the BSSI order
  • Flows offloaded to the transport layer
  • No explicit per-flow rate allocation

Pipeline: Set of coflows → Coflow ordering (BSSI) → Priorities → Flow scheduling

SLIDE 10

Bottleneck-Select-Scale-Iterate (BSSI)

  • Find the BOTTLENECK port
  • SELECT the (weighted) largest job at that port

▪ Ordered last

  • SCALE the weights of the remaining jobs
  • ITERATE on the unscheduled jobs

Ordering not important

SLIDE 11

BSSI in Action

  • Bottleneck: find the port handling the largest number of packets
  • Select: pick the coflow with the largest size-to-weight ratio at that port

▪ Ordered last

  • Scale: update the weight of each remaining coflow, using its size at the bottleneck port:

Weight ← Weight × (1 – (Size/Weight) / (Size/Weight of the selected coflow))

  • Iterate on the unscheduled coflows

[Figure: four ports handling 4, 8, 5, and 7 packets; the port with 8 packets is the bottleneck. Among its coflows, the size-to-weight ratios are 3 and 4; the coflow with ratio 4 is selected and ordered last, and the remaining coflow's weight is scaled: Weight ← 1 × (1 – 3/4).]
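The four steps above can be sketched as a short routine. This is a simplified, illustrative implementation (the function name `bssi`, the per-port-load dict representation, and the small epsilon guard against zero weights are my choices, not the paper's code):

```python
def bssi(sizes, weights):
    """Bottleneck-Select-Scale-Iterate (sketch).
    sizes[c][p]: load coflow c places on port p; weights[c]: coflow weight.
    Returns coflow indices in scheduling order, first to last."""
    w = {c: float(wt) for c, wt in enumerate(weights)}
    remaining = set(w)
    back_to_front = []
    while remaining:
        # BOTTLENECK: port with the largest remaining load
        load = {}
        for c in remaining:
            for p, s in sizes[c].items():
                load[p] = load.get(p, 0) + s
        b = max(load, key=load.get)
        # SELECT: at port b, the coflow with the largest size-to-weight ratio;
        # it is ordered LAST among the remaining coflows
        k = max((c for c in remaining if sizes[c].get(b, 0) > 0),
                key=lambda c: sizes[c][b] / max(w[c], 1e-12))
        back_to_front.append(k)
        remaining.remove(k)
        # SCALE: w_c <- w_c * (1 - (size_c/w_c) / (size_k/w_k)),
        # equivalently w_c <- w_c - w_k * size_c(b) / size_k(b)
        for c in remaining:
            w[c] -= w[k] * sizes[c].get(b, 0) / sizes[k][b]
        # ITERATE on the unscheduled coflows
    return back_to_front[::-1]

# Two equal-weight coflows contending on port 0: the smaller goes first (SRPT-like)
order = bssi([{0: 4}, {0: 2}], [1, 1])   # [1, 0]
```

With unequal weights the picture changes: `bssi([{0: 4}, {0: 2}], [4, 1])` orders the larger-but-heavier coflow first, which is exactly what the size-to-weight ratio in the Select step encodes.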

SLIDE 12

End-to-End Design (Offline)

[Figure: BSSI computes the coflow order; each host's transport layer sends flows tagged with the corresponding priority.]

  • Each host knows the BSSI ordering
  • Each flow gets the priority of its coflow
  • Flows are offloaded to a priority-enabled transport layer

SLIDE 13

Per-flow Rate Allocation is Irrelevant

  • Intuition: Sharing bandwidth does not help CCT
  • Order-preserving schedule:

A flow is blocked iff its ingress or egress port is serving a higher-ordered flow

Given the BSSI ordering, ANY per-flow rate allocation mechanism that is work conserving & order-preserving produces average CCT within 4x of optimal
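One way to see what "work-conserving and order-preserving" means in the big-switch model is a greedy matching sketch (illustrative, not the paper's mechanism): scan flows in coflow order and serve a flow at full line rate unless its ingress or egress port is already taken by a higher-ordered flow.

```python
from collections import namedtuple

Flow = namedtuple("Flow", ["src", "dst", "coflow_order"])

def pick_served(flows):
    """Greedy, order-preserving, work-conserving choice of flows to serve:
    a flow is blocked iff its ingress or egress port is already busy
    serving a flow of a higher-ordered (earlier) coflow."""
    busy_in, busy_out, served = set(), set(), []
    for f in sorted(flows, key=lambda f: f.coflow_order):
        if f.src not in busy_in and f.dst not in busy_out:
            served.append(f)
            busy_in.add(f.src)
            busy_out.add(f.dst)
    return served

# The order-1 coflow grabs (1, 1'); the order-2 flow to 1' is blocked,
# but the order-3 flow on (2, 2') still runs: that is work conservation.
flows = [Flow(1, "1'", 1), Flow(2, "1'", 2), Flow(2, "2'", 3)]
served = pick_served(flows)   # flows with order 1 and 3
```

The key result says any allocation with these two properties, not just this greedy one, achieves the 4x bound given the BSSI ordering.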

SLIDE 14

Avoiding per-flow rate allocation: Implications

  • Implement on top of any transport layer

▪ E.g. pFabric, pHost, TCP

  • Design and implementation independent of

▪ Network topology
▪ Location of congestion
▪ Paths of coflows

  • More scalable

▪ No reallocations upon coflow arrivals/departures

Details in paper

SLIDE 15

Handling Arbitrary Arrival Times

[Figure: epochs of geometrically increasing length 1, 2, 4, 8, …]

  • Framework: Khuller, Li, Sturmfels, Sun, Venkat, ‘18
  • Time divided into epochs
  • In each epoch

▪ Choose a subset of unscheduled jobs
▪ Schedule them in the next epoch using the offline algorithm

Provides 12-competitive performance (details in paper)
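The epoch structure above can be sketched as follows. This is my reading of the batching skeleton of the Khuller et al. framework; the function names are illustrative, and `order_offline` stands in for the offline algorithm (e.g. BSSI):

```python
def online_plan(arrivals, order_offline, num_epochs=6):
    """Epoch framework (sketch): boundaries grow geometrically
    (0, 1, 2, 4, 8, ...). At each boundary, coflows that arrived in
    earlier epochs and are not yet scheduled are ordered with the
    offline algorithm and run during the epoch starting there."""
    bounds = [0] + [2 ** i for i in range(num_epochs)]
    plan, done = {}, set()
    for b in bounds[1:]:
        batch = [c for t, c in arrivals if t < b and c not in done]
        if batch:
            plan[b] = order_offline(batch)
            done.update(batch)
    return plan

# A and B arrive during the first epoch; C arrives at t = 3
plan = online_plan([(0.0, "A"), (0.5, "B"), (3.0, "C")], sorted)
# {1: ['A', 'B'], 4: ['C']}
```

Deferring each coflow to the next epoch boundary costs at most a constant factor in completion time, which is where the 12-competitive bound comes from.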

SLIDE 16

Evaluation Overview

  • Testbed implementation on top of TCP

▪ Evaluate impact of in-network congestion, and hardware constraints

  • Simulations

▪ Coflows arrive at time 0 ▪ Coflows arrive at arbitrary times

▪ Sensitivity analysis

➢ Coflow sizes, structure, number of coflows
➢ Network topologies, oversubscription ratios, network load
➢ …

All simulations, workloads, and implementations are open-sourced on the Sincronia website

SLIDE 17

Simulation Results (Offline)

[Chart: slowdown at the average, 90th, and 99th percentiles on the 526-coflow Facebook trace [Varys] and on 1000- and 2000-coflow traces.]

OCT: completion time of a coflow in an unloaded network

Sincronia not only provides near-optimal guarantees, but also improves upon the state-of-the-art design in practice

SLIDE 18

Simulation Results (Online)

[Chart: slowdown at the average, 90th, and 99th percentiles on 1000- and 2000-coflow traces, at network load 0.9.]

Even at such high network loads, Sincronia achieves CCT close to that of an unloaded network

SLIDE 19

Implementation Results

Implemented on top of TCP

  • 16-server fat-tree topology

▪ Full bisection bandwidth
▪ 20 PICA8 switches

➢ Supports 8 priority levels

  • DiffServ for priority scheduling
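With only 8 hardware priority levels, the BSSI order must be coarsened into classes, and flows tagged so DiffServ-capable switches honor them. The sketch below is illustrative: the rank-to-class mapping and the class-to-DSCP assignment are my assumptions, not the paper's exact scheme.

```python
import socket

def priority_class(rank, num_coflows, levels=8):
    """Map a coflow's rank in the BSSI order (0 = highest priority)
    onto one of `levels` priority classes; with more coflows than
    classes, consecutive ranks share a class."""
    return min(rank * levels // max(num_coflows, 1), levels - 1)

def tag_socket(sock, cls):
    """Tag a flow's packets with a DSCP codepoint so DiffServ-capable
    switches schedule it at its coflow's class (mapping illustrative)."""
    dscp = cls * 8                                    # one codepoint per class
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)  # DSCP = top 6 bits of TOS

# 16 coflows onto 8 classes: ranks 0-1 share class 0, ranks 14-15 share class 7
cls = priority_class(rank=15, num_coflows=16)   # 7
```

Coarsening the order this way is what the evaluation stresses: Sincronia's gains survive even with a small number of priority levels.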
SLIDE 20

Implementation Results

[Chart: CCT improvement at the average, 90th, and 99th percentiles.]

  • An unfair comparison for Sincronia:

▪ TCP is not designed for coflows
▪ TCP is not designed to minimize completion time

  • Compared against existing designs: e.g., Varys reports a 1.85x improvement at the mean and at the tails

Sincronia achieves significant improvements over existing network designs even with a small number of priority levels

SLIDE 21

Summary

  • Sincronia – a network design for coflows
  • Average CCT within 4x of optimal
  • No per-flow rate allocation

Like On Scheduling Coflows, Sincronia provides a 4-approximation guarantee; like Varys, it is work conserving and starvation avoiding, and it runs on existing transport layers.

  • The paper discusses a number of open problems
SLIDE 22

Thanks!