Sincronia: Near-Optimal Network Design for Coflows – Shijin Rajakrishnan (PowerPoint presentation)



SLIDE 1

Sincronia: Near-Optimal Network Design for Coflows

Shijin Rajakrishnan

Joint work with Saksham Agarwal, Akshay Narayan, Rachit Agarwal, David Shmoys, and Amin Vahdat

SLIDE 2

The Flow Abstraction

Traditional applications (FTP, Email, HTTP, …) care about the performance of individual flows.

The network is optimized for flow-level performance.

Good match.

SLIDE 3

Is Flow Still the Right Abstraction?

Traditional applications (FTP, Email, HTTP, …) care about the performance of individual flows, but distributed applications care about the performance of a group of flows.

The network, however, is still optimized for flow-level performance.

SLIDE 4

The Coflow Abstraction

Collection of semantically related flows [Chowdhury & Stoica, 2012]

[Figure: three coflows, each a group of semantically related flows]

Allows applications to more precisely express their performance goals

SLIDE 5
Network and Coflow Model

  • Big-switch model: the datacenter fabric is abstracted as a single switch connecting ingress ports to egress ports
  • Clairvoyant scheduler: coflow details known at arrival time:

➢ Source and destination for each flow
➢ Size of each flow
➢ Coflow weight

  • Metric – coflow completion time (CCT): the time when all flows of the coflow complete

[Figure: DC fabric modeled as a big switch, ingress ports 1, 2 to egress ports 1', 2']

Goal: Minimize Average Weighted Coflow Completion Time (CCT)
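To make the objective concrete, here is a minimal sketch of coflows in the big-switch model and of the weighted-average-CCT metric. The names (`Flow`, `Coflow`, `weighted_avg_cct`) are illustrative, not from the paper:

```python
from dataclasses import dataclass

@dataclass
class Flow:
    src: int    # ingress port
    dst: int    # egress port
    size: int   # data to transfer

@dataclass
class Coflow:
    weight: float
    flows: list

def weighted_avg_cct(coflows, completion_times):
    """completion_times[i] is when the LAST flow of coflow i finishes."""
    total_w = sum(c.weight for c in coflows)
    return sum(c.weight * t for c, t in zip(coflows, completion_times)) / total_w

# Two coflows finishing at t = 4 and t = 10, with weights 1 and 3:
c1 = Coflow(weight=1.0, flows=[Flow(0, 1, 4)])
c2 = Coflow(weight=3.0, flows=[Flow(1, 0, 10)])
avg = weighted_avg_cct([c1, c2], [4, 10])  # (1*4 + 3*10) / 4 = 8.5
```

Note that a coflow completes only when its slowest flow does, which is why optimizing flows individually can be a poor match for this metric.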

SLIDE 6

Prior Results

Impossibility results:

  • NP-hard
  • Hard to approximate within 2x

State of the art:

  • Varys [SIGCOMM '14]: a practical system (work conserving, starvation avoiding), but with no performance guarantees
  • On Scheduling Coflows [IPCO '17]: a 4-approximation in theory, but not a deployable system

Open question: a practical, near-optimal network design for coflows?

SLIDE 7

Sincronia: Two Key Results

1. Guarantees a 4-approximation for (weighted) average CCT

2. Given a set of coflows and a "right" ordering, ANY per-flow rate allocation mechanism that is work-conserving and order-preserving produces average CCT within 4x of optimal

  • Per-flow rate allocation is irrelevant
  • Transport-layer agnostic

SLIDE 8

Sincronia – Near-Optimal Network Design

Like On Scheduling Coflows [IPCO '17], Sincronia guarantees a 4-approximation; like Varys [SIGCOMM '14], it is a practical system: work conserving, starvation avoiding, and able to run on existing transport layers.

Also outperforms the state of the art across evaluated workloads.

SLIDE 9
Sincronia Design

  • Algorithm – BSSI

▪ Bottleneck, Select, Scale, Iterate
▪ SRPT-first style algorithm

  • Priorities set from the BSSI order
  • Flows offloaded to the transport layer
  • No explicit per-flow rate allocation

Pipeline: Set of coflows → Coflow ordering (BSSI) → Priorities → Flow scheduling

SLIDE 10

Bottleneck-Select-Scale-Iterate (BSSI)

  • Find the BOTTLENECK port
  • SELECT the (weighted) largest job at that port

▪ Ordered last

  • SCALE the weights of the remaining jobs
  • ITERATE on the unscheduled jobs

Ordering not important

SLIDE 11

BSSI in Action

  • Bottleneck: find the port handling the largest number of packets
  • Select: pick the coflow with the largest size-to-weight ratio at that port

▪ Ordered last

  • Scale: update the weight of each remaining coflow, using its size at the bottleneck port:

Weight ← Weight × (1 – (Size/Weight) / (Size/Weight of the selected coflow))

  • Iterate on the unscheduled coflows

[Figure: four ports handling 4, 8, 5, and 7 packets; the port with 8 packets is the bottleneck. Among its coflows, the size-to-weight ratios are 3 and 4; the coflow with ratio 4 is selected and ordered last, and the remaining coflow's weight is scaled: Weight ← 1 × (1 – 3/4).]
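The four steps above can be sketched as a short routine. This is a simplified, illustrative implementation (the function name `bssi`, the per-port-load dict representation, and the small epsilon guard against zero weights are my choices, not the paper's code):

```python
def bssi(sizes, weights):
    """Bottleneck-Select-Scale-Iterate (sketch).
    sizes[c][p]: load coflow c places on port p; weights[c]: coflow weight.
    Returns coflow indices in scheduling order, first to last."""
    w = {c: float(wt) for c, wt in enumerate(weights)}
    remaining = set(w)
    back_to_front = []
    while remaining:
        # BOTTLENECK: port with the largest remaining load
        load = {}
        for c in remaining:
            for p, s in sizes[c].items():
                load[p] = load.get(p, 0) + s
        b = max(load, key=load.get)
        # SELECT: at port b, the coflow with the largest size-to-weight ratio;
        # it is ordered LAST among the remaining coflows
        k = max((c for c in remaining if sizes[c].get(b, 0) > 0),
                key=lambda c: sizes[c][b] / max(w[c], 1e-12))
        back_to_front.append(k)
        remaining.remove(k)
        # SCALE: w_c <- w_c * (1 - (size_c/w_c) / (size_k/w_k)),
        # equivalently w_c <- w_c - w_k * size_c(b) / size_k(b)
        for c in remaining:
            w[c] -= w[k] * sizes[c].get(b, 0) / sizes[k][b]
        # ITERATE on the unscheduled coflows
    return back_to_front[::-1]

# Two equal-weight coflows contending on port 0: the smaller goes first (SRPT-like)
order = bssi([{0: 4}, {0: 2}], [1, 1])   # [1, 0]
```

With unequal weights the picture changes: `bssi([{0: 4}, {0: 2}], [4, 1])` orders the larger-but-heavier coflow first, which is exactly what the size-to-weight ratio in the Select step encodes.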

SLIDE 12

End-to-End Design (Offline)

[Figure: BSSI computes the coflow order; each host's transport layer sends flows tagged with the corresponding priority.]

  • Each host knows the BSSI ordering
  • Each flow gets the priority of its coflow
  • Flows are offloaded to a priority-enabled transport layer

SLIDE 13

Per-flow Rate Allocation is Irrelevant

  • Intuition: Sharing bandwidth does not help CCT
  • Order-preserving schedule:

A flow is blocked iff its ingress or egress port is serving a higher-ordered flow

Given the BSSI ordering, ANY per-flow rate allocation mechanism that is work conserving & order-preserving produces average CCT within 4x of optimal
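One way to see what "work-conserving and order-preserving" means in the big-switch model is a greedy matching sketch (illustrative, not the paper's mechanism): scan flows in coflow order and serve a flow at full line rate unless its ingress or egress port is already taken by a higher-ordered flow.

```python
from collections import namedtuple

Flow = namedtuple("Flow", ["src", "dst", "coflow_order"])

def pick_served(flows):
    """Greedy, order-preserving, work-conserving choice of flows to serve:
    a flow is blocked iff its ingress or egress port is already busy
    serving a flow of a higher-ordered (earlier) coflow."""
    busy_in, busy_out, served = set(), set(), []
    for f in sorted(flows, key=lambda f: f.coflow_order):
        if f.src not in busy_in and f.dst not in busy_out:
            served.append(f)
            busy_in.add(f.src)
            busy_out.add(f.dst)
    return served

# The order-1 coflow grabs (1, 1'); the order-2 flow to 1' is blocked,
# but the order-3 flow on (2, 2') still runs: that is work conservation.
flows = [Flow(1, "1'", 1), Flow(2, "1'", 2), Flow(2, "2'", 3)]
served = pick_served(flows)   # flows with order 1 and 3
```

The key result says any allocation with these two properties, not just this greedy one, achieves the 4x bound given the BSSI ordering.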

SLIDE 14

Avoiding per-flow rate allocation: Implications

  • Implement on top of any transport layer

▪ E.g. pFabric, pHost, TCP

  • Design and implementation independent of

▪ Network topology
▪ Location of congestion
▪ Paths of coflows

  • More scalable

▪ No reallocations upon coflow arrivals/departures

Details in paper

SLIDE 15

Handling Arbitrary Arrival Times

[Figure: epochs of geometrically increasing length 1, 2, 4, 8, …]

  • Framework: Khuller, Li, Sturmfels, Sun, Venkat, ‘18
  • Time divided into epochs
  • In each epoch

▪ Choose a subset of unscheduled jobs
▪ Schedule them in the next epoch using the offline algorithm

Provides 12-competitive performance (details in paper)
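The epoch structure above can be sketched as follows. This is my reading of the batching skeleton of the Khuller et al. framework; the function names are illustrative, and `order_offline` stands in for the offline algorithm (e.g. BSSI):

```python
def online_plan(arrivals, order_offline, num_epochs=6):
    """Epoch framework (sketch): boundaries grow geometrically
    (0, 1, 2, 4, 8, ...). At each boundary, coflows that arrived in
    earlier epochs and are not yet scheduled are ordered with the
    offline algorithm and run during the epoch starting there."""
    bounds = [0] + [2 ** i for i in range(num_epochs)]
    plan, done = {}, set()
    for b in bounds[1:]:
        batch = [c for t, c in arrivals if t < b and c not in done]
        if batch:
            plan[b] = order_offline(batch)
            done.update(batch)
    return plan

# A and B arrive during the first epoch; C arrives at t = 3
plan = online_plan([(0.0, "A"), (0.5, "B"), (3.0, "C")], sorted)
# {1: ['A', 'B'], 4: ['C']}
```

Deferring each coflow to the next epoch boundary costs at most a constant factor in completion time, which is where the 12-competitive bound comes from.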

SLIDE 16

Evaluation Overview

  • Testbed implementation on top of TCP

▪ Evaluate impact of in-network congestion, and hardware constraints

  • Simulations

▪ Coflows arrive at time 0 ▪ Coflows arrive at arbitrary times

▪ Sensitivity analysis

➢ Coflow sizes, structure, number of coflows
➢ Network topologies, oversubscription ratios, network load
➢ …

All simulations, workloads, and implementations are open-sourced on the Sincronia website

SLIDE 17

Simulation Results (Offline)

[Chart: slowdown at the average, 90th, and 99th percentiles on the 526-coflow Facebook trace [Varys] and on 1000- and 2000-coflow traces.]

OCT: completion time of a coflow in an unloaded network

Sincronia not only provides near-optimal guarantees, but also improves upon the state-of-the-art design in practice

SLIDE 18

Simulation Results (Online)

[Chart: slowdown at the average, 90th, and 99th percentiles on 1000- and 2000-coflow traces, at network load 0.9.]

Even at such high network loads, Sincronia achieves CCT close to that of an unloaded network

SLIDE 19

Implementation Results

Implemented on top of TCP

  • 16-server fat-tree topology

▪ Full bisection bandwidth
▪ 20 PICA8 switches

➢ Supports 8 priority levels

  • DiffServ for priority scheduling
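With only 8 hardware priority levels, the BSSI order must be coarsened into classes, and flows tagged so DiffServ-capable switches honor them. The sketch below is illustrative: the rank-to-class mapping and the class-to-DSCP assignment are my assumptions, not the paper's exact scheme.

```python
import socket

def priority_class(rank, num_coflows, levels=8):
    """Map a coflow's rank in the BSSI order (0 = highest priority)
    onto one of `levels` priority classes; with more coflows than
    classes, consecutive ranks share a class."""
    return min(rank * levels // max(num_coflows, 1), levels - 1)

def tag_socket(sock, cls):
    """Tag a flow's packets with a DSCP codepoint so DiffServ-capable
    switches schedule it at its coflow's class (mapping illustrative)."""
    dscp = cls * 8                                    # one codepoint per class
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp << 2)  # DSCP = top 6 bits of TOS

# 16 coflows onto 8 classes: ranks 0-1 share class 0, ranks 14-15 share class 7
cls = priority_class(rank=15, num_coflows=16)   # 7
```

Coarsening the order this way is what the evaluation stresses: Sincronia's gains survive even with a small number of priority levels.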
SLIDE 20

Implementation Results

[Chart: CCT improvement at the average, 90th, and 99th percentiles.]

  • An unfair comparison for Sincronia:

▪ TCP is not designed for coflows
▪ TCP is not designed to minimize completion time

  • Compared against existing designs: e.g., Varys reports a 1.85x improvement at the mean and at the tails

Sincronia achieves significant improvements over existing network designs even with a small number of priority levels

SLIDE 21

Summary

  • Sincronia – a network design for coflows
  • Average CCT within 4x of optimal
  • No per-flow rate allocation

Like On Scheduling Coflows, Sincronia provides a 4-approximation guarantee; like Varys, it is work conserving and starvation avoiding, and it runs on existing transport layers.

  • The paper discusses a number of open problems
SLIDE 22

Thanks!