Scheduling Mix-flows in Commodity Datacenters with Karuna

SLIDE 1

Scheduling Mix-flows in Commodity Datacenters with Karuna

Li Chen, Kai Chen, Wei Bai, Mohammad Alizadeh (MIT)
SING Group, CSE Department, Hong Kong University of Science and Technology

SLIDE 2

Datacenter Transport

  • Deadline flows
      • Meeting deadlines
      • D3, D2TCP, …
  • General (non-deadline) flows
      • Reduce flow completion time (FCT)
      • pFabric, PDQ, PASE, PIAS, …

We investigate a practical yet neglected problem: the coexistence of deadline and non-deadline flows (mix-flow scheduling).

SLIDE 3

Prior solutions do not work for mix-flows

Shortest Job First (SJF) scheduling – pFabric, PASE, PIAS, PDQ

Flow priority = remaining size

Figure: deadline flows and non-deadline flows sent between two end-hosts over time t.

Figure: deadline miss rate (fraction, 0.1 to 0.5) versus percentage of non-deadline flows smaller than deadline flows (5 to 20).

Scheduling by size alone hurts deadline flows. Problem: unawareness of deadlines.
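
The effect described on this slide can be reproduced with a toy single-link model. The sketch below is illustrative only: the `Flow` class, the `run_sjf` helper, the flow sizes, and the deadline are made-up values, not data from the talk. It assumes all flows arrive at time 0 and are served one at a time, smallest first.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Flow:
    name: str
    size: float                # amount of data, in arbitrary units
    deadline: Optional[float]  # absolute deadline, or None for non-deadline flows


def run_sjf(flows, capacity=1.0):
    """Serve flows one at a time, smallest first, on a single link.

    All flows are assumed to arrive at t = 0, so remaining size equals
    initial size when the order is decided. Returns {name: completion time}.
    """
    now = 0.0
    finish = {}
    for flow in sorted(flows, key=lambda f: f.size):
        now += flow.size / capacity
        finish[flow.name] = now
    return finish


# Hypothetical workload: one deadline flow and three smaller background flows.
flows = [
    Flow("deadline", size=4, deadline=5),
    Flow("bg1", size=1, deadline=None),
    Flow("bg2", size=2, deadline=None),
    Flow("bg3", size=3, deadline=None),
]
finish = run_sjf(flows)
missed = [f.name for f in flows
          if f.deadline is not None and finish[f.name] > f.deadline]
print(finish)  # the deadline flow is served last because it is the largest
print(missed)
```

In this toy case the deadline flow would meet its deadline if served first (4 units of work against a deadline of 5), but size-only ordering places it behind every smaller flow.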

SLIDE 4

Prior solutions do not work for mix-flows

Earliest Deadline First (EDF) scheduling – pFabric, PASE, PIAS, PDQ

Flow priority = time till deadline

Figure: deadline flows and non-deadline flows sent between two end-hosts over time t, with the deadlines marked.

Figure: 99th percentile FCT (ms, 1 to 8) versus percentage of deadline flows in overall traffic (5 to 20). Series: non-deadline flows overall; non-deadline flows with size < 10KB.

Prioritizing deadline flows hurts non-deadline flows, especially short ones. Problem: existing transports for deadline flows unnecessarily take all the bandwidth.
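
A matching toy model shows the opposite failure mode. This is a sketch under stated assumptions, not the evaluated schedulers: `run_edf` is a hypothetical helper, both flows arrive at time 0, the link serves one flow at a time, and flows without a deadline are simply placed last.

```python
import math


def run_edf(flows, capacity=1.0):
    """Serve flows one at a time in order of deadline on a single link.

    flows: list of (name, size, deadline) tuples, deadline=None for
    non-deadline flows. Flows without a deadline sort last; ties are
    broken by size. Returns {name: completion time}.
    """
    def order(flow):
        _, size, deadline = flow
        return (deadline if deadline is not None else math.inf, size)

    now = 0.0
    finish = {}
    for name, size, _ in sorted(flows, key=order):
        now += size / capacity
        finish[name] = now
    return finish


# Hypothetical workload: a deadline flow with a loose deadline and a short
# non-deadline flow that arrives at the same time.
flows = [("d1", 8, 20), ("short", 1, None)]
fct = run_edf(flows)
slack = 20 - fct["d1"]
print(fct, slack)
```

The short flow would finish after 1 time unit on an idle link, yet here it waits behind a deadline flow that completes with a large slack it did not need.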

SLIDE 5

How to schedule mix-flows?

Deadline flows

  • Meet deadlines
  • Flow deadline → priority

Non-deadline flows

  • Reduce FCT
  • Flow size → priority
SLIDE 6

Karuna

  • Deadline flows
      • High priority, with minimal bandwidth to complete just before deadlines.
  • Non-deadline flows
      • Low priority, but take all available bandwidth to reduce FCT.

Figure: at the end-host, deadline flows use MCP and non-deadline flows use a work-conserving transport; in the network fabric, deadline flows map to the highest priority and non-deadline flows map to priorities 2 through K (SJF).

Key insight: deadline flows should minimally impact non-deadline flows.

SLIDE 7

MCP for deadline flows: meeting deadlines with minimal bandwidth

MCP: Minimal-impact Congestion control Protocol

Outline: Deadline flows, Non-deadline flows, Implementation, Evaluation

SLIDE 8

MCP: Formulation and solution

  • Objective → minimal impact
      • Per-packet latency
  • Constraints:
      • Meet deadlines
      • Network capacity

Stochastic optimization → Lyapunov optimization framework [1] → convex optimization → primal solution → per-flow congestion window update function

[1] M. J. Neely. Stochastic Network Optimization with Application to Communication and Queueing Systems. Morgan & Claypool, 2010.

SLIDE 9

MCP: Formulation and solution

  • Objective → minimal impact
      • Per-packet latency
  • Constraints
      • Meet deadlines
      • Network capacity
  • Solution → near-deadline completion

Figure: two rate-versus-time (t) plots, one marked with the link capacity and one with the deadline.

Stochastic optimization → Lyapunov optimization framework [1] → convex optimization → primal solution → per-flow congestion window update function
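
The idea of near-deadline completion can be illustrated with simple arithmetic. The sketch below is not MCP's congestion window update function, which the slides derive through Lyapunov optimization and do not spell out. It is an idealized fluid model, assuming a flow may be paced at a constant rate; the function names and the example numbers are hypothetical.

```python
def minimal_rate(remaining_bytes, now_ms, deadline_ms):
    """Constant rate (bytes per ms) that finishes exactly at the deadline."""
    time_left = deadline_ms - now_ms
    if time_left <= 0:
        raise ValueError("deadline has already passed")
    return remaining_bytes / time_left


def leftover_capacity(link_capacity, deadline_flows, now_ms):
    """Capacity left for non-deadline flows once every deadline flow is
    paced at its minimal rate.

    deadline_flows: iterable of (remaining_bytes, deadline_ms) pairs.
    Returns 0.0 when the deadline flows alone exceed the link capacity.
    """
    needed = sum(minimal_rate(rem, now_ms, dl) for rem, dl in deadline_flows)
    return max(0.0, link_capacity - needed)


LINK = 125_000  # 1 Gbps expressed in bytes per millisecond

# Hypothetical flow: 2.5 MB remaining, deadline 50 ms from now.
rate = minimal_rate(2_500_000, now_ms=0, deadline_ms=50)
spare = leftover_capacity(LINK, [(2_500_000, 50)], now_ms=0)
print(rate, spare)
```

In this example the deadline flow needs 50,000 bytes per ms (400 Mbps), so 600 Mbps of a 1 Gbps link remains for non-deadline traffic instead of none.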

SLIDE 10

Reducing FCT for non-deadline flows

Mimicking SJF
Non-deadline flows with or without known sizes

SLIDE 11

Non-deadline flows with unknown size

  • PIAS [2] is the best-known scheme.

Figure: flows pass through a sequence of priorities, from highest to lowest.

  • Highest priority: send packets tagged with the highest priority until α_1 bytes are sent.
  • 2nd highest priority: send packets tagged with the 2nd highest priority until α_2 bytes are sent.
  • Lowest priority: send packets tagged with the lowest priority.

[2] Wei Bai et al. Information-Agnostic Flow Scheduling for Commodity Data Centers. USENIX NSDI 2015.
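
The tagging rule on this slide amounts to looking up the number of bytes already sent in a sorted list of thresholds. The following is a minimal sketch of that rule only, not the PIAS implementation; the function names, the threshold values, and the packet size are hypothetical, and priority 0 stands for the highest priority.

```python
from bisect import bisect_right


def pias_priority(bytes_sent, demotion_thresholds):
    """Priority for the next packet of a flow (0 = highest).

    demotion_thresholds must be sorted ascending. A flow stays at priority
    0 until it has sent the first threshold's worth of bytes, then drops
    one level at each further threshold.
    """
    return bisect_right(demotion_thresholds, bytes_sent)


def tag_flow(flow_size, packet_size, demotion_thresholds):
    """Return the priority tag of each packet of a flow, in sending order."""
    tags = []
    sent = 0
    while sent < flow_size:
        tags.append(pias_priority(sent, demotion_thresholds))
        sent += min(packet_size, flow_size - sent)
    return tags


THRESHOLDS = [3000, 6000]  # hypothetical thresholds, in bytes

print(tag_flow(2000, 1500, THRESHOLDS))  # short flow
print(tag_flow(9000, 1500, THRESHOLDS))  # longer flow is demoted twice
```

Because the sender needs only a byte counter per flow, short flows finish at high priority without the sender knowing any flow size in advance.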

SLIDE 12

Karuna for non-deadline flows

  • Non-deadline flows with unknown size ← PIAS
  • Non-deadline flows with known size
  • Karuna extends PIAS to schedule flows with or without known sizes.

Sum of Linear Ratios Problem (PIAS) → reformulation to include flows with known sizes → Quadratic Sum of Ratios Problem (Karuna)

  • PIAS yields demotion thresholds {β_i}.
  • Karuna yields demotion thresholds {β_i} and splitting thresholds {γ_i}.

SLIDE 13

Karuna for non-deadline flows: mimicking SJF

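Figure: at the end-host, flows without known sizes are tagged by PIAS and flows with known sizes are tagged by size; both map to priorities 2 through K in the network fabric.

Flows with known sizes:

  • Priority 2: size ≤ γ_1
  • Priority 3: γ_1 < size ≤ γ_2
  • Priority K: γ_{K-1} < size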
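
Putting the pieces together, a sender could choose a priority as sketched below. This is an illustration under assumptions, not the Karuna kernel code: `karuna_priority` and all threshold values are hypothetical, priority 1 is reserved for deadline flows, known-size flows are split by total size, and unknown-size flows are demoted by bytes sent.

```python
from bisect import bisect_left, bisect_right


def karuna_priority(deadline, size, bytes_sent, splitting, demotion):
    """Pick a priority for the next packet of a flow (1 = highest).

    deadline:   flow deadline, or None for non-deadline flows
    size:       flow size in bytes, or None when unknown
    bytes_sent: bytes this flow has already sent
    splitting:  ascending size thresholds for known-size flows
    demotion:   ascending bytes-sent thresholds for unknown-size flows
    """
    if deadline is not None:
        return 1  # deadline flows use the highest priority
    if size is not None:
        # size <= first threshold gives priority 2, the next band gives 3, ...
        return 2 + bisect_left(splitting, size)
    # Unknown size: start at priority 2 and demote as more bytes are sent.
    return 2 + bisect_right(demotion, bytes_sent)


SPLITTING = [10_000, 1_000_000]  # hypothetical splitting thresholds (bytes)
DEMOTION = [50_000, 500_000]     # hypothetical demotion thresholds (bytes)

print(karuna_priority(20.0, 4_000, 0, SPLITTING, DEMOTION))        # deadline flow
print(karuna_priority(None, 10_000, 0, SPLITTING, DEMOTION))       # small known-size flow
print(karuna_priority(None, None, 600_000, SPLITTING, DEMOTION))   # long unknown-size flow
```

A known-size flow keeps one priority for its whole lifetime, while an unknown-size flow starts at priority 2 and moves down as it sends more data.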

SLIDE 14

Implementation

SLIDE 15

Implementation

Information passing

  • Pass flow information (deadline, size) to the kernel using setsockopt() with SO_MARK.

Figure: at the end-host, the flow size and deadline are attached to the socket through setsockopt() with SO_MARK.
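
From user space, setting a socket mark looks roughly like the sketch below. The slides do not say how the deadline and size are encoded in the 32-bit mark, so the bit layout here (12 bits of deadline in ms, 20 bits of size in KB) is purely an assumption, as are the helper names. Setting `SO_MARK` is Linux-specific and requires the `CAP_NET_ADMIN` capability.

```python
import socket
import struct

# socket.SO_MARK is only defined on Linux; 36 is its value on common architectures.
SO_MARK = getattr(socket, "SO_MARK", 36)

SIZE_BITS = 20      # hypothetical layout: low 20 bits carry the size in KB
DEADLINE_BITS = 12  # hypothetical layout: high 12 bits carry the deadline in ms


def encode_mark(deadline_ms, size_kb):
    """Pack deadline and size into one 32-bit mark (0 means "not given")."""
    if not 0 <= deadline_ms < (1 << DEADLINE_BITS):
        raise ValueError("deadline does not fit in the mark")
    if not 0 <= size_kb < (1 << SIZE_BITS):
        raise ValueError("size does not fit in the mark")
    return (deadline_ms << SIZE_BITS) | size_kb


def decode_mark(mark):
    """Inverse of encode_mark: return (deadline_ms, size_kb)."""
    return mark >> SIZE_BITS, mark & ((1 << SIZE_BITS) - 1)


def mark_socket(sock, deadline_ms, size_kb):
    """Attach flow information to a socket.

    Raises OSError when the process lacks CAP_NET_ADMIN or the platform
    does not support SO_MARK.
    """
    value = struct.pack("I", encode_mark(deadline_ms, size_kb))
    sock.setsockopt(socket.SOL_SOCKET, SO_MARK, value)


mark = encode_mark(deadline_ms=20, size_kb=512)
print(hex(mark), decode_mark(mark))
```

An application would call `mark_socket(sock, 20, 512)` before sending; a kernel-side module can then read the mark from each packet's socket buffer.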

SLIDE 16

Implementation

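Packet tagging

  • TC module at the sender side.
  • Tags DSCP fields in packet headers based on thresholds.

Figure: at the end-host, the socket (flow size and deadline set through setsockopt() with SO_MARK) feeds the TC module, which turns each packet into a tagged packet. Steps shown so far: information passing, packet tagging.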
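
Tagging a packet means rewriting the upper six bits of the IP header's type-of-service byte, which hold the DSCP value; the lower two bits hold ECN and should be left untouched. The sketch below shows only that bit manipulation. The priority-to-DSCP mapping is hypothetical, since the slides do not list the DSCP values used.

```python
def set_dscp(tos_byte, dscp):
    """Return the ToS byte with its DSCP field replaced and ECN bits kept."""
    if not 0 <= dscp < 64:
        raise ValueError("DSCP is a 6-bit field")
    return (dscp << 2) | (tos_byte & 0b11)


def get_dscp(tos_byte):
    """Extract the 6-bit DSCP field from a ToS byte."""
    return tos_byte >> 2


# Hypothetical mapping from priority (1 = highest, 8 = lowest) to DSCP.
PRIORITY_TO_DSCP = {priority: priority - 1 for priority in range(1, 9)}


def tag_packet(tos_byte, priority):
    """Tag a packet's ToS byte with the DSCP value for the given priority."""
    return set_dscp(tos_byte, PRIORITY_TO_DSCP[priority])


tagged = tag_packet(0b00000010, priority=3)
print(bin(tagged), get_dscp(tagged))
```

Keeping the ECN bits intact matters here because the rate control on the later slides relies on ECN marks.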

SLIDE 17

Implementation

Rate control

  • TC module.
  • Non-deadline flows use DCTCP.
  • Modifies the window size using MCP.

Figure: at the end-host, the socket (flow size and deadline set through setsockopt() with SO_MARK) feeds the TC module, which tags packets and modulates the congestion window with MCP, alongside DCTCP. Steps shown so far: information passing, packet tagging, rate control.

SLIDE 18

Implementation

Switch configuration

  • ECN marking.
  • Priority queueing (priorities mapped to DSCP fields).

Figure: tagged packets from the end-host (socket, TC module, congestion window modulated with MCP, DCTCP) enter strict priority queues at the switches in the network fabric. Steps shown: information passing, packet tagging, rate control, switch configuration.
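
The switch behaviour named on this slide can be modelled in a few lines. This is a toy model, not a switch configuration: the class name is hypothetical, and marking per queue against a fixed packet-count threshold is an assumption made for illustration.

```python
from collections import deque


class StrictPriorityPort:
    """Toy output port with strict priority queues and ECN marking."""

    def __init__(self, num_queues=8, ecn_threshold=2):
        self.queues = [deque() for _ in range(num_queues)]
        self.ecn_threshold = ecn_threshold  # assumed per-queue limit, in packets

    def enqueue(self, packet, queue_index):
        """Queue a packet; mark it if the queue is already at the threshold."""
        queue = self.queues[queue_index]
        marked = len(queue) >= self.ecn_threshold
        queue.append((packet, marked))
        return marked

    def dequeue(self):
        """Send from the highest-priority non-empty queue (index 0 first)."""
        for queue in self.queues:
            if queue:
                return queue.popleft()
        return None


port = StrictPriorityPort()
port.enqueue("non-deadline pkt", queue_index=2)
port.enqueue("deadline pkt", queue_index=0)
first = port.dequeue()
print(first)
```

A lower-priority queue is served only when every higher-priority queue is empty, which is why keeping deadline flows at their minimal rate frees transmission slots for the queues below.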

SLIDE 19

Evaluation

  • Testbed experiments
  • Simulations

SLIDE 20

Evaluation: Testbed Experiments

  • Setup
      • 16 servers
      • A Gigabit Pronto-3295 switch
      • 8 priority queues mapped to DSCP
      • RTT ~100us
      • Karuna kernel module
  • Traffic traces
      • Web search (DCTCP [3])
      • Data mining (VL2 [4])

[3] Mohammad Alizadeh et al. Data Center TCP (DCTCP). ACM SIGCOMM Computer Communication Review, vol. 40, no. 4, 2010.
[4] Albert Greenberg et al. VL2: A Scalable and Flexible Data Center Network. ACM SIGCOMM Computer Communication Review, vol. 39, no. 4, 2009.

SLIDE 21

Testbed Experiments: Deadline Flows

Flow   Size     Deadline   Start time
1      14.4MB   20ms       0ms
2      48MB     120ms      0ms
3      3MB      5ms        50ms
4      0.5MB    10ms       80ms

Figure (DCTCP): throughput (Mbps, 200 to 1200) versus time (ms, 1 to 121) for Flows 1 to 4. Annotation: deadline missed for Flow 1.

SLIDE 22

Testbed Experiments: Deadline Flows

Flows: the same four flows as on slide 21.

Figure (pFabric – Earliest Deadline First): throughput (Mbps, 200 to 1200) versus time (ms, 1 to 121) for Flows 1 to 4, with the deadline of each flow marked.

SLIDE 23

Testbed Experiments: Deadline Flows

Flows: the same four flows as on slide 21.

Figure (Karuna): throughput (Mbps, 200 to 1000) versus time (ms, 1 to 121) for Flows 1 to 4.

SLIDE 24

Testbed Experiments: Deadline Flows

Karuna completes each deadline flow just before its deadline, leaving bandwidth for non-deadline flows.

Flows: the same four flows as on slide 21.

Figure: three panels of throughput (Mbps) versus time (ms, 1 to 120) for the four flows, one each for Karuna, pFabric – Earliest Deadline First, and DCTCP.

SLIDE 25

Testbed Experiments: Non-deadline Flows

Mimics shortest job first scheduling for non-deadline flows.

FCT (ms) by flow size:

Flow size           Karuna    DCTCP     TCP
0-100KB (Avg)       0.916     1.716     8.04
0-100KB (99th)      2.721     5.554     28.725
100KB-10MB (Avg)    81.729    104.629   120.286
>10MB (Avg)         851.72    718.369   608.04
Overall             61.133    67.019    73.634

SLIDE 26

Evaluation: Simulations

  • Simulation setup
      • Spine-leaf with 144 servers
      • 10G server-ToR links
      • 40G ToR-spine links
  • Compare with:
      • D3
      • D2TCP
      • pFabric - EDF

SLIDE 27

Large-scale Simulations: Key Benefit of Karuna

Reducing completion times of non-deadline flows while completing deadline flows.

Figure: three panels plotted against deadline traffic load (0.8 to 0.95), each comparing D3, D2TCP, pFabric (EDF), and Karuna:

  • Deadline miss rate (%)
  • 95th percentile FCT (ms)
  • Number of completed non-deadline flows

One panel carries the annotation ">100x".

SLIDE 28

Concluding remarks

  • Filling a gap in datacenter flow scheduling
  • Karuna
      • Prioritizes deadline flows but controls their rates.
      • Uses the remaining bandwidth to schedule non-deadline flows based on size.
  • Thank you! Q & A!