A Case for Fine Grained Traffic Engineering in Data Centers - PowerPoint Presentation



SLIDE 1

A Case for Fine Grained Traffic Engineering in Data Centers

Theophilus Benson*, Ashok Anand*, Aditya Akella*, Ming Zhang+

*University of Wisconsin, Madison
+Microsoft Research

SLIDE 2

Why are Data Centers Important?

  • Congestion == bad app. performance
  • Bad app performance == user dissatisfaction
  • User dissatisfaction == loss of revenue
  • Traffic engineering is crucial
  • IM: low B/W, loose latency
  • Multimedia: low B/W, strict latency
  • Games: high B/W, strict latency
SLIDE 3

Outline

  • Background
  • Traffic Engineering in data centers
  • Design goals for ideal TE
  • MicroTE
  • Conclusion
SLIDE 4

Options for TE in Data Centers?

  • Current supported techniques

– Equal Cost MultiPath (ECMP)
– Spanning Tree Protocol (STP)

  • Proposed (ECMP based)

– Fat-Tree, VL2

  • Other existing

– TEXCP, COPE,…, OSPF link tuning

SLIDE 5

Properties of Data Center Traffic

  • Flows are small and short-lived [Kandula et al., 2009]
  • Traffic is bursty [Benson et al., 2009]
  • Traffic is unpredictable at 100 secs [Maltz et al., 2009]
SLIDE 6

How do we evaluate TE?

  • Data center traces

– Cloud data center

  • Map-reduce app
  • ~1500 servers,
  • ~80 switches


  • 1 sec snapshots for 24 hours
  • Simulator

– Input:

  • Traffic matrix, Topology, Traffic Engineering scheme

– Output:

  • Link utilization
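The simulator described above can be sketched as a small replay loop: feed each 1-second traffic-matrix snapshot through the TE scheme under test and record per-link utilization. This is an illustrative reconstruction, not the authors' simulator; the names (`simulate`, `te_scheme`) are assumptions.

```python
# Hypothetical sketch of the trace-driven simulator: replay traffic-matrix
# snapshots through a TE scheme and report per-link utilization.
from collections import defaultdict

def simulate(snapshots, links, te_scheme):
    """snapshots: list of traffic matrices {(src, dst): demand in bps};
    links: {link_id: capacity in bps};
    te_scheme: callable (src, dst) -> path as a list of link ids."""
    utilization = []
    for tm in snapshots:
        load = defaultdict(float)
        for (src, dst), demand in tm.items():
            # Place the flow's demand on every link of the chosen path.
            for link in te_scheme(src, dst):
                load[link] += demand
        utilization.append({l: load[l] / links[l] for l in links})
    return utilization
```

A TE scheme is just a path-selection function, so ECMP, STP, or an optimal oracle can be plugged in and compared on identical input.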
SLIDE 7

Drawbacks of Existing TE

  • STP does not use multiple paths
  • ECMP does not adapt to burstiness
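Why ECMP cannot adapt: next hops are chosen by a static hash of the flow's 5-tuple, so every packet of a flow is pinned to the same path even when that path is congested by a burst. A minimal sketch (not any switch's actual implementation):

```python
# Minimal illustration of ECMP's static hash-based splitting: the same
# 5-tuple always hashes to the same next hop, so a bursting flow cannot
# be shifted to an idle path without rehashing.
import hashlib

def ecmp_next_hop(flow_5tuple, next_hops):
    # Hash the flow identity (src ip, dst ip, src port, dst port, proto)
    # and index into the list of equal-cost next hops.
    digest = hashlib.md5(repr(flow_5tuple).encode()).hexdigest()
    return next_hops[int(digest, 16) % len(next_hops)]
```

Determinism per flow avoids packet reordering, which is exactly the property that prevents reacting to per-flow burstiness.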
SLIDE 8

Drawbacks of Proposed TE

  • Fat-Tree

– Rehash flows
– Local opt. != global opt.

  • VL2

– Coarse grained flow assignment

  • VL2 & Fat-Tree do not adapt to burstiness
SLIDE 9

Drawbacks of Other Approaches

  • TEXCP, COPE, …, OSPF link tuning
  • Unable to react fast enough (below 100 secs)
SLIDE 10

Design Requirements for TE

  • Calculate paths & reconfigure network

– Use all network paths
– Use global view
– Must react quickly

  • How predictable is traffic?
SLIDE 11

Is Data Center Traffic Predictable?

  • YES! 33% of traffic is predictable
SLIDE 12

How Long is Traffic Predictable?

  • TE must react in under 2 seconds
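The predictability measurement behind these two slides can be sketched as follows: an entry of the traffic matrix counts as predictable if its demand in the next interval stays within some tolerance of its current value. The function name, the 20% tolerance, and the weighting by demand are illustrative assumptions, not the paper's exact methodology.

```python
# Illustrative predictability test: what (demand-weighted) fraction of
# traffic in prev_tm stays within `delta` of itself in next_tm?
def predictable_fraction(prev_tm, next_tm, delta=0.2):
    stable = total = 0.0
    for pair, demand in prev_tm.items():
        total += demand
        nxt = next_tm.get(pair, 0.0)
        # Entry is "predictable" if the next interval's demand is within
        # delta (relative) of the current demand.
        if demand > 0 and abs(nxt - demand) / demand <= delta:
            stable += demand
    return stable / total if total else 0.0
```

Run over consecutive 1-second snapshots, a metric like this yields the slides' headline numbers: roughly a third of traffic is stable, but only over windows of about 2 seconds, hence the reaction-time requirement.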
SLIDE 13

MicroTE: Architecture

[Architecture diagram: Monitoring Component, Routing Component, Network Controller]

  • Based on OpenFlow framework
  • Global view:
    – created by network controller
  • React to predictable traffic:
    – routing component tracks demand history
  • All N/W paths:
    – routing component creates routes using all paths
SLIDE 14

Routing Component

  • Step 1: Determine predictable traffic
  • Step 2: Route along rarely utilized paths

– Currently use LP
– Faster algorithm == future work

  • Step 3: Set ECMP for other traffic
  • Step 4: Return routes
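Steps 1-3 above can be sketched in miniature. The slides say MicroTE currently solves an LP; the greedy placement below is a simplified stand-in for illustration only, placing each predictable flow on the least-loaded of its candidate paths and leaving the rest to ECMP.

```python
# Hedged sketch of the routing component (greedy stand-in for the LP):
# route predictable flows, largest first, onto their least-loaded path.
def route(predictable, paths, link_load):
    """predictable: {(src, dst): demand}; paths: {(src, dst): [path, ...]}
    where a path is a list of link ids; link_load: {link_id: current load}."""
    routes = {}
    for (src, dst), demand in sorted(predictable.items(), key=lambda kv: -kv[1]):
        # Pick the candidate path whose most-loaded link is least loaded.
        best = min(paths[(src, dst)], key=lambda p: max(link_load[l] for l in p))
        for l in best:
            link_load[l] += demand
        routes[(src, dst)] = best
    return routes  # unpredictable traffic falls back to ECMP (step 3)
```

Sorting by descending demand is a common bin-packing heuristic; the LP in the actual system would instead minimize the maximum link utilization globally.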
SLIDE 15

Routing Component

[Flowchart: New Global View → Determine Predictable ToRs → Significant change in routes? Yes: calculate network routes for predictable traffic (now: LP; future: heuristic), set ECMP for unpredictable traffic, return calculated routes. No: return nothing. Finally, add network view to history.]

SLIDE 16

Tradeoffs: Monitoring Component


  • Switch based

– Low complexity
– High overhead

  • End-host based

– Low overhead
– High complexity


SLIDE 17

Preliminary Evaluation

  • Outperforms ECMP
  • Slightly worse than optimal
SLIDE 18

Conclusion

  • Study existing TE

– Found them lacking (15-20%)

  • Study data center traffic

– Discovered traffic predictability (33% for 2 secs)

  • Guidelines for ideal TE
  • MicroTE

– Implementation of ideal TE
– Preliminary evaluation

SLIDE 19

Thank You

  • Questions?