Efficiently Delivering Online Services over Integrated - - PowerPoint PPT Presentation




slide-1
SLIDE 1

Efficiently Delivering Online Services over Integrated Infrastructure

Hongqiang Harry Liu, Raajay Viswanathan, Matt Calder, Aditya Akella, Ratul Mahajan, Jitendra Padhye, Ming Zhang

slide-2
SLIDE 2

Online Services


slide-3
SLIDE 3

Online Service Delivery Infrastructure

  • Data centers
  • Proxies
  • Wide area network

slide-4
SLIDE 4

Online Service Delivery is Evolving

Traditional infrastructure:

  • Multiple owners
  • WAN operated by different ISPs

Integrated infrastructure:

  • Owned by a single entity
  • WAN operated by content providers

slide-5
SLIDE 5

Integrated Infrastructure Enables Joint Control of All Decisions

1. User – proxy mapping
2. Proxy – DC mapping
3. Paths in the wide area network

slide-6
SLIDE 6

Advantages of Joint Control

  • Increase efficiency: total traffic without congestion
  • Improve performance: aggregate end-to-end latency

[Figure: example topology with Proxy-1, Proxy-2, Proxy-3, DC-1, DC-2, and the WAN]

slide-7
SLIDE 7

Footprint: Jointly Controls the Integrated Infrastructure

Controller inputs:

  • Topology
  • UG – proxy latency
  • System capacity
  • User workload

A user group (UG) is a set of users grouped by location and service provider.

Control decisions for a user group:

  • UG – proxy mapping
  • Proxy – DC mapping
  • Network paths

Goals:

  • Maximize congestion-free traffic
  • Minimize end-to-end latency

[Figure: controller above user group UG1, proxies P1, P2, P3, and data centers DC1, DC2]

slide-8
SLIDE 8

Outline

  • Challenges in computing forwarding configuration
  • Other challenges in realizing Footprint
  • Evaluation

slide-9
SLIDE 9

Computing Configuration: Basic Approach

Inputs:

  • Estimated user demands for an epoch: $\{d_1, d_2, \ldots, d_n\}$
  • Resource capacities: $\{C_1, C_2, \ldots, C_m\}$
  • Latencies: $l_u^r$ for $u = 1, \ldots, n$ and $r = 1, \ldots, m$

Variables: $w_u^r$ for $u = 1, \ldots, n$ and $r = 1, \ldots, m$

  • Estimated load from user $u$ on resource $r$: $n_u^r = d_u \cdot w_u^r$
  • Capacity constraint for resource $r$: $n_r = \sum_u n_u^r \le C_r$
  • Objective: maximize $-\sum_r \sum_u l_u^r \, n_u^r$

The result is a linear program.
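
As a rough illustration of the basic formulation on this slide, the Python sketch below evaluates one candidate configuration. It computes the per-resource load from demands and weights (load from user u on resource r is demand times weight), checks each capacity constraint, and reports the objective (negative latency-weighted load). The function name `evaluate_config`, the user groups, and all numbers are hypothetical and not taken from Footprint. A real controller would hand these constraints to an LP solver instead of scoring a fixed assignment.

```python
from typing import Dict, Tuple

Key = Tuple[str, str]  # (user group, resource)


def evaluate_config(
    demand: Dict[str, float],
    weights: Dict[Key, float],
    capacity: Dict[str, float],
    latency: Dict[Key, float],
) -> Tuple[Dict[str, float], bool, float]:
    """Return (load per resource, feasibility, objective) for one assignment."""
    load = {r: 0.0 for r in capacity}
    objective = 0.0
    for (u, r), w in weights.items():
        n_ur = demand[u] * w                  # n_u^r = d_u * w_u^r
        load[r] += n_ur                       # n_r = sum over u of n_u^r
        objective -= latency[(u, r)] * n_ur   # maximize -sum l_u^r * n_u^r
    feasible = all(load[r] <= capacity[r] for r in capacity)
    return load, feasible, objective


# Hypothetical inputs: two user groups, two proxies.
demand = {"UG1": 100.0, "UG2": 50.0}
capacity = {"P1": 120.0, "P2": 80.0}
latency = {
    ("UG1", "P1"): 10.0, ("UG1", "P2"): 30.0,
    ("UG2", "P1"): 20.0, ("UG2", "P2"): 15.0,
}
# Candidate weights: each user group sent entirely to its lowest-latency proxy.
weights = {
    ("UG1", "P1"): 1.0, ("UG1", "P2"): 0.0,
    ("UG2", "P1"): 0.0, ("UG2", "P2"): 1.0,
}

load, feasible, objective = evaluate_config(demand, weights, capacity, latency)
print(load, feasible, objective)
```

With these made-up numbers both proxies stay under capacity, so the lowest-latency assignment is feasible. If UG1's demand exceeded 120, the check would fail and some of its weight would have to move to P2.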

slide-10
SLIDE 10

Does such a simple model suffice?

No, because of the nature of traffic from different online applications.
slide-11
SLIDE 11

User Traffic Arrives over Sessions

  • Multiple requests and responses over a single session
  • Sessions are long-lived and arrive throughout the duration of an epoch
  • #sessions varies over time

[Figure: requests and responses exchanged with a proxy over a TCP session; plot of #sessions on a resource vs. time (s)]

slide-12
SLIDE 12

Session Stickiness

Sessions stick to proxy and DC

  • Long-lived TCP sessions
  • No fresh DNS query in the middle of a session

[Figure: UG1 switches its traffic from P1 to P2 at t = T (overlay link shown). Plots of #sessions on P1 and #sessions on P2 vs. time (s), labeled $n_1^1$ and $n_1^2$, for non-sticky sessions and for sticky sessions. With sticky sessions, old sessions are still forwarded to P1 after T.]

slide-13
SLIDE 13

Challenge: Temporal Variation of Load

1. Non-zero session lifetime
2. Session stickiness

Together these give a gradually varying load from a user group to a resource.

  • Resource capacity constraints should be satisfied during the entire epoch: $\sum_u n_u^r \le C_r$ becomes $\sum_u n_u^r(t) \le C_r$
  • Computationally infeasible if $n_u^r(t)$ does not have a closed form
  • Applications have arbitrary session life distributions

slide-14
SLIDE 14

How to guarantee congestion-free delivery for traffic on sessions?

slide-15
SLIDE 15

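High Fidelity Modeling of Load

$n_r(t) = n_r^{new}(t) + n_r^{old}(t)$

  • $n_r^{new}(t)$: proportional to the arrival rate of new sessions
  • $n_r^{old}(t)$: always holds this pattern

$n_r(t) = \lambda_r F(t) + n_r^0 G(t)$

  • $\lambda_r$: arrival rate of sessions (decision variable)
  • $n_r^0$: current number of sessions
  • $F(t)$, $G(t)$: pattern functions, derived from the session length distribution

[Figure: $n^{new}$ and $n^{old}$ vs. time (s) across the previous and current 300 s epochs; CDF of session lifetime (s), x-axis 100 to 300, y-axis up to 1]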
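
The slide does not spell out how the pattern functions are computed, so the following Python sketch is only one plausible reading. It assumes sessions arrive at a constant rate and takes F(t) to be the number of new sessions still alive t seconds into the epoch per unit arrival rate, computed from an empirical sample of session lengths in 1-second steps. The decay function `g` for old sessions is supplied by the caller, and the linear decay used here is purely illustrative. All names and numbers are hypothetical.

```python
from typing import Callable, Sequence


def survival(lifetimes: Sequence[float], s: float) -> float:
    """Empirical probability that a session lasts longer than s seconds."""
    return sum(1 for x in lifetimes if x > s) / len(lifetimes)


def pattern_f(lifetimes: Sequence[float], t: int) -> float:
    """New sessions alive at second t per unit arrival rate (assumed form of F)."""
    # A session that arrived s seconds ago is still alive with probability survival(s).
    return sum(survival(lifetimes, s) for s in range(t))


def load_at(
    t: int,
    lam: float,
    n0: float,
    lifetimes: Sequence[float],
    g: Callable[[int], float],
) -> float:
    """n_r(t) = lambda_r * F(t) + n_r^0 * G(t)."""
    return lam * pattern_f(lifetimes, t) + n0 * g(t)


# Hypothetical session-length sample (seconds) and old-session decay.
lifetimes = [50.0, 100.0, 150.0, 200.0]


def g(t: int) -> float:
    """Illustrative only: old sessions drain linearly within 100 s."""
    return max(0.0, 1.0 - t / 100.0)


for t in (0, 50, 100, 300):
    print(t, load_at(t, lam=2.0, n0=60.0, lifetimes=lifetimes, g=g))
```

Under these assumptions F(t) levels off at the mean session length (125 s here) once t exceeds the longest session, which matches the intuition that load from new sessions ramps up gradually while load from old sessions drains.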

slide-16
SLIDE 16

Discretizing the Temporal Model

Approximate $F(t)$ by a tight piecewise linear upper bound $\bar{F}(t)$:

$n_r(t) \le \bar{n}_r(t) = \lambda_r \bar{F}(t) + n_r^0 G(t)$

  • $\bar{n}_r(t)$ has its maximum at one of the corners
  • Capacity constraints have to be checked only at a fixed set of points
  • Optimal $\lambda_r$'s are obtained by solving a linear program

[Figure: $F(t)$ and its piecewise linear upper bound $\bar{F}(t)$ vs. time over an epoch of length T, with corners at t1, t2, t3]
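
To make the "check only at the corners" idea concrete, here is a minimal Python sketch for a single resource. It assumes the upper bound on load is piecewise linear between the listed corner points, so that satisfying the capacity constraint at every corner is enough. Given the bound's values and the old-session pattern at those corners, it returns the largest arrival rate that fits. The function name `max_arrival_rate` and all values are hypothetical. The full problem couples many user groups and resources and needs an LP solver.

```python
from typing import Sequence


def max_arrival_rate(
    f_bar: Sequence[float],
    g: Sequence[float],
    n0: float,
    capacity: float,
) -> float:
    """Largest lambda with lambda * f_bar[k] + n0 * g[k] <= capacity at every corner k."""
    best = float("inf")
    for fb, gv in zip(f_bar, g):
        headroom = capacity - n0 * gv  # capacity left after old sessions
        if headroom < 0:
            raise ValueError("old sessions alone exceed capacity at a corner")
        if fb > 0:
            best = min(best, headroom / fb)
    return best


# Hypothetical corner values at t = 0, t1, t2, t3.
f_bar = [0.0, 90.0, 150.0, 180.0]  # upper bound on F at each corner
g = [1.0, 0.5, 0.2, 0.0]           # fraction of old sessions still alive
lam = max_arrival_rate(f_bar, g, n0=100.0, capacity=400.0)
print(round(lam, 4))
```

In this example the last corner is the binding one: the new-session load is at its highest there, so it limits the admissible arrival rate even though the old sessions have fully drained.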

slide-17
SLIDE 17

Footprint: System Implementation

1. Gathering inputs
2. Computing optimal forwarding
3. Implementing computed configuration

slide-18
SLIDE 18

Footprint: Inputs to the controller

  • Input data collected every 5 minutes
  • Inputs:

– User group – proxy latency measurements

  • Piggy-back on end-host applications
  • Instrumented JavaScript on bing.com webpage [Calder et al., IMC 2015]

– User workload

  • Estimated using observed workload in prior epochs

– System health status

  • From Microsoft internal system monitoring pipelines
  • Deployed in production

slide-19
SLIDE 19

Implementing Computed Configuration

  • UG – proxy mapping: DNS (BIND)
  • Proxy – DC mapping: Custom software to change configuration
  • WAN path selection: OpenFlow
  • Prototyped on a modest-sized testbed

slide-20
SLIDE 20

Evaluation

1. Joint Decisions
2. Temporal Modeling
slide-21
SLIDE 21

Evaluation Setup

  • Trace-driven simulations
  • Data
      • Taken from production deployment of Footprint
      • One week's worth of data
      • Multiple topologies (North America, Europe)
  • Scale
      • O(10k) user groups
      • O(100) routers and links
      • O(100) proxies
      • O(10) data centers
  • Metric
      • Efficiency: Maximum traffic with no congestion
      • Performance: Aggregated end-to-end latency
slide-22
SLIDE 22

Evaluation: Efficiency of Joint Control

  • Footprint can carry 2x the load of FastRoute because user traffic is diverted to resources with unused capacity

[Figure: bar chart of normalized traffic scale for FastRoute and Footprint, axis 0.5 to 2.5]

Baseline: FastRoute [Flavel et al., NSDI 2015]

  • UG – proxy: Closest proxy decided by anycast routing
  • Proxy – DC: Closest DC based on active measurements
  • WAN path selection: Independent traffic engineering module
slide-23
SLIDE 23

Evaluation: Latency Improvement

Compare end-to-end latency at 70% capacity of FastRoute

  • Footprint decreases overall latency by ~60%

[Figure: stacked bar chart of latency (ms), axis 20 to 100, for FastRoute and Footprint, broken down into external delay, queuing delay, and internal propagation delay]

slide-24
SLIDE 24

Evaluation: Efficiency of Temporal Modeling

More than 50% gains with respect to non-temporal models.

  • Compare with non-temporal models

– JointAverage: $n_r(t) = \lambda \times \text{average session length}$
– JointWorst: $n_r(t) = \max_t(\#\text{old sessions}) + \max_t(\#\text{new sessions})$

Normalized traffic scale:

  • JointAverage: 1.48
  • JointWorst: 1.18
  • Footprint: 2.3
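
The toy Python sketch below contrasts the two non-temporal load estimates named on this slide with the peak of a time-varying load. The per-interval session counts, arrival rate, and average session length are invented for illustration and are not from the evaluation. The helper names are hypothetical as well.

```python
from typing import Sequence


def joint_average(lam: float, avg_session_len: float) -> float:
    """JointAverage estimate: arrival rate times average session length."""
    return lam * avg_session_len


def joint_worst(n_old: Sequence[float], n_new: Sequence[float]) -> float:
    """JointWorst estimate: peak of old sessions plus peak of new sessions."""
    return max(n_old) + max(n_new)


def temporal_peak(n_old: Sequence[float], n_new: Sequence[float]) -> float:
    """Peak of the actual combined load over the sampled time points."""
    return max(o + n for o, n in zip(n_old, n_new))


# Hypothetical session counts at four time points within one epoch.
n_old = [80, 40, 10, 0]   # old sessions drain over time
n_new = [0, 40, 70, 90]   # new sessions ramp up over time

print("JointWorst:", joint_worst(n_old, n_new))
print("Temporal peak:", temporal_peak(n_old, n_new))
print("JointAverage:", joint_average(lam=0.5, avg_session_len=120.0))
```

In this made-up example the worst-case estimate (170) is well above the actual peak (90) because the two maxima never occur at the same time, while the average-based estimate (60) falls below the peak. A temporal model tracks the combined curve directly.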

slide-25
SLIDE 25

Related Work

  • To coordinate or not to coordinate? [Narayana et al., SIGMETRICS 2012]
      • Cooperative world vs. single-entity world
      • Show importance of temporal load modeling
slide-26
SLIDE 26

Summary

  • Joint decision for proxy, DC and WAN path selection
      • 100% increase in supported users, and
      • 60% reduction in end-to-end latency
  • High fidelity temporal models are 50% more efficient than non-temporal models
