
SLIDE 1

Minimizing Electricity Cost for Geo-Distributed Interactive Services with Tail Latency Constraint

Mohammad A. Islam, Anshul Gandhi, and Shaolei Ren

This work was supported in part by the U.S. National Science Foundation under grants CNS-1622832, CNS-1464151, CNS-1551661, CNS-1565474, and ECCS-1610471

SLIDE 2

Data centers

  • Large IT companies have data centers all over the world
  • Can exploit spatial diversity using Geographical Load Balancing (GLB)

SLIDE 3
  • Avg. latency t′ = mean(t1, t2, t3)

Data Center 1 Data Center 2 Data Center 3

Geographical load balancing (GLB)

SLIDE 4

Data Center 1 Data Center 2 Data Center 3

Geographical load balancing (GLB)

Assumes the required data is centrally managed and replicated across all the sites


  • Avg. latency t′ = mean(t1, t2, t3)
SLIDE 5

Europe

N. America

Asia

GLB is facing new challenges


  • Tons of locally generated data
  • Smart home, IoT, edge computing
SLIDE 6

Europe

N. America

Asia

GLB is facing new challenges


  • Tons of locally generated data
  • Smart home, IoT, edge computing
  • Limited BW for large data transfer
SLIDE 7

Europe

N. America

Asia

GLB is facing new challenges


Centralized processing is not practical

  • Tons of locally generated data
  • Smart home, IoT, edge computing
  • Limited BW for large data transfer
  • Government restrictions due to data sovereignty and privacy concerns

SLIDE 8

Geo-distributed processing is emerging

SLIDE 9

Geo-distributed processing


request (r) request (r)

User

Region 3 Region 1 Region 2

SLIDE 10

Geo-distributed processing


response (t2) request (r) request (r)

User

Region 3 Region 1 Region 2

Regional Data Center Processing Request

SLIDE 11

Geo-distributed processing


response (t2) request (r) request (r) response t′ = max(t1, t2, t3)

User

Region 3 Region 1 Region 2

Response time depends on multiple data centers

Regional Data Center Processing Request

SLIDE 12

Tail latency based SLO

  • Service providers prefer tail latency (i.e., high-percentile response time) based SLOs
  • Two parameters
  • Percentile value (e.g., 95% or p95)
  • Latency threshold
  • Example
  • An SLO of p95 and 100ms means 95% of the response times should be less than 100ms
  • Existing research on GLB mostly focuses on average latency
  • Zhenhua Liu [Sigmetrics'11], Darshan S. Palasamudram [SoCC'12], Kien Li [IGCC'10, SC'11], Yanwei Zhang [Middleware'11]…
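The two-parameter SLO above can be checked mechanically. A minimal Python sketch with synthetic response times (the exponential distribution and its scale are illustrative assumptions, not from the deck):

```python
import numpy as np

# Hypothetical response-time samples in milliseconds (illustrative only)
rng = np.random.default_rng(0)
response_times = rng.exponential(scale=40.0, size=100_000)

# SLO parameters: percentile value p95, latency threshold 100 ms
p95 = np.percentile(response_times, 95)
frac_under = np.mean(response_times <= 100.0)

# The SLO "p95 and 100 ms" holds iff 95% of responses finish within 100 ms
slo_met = frac_under >= 0.95
print(p95, slo_met)
```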

SLIDE 13

Challenges of geo-distributed processing

  • How to characterize the tail latency?
  • Response time depends on multiple paths for each request
  • Includes large network latency
  • Simple queueing models like M/M/1 for average latency cannot be used
  • How to optimize load distribution among data centers?

McTail: a novel GLB algorithm with data-driven profiling of tail latency

SLIDE 14

Problem formulation

  • General formulation with O data centers and T traffic sources

minimize_b  Σ_{k=1}^{O} r_k · f_k(b_k)

subject to  q_j(b, s_j) ≥ Q_j^{SLO},  ∀j = 1, 2, ⋯, T

  • b = {b1, b2, ⋯, bO} is the workload (requests processed) at the different data centers
  • s_j is the set of network paths from source j to all the data centers
  • q_j is Pr(e_j ≤ E_j), where e_j is the end-to-end response time at traffic source j, and E_j is the delay target (e.g., 100ms) for the tail latency

The objective is the total electricity cost; the constraint enforces the tail latency target.

slide-15
SLIDE 15

How to determine q_j(b, s_j)?

SLIDE 16

User   Route s_{j,k}   Source j   Data Center k

q_{j,k}^{route}(b_k, s_{j,k}) is the probability that the response time over route s_{j,k} is less than E_j

SLIDE 17

User   Route s_{j,2}   Source j   Data Center 2   Data Center 1   Data Center 3

Same request is sent to all the data centers of a group

SLIDE 18

User   Route s_{j,2}   Source j   Data Center 2

Destination group h

Data Center 1   Data Center 3

Same request is sent to all the data centers of a group

SLIDE 19

User   Route s_{j,2}   Source j   Data Center 2

Destination group h

Data Center 1   Data Center 3

Same request is sent to all the data centers of a group. Because of differences in data sets, random performance interference, etc., the response times over different routes can be considered uncorrelated.

q_{j,h}^{group}(b, s) = q_{j,1}^{route}(b_1, s_{j,1}) × q_{j,2}^{route}(b_2, s_{j,2}) × q_{j,3}^{route}(b_3, s_{j,3})

SLIDE 20

User ๐’’๐’‹,๐’‰

๐’‰๐’”๐’‘๐’—๐’’ =

๐Ÿ. ๐Ÿ˜๐Ÿ˜ ร— ๐Ÿ. ๐Ÿ˜๐Ÿ— ร— ๐Ÿ. ๐Ÿ˜๐Ÿ– โ‰ˆ ๐Ÿ. ๐Ÿ˜๐Ÿ“ Source ๐’‹ Data Center ๐Ÿ‘ Data Center ๐Ÿ Data Center ๐Ÿ’

Example

For requests sent to this group of data centers, 94% of the response times are less than ๐‘ฌ๐’‹

๐’’๐’‹,๐Ÿ‘

๐’”๐’‘๐’—๐’–๐’‡ = ๐Ÿ. ๐Ÿ˜๐Ÿ—

20

SLIDE 21

Response time probability for a source

  • H = O_1 × O_2 × ⋯ × O_N possible destination groups
  • Where O_n is the number of data centers in region n
  • Response time probability at source j is

q_j(μ) = q_j(b, s) = (1/Λ_j) Σ_{h=1}^{H} μ_{j,h} · q_{j,h}^{group}(b, s)

  • μ_{j,h} is the workload sent to destination group h
  • Λ_j = Σ_{h=1}^{H} μ_{j,h} is the total workload from source j

A weighted average over all the groups
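The weighted average above, sketched with hypothetical workloads and group probabilities:

```python
# mu[h]: workload sent to destination group h (requests/s, hypothetical)
# q_group[h]: Pr[response <= E_j] for group h (hypothetical values)
mu = {0: 300.0, 1: 500.0, 2: 200.0}
q_group = {0: 0.98, 1: 0.95, 2: 0.90}

Lambda_j = sum(mu.values())                           # total workload from source j
q_j = sum(mu[h] * q_group[h] for h in mu) / Lambda_j  # workload-weighted average

print(q_j)
```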

SLIDE 22

Updated problem formulation

minimize_b  Σ_{k=1}^{O} r_k · f_k(b_k)

subject to  (1/Λ_j) Σ_{h=1}^{H} μ_{j,h} · q_{j,h}^{group}(b, s) ≥ Q_j^{SLO},  ∀j = 1, 2, ⋯, T

Σ_{h=1}^{H} μ_{j,h} = Λ_j,  ∀j = 1, 2, ⋯, T  (workload constraint)

Need to determine q_{j,k}^{route}(b_k, s_{j,k}) for all routes

The objective is the same as before, minimizing electricity cost; the tail latency constraint is decomposed into route-wise latencies.
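A toy instance of this optimization, solved by brute-force grid search (all prices, tail-probability curves, and the SLO target are made-up numbers; the grid search stands in for a real solver and is not the paper's solution method):

```python
# Toy instance: one source with total workload Lambda, two destination groups.
Lambda = 100.0                          # req/s from the source
price = {"g1": 1.0, "g2": 0.6}          # electricity cost per unit workload
q_of_load = {                           # group tail probability vs. its load
    "g1": lambda x: 0.99 - 0.0002 * x,  # fast but expensive group
    "g2": lambda x: 0.97 - 0.0008 * x,  # cheap group, degrades faster
}
Q_slo = 0.95                            # required source-level probability

best = None
for mu1 in range(0, 101, 5):            # candidate workload splits
    mu = {"g1": float(mu1), "g2": Lambda - mu1}
    q_j = sum(mu[g] * q_of_load[g](mu[g]) for g in mu) / Lambda  # weighted avg
    cost = sum(price[g] * mu[g] for g in mu)
    if q_j >= Q_slo and (best is None or cost < best[0]):
        best = (cost, mu)

print(best)  # cheapest split that still satisfies the tail constraint
```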

SLIDE 23

Profiling response time probability of a route

  • We need the tail latency
  • Hard to model for arbitrary workload distributions
  • Data-driven approach: profile the response time statistics (find the probability distribution) from observed data
  • Example: response profile for 100K requests
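The data-driven profiling step amounts to building an empirical distribution from observed response times. A sketch with synthetic samples (the gamma shape and scale are arbitrary stand-ins for a real trace):

```python
import numpy as np

# Synthetic "observed" response times for 100K requests (stand-in for a trace)
rng = np.random.default_rng(1)
observed_ms = rng.gamma(shape=2.0, scale=30.0, size=100_000)

def prob_under(samples, threshold_ms):
    """Empirical Pr[response time <= threshold], read off the profile."""
    return float(np.mean(samples <= threshold_ms))

print(prob_under(observed_ms, 100.0))         # estimated Pr[response <= 100 ms]
print(float(np.percentile(observed_ms, 95)))  # empirical p95 in ms
```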

SLIDE 24

Challenges of data driven approach

  • Response time profile of a route depends on the amount of data center workload
  • We set X discrete levels of workload for each data center
  • T × O network paths between T sources and O data centers
  • Total: T × X × O profiles
  • Profiles need updating if the network latency distribution, data center configuration, or workload composition changes

Slow and repeated profiling

SLIDE 25

Profiling response statistics for one route

  • G_{j,k}^O is the network latency distribution of route s_{j,k}
  • G_k^E(y) is the data center latency distribution with load y
  • The end-to-end latency distribution of route s_{j,k} is

G_{j,k} = G_{j,k}^O ∗ G_k^E(y)

  • where "∗" is the convolution operator

Key idea: profile G_{j,k}^O and G_k^E(y) separately
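The convolution step can be sketched with discretized profiles (both distribution shapes below are invented for illustration; only the convolution mechanics mirror the slide):

```python
import numpy as np

# Discretized latency profiles on a 1 ms grid (shapes are illustrative)
grid = np.arange(200)
net = np.exp(-0.5 * ((grid - 40) / 8.0) ** 2)  # network latency of route s_{j,k}
net /= net.sum()
dc = np.exp(-grid / 25.0)                      # data center latency at load y
dc /= dc.sum()

# End-to-end profile of the route is the convolution of the two
e2e = np.convolve(net, dc)
cdf = np.cumsum(e2e)

E_j = 150  # ms, delay target
print(float(cdf[E_j]))  # Pr[end-to-end latency <= E_j]
```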

SLIDE 26

Example: convolving the network latency of route s_{j,k} with the latency of data center k under load y yields the end-to-end response profile of the route, G_{j,k}

SLIDE 27

Profiling response time statistics in McTail

  • ๐‘‡ ร— ๐‘‚ network routes profiles
  • ๐‘‚ ร— ๐‘‹ data centers profiles
  • Total ๐‘ป + ๐‘ฟ ร— ๐‘ถ profiles versus ๐‘ป ร— ๐‘ฟ ร— ๐‘ถ profiles before
  • Profiling overhead
  • Only data center profiles need updating when workload composition and/or data center

configuration is changed

  • Infrequent event
  • Network latency distribution may change more frequently
  • Already monitored by service providers
  • Data overhead comparable to existing GLB studies
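To put numbers on the comparison: with the 5 traffic sources and 9 data centers used later in the evaluation, and a hypothetical X = 10 load levels:

```python
# T sources, O data centers, X discrete load levels
# (T and O match the evaluation setup; X = 10 is a hypothetical choice)
T, O, X = 5, 9, 10

naive = T * X * O       # one end-to-end profile per (source, data center, load)
mctail = (T + X) * O    # T*O network profiles + X*O data center profiles

print(naive, mctail)    # 450 vs. 135
```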

SLIDE 28

McTail system diagram

Components: traffic gateways, each with a network profiler (G^O, fed by the network latency distribution), and data centers, each with a data center profiler (G^E(y), fed by the service time distribution and utilization)

SLIDE 29

McTail system diagram

McTail

The profiler outputs feed into McTail, together with the electricity price (r_j) and the workload prediction (Λ_j)

SLIDE 30

McTail system diagram

McTail

McTail combines the profiles, electricity prices (r_j), and workload predictions (Λ_j), and outputs the load distribution (μ) to the traffic gateways

SLIDE 31

Evaluation

SLIDE 32

Evaluation setup


3 regions, 9 data centers. Based on Google and Facebook data center locations.
SLIDE 33

Evaluation setup


3 regions, 9 data centers. 5 traffic sources. Based on Google and Facebook data center locations.
SLIDE 34

Evaluation setup

  • Discrete event simulation using SimEvents from MathWorks
  • Half-normal network latency distribution based on route length
  • Real-world traces from Google and Microsoft
  • Location-wise electricity prices
  • SLO set to a p95 response time of 1.5 seconds
  • 24-hour simulation with the load distribution updated every 15 minutes
  • Homogeneous data center setting to ease the simulation
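A sketch of the half-normal network latency model in the setup above (the per-km scale factor and route length are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_network_latency_ms(route_km, n, ms_per_km=0.02):
    """Half-normal latency whose scale grows with route length (illustrative)."""
    sigma = ms_per_km * route_km                   # scale tied to route length
    return np.abs(rng.normal(0.0, sigma, size=n))  # |N(0, sigma^2)| is half-normal

lat = sample_network_latency_ms(route_km=4000, n=100_000)
print(float(lat.mean()))  # half-normal mean = sigma * sqrt(2/pi)
```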

SLIDE 35

Cost saving

7% cost saving using McTail

SLIDE 36

Performance

Always ≥ 0.95

SLIDE 37

Impact of SLO change


Savings go up as the response time threshold is relaxed

SLIDE 38

Impact of SLO change


More savings when the percentile requirement is less stringent

SLIDE 39

McTail

  • A novel GLB algorithm for geo-distributed interactive services
  • Data-driven approach to characterize the tail latency
  • Negligible extra profiling overhead

Practical and efficient
