

  1. Minimizing Electricity Cost for Geo-Distributed Interactive Services with Tail Latency Constraint. Mohammad A. Islam, Anshul Gandhi, and Shaolei Ren. This work was supported in part by the U.S. National Science Foundation under grants CNS-1622832, CNS-1464151, CNS-1551661, CNS-1565474, and ECCS-1610471.

  2. Data centers • Large IT companies have data centers all over the world • Can exploit spatial diversity using Geographical Load Balancing (GLB)

  3. Geographical load balancing (GLB) [Diagram: requests routed among Data Centers 1, 2, and 3; average latency t' = mean(t1, t2, t3)]

  4. Geographical load balancing (GLB) • Assumes the required data is centrally managed and replicated across all sites [Diagram: Data Centers 1, 2, and 3; average latency t' = mean(t1, t2, t3)]

  5. GLB is facing new challenges • Tons of locally generated data (smart home, IoT, edge computing) [World map: N. America, Europe, Asia]

  6. GLB is facing new challenges • Tons of locally generated data (smart home, IoT, edge computing) • Limited bandwidth for large data transfers

  7. GLB is facing new challenges • Tons of locally generated data (smart home, IoT, edge computing) • Limited bandwidth for large data transfers • Government restrictions due to data sovereignty and privacy concerns ⇒ Centralized processing is not practical

  8. Geo-distributed processing is emerging

  9. Geo-distributed processing [Diagram: a user's request (s) is sent to regional data centers in Regions 1, 2, and 3]

  10. Geo-distributed processing [Diagram: each regional data center processes the request (s) and returns its own response, e.g., response u_3 from the data center in Region 3]

  11. Geo-distributed processing • The response time depends on multiple data centers: the end-to-end response time is u' = max(u_2, u_3, u_4) [Diagram: the request (s) fans out to the regional data centers, which return responses u_2, u_3, u_4]

  12. Tail latency based SLO • Service providers prefer a tail latency (i.e., response time) based SLO • Two parameters: percentile value (e.g., 95% or p95) and latency threshold • Example: an SLO of p95 and 100 ms means that 95% of the response times should be less than 100 ms • Existing research on GLB mostly focuses on average latency: Zhenhua Liu [Sigmetrics'11], Darshan S. Palasamudram [SoCC'12], Kien Li [IGCC'10, SC'11], Yanwei Zhang [Middleware'11], ...
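A tail-latency SLO can be checked directly from observed response times. Below is a minimal sketch (not from the talk; the helper name and the simulated samples are illustrative) of a p95/100 ms check:

```python
import numpy as np

def meets_slo(latencies_ms, percentile=95, threshold_ms=100):
    """Return True if the given percentile of observed response times
    is below the latency threshold (e.g., p95 <= 100 ms)."""
    return float(np.percentile(latencies_ms, percentile)) <= threshold_ms

# Illustrative samples: a fixed network delay plus an exponential service-time tail.
samples = 40 + np.random.exponential(scale=15, size=100_000)
print(meets_slo(samples, percentile=95, threshold_ms=100))
```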

  13. Challenges of geo-distributed processing • How to characterize the tail latency? The response time depends on multiple paths for each request and includes large network latencies; simple queueing models such as M/M/1, which capture only average latency, cannot be used • How to optimize the load distribution among data centers? ⇒ McTail: a novel GLB algorithm with data-driven profiling of tail latency

  14. Problem formulation • General formulation with O data centers and T traffic sources:
      minimize_b   Σ_{k=1}^{O} r_k · f_k(b_k)                    (total electricity cost)
      subject to   q_j(b, s_j) ≥ Q_j^SLO,  ∀j = 1, 2, ⋯, T        (tail latency constraint)
• b = {b_1, b_2, ⋯, b_O} is the workload (requests processed) at the different data centers; r_k is the electricity price at data center k and f_k(b_k) its energy consumption under workload b_k, so the objective is the total electricity cost
• s_j is the set of network paths from source j to all the data centers
• q_j is Pr(e_j ≤ E_j), where e_j is the end-to-end response time at traffic source j and E_j is the delay target (e.g., 100 ms) for the tail latency

  15. How to determine q_j(b, s_j)?

  16. [Diagram: user at source j connected to data center k over route s_{j,k}] • q^route_{j,k}(b_k, s_{j,k}) is the probability that the response time over route s_{j,k} is less than E_j

  17. The same request is sent to all the data centers of a group [Diagram: source j sends the request to Data Centers 2, 3, and 4; route s_{j,3} leads to Data Center 3]

  18. Destination group h • The same request is sent to all the data centers of a group [Diagram: Data Centers 2, 3, and 4 form destination group h for source j]

  19. Destination group h • The same request is sent to all the data centers of a group, so the request meets the deadline only if every data center in the group responds within E_j, and the group-level probability is the product of the route-level probabilities:
      q^group_{j,h}(b, s) = q^route_{j,2}(b_2, s_{j,2}) × q^route_{j,3}(b_3, s_{j,3}) × q^route_{j,4}(b_4, s_{j,4})
• Because of differences in data sets, random performance interference, etc., the response times over different routes can be treated as uncorrelated

  20. Example [Diagram: source j sends the request to Data Centers 2, 3, and 4; q^route_{j,3} = 0.98] • q^group_{j,h} = 0.99 × 0.98 × 0.97 ≈ 0.94 • For requests sent to this group of data centers, 94% of the response times are less than E_j
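Under the uncorrelated-routes assumption the group probability is just a product. A minimal sketch (the function name is illustrative, not from the paper) that reproduces the example above:

```python
from math import prod

def group_probability(route_probs):
    """Probability that every data center in a destination group responds
    within the deadline, assuming per-route response times are uncorrelated."""
    return prod(route_probs)

# Example from the slide: routes to Data Centers 2, 3, and 4.
print(group_probability([0.99, 0.98, 0.97]))  # ~0.941, i.e., ~94% meet E_j
```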

  21. Response time probability for a source • There are H = O_1 × O_2 × ⋯ × O_N possible destination groups, where O_n is the number of data centers in region n • The response time probability at source j is the workload-weighted average over all the groups:
      q_j(b, s_j) = (1/Λ_j) Σ_{h=1}^{H} μ_{j,h} · q^group_{j,h}(b, s)
• μ_{j,h} is the workload sent to destination group h • Λ_j = Σ_{h=1}^{H} μ_{j,h} is the total workload from source j
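As a quick illustration of the weighted average (the numbers and names are made up, not the paper's):

```python
import numpy as np

def source_probability(mu, q_group):
    """Tail-latency probability at one source: load-weighted average of the
    destination-group probabilities, where mu[h] is the load sent to group h."""
    mu = np.asarray(mu, dtype=float)
    return float(np.dot(mu, np.asarray(q_group, dtype=float)) / mu.sum())

# Three destination groups receiving different shares of the source's load.
print(source_probability(mu=[100, 50, 25], q_group=[0.94, 0.97, 0.90]))  # ~0.943
```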

  22. Updated problem formulation
      minimize_b   Σ_{k=1}^{O} r_k · f_k(b_k)                                                   (objective same as before: total electricity cost)
      subject to   (1/Λ_j) Σ_{h=1}^{H} μ_{j,h} · q^group_{j,h}(b, s) ≥ Q_j^SLO,  ∀j = 1, 2, ⋯, T   (tail latency decomposed into route-wise latencies)
                   Σ_{h=1}^{H} μ_{j,h} = Λ_j,  ∀j = 1, 2, ⋯, T                                    (workload constraint)
• Still need to determine q^route_{j,k}(b_k, s_{j,k}) for all routes
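To make the shape of this optimization concrete, here is a minimal sketch using an off-the-shelf solver. Everything in it (the toy energy model, the fixed group probabilities, prices, and group membership) is an illustrative assumption rather than McTail's actual formulation, in which q^group itself depends on the data-center load:

```python
import numpy as np
from scipy.optimize import minimize

# One traffic source, 3 destination groups, 3 data centers (all values assumed).
groups = np.array([[1, 1, 0],            # group h -> which data centers serve it
                   [1, 0, 1],
                   [0, 1, 1]], dtype=float)
price = np.array([30.0, 45.0, 55.0])     # electricity price r_k per data center
q_group = np.array([0.96, 0.93, 0.98])   # profiled tail probabilities (held fixed here)
LAMBDA, Q_SLO = 1000.0, 0.95             # total source workload and SLO target

def cost(mu):
    b = groups.T @ mu                    # per-data-center workload b_k induced by mu
    return float(price @ (0.1 * b))      # toy energy model: f_k(b_k) = 0.1 * b_k

constraints = [
    {"type": "eq",   "fun": lambda mu: mu.sum() - LAMBDA},               # workload constraint
    {"type": "ineq", "fun": lambda mu: mu @ q_group / LAMBDA - Q_SLO},   # tail-latency constraint
]
res = minimize(cost, x0=np.full(3, LAMBDA / 3), bounds=[(0, None)] * 3,
               constraints=constraints)
print(res.x.round(1), round(cost(res.x), 1))
```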

  23. Profiling the response time probability of a route • We need the tail latency, which is hard to model for arbitrary workload distributions • Data-driven approach: profile the response time statistics (i.e., estimate the probability distribution) from observed data • Example: response time profile built from 100K requests [histogram omitted]
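A data-driven profile boils down to an empirical estimate of Pr(response time ≤ deadline). A minimal sketch, with simulated rather than measured response times:

```python
import numpy as np

def empirical_tail_probability(samples_ms, deadline_ms):
    """Estimate Pr(response time <= deadline) directly from observed samples."""
    return float((np.asarray(samples_ms, dtype=float) <= deadline_ms).mean())

# Stand-in for ~100K observed response times of one route.
observed = 30 + np.random.gamma(shape=2.0, scale=15.0, size=100_000)
print(empirical_tail_probability(observed, deadline_ms=100))
```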

  24. Challenges of the data-driven approach • The response time profile of a route depends on the amount of data center workload, so we set X discrete workload levels for each data center • There are T × O network paths between T sources and O data centers • Total of T × X × O profiles • The profiles need updating whenever the network latency distribution, data center configuration, or workload composition changes ⇒ Slow and repeated profiling

  25. Profiling response statistics for one route • G^N_{j,k} is the network latency distribution of route s_{j,k} • G^D_k(y) is the data center latency distribution under load y • The end-to-end latency distribution of route s_{j,k} is G^R_{j,k} = G^N_{j,k} ∗ G^D_k(y), where "∗" is the convolution operator • Key idea: profile G^N_{j,k} and G^D_k(y) separately
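A minimal numeric sketch of the profile-separately-then-convolve idea; the discretization, bin width, and latency shapes are illustrative assumptions, not measured profiles:

```python
import numpy as np

BIN_MS = 1.0
t = np.arange(0, 200, BIN_MS)

# Illustrative profiles: roughly bell-shaped network latency, exponential-ish
# data-center latency at some load y. Both are normalized to sum to 1.
net_pdf = np.exp(-((t - 40.0) ** 2) / (2 * 5.0 ** 2)); net_pdf /= net_pdf.sum()  # G^N
dc_pdf = np.exp(-t / 20.0); dc_pdf /= dc_pdf.sum()                               # G^D(y)

# End-to-end route profile via convolution: G^R = G^N * G^D(y).
e2e_pdf = np.convolve(net_pdf, dc_pdf)
e2e_t = np.arange(e2e_pdf.size) * BIN_MS
q_route = e2e_pdf[e2e_t <= 100.0].sum()   # Pr(end-to-end response time <= 100 ms)
print(round(float(q_route), 3))
```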

  26. Example [Plots: latency distribution of data center k under load y; network latency distribution of route s_{j,k}; their convolution gives the end-to-end response profile of the route, G^R_{j,k}]

  27. Profiling response time statistics in McTail • T × O network route profiles plus O × X data center profiles • Total of (T + X) × O profiles, versus T × X × O profiles before • Profiling overhead: only the data center profiles need updating when the workload composition and/or data center configuration changes, which is an infrequent event • Network latency distributions may change more frequently, but they are already monitored by service providers • Data overhead is comparable to existing GLB studies
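For a sense of scale, a quick back-of-the-envelope comparison (T = 5 sources and O = 9 data centers match the evaluation setup later in the deck; X = 10 workload levels is an assumed value):

```python
T, O, X = 5, 9, 10   # sources, data centers, discrete workload levels (X assumed)
print("per-route profiling:", T * X * O)          # 450 profiles
print("McTail separate profiling:", (T + X) * O)  # 126 profiles
```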

  28. McTail system diagram [Diagram: at each gateway, a Network Profiler estimates the network latency distribution G^N from network traffic, and a Data Center Profiler estimates the service time distribution G^D(y) from data center utilization]

  29. McTail system diagram [Diagram: the network and data center profiles feed the McTail optimizer, together with the workload prediction Λ_j and the electricity price r_j]

  30. McTail system diagram [Diagram: McTail outputs the load distribution μ to the gateways]

  31. Evaluation

  32. Evaluation setup • Based on Google and Facebook data center locations • 3 regions, 9 data centers [map omitted]

  33. Evaluation setup • Based on Google and Facebook data center locations • 3 regions, 9 data centers • 5 traffic sources [map omitted]
