
SLIDE 1

Minimizing Electricity Cost for Geo-Distributed Interactive Services with Tail Latency Constraint

Mohammad A. Islam, Anshul Gandhi, and Shaolei Ren

This work was supported in part by the U.S. National Science Foundation under grants CNS-1622832, CNS-1464151, CNS-1551661, CNS-1565474, and ECCS-1610471

SLIDE 2

Data centers

  • Large IT companies have data centers all over the world
  • Can exploit spatial diversity using Geographical Load Balancing (GLB)

SLIDE 3
  • Avg. latency t′ = mean(t1, t2, t3)

Data Center 1 Data Center 2 Data Center 3

Geographical load balancing (GLB)

SLIDE 4

Data Center 1 Data Center 2 Data Center 3

Geographical load balancing (GLB)

Assumes the required data is centrally managed and replicated across all the sites


  • Avg. latency t′ = mean(t1, t2, t3)
SLIDE 5

Europe

N. America

Asia

GLB is facing new challenges


  • Tons of locally generated data
  • Smart home, IoT, edge computing
SLIDE 6

Europe

N. America

Asia

GLB is facing new challenges


  • Tons of locally generated data
  • Smart home, IoT, edge computing
  • Limited BW for large data transfer
SLIDE 7

Europe

N. America

Asia

GLB is facing new challenges


Centralized processing is not practical

  • Tons of locally generated data
  • Smart home, IoT, edge computing
  • Limited BW for large data transfer
  • Government restrictions due to data sovereignty and privacy concerns

SLIDE 8

Geo-distributed processing is emerging

SLIDE 9

Geo-distributed processing


request (r) request (r)

User

Region 3 Region 1 Region 2

SLIDE 10

Geo-distributed processing


response (t2) request (r) request (r)

User

Region 3 Region 1 Region 2

Regional Data Center Processing Request

SLIDE 11

Geo-distributed processing


response (t2) request (r) request (r) response t′ = max(t1, t2, t3)

User

Region 3 Region 1 Region 2

Response time depends on multiple data centers

Regional Data Center Processing Request

SLIDE 12

Tail latency based SLO

  • Service providers prefer tail latency (i.e., high-percentile response time) based SLOs
  • Two parameters
  • Percentile value (e.g., 95% or p95)
  • Latency threshold
  • Example
  • An SLO of p95 and 100ms means 95% of the response times should be less than 100ms
  • Existing research on GLB mostly focuses on average latency
  • Zhenhua Liu [Sigmetrics'11], Darshan S. Palasamudram [SoCC'12], Kien Li [IGCC'10, SC'11], Yanwei Zhang [Middleware'11]…
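The two-parameter SLO above can be checked mechanically. A minimal Python sketch with synthetic response times (the exponential distribution and its scale are illustrative assumptions, not from the deck):

```python
import numpy as np

# Hypothetical response-time samples in milliseconds (illustrative only)
rng = np.random.default_rng(0)
response_times = rng.exponential(scale=40.0, size=100_000)

# SLO parameters: percentile value p95, latency threshold 100 ms
p95 = np.percentile(response_times, 95)
frac_under = np.mean(response_times <= 100.0)

# The SLO "p95 and 100 ms" holds iff 95% of responses finish within 100 ms
slo_met = frac_under >= 0.95
print(p95, slo_met)
```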

SLIDE 13

Challenges of geo-distributed processing

  • How to characterize the tail latency?
  • Response time depends on multiple paths for each request
  • Includes large network latency
  • Simple queueing models like M/M/1 for average latency cannot be used
  • How to optimize load distribution among data centers?

McTail: a novel GLB algorithm with data-driven profiling of tail latency

SLIDE 14

Problem formulation

  • General formulation with O data centers and T traffic sources

minimize_b  Σ_{k=1}^{O} r_k · f_k(b_k)

subject to  q_j(b, s_j) ≥ Q_j^{SLO},  ∀j = 1, 2, ⋯, T

  • b = {b1, b2, ⋯, bO} is the workload (requests processed) at the different data centers
  • s_j is the set of network paths from source j to all the data centers
  • q_j is Pr(e_j ≤ E_j), where e_j is the end-to-end response time at traffic source j, and E_j is the delay target (e.g., 100ms) for the tail latency

The objective is the total electricity cost; the constraint enforces the tail latency target.

slide-15
SLIDE 15

How to determine q_j(b, s_j)?

SLIDE 16

User   Route s_{j,k}   Source j   Data Center k

q_{j,k}^{route}(b_k, s_{j,k}) is the probability that the response time over route s_{j,k} is less than E_j

SLIDE 17

User   Route s_{j,2}   Source j   Data Center 2   Data Center 1   Data Center 3

Same request is sent to all the data centers of a group

SLIDE 18

User   Route s_{j,2}   Source j   Data Center 2

Destination group h

Data Center 1   Data Center 3

Same request is sent to all the data centers of a group

SLIDE 19

User   Route s_{j,2}   Source j   Data Center 2

Destination group h

Data Center 1   Data Center 3

Same request is sent to all the data centers of a group. Because of differences in data sets, random performance interference, etc., the response times over different routes can be considered uncorrelated.

q_{j,h}^{group}(b, s) = q_{j,1}^{route}(b_1, s_{j,1}) × q_{j,2}^{route}(b_2, s_{j,2}) × q_{j,3}^{route}(b_3, s_{j,3})

SLIDE 20

User ๐’’๐’‹,๐’‰

๐’‰๐’”๐’‘๐’—๐’’ =

๐Ÿ. ๐Ÿ˜๐Ÿ˜ ร— ๐Ÿ. ๐Ÿ˜๐Ÿ— ร— ๐Ÿ. ๐Ÿ˜๐Ÿ– โ‰ˆ ๐Ÿ. ๐Ÿ˜๐Ÿ“ Source ๐’‹ Data Center ๐Ÿ‘ Data Center ๐Ÿ Data Center ๐Ÿ’

Example

For requests sent to this group of data centers, 94% of the response times are less than ๐‘ฌ๐’‹

๐’’๐’‹,๐Ÿ‘

๐’”๐’‘๐’—๐’–๐’‡ = ๐Ÿ. ๐Ÿ˜๐Ÿ—

20

SLIDE 21

Response time probability for a source

  • H = O_1 × O_2 × ⋯ × O_N possible destination groups
  • Where O_n is the number of data centers in region n
  • Response time probability at source j is

q_j(μ) = q_j(b, s) = (1/Λ_j) Σ_{h=1}^{H} μ_{j,h} · q_{j,h}^{group}(b, s)

  • μ_{j,h} is the workload sent to destination group h
  • Λ_j = Σ_{h=1}^{H} μ_{j,h} is the total workload from source j

A weighted average over all the groups
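The weighted average above, sketched with hypothetical workloads and group probabilities:

```python
# mu[h]: workload sent to destination group h (requests/s, hypothetical)
# q_group[h]: Pr[response <= E_j] for group h (hypothetical values)
mu = {0: 300.0, 1: 500.0, 2: 200.0}
q_group = {0: 0.98, 1: 0.95, 2: 0.90}

Lambda_j = sum(mu.values())                           # total workload from source j
q_j = sum(mu[h] * q_group[h] for h in mu) / Lambda_j  # workload-weighted average

print(q_j)
```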

SLIDE 22

Updated problem formulation

minimize_b  Σ_{k=1}^{O} r_k · f_k(b_k)

subject to  (1/Λ_j) Σ_{h=1}^{H} μ_{j,h} · q_{j,h}^{group}(b, s) ≥ Q_j^{SLO},  ∀j = 1, 2, ⋯, T

Σ_{h=1}^{H} μ_{j,h} = Λ_j,  ∀j = 1, 2, ⋯, T  (workload constraint)

Need to determine q_{j,k}^{route}(b_k, s_{j,k}) for all routes

The objective is the same as before, minimizing electricity cost; the tail latency constraint is decomposed into route-wise latencies.
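A toy instance of this optimization, solved by brute-force grid search (all prices, tail-probability curves, and the SLO target are made-up numbers; the grid search stands in for a real solver and is not the paper's solution method):

```python
# Toy instance: one source with total workload Lambda, two destination groups.
Lambda = 100.0                          # req/s from the source
price = {"g1": 1.0, "g2": 0.6}          # electricity cost per unit workload
q_of_load = {                           # group tail probability vs. its load
    "g1": lambda x: 0.99 - 0.0002 * x,  # fast but expensive group
    "g2": lambda x: 0.97 - 0.0008 * x,  # cheap group, degrades faster
}
Q_slo = 0.95                            # required source-level probability

best = None
for mu1 in range(0, 101, 5):            # candidate workload splits
    mu = {"g1": float(mu1), "g2": Lambda - mu1}
    q_j = sum(mu[g] * q_of_load[g](mu[g]) for g in mu) / Lambda  # weighted avg
    cost = sum(price[g] * mu[g] for g in mu)
    if q_j >= Q_slo and (best is None or cost < best[0]):
        best = (cost, mu)

print(best)  # cheapest split that still satisfies the tail constraint
```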

SLIDE 23

Profiling response time probability of a route

  • We need the tail latency
  • Hard to model for arbitrary workload distributions
  • Data-driven approach: profile the response time statistics (find the probability distribution) from observed data
  • Example: response profile for 100K requests
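The data-driven profiling step amounts to building an empirical distribution from observed response times. A sketch with synthetic samples (the gamma shape and scale are arbitrary stand-ins for a real trace):

```python
import numpy as np

# Synthetic "observed" response times for 100K requests (stand-in for a trace)
rng = np.random.default_rng(1)
observed_ms = rng.gamma(shape=2.0, scale=30.0, size=100_000)

def prob_under(samples, threshold_ms):
    """Empirical Pr[response time <= threshold], read off the profile."""
    return float(np.mean(samples <= threshold_ms))

print(prob_under(observed_ms, 100.0))         # estimated Pr[response <= 100 ms]
print(float(np.percentile(observed_ms, 95)))  # empirical p95 in ms
```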

SLIDE 24

Challenges of data driven approach

  • Response time profile of a route depends on the amount of data center workload
  • We set X discrete levels of workload for each data center
  • T × O network paths between T sources and O data centers
  • Total: T × X × O profiles
  • Profiles need updating if the network latency distribution, data center configuration, or workload composition changes

Slow and repeated profiling

SLIDE 25

Profiling response statistics for one route

  • G_{j,k}^O is the network latency distribution of route s_{j,k}
  • G_k^E(y) is the data center latency distribution with load y
  • The end-to-end latency distribution of route s_{j,k} is

G_{j,k} = G_{j,k}^O ∗ G_k^E(y)

  • where "∗" is the convolution operator

Key idea: profile G_{j,k}^O and G_k^E(y) separately
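The convolution step can be sketched with discretized profiles (both distribution shapes below are invented for illustration; only the convolution mechanics mirror the slide):

```python
import numpy as np

# Discretized latency profiles on a 1 ms grid (shapes are illustrative)
grid = np.arange(200)
net = np.exp(-0.5 * ((grid - 40) / 8.0) ** 2)  # network latency of route s_{j,k}
net /= net.sum()
dc = np.exp(-grid / 25.0)                      # data center latency at load y
dc /= dc.sum()

# End-to-end profile of the route is the convolution of the two
e2e = np.convolve(net, dc)
cdf = np.cumsum(e2e)

E_j = 150  # ms, delay target
print(float(cdf[E_j]))  # Pr[end-to-end latency <= E_j]
```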

SLIDE 26

Example: convolving the network latency of route s_{j,k} with the latency of data center k under load y yields the end-to-end response profile of the route, G_{j,k}

SLIDE 27

Profiling response time statistics in McTail

  • ๐‘‡ ร— ๐‘‚ network routes profiles
  • ๐‘‚ ร— ๐‘‹ data centers profiles
  • Total ๐‘ป + ๐‘ฟ ร— ๐‘ถ profiles versus ๐‘ป ร— ๐‘ฟ ร— ๐‘ถ profiles before
  • Profiling overhead
  • Only data center profiles need updating when workload composition and/or data center

configuration is changed

  • Infrequent event
  • Network latency distribution may change more frequently
  • Already monitored by service providers
  • Data overhead comparable to existing GLB studies
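To put numbers on the comparison: with the 5 traffic sources and 9 data centers used later in the evaluation, and a hypothetical X = 10 load levels:

```python
# T sources, O data centers, X discrete load levels
# (T and O match the evaluation setup; X = 10 is a hypothetical choice)
T, O, X = 5, 9, 10

naive = T * X * O       # one end-to-end profile per (source, data center, load)
mctail = (T + X) * O    # T*O network profiles + X*O data center profiles

print(naive, mctail)    # 450 vs. 135
```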

SLIDE 28

McTail system diagram

Components: traffic gateways, each with a network profiler (G^O, fed by the network latency distribution), and data centers, each with a data center profiler (G^E(y), fed by the service time distribution and utilization)

SLIDE 29

McTail system diagram

McTail

The profiler outputs feed into McTail, together with the electricity price (r_j) and the workload prediction (Λ_j)

SLIDE 30

McTail system diagram

McTail

McTail combines the profiles, electricity prices (r_j), and workload predictions (Λ_j), and outputs the load distribution (μ) to the traffic gateways

SLIDE 31

Evaluation

SLIDE 32

Evaluation setup


3 regions, 9 data centers. Based on Google and Facebook data center locations.
SLIDE 33

Evaluation setup


3 regions, 9 data centers. 5 traffic sources. Based on Google and Facebook data center locations.
SLIDE 34

Evaluation setup

  • Discrete event simulation using SimEvents from MathWorks
  • Half-normal network latency distribution based on route length
  • Real-world traces from Google and Microsoft
  • Location-wise electricity prices
  • SLO set to a p95 response time of 1.5 seconds
  • 24-hour simulation with the load distribution updated every 15 minutes
  • Homogeneous data center setting to ease the simulation
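A sketch of the half-normal network latency model in the setup above (the per-km scale factor and route length are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_network_latency_ms(route_km, n, ms_per_km=0.02):
    """Half-normal latency whose scale grows with route length (illustrative)."""
    sigma = ms_per_km * route_km                   # scale tied to route length
    return np.abs(rng.normal(0.0, sigma, size=n))  # |N(0, sigma^2)| is half-normal

lat = sample_network_latency_ms(route_km=4000, n=100_000)
print(float(lat.mean()))  # half-normal mean = sigma * sqrt(2/pi)
```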

SLIDE 35

Cost saving

7% cost saving using McTail

SLIDE 36

Performance

Always ≥ 0.95

SLIDE 37

Impact of SLO change


Savings go up as the response time threshold is relaxed

SLIDE 38

Impact of SLO change


More savings when the percentile requirement is less stringent

SLIDE 39

McTail

  • A novel GLB algorithm for geo-distributed interactive services
  • Data-driven approach to characterize the tail latency
  • Negligible extra profiling overhead

Practical and efficient
