SLIDE 1 WebPerf: Evaluating “What-If” Scenarios for Cloud-hosted Web Applications
Yurong Jiang Lenin Ravindranath Suman Nath Ramesh Govindan
1
SLIDE 2 2
A Cloud-Hosted Web Application
SLIDE 3 3
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service
Modern Cloud Applications are Complex
Cloud
SLIDE 4 The Problem
4
Latency of these applications is critical for user experience. Developers find it hard to optimize cloud-side latency for cloud-hosted Web applications.
Cloud-side Latency
SLIDE 5 Configuration Complexity
Front-end
5
Each choice impacts latency
SLIDE 6 Configuration Complexity
Relational Store Azure SQL
6
Latency implications hard to understand
SLIDE 7 Exploring Configuration Choices Cloud App
Service Configuration
What if?
What if I move the blob store from basic to standard tier?
$30 $100 200 ms 100 ms
7
SLIDE 8 Challenge
8 [Chart: per-request latency (ms) under 100 concurrent requests vs. price (USD/month)]
Answer to what-if question may depend on workload
SLIDE 9 Challenge
End Store Insert Cache Insert Start 100 ms 20 ms 100 ms 10 ms
9
Answer to what-if question may depend on causal dependencies
SLIDE 10 Challenge
Relational Store Relational Store
What if I re-locate this component? What if I increase this component’s load?
Table Store
What if a replica fails?
10
A what-if capability should be expressive
SLIDE 11 WebPerf
11
WebPerf is a what-if scenario evaluator
❖ Input: a what-if scenario ❖ Output: resulting cloud-side latency distribution
SLIDE 12 WebPerf
What if I upgrade blob storage from basic to standard tier? 12
Frontend Processing Start Cache Insert End Blob Insert Store Insert
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
Developer supplies workload Computed Offline
SLIDE 13 Key Insights
Cloud deployments well-engineered
❖ Components designed for predictable latency ❖ Often co-located in same datacenter
Why might WebPerf work? 13
SLIDE 14 Key Insights
Many component profiles are application-independent. The dependency graph is usually independent of the what-if scenario.
Why?
➢Cheap ➢Fast ➢Low Effort
Automate most steps. Compute once, reuse.
14
SLIDE 15 WebPerf Approach
What if I upgrade blob storage from basic to standard tier?
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
❉
15
Frontend Processing Start Cache Insert End Blob Insert Store Insert
SLIDE 16 Dependency Extraction
Fast, accurate dependency extraction with zero developer input
16 Goal
Track dependencies at run-time by instrumenting binary
Approach
SLIDE 17 Challenge
Task Asynchronous Programming
❖ Many cloud apps use this ❖ Only mechanism for asynchronous I/O in Azure ❖ AWS also provides TAP APIs for .NET
17 Prior work has not considered this
SLIDE 18 Task Asynchronous Programming
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
Start task Continue
On front-end 18
SLIDE 19 Task Asynchronous Programming
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
task1 Start End task2
19
SLIDE 20 Asynchronous Parallel Operations
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1, value2 = await Task.WhenAll(task1, task2);
    /* construct response */
    return response;
}
WhenAll: Continue only when all tasks finish
task1 End task2 Start
20
SLIDE 21 Asynchronous Parallel Operations
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1, value2 = await Task.WhenAny(task1, task2);
    /* construct response */
    return response;
}
WhenAny: Continue when any one task finishes
task1 End task2 Start
21
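The await/WhenAll/WhenAny combinators above are not C#-specific; a minimal Python asyncio sketch of the same composition patterns, where store_get and cache_get are hypothetical stand-ins for the slides' store and cache calls:

```python
import asyncio

# Hypothetical stand-ins for store.get / cache.get; real calls would do I/O.
async def store_get(key):
    await asyncio.sleep(0)          # simulate an asynchronous I/O hop
    return f"store:{key}"

async def cache_get(key):
    await asyncio.sleep(0)
    return f"cache:{key}"

async def process_when_all(k1, k2):
    # WhenAll analog: continue only when both tasks finish.
    return await asyncio.gather(store_get(k1), cache_get(k2))

async def process_when_any(k1, k2):
    # WhenAny analog: continue as soon as any one task finishes.
    t1 = asyncio.ensure_future(store_get(k1))
    t2 = asyncio.ensure_future(cache_get(k2))
    done, pending = await asyncio.wait([t1, t2],
                                       return_when=asyncio.FIRST_COMPLETED)
    for p in pending:
        p.cancel()
    return next(iter(done)).result()

both = asyncio.run(process_when_all("a", "b"))
first = asyncio.run(process_when_any("a", "b"))
```

The dependency graph differs accordingly: gather makes End depend on the max of both tasks, while the FIRST_COMPLETED wait makes it depend on the min.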
SLIDE 22 Key Idea
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
Init
Received value1 Received value2
Instrument state machine binary to dynamically track tasks and continuations
22 .NET compiler generates this
SLIDE 23 WebPerf Approach
What if I upgrade blob storage from basic to standard tier?
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
❉
23
Frontend Processing Start Cache Insert End Blob Insert Store Insert
SLIDE 24 WebPerf Approach
What if I upgrade blob storage from basic to standard tier?
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
❉
24
Frontend Processing Start Cache Insert End Blob Insert Store Insert
SLIDE 25 Component Profiling
A component’s profile contains latency distributions of API calls to the component. WebPerf profiles commonly used components offline.
Profile dictionary
25
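A component profile can be sketched as an empirical latency distribution per (component, API) pair kept in a profile dictionary; the measured values below are illustrative, not real Azure numbers:

```python
import statistics

# Profile dictionary: (component, API) -> list of measured latencies (ms).
profile_dict = {}

def record(component, api, latency_ms):
    profile_dict.setdefault((component, api), []).append(latency_ms)

def percentile(samples, p):
    # Simple nearest-rank percentile over the empirical samples.
    s = sorted(samples)
    idx = min(len(s) - 1, int(p / 100 * len(s)))
    return s[idx]

# Assumed offline measurements of one blob-storage API.
for ms in [9, 10, 11, 10, 12, 30, 10, 11]:
    record("blob", "UploadFromStream", ms)

samples = profile_dict[("blob", "UploadFromStream")]
median = statistics.median(samples)
p95 = percentile(samples, 95)
```

The long tail (the 30 ms sample) is exactly why WebPerf keeps whole distributions rather than a single mean per API.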
SLIDE 26 Generalized Profiles
Relational Store Relational Store Relational Store
Location Load
Table Store
Failure
26
Tiers
SLIDE 27 Application Dependent Profiles
Relational Store Azure SQL
SQL join latency depends on size
Cache Redis Cache
Cache latency depends on hit rate
Relational Store Sharded TableStore
Access latency depends on skew
27
Not all profiles can be computed offline
SLIDE 28 Workload Hints
WebPerf uses parameterized profiles
❖ User must specify workload hint
Relational Store Azure SQL
Size
Cache Redis Cache
Hit rate
Relational Store Sharded TableStore
Skew
28
SLIDE 29 WebPerf Approach
Frontend Processing Start Cache Insert End Blob Insert Store Insert
What if I upgrade blob storage from basic to standard tier?
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
❉
29
SLIDE 30 Cloud-Side Latency Estimation
Frontend Processing Start Cache Insert End Blob Insert Store Insert
What if I upgrade blob storage from basic to standard tier? Replace from profile 30
SLIDE 31 Max Min
31
Cloud-Side Latency Estimation
Simple operations on distributions suffice
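Those operations can be sketched as sample-wise combinations of empirical distributions: sequential awaits add latencies, WhenAll takes the maximum, WhenAny the minimum. A Monte-Carlo pairing of samples stands in for exact convolution; the latency samples are illustrative:

```python
import random

def prob_add(a, b, n=10000, seed=0):
    # Sequential composition: latency of task a, then task b.
    rng = random.Random(seed)
    return [rng.choice(a) + rng.choice(b) for _ in range(n)]

def prob_max(a, b, n=10000, seed=0):
    # WhenAll: finish only when the slower of the two finishes.
    rng = random.Random(seed)
    return [max(rng.choice(a), rng.choice(b)) for _ in range(n)]

def prob_min(a, b, n=10000, seed=0):
    # WhenAny: finish as soon as the faster of the two finishes.
    rng = random.Random(seed)
    return [min(rng.choice(a), rng.choice(b)) for _ in range(n)]

blob = [100, 110, 120]    # illustrative blob-insert latencies (ms)
cache = [10, 20]          # illustrative cache-insert latencies (ms)

when_all = prob_max(blob, cache)
when_any = prob_min(blob, cache)
serial = prob_add(blob, cache)
```

Evaluating the dependency graph bottom-up with these three operators yields the predicted cloud-side distribution.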
SLIDE 32 Evaluation
WebPerf is accurate, fast, cheap, and requires low developer effort
32
❖How accurate is WebPerf? ❖Are workload hints necessary?
SLIDE 33 Applications
33
Application | Azure components used | Average I/O calls
SocialForum | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net | SQL | 41
ContosoAds | Blob storage, Queue, SQL, Search | 56
EmailSubscriber | Blob storage, Queue, Table | 26
ContactManager | Blob storage, SQL | 8
CourseManager | Blob storage, SQL | 44
Different functionality Different components Varying complexity
SLIDE 34 What-if Scenarios
34
What-if scenario | Example
Tier: A component X is upgraded to tier Y | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to component Y | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: A component X is deployed at location Y | X = a Redis cache or a front end, Y = Singapore
Failure: An instance of a replicated component X fails | X = a replicated front-end or SQL database
SLIDE 35 Accuracy
35 What if I move the Redis cache in SocialForum from basic to standard tier?
Configuration choices can significantly impact latency
SLIDE 36 Accuracy
36 What if I move the Redis cache from basic to standard tier?
Prediction closely matches ground truth
SLIDE 37 Accuracy
37
Median prediction error under 7%
Difference between predicted distribution and ground truth
SLIDE 38 Accuracy
38
Low median error for tier and replication Slightly higher error for load and failure
SLIDE 39 Workload Hints
39
Workload hints can significantly improve accuracy
SLIDE 40 Conclusions
40
WebPerf predicts cloud-side latency distributions for different what-if scenarios. It accurately tracks dependencies and profiles components.
Across six different applications and scenarios, its error is less than 7%
Frontend Processing Start Cache Insert End Blob Insert Store Insert
SLIDE 41 WebPerf Contributions and Summary
41
An automated tool to instrument web apps and capture both browser-object and front-end cloud-processing dependencies
Predicts web-app cloud-side latency and end-to-end latency in a probabilistic setting under six different scenarios
Evaluations with six real websites show WebPerf achieves < 7% median prediction error
SLIDE 43
Large number of configuration choices
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service
SLIDE 44
Large number of configuration choices
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service
SLIDE 45
Large number of configuration choices
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service Azure SQL
SLIDE 46
Large number of configuration choices
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service Redis Cache
SLIDE 47
Large number of configuration choices
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service
Reasoning about cost-performance trade-off is hard!
SLIDE 48 Cost-Performance Trade-off
- Configuration does not directly map to performance
- End-to-end latency depends on application’s causal dependency
End Store Insert Cache Insert Start 100 ms 20 ms 100 ms 10 ms
SLIDE 49
“What-If” Analysis
Cloud App Service Configuration
What if?
What if I move the front-end from basic to standard tier?
$30 $100 900 ms 600 ms
End-to-end latency estimate
Create a new deployment and measure performance ➢Expensive ➢Time consuming ➢High overhead
SLIDE 50
WebPerf: “What-If” Analysis
Cloud App Service Configuration
What if?
What if I move the front-end from basic to standard tier?
$30 $100 900 ms 600 ms
End-to-end latency estimate
Predict performance under hypothetical configurations ➢Zero cost ➢Near real-time ➢Zero developer effort
Deploy with certain configuration
SLIDE 51
WebPerf: “What-If” Analysis
Cloud App Service Configuration
What if?
What if I move the front-end from basic to standard tier?
$30 $100 900 ms 600 ms
End-to-end latency estimate
Predict performance under hypothetical configurations ➢Zero cost ➢Near real-time ➢Zero developer effort
Deploy with certain configuration
SLIDE 52 WebPerf: Key Insights
- Offline, application-independent profiling is useful
- Modern cloud apps are built using existing services (PaaS)
- Individual services have predictable performance
- S3, Azure Table Storage, DynamoDB, DocumentDB, …
- Services are co-located inside the same datacenter
- Tighter latency distribution
- Causal dependency within the application is independent of the what-if scenarios we consider
SLIDE 53 Application-Independent Profiling
T:Table, R:Redis, S:SQL, B:Blob, Q:Queue
1 Delete(Async) (T)
2 UploadFromStream (B)
3 AddMessage (Q)
4 Execute (T)
5 ExecuteQuerySegmented (T)
6 SortedSetRangeByValue (R)
7 StringGet (R)
8 SaveChanges (S)
9 ToList (S)
10 Send (R)
11 ReadAsString (B)
SLIDE 54 WebPerf Design
- Dependency graph extraction
- Application-independent profiling
- Baseline latency estimation
- Latency prediction
Frontend Processing Start Cache Insert End Blob Insert Store Insert
SLIDE 55 Task Asynchronous Pattern (TAP)
Thread1 Thread2 Thread3
continuation continuation store.get cache.get
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
Start task asynchronously Continue after task finishes
SLIDE 56 Dependency Graph Extraction
- Design Goals
- Accurate
- Real-time with minimal data collection
- Zero developer effort
- No modifications to the platform
- Low overhead
- Automatic Binary Instrumentation
- Modern cloud applications are highly asynchronous
- Task Asynchronous Programming Pattern
SLIDE 57 Task Asynchronous Pattern (TAP)
- Asynchronous operations with a synchronous
programming pattern
- Increasingly popular for writing cloud applications
- Supported by many major languages
- C#, Java, Python, Javascript
- Most Azure services support TAP as the only
mechanism for doing asynchronous I/O
- AWS also provides TAP APIs for .NET
SLIDE 58 Synchronous Programming
Blocking I/O limits server throughput
Thread1
processRequest (input) {
    /* process input */
    value1 = store.get(key1);
    value2 = cache.get(key2);
    /* construct response */
    return response;
}
store.get cache.get
SLIDE 59 Asynchronous Programming Model (APM)
processRequest (input) {
    /* process input */
    store.get(key1, callback1);
}
callback1 (value1) {
    cache.get(key2, callback2);
}
callback2 (value2) {
    /* construct response */
    send(response);
}
Thread1 Thread2 Thread3
callback1 callback2 store.get cache.get
SLIDE 60
Task Asynchronous Pattern (TAP)
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
task1 Start End task2 Dependency Graph
SLIDE 61
Task Asynchronous Pattern (TAP)
task1 End task2
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1 = await task1;
    value2 = await task2;
    /* construct response */
    return response;
}
Dependency Graph Start
SLIDE 62
Task Asynchronous Pattern (TAP)
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value1, value2 = await Task.WhenAll(task1, task2);
    /* construct response */
    return response;
}
Dependency Graph
WhenAll: Continue only when all tasks finish
task1 End task2 Start
SLIDE 63
Task Asynchronous Pattern (TAP)
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    task2 = cache.get(key2);
    value = await Task.WhenAny(task1, task2);
    /* construct response */
    return response;
}
Dependency Graph
WhenAny: Continue after any one of the tasks finishes
task1 End task2 Start
SLIDE 64 Automatic Binary Instrumentation
class processRequest__ {
    string input;
    AsyncTaskMethodBuilder builder;
    string key1, key2, response;
    int asyncId = -1;
    public void MoveNext() {
        asyncId = Tracker.AsyncStart(asyncId);
        Tracker.StateStart(asyncId);
        switch (state) {
        case -1:
            state = 0;
            /* process input */
            var task1 = store.get(key1);
            Tracker.TaskStart(task1, asyncId);
            builder.Completed(task1.Awaiter, this);
            Tracker.Await(task1, asyncId);
        case 0:
            state = 1;
            var task2 = cache.get(key2);
            Tracker.TaskStart(task2, asyncId);
            builder.Completed(task2.Awaiter, this);
            Tracker.Await(task2, asyncId);
        case 1:
            /* construct response */
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
1
continuation continuation
1
Instrument state machine Track tasks and continuations
SLIDE 65 Automatic Binary Instrumentation
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
task1 Start End task2 Dependency Graph
1
Instrument state machine Track tasks and continuations
SLIDE 66 Automatic Binary Instrumentation
- Tracking async state machines
- Monitor task start and completion
- Track state machine transitions
- Tracking pull-based continuations
- Link tasks to corresponding awaits
- Link awaits to continuations
- Tracking synchronization points
- Track WhenAll, WhenAny, cascaded task dependencies
- Keeping the overhead low
- Instrument APIs with known signatures
- Instrument only leaf tasks
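A toy sketch of what the inserted tracking calls accomplish: recording task starts and awaits is enough to recover the Start, task, continuation, End edges of the dependency graph. The class and method names below are illustrative, not WebPerf's actual Tracker API:

```python
# Toy dependency tracker for a single sequential async method.
class Tracker:
    def __init__(self):
        self.edges = []      # dependency-graph edges (from, to)
        self.last = "Start"  # node the next started task depends on

    def task_start(self, name):
        # A newly started task depends on the current continuation point.
        self.edges.append((self.last, name))

    def await_task(self, name):
        # The continuation after an await depends on the awaited task.
        self.last = name

    def finish(self):
        self.edges.append((self.last, "End"))

# Replay of the slide's processRequest: two sequential awaited calls.
t = Tracker()
t.task_start("store.get")   # task1 = store.get(key1)
t.await_task("store.get")   # value1 = await task1
t.task_start("cache.get")   # task2 = cache.get(key2)
t.await_task("cache.get")   # value2 = await task2
t.finish()
```

A WhenAll/WhenAny-capable tracker would additionally let several started tasks share one predecessor before a joint synchronization node, which is what the real instrumentation handles.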
SLIDE 67 Dependency graph extraction
- Highly accurate
- Real-time
- Zero developer effort
- Extremely low overhead
Dependency Graph task1 End task2 Start
SLIDE 68 API Profiling
68
- A profile of a cloud API is a distribution of its latency
- If the API is not in the what-if scenario, its application-specific baseline profile is measured during dependency tracking
- If it is, WebPerf uses an application-independent profile (e.g., Redis), computed offline or on demand and reused, or a parameterized, workload-dependent profile (e.g., SQL) when a workload hint is given
SLIDE 69 API Profiling
69
- WebPerf builds profiles offline and maintains them in a dictionary
- Starts with common profiles, and builds additional
profiles on-demand and reuses them
- Optimal profiling: to minimize measurement costs
(details in paper)
Profile dictionary
SLIDE 70 task1
Start
task2 Sync task3 task4 task5 End
Baseline latencies
What-If Engine
70
- Predicts cloud latency under a given what-if scenario
Instrumented App Workload What-if task1 and task4 upgraded? Profile dictionary Convolve distributions
SLIDE 71 Convolving distributions
71
task1 Start task2 Sync task3 task4 task5 End
T 1 T 2 T 3 T 4 T 5
Bottom-up evaluation:
- WhenAll: ProbMax(t1, t2, …)
- WhenAny: ProbMin(t1, t2, …)
- WhenDone: ProbAdd(t1, t2, …)
SLIDE 72 End-to-end Latency Prediction
72
Te2e = TCloud + TNetwork + TBrowser
TCloud: WebPerf; TNetwork: network latency model; TBrowser: WebProphet
Combine using Monte-Carlo simulation
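The Monte-Carlo combination can be sketched by sampling each latency term independently and summing; all sample values below are illustrative:

```python
import random
import statistics

rng = random.Random(0)
t_cloud = [200, 220, 250]   # ms, from WebPerf's cloud-side prediction
t_network = [50, 80]        # ms, from a network latency model
t_browser = [100, 120]      # ms, from a browser model such as WebProphet

# Draw one sample from each term per trial and add them.
t_e2e = [rng.choice(t_cloud) + rng.choice(t_network) + rng.choice(t_browser)
         for _ in range(10000)]
median_e2e = statistics.median(t_e2e)
```

Independent sampling is an assumption; if cloud, network, and browser latencies were correlated, the terms would need to be sampled jointly.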
SLIDE 73 WebPerf Evaluation
73
- Six third-party applications and six scenarios
Application | Azure services used | Average I/O calls
SocialForum | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net | SQL | 41
ContosoAds | Blob storage, Queue, SQL, Search | 56
EmailSubscriber | Blob storage, Queue, Table | 26
ContactManager | Blob storage, SQL | 8
CourseManager | Blob storage, SQL | 44
SLIDE 74 74
[Chart: CDF of first-byte time]
SLIDE 75 WebPerf Evaluation
75
- Six third-party applications and six scenarios
What-if scenario | Example
Tier: A resource X is upgraded to tier Y | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to resource Y | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: A resource X is deployed at location Y | X = a Redis cache or a front end, Y = Singapore
Failure: An instance of a replicated resource X fails | X = a replicated front-end or SQL database
SLIDE 76 WebPerf Evaluation
76
- Metric: distribution of relative errors
Ground truth from real deployment WebPerf prediction
SLIDE 77 Underspecified Configuration Dimensions
Cache Redis Cache
77
SLIDE 78 What if the Redis cache is upgraded from the original Standard C0 to Standard C2 tier?
78
Maximum cloud-side latency prediction error is only 5%
SLIDE 79 What if the Redis cache is upgraded from the original Standard C0 to Standard C2 tier?
79
SLIDE 80 What if the Redis cache is upgraded from the original Standard C0 to Standard C2 tier?
80
Maximum cloud-side latency prediction error is only 5%
SLIDE 81 Performance for Six Applications
81
Scenarios
SocialForum ContosoAds EmailSubscriber CourseManager SmartStore.Net ContactManager
Median prediction error for cloud-side latency is < 7%
SLIDE 82 Performance for Six Applications
82
Scenarios
Median prediction error for cloud-side latency is < 7%
SLIDE 83 Workload Hints
83
Workload hints can bring an order-of-magnitude improvement in accuracy
SLIDE 84 Workload Hints
84
Workload hints can bring an order-of-magnitude improvement in accuracy
SLIDE 85 Other findings
85
- Performance of many applications and scenarios can be
predicted reasonably well
- Thanks to cloud provider’s SLAs
- Harder cases
- Workload-dependent performance: hints help
- High-variance profiles: prediction has high variance
- Non-deterministic control flow (e.g., cache hit/miss):
- Separate prediction for each control flow
- Hard-to-profile APIs (e.g., SQL query with join)
- Poor prediction
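The separate-prediction idea for non-deterministic control flow can be sketched as a mixture of the two path distributions weighted by the hinted hit rate; all numbers are illustrative:

```python
import random
import statistics

rng = random.Random(0)
hit_latency = [2, 3, 4]       # ms, illustrative cache-hit path
miss_latency = [100, 120]     # ms, illustrative miss path (falls through to the store)
hit_rate = 0.9                # workload hint supplied by the developer

# Sample the hit path with probability hit_rate, the miss path otherwise.
samples = [rng.choice(hit_latency) if rng.random() < hit_rate
           else rng.choice(miss_latency)
           for _ in range(10000)]
mean_latency = statistics.fmean(samples)
```

Note how even a 10% miss rate dominates the mean (roughly 0.9 × 3 + 0.1 × 110 ≈ 14 ms here), which is why a wrong hit-rate hint translates directly into prediction error.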
SLIDE 86 86
Behind apps, several distributed, asynchronous components
Modern Cloud Applications are Complex
SLIDE 87 87
Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service
Modern Cloud Applications are Complex
Cloud
SLIDE 88 Hard to reason about the performance of cloud-hosted Web apps
Cloud-side Latency
88
SLIDE 89 Hard to reason about the cost/performance tradeoffs of different configurations
89
SLIDE 90 Workload Dependence
90
SLIDE 91 Deploy Cloud App
Service Configuration
What if?
What if I move the front-end from basic to standard tier?
900 ms 600 ms
Basic Standard
➢Expensive ➢Slow ➢High Effort
91
SLIDE 92 Model and Predict Cloud App
Service Configuration
What if?
What if I move the front-end from basic to standard tier?
200 ms
Service Configuration
Standard
100 ms
92
SLIDE 93 Combining latency distributions
task1 task2
93
SLIDE 94 Combining latency distributions
task1 task2
WhenAll
Max
94
SLIDE 95 Combining latency distributions
task1 task2
WhenAny
Min
95
SLIDE 96 Cloud-Side Latency Estimation
Frontend Processing Start Cache Insert End Blob Insert Store Insert
What if I move the front-end from basic to standard tier? Replace from profile 96
SLIDE 97 WebPerf Approach
Frontend Processing Start Cache Insert End Blob Insert Store Insert
What if I move the front-end from basic to standard tier?
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
❉
97
SLIDE 98 WebPerf
Frontend Processing Start Cache Insert End Blob Insert Store Insert
What if I move the front-end from basic to standard tier?
Dependency graph extraction Component Profiling Baseline latency estimation Cloud-side latency estimation
Developer supplies workload Computed Offline 98
SLIDE 99 How accurate is WebPerf?
What is WebPerf’s overhead?
What are the primary sources of prediction error?
Are workload hints necessary?
Can WebPerf predict end-to-end latency?
Evaluation Goals
99
❉ ❉
SLIDE 100 Task Asynchronous Programming
Thread1 Thread2 Thread3
continuation continuation store.get cache.get
async processRequest (input) {
    /* process input */
    task1 = store.get(key1);
    value1 = await task1;
    task2 = cache.get(key2);
    value2 = await task2;
    /* construct response */
    return response;
}
Start task Continue
After await returns On front-end 100
SLIDE 101 Measure of Prediction Accuracy
101
Ground truth from real deployment WebPerf prediction
Distributional Difference
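One plausible way to turn a predicted and a measured distribution into a relative-error distribution is to compare them percentile by percentile; the exact metric WebPerf uses is defined in the paper, so this is an assumption:

```python
# Relative error at matching percentiles of two equal-length sample sets.
def rel_errors(predicted, measured):
    p, m = sorted(predicted), sorted(measured)
    assert len(p) == len(m)
    return [abs(a - b) / b for a, b in zip(p, m)]

def median(xs):
    s = sorted(xs)
    n = len(s)
    return s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2

pred = [95, 105, 110, 200]    # illustrative predicted samples (ms)
truth = [100, 100, 120, 210]  # illustrative ground-truth samples (ms)
median_err = median(rel_errors(pred, truth))
```

Reporting the median of this error distribution matches the deck's "median prediction error < 7%" framing.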