WebPerf: Evaluating What-If Scenarios for Cloud-hosted Web Applications (PowerPoint PPT Presentation)



slide-1
SLIDE 1

WebPerf: Evaluating “What-If” Scenarios for Cloud-hosted Web Applications

Yurong Jiang Lenin Ravindranath Suman Nath Ramesh Govindan

1

slide-2
SLIDE 2

2

A Cloud-Hosted Web Application

slide-3
SLIDE 3

3

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

Modern Cloud Applications are Complex

Cloud

slide-4
SLIDE 4

The Problem

4

Latency of these applications is critical for user experience. Developers find it hard to optimize cloud-side latency for cloud-hosted Web applications.

Cloud-side Latency

slide-5
SLIDE 5

Configuration Complexity

Front-end

5

Each choice impacts latency

slide-6
SLIDE 6

Configuration Complexity

Relational Store Azure SQL

6

Latency implications hard to understand

slide-7
SLIDE 7

Exploring Configuration Choices Cloud App

Service Configuration

What if?

What if I move the blob store from basic to standard tier?

Basic: $30, 200 ms → Standard: $100, 100 ms

7

slide-8
SLIDE 8

Challenge

8

[Chart: per-request latency (ms) under 100 concurrent requests vs. price (USD/month)]

Answer to what-if question may depend on workload

slide-9
SLIDE 9

Challenge

[Dependency graph with nodes Start, Cache Insert, Store Insert, End and edge latencies 100 ms, 20 ms, 100 ms, 10 ms]

9

Answer to what-if question may depend on causal dependencies

slide-10
SLIDE 10

Challenge

Relational Store Relational Store

What if I re-locate this component? What if I increase this component’s load?

Table Store

What if a replica fails?

10

A what-if capability should be expressive

slide-11
SLIDE 11

WebPerf

11

WebPerf is a what-if scenario evaluator

❖ Input: a what-if scenario
❖ Output: resulting cloud-side latency distribution

slide-12
SLIDE 12

WebPerf

What if I upgrade blob storage from basic to standard tier? 12

Frontend Processing Start Cache Insert End Blob Insert Store Insert

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

(Developer supplies workload; profiling computed offline)

slide-13
SLIDE 13

Key Insights

Cloud deployments well-engineered

❖ Components designed for predictable latency
❖ Often co-located in same datacenter

Why might WebPerf work? 13

slide-14
SLIDE 14

Key Insights

Many component profiles are application-independent. The dependency graph is usually independent of the what-if scenario.

Why?

➢ Cheap ➢ Fast ➢ Low effort

Automate most steps
Compute offline, reuse
Compute once

14

slide-15
SLIDE 15

WebPerf Approach

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

15

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-16
SLIDE 16

Dependency Extraction

Fast, accurate dependency extraction with zero developer input

16 Goal

Track dependencies at run-time by instrumenting binary

Approach

slide-17
SLIDE 17

Challenge

Task Asynchronous Programming

❖ Many cloud apps use this
❖ Only mechanism for asynchronous I/O in Azure
❖ AWS provides TAP APIs for .NET

17 Prior work has not considered this

slide-18
SLIDE 18

Task Asynchronous Programming

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Start task → Continue (on front-end)

18

slide-19
SLIDE 19

Task Asynchronous Programming

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

task1 Start End task2

19

slide-20
SLIDE 20

Asynchronous Parallel Operations

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1, value2 = await Task.WhenAll(task1, task2);
  /* construct response */
  return response;
}

WhenAll: Continue only when all tasks finish

task1 End task2 Start

20

slide-21
SLIDE 21

Asynchronous Parallel Operations

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1, value2 = await Task.WhenAny(task1, task2);
  /* construct response */
  return response;
}

WhenAny: Continue when any one task finishes

task1 End task2 Start

21
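The two synchronization primitives above have direct analogues in other TAP-style languages. A minimal Python asyncio sketch, where the store/cache calls and their latencies are hypothetical stand-ins for the slides' examples:

```python
import asyncio

# Hypothetical stand-ins for the store and cache calls in the slides.
async def store_get(key):
    await asyncio.sleep(0.02)          # simulated I/O latency
    return f"store:{key}"

async def cache_get(key):
    await asyncio.sleep(0.01)
    return f"cache:{key}"

async def sequential(key1, key2):
    # Awaiting one task at a time yields a chain-shaped dependency graph.
    value1 = await store_get(key1)
    value2 = await cache_get(key2)
    return value1, value2

async def when_all(key1, key2):
    # asyncio.gather ~ Task.WhenAll: continue only when all tasks finish.
    return await asyncio.gather(store_get(key1), cache_get(key2))

async def when_any(key1, key2):
    # FIRST_COMPLETED ~ Task.WhenAny: continue when any one task finishes.
    tasks = {asyncio.create_task(store_get(key1)),
             asyncio.create_task(cache_get(key2))}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return next(iter(done)).result()
```

Here `sequential` serializes the two calls while `when_all` overlaps them; that difference in await structure is exactly what the extracted dependency graph captures.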

slide-22
SLIDE 22

Key Idea

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Init

Received value1 Received value2

Instrument state machine binary to dynamically track tasks and continuations

22 .NET compiler generates this

slide-23
SLIDE 23

WebPerf Approach

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

23

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-24
SLIDE 24

WebPerf Approach

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

24

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-25
SLIDE 25

Component Profiling

A component’s profile contains latency distributions of API calls to the component. WebPerf profiles commonly used components offline.

Profile dictionary

25
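As a rough sketch of what such a profile dictionary might look like — component names, API names, and latency numbers below are invented for illustration:

```python
import random

random.seed(0)

# Each (component, API) key maps to measured latency samples (ms).
profiles = {
    ("blob", "UploadFromStream"): [random.gauss(40, 5) for _ in range(1000)],
    ("redis", "StringGet"): [random.gauss(2, 0.3) for _ in range(1000)],
}

def percentile(samples, p):
    """Return the p-th percentile of a latency sample set."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[idx]

# Looking up the median blob-upload latency from the profile dictionary.
median_blob = percentile(profiles[("blob", "UploadFromStream")], 50)
```

A what-if evaluator can then answer queries by substituting one profile for another without redeploying the application.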

slide-26
SLIDE 26

Generalized Profiles

Relational Store Relational Store Relational Store

Location Load

Table Store

Failure

26

Tiers

slide-27
SLIDE 27

Application Dependent Profiles

Relational Store Azure SQL

SQL join latency depends on size

Cache Redis Cache

Cache latency depends on hit rate

Relational Store Sharded TableStore

Access latency depends on skew

27

Not all profiles can be computed offline

slide-28
SLIDE 28

Workload Hints

WebPerf uses parameterized profiles

❖ User must specify workload hint

Relational Store Azure SQL

Size

Cache Redis Cache

Hit rate

Relational Store Sharded TableStore

Skew

28
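One way to picture a parameterized profile is as a function from the workload hint to a latency distribution. Below is a hypothetical cache profile parameterized by hit rate; the hit/miss latencies are invented, not measured values:

```python
import random

random.seed(42)

HIT_MS, MISS_MS = 1.0, 25.0   # invented hit and miss latencies (ms)

def cache_profile(hit_rate, n=10000):
    """Latency samples for a cache whose distribution depends on the hit-rate hint."""
    return [HIT_MS if random.random() < hit_rate else MISS_MS for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

# A higher hit rate shifts the whole distribution toward the hit latency.
fast = mean(cache_profile(0.9))
slow = mean(cache_profile(0.5))
```

Without the hint, the profiler cannot know which mixture of the two modes the application will see, which is why these profiles cannot be computed purely offline.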

slide-29
SLIDE 29

WebPerf Approach

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

29

slide-30
SLIDE 30

Cloud-Side Latency Estimation

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I upgrade blob storage from basic to standard tier? Replace from profile 30

slide-31
SLIDE 31

Max Min

31

Cloud-Side Latency Estimation

Simple operations on distributions suffice
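If each node's latency is represented by Monte Carlo samples, these combinators reduce to elementwise operations. A minimal sketch with invented sample sets:

```python
import random

random.seed(1)

N = 10000
# Illustrative latency samples (ms) for two parallel tasks.
task1 = [random.uniform(80, 120) for _ in range(N)]
task2 = [random.uniform(10, 30) for _ in range(N)]

# WhenAll finishes when both tasks finish: elementwise max.
when_all = [max(a, b) for a, b in zip(task1, task2)]
# WhenAny finishes when either task finishes: elementwise min.
when_any = [min(a, b) for a, b in zip(task1, task2)]
# Sequential composition: latencies add.
seq = [a + b for a, b in zip(task1, task2)]

def mean(xs):
    return sum(xs) / len(xs)
```

Because the combined values remain sample sets, the output of one synchronization point feeds directly into the next one up the dependency graph.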

slide-32
SLIDE 32

Evaluation

WebPerf is accurate, fast, cheap, and requires low developer effort

32

❖ How accurate is WebPerf?
❖ Are workload hints necessary?

slide-33
SLIDE 33

Applications

33

Application | Azure components used | Average I/O calls
SocialForum | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net | SQL | 41
ContosoAds | Blob storage, Queue, SQL, Search | 56
EmailSubscriber | Blob storage, Queue, Table | 26
ContactManager | Blob storage, SQL | 8
CourseManager | Blob storage, SQL | 44

Different functionality Different components Varying complexity

slide-34
SLIDE 34

What-if Scenarios

34

What-if scenario | Example
Tier: a component X is upgraded to tier Y | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to component Y | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: a component X is deployed at location Y | X = a Redis cache or a front-end, Y = Singapore
Failure: an instance of a replicated component X fails | X = a replicated front-end or SQL database

slide-35
SLIDE 35

Accuracy

35 What if I move the Redis cache in SocialForum from basic to standard tier?

Configuration choices can significantly impact latency

slide-36
SLIDE 36

Accuracy

36 What if I move the Redis cache from basic to standard tier?

Prediction closely matches ground truth

slide-37
SLIDE 37

Accuracy

37

Median prediction error under 7%

Difference between predicted distribution and ground truth

slide-38
SLIDE 38

Accuracy

38

Low median error for tier and replication. Slightly higher error for load and failure.

slide-39
SLIDE 39

Workload Hints

39

Workload hints can significantly improve accuracy

slide-40
SLIDE 40

Conclusions

40

WebPerf predicts cloud-side latency distributions for different what-if scenarios. It accurately tracks dependencies and profiles components offline.

Across six different applications and scenarios, its median error is less than 7%

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-41
SLIDE 41

WebPerf Contributions and Summary

41

❖ An automated tool to instrument web apps and capture both browser objects and front-end cloud processing dependencies
❖ Prediction of web app cloud latency and end-to-end latency in a probabilistic setting under six different scenarios
❖ Evaluations with six real websites show WebPerf achieves < 7% median prediction error

slide-42
SLIDE 42

Thank you

42

slide-43
SLIDE 43

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

slide-44
SLIDE 44

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

slide-45
SLIDE 45

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service Azure SQL

slide-46
SLIDE 46

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service Redis Cache

slide-47
SLIDE 47

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

Reasoning about cost-performance trade-off is hard!

slide-48
SLIDE 48

Cost-Performance Trade-off

  • Configuration does not directly map to performance
  • End-to-end latency depends on application’s causal dependency

[Dependency graph with nodes Start, Cache Insert, Store Insert, End and edge latencies 100 ms, 20 ms, 100 ms, 10 ms]

slide-49
SLIDE 49

“What-If” Analysis

Cloud App Service Configuration

What if?

What if I move the front-end from basic to standard tier?

Basic: $30, 900 ms → Standard: $100, 600 ms

End-to-end latency estimate

Create a new deployment and measure performance
➢ Expensive ➢ Time consuming ➢ High overhead

slide-50
SLIDE 50

WebPerf: “What-If” Analysis

Cloud App Service Configuration

What if?

What if I move the front-end from basic to standard tier?

Basic: $30, 900 ms → Standard: $100, 600 ms

End-to-end latency estimate

Predict performance under hypothetical configurations
➢ Zero cost ➢ Near real-time ➢ Zero developer effort

Deploy with certain configuration

slide-51
SLIDE 51

WebPerf: “What-If” Analysis

Cloud App Service Configuration

What if?

What if I move the front-end from basic to standard tier?

Basic: $30, 900 ms → Standard: $100, 600 ms

End-to-end latency estimate

Predict performance under hypothetical configurations
➢ Zero cost ➢ Near real-time ➢ Zero developer effort

Deploy with certain configuration

slide-52
SLIDE 52

WebPerf: Key Insights

  • Offline, application-independent profiling is useful
  • Modern cloud apps are built using existing services (PaaS)
  • Individual services have predictable performance
  • S3, Azure Table Storage, Dynamo DB, DocumentDB, …
  • Services are co-located inside the same datacenter
  • Tighter latency distribution
  • Causal dependency within the application is independent of the what-if scenarios we consider
slide-53
SLIDE 53

Application-Independent Profiling

T:Table, R:Redis, S:SQL, B:Blob, Q:Queue

1. Delete(Async) (T)
2. UploadFromStream (B)
3. AddMessage (Q)
4. Execute (T)
5. ExecuteQuerySegmented (T)
6. SortedSetRangeByValue (R)
7. StringGet (R)
8. SaveChanges (S)
9. ToList (S)
10. Send (R)
11. ReadAsString (B)

slide-54
SLIDE 54

WebPerf Design

  • Dependency graph extraction
  • Application-independent profiling
  • Baseline latency estimation
  • Latency prediction

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-55
SLIDE 55

Task Asynchronous Pattern (TAP)

Thread1 Thread2 Thread3

[Timeline across Thread1, Thread2, Thread3: store.get and cache.get start asynchronously; a continuation runs after each completes]

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Start task asynchronously Continue after task finishes

slide-56
SLIDE 56

Dependency Graph Extraction

  • Design Goals
  • Accurate
  • Real-time with minimal data collection
  • Zero developer effort
  • No modifications to the platform
  • Low overhead
  • Automatic Binary Instrumentation
  • Modern cloud applications are highly asynchronous
  • Task Asynchronous Programming Pattern
slide-57
SLIDE 57

Task Asynchronous Pattern (TAP)

  • Asynchronous operations with a synchronous programming pattern

  • Increasingly popular for writing cloud applications
  • Supported by many major languages
  • C#, Java, Python, Javascript
  • Most Azure services support TAP as the only mechanism for doing asynchronous I/O

  • AWS also provides TAP APIs for .NET
slide-58
SLIDE 58

Synchronous Programming

Blocking I/O limits server throughput

Thread1

processRequest (input) {
  /* process input */
  value1 = store.get(key1);
  value2 = cache.get(key2);
  /* construct response */
  return response;
}

[Thread1 timeline: blocks on store.get, then on cache.get]

slide-59
SLIDE 59

Asynchronous Programming Model (APM)

processRequest (input) {
  /* process input */
  store.get(key1, callback1);
}
callback1 (value1) {
  cache.get(key2, callback2);
}
callback2 (value2) {
  /* construct response */
  send(response);
}

Thread1 Thread2 Thread3

callback1 callback2 store.get cache.get

slide-60
SLIDE 60

Task Asynchronous Pattern (TAP)

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

task1 Start End task2 Dependency Graph

slide-61
SLIDE 61

Task Asynchronous Pattern (TAP)

task1 End task2

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1 = await task1;
  value2 = await task2;
  /* construct response */
  return response;
}

Dependency Graph Start

slide-62
SLIDE 62

Task Asynchronous Pattern (TAP)

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1, value2 = await Task.WhenAll(task1, task2);
  /* construct response */
  return response;
}

Dependency Graph

WhenAll: Continue only when all tasks finish

task1 End task2 Start

slide-63
SLIDE 63

Task Asynchronous Pattern (TAP)

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value = await Task.WhenAny(task1, task2);
  /* construct response */
  return response;
}

Dependency Graph

WhenAny: Continue after any one of tasks finishes

task1 End task2 Start

slide-64
SLIDE 64

Automatic Binary Instrumentation

class processRequest__ {
  string input;
  AsyncTaskMethodBuilder builder;
  string key1, key2, response;
  int asyncId = -1;

  public void MoveNext() {
    asyncId = Tracker.AsyncStart(asyncId);
    Tracker.StateStart(asyncId);
    switch (state) {
      case -1:
        state = 0;
        /* process input */
        var task1 = store.get(key1);
        Tracker.TaskStart(task1, asyncId);
        builder.Completed(task1.Awaiter, this);
        Tracker.Await(task1, asyncId);
      case 0:
        state = 1;
        var task2 = cache.get(key2);
        Tracker.TaskStart(task2, asyncId);
        builder.Completed(task2.Awaiter, this);
        Tracker.Await(task2, asyncId);
      case 1:
        /* construct response */

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Instrument state machine
Track tasks and continuations

slide-65
SLIDE 65

Automatic Binary Instrumentation

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

task1 Start End task2 Dependency Graph

Instrument state machine
Track tasks and continuations

slide-66
SLIDE 66

Automatic Binary Instrumentation

  • Tracking async state machines
  • Monitor task start and completion
  • Track state machine transitions
  • Tracking pull-based continuations
  • Link tasks to corresponding awaits
  • Link awaits to continuations
  • Tracking synchronization points
  • Track WhenAll, WhenAny, cascaded task dependencies
  • Keeping the overhead low
  • Instrument APIs with known signatures
  • Instrument only leaf tasks
slide-67
SLIDE 67

Dependency graph extraction

  • Highly accurate
  • Real-time
  • Zero developer effort
  • Extremely low overhead

Dependency Graph task1 End task2 Start


slide-68
SLIDE 68

API Profiling

68

  • A profile of a cloud API is a distribution of its latency
  • Parameterized profiles

[Decision flow: Is the API in the what-if scenario? Is a workload hint given? → application-specific parameterized profiles (workload-dependent, e.g., SQL); application-independent profiles (e.g., Redis), computed offline or on-demand for reuse; baseline profiles, computed during dependency tracking]

slide-69
SLIDE 69

API Profiling

69

  • WebPerf builds profiles offline and maintains them in a dictionary

  • Starts with common profiles, and builds additional profiles on-demand and reuses them

  • Optimal profiling to minimize measurement costs (details in paper)

Profile dictionary

slide-70
SLIDE 70

task1

Start

task2 Sync task3 task4 task5 End

Baseline latencies

What-If Engine

70

  • Predicts cloud latency under a given what-if scenario

[Inputs: instrumented app, workload, the what-if question (e.g., what if task1 and task4 are upgraded?), profile dictionary → convolve distributions]

slide-71
SLIDE 71

Convolving distributions

71

task1 Start task2 Sync task3 task4 task5 End

T1, T2, T3, T4, T5 (per-task latency distributions)

Bottom-up evaluation:

  • WhenAll: ProbMax(t1, t2, …)
  • WhenAny: ProbMin(t1, t2, …)
  • WhenDone: ProbAdd(t1, t2, …)
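The bottom-up evaluation can be sketched as a recursion over the dependency tree, applying one combinator per synchronization point to the children's sample sets. The tree encoding and toy distributions below are my own, not WebPerf's internals:

```python
import random

random.seed(3)

N = 5000

def prob_max(*ts):   # WhenAll: all children must finish
    return [max(v) for v in zip(*ts)]

def prob_min(*ts):   # WhenAny: first child to finish
    return [min(v) for v in zip(*ts)]

def prob_add(*ts):   # WhenDone: sequential composition
    return [sum(v) for v in zip(*ts)]

def evaluate(node):
    """A node is either a list of latency samples (leaf) or (combinator, children)."""
    if isinstance(node, list):
        return node
    op, children = node
    return op(*(evaluate(c) for c in children))

t1 = [random.uniform(5, 15) for _ in range(N)]
t2 = [random.uniform(5, 15) for _ in range(N)]
t3 = [random.uniform(20, 30) for _ in range(N)]

# (t1 || t2 joined by WhenAll), then t3 sequentially.
total = evaluate((prob_add, [(prob_max, [t1, t2]), t3]))
```

Swapping one leaf's sample set for a different profile (the upgraded tier) and re-running `evaluate` is the essence of answering a what-if question.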
slide-72
SLIDE 72

End-to-end Latency Prediction

72

  • Details in paper

Te2e = TCloud + TNetwork + TBrowser

TCloud: WebPerf; TNetwork: network latency model; TBrowser: WebProphet

Combine using Monte-Carlo simulation
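The combination above can be sketched as summing matched Monte Carlo samples from the three terms; the distributions here are invented placeholders, not WebPerf's actual models:

```python
import random

random.seed(7)

def sample(draw, n=10000):
    """Draw n samples from a zero-argument sampler."""
    return [draw() for _ in range(n)]

t_cloud = sample(lambda: random.gauss(120, 15))        # TCloud
t_network = sample(lambda: random.expovariate(1 / 40))  # TNetwork
t_browser = sample(lambda: random.gauss(60, 10))       # TBrowser

# Monte Carlo: add matched samples to estimate the T_e2e distribution.
t_e2e = [c + n + b for c, n, b in zip(t_cloud, t_network, t_browser)]
median_e2e = sorted(t_e2e)[len(t_e2e) // 2]
```

Working with samples rather than closed-form distributions lets the three terms come from entirely different models.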

slide-73
SLIDE 73

WebPerf Evaluation

73

  • Six 3rd party applications and six scenarios

Application | Azure services used | Average I/O calls
SocialForum | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net | SQL | 41
ContosoAds | Blob storage, Queue, SQL, Search | 56
EmailSubscriber | Blob storage, Queue, Table | 26
ContactManager | Blob storage, SQL | 8
CourseManager | Blob storage, SQL | 44

slide-74
SLIDE 74

74

CDF of First Byte Time Percentage

slide-75
SLIDE 75

WebPerf Evaluation

75

  • Six 3rd party applications and six scenarios

What-if scenario | Example
Tier: a resource X is upgraded to tier Y | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to resource Y | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: a resource X is deployed at location Y | X = a Redis cache or a front-end, Y = Singapore
Failure: an instance of a replicated resource X fails | X = a replicated front-end or SQL database

slide-76
SLIDE 76

WebPerf Evaluation

76

  • Metric: distribution of relative errors

Ground truth from real deployment WebPerf prediction
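One plausible way to compute such a distribution of relative errors is to compare the two sample sets quantile by quantile; the helper and numbers below are illustrative, not WebPerf's exact metric:

```python
def relative_errors(predicted, measured):
    """Relative error at each matched quantile of the two sorted sample sets."""
    p, m = sorted(predicted), sorted(measured)
    assert len(p) == len(m)
    return [abs(a - b) / b for a, b in zip(p, m)]

# Invented latency samples (ms) for one request type.
pred = [100, 105, 110, 120, 130]
truth = [102, 104, 115, 118, 140]
errs = relative_errors(pred, truth)
median_err = sorted(errs)[len(errs) // 2]
```

Reporting the whole error distribution, rather than a single number, shows where in the latency range the prediction is weakest.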

slide-77
SLIDE 77

Underspecified Configuration Dimensions

Cache Redis Cache

77

slide-78
SLIDE 78

What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?

78

Maximum cloud-side latency prediction error is only 5%

slide-79
SLIDE 79

What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?

79

slide-80
SLIDE 80

What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?

80

Maximum cloud-side latency prediction error is only 5%

slide-81
SLIDE 81

Performance for Six Applications

81

Scenarios

SocialForum ContosoAds EmailSubscriber CourseManager SmartStore.Net ContactManager

Median prediction error for cloud-side latency is < 7%

slide-82
SLIDE 82

Performance for Six Applications

82

Scenarios

Median prediction error for cloud-side latency is < 7%

slide-83
SLIDE 83

Workload Hints

83

Workload hints can bring an order-of-magnitude accuracy improvement

slide-84
SLIDE 84

Workload Hints

84

Workload hints can bring an order-of-magnitude accuracy improvement

slide-85
SLIDE 85

Other findings

85

  • Performance of many applications and scenarios can be predicted reasonably well

  • Thanks to cloud provider’s SLAs
  • Harder cases
  • Workload-dependent performance: hints help
  • High-variance profiles: prediction has high variance
  • Non-deterministic control flow (e.g., cache hit/miss):
  • Separate prediction for each control flow
  • Hard-to-profile APIs (e.g., SQL query with join)
  • Poor prediction
slide-86
SLIDE 86

86

Behind apps, several distributed, asynchronous components

Modern Cloud Applications are Complex

slide-87
SLIDE 87

87

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

Modern Cloud Applications are Complex

Cloud

slide-88
SLIDE 88

Hard to reason about the performance of cloud-hosted Web apps

Cloud-side Latency

88

slide-89
SLIDE 89

Hard to reason about the cost/performance tradeoffs of different configurations

89

slide-90
SLIDE 90

Workload Dependence

90

slide-91
SLIDE 91

Deploy Cloud App

Service Configuration

What if?

What if I move the front-end from basic to standard tier?

900 ms 600 ms

Basic Standard

➢Expensive ➢Slow ➢High Effort

91

slide-92
SLIDE 92

Model and Predict Cloud App

Service Configuration

What if?

What if I move the front-end from basic to standard tier?

200 ms

Service Configuration

Standard

100 ms

92

slide-93
SLIDE 93

Combining latency distributions

task1 task2

93

slide-94
SLIDE 94

Combining latency distributions

task1 task2

WhenAll

Max

94

slide-95
SLIDE 95

Combining latency distributions

task1 task2

WhenAny

Min

95

slide-96
SLIDE 96

Cloud-Side Latency Estimation

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I move the front-end from basic to standard tier? Replace from profile 96

slide-97
SLIDE 97

WebPerf Approach

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I move the front-end from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

97

slide-98
SLIDE 98

WebPerf

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I move the front-end from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

(Developer supplies workload; profiling computed offline)

98

slide-99
SLIDE 99

❖ How accurate is WebPerf?
❖ What is WebPerf’s overhead?
❖ What are the primary sources of prediction error?
❖ Are workload hints necessary?
❖ Can WebPerf predict end-to-end latency?

Evaluation Goals

99


slide-100
SLIDE 100

Task Asynchronous Programming

Thread1 Thread2 Thread3

[Timeline across Thread1, Thread2, Thread3: store.get and cache.get start asynchronously; a continuation runs after each completes]

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Start task → Continue after await returns (on front-end)

100

slide-101
SLIDE 101

Measure of Prediction Accuracy

101

Ground truth from real deployment WebPerf prediction

Distributional Difference