WebPerf: Evaluating What-If Scenarios for Cloud-hosted Web Applications (PowerPoint PPT Presentation)



slide-1
SLIDE 1

WebPerf: Evaluating “What-If” Scenarios for Cloud-hosted Web Applications

Yurong Jiang Lenin Ravindranath Suman Nath Ramesh Govindan

1

slide-2
SLIDE 2

2

A Cloud-Hosted Web Application

slide-3
SLIDE 3

3

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

Modern Cloud Applications are Complex

Cloud

slide-4
SLIDE 4

The Problem

4

Latency of these applications is critical for user experience. Developers find it hard to optimize cloud-side latency for cloud-hosted Web applications.

Cloud-side Latency

slide-5
SLIDE 5

Configuration Complexity

Front-end

5

Each choice impacts latency

slide-6
SLIDE 6

Configuration Complexity

Relational Store Azure SQL

6

Latency implications hard to understand

slide-7
SLIDE 7

Exploring Configuration Choices Cloud App

Service Configuration

What if?

What if I move the blob store from basic to standard tier?

Basic: $30, 200 ms → Standard: $100, 100 ms

7

slide-8
SLIDE 8

Challenge

8

[Chart: per-request latency (ms) under 100 concurrent requests vs. price (USD/month)]

Answer to what-if question may depend on workload

slide-9
SLIDE 9

Challenge

[Dependency graph with nodes Start, Cache Insert, Store Insert, End and edge latencies 100 ms, 20 ms, 100 ms, 10 ms]

9

Answer to what-if question may depend on causal dependencies

slide-10
SLIDE 10

Challenge

Relational Store Relational Store

What if I re-locate this component? What if I increase this component’s load?

Table Store

What if a replica fails?

10

A what-if capability should be expressive

slide-11
SLIDE 11

WebPerf

11

WebPerf is a what-if scenario evaluator

❖ Input: a what-if scenario
❖ Output: resulting cloud-side latency distribution

slide-12
SLIDE 12

WebPerf

What if I upgrade blob storage from basic to standard tier? 12

Frontend Processing Start Cache Insert End Blob Insert Store Insert

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

(Developer supplies workload; profiling computed offline)

slide-13
SLIDE 13

Key Insights

Cloud deployments well-engineered

❖ Components designed for predictable latency
❖ Often co-located in same datacenter

Why might WebPerf work? 13

slide-14
SLIDE 14

Key Insights

Many component profiles are application-independent. The dependency graph is usually independent of the what-if scenario.

Why?

➢ Cheap ➢ Fast ➢ Low effort

Automate most steps
Compute offline, reuse
Compute once

14

slide-15
SLIDE 15

WebPerf Approach

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

15

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-16
SLIDE 16

Dependency Extraction

Fast, accurate dependency extraction with zero developer input

16 Goal

Track dependencies at run-time by instrumenting binary

Approach

slide-17
SLIDE 17

Challenge

Task Asynchronous Programming

❖ Many cloud apps use this
❖ Only mechanism for asynchronous I/O in Azure
❖ AWS provides TAP APIs for .NET

17 Prior work has not considered this

slide-18
SLIDE 18

Task Asynchronous Programming

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Start task → Continue (on front-end)

18

slide-19
SLIDE 19

Task Asynchronous Programming

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

task1 Start End task2

19

slide-20
SLIDE 20

Asynchronous Parallel Operations

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1, value2 = await Task.WhenAll(task1, task2);
  /* construct response */
  return response;
}

WhenAll: Continue only when all tasks finish

task1 End task2 Start

20

slide-21
SLIDE 21

Asynchronous Parallel Operations

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1, value2 = await Task.WhenAny(task1, task2);
  /* construct response */
  return response;
}

WhenAny: Continue when any one task finishes

task1 End task2 Start

21
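The two synchronization primitives above have direct analogues in other TAP-style languages. A minimal Python asyncio sketch, where the store/cache calls and their latencies are hypothetical stand-ins for the slides' examples:

```python
import asyncio

# Hypothetical stand-ins for the store and cache calls in the slides.
async def store_get(key):
    await asyncio.sleep(0.02)          # simulated I/O latency
    return f"store:{key}"

async def cache_get(key):
    await asyncio.sleep(0.01)
    return f"cache:{key}"

async def sequential(key1, key2):
    # Awaiting one task at a time yields a chain-shaped dependency graph.
    value1 = await store_get(key1)
    value2 = await cache_get(key2)
    return value1, value2

async def when_all(key1, key2):
    # asyncio.gather ~ Task.WhenAll: continue only when all tasks finish.
    return await asyncio.gather(store_get(key1), cache_get(key2))

async def when_any(key1, key2):
    # FIRST_COMPLETED ~ Task.WhenAny: continue when any one task finishes.
    tasks = {asyncio.create_task(store_get(key1)),
             asyncio.create_task(cache_get(key2))}
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for t in pending:
        t.cancel()
    return next(iter(done)).result()
```

Here `sequential` serializes the two calls while `when_all` overlaps them; that difference in await structure is exactly what the extracted dependency graph captures.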

slide-22
SLIDE 22

Key Idea

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Init

Received value1 Received value2

Instrument state machine binary to dynamically track tasks and continuations

22 .NET compiler generates this

slide-23
SLIDE 23

WebPerf Approach

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

23

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-24
SLIDE 24

WebPerf Approach

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

24

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-25
SLIDE 25

Component Profiling

A component’s profile contains latency distributions of API calls to the component. WebPerf profiles commonly used components offline.

Profile dictionary

25
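As a rough sketch of what such a profile dictionary might look like — component names, API names, and latency numbers below are invented for illustration:

```python
import random

random.seed(0)

# Each (component, API) key maps to measured latency samples (ms).
profiles = {
    ("blob", "UploadFromStream"): [random.gauss(40, 5) for _ in range(1000)],
    ("redis", "StringGet"): [random.gauss(2, 0.3) for _ in range(1000)],
}

def percentile(samples, p):
    """Return the p-th percentile of a latency sample set."""
    ordered = sorted(samples)
    idx = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[idx]

# Looking up the median blob-upload latency from the profile dictionary.
median_blob = percentile(profiles[("blob", "UploadFromStream")], 50)
```

A what-if evaluator can then answer queries by substituting one profile for another without redeploying the application.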

slide-26
SLIDE 26

Generalized Profiles

Relational Store Relational Store Relational Store

Location Load

Table Store

Failure

26

Tiers

slide-27
SLIDE 27

Application Dependent Profiles

Relational Store Azure SQL

SQL join latency depends on size

Cache Redis Cache

Cache latency depends on hit rate

Relational Store Sharded TableStore

Access latency depends on skew

27

Not all profiles can be computed offline

slide-28
SLIDE 28

Workload Hints

WebPerf uses parameterized profiles

❖ User must specify workload hint

Relational Store Azure SQL

Size

Cache Redis Cache

Hit rate

Relational Store Sharded TableStore

Skew

28
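One way to picture a parameterized profile is as a function from the workload hint to a latency distribution. Below is a hypothetical cache profile parameterized by hit rate; the hit/miss latencies are invented, not measured values:

```python
import random

random.seed(42)

HIT_MS, MISS_MS = 1.0, 25.0   # invented hit and miss latencies (ms)

def cache_profile(hit_rate, n=10000):
    """Latency samples for a cache whose distribution depends on the hit-rate hint."""
    return [HIT_MS if random.random() < hit_rate else MISS_MS for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

# A higher hit rate shifts the whole distribution toward the hit latency.
fast = mean(cache_profile(0.9))
slow = mean(cache_profile(0.5))
```

Without the hint, the profiler cannot know which mixture of the two modes the application will see, which is why these profiles cannot be computed purely offline.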

slide-29
SLIDE 29

WebPerf Approach

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I upgrade blob storage from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

29

slide-30
SLIDE 30

Cloud-Side Latency Estimation

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I upgrade blob storage from basic to standard tier? Replace from profile 30

slide-31
SLIDE 31

Max Min

31

Cloud-Side Latency Estimation

Simple operations on distributions suffice
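If each node's latency is represented by Monte Carlo samples, these combinators reduce to elementwise operations. A minimal sketch with invented sample sets:

```python
import random

random.seed(1)

N = 10000
# Illustrative latency samples (ms) for two parallel tasks.
task1 = [random.uniform(80, 120) for _ in range(N)]
task2 = [random.uniform(10, 30) for _ in range(N)]

# WhenAll finishes when both tasks finish: elementwise max.
when_all = [max(a, b) for a, b in zip(task1, task2)]
# WhenAny finishes when either task finishes: elementwise min.
when_any = [min(a, b) for a, b in zip(task1, task2)]
# Sequential composition: latencies add.
seq = [a + b for a, b in zip(task1, task2)]

def mean(xs):
    return sum(xs) / len(xs)
```

Because the combined values remain sample sets, the output of one synchronization point feeds directly into the next one up the dependency graph.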

slide-32
SLIDE 32

Evaluation

WebPerf is accurate, fast, cheap, and requires low developer effort

32

❖ How accurate is WebPerf?
❖ Are workload hints necessary?

slide-33
SLIDE 33

Applications

33

Application | Azure components used | Average I/O calls
SocialForum | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net | SQL | 41
ContosoAds | Blob storage, Queue, SQL, Search | 56
EmailSubscriber | Blob storage, Queue, Table | 26
ContactManager | Blob storage, SQL | 8
CourseManager | Blob storage, SQL | 44

Different functionality Different components Varying complexity

slide-34
SLIDE 34

What-if Scenarios

34

What-if scenario | Example
Tier: a component X is upgraded to tier Y | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to component Y | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: a component X is deployed at location Y | X = a Redis cache or a front-end, Y = Singapore
Failure: an instance of a replicated component X fails | X = a replicated front-end or SQL database

slide-35
SLIDE 35

Accuracy

35 What if I move the Redis cache in SocialForum from basic to standard tier?

Configuration choices can significantly impact latency

slide-36
SLIDE 36

Accuracy

36 What if I move the Redis cache from basic to standard tier?

Prediction closely matches ground truth

slide-37
SLIDE 37

Accuracy

37

Median prediction error under 7%

Difference between predicted distribution and ground truth

slide-38
SLIDE 38

Accuracy

38

Low median error for tier and replication. Slightly higher error for load and failure.

slide-39
SLIDE 39

Workload Hints

39

Workload hints can significantly improve accuracy

slide-40
SLIDE 40

Conclusions

40

WebPerf predicts cloud-side latency distributions for different what-if scenarios. It accurately tracks dependencies and profiles components offline.

Across six different applications and scenarios, its median error is less than 7%

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-41
SLIDE 41

WebPerf Contributions and Summary

41

❖ An automated tool to instrument web apps and capture both browser objects and front-end cloud processing dependencies
❖ Prediction of web app cloud latency and end-to-end latency in a probabilistic setting under six different scenarios
❖ Evaluations with six real websites show WebPerf achieves < 7% median prediction error

slide-42
SLIDE 42

Thank you

42

slide-43
SLIDE 43

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

slide-44
SLIDE 44

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

slide-45
SLIDE 45

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service Azure SQL

slide-46
SLIDE 46

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service Redis Cache

slide-47
SLIDE 47

Large number of configuration choices

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

Reasoning about cost-performance trade-off is hard!

slide-48
SLIDE 48

Cost-Performance Trade-off

  • Configuration does not directly map to performance
  • End-to-end latency depends on application’s causal dependency

[Dependency graph with nodes Start, Cache Insert, Store Insert, End and edge latencies 100 ms, 20 ms, 100 ms, 10 ms]

slide-49
SLIDE 49

“What-If” Analysis

Cloud App Service Configuration

What if?

What if I move the front-end from basic to standard tier?

Basic: $30, 900 ms → Standard: $100, 600 ms

End-to-end latency estimate

Create a new deployment and measure performance
➢ Expensive ➢ Time consuming ➢ High overhead

slide-50
SLIDE 50

WebPerf: “What-If” Analysis

Cloud App Service Configuration

What if?

What if I move the front-end from basic to standard tier?

Basic: $30, 900 ms → Standard: $100, 600 ms

End-to-end latency estimate

Predict performance under hypothetical configurations
➢ Zero cost ➢ Near real-time ➢ Zero developer effort

Deploy with certain configuration

slide-51
SLIDE 51

WebPerf: “What-If” Analysis

Cloud App Service Configuration

What if?

What if I move the front-end from basic to standard tier?

Basic: $30, 900 ms → Standard: $100, 600 ms

End-to-end latency estimate

Predict performance under hypothetical configurations
➢ Zero cost ➢ Near real-time ➢ Zero developer effort

Deploy with certain configuration

slide-52
SLIDE 52

WebPerf: Key Insights

  • Offline, application-independent profiling is useful
  • Modern cloud apps are built using existing services (PaaS)
  • Individual services have predictable performance
  • S3, Azure Table Storage, Dynamo DB, DocumentDB, …
  • Services are co-located inside the same datacenter
  • Tighter latency distribution
  • Causal dependency within the application is independent of the what-if scenarios we consider
slide-53
SLIDE 53

Application-Independent Profiling

T:Table, R:Redis, S:SQL, B:Blob, Q:Queue

1. Delete(Async) (T)
2. UploadFromStream (B)
3. AddMessage (Q)
4. Execute (T)
5. ExecuteQuerySegmented (T)
6. SortedSetRangeByValue (R)
7. StringGet (R)
8. SaveChanges (S)
9. ToList (S)
10. Send (R)
11. ReadAsString (B)

slide-54
SLIDE 54

WebPerf Design

  • Dependency graph extraction
  • Application-independent profiling
  • Baseline latency estimation
  • Latency prediction

Frontend Processing Start Cache Insert End Blob Insert Store Insert

slide-55
SLIDE 55

Task Asynchronous Pattern (TAP)

Thread1 Thread2 Thread3

[Timeline across Thread1, Thread2, Thread3: store.get and cache.get start asynchronously; a continuation runs after each completes]

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Start task asynchronously Continue after task finishes

slide-56
SLIDE 56

Dependency Graph Extraction

  • Design Goals
  • Accurate
  • Real-time with minimal data collection
  • Zero developer effort
  • No modifications to the platform
  • Low overhead
  • Automatic Binary Instrumentation
  • Modern cloud applications are highly asynchronous
  • Task Asynchronous Programming Pattern
slide-57
SLIDE 57

Task Asynchronous Pattern (TAP)

  • Asynchronous operations with a synchronous programming pattern

  • Increasingly popular for writing cloud applications
  • Supported by many major languages
  • C#, Java, Python, Javascript
  • Most Azure services support TAP as the only mechanism for doing asynchronous I/O

  • AWS also provides TAP APIs for .NET
slide-58
SLIDE 58

Synchronous Programming

Blocking I/O limits server throughput

Thread1

processRequest (input) {
  /* process input */
  value1 = store.get(key1);
  value2 = cache.get(key2);
  /* construct response */
  return response;
}

[Thread1 timeline: blocks on store.get, then on cache.get]

slide-59
SLIDE 59

Asynchronous Programming Model (APM)

processRequest (input) {
  /* process input */
  store.get(key1, callback1);
}
callback1 (value1) {
  cache.get(key2, callback2);
}
callback2 (value2) {
  /* construct response */
  send(response);
}

Thread1 Thread2 Thread3

callback1 callback2 store.get cache.get

slide-60
SLIDE 60

Task Asynchronous Pattern (TAP)

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

task1 Start End task2 Dependency Graph

slide-61
SLIDE 61

Task Asynchronous Pattern (TAP)

task1 End task2

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1 = await task1;
  value2 = await task2;
  /* construct response */
  return response;
}

Dependency Graph Start

slide-62
SLIDE 62

Task Asynchronous Pattern (TAP)

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value1, value2 = await Task.WhenAll(task1, task2);
  /* construct response */
  return response;
}

Dependency Graph

WhenAll: Continue only when all tasks finish

task1 End task2 Start

slide-63
SLIDE 63

Task Asynchronous Pattern (TAP)

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  task2 = cache.get(key2);
  value = await Task.WhenAny(task1, task2);
  /* construct response */
  return response;
}

Dependency Graph

WhenAny: Continue after any one of tasks finishes

task1 End task2 Start

slide-64
SLIDE 64

Automatic Binary Instrumentation

class processRequest__ {
  string input;
  AsyncTaskMethodBuilder builder;
  string key1, key2, response;
  int asyncId = -1;

  public void MoveNext() {
    asyncId = Tracker.AsyncStart(asyncId);
    Tracker.StateStart(asyncId);
    switch (state) {
      case -1:
        state = 0;
        /* process input */
        var task1 = store.get(key1);
        Tracker.TaskStart(task1, asyncId);
        builder.Completed(task1.Awaiter, this);
        Tracker.Await(task1, asyncId);
      case 0:
        state = 1;
        var task2 = cache.get(key2);
        Tracker.TaskStart(task2, asyncId);
        builder.Completed(task2.Awaiter, this);
        Tracker.Await(task2, asyncId);
      case 1:
        /* construct response */

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Instrument state machine
Track tasks and continuations

slide-65
SLIDE 65

Automatic Binary Instrumentation

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

task1 Start End task2 Dependency Graph

Instrument state machine
Track tasks and continuations

slide-66
SLIDE 66

Automatic Binary Instrumentation

  • Tracking async state machines
  • Monitor task start and completion
  • Track state machine transitions
  • Tracking pull-based continuations
  • Link tasks to corresponding awaits
  • Link awaits to continuations
  • Tracking synchronization points
  • Track WhenAll, WhenAny, cascaded task dependencies
  • Keeping the overhead low
  • Instrument APIs with known signatures
  • Instrument only leaf tasks
slide-67
SLIDE 67

Dependency graph extraction

  • Highly accurate
  • Real-time
  • Zero developer effort
  • Extremely low overhead

Dependency Graph task1 End task2 Start


slide-68
SLIDE 68

API Profiling

68

  • A profile of a cloud API is a distribution of its latency
  • Parameterized profiles

[Decision flow: Is the API in the what-if scenario? Is a workload hint given? → application-specific parameterized profiles (workload-dependent, e.g., SQL); application-independent profiles (e.g., Redis), computed offline or on-demand for reuse; baseline profiles, computed during dependency tracking]

slide-69
SLIDE 69

API Profiling

69

  • WebPerf builds profiles offline and maintains them in a dictionary

  • Starts with common profiles, and builds additional profiles on-demand and reuses them

  • Optimal profiling to minimize measurement costs (details in paper)

Profile dictionary

slide-70
SLIDE 70

task1

Start

task2 Sync task3 task4 task5 End

Baseline latencies

What-If Engine

70

  • Predicts cloud latency under a given what-if scenario

[Inputs: instrumented app, workload, the what-if question (e.g., what if task1 and task4 are upgraded?), profile dictionary → convolve distributions]

slide-71
SLIDE 71

Convolving distributions

71

task1 Start task2 Sync task3 task4 task5 End

T1, T2, T3, T4, T5 (per-task latency distributions)

Bottom-up evaluation:

  • WhenAll: ProbMax(t1, t2, …)
  • WhenAny: ProbMin(t1, t2, …)
  • WhenDone: ProbAdd(t1, t2, …)
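The bottom-up evaluation can be sketched as a recursion over the dependency tree, applying one combinator per synchronization point to the children's sample sets. The tree encoding and toy distributions below are my own, not WebPerf's internals:

```python
import random

random.seed(3)

N = 5000

def prob_max(*ts):   # WhenAll: all children must finish
    return [max(v) for v in zip(*ts)]

def prob_min(*ts):   # WhenAny: first child to finish
    return [min(v) for v in zip(*ts)]

def prob_add(*ts):   # WhenDone: sequential composition
    return [sum(v) for v in zip(*ts)]

def evaluate(node):
    """A node is either a list of latency samples (leaf) or (combinator, children)."""
    if isinstance(node, list):
        return node
    op, children = node
    return op(*(evaluate(c) for c in children))

t1 = [random.uniform(5, 15) for _ in range(N)]
t2 = [random.uniform(5, 15) for _ in range(N)]
t3 = [random.uniform(20, 30) for _ in range(N)]

# (t1 || t2 joined by WhenAll), then t3 sequentially.
total = evaluate((prob_add, [(prob_max, [t1, t2]), t3]))
```

Swapping one leaf's sample set for a different profile (the upgraded tier) and re-running `evaluate` is the essence of answering a what-if question.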
slide-72
SLIDE 72

End-to-end Latency Prediction

72

  • Details in paper

Te2e = TCloud + TNetwork + TBrowser

TCloud: WebPerf; TNetwork: network latency model; TBrowser: WebProphet

Combine using Monte-Carlo simulation
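The combination above can be sketched as summing matched Monte Carlo samples from the three terms; the distributions here are invented placeholders, not WebPerf's actual models:

```python
import random

random.seed(7)

def sample(draw, n=10000):
    """Draw n samples from a zero-argument sampler."""
    return [draw() for _ in range(n)]

t_cloud = sample(lambda: random.gauss(120, 15))        # TCloud
t_network = sample(lambda: random.expovariate(1 / 40))  # TNetwork
t_browser = sample(lambda: random.gauss(60, 10))       # TBrowser

# Monte Carlo: add matched samples to estimate the T_e2e distribution.
t_e2e = [c + n + b for c, n, b in zip(t_cloud, t_network, t_browser)]
median_e2e = sorted(t_e2e)[len(t_e2e) // 2]
```

Working with samples rather than closed-form distributions lets the three terms come from entirely different models.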

slide-73
SLIDE 73

WebPerf Evaluation

73

  • Six 3rd party applications and six scenarios

Application | Azure services used | Average I/O calls
SocialForum | Blob storage, Redis cache, Service bus, Search, Table | 116
SmartStore.Net | SQL | 41
ContosoAds | Blob storage, Queue, SQL, Search | 56
EmailSubscriber | Blob storage, Queue, Table | 26
ContactManager | Blob storage, SQL | 8
CourseManager | Blob storage, SQL | 44

slide-74
SLIDE 74

74

CDF of First Byte Time Percentage

slide-75
SLIDE 75

WebPerf Evaluation

75

  • Six 3rd party applications and six scenarios

What-if scenario | Example
Tier: a resource X is upgraded to tier Y | X = a Redis cache, Y = a standard tier (from a basic tier)
Load: X concurrent requests to resource Y | X = 100, Y = the application or a SQL database
Interference: CPU and/or memory pressure of X% from collocated applications | X = 50% CPU, 80% memory
Location: a resource X is deployed at location Y | X = a Redis cache or a front-end, Y = Singapore
Failure: an instance of a replicated resource X fails | X = a replicated front-end or SQL database

slide-76
SLIDE 76

WebPerf Evaluation

76

  • Metric: distribution of relative errors

Ground truth from real deployment WebPerf prediction
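One plausible way to compute such a distribution of relative errors is to compare the two sample sets quantile by quantile; the helper and numbers below are illustrative, not WebPerf's exact metric:

```python
def relative_errors(predicted, measured):
    """Relative error at each matched quantile of the two sorted sample sets."""
    p, m = sorted(predicted), sorted(measured)
    assert len(p) == len(m)
    return [abs(a - b) / b for a, b in zip(p, m)]

# Invented latency samples (ms) for one request type.
pred = [100, 105, 110, 120, 130]
truth = [102, 104, 115, 118, 140]
errs = relative_errors(pred, truth)
median_err = sorted(errs)[len(errs) // 2]
```

Reporting the whole error distribution, rather than a single number, shows where in the latency range the prediction is weakest.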

slide-77
SLIDE 77

Underspecified Configuration Dimensions

Cache Redis Cache

77

slide-78
SLIDE 78

What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?

78

Maximum cloud-side latency prediction error is only 5%

slide-79
SLIDE 79

What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?

79

slide-80
SLIDE 80

What if the Redis cache is upgraded from the original Standard C0 to the Standard C2 tier?

80

Maximum cloud-side latency prediction error is only 5%

slide-81
SLIDE 81

Performance for Six Applications

81

Scenarios

SocialForum ContosoAds EmailSubscriber CourseManager SmartStore.Net ContactManager

Median prediction error for cloud-side latency is < 7%

slide-82
SLIDE 82

Performance for Six Applications

82

Scenarios

Median prediction error for cloud-side latency is < 7%

slide-83
SLIDE 83

Workload Hints

83

Workload hints can bring an order-of-magnitude accuracy improvement

slide-84
SLIDE 84

Workload Hints

84

Workload hints can bring an order-of-magnitude accuracy improvement

slide-85
SLIDE 85

Other findings

85

  • Performance of many applications and scenarios can be predicted reasonably well

  • Thanks to cloud provider’s SLAs
  • Harder cases
  • Workload-dependent performance: hints help
  • High-variance profiles: prediction has high variance
  • Non-deterministic control flow (e.g., cache hit/miss):
  • Separate prediction for each control flow
  • Hard-to-profile APIs (e.g., SQL query with join)
  • Poor prediction
slide-86
SLIDE 86

86

Behind apps, several distributed, asynchronous components

Modern Cloud Applications are Complex

slide-87
SLIDE 87

87

Front-end Storage Blob Store Relational Store Cache CDN Workers Queue Facebook Google Search Store Log Store Notification Service

Modern Cloud Applications are Complex

Cloud

slide-88
SLIDE 88

Hard to reason about the performance of cloud-hosted Web apps

Cloud-side Latency

88

slide-89
SLIDE 89

Hard to reason about the cost/performance tradeoffs of different configurations

89

slide-90
SLIDE 90

Workload Dependence

90

slide-91
SLIDE 91

Deploy Cloud App

Service Configuration

What if?

What if I move the front-end from basic to standard tier?

900 ms 600 ms

Basic Standard

➢Expensive ➢Slow ➢High Effort

91

slide-92
SLIDE 92

Model and Predict Cloud App

Service Configuration

What if?

What if I move the front-end from basic to standard tier?

200 ms

Service Configuration

Standard

100 ms

92

slide-93
SLIDE 93

Combining latency distributions

task1 task2

93

slide-94
SLIDE 94

Combining latency distributions

task1 task2

WhenAll

Max

94

slide-95
SLIDE 95

Combining latency distributions

task1 task2

WhenAny

Min

95

slide-96
SLIDE 96

Cloud-Side Latency Estimation

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I move the front-end from basic to standard tier? Replace from profile 96

slide-97
SLIDE 97

WebPerf Approach

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I move the front-end from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

97

slide-98
SLIDE 98

WebPerf

Frontend Processing Start Cache Insert End Blob Insert Store Insert

What if I move the front-end from basic to standard tier?

Dependency graph extraction → Component Profiling → Baseline latency estimation → Cloud-side latency estimation

(Developer supplies workload; profiling computed offline)

98

slide-99
SLIDE 99

❖ How accurate is WebPerf?
❖ What is WebPerf’s overhead?
❖ What are the primary sources of prediction error?
❖ Are workload hints necessary?
❖ Can WebPerf predict end-to-end latency?

Evaluation Goals

99


slide-100
SLIDE 100

Task Asynchronous Programming

Thread1 Thread2 Thread3

[Timeline across Thread1, Thread2, Thread3: store.get and cache.get start asynchronously; a continuation runs after each completes]

async processRequest (input) {
  /* process input */
  task1 = store.get(key1);
  value1 = await task1;
  task2 = cache.get(key2);
  value2 = await task2;
  /* construct response */
  return response;
}

Start task → Continue after await returns (on front-end)

100

slide-101
SLIDE 101

Measure of Prediction Accuracy

101

Ground truth from real deployment WebPerf prediction

Distributional Difference