Measuring and Optimizing Tail Latency, Kathryn S McKinley, Google



SLIDE 1

Measuring and Optimizing Tail Latency

Kathryn S McKinley, Google

CRA-W Undergraduate Town Hall April 5th, 2018

SLIDE 2

Speaker & Moderator

Kathryn S McKinley

  • Dr. Kathryn S. McKinley is a Senior Research Scientist at Google and previously was a Researcher at Microsoft and held an Endowed Professorship at The University of Texas at Austin. Her research spans programming languages, compilers, runtime systems, architecture, performance, and energy. She and her collaborators have produced several widely used tools: the DaCapo Java Benchmarks (30,000+ downloads), the TRIPS Compiler, the Hoard memory manager, the MMTk memory management toolkit, and the Immix garbage collector. She served as program chair for ASPLOS, PACT, PLDI, ISMM, and CGO. She is currently a CRA and CRA-W Board member. Dr. McKinley was honored to testify to the House Science Committee (Feb. 14, 2013). She is an IEEE and ACM Fellow. She has graduated 22 PhD students.

Lori Pollock

  • Dr. Lori Pollock is a Professor in Computer and Information Sciences at the University of Delaware. Her current research focuses on program analysis for building better software maintenance tools, software testing, energy-efficient software, and computer science education. Dr. Pollock is an ACM Distinguished Scientist and was awarded the University of Delaware’s Excellence in Teaching Award and the E.A. Trabant Award for Women’s Equity.

SLIDE 3

Measuring and Optimizing Tail Latency

Kathryn S McKinley, Google Xi Yang, Stephen M Blackburn, Md Haque, Sameh Elnikety, Yuxiong He, Ricardo Bianchini

SLIDE 4

SLIDE 5

Tail Latency Matters

  • A two-second slowdown reduced revenue/user by 4.3%. [Eric Schurman, Bing]
  • A 400 millisecond delay decreased searches/user by 0.59%. [Jack Brutlag, Google]

TOP PRIORITY

SLIDE 6

Photo: Google/Connie Zhou

SLIDE 7

Datacenter economics quick facts*

  • ~$500,000: cost of a small datacenter
  • ~3,000,000 US datacenters in 2016
  • ~$1.5 trillion US capital investment to date
  • ~$3,000,000,000 KW dollars / year
  • ~$30,000,000 savings from 1% less work; lots more by not building a datacenter
*Shehabi et al., United States Data Center Energy Usage Report, Lawrence Berkeley, 2016.

SLIDE 8

Tail Latency

TOP PRIORITY

Efficiency

SLIDE 9

Tail Latency

BOTH ?!

Efficiency

SLIDE 10

Server architecture

client → aggregator → workers

SLIDE 11

Characteristics of interactive services

[Chart: request latency CDF, latency (ms) vs. percentage of requests]

  • Bursty, diurnal load; the CDF changes slowly
  • The slowest server dictates the tail
  • Orders of magnitude difference between average & tail (99th percentile)

SLIDE 12

What is in the tail?

[Chart: request latency CDF, with the tail region marked "?"]

SLIDE 13

Insight: hardware & software generate signals without instrumentation

SHIM: a cycle-level on-line profiling tool [ISCA’15 (Top Picks HM), ATC’16]

[Diagram: SHIM runs on hyperthread HT2 and observes HT1 by reading performance counters and tagged memory locations]

HT1 IPC = Core IPC – HT2 SHIM IPC
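The subtraction above can be sketched directly. The counter values below are hypothetical sample-window readings (in practice SHIM reads hardware performance counters), assuming SHIM (HT2) and the application (HT1) share one physical core:

```python
def ht1_ipc(core_insts, core_cycles, shim_insts):
    """Derive the application hyperthread's IPC from shared-core counters.

    The core's counters aggregate both hyperthreads, so subtracting the
    IPC contributed by SHIM's own retired instructions (HT2) isolates
    the application thread (HT1).
    """
    core_ipc = core_insts / core_cycles   # both hyperthreads combined
    shim_ipc = shim_insts / core_cycles   # SHIM's share of the core
    return core_ipc - shim_ipc

# Hypothetical window: the core retired 3M instructions in 1M cycles,
# 1M of which came from SHIM's polling loop.
print(ht1_ipc(3_000_000, 1_000_000, 1_000_000))  # 2.0
```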

SLIDE 14

What is in the tail?

[Chart: request latency CDF, with the tail region marked "?"]

SLIDE 15

The Tail: longest 200 requests

[Chart: latency breakdown of the 200 slowest requests: network & other, idle time, CPU work, and queuing at the worker]

  • Network imperfections (noise)
  • OS imperfections (noise)
  • Long requests (not noise)
  • Overload (not noise)

SLIDE 16

Optimizing the tail

Diagnosing the tail with continuous profiling:
  • Noise: systems are not perfect
  • Queuing: too much load is bad, but so is overprovisioning
  • Work: many requests are long

Insights:
  • Use the CDF off line
  • Long requests reveal themselves; treat them specially

SLIDE 17

Insight

Long requests reveal themselves

Regardless of the cause

SLIDE 18

Noise: replicate & reissue

The Tail at Scale, Dean & Barroso, CACM’13

[Chart: latency CDF; reissuing after a fixed issue time trims the noise tail, shown for 5% and 10% reissued]

  • All requests? Use the CDF to weigh cost & potential
  • Fixed issue time: 10% reissued vs. 5% reissued
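A toy simulation of the fixed-issue-time idea, under an assumed latency distribution (most requests are fast; ~5% hit a noise event): if a request is still outstanding at the issue time, send a duplicate and keep the earlier completion. All names and numbers here are illustrative, not from the talk.

```python
import random

def request_latency(rng):
    """Hypothetical service: uniform base latency, 5% chance of a noise hit."""
    base = rng.uniform(1.0, 5.0)            # ms, normal service time
    if rng.random() < 0.05:
        base += rng.uniform(50.0, 100.0)    # OS/network hiccup
    return base

def with_reissue(rng, issue_time_ms=8.0):
    """Reissue after a fixed delay; the earlier completion wins."""
    first = request_latency(rng)
    if first <= issue_time_ms:
        return first
    second = issue_time_ms + request_latency(rng)
    return min(first, second)

def p99(samples):
    return sorted(samples)[int(0.99 * len(samples))]

rng = random.Random(42)
plain = [request_latency(rng) for _ in range(10_000)]
reissued = [with_reissue(rng) for _ in range(10_000)]
print(f"p99 plain:    {p99(plain):.1f} ms")
print(f"p99 reissued: {p99(reissued):.1f} ms")   # reissue trims the tail
```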

SLIDE 19

Probabilistic reissue

Optimal Reissue Policies for Reducing Tail Latencies, Kaler, He, & Elnikety, SPAA’17

[Chart: latency CDF; reissuing 1-3% of requests with probability p beats 5% reissued at a fixed time]

  • Adding randomness to reissue makes a single, earlier reissue time d (vs. n times) optimal
  • The probability is proportional to the reissue budget & the noise in the tail
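The single-time policy can be sketched as follows (a hedged illustration, not the paper's exact algorithm): if a request is still outstanding at reissue time d, flip a coin with probability p, so the expected reissue rate stays within a small budget such as 1-3%.

```python
import random

def probabilistic_reissue(latency_fn, rng, d_ms=6.0, p=0.5):
    """Single reissue time d: if the request has not completed by d,
    reissue with probability p and keep the earlier completion.
    latency_fn(rng) -> latency of one (hypothetical) backend request."""
    first = latency_fn(rng)
    if first <= d_ms or rng.random() >= p:
        return first
    return min(first, d_ms + latency_fn(rng))

# Fast requests are never duplicated; a slow request may get a second
# chance that completes much earlier.
rng = random.Random(7)
print(probabilistic_reissue(lambda r: 3.0, rng))  # 3.0 (done before d)
slow_then_fast = iter([100.0, 1.0])
print(probabilistic_reissue(lambda r: next(slow_then_fast), rng, p=1.0))  # 7.0
```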

SLIDE 20

SingleR probabilistic reissue

Optimal Reissue Policies for Reducing Tail Latencies, Kaler, He, & Elnikety, SPAA’17

SLIDE 21

Work: speed up the tail efficiently

[Chart: latency CDF; the work component of the tail]

  • Judicious parallelism [ASPLOS’15]
  • DVFS faster on the tail [DISC’14, MICRO’17]
  • Asymmetric multicore [DISC’14, MICRO’17]

SLIDE 22

Work: parallelism

  • Parallelism historically for throughput
  • Idea: parallelism for tail latency

SLIDE 23

Queuing theory

  • Optimizing average latency maximizes throughput, but not the tail!
  • Shortening the tail reduces queuing latency
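Why shortening the tail reduces queuing: in an M/G/1 queue the mean wait grows with the second moment of the service time (Pollaczek-Khinchine), so a heavier service-time tail raises queuing delay even at the same mean. A toy FIFO simulation, with an assumed arrival rate and two illustrative service distributions of equal mean:

```python
import random

def mean_wait_fifo(service_fn, seed, arrival_rate=0.5, n=50_000):
    """Mean queuing wait at a single FIFO server with Poisson arrivals."""
    rng = random.Random(seed)
    t = 0.0          # arrival time of the current request
    free_at = 0.0    # when the server next becomes idle
    total_wait = 0.0
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        total_wait += max(0.0, free_at - t)
        free_at = max(free_at, t) + service_fn(rng)
    return total_wait / n

# Same mean service time (1.0), different tails.
uniform = mean_wait_fifo(lambda r: r.uniform(0.5, 1.5), seed=0)
tailed = mean_wait_fifo(
    lambda r: 5.5 if r.random() < 0.05 else (1.0 - 0.05 * 5.5) / 0.95, seed=0)
print(f"mean wait, short tail: {uniform:.2f}")
print(f"mean wait, long tail:  {tailed:.2f}")   # heavier tail waits longer
```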

SLIDE 24

Parallelism

  • Insight: long requests reveal themselves
  • Approach: incrementally add parallelism to long requests (the tail) based on request progress & load

SLIDE 25

Few to Many

[Chart: tail latency (ms) vs. Lucene RPS for Sequential, 4-way, and fixed intervals of 20, 100, and 500 ms]

  • Fixed: add a thread every d ms
    – a short delay is good at low load; a long delay is good at high load
  • Dynamic: use load
    – best at all loads
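A sketch of the fixed vs. dynamic interval idea, with an assumed load metric (fraction of capacity in use) and made-up constants: the longer a request runs, the more threads it earns, and the grant interval stretches as load rises.

```python
def parallelism_schedule(elapsed_ms, load, max_threads=4):
    """Threads a request should hold after running for elapsed_ms.

    Under light load we can afford to parallelize early (short interval);
    under heavy load spare cores are scarce, so the interval stretches.
    load: fraction of capacity in use, 0.0-1.0 (hypothetical metric).
    """
    interval_ms = 20.0 + 480.0 * load   # 20 ms at idle, 500 ms saturated
    threads = 1 + int(elapsed_ms / interval_ms)
    return min(threads, max_threads)

# A short request stays sequential; a long one ramps up, and it ramps
# faster when the system is lightly loaded.
print(parallelism_schedule(5.0, load=0.1))    # 1
print(parallelism_schedule(200.0, load=0.1))  # 3
print(parallelism_schedule(200.0, load=0.9))  # 1
```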

SLIDE 26

Evaluation: 2x8 64-bit 2.3 GHz Xeon, 64 GB

[Chart: tail latency (ms) vs. requests per second for Sequential, Few to Many, and Dynamic parallelism]

  • Reduces the tail by 28%
  • 21% fewer servers

SLIDE 27

Work: speed up the tail efficiently

[Chart: latency CDF; the work component of the tail]

  • Judicious parallelism [ASPLOS’15] ✔

SLIDE 28

Tail Latency

BOTH !

Efficiency

SLIDE 29

Efficiency at scale for interactive workloads

  • Diagnosing the tail with continuous profiling
  • Noise: replication; systems are not perfect
  • Queuing: replication + judicious choice
  • Work: judicious use of resources on long requests
  • The request latency CDF is a powerful tool
  • Tail efficiency ≠ average or throughput
  • Hardware heterogeneity

Questions?

SLIDE 30

Professional and Research Relationships

SLIDE 31

Your Academic Village

  • Peer students
  • Students senior & junior to you
  • Teaching assistants
  • PhD students
  • Faculty
SLIDE 32

My Professional Village

  • Researchers in all career stages
    – Undergrads, PhD students, post docs
    – Faculty, industrial researchers, staff, administrators
  • Industrial village
    – Software engineers in all career stages
    – Managers, directors, admins
    – In and out of my management chain

SLIDE 33

Faculty Mentors

  • Don Johnson: my professor
  • Ken Kennedy: PhD advisor
  • Dave Stemple: dept. chair

SLIDE 34

Building a Village

SLIDE 35

Networking is….

Building and sustaining professional relationships

  • Participating in an academic / research community
  • Finding people you like and learn from, and building a relationship

SLIDE 36

Networking is not….

  • Using people
  • A substitute for quality work
SLIDE 37

But I am Horrible at Small Talk

  • You have CS in common
  • Networking is not genetic
  • It is a research skill
    – Practice
    – Meet people
    – Learn
    – Go places
    – Volunteer!
    – Sustain your relationships

SLIDE 38

With whom do you network?

  • People you like
  • People senior to you, who can show you the way
  • People at different career stages, so you can anticipate what comes next
  • Your peers
SLIDE 39

Peer Mentors

Mary Hall Doug Burger Margaret Martonosi

SLIDE 40

Your Village Will

  • Write letters for grad school, jobs, etc.
  • Help you solve problems
  • Point you in good directions
  • Encourage you
  • Choose you for important roles
  • You will do the same or more for them
  • Make your life and work more fun and meaningful