Measuring and Optimizing Tail Latency, Kathryn S McKinley, Google



SLIDE 1

Measuring and Optimizing Tail Latency

Kathryn S McKinley, Google

CRA-W Undergraduate Town Hall April 5th, 2018

SLIDE 2

Speaker & Moderator

Kathryn S McKinley

  • Dr. Kathryn S. McKinley is a Senior Research Scientist at Google and previously was a Researcher at Microsoft and held an Endowed Professorship at The University of Texas at Austin. Her research spans programming languages, compilers, runtime systems, architecture, performance, and energy. She and her collaborators have produced several widely used tools: the DaCapo Java Benchmarks (30,000+ downloads), the TRIPS Compiler, the Hoard memory manager, the MMTk memory management toolkit, and the Immix garbage collector. She served as program chair for ASPLOS, PACT, PLDI, ISMM, and CGO. She is currently a CRA and CRA-W Board member. Dr. McKinley was honored to testify to the House Science Committee (Feb. 14, 2013). She is an IEEE and ACM Fellow. She has graduated 22 PhD students.

Lori Pollock

  • Dr. Lori Pollock is a Professor in Computer and Information Sciences at the University of Delaware. Her current research focuses on program analysis for building better software maintenance tools, software testing, energy-efficient software, and computer science education. Dr. Pollock is an ACM Distinguished Scientist and was awarded the University of Delaware’s Excellence in Teaching Award and the E.A. Trabant Award for Women’s Equity.

SLIDE 3

Measuring and Optimizing Tail Latency

Kathryn S McKinley, Google Xi Yang, Stephen M Blackburn, Md Haque, Sameh Elnikety, Yuxiong He, Ricardo Bianchini

SLIDE 4

SLIDE 5

Tail Latency Matters

  • A two-second slowdown reduced revenue/user by 4.3%. [Eric Schurman, Bing]
  • A 400 millisecond delay decreased searches/user by 0.59%. [Jack Brutlag, Google]

TOP PRIORITY

SLIDE 6

Photo: Google/Connie Zhou

SLIDE 7

Datacenter economics quick facts*

  • ~$500,000: cost of a small datacenter
  • ~3,000,000 US datacenters in 2016
  • ~$1.5 trillion US capital investment to date
  • ~$3,000,000,000 KW dollars / year
  • ~$30,000,000 savings from 1% less work; lots more by not building a datacenter
*Shehabi et al., United States Data Center Energy Usage Report, Lawrence Berkeley, 2016.

SLIDE 8

Tail Latency

TOP PRIORITY

Efficiency

SLIDE 9

Tail Latency

BOTH ?!

Efficiency

SLIDE 10

Server architecture

client → aggregator → workers

SLIDE 11

Characteristics of interactive services

[Chart: request latency CDF, latency (ms) vs. percentage of requests]

  • Bursty, diurnal load; the CDF changes slowly
  • The slowest server dictates the tail
  • Orders of magnitude difference between average & tail (99th percentile)

SLIDE 12

What is in the tail?

[Chart: request latency CDF, with the tail region marked "?"]

SLIDE 13

Insight: hardware & software generate signals without instrumentation

SHIM: a cycle-level on-line profiling tool [ISCA’15 (Top Picks HM), ATC’16]

[Diagram: SHIM runs on hyperthread HT2 and observes HT1 by reading performance counters and tagged memory locations]

HT1 IPC = Core IPC – HT2 SHIM IPC
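The subtraction above can be sketched directly. The counter values below are hypothetical sample-window readings (in practice SHIM reads hardware performance counters), assuming SHIM (HT2) and the application (HT1) share one physical core:

```python
def ht1_ipc(core_insts, core_cycles, shim_insts):
    """Derive the application hyperthread's IPC from shared-core counters.

    The core's counters aggregate both hyperthreads, so subtracting the
    IPC contributed by SHIM's own retired instructions (HT2) isolates
    the application thread (HT1).
    """
    core_ipc = core_insts / core_cycles   # both hyperthreads combined
    shim_ipc = shim_insts / core_cycles   # SHIM's share of the core
    return core_ipc - shim_ipc

# Hypothetical window: the core retired 3M instructions in 1M cycles,
# 1M of which came from SHIM's polling loop.
print(ht1_ipc(3_000_000, 1_000_000, 1_000_000))  # 2.0
```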

SLIDE 14

What is in the tail?

[Chart: request latency CDF, with the tail region marked "?"]

SLIDE 15

The Tail: longest 200 requests

[Chart: latency breakdown of the 200 slowest requests: network & other, idle time, CPU work, and queuing at the worker]

  • Network imperfections (noise)
  • OS imperfections (noise)
  • Long requests (not noise)
  • Overload (not noise)

SLIDE 16

Optimizing the tail

Diagnosing the tail with continuous profiling:
  • Noise: systems are not perfect
  • Queuing: too much load is bad, but so is overprovisioning
  • Work: many requests are long

Insights:
  • Use the CDF off line
  • Long requests reveal themselves; treat them specially

SLIDE 17

Insight

Long requests reveal themselves

Regardless of the cause

SLIDE 18

Noise: replicate & reissue

The Tail at Scale, Dean & Barroso, CACM’13

[Chart: latency CDF; reissuing after a fixed issue time trims the noise tail, shown for 5% and 10% reissued]

  • All requests? Use the CDF to weigh cost & potential
  • Fixed issue time: 10% reissued vs. 5% reissued
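A toy simulation of the fixed-issue-time idea, under an assumed latency distribution (most requests are fast; ~5% hit a noise event): if a request is still outstanding at the issue time, send a duplicate and keep the earlier completion. All names and numbers here are illustrative, not from the talk.

```python
import random

def request_latency(rng):
    """Hypothetical service: uniform base latency, 5% chance of a noise hit."""
    base = rng.uniform(1.0, 5.0)            # ms, normal service time
    if rng.random() < 0.05:
        base += rng.uniform(50.0, 100.0)    # OS/network hiccup
    return base

def with_reissue(rng, issue_time_ms=8.0):
    """Reissue after a fixed delay; the earlier completion wins."""
    first = request_latency(rng)
    if first <= issue_time_ms:
        return first
    second = issue_time_ms + request_latency(rng)
    return min(first, second)

def p99(samples):
    return sorted(samples)[int(0.99 * len(samples))]

rng = random.Random(42)
plain = [request_latency(rng) for _ in range(10_000)]
reissued = [with_reissue(rng) for _ in range(10_000)]
print(f"p99 plain:    {p99(plain):.1f} ms")
print(f"p99 reissued: {p99(reissued):.1f} ms")   # reissue trims the tail
```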

SLIDE 19

Probabilistic reissue

Optimal Reissue Policies for Reducing Tail Latencies, Kaler, He, & Elnikety, SPAA’17

[Chart: latency CDF; reissuing 1-3% of requests with probability p beats 5% reissued at a fixed time]

  • Adding randomness to reissue makes a single, earlier reissue time d (vs. n times) optimal
  • The probability is proportional to the reissue budget & the noise in the tail
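The single-time policy can be sketched as follows (a hedged illustration, not the paper's exact algorithm): if a request is still outstanding at reissue time d, flip a coin with probability p, so the expected reissue rate stays within a small budget such as 1-3%.

```python
import random

def probabilistic_reissue(latency_fn, rng, d_ms=6.0, p=0.5):
    """Single reissue time d: if the request has not completed by d,
    reissue with probability p and keep the earlier completion.
    latency_fn(rng) -> latency of one (hypothetical) backend request."""
    first = latency_fn(rng)
    if first <= d_ms or rng.random() >= p:
        return first
    return min(first, d_ms + latency_fn(rng))

# Fast requests are never duplicated; a slow request may get a second
# chance that completes much earlier.
rng = random.Random(7)
print(probabilistic_reissue(lambda r: 3.0, rng))  # 3.0 (done before d)
slow_then_fast = iter([100.0, 1.0])
print(probabilistic_reissue(lambda r: next(slow_then_fast), rng, p=1.0))  # 7.0
```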

SLIDE 20

SingleR probabilistic reissue

Optimal Reissue Policies for Reducing Tail Latencies, Kaler, He, & Elnikety, SPAA’17

SLIDE 21

Work: speed up the tail efficiently

[Chart: latency CDF; the work component of the tail]

  • Judicious parallelism [ASPLOS’15]
  • DVFS faster on the tail [DISC’14, MICRO’17]
  • Asymmetric multicore [DISC’14, MICRO’17]

SLIDE 22

Work: parallelism

  • Parallelism historically for throughput
  • Idea: parallelism for tail latency

SLIDE 23

Queuing theory

  • Optimizing average latency maximizes throughput, but not the tail!
  • Shortening the tail reduces queuing latency
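Why shortening the tail reduces queuing: in an M/G/1 queue the mean wait grows with the second moment of the service time (Pollaczek-Khinchine), so a heavier service-time tail raises queuing delay even at the same mean. A toy FIFO simulation, with an assumed arrival rate and two illustrative service distributions of equal mean:

```python
import random

def mean_wait_fifo(service_fn, seed, arrival_rate=0.5, n=50_000):
    """Mean queuing wait at a single FIFO server with Poisson arrivals."""
    rng = random.Random(seed)
    t = 0.0          # arrival time of the current request
    free_at = 0.0    # when the server next becomes idle
    total_wait = 0.0
    for _ in range(n):
        t += rng.expovariate(arrival_rate)
        total_wait += max(0.0, free_at - t)
        free_at = max(free_at, t) + service_fn(rng)
    return total_wait / n

# Same mean service time (1.0), different tails.
uniform = mean_wait_fifo(lambda r: r.uniform(0.5, 1.5), seed=0)
tailed = mean_wait_fifo(
    lambda r: 5.5 if r.random() < 0.05 else (1.0 - 0.05 * 5.5) / 0.95, seed=0)
print(f"mean wait, short tail: {uniform:.2f}")
print(f"mean wait, long tail:  {tailed:.2f}")   # heavier tail waits longer
```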

SLIDE 24

Parallelism

  • Insight: long requests reveal themselves
  • Approach: incrementally add parallelism to long requests (the tail) based on request progress & load

SLIDE 25

Few to Many

[Chart: tail latency (ms) vs. Lucene RPS for Sequential, 4-way, and fixed intervals of 20, 100, and 500 ms]

  • Fixed: add a thread every d ms
    – a short delay is good at low load; a long delay is good at high load
  • Dynamic: use load
    – best at all loads
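A sketch of the fixed vs. dynamic interval idea, with an assumed load metric (fraction of capacity in use) and made-up constants: the longer a request runs, the more threads it earns, and the grant interval stretches as load rises.

```python
def parallelism_schedule(elapsed_ms, load, max_threads=4):
    """Threads a request should hold after running for elapsed_ms.

    Under light load we can afford to parallelize early (short interval);
    under heavy load spare cores are scarce, so the interval stretches.
    load: fraction of capacity in use, 0.0-1.0 (hypothetical metric).
    """
    interval_ms = 20.0 + 480.0 * load   # 20 ms at idle, 500 ms saturated
    threads = 1 + int(elapsed_ms / interval_ms)
    return min(threads, max_threads)

# A short request stays sequential; a long one ramps up, and it ramps
# faster when the system is lightly loaded.
print(parallelism_schedule(5.0, load=0.1))    # 1
print(parallelism_schedule(200.0, load=0.1))  # 3
print(parallelism_schedule(200.0, load=0.9))  # 1
```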

SLIDE 26

Evaluation: 2x8 64-bit 2.3 GHz Xeon, 64 GB

[Chart: tail latency (ms) vs. requests per second for Sequential, Few to Many, and Dynamic parallelism]

  • Reduces the tail by 28%
  • 21% fewer servers

SLIDE 27

Work: speed up the tail efficiently

[Chart: latency CDF; the work component of the tail]

  • Judicious parallelism [ASPLOS’15] ✔

SLIDE 28

Tail Latency

BOTH !

Efficiency

SLIDE 29

Efficiency at scale for interactive workloads

  • Diagnosing the tail with continuous profiling
  • Noise: replication; systems are not perfect
  • Queuing: replication + judicious choice
  • Work: judicious use of resources on long requests
  • The request latency CDF is a powerful tool
  • Tail efficiency ≠ average or throughput
  • Hardware heterogeneity

Questions?

SLIDE 30

Professional and Research Relationships

SLIDE 31

Your Academic Village

  • Peer students
  • Students senior & junior to you
  • Teaching assistants
  • PhD students
  • Faculty
SLIDE 32

My Professional Village

  • Researchers in all career stages
    – Undergrads, PhD students, post docs
    – Faculty, industrial researchers, staff, administrators
  • Industrial village
    – Software engineers in all career stages
    – Managers, directors, admins
    – In and out of my management chain

SLIDE 33

Faculty Mentors

  • Don Johnson: my professor
  • Ken Kennedy: PhD advisor
  • Dave Stemple: dept. chair

SLIDE 34

Building a Village

SLIDE 35

Networking is….

Building and sustaining professional relationships

  • Participating in an academic / research community
  • Finding people you like and learn from, and building a relationship

SLIDE 36

Networking is not….

  • Using people
  • A substitute for quality work
SLIDE 37

But I am Horrible at Small Talk

  • You have CS in common
  • Networking is not genetic
  • It is a research skill
    – Practice
    – Meet people
    – Learn
    – Go places
    – Volunteer!
    – Sustain your relationships

SLIDE 38

With whom do you network?

  • People you like
  • People senior to you, who can show you the way
  • People at different career stages, so you can anticipate what comes next
  • Your peers
SLIDE 39

Peer Mentors

Mary Hall Doug Burger Margaret Martonosi

SLIDE 40

Your Village Will

  • Write letters for grad school, jobs, etc.
  • Help you solve problems
  • Point you in good directions
  • Encourage you
  • Choose you for important roles
  • You will do the same or more for them
  • Make your life and work more fun and meaningful