
TailBench: A Benchmark Suite and Evaluation Methodology for Latency-Critical Applications



1. TailBench: A Benchmark Suite and Evaluation Methodology for Latency-Critical Applications
Harshad Kasture, Daniel Sanchez
IISWC 2016
tailbench.csail.mit.edu

2. Executive Summary
- Latency-critical applications have stringent performance requirements → low datacenter utilization
  - Wastes billions of dollars in energy and equipment annually
- Research in this area is hampered by the lack of a comprehensive benchmark suite
  - Few latency-critical applications → limited coverage
  - Complicated setup and configuration
  - Methodological issues → inaccurate latency measurements
- TailBench makes latency-critical applications easy to analyze
  - Varied application domains and latency characteristics
  - Standardized, statistically sound methodology
  - Supports simplified load-testing configurations

3. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

4. Understanding Latency-Critical Applications
[Diagram: a datacenter request fans out from clients through a root node to leaf nodes, each backed by several back ends]


7. Understanding Latency-Critical Applications
[Diagram: the same datacenter fan-out, with 1 ms latencies marked on the slowest responses]
- The few slowest responses determine user-perceived latency
- Tail latency (e.g., the 95th or 99th percentile), not mean latency, determines performance (see the sketch below)
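To make the percentile notion concrete, the following is a minimal C++ sketch of computing tail percentiles from measured request latencies. It is illustrative only, not TailBench harness code, and the sample values are invented.

    // Computing tail latency percentiles from measured request latencies.
    // Illustrative sketch; sample values are invented.
    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    double percentile(std::vector<double> lat, double p) {
        // Nearest-rank percentile: partially sort up to the target index.
        size_t idx = static_cast<size_t>(p * (lat.size() - 1));
        std::nth_element(lat.begin(), lat.begin() + idx, lat.end());
        return lat[idx];
    }

    int main() {
        // 1000 requests: 95% finish in 1 ms, 5% are 50 ms stragglers.
        std::vector<double> ms(1000, 1.0);
        for (int i = 0; i < 50; i++) ms[i] = 50.0;

        double mean = std::accumulate(ms.begin(), ms.end(), 0.0) / ms.size();
        std::printf("mean = %.2f ms, p95 = %.1f ms, p99 = %.1f ms\n",
                    mean, percentile(ms, 0.95), percentile(ms, 0.99));
        // Prints: mean = 3.45 ms, p95 = 1.0 ms, p99 = 50.0 ms.
        // The mean hides the stragglers that users actually experience.
    }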

8. Latency Requirements Cause Low Utilization
- End-to-end latency increases rapidly with load (illustrated below)
- Utilization must be kept low to keep latency within reasonable bounds
- Traditional resource management techniques (e.g., colocation) often cannot be used, since they degrade latency
- Low resource utilization wastes billions of dollars in energy and equipment
- This has sparked research in latency-critical systems
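The sharp rise is visible even in the simplest textbook model: in an M/M/1 queue, mean response time is 1/(mu - lambda), which diverges as the arrival rate lambda approaches the service rate mu. The sketch below is a standard queueing illustration, not a TailBench result.

    // M/M/1 illustration of latency blow-up near saturation.
    // Textbook model only; not a TailBench measurement.
    #include <cstdio>

    int main() {
        const double mu = 1000.0;  // service rate: 1000 req/s (1 ms per request)
        for (double util = 0.1; util < 1.0; util += 0.2) {
            double lambda = util * mu;             // arrival rate at this load
            double w_ms = 1000.0 / (mu - lambda);  // mean response time in ms
            std::printf("utilization %2.0f%% -> mean latency %5.1f ms\n",
                        util * 100, w_ms);
        }
        // 50% utilization costs 2 ms; 90% costs 10 ms. Real tail latencies
        // degrade even faster, forcing operators to run at low load.
    }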

9. Benchmark Suite Design Goals
- Applications from a diverse set of domains (e.g., key-value stores, translation)
- Applications with diverse tail latency characteristics, spanning 100 μs to 1 s (the timescales of DVFS, LLC warmup, and live VM migration)
- Easy to set up and run
- Support for different measurement scenarios
- Robust latency measurement methodology

10. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

11. TailBench Applications
- xapian: online search
- masstree: key-value store
- moses: statistical machine translation
- sphinx: speech recognition
- shore: on-disk database
- silo: in-memory database
- specjbb: Java middleware
- img-dnn: image recognition

12. Wide Range of End-to-End Latencies
[Plot: applications on a log latency axis from 100 μs to 1 s, ordered roughly silo, specjbb, masstree, shore, xapian, img-dnn, moses, sphinx from shortest to longest]

13. Varied Service Time Characteristics
- masstree service times are tightly distributed
- xapian service times are much more loosely distributed (one way to quantify this is sketched below)
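One common way to quantify tight versus loose service-time distributions is the coefficient of variation (standard deviation divided by mean). The sketch below uses invented sample data; the paper reports the measured distributions.

    // Coefficient of variation (CV) as a dispersion measure for service times.
    // Sample data is invented for illustration.
    #include <cmath>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    double cv(const std::vector<double>& v) {
        double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
        double var = 0.0;
        for (double x : v) var += (x - mean) * (x - mean);
        return std::sqrt(var / v.size()) / mean;  // stddev relative to mean
    }

    int main() {
        std::vector<double> tight = {1.0, 1.1, 0.9, 1.0, 1.05};  // masstree-like
        std::vector<double> loose = {0.2, 5.0, 1.0, 0.1, 9.0};   // xapian-like
        std::printf("tight CV = %.2f, loose CV = %.2f\n", cv(tight), cv(loose));
        // A low CV means service times cluster near the mean; a high CV means
        // request cost varies widely, which stresses the tail.
    }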

14. End-to-End Latency vs. Load

15. Tail ≠ Mean
- Tail latency increases more rapidly with load than mean latency
- The relationship between mean and tail latencies is hard to predict

16. Impact of Parallelism

17. Parallelism Helps Some Applications

18. …But Hurts Others

19. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

20. TailBench Harness
- Measuring tail latency accurately is complicated: load generation, statistics aggregation, warmup periods, ...
- The harness encapsulates most of this complexity
- The harness makes TailBench easily extensible: new benchmarks reuse existing harness functionality
- Simplified harness configurations enable different measurement scenarios, trading some accuracy for reduced setup complexity

21. Example: Open- vs. Closed-Loop Clients
[Diagram: requests flow from clients through the network to the application, queueing along the way]
- Many popular load testers use closed-loop clients: each client waits for a response before submitting its next request, so increased application load throttles the client request rate
- Latency-critical applications typically service a large number of independent clients, so the request rate is independent of application load; open-loop clients model this better (see the sketch below)
- Closed-loop clients can underestimate latency by orders of magnitude [Tene, LLS 2013; Zhang, ISCA 2016]
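The contrast is easiest to see side by side. In this hedged sketch, send_request() and wait_response() are hypothetical placeholders for the real client I/O, and the open-loop generator draws exponential inter-arrival times to model many independent clients.

    // Sketch: closed-loop vs. open-loop load generation.
    // send_request()/wait_response() are hypothetical stand-ins for real I/O.
    #include <chrono>
    #include <random>
    #include <thread>

    void send_request() { /* issue one request (placeholder) */ }
    void wait_response() { /* block until the response arrives (placeholder) */ }

    // Closed loop: the next request waits on the previous response, so a slow
    // server throttles the offered load and hides queueing-induced tail latency.
    void closed_loop(int n) {
        for (int i = 0; i < n; i++) {
            send_request();
            wait_response();  // request rate drops as the server slows down
        }
    }

    // Open loop: requests arrive on a Poisson schedule regardless of how the
    // server is doing, as with many independent clients; queues can build up.
    void open_loop(int n, double qps) {
        std::mt19937 rng(42);
        std::exponential_distribution<double> interarrival(qps);  // seconds
        for (int i = 0; i < n; i++) {
            send_request();  // do not wait for the response
            std::this_thread::sleep_for(
                std::chrono::duration<double>(interarrival(rng)));
        }
    }

    int main() { closed_loop(10); open_loop(10, 1000.0); }

In the TailBench harness, the Traffic Shaper described on the next slides plays a similar open-loop role by inserting inter-request delays.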

22. Networked Harness Configuration
[Diagram: each client has a Traffic Shaper and a TCP/IP stack; requests cross the network to the application, which has a Request Queue and a Statistics Collector]

23. Networked Harness Configuration
- The application and the clients run on separate machines
- The Traffic Shaper inserts inter-request delays to model the offered load
- The Request Queue enqueues incoming requests and measures service times and queueing delays (see the sketch below)
- The Statistics Collector aggregates latency data
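A minimal sketch of the Request Queue's bookkeeping, assuming each request is timestamped on arrival, on dispatch, and on completion so queueing delay and service time can be reported separately. It is simplified and single-threaded; the names are illustrative, not the actual harness code.

    // Sketch: separating queueing delay from service time, as the harness's
    // Request Queue does. Simplified and single-threaded; names are illustrative.
    #include <chrono>
    #include <cstdio>
    #include <queue>

    using Clock = std::chrono::steady_clock;

    struct Request { Clock::time_point arrival; };

    int main() {
        std::queue<Request> reqQueue;
        reqQueue.push({Clock::now()});  // request arrives and is enqueued

        Request r = reqQueue.front(); reqQueue.pop();
        auto start = Clock::now();      // dequeued: service begins
        // ... the application would process the request here ...
        auto end = Clock::now();        // response ready

        std::chrono::duration<double, std::milli> queueing = start - r.arrival;
        std::chrono::duration<double, std::milli> service = end - start;
        // Sojourn time (what the client sees, minus network) = queueing + service.
        std::printf("queueing %.3f ms + service %.3f ms = sojourn %.3f ms\n",
                    queueing.count(), service.count(),
                    queueing.count() + service.count());
    }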


27. Networked Harness Configuration
✓ Faithfully captures all sources of overhead
✗ Difficult to configure and deploy

28. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

29. Loopback Harness Configuration
[Diagram: client and application on the same machine, communicating over the TCP/IP loopback interface]
- The application and clients reside on the same machine
✓ Reduced setup complexity
✓ Highly accurate in many cases
✗ Difficult to simulate

30. Load-Latency for the Networked Configuration

31. Loopback Configuration Highly Accurate
- The Loopback and Networked configurations show near-identical performance
- Networking delays are minimal in our setup

32. Loopback Harness Configuration
✗ Still difficult to simulate

33. Integrated Harness Configuration
[Diagram: application and client combined into a single process]
- The application and client are integrated into a single process (see the sketch below)
✓ Easy to set up
✗ Some loss of accuracy
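A minimal sketch of the integrated idea, assuming the client runs as a thread inside the application's process and requests flow through an in-memory queue instead of TCP/IP. It is illustrative only; the real harness is more elaborate.

    // Sketch: integrated configuration -- client and server in one process,
    // communicating through an in-memory queue instead of TCP/IP.
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::queue<int> q;
    std::mutex m;
    std::condition_variable cond;

    void client() {                       // generates requests in-process
        for (int i = 0; i < 3; i++) {
            { std::lock_guard<std::mutex> lk(m); q.push(i); }
            cond.notify_one();
        }
    }

    void server() {                       // services requests; no network stack
        for (int served = 0; served < 3; served++) {
            std::unique_lock<std::mutex> lk(m);
            cond.wait(lk, [] { return !q.empty(); });
            int req = q.front(); q.pop();
            lk.unlock();
            std::printf("served request %d\n", req);
        }
    }

    int main() {
        std::thread s(server), c(client);
        c.join(); s.join();
    }

With no sockets and no second process, the whole benchmark is a single user-level program, which is what makes it practical to run inside simulators.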

34. Integrated Configuration Validation
[Plot: latency vs. load under each configuration; labeled differences of 39% and 23%]
- Networked/Loopback configurations saturate earlier for applications with short requests (silo, specjbb)
- TCP/IP processing overhead is a significant fraction of request service time for these applications

35. Integrated Harness Configuration
✓ Enables user-level simulations

36. Simulation vs. Real System
[Plot: real vs. simulated performance; labeled differences of 16%, 32%, 20%, 16%, and 31%]
- The performance difference between real and simulated systems is well within usual simulation error bounds
- Average absolute error in saturation QPS: 14%
- For reference, zsim's IPC error for SPEC CPU2006 applications is 8.5% to 21%

37. Conclusions
- TailBench includes a diverse set of latency-critical applications with varied latency characteristics
- The TailBench harness implements a statistically sound experimental methodology to achieve accurate results
- Various harness configurations allow trading off configuration complexity for some accuracy
- Our results show that the integrated configuration is highly accurate for six of our eight benchmarks

38. Thanks for your attention! Questions? tailbench.csail.mit.edu

