
TailBench: A Benchmark Suite and Evaluation Methodology for Latency-Critical Applications



1. TailBench: A Benchmark Suite and Evaluation Methodology for Latency-Critical Applications
Harshad Kasture, Daniel Sanchez
IISWC 2016
tailbench.csail.mit.edu

2. Executive Summary
- Latency-critical applications have stringent performance requirements → low datacenter utilization
  - Wastes billions of dollars in energy and equipment annually
- Research in this area is hampered by the lack of a comprehensive benchmark suite
  - Few latency-critical applications → limited coverage
  - Complicated setup and configuration
  - Methodological issues → inaccurate latency measurements
- TailBench makes latency-critical applications easy to analyze
  - Varied application domains and latency characteristics
  - Standardized, statistically sound methodology
  - Supports simplified load-testing configurations

3. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

4. Understanding Latency-Critical Applications
[Diagram: a datacenter request fans out from clients through a root node to leaf nodes, each backed by several back ends]


7. Understanding Latency-Critical Applications
[Diagram: the same datacenter fan-out, with 1 ms latencies marked on the slowest responses]
- The few slowest responses determine user-perceived latency
- Tail latency (e.g., the 95th or 99th percentile), not mean latency, determines performance (see the sketch below)
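To make the percentile notion concrete, the following is a minimal C++ sketch of computing tail percentiles from measured request latencies. It is illustrative only, not TailBench harness code, and the sample values are invented.

    // Computing tail latency percentiles from measured request latencies.
    // Illustrative sketch; sample values are invented.
    #include <algorithm>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    double percentile(std::vector<double> lat, double p) {
        // Nearest-rank percentile: partially sort up to the target index.
        size_t idx = static_cast<size_t>(p * (lat.size() - 1));
        std::nth_element(lat.begin(), lat.begin() + idx, lat.end());
        return lat[idx];
    }

    int main() {
        // 1000 requests: 95% finish in 1 ms, 5% are 50 ms stragglers.
        std::vector<double> ms(1000, 1.0);
        for (int i = 0; i < 50; i++) ms[i] = 50.0;

        double mean = std::accumulate(ms.begin(), ms.end(), 0.0) / ms.size();
        std::printf("mean = %.2f ms, p95 = %.1f ms, p99 = %.1f ms\n",
                    mean, percentile(ms, 0.95), percentile(ms, 0.99));
        // Prints: mean = 3.45 ms, p95 = 1.0 ms, p99 = 50.0 ms.
        // The mean hides the stragglers that users actually experience.
    }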

8. Latency Requirements Cause Low Utilization
- End-to-end latency increases rapidly with load (illustrated below)
- Utilization must be kept low to keep latency within reasonable bounds
- Traditional resource management techniques (e.g., colocation) often cannot be used, since they degrade latency
- Low resource utilization wastes billions of dollars in energy and equipment
- This has sparked research in latency-critical systems
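The sharp rise is visible even in the simplest textbook model: in an M/M/1 queue, mean response time is 1/(mu - lambda), which diverges as the arrival rate lambda approaches the service rate mu. The sketch below is a standard queueing illustration, not a TailBench result.

    // M/M/1 illustration of latency blow-up near saturation.
    // Textbook model only; not a TailBench measurement.
    #include <cstdio>

    int main() {
        const double mu = 1000.0;  // service rate: 1000 req/s (1 ms per request)
        for (double util = 0.1; util < 1.0; util += 0.2) {
            double lambda = util * mu;             // arrival rate at this load
            double w_ms = 1000.0 / (mu - lambda);  // mean response time in ms
            std::printf("utilization %2.0f%% -> mean latency %5.1f ms\n",
                        util * 100, w_ms);
        }
        // 50% utilization costs 2 ms; 90% costs 10 ms. Real tail latencies
        // degrade even faster, forcing operators to run at low load.
    }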

9. Benchmark Suite Design Goals
- Applications from a diverse set of domains (e.g., key-value stores, translation)
- Applications with diverse tail latency characteristics, spanning 100 μs to 1 s (the timescales of DVFS, LLC warmup, and live VM migration)
- Easy to set up and run
- Support for different measurement scenarios
- Robust latency measurement methodology

10. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

11. TailBench Applications
- xapian: online search
- masstree: key-value store
- moses: statistical machine translation
- sphinx: speech recognition
- shore: on-disk database
- silo: in-memory database
- specjbb: Java middleware
- img-dnn: image recognition

12. Wide Range of End-to-End Latencies
[Plot: applications on a log latency axis from 100 μs to 1 s, ordered roughly silo, specjbb, masstree, shore, xapian, img-dnn, moses, sphinx from shortest to longest]

13. Varied Service Time Characteristics
- masstree service times are tightly distributed
- xapian service times are much more loosely distributed (one way to quantify this is sketched below)
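One common way to quantify tight versus loose service-time distributions is the coefficient of variation (standard deviation divided by mean). The sketch below uses invented sample data; the paper reports the measured distributions.

    // Coefficient of variation (CV) as a dispersion measure for service times.
    // Sample data is invented for illustration.
    #include <cmath>
    #include <cstdio>
    #include <numeric>
    #include <vector>

    double cv(const std::vector<double>& v) {
        double mean = std::accumulate(v.begin(), v.end(), 0.0) / v.size();
        double var = 0.0;
        for (double x : v) var += (x - mean) * (x - mean);
        return std::sqrt(var / v.size()) / mean;  // stddev relative to mean
    }

    int main() {
        std::vector<double> tight = {1.0, 1.1, 0.9, 1.0, 1.05};  // masstree-like
        std::vector<double> loose = {0.2, 5.0, 1.0, 0.1, 9.0};   // xapian-like
        std::printf("tight CV = %.2f, loose CV = %.2f\n", cv(tight), cv(loose));
        // A low CV means service times cluster near the mean; a high CV means
        // request cost varies widely, which stresses the tail.
    }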

14. End-to-End Latency vs. Load

15. Tail ≠ Mean
- Tail latency increases more rapidly with load than mean latency
- The relationship between mean and tail latencies is hard to predict

16. Impact of Parallelism

17. Parallelism Helps Some Applications

18. …But Hurts Others

19. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

20. TailBench Harness
- Measuring tail latency accurately is complicated: load generation, statistics aggregation, warmup periods, ...
- The harness encapsulates most of this complexity
- The harness makes TailBench easily extensible: new benchmarks reuse existing harness functionality
- Simplified harness configurations enable different measurement scenarios, trading some accuracy for reduced setup complexity

21. Example: Open- vs. Closed-Loop Clients
[Diagram: requests flow from clients through the network to the application, queueing along the way]
- Many popular load testers use closed-loop clients: each client waits for a response before submitting its next request, so increased application load throttles the client request rate
- Latency-critical applications typically service a large number of independent clients, so the request rate is independent of application load; open-loop clients model this better (see the sketch below)
- Closed-loop clients can underestimate latency by orders of magnitude [Tene, LLS 2013; Zhang, ISCA 2016]
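The contrast is easiest to see side by side. In this hedged sketch, send_request() and wait_response() are hypothetical placeholders for the real client I/O, and the open-loop generator draws exponential inter-arrival times to model many independent clients.

    // Sketch: closed-loop vs. open-loop load generation.
    // send_request()/wait_response() are hypothetical stand-ins for real I/O.
    #include <chrono>
    #include <random>
    #include <thread>

    void send_request() { /* issue one request (placeholder) */ }
    void wait_response() { /* block until the response arrives (placeholder) */ }

    // Closed loop: the next request waits on the previous response, so a slow
    // server throttles the offered load and hides queueing-induced tail latency.
    void closed_loop(int n) {
        for (int i = 0; i < n; i++) {
            send_request();
            wait_response();  // request rate drops as the server slows down
        }
    }

    // Open loop: requests arrive on a Poisson schedule regardless of how the
    // server is doing, as with many independent clients; queues can build up.
    void open_loop(int n, double qps) {
        std::mt19937 rng(42);
        std::exponential_distribution<double> interarrival(qps);  // seconds
        for (int i = 0; i < n; i++) {
            send_request();  // do not wait for the response
            std::this_thread::sleep_for(
                std::chrono::duration<double>(interarrival(rng)));
        }
    }

    int main() { closed_loop(10); open_loop(10, 1000.0); }

In the TailBench harness, the Traffic Shaper described on the next slides plays a similar open-loop role by inserting inter-request delays.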

22. Networked Harness Configuration
[Diagram: each client has a Traffic Shaper and a TCP/IP stack; requests cross the network to the application, which has a Request Queue and a Statistics Collector]

23. Networked Harness Configuration
- The application and the clients run on separate machines
- The Traffic Shaper inserts inter-request delays to model the offered load
- The Request Queue enqueues incoming requests and measures service times and queueing delays (see the sketch below)
- The Statistics Collector aggregates latency data
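A minimal sketch of the Request Queue's bookkeeping, assuming each request is timestamped on arrival, on dispatch, and on completion so queueing delay and service time can be reported separately. It is simplified and single-threaded; the names are illustrative, not the actual harness code.

    // Sketch: separating queueing delay from service time, as the harness's
    // Request Queue does. Simplified and single-threaded; names are illustrative.
    #include <chrono>
    #include <cstdio>
    #include <queue>

    using Clock = std::chrono::steady_clock;

    struct Request { Clock::time_point arrival; };

    int main() {
        std::queue<Request> reqQueue;
        reqQueue.push({Clock::now()});  // request arrives and is enqueued

        Request r = reqQueue.front(); reqQueue.pop();
        auto start = Clock::now();      // dequeued: service begins
        // ... the application would process the request here ...
        auto end = Clock::now();        // response ready

        std::chrono::duration<double, std::milli> queueing = start - r.arrival;
        std::chrono::duration<double, std::milli> service = end - start;
        // Sojourn time (what the client sees, minus network) = queueing + service.
        std::printf("queueing %.3f ms + service %.3f ms = sojourn %.3f ms\n",
                    queueing.count(), service.count(),
                    queueing.count() + service.count());
    }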


27. Networked Harness Configuration
✓ Faithfully captures all sources of overhead
✗ Difficult to configure and deploy

28. Outline
- Background and Motivation
- TailBench Applications
- TailBench Harness
- Simplified Configurations

29. Loopback Harness Configuration
[Diagram: client and application on the same machine, communicating over the TCP/IP loopback interface]
- The application and clients reside on the same machine
✓ Reduced setup complexity
✓ Highly accurate in many cases
✗ Difficult to simulate

30. Load-Latency for the Networked Configuration

31. Loopback Configuration Highly Accurate
- The Loopback and Networked configurations show near-identical performance
- Networking delays are minimal in our setup

32. Loopback Harness Configuration
✗ Still difficult to simulate

33. Integrated Harness Configuration
[Diagram: application and client combined into a single process]
- The application and client are integrated into a single process (see the sketch below)
✓ Easy to set up
✗ Some loss of accuracy
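A minimal sketch of the integrated idea, assuming the client runs as a thread inside the application's process and requests flow through an in-memory queue instead of TCP/IP. It is illustrative only; the real harness is more elaborate.

    // Sketch: integrated configuration -- client and server in one process,
    // communicating through an in-memory queue instead of TCP/IP.
    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <queue>
    #include <thread>

    std::queue<int> q;
    std::mutex m;
    std::condition_variable cond;

    void client() {                       // generates requests in-process
        for (int i = 0; i < 3; i++) {
            { std::lock_guard<std::mutex> lk(m); q.push(i); }
            cond.notify_one();
        }
    }

    void server() {                       // services requests; no network stack
        for (int served = 0; served < 3; served++) {
            std::unique_lock<std::mutex> lk(m);
            cond.wait(lk, [] { return !q.empty(); });
            int req = q.front(); q.pop();
            lk.unlock();
            std::printf("served request %d\n", req);
        }
    }

    int main() {
        std::thread s(server), c(client);
        c.join(); s.join();
    }

With no sockets and no second process, the whole benchmark is a single user-level program, which is what makes it practical to run inside simulators.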

34. Integrated Configuration Validation
[Plot: latency vs. load under each configuration; labeled differences of 39% and 23%]
- Networked/Loopback configurations saturate earlier for applications with short requests (silo, specjbb)
- TCP/IP processing overhead is a significant fraction of request service time for these applications

35. Integrated Harness Configuration
✓ Enables user-level simulations

36. Simulation vs. Real System
[Plot: real vs. simulated performance; labeled differences of 16%, 32%, 20%, 16%, and 31%]
- The performance difference between real and simulated systems is well within usual simulation error bounds
- Average absolute error in saturation QPS: 14%
- For reference, zsim's IPC error for SPEC CPU2006 applications is 8.5% to 21%

37. Conclusions
- TailBench includes a diverse set of latency-critical applications with varied latency characteristics
- The TailBench harness implements a statistically sound experimental methodology to achieve accurate results
- Various harness configurations allow trading off configuration complexity for some accuracy
- Our results show that the integrated configuration is highly accurate for six of our eight benchmarks

38. Thanks for your attention! Questions? tailbench.csail.mit.edu

