C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection (PPT Presentation)


  1. C3: Cutting Tail Latency in Cloud Data Stores via Adaptive Replica Selection. Lalith Suresh (TU Berlin), with Marco Canini (UCL), Stefan Schmid, Anja Feldmann (TU Berlin)

  2. Tail-latency matters: one user request fans out to tens to thousands of data accesses.

  3. Tail-latency matters: one user request fans out to tens to thousands of data accesses. With 100 leaf servers, the 99th percentile latency of a single leaf will be reflected in 63% of user requests!
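
A quick check of the 63% figure, assuming each of the 100 leaves independently exceeds its 99th percentile with probability 0.01:

```latex
P(\text{request is slow}) = 1 - (0.99)^{100} \approx 1 - 0.366 \approx 0.63
```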

  4. Server performance fluctuations are the norm: resource contention, queueing delays, background activities, skewed access patterns. [Figure: CDF]

  5. How effective is replica selection in reducing tail latency? [Diagram: a client choosing which of three replica servers should serve its request]

  6. Replica Selection Challenges

  7. Replica Selection Challenges • Service-time variations [Diagram: a client's request may hit replica servers with service times of 4 ms, 5 ms, or 30 ms]

  8. Replica Selection Challenges • Herd behavior and load oscillations [Diagram: several clients simultaneously sending their requests to the same server]

  9. Impact of Replica Selection in Practice? Dynamic Snitching: uses the history of read latencies and I/O load for replica selection

  10. Experimental Setup • Cassandra cluster on Amazon EC2 • 15 nodes, m1.xlarge instances • Read-heavy workload with YCSB (120 threads) • 500M 1KB records (larger than memory) • Zipfian key access pattern

  11. Cassandra Load Profile

  12. Cassandra Load Profile. Also observed that the 99.9th percentile latency is ~10x the median latency

  13. Load Conditioning in our Approach

  14. C3: an adaptive replica selection mechanism that is robust to service-time heterogeneity

  15. C3 • Replica Ranking • Distributed Rate Control

  16. C3 • Replica Ranking • Distributed Rate Control

  17. [Diagram: three clients issuing requests to two servers, one with service time µ⁻¹ = 2 ms and one with µ⁻¹ = 6 ms]

  18. Balance the product of queue size and service time, q · µ⁻¹, across servers. [Diagram: the same two servers with µ⁻¹ = 2 ms and µ⁻¹ = 6 ms]
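
A minimal sketch of this ranking rule in Python, assuming per-replica estimates of queue size and service time are already at hand (the dict layout and names are illustrative, not Cassandra's API):

```python
def rank_replicas(replicas):
    """Order replicas by the product of estimated queue size and
    estimated service time, q * mu^-1; lower scores are better."""
    return sorted(replicas, key=lambda r: r["q"] * r["mu_inv"])

# Example with the two servers from the slide (service times in seconds):
replicas = [
    {"name": "fast", "q": 10, "mu_inv": 0.002},  # score = 10 * 2 ms = 20 ms
    {"name": "slow", "q": 2,  "mu_inv": 0.006},  # score =  2 * 6 ms = 12 ms
]
print(rank_replicas(replicas)[0]["name"])  # -> "slow": the short queue outweighs the slower service
```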

  19. Server-side Feedback: servers piggyback {q_s} (queue size) and {µ̄_s⁻¹} (service-time estimate) in every response. [Diagram: a server returning {q_s, µ̄_s⁻¹} alongside its response to the client]

  20. Server-side Feedback: servers piggyback {q_s} and {µ̄_s⁻¹} in every response • Concurrency compensation

  21. Server-side Feedback: servers piggyback {q_s} and {µ̄_s⁻¹} in every response • Concurrency compensation: q̂_s = 1 + os_s · w + q_s, where os_s is the client's number of outstanding requests to server s and q_s is the queue-size feedback
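
A one-function sketch of the concurrency compensation above; `w` is the compensation weight from the slide's formula, and the argument names are illustrative:

```python
def estimated_queue_size(outstanding, q_feedback, w):
    """Concurrency compensation: q_hat_s = 1 + os_s * w + q_s, where
    os_s is this client's in-flight requests to server s and q_s is the
    queue size the server piggybacked on its last response."""
    return 1 + outstanding * w + q_feedback

# E.g. 3 requests in flight, weight 2, last reported queue size 4 -> q_hat = 11
print(estimated_queue_size(outstanding=3, q_feedback=4, w=2))
```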

  22. Select the server with min q̂_s · µ̄_s⁻¹

  23. Select the server with min q̂_s · µ̄_s⁻¹ • Potentially long queue sizes: a server with µ⁻¹ = 4 ms must accumulate 100 requests before it scores as badly as a server with µ⁻¹ = 20 ms holding 20 requests • What if a GC pause happens?

  24. Penalizing Long Queues: select the server with min (q̂_s)^b · µ̄_s⁻¹, with b = 3. The µ⁻¹ = 4 ms server now balances at ~35 requests against 20 requests at the µ⁻¹ = 20 ms server
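
A small sketch that reproduces the numbers on slides 23 and 24, assuming the penalized score is (q̂_s)^b · µ̄_s⁻¹ (the helper name is illustrative):

```python
def score(q_hat, mu_inv, b=3):
    """Penalize long queues by raising the estimated queue size to the
    power b before weighting by service time; lower is better."""
    return (q_hat ** b) * mu_inv

# With b = 1, a 4 ms server must reach ~100 queued requests before it looks
# as loaded as a 20 ms server holding 20 requests (0.4 vs 0.4):
print(score(100, 0.004, b=1), score(20, 0.020, b=1))
# With b = 3, the balance point drops to roughly 35 requests (~171.5 vs 160):
print(score(35, 0.004, b=3), score(20, 0.020, b=3))
```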

  25. C3 • Replica Ranking • Distributed Rate Control

  26. Need for rate control: replica ranking alone is insufficient • How do we avoid saturating individual servers? • What about non-internal sources of performance fluctuations?

  27. Cubic Rate Control • Clients adjust their sending rates according to a cubic function • If the receive rate isn't increasing further, decrease multiplicatively
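
A minimal CUBIC-style sketch of the rate adaptation described on this slide; the constants, method names, and exact growth curve are assumptions for illustration, not C3's published parameters:

```python
import time

class CubicRateControl:
    """Illustrative cubic rate adaptation: grow the send rate along a cubic
    curve anchored at the rate where the last decrease happened, and
    multiplicatively decrease when the receive rate stops improving."""

    def __init__(self, rate=1000.0, beta=0.2, gamma=4.0):
        self.rate = rate              # current permitted send rate (req/s)
        self.rate_max = rate          # rate at the last decrease event
        self.beta = beta              # multiplicative-decrease factor (assumed)
        self.gamma = gamma            # cubic growth aggressiveness (assumed)
        self.last_decrease = time.monotonic()

    def on_no_improvement(self):
        """Receive rate is not increasing further: back off multiplicatively."""
        self.rate_max = self.rate
        self.rate = self.rate * (1.0 - self.beta)
        self.last_decrease = time.monotonic()

    def on_tick(self):
        """Periodic update: cubic growth that flattens near rate_max and then
        probes beyond it, as in CUBIC congestion control."""
        t = time.monotonic() - self.last_decrease
        k = (self.rate_max * self.beta / self.gamma) ** (1.0 / 3.0)
        self.rate = self.gamma * (t - k) ** 3 + self.rate_max
```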

  28. Putting everything together: the C3 client's scheduler sorts replicas by score and forwards requests through per-server rate limiters (e.g., 1000 req/s and 2000 req/s in the figure), driven by the feedback piggybacked from the replica group. [Diagram: C3 client with scheduler and rate limiters in front of the replica group]
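
A rough sketch of the send path on this slide, combining the two components: rank the replica group by score, then respect per-server rate limiters. The `SimpleRateLimiter` class and `pick_replica` helper are assumptions for illustration, not C3's or Cassandra's actual scheduler:

```python
class SimpleRateLimiter:
    """Toy stand-in for a per-server rate limiter (permits would be refilled
    elsewhere, e.g. by a rate controller like the cubic sketch above)."""
    def __init__(self, permits):
        self.permits = permits

    def try_acquire(self):
        if self.permits > 0:
            self.permits -= 1
            return True
        return False

def pick_replica(replica_group, scores, limiters):
    """Sort replicas by their ranking score (lower is better) and send to
    the best-ranked replica whose rate limiter still has capacity."""
    for server in sorted(replica_group, key=lambda s: scores[s]):
        if limiters[server].try_acquire():
            return server
    return None  # every replica is currently rate-limited; caller backs off

# Example: server "b" scores best but is out of permits, so "a" is chosen.
limiters = {"a": SimpleRateLimiter(1), "b": SimpleRateLimiter(0)}
print(pick_replica(["a", "b"], scores={"a": 0.4, "b": 0.1}, limiters=limiters))
```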

  29. Implementation in Cassandra. Details in the paper!

  30. Evaluation: Amazon EC2, controlled testbed, simulations

  31. Evaluation on Amazon EC2 • 15-node Cassandra cluster (m1.xlarge instances) • Workloads generated using YCSB (120 threads): read-heavy, update-heavy, read-only • 500M 1KB records dataset (larger than memory) • Compared against Cassandra's Dynamic Snitching (DS)

  32. Lower is better

  33. 2x – 3x improvement in 99.9th percentile latencies. Also improves median and mean latencies

  34. 2x – 3x improvement in 99.9th percentile latencies; 26% – 43% improvement in throughput

  35. Takeaway: C3 does not trade off throughput for latency

  36. How does C3 react to dynamic workload changes? • Begin with 80 read-heavy workload generators • 40 update-heavy generators join the system after 640 s • Observe the latency profile with and without C3

  37. Latency profile degrades gracefully with C3. Takeaway: C3 reacts effectively to dynamic workloads

  38. Summary of other results: under higher system load, with skewed record sizes, and with SSDs instead of HDDs, C3 delivers > 3x better 99.9th percentile latency and 50% higher throughput than with DS

  39. Ongoing work • Tests at SoundCloud and Spotify • Stability analysis of C3 • Alternative rate adaptation algorithms • Token-aware Cassandra clients

  40. Summary: C3 = Replica Ranking + Distributed Rate Control. [Diagram: a client selecting among replica servers]
