  1. IN-MEMORY CACHING: CURB TAIL LATENCY WITH PELIKAN

  2. ABOUT ME
  • 6 years at Twitter, on cache
  • maintainer of Twemcache & Twitter’s Redis fork
  • operations of thousands of machines
  • hundreds of (internal) customers
  • now working on Pelikan, a next-gen cache framework to replace the above @twitter
  • Twitter: @thinkingfish

  3. THE PROBLEM: CACHE PERFORMANCE

  4. CACHE RULES EVERYTHING AROUND ME
  [diagram: SERVICE → CACHE → DB]

  5. 😤 CACHE RUINS EVERYTHING AROUND ME 😤
  [diagram: SERVICE → CACHE → DB]

  6. LATENCY & FANOUT
  • req: all tweets for #qcon ⇒ tid 1, tid 2, …, tid n (assume n is large)
  • what determines the overall 99%-ile of req?
  [diagram: SERVICE fanning out to many CACHE shards]
  fanout | per-shard percentile that governs the overall p99
       1 | p99
      10 | p99.9
     100 | p99.99
    1000 | p99.999
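
  The table follows from basic probability. A sketch of the reasoning, assuming the fanned-out request completes only when its slowest shard responds and that shard latencies are roughly independent:

  \[
  P(\text{overall} \le t) = P(\text{shard} \le t)^{n}
  \quad\Rightarrow\quad
  P(\text{shard} \le t) = 0.99^{1/n} \approx 1 - \frac{0.01}{n}
  \]

  For n = 10 the overall p99 is governed by roughly the per-shard p99.9, for n = 100 by the p99.99, and for n = 1000 by the p99.999, which is the table above.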

  7. LATENCY & DEPENDENCY
  • what determines the overall 99%-ile?
  • adding all latencies together
  • N steps ⇒ N × exposure to tail latency
  [diagram: SERVICE A — get timeline → SERVICE B — get tweets, get users for each tweet → SERVICE C]

  8. CACHE IS UBIQUITOUS
  • exposure to cache tail latency increases with both scale and dependency!
  [diagram: SERVICE A, B, C each fronted by multiple CACHE A, B, C instances]

  9. GOOD CACHE PERFORMANCE = PREDICTABLE LATENCY

  10. GOOD CACHE PERFORMANCE = PREDICTABLE TAIL LATENCY

  11. KING OF PERFORMANCE “MILLIONS OF QPS PER MACHINE” “SUB-MILLISECOND LATENCIES” “NEAR LINE-RATE THROUGHPUT” …

  12. GHOSTS OF PERFORMANCE “USUALLY PRETTY FAST” “HICCUPS EVERY ONCE IN A WHILE” “TIMEOUT SPIKES AT THE TOP OF THE HOUR” “SLOW ONLY WHEN MEMORY IS LOW” …

  13. I SPENT THE FIRST 3 MONTHS AT TWITTER LEARNING CACHE BASICS… …AND THE NEXT 5 YEARS CHASING GHOSTS

  14. CHAINING DOWN GHOSTS = MINIMIZE NONDETERMINISTIC BEHAVIOR

  15. HOW? IDENTIFY AVOID MITIGATE

  16. A PRIMER: CACHING IN THE DATACENTER

  17. DATACENTER
  • geographically centralized
  • highly homogeneous network
  • relatively reliable infrastructure

  18. CACHING
  MAINLY: REQUEST → RESPONSE
  INITIALLY: CONNECT
  ALSO (BECAUSE WE ARE GROWN-UPS): STATS, LOGGING, HEALTH CHECK…

  19. CACHE SERVER: BIRD’S-EYE VIEW
  [diagram: layered view of a cache host — data (protocol, storage) on top of an event-driven server, on top of the OS and the host’s network infrastructure]

  20. HOW DID WE UNCOVER THE UNCERTAINTIES?

  21. “BANDWIDTH UTILIZATION WENT WAY UP, EVEN THOUGH REQUEST RATE WAS WAY LOWER.”

  22. SYSCALLS

  23. CONNECTING IS SYSCALL-HEAVY
  accept → config → register event → read: 4+ syscalls per new connection
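
  A minimal sketch of the connect path counted above, assuming a non-blocking, epoll-based server; this is illustrative, not Pelikan’s actual code:

    #define _GNU_SOURCE /* for accept4 */
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* each new connection costs at least 4 syscalls before it can be served */
    static int conn_accept(int epfd, int listen_fd)
    {
        int one = 1;
        char buf[1024];
        struct epoll_event ev = { .events = EPOLLIN };

        int fd = accept4(listen_fd, NULL, NULL, SOCK_NONBLOCK);      /* 1: accept */
        if (fd < 0) {
            return -1;
        }
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one)); /* 2: config */
        ev.data.fd = fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);                     /* 3: register event */
        (void)read(fd, buf, sizeof(buf));                            /* 4: first read */
        return fd;
    }

  During a connection storm this whole sequence runs once per incoming connection, which is why connect-heavy periods show up as syscall (CPU and bandwidth) spikes.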

  24. REQUEST IS SYSCALL-LIGHT
  read event → read → parse → process → compose → write → write event: 3 syscalls*
  *: the event loop returns multiple read events at once, and I/O syscalls can be further amortized by batching/pipelining
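
  A minimal sketch of that per-request path: one epoll_wait amortized over many ready connections, then one read and one write per connection, with parse/process/compose as pure user-space work. Names, buffer sizes, and the handle_request helper are illustrative assumptions, not Pelikan’s actual code:

    #include <string.h>
    #include <sys/epoll.h>
    #include <unistd.h>

    #define MAX_EVENTS 1024

    /* hypothetical stand-in for parse/process/compose: echo the request back */
    static ssize_t handle_request(const char *req, ssize_t len, char *rsp, size_t cap)
    {
        size_t n = (size_t)len < cap ? (size_t)len : cap;
        memcpy(rsp, req, n);
        return (ssize_t)n;
    }

    static void worker_loop(int epfd)
    {
        struct epoll_event evs[MAX_EVENTS];
        char req[16 * 1024], rsp[16 * 1024];

        for (;;) {
            /* one epoll_wait is amortized over every connection it reports */
            int n = epoll_wait(epfd, evs, MAX_EVENTS, -1);
            for (int i = 0; i < n; i++) {
                int fd = evs[i].data.fd;
                ssize_t len = read(fd, req, sizeof(req));     /* 1 syscall */
                if (len <= 0) {
                    close(fd);
                    continue;
                }
                ssize_t rsp_len = handle_request(req, len, rsp, sizeof(rsp));
                if (rsp_len > 0) {
                    write(fd, rsp, rsp_len);                  /* 1 syscall */
                }
            }
        }
    }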

  25. TWEMCACHE IS MOSTLY SYSCALLS
  • 1–2 µs of overhead per syscall
  • syscalls dominate CPU time in a simple cache
  • what if we have 100k conns/sec? at 4+ syscalls each, that is roughly 0.4–0.8 s of CPU per second spent just setting up connections (source)

  26. culprit: CONNECTION STORM

  27. “…TWEMCACHE RANDOM HICCUPS, ALWAYS AT THE TOP OF THE HOUR.”

  28. [diagram: cache worker thread ⏱ blocked on logging disk I/O while a cron job runs]

  29. culprit: BLOCKING I/O

  30. “WE ARE SEEING SEVERAL “BLIPS” AFTER EACH CACHE REBOOT…”

  31. A TIMELINE
  MEMCACHE RESTART …
  lock! → MANY REQUESTS TIMED OUT
  lock! → CONNECTION STORM → SOME MORE REQUESTS TIMED OUT
  (REPEAT A FEW TIMES)

  32. culprit: LOCKING

  33. LOCKING FACTS
  • ~25 ns per lock operation
  • more expensive on NUMA
  • much more costly when contended (source)

  34. “HOSTS WITH LONG-RUNNING TWEMCACHE/REDIS TRIGGER OOM DURING LOAD SPIKES.”

  35. “REDIS INSTANCES THAT STARTED EVICTING SUDDENLY GOT SLOWER.”

  36. culprit: MEMORY LAYOUT / OPS

  37. SUMMARY
  CONNECTION STORM
  BLOCKING I/O
  LOCKING
  MEMORY

  38. HOW TO MITIGATE?

  39. HIDE EXPENSIVE OPS PUT OPERATIONS OF DIFFERENT NATURE / PURPOSE ON SEPARATE THREADS

  40. DATA PLANE, CONTROL PLANE

  41. SLOW: CONTROL PLANE
  STATS AGGREGATION
  STATS EXPORTING
  LOG DUMP
  LOG ROTATION
  …

  42. FAST: DATA PLANE / REQUEST
  t_worker: read event → read → parse → process → compose → write → write event

  43. FAST: DATA PLANE / CONNECT
  t_server: read event → accept → config → dispatch
  t_worker: register read event

  44. LATENCY-ORIENTED THREADING
  t_worker — REQUESTS: logging, stats update; picks up new connections
  t_server — CONNECTS: hands new connections to t_worker
  t_admin — OTHER: logging, stats update
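
  A minimal sketch of this thread split; the three loop functions are hypothetical placeholders standing in for the data-plane and control-plane loops, not Pelikan’s actual API:

    #include <pthread.h>

    /* data plane: request/response processing, must stay fast */
    static void *worker_loop(void *arg) { (void)arg; for (;;) { /* event loop  */ } return NULL; }
    /* data plane: accept new connections, hand them off to the worker */
    static void *server_loop(void *arg) { (void)arg; for (;;) { /* accept loop */ } return NULL; }
    /* control plane: stats aggregation/export, log dump/rotation, ... */
    static void *admin_loop(void *arg)  { (void)arg; for (;;) { /* slow work   */ } return NULL; }

    int start_threads(void)
    {
        pthread_t t_worker, t_server, t_admin;

        if (pthread_create(&t_worker, NULL, worker_loop, NULL) != 0 ||
            pthread_create(&t_server, NULL, server_loop, NULL) != 0 ||
            pthread_create(&t_admin, NULL, admin_loop, NULL) != 0) {
            return -1;
        }
        return 0;
    }

  The point of the split is isolation: slow, bursty control-plane work (log rotation, stats export) can never sit in front of a request on the worker thread.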

  45. WHAT TO AVOID?

  46. LOCKING

  47. WHAT WE KNOW • inter-thread communication in cache t worker • stats new logging, • logging connection stats update • connection hand-off t server t admin • locking propagates blocking/delay logging, between threads stats update

  48. LOCKLESS OPERATIONS MAKE STATS UPDATE LOCKLESS w/ atomic instructions
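
  A minimal sketch of lockless stats updates with C11 atomics, illustrating the idea on this slide rather than Pelikan’s actual metrics code:

    #include <stdatomic.h>
    #include <stdint.h>

    struct stats {
        atomic_uint_fast64_t request;   /* requests served */
        atomic_uint_fast64_t hit;       /* cache hits */
    };

    static struct stats live_stats;

    /* worker thread: a single atomic add, no lock, never blocks the request path */
    static inline void stats_incr(atomic_uint_fast64_t *counter)
    {
        atomic_fetch_add_explicit(counter, 1, memory_order_relaxed);
    }

    /* admin thread: aggregation/export reads are plain atomic loads, so they
     * never delay the worker */
    static inline uint64_t stats_read(const atomic_uint_fast64_t *counter)
    {
        return atomic_load_explicit(counter, memory_order_relaxed);
    }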

  49. LOCKLESS OPERATIONS
  MAKE LOGGING LOCKLESS with a RING/CYCLIC BUFFER
  [diagram: writer advances the write position, reader advances the read position]
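
  A minimal single-producer/single-consumer ring buffer sketch along the lines of the slide: the worker thread appends log entries, the admin thread drains them to disk, and the two only share atomic read/write positions. Sizes and names are illustrative assumptions, not Pelikan’s actual logger:

    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stddef.h>

    #define LOG_BUF_SIZE 4096 /* must be a power of two */

    struct log_ring {
        char buf[LOG_BUF_SIZE];
        atomic_size_t wpos; /* advanced only by the writer (worker thread) */
        atomic_size_t rpos; /* advanced only by the reader (admin thread) */
    };

    /* writer side: copy a message in if there is room; otherwise drop it
     * rather than block the request path */
    bool log_write(struct log_ring *r, const char *msg, size_t len)
    {
        size_t w  = atomic_load_explicit(&r->wpos, memory_order_relaxed);
        size_t rd = atomic_load_explicit(&r->rpos, memory_order_acquire);

        if (LOG_BUF_SIZE - (w - rd) < len) {
            return false; /* full: dropping is cheaper than blocking */
        }
        for (size_t i = 0; i < len; i++) {
            r->buf[(w + i) & (LOG_BUF_SIZE - 1)] = msg[i];
        }
        atomic_store_explicit(&r->wpos, w + len, memory_order_release);
        return true;
    }

    /* reader side: drain up to cap bytes; flushing to disk happens here,
     * off the worker thread */
    size_t log_read(struct log_ring *r, char *out, size_t cap)
    {
        size_t rd = atomic_load_explicit(&r->rpos, memory_order_relaxed);
        size_t w  = atomic_load_explicit(&r->wpos, memory_order_acquire);
        size_t n  = w - rd;

        if (n > cap) {
            n = cap;
        }
        for (size_t i = 0; i < n; i++) {
            out[i] = r->buf[(rd + i) & (LOG_BUF_SIZE - 1)];
        }
        atomic_store_explicit(&r->rpos, rd + n, memory_order_release);
        return n;
    }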

  50. LOCKLESS OPERATIONS
  MAKE CONNECTION HAND-OFF LOCKLESS with a RING ARRAY
  [diagram: writer advances the write position, reader advances the read position]

  51. MEMORY

  52. WHAT WE KNOW
  • alloc-free cycles cause fragmentation
  • internal vs external fragmentation
  • OOM/swapping is deadly
  • memory alloc/copy is relatively expensive (source)

  53. PREDICTABLE FOOTPRINT
  AVOID EXTERNAL FRAGMENTATION
  CAP ALL MEMORY RESOURCES

  54. PREDICTABLE RUNTIME
  REUSE BUFFERS
  PREALLOCATE
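
  A minimal sketch of the “preallocate and reuse” idea: a capped pool of fixed-size buffers is carved out at startup, and the request path only ever borrows from and returns to it, so there is no malloc/free (and no fragmentation or OOM surprise) under load. Sizes and names are illustrative assumptions; the sketch is single-threaded for brevity:

    #include <stdlib.h>

    #define BUF_SIZE  (16 * 1024)
    #define BUF_COUNT 1024          /* hard cap on this resource */

    struct buf {
        struct buf *next;           /* freelist link */
        char data[BUF_SIZE];
    };

    static struct buf *free_list;   /* single-threaded sketch; a per-thread
                                       pool or lockless ring would be used
                                       across threads */

    /* called once at startup: all memory is committed up front, so failure
     * happens at boot, not under load */
    int buf_setup(void)
    {
        for (int i = 0; i < BUF_COUNT; i++) {
            struct buf *b = malloc(sizeof(*b));
            if (b == NULL) {
                return -1;
            }
            b->next = free_list;
            free_list = b;
        }
        return 0;
    }

    /* request path: pop/push a preallocated buffer instead of malloc/free */
    struct buf *buf_borrow(void)
    {
        struct buf *b = free_list;
        if (b != NULL) {
            free_list = b->next;
        }
        return b;                   /* NULL means the cap was hit */
    }

    void buf_return(struct buf *b)
    {
        b->next = free_list;
        free_list = b;
    }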

  55. IMPLEMENTATION: PELIKAN CACHE

  56. WHAT IS PELIKAN CACHE?
  • (datacenter-) caching framework
  • a summary of Twitter’s cache ops
  • perf goal: deterministically fast
  • clean, modular design
  • open-source: pelikan.io
  [architecture diagram: process (server, orchestration); cache (data model, data store, parse/compose/trace, request/response); core (streams, events, pooling, channels, buffers, timer/alarm); common (waitless logging, lockless metrics, composed config, threading)]

  57. PERFORMANCE DESIGN DECISIONS: A COMPARISON
              latency-oriented   memory/         memory/          memory/               locking
              threading          fragmentation   buffer caching   pre-allocation, cap
  Memcached   partial            internal        partial          partial               yes
  Redis       no->partial        external        no               partial               no->yes
  Pelikan     yes                internal        yes              yes                   no

  58. TO BE FAIR…
  MEMCACHED: • multiple threads can boost throughput • binary protocol + SASL
  REDIS: • rich set of data structures • RDB • master-slave replication • redis-cluster • modules • tools

  59. SCALABLE CACHE IS… ALWAYS FAST

  60. “CAREFUL ABOUT MOVING TO MULTIPLE WORKER THREADS”

  61. QUESTIONS?
