EVCache: Lowering Costs for a Low Latency Cache with RocksDB
Scott Mansfield Vu Nguyen EVCache
90 seconds
What do caches touch?
Signing up* · Logging in · Choosing a profile · Picking liked videos · Personalization* · Loading home page* · Scrolling home page* · A/B tests · Video image selection · Searching* · Viewing title details · Playing a title* · Subtitle / language prefs · Rating a title · My List · Video history* · UI strings · Video production*
* multiple caches involved
Home Page Request
Ephemeral Volatile Cache: a key-value store optimized for AWS and tuned for Netflix use cases
What is EVCache?
○ Distributed, sharded, replicated key-value store
○ Tunable in-region and global replication
○ Based on Memcached
○ Resilient to failure
○ Topology aware
○ Linearly scalable
○ Seamless deployments
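A minimal sketch of the "sharded" part: the client hashes a key to pick one node within a replica (one availability zone's set of servers). All names here are hypothetical, and the real EVCache client uses a ketama-style consistent hash rather than a plain modulo, which matters when nodes come and go:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// pickServer maps a key onto one server in a replica. Hypothetical
// sketch: a plain hash-modulo stands in for the client's real
// consistent-hashing ring.
func pickServer(key string, servers []string) string {
	h := fnv.New32a()
	h.Write([]byte(key))
	return servers[int(h.Sum32())%len(servers)]
}

func main() {
	// One zone's hypothetical memcached nodes.
	zoneA := []string{"evc-a-1:11211", "evc-a-2:11211", "evc-a-3:11211"}
	fmt.Println(pickServer("user:1234:homepage", zoneA))
}
```

Because every zone holds a full replica, the same key maps to one node in each zone independently.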
Why Optimize for AWS
○ Instances disappear
○ Zones fail
○ Regions become unstable
○ Network is lossy
○ Customer requests bounce between regions
○ Failures happen, and we test all the time
EVCache Use @ Netflix
○ Hundreds of terabytes of data
○ Trillions of ops / day
○ Tens of billions of items stored
○ Tens of millions of ops / sec
○ Millions of replications / sec
○ Thousands of servers
○ Hundreds of instances per cluster
○ Hundreds of microservice clients
○ Tens of distinct clusters
○ 3 regions
○ 4 engineers
Architecture
Server
[Diagram: server runs Memcached + EVCar; Application embeds the Client Library; Eureka provides service discovery]
Architecture
[Diagram: clients in zones us-west-2a, us-west-2b, and us-west-2c]
Reading (get)
[Diagram: a client reading from the replica in its own zone (us-west-2a / 2b / 2c)]
Primary / Secondary
Writing (set, delete, add, etc.)
[Diagram: each client writing to all zones us-west-2a, us-west-2b, and us-west-2c]
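The read and write paths above can be sketched in a few lines of Go. The maps below are hypothetical in-process stand-ins for one memcached replica per zone: writes fan out to every zone so each holds a full copy, and reads stay zone-local, falling back to another zone only on a miss:

```go
package main

import "fmt"

// zone is a hypothetical stand-in for one availability zone's replica.
type zone map[string]string

// set fans the write out to every zone's replica.
func set(zones []zone, key, val string) {
	for _, z := range zones {
		z[key] = val
	}
}

// get reads from the client's local zone first and falls back to the
// other zones only on a miss, keeping the common case zone-local.
func get(zones []zone, local int, key string) (string, bool) {
	if v, ok := zones[local][key]; ok {
		return v, true
	}
	for i, z := range zones {
		if i == local {
			continue
		}
		if v, ok := z[key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	zones := []zone{{}, {}, {}} // us-west-2a / 2b / 2c
	set(zones, "title:42", "some-title")
	v, _ := get(zones, 1, "title:42")
	fmt.Println(v)
}
```

The fallback path is what makes a single lost instance or zone a latency blip rather than a miss storm.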
Use Case: Lookaside Cache
[Diagram: application with Client Library and Ribbon client, talking to services (S) and caches (C)]
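In the lookaside pattern, the application owns cache population: check the cache first, and on a miss load from the backing store and fill the cache for the next reader. A minimal sketch, with a hypothetical `loadFromDB` standing in for the real service call:

```go
package main

import "fmt"

// cache is a hypothetical in-memory stand-in for the remote cache.
var cache = map[string]string{}

// loadFromDB is a hypothetical stand-in for the backing data store.
func loadFromDB(key string) string { return "db-value-for-" + key }

// fetch implements the lookaside pattern: cache first, store on miss,
// then populate the cache so subsequent reads are hits.
func fetch(key string) string {
	if v, ok := cache[key]; ok {
		return v // cache hit
	}
	v := loadFromDB(key) // cache miss: go to the source of truth
	cache[key] = v       // fill the cache for the next reader
	return v
}

func main() {
	fmt.Println(fetch("profile:7")) // miss: loads and caches
	fmt.Println(fetch("profile:7")) // hit: served from cache
}
```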
Data Flow
Use Case: Transient Data Store
Application Client Library Client Application Client Library Client Application Client Library Client
Time
Use Case: Primary Store
Offline / Nearline Precomputes for Recommendations
Online Services Offline Services
Online Application Client Library Client
Data Flow
Online Services · Offline Services
Use Case: Versioned Primary Store
Offline Compute Online Application Client Library Client
Data Flow
Archaius (Dynamic Properties) · Control System (Valhalla)
Use Case: High Volume & High Availability
Compute & Publish
[Diagram: application with Client Library, in-memory and (optional) remote tiers, via Ribbon client to services (S) and caches (C)]
Pipeline of Personalization
Compute A Compute B Compute C Compute D
Online Services Offline Services
Compute E
Data Flow
Online 1 · Online 2
Additional Features
Kafka
Key Iteration
Additional Features (Kafka)
Global data replication Consistency metrics
Region B Region A APP APP Repl Proxy Repl Relay
1 mutate
2 send metadata (Kafka)
3 poll msg (Repl Relay)
4 get data for set
5 https send msg (Repl Relay → Repl Proxy)
6 mutate
Cross-Region Replication
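The flow above replicates metadata, not values: the app writes locally and publishes just the key to Kafka; a relay polls the queue, fetches the current value from the local cache, and ships key+value to the remote region's proxy, which applies the mutation. A minimal sketch, with maps as hypothetical stand-ins for the two regions' caches:

```go
package main

import "fmt"

// mutation carries only metadata (the key), as in the real pipeline;
// the value is fetched from the local cache at replication time.
type mutation struct{ key string }

// replicate plays the role of the Repl Relay: for each queued mutation
// it reads the latest local value (step 4) and applies it to the remote
// region (steps 5-6). Reading at send time means the freshest value
// wins even if the key was mutated again after being queued.
func replicate(local, remote map[string]string, queue []mutation) {
	for _, m := range queue {
		if v, ok := local[m.key]; ok {
			remote[m.key] = v
		}
	}
}

func main() {
	local := map[string]string{"k1": "v1"}
	remote := map[string]string{}
	replicate(local, remote, []mutation{{key: "k1"}})
	fmt.Println(remote["k1"])
}
```

Shipping only keys through Kafka keeps the queue small; the tradeoff is an extra local read per replicated mutation.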
7 read
Additional Features (Key Iteration)
Cache warming Lost instance recovery Backup (and restore)
Cache Warming
Cache Warmer (Spark) Application Client Library Client Control
S3
Data Flow · Metadata Flow · Control Flow
Moneta
Next-generation EVCache server
Moneta
Moneta: The Goddess of Memory
Juno Moneta: The Protectress of Funds for Juno
Old Server
Memcached EVCar
external
Optimization
○ Hot data is used often ○ Cold data is almost never touched
New Server
go get github.com/netflix/rend
Rend
Rend
○ Powerful concurrency primitives ○ Productive and fast
Rend
communication
Server Loop · Request Orchestration · Backend Handlers · Connection Management · Protocol · Metrics
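The server loop above can be sketched as: read a memcached text-protocol request line, dispatch to a backend handler, write the response. This is a hypothetical minimal version handling only `get` against an in-memory map; Rend's real loop adds connection management, metrics, and pluggable backends:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// handle dispatches one memcached text-protocol request line and
// returns the wire response. Only "get <key>" is implemented here.
func handle(store map[string]string, line string) string {
	parts := strings.Fields(line)
	if len(parts) == 2 && parts[0] == "get" {
		if v, ok := store[parts[1]]; ok {
			// Hit: VALUE <key> <flags> <bytes>, data, END.
			return fmt.Sprintf("VALUE %s 0 %d\r\n%s\r\nEND\r\n", parts[1], len(v), v)
		}
		return "END\r\n" // miss: bare END
	}
	return "ERROR\r\n"
}

// serve is the per-connection loop: read a line, handle, flush.
func serve(store map[string]string, r *bufio.Reader, w *bufio.Writer) {
	for {
		line, err := r.ReadString('\n')
		if err != nil {
			return
		}
		w.WriteString(handle(store, strings.TrimSpace(line)))
		w.Flush()
	}
}

func main() {
	store := map[string]string{"greeting": "hi"}
	fmt.Print(handle(store, "get greeting"))
}
```

Keeping the protocol parse separate from the backend handler is what lets the same loop front memcached, the SSD store, or both.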
Mnemonic
Mnemonic
○ Handles Memcached protocol
Rend Server Core Lib (Go)
Mnemonic Op Handler (Go)
Mnemonic Core (C++)
RocksDB (C++)
Why RocksDB?
○ Disk write load is higher than read load (Memcached absorbs most reads)
SST: Static Sorted Table
How we use RocksDB
○ Generated too much traffic to SSD ○ High and unpredictable read latencies
○ Rely on Local Memcached
How we use RocksDB
○ SSTs ordered by time ○ Oldest SST deleted when full ○ Reads access every SST until the record is found
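The FIFO behavior above can be sketched as a queue of tables: the oldest is dropped when the store is full, and a read checks tables newest-first until the key is found. Maps stand in for on-disk SST files in this hypothetical sketch:

```go
package main

import "fmt"

// sst is a hypothetical in-memory stand-in for one SST file.
type sst map[string]string

// fifoStore keeps SSTs in arrival order: index 0 is oldest.
type fifoStore struct {
	ssts []sst
	max  int
}

// addSST appends the newest SST; when over capacity, FIFO compaction
// simply deletes the oldest SST wholesale.
func (s *fifoStore) addSST(t sst) {
	s.ssts = append(s.ssts, t)
	if len(s.ssts) > s.max {
		s.ssts = s.ssts[1:]
	}
}

// get scans newest-to-oldest so the most recent write for a key wins,
// stopping at the first SST that contains it.
func (s *fifoStore) get(key string) (string, bool) {
	for i := len(s.ssts) - 1; i >= 0; i-- {
		if v, ok := s.ssts[i][key]; ok {
			return v, true
		}
	}
	return "", false
}

func main() {
	s := &fifoStore{max: 2}
	s.addSST(sst{"a": "1"})
	s.addSST(sst{"b": "2"})
	s.addSST(sst{"c": "3"}) // pushes out the SST holding "a"
	_, ok := s.get("a")
	fmt.Println(ok) // false: "a" aged out with its SST
}
```

The worst case is a miss, which touches every SST; that is what the filters on the next slides address.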
How we use RocksDB
○ Full Filter reduces unnecessary SSD reads
○ Minimize SSD access per request
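The "full filter" idea: keep a Bloom filter per SST so a read can skip any SST that definitely does not contain the key, leaving at most one SSD access on the common path. A hypothetical sketch with illustrative sizes and a double-hash scheme:

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bloom is a tiny illustrative Bloom filter; real filter sizing is
// tuned per SST to hit a target false-positive rate.
type bloom struct{ bits [1024]bool }

// hashes derives two bit positions from one 64-bit FNV hash.
func (b *bloom) hashes(key string) [2]int {
	h := fnv.New64a()
	h.Write([]byte(key))
	s := h.Sum64()
	return [2]int{int(s % 1024), int((s >> 32) % 1024)}
}

func (b *bloom) add(key string) {
	for _, i := range b.hashes(key) {
		b.bits[i] = true
	}
}

// mayContain never returns false for an added key; a false result
// means "definitely absent", letting the read path skip that SST
// without touching the SSD at all.
func (b *bloom) mayContain(key string) bool {
	for _, i := range b.hashes(key) {
		if !b.bits[i] {
			return false
		}
	}
	return true
}

func main() {
	var f bloom
	f.add("user:1")
	fmt.Println(f.mayContain("user:1")) // true: no false negatives
}
```

False positives still cost one wasted SSD read, but false negatives are impossible, so correctness is preserved.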
How we use RocksDB
○ Reduces number of files checked to decrease latency
Mnemonic Core
Region-Locality Optimizations
○ Keeps Region-Local and “hot” data in memory ○ Separate Network Port for “off-line” requests ○ Memcached data “replaced”
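The bullets above describe a two-level read path: hot, region-local data lives in RAM (memcached, L1) while the full data set lives on SSD (RocksDB, L2), and online traffic keeps its hot set in L1 by promoting L2 hits. A hypothetical sketch with maps standing in for the two stores:

```go
package main

import "fmt"

// tieredGet checks the in-memory L1 first and falls through to the
// SSD-backed L2 on a miss. Online reads promote an L2 hit into L1;
// offline/batch reads (arriving on the separate port) do not, so
// batch scans cannot evict the online hot set.
func tieredGet(l1, l2 map[string]string, key string, online bool) (string, bool) {
	if v, ok := l1[key]; ok {
		return v, true // hot path: served from RAM
	}
	v, ok := l2[key]
	if ok && online {
		l1[key] = v // promote into L1 for future online reads
	}
	return v, ok
}

func main() {
	l1 := map[string]string{}
	l2 := map[string]string{"rec:9": "row"}
	v, _ := tieredGet(l1, l2, "rec:9", true)
	fmt.Println(v, l1["rec:9"]) // promoted: now also in L1
}
```

Segregating online and batch traffic by port is what lets the promotion policy differ per caller without any client-side changes.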
FIFO Limitations
○ Very frequently updated records may push out valid records
AWS Instance Type
○ 4 vCPU ○ 30 GB RAM ○ 800 GB SSD ■ 32K IOPS (4KB Pages) ■ ~130MB/sec
Moneta Perf Benchmark (High-Vol Online Requests)
Moneta Perf Benchmark (cont)
Moneta Perf Benchmark (cont)
Moneta Perf Benchmark (cont)
Moneta Performance in Production (Batch Systems)
Moneta Performance in Prod (High Vol-Online Req)
Get Percentiles:
Moneta Performance in Prod (High Vol-Online Req)
Set Percentiles:
Latencies: peak (trough)
Reduction in cost*
Challenges/Concerns
○ Overall data size is unclear because of duplicate and expired records ○ Restrict the unique data set to ½ of max for precompute batch data
○ Higher CPU usage ○ Capacity planning must be better so we can handle unusually high request spikes
Current/Future Work
○ Less Data read/write from SSD during Level Compaction ○ Lower Latency, Higher Throughput ○ Better View of Total Data Size
○ Useful in “short” TTL use cases ○ May purge 60%+ of SSTs earlier than FIFO compaction ○ Reduce worst-case latency ○ Better visibility of overall data size
Open Source
https://github.com/netflix/EVCache https://github.com/netflix/rend
Thank You
smansfield@netflix.com (@sgmansfield) nguyenvu@netflix.com techblog.netflix.com
Failure Resilience in Client
Region A APP Consistency Checker
1 mutate
2 send metadata
3 poll msg (Kafka)
Consistency Metrics
4 pull data
Atlas (Metrics Backend)
5 report
Client Dashboards
Lost Instance Recovery
Cache Warmer (Spark) Application Client Library Client
S3
Partial Data Flow · Metadata Flow · Control Flow
Control · Zone A · Zone B
Data Flow
Backup (and Restore)
Cache Warmer (Spark) Application Client Library Client Control
S3
Data Flow · Control Flow
Moneta in Production
○ One for standard users (read heavy or active data management) ○ Another for async and batch users: Replication and Precompute
○ Smartly replaces data in L1
external internal
EVCar Memcached (RAM) Mnemonic (SSD) Std Batch
Rend batching backend