
Cassandra on RocksDB, Dikang Gu, Software Engineer @ Facebook

  1. Cassandra on RocksDB
     Dikang Gu, Software Engineer @ Facebook

  2. Agenda
     1. Motivation
     2. Approaches
     3. Design
     4. Performance metrics

  4. Stories, Direct, Live, Explore

  6. Apache Cassandra
     • Highly scalable partitioned data store
     • High performance
     • High availability
     • Tunable consistency

  7. Cassandra at Instagram
     • Thousands of Cassandra servers
     • 5 DCs
     • 100+ product use cases
     • Millions of requests per second

  8. Top Line Metrics
     • Reliability
        • Five 9s: request failure rate < 0.001%
     • Performance
        • Write throughput
        • Read latency
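For reference, "five 9s" and the failure-rate bound state the same reliability target: a success rate of at least 99.999% leaves a failure budget of one request in 100,000.

```latex
1 - 0.99999 = 10^{-5} = 0.001\%
```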

  9. Read Latency

  10. Read Latency (chart; values shown: 60ms, 25ms, 5ms)

  11. GC Stalls (chart; values shown: 2.5%, 1%)

  12. Where do we play

  13. Approach 1: GC Tuning

  14. Approach 1: GC Tuning
     Pros:
     • No code changes
     Cons:
     • Hard to tune for both latency and throughput
     • Highly dependent on the workload
     • At most a 20% drop in P99 latency

  15. Approach 2: Off-heap Data Structures
     • Memtable
     • Caches
     • Indexes
     • Read/write path
     • Compaction
     • …

  16. Approach 2: Off-heap Data Structures
     Pros:
     • Incremental improvements
     • Easier for the community to accept
     Cons:
     • Relies on Java Unsafe
     • Highly dependent on the workload
     • At most a 20% drop in P99 latency

  17. Approach 3: C++ Storage Engines
     • Most memory is consumed by the storage engine
        • Memtable, compaction, read/write path, etc.
     • Switch the existing Java storage engine to a C++ implementation
     • Pluggable storage engine

  18. Approach 3: C++ Storage Engines
     Pros:
     • Greatly reduces JVM overhead
     • CPU efficiency
     • Long-term benefit from a pluggable storage engine
     Cons:
     • Non-trivial effort to make the storage engine pluggable
     • JNI efficiency

  19. C++ Storage Engine

  21. RocksDB
     • Embedded C++ key-value database
     • Optimized for flash, with extremely low latencies
     • Popular storage engine for MySQL, MongoDB, etc.
     • Open source, Apache 2.0 license

  22. Prototype
     • Supports the single key-value case
     • Bypasses C*'s own storage engine
     • No streaming support
     • Shadows one production use case

  23. Prototype Latency (chart; values shown: 35ms, 15ms, 2ms-5ms)

  24. Prototype GC Stalls (chart; values shown: 1%, 0.5%, 0.1%)

  25. RocksDB + Cassandra = Rocksandra

  26. Challenges
     1. Cassandra data model
     2. Streaming

  27. Design: Data Model

  28. Key Encoding

  29. Key Encoding
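One way to map Cassandra's primary key onto RocksDB's flat, bytewise-ordered key space is sketched below in Python. This is a hypothetical encoding, not necessarily the one Rocksandra uses. It assumes the key starts with the partition's signed 64-bit token (so partitions sort in token order), followed by the partition key and the clustering columns. It also assumes each component is already in a byte-comparable form. `token_bytes`, `escape` and `encode_key` are illustrative names.

```python
import struct
from typing import Sequence

INT64_MIN = -(1 << 63)
INT64_MAX = (1 << 63) - 1


def token_bytes(token: int) -> bytes:
    """Map a signed 64-bit token to 8 bytes whose unsigned byte order matches numeric order."""
    if not INT64_MIN <= token <= INT64_MAX:
        raise ValueError("token out of int64 range")
    # Subtracting INT64_MIN (adding 2**63) shifts [-2**63, 2**63 - 1] onto [0, 2**64 - 1].
    return struct.pack(">Q", token - INT64_MIN)


def escape(component: bytes) -> bytes:
    """Escape 0x00 as 0x00 0xFF and append the terminator 0x00 0x01.

    The escaped form is prefix-free and preserves byte order, so a
    concatenation of components compares like the tuple of components.
    """
    return component.replace(b"\x00", b"\x00\xff") + b"\x00\x01"


def encode_key(token: int, partition_key: bytes, clustering: Sequence[bytes]) -> bytes:
    """Hypothetical layout: token | escaped partition key | escaped clustering columns."""
    out = bytearray(token_bytes(token))
    out += escape(partition_key)
    for column in clustering:
        out += escape(column)
    return bytes(out)
```

For example, `encode_key(-3, b"user2", [b"2018-01-01"])` yields a key that sorts with the other rows of the same partition, in clustering order. The escape-and-terminate step lets variable-length components be concatenated without a length prefix, which would otherwise break bytewise ordering.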

  30. Value Encoding
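A value has to carry the cell metadata Cassandra uses for reconciliation (write timestamp, tombstone marker, TTL expiry) alongside the payload. The Python sketch below uses a hypothetical layout that is not Rocksandra's actual format. It consists of a flags byte, an 8-byte big-endian timestamp, an optional 4-byte expiry or deletion time in seconds, and then the raw value bytes. `encode_value`, `decode_value` and the flag constants are illustrative names.

```python
import struct
from typing import Optional, Tuple

# Hypothetical flag bits for the first byte of the value.
FLAG_TOMBSTONE = 0x01  # cell was deleted; no payload follows
FLAG_EXPIRING = 0x02   # a 4-byte expiry/deletion time follows the timestamp

HEADER = struct.Struct(">Bq")  # flags (uint8) + timestamp (int64), big-endian, no padding
EXPIRY = struct.Struct(">I")   # expiry or deletion time, seconds since epoch


def encode_value(timestamp: int, value: Optional[bytes], expires_at: Optional[int] = None) -> bytes:
    """Serialize one cell; value=None encodes a tombstone."""
    flags = 0
    if value is None:
        flags |= FLAG_TOMBSTONE
    if expires_at is not None:
        flags |= FLAG_EXPIRING
    out = HEADER.pack(flags, timestamp)
    if expires_at is not None:
        out += EXPIRY.pack(expires_at)
    if value is not None:
        out += value
    return out


def decode_value(buf: bytes) -> Tuple[int, Optional[bytes], Optional[int]]:
    """Inverse of encode_value: returns (timestamp, value or None, expires_at or None)."""
    flags, timestamp = HEADER.unpack_from(buf, 0)
    offset = HEADER.size
    expires_at = None
    if flags & FLAG_EXPIRING:
        (expires_at,) = EXPIRY.unpack_from(buf, offset)
        offset += EXPIRY.size
    value = None if flags & FLAG_TOMBSTONE else buf[offset:]
    return timestamp, value, expires_at
```

Putting the fixed-size metadata first means a reader can check the timestamp or tombstone flag without touching the payload. That matters when most reads only need to decide which version of a cell wins.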

  31. Merge operator/Compaction filter
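RocksDB provides two hooks that map naturally onto Cassandra's cell semantics. A merge operator (`rocksdb::MergeOperator`) combines several updates to one key. A compaction filter (`rocksdb::CompactionFilter`) can discard entries while compaction runs. The Python sketch below shows only the decision logic, assuming each RocksDB value holds a single cell. `Cell`, `full_merge` and `should_drop` are hypothetical names. The rules approximate Cassandra's last-write-wins reconciliation and `gc_grace_seconds` purging; this is not Rocksandra's code, and it skips Cassandra's extra safety checks before purging.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Cell:
    timestamp: int                    # write timestamp (microseconds in Cassandra)
    value: Optional[bytes]            # None marks a cell tombstone
    expires_at: Optional[int] = None  # TTL expiry for live cells, deletion time for tombstones (seconds)


def full_merge(existing: Optional[Cell], operands: Sequence[Cell]) -> Cell:
    """Merge-operator style reconciliation: last write wins.

    The highest timestamp wins; on a tie a tombstone beats a live cell,
    and between two live cells the larger value wins.
    """
    candidates = ([existing] if existing is not None else []) + list(operands)
    if not candidates:
        raise ValueError("nothing to merge")
    return max(candidates, key=lambda c: (c.timestamp, c.value is None, c.value or b""))


def should_drop(cell: Cell, now: int, gc_grace_seconds: int) -> bool:
    """Compaction-filter style check: True means the cell may be purged.

    Tombstones and expired TTL cells are kept for gc_grace_seconds so that
    other replicas can learn about the deletion before the data disappears.
    """
    if cell.expires_at is None:
        return False
    return cell.expires_at + gc_grace_seconds <= now
```

Doing reconciliation in a merge operator lets a write be a blind append rather than a read-modify-write. The compaction filter then reclaims space for deleted and expired data as files are rewritten.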

  32. Streaming
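Streaming in Cassandra moves whole token ranges between nodes. Suppose every RocksDB key starts with an order-preserving 8-byte token prefix; this is an assumption, not something the slides confirm. The keys for a range then form one contiguous slice of the sorted key space. The Python sketch below finds that slice with binary search over an in-memory sorted list. Against a real RocksDB instance, the equivalent would be an iterator `Seek` bounded by `ReadOptions::iterate_upper_bound`. `token_prefix` and `keys_in_token_range` are hypothetical helpers.

```python
import bisect
import struct
from typing import List

MIN_TOKEN = -(1 << 63)
MAX_TOKEN = (1 << 63) - 1


def token_prefix(token: int) -> bytes:
    """Assumed key prefix: 8-byte big-endian token shifted so byte order matches numeric order."""
    return struct.pack(">Q", token - MIN_TOKEN)


def keys_in_token_range(sorted_keys: List[bytes], left: int, right: int) -> List[bytes]:
    """Return the keys whose token t satisfies left < t <= right.

    Cassandra token ranges are left-exclusive and right-inclusive.
    This sketch does not handle wrap-around ranges (left >= right).
    """
    if not MIN_TOKEN <= left < right <= MAX_TOKEN:
        raise ValueError("expected MIN_TOKEN <= left < right <= MAX_TOKEN")
    # First key whose token is >= left + 1, i.e. strictly greater than left.
    start = bisect.bisect_left(sorted_keys, token_prefix(left + 1))
    # First key whose token is > right, or the end of the key space.
    if right == MAX_TOKEN:
        end = len(sorted_keys)
    else:
        end = bisect.bisect_left(sorted_keys, token_prefix(right + 1))
    return sorted_keys[start:end]
```

Because the token leads the key, a range transfer turns into one sequential scan instead of many point lookups. Sequential scans suit both RocksDB's SST layout and flash storage.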

  33. Feature Milestone
     Current features:
     • Most non-nested data types
     • Table data model
     • Point query
     • Range query
     • Mutations
     • Timestamp
     • TTL
     • Deletions/Cell tombstones
     Future features:
     • Multi-partition query
     • Nested data types
     • Counters
     • Range tombstone
     • Materialized views
     • Secondary indexes
     • Repair

  34. Performance metrics

  35. Cluster A
     • Similar P99 read/write latency
     • Footprint reduced to 1/3

  36. Cluster B
     • P99 read latency reduced 3X (60ms to 20ms)
     • Footprint reduced to 60%

  37. Cluster C
     • High write volume and large-fanout reads
     • P99 read latency reduced from 1s to 10ms
     • Same footprint

  38. Benchmark on AWS
     • C* cluster in a single availability zone (us-west-2a), replication factor 1
     • 3 i3.8xlarge EC2 instances: 256GB memory, 32-core CPU, RAID 0 across 4 NVMe flash disks
     • NDBench cluster (https://github.com/Netflix/ndbench), run from the same AZ

  39. Benchmark Metrics

  40. Benchmark Metrics

  41. Benchmark Metrics

  42. Benchmark Metrics

  43. Recap
     Switching to Rocksandra helped us:
     • Cut down tail latency
     • Improve throughput

  44. Try it!
     Don’t just take our word for it; download from github.com/instagram:
     • Rocksandra code
     • Benchmark CloudFormation template and scripts

  45. Future work
     • Support more Cassandra features
     • Cassandra pluggable storage engine

  46. Acknowledgement
     Thanks for all the support from the Cassandra and RocksDB communities

  47. Thank You!
