supercharging cassandra

Supercharging Cassandra... Tom Wilkie Founder & VP - PowerPoint PPT Presentation



  1. Supercharging Cassandra... Tom Wilkie Founder & VP Engineering @tom_wilkie

  2. Before the Flood (1990): small databases, BTree indexes, BTree file systems, RAID, old hardware

  3. Two Revolutions (2010): distributed, shared-nothing databases; write-optimised indexes; BTree file systems; RAID; new hardware

  4. Bridging the Gap (2011): distributed, shared-nothing databases; Castle; new hardware

  5. [Architecture diagram: Big Data Applications, Memcached, Open API; Management, Deployment, Monitoring; Acunu Storage Core; Cross-Cluster Management UI]

  6. 1. Predictability

  7. Small random inserts: inserting 3 billion rows [Plot: Acunu powered Cassandra vs ‘standard’ Cassandra]

  8. Insert latency while inserting 3 billion rows [Plot: Acunu powered Cassandra (x) vs ‘standard’ Cassandra (+)]

  9. Small random range queries, performed immediately after inserts [Plot: Acunu powered Cassandra vs ‘standard’ Cassandra]

  10. Performance summary
                                     Standard   Acunu     Benefit
      inserts        rate            ~32k/s     ~45k/s    >1.4x
                     95% latency     ~32s       ~0.3s     >100x
      gets           rate            ~100/s     ~350/s    >3.5x
                     95% latency     ~2s        ~0.5s     >4x
      range queries  rate            ~0.4/s     ~40/s     >100x
                     95% latency     ~15s       ~2s       >7.5x

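  11. Doubling Array Inserts: buffer arrays in memory until we have > B of them [Diagram: inserting 2, then 9, which merge into the array 2 9]

  12. Doubling Array Inserts, etc... Similar to log-structured merge trees (LSM), cache-oblivious lookahead array (COLA), ... [Diagram: inserting 11, then 8, which merge into 8 11 and then with 2 9 into 2 8 9 11]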
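
The insert sequence on slides 11 and 12 (2, 9, 11, 8) can be traced with a small sketch. This is a simplified, in-memory illustration of the general doubling-array idea and not Castle's implementation: the class name `DoublingArray`, its methods, and the rule that level k is either empty or holds exactly 2^k sorted keys are assumptions made for the example, and the slide's step of buffering arrays in memory until there are more than B of them is left out.

```python
from bisect import bisect_left
from heapq import merge


class DoublingArray:
    """Hypothetical sketch: level k is empty or a sorted list of 2**k keys."""

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]  # a sorted array of size 1
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                # Free slot: the carried array comes to rest here.
                self.levels[k] = carry
                return
            # Slot taken: merge two equal-sized sorted arrays in one
            # sequential pass and push the result down to the next level.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # Binary-search each non-empty level.
        for level in self.levels:
            i = bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False


da = DoublingArray()
for key in [2, 9, 11, 8]:
    da.insert(key)
print(da.levels)
```

After the fourth insert every key sits in a single sorted array of size four, which matches the final state drawn on slide 12. Each merge reads and writes whole arrays in order, which is why the updates are counted as sequential IOs.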

  13. Demo https://acunu-videos.s3.amazonaws.com/dajs.html

  14. B = “block size”, say 8KB at 100 bytes/entry ~= 100 entries
                       Update            Range Query (Size Z)
      B-Tree           O(log_B N)        O(Z/B)
                       random IOs        random IOs
      Doubling Array   O((log N)/B)      O(Z/B)
                       sequential IOs    sequential IOs
      B-Tree: 8KB @ 100MB/s, w/ 8ms seek = 100 IOs/s; ~ log (2^30)/log 100 = 5 IOs/update; 100 / 5 = 20 updates/s
      Doubling Array: 8KB @ 100MB/s = 13k IOs/s; ~ log (2^30)/100 = 0.2 IOs/update; 13k / 0.2 = 65k updates/s
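
The back-of-envelope figures on slide 14 can be reproduced roughly as follows. This is a sketch under stated assumptions: the variable names are invented for the example, the logarithm is taken as the natural log (which is what makes the slide's 0.2 IOs/update come out), and the slide rounds more aggressively than the code does, so the computed values land near the slide's numbers rather than exactly on them.

```python
import math

# Inputs quoted on the slide.
N = 2 ** 30                    # number of entries
ENTRIES_PER_BLOCK = 100        # B: 8KB block at ~100 bytes/entry
SEEK_SECONDS = 0.008           # 8ms per random IO
BLOCK_BYTES = 8 * 1024         # 8KB
BANDWIDTH_BYTES = 100 * 1024 * 1024  # 100MB/s sequential

# B-Tree: each update costs about log_B(N) random IOs.
btree_ios_per_update = math.log(N) / math.log(ENTRIES_PER_BLOCK)
random_ios_per_second = 1 / SEEK_SECONDS
btree_updates_per_second = random_ios_per_second / btree_ios_per_update

# Doubling array: each update costs about (log N)/B sequential IOs.
da_ios_per_update = math.log(N) / ENTRIES_PER_BLOCK
sequential_ios_per_second = BANDWIDTH_BYTES / BLOCK_BYTES
da_updates_per_second = sequential_ios_per_second / da_ios_per_update

print(f"B-Tree: {btree_ios_per_update:.1f} IOs/update, "
      f"{btree_updates_per_second:.0f} updates/s")
print(f"Doubling array: {da_ios_per_update:.2f} IOs/update, "
      f"{da_updates_per_second:.0f} updates/s")
```

Both results stay within the same order of magnitude as the slide's 20 and 65k updates/s, so the gap between the two structures is roughly three orders of magnitude either way.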

  15. Castle
      • Opensource (GPLv2, MIT for user libraries)
      • http://bitbucket.org/acunu
      • Loadable Kernel Module, targeting CentOS’s 2.6.18
      • http://www.acunu.com/blogs/andy-twigg/why-acunu-kernel/
      More: http://goo.gl/wXNDQ
      http://goo.gl/gzihe
      [Architecture diagram, top to bottom: userspace interface (shared memory interface, streaming interface); kernelspace interface (in-kernel async workloads, shared memory ring, shared buffers; range queries, buffered key insert, buffered value get); doubling array mapping layer (Bloom filters, arrays, merges, arrays management); mapping layer (modlist btree, version tree); block mapping & cacheing layer (prefetcher, block cache, flusher, page cache); "Extent" layer (allocator, freespace manager, mapper); Linux's block & MM layers (block layer, memory manager)]

  16. 2. Monitoring

  17. jQuery VisualVM

  18. mx4j: Rest-JMX adapter Munin, Nagios etc

  19. 3. Operations

  20. -bash-3.2$ nodetool ...
      Available commands:
        ring - Print informations on the token ring
        join - Join the ring
        info - Print node informations (uptime, load, ...)
        cfstats - Print statistics on column families
        version - Print cassandra version
        tpstats - Print usage statistics of thread pools
        drain - Drain the node (stop accepting writes and flush all column families)
        decommission - Decommission the node
        compactionstats - Print statistics on compactions
        disablegossip - Disable gossip (effectively marking the node dead)
        enablegossip - Reenable gossip
        disablethrift - Disable thrift server
        enablethrift - Reenable thrift server
        netstats [host] - Print network information on provided host (connecting node by default)
        move <new token> - Move node on the token ring to a new token
        removetoken status|force|<token> - Show status of current token removal, force completion of pending removal or remove providen token
        setcompactionthroughput <value_in_mb> - Set the MB/s throughput cap for compaction in the system, or 0 to disable throttling.
        snapshot [keyspaces...] -t [snapshotName] - Take a snapshot of the specified keyspaces using optional name snapshotName
        clearsnapshot [keyspaces...] -t [snapshotName] - Remove snapshots for the specified keyspaces. Either remove all snapshots or remove the snapshots with the given name.
        flush [keyspace] [cfnames] - Flush one or more column family
        repair [keyspace] [cfnames] - Repair one or more column family
        cleanup [keyspace] [cfnames] - Run cleanup on one or more column family
        compact [keyspace] [cfnames] - Force a (major) compaction on one or more column family
        scrub [keyspace] [cfnames] - Scrub (rebuild sstables for) one or more column family
        invalidatekeycache [keyspace] [cfnames] - Invalidate the key cache of one or more column family
        invalidaterowcache [keyspace] [cfnames] - Invalidate the key cache of one or more column family
        getcompactionthreshold <keyspace> <cfname> - Print min and max compaction thresholds for a given column family
        cfhistograms <keyspace> <cfname> - Print statistic histograms for a given column family
        setcachecapacity <keyspace> <cfname> <keycachecapacity> <rowcachecapacity> - Set the key and row cache capacities of a given column family
        setcompactionthreshold <keyspace> <cfname> <minthreshold> <maxthreshold> - Set the min and max compaction thresholds for a given column family

  21. * SNAPSHOTS * And clones!

  22. [Diagram: versions v0 to v6]

  23. Rebuild

  24. Disk Layout: RDA (random duplicate allocation) [Diagram: numbered blocks 1 to 16, each appearing more than once in the layout]

  25. Future

  26. Memcache + Cassandra: 100k random inserts/sec! [Diagram: memcached and Cassandra clients (get/put, get/insert) in front of nodes, each running Cassandra and memcache on Castle on H/W]

  27. [Diagram: versions v1, v12, v13, v15, v16 and v24, repeated across four panels]

  28. Beware the “write cliff”... [Plot annotation: ~device capacity]

  29. • Castle: predictable performance for Big Data
      • Monitoring: distributed, multi-master tools that give you an aggregated and summarised view of your cluster
      • Snapshots & Clones: addressing real problems with new workloads
      • RDA: lightning-fast rebuilds for massive disks

  30. Questions? Tom Wilkie @tom_wilkie tom@acunu.com http://bitbucket.org/acunu http://github.com/acunu http://www.acunu.com/download http://www.acunu.com/insights
