overview of cassandra


  1. OVERVIEW OF CASSANDRA WHY WOULD YOU NAME A DATABASE AFTER A GREEK MYTH OF NOT BEING LISTENED TO?

  2. AGENDA • Bio • Monitoring • Basic C* Data Model • Version issues and tombstoning • Replication vs. Quorum • Maintenance Tasks • Failure Recovery • The Dark Side • AWS Implications

  3. QUICK BIO • Programming since 1981 • Four patents • 2010 JavaOne Rock Star and Duke’s Choice winner • Frequent contributor to Pragmatic Programmer magazine, SearchAWS.com and LinkedIn Pulse News • briantarbox.org, log4jfugue.org, BrianTarbox@gmail.com

  4. C* DATA MODEL; COMPARISON • “When all you have is a hammer, everything looks like a thumb.” - Morgan • Relational (tabular) model: PostgreSQL • Relationship (graph) model: Neo4j • Document model: MongoDB • Time-series model: C*

  5. THE REAL DIFFERENCE BETWEEN SQL AND “NO-SQL” • In SQL we’re trained to design based on storing the data, ideally in third normal form. Queries are bolted on later. • In no-SQL we design based on the queries we’ll perform. “Table” structure falls out of that. • Queries should get top billing: if you just store the data, who cares?

  6. C* DATA MODEL, EXAMPLE • Wide rows; wide columns, heterogeneous columns • For example, a row per stock, with each column being all we know about that stock for that day. • Designed to be easy to “select” a row and then read thousands of columns sequentially • Not designed to randomly select specific columns
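
The stock example above can be sketched in a few lines of Python. This is only an illustration of the wide-row idea, not how C* stores data internally; the names `stocks`, `record` and `read_row` and all the sample values are made up for the sketch.

```python
# Row key (ticker) -> {column name (ISO date) -> everything known for that day}.
# All values below are invented sample data.
stocks = {}

def record(ticker, day, facts):
    """Add or overwrite one column in the ticker's row."""
    stocks.setdefault(ticker, {})[day] = facts

def read_row(ticker):
    """Select one row, then yield its columns in column-name order."""
    row = stocks.get(ticker, {})
    for day in sorted(row):
        yield day, row[day]

# Columns are heterogeneous: each day can carry different fields.
record("IBM", "2014-03-04", {"close": 101.0, "volume": 1000})
record("IBM", "2014-03-03", {"close": 99.5})
record("XYZ", "2014-03-03", {"close": 12.0, "note": "first trading day"})

ibm_days = [day for day, _ in read_row("IBM")]
```

Reading is cheap here because it is one row lookup followed by a sequential walk; picking arbitrary columns out of arbitrary rows would need a lookup per column.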

  7. CQL, SLICE PREDICATES • In Postgres you might say “select * from stock where ticker='IBM' and price > 100” • You simply cannot do that with C* • SQL uses indexes to speed up access to rows; indexes are very problematic in C* • Often the C* answer is denormalization
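
A rough sketch of what denormalization means for the query above: the same quote is written twice, once per query it has to serve. The two dictionaries stand in for two column families; `write_quote`, `days_above` and the sample data are hypothetical names and values, and the sketch assumes each (ticker, day) pair is written only once.

```python
from bisect import bisect_right, insort

by_ticker_day = {}    # ticker -> {day: price}, serves "price history for a ticker"
by_ticker_price = {}  # ticker -> sorted [(price, day)], serves "days where price > X"

def write_quote(ticker, day, price):
    """Every write goes to both layouts; the duplication is deliberate."""
    by_ticker_day.setdefault(ticker, {})[day] = price
    insort(by_ticker_price.setdefault(ticker, []), (price, day))

def days_above(ticker, threshold):
    """Answer 'ticker = ? and price > ?' with one row lookup and one range read."""
    rows = by_ticker_price.get(ticker, [])
    # "\uffff" sorts after any ordinary day string, so equal prices are skipped.
    start = bisect_right(rows, (threshold, "\uffff"))
    return [day for _, day in rows[start:]]

write_quote("IBM", "2014-03-03", 95)
write_quote("IBM", "2014-03-04", 101)
write_quote("IBM", "2014-03-05", 100)
```

The cost is on the write side: every new query shape tends to mean another copy of the data to keep up to date.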

  8. SLICE PREDICATES • Columns have names (e.g. “date”), but columns can also contain many (hundreds of) values. • Slice predicates let you specify which columns to select
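
As a minimal sketch, assuming the range form of a slice predicate (a start and a finish column name), selecting columns amounts to a range lookup over the row's sorted column names. `slice_columns` is a hypothetical helper, not part of any C* client API.

```python
from bisect import bisect_left, bisect_right

def slice_columns(sorted_names, start, finish):
    """Return the column names with start <= name <= finish, in stored order."""
    lo = bisect_left(sorted_names, start)
    hi = bisect_right(sorted_names, finish)
    return sorted_names[lo:hi]

# Column names for one row, already sorted (invented sample data).
names = ["2014-03-03", "2014-03-04", "2014-03-05", "2014-03-06"]
selected = slice_columns(names, "2014-03-04", "2014-03-05")
```

Because the names are kept sorted, the slice is a contiguous run of columns, which is what makes it a sequential read.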

  9. CLIQUE, INC.: C* ANTIPATTERN • My last company folded, but not before providing a C* antipattern • Collaboration software; many ad-hoc queries (who’s in what context, where was “x” said, etc.) • We ended up with 14 copies of the main data, each in its own column family. • Bad Dog.

  10. REPLICATION VS. READ/WRITE LEVEL • Replication refers to how many distinct copies of the data there are • Read/Write Level refers to how many of the replicas must respond/agree before proceeding
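
The relationship between the two settings is simple arithmetic. The sketch below uses the usual majority definition of a quorum and the overlap rule (read level plus write level greater than the replication factor); the function names are made up for illustration.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1

def read_overlaps_write(replication_factor, write_level, read_level):
    """True when any set of replicas read must include at least one
    replica that acknowledged the latest successful write."""
    return write_level + read_level > replication_factor

rf = 3
q = quorum(rf)
```

With three replicas, quorum writes plus quorum reads overlap (2 + 2 > 3), while writing to one replica and reading from one does not (1 + 1 is not greater than 3), so a read may return stale data.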

  11. THE WRITE PATH • Client picks a C* node at random; it becomes the coordinator (diagram) • Coordinator sends the write to all replica nodes (as many as the replication factor) • Waits until ’n’ respond before returning
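
A toy simulation of the coordinator's side of a write, assuming nodes are plain dictionaries and the replicas for the key have already been chosen. `make_node` and `coordinate_write` are hypothetical names; real C* does this concurrently over the network.

```python
def make_node(up=True):
    """A stand-in for a C* node: an up/down flag plus a key-value store."""
    return {"up": up, "data": {}}

def coordinate_write(replicas, key, value, timestamp, required_acks):
    """Send the write to every replica; report success if enough acknowledge."""
    acks = 0
    for replica in replicas:
        if replica["up"]:  # a down replica never acknowledges
            replica["data"][key] = (timestamp, value)
            acks += 1
    return acks >= required_acks

replicas = [make_node(), make_node(), make_node(up=False)]
ok_at_two = coordinate_write(replicas, "IBM", 101, timestamp=1, required_acks=2)
ok_at_three = coordinate_write(replicas, "IBM", 102, timestamp=2, required_acks=3)
```

Note that the second write is reported as failed but is not rolled back on the two replicas that accepted it; the sketch makes no attempt to undo partial writes.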

  12. THE READ PATH • diagram (coordinator, send to all nodes with data, wait for ’n’ to respond) • Read Repair
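
Read repair can be sketched the same way: the coordinator compares the timestamped versions it gets back, returns the newest, and writes it back to any replica that answered with something older. This is a simplified model with hypothetical names, not the actual C* implementation.

```python
def read_with_repair(replicas, key, required_responses):
    """Return the newest value for key and repair stale replicas that responded."""
    responses = [(r, r["data"].get(key)) for r in replicas if r["up"]]
    if len(responses) < required_responses:
        raise RuntimeError("not enough replicas responded")
    versions = [cell for _, cell in responses if cell is not None]
    if not versions:
        return None
    newest = max(versions, key=lambda cell: cell[0])  # highest timestamp wins
    for replica, cell in responses:
        if cell != newest:
            replica["data"][key] = newest  # the repair step
    return newest[1]

# Cells are (timestamp, value); the sample data is invented.
fresh = {"up": True, "data": {"IBM": (2, "new")}}
stale = {"up": True, "data": {"IBM": (1, "old")}}
empty = {"up": True, "data": {}}
result = read_with_repair([fresh, stale, empty], "IBM", required_responses=2)
```

Because the comparison relies on timestamps, replicas whose clocks disagree can pick the wrong winner.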

  13. FAILURE RECOVERY - WHERE C* REALLY SHINES • What happens when a node fails? • How many nodes can fail w/o data loss depends on # of nodes and # of replicas • Auto-recovery vs. backup and restore • With the usual caveats… C* recovery “just works”

  14. RUNNING C* ON AWS • Scale out not up • More spindles is better • Log dir vs. data dir • Selecting the right instance type • You must run with NTP (not an AWS standard)

  15. CONFIGURATION • The main C* config file is 700 lines long • You really need to deeply understand most of it. • cluster_name, listen_address, commitlog_directory, endpoint_snitch, seed_provider, compaction_throughput_mb_per_sec, concurrent_reads, snapshot_before_compaction, phi_convict_threshold, commitlog_sync, partitioner, key_cache_size_in_mb, row_cache_save_period, tombstone_warn_threshold, read_request_timeout_in_ms, cross_node_timeout, internode_compression, inter_dc_tcp_nodelay, dynamic_snitch_badness_threshold, dynamic_snitch_update_interval, hinted_handoff_enabled, max_hints_delivery_threads, …

  16. MONITORING YOUR C* CLUSTER

  17. VERSION ISSUES AND TOMBSTONES • Life is better if you never delete records • If you delete you can end up with tombstones • To deal with tombstones you need to run Repair… and that is a whole nasty can of worms.
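
Why deletes are awkward can be shown with a small model. A delete is itself a timestamped write of a marker (the tombstone); if the marker is dropped before every replica has seen it, merging replicas brings the deleted value back. The names below are hypothetical and the model ignores everything except timestamps.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"

def delete(replica, key, timestamp):
    """A delete is recorded as a newer cell holding the tombstone marker."""
    replica[key] = (timestamp, TOMBSTONE)

def merge(a, b):
    """Reconcile two replicas: for each key, the newest timestamp wins."""
    merged = dict(a)
    for key, cell in b.items():
        if key not in merged or cell[0] > merged[key][0]:
            merged[key] = cell
    return merged

def purge_tombstones(replica):
    """Drop tombstones, as compaction eventually does."""
    return {k: c for k, c in replica.items() if c[1] is not TOMBSTONE}

r1 = {"IBM": (1, "quote")}
r2 = {"IBM": (1, "quote")}   # this replica misses the delete
delete(r1, "IBM", 2)

repaired_first = merge(r1, r2)                   # the tombstone wins
purged_first = merge(purge_tombstones(r1), r2)   # the old value comes back
```

In this model, reconciling the replicas before the tombstone is purged keeps the delete; purging first resurrects the data, which is why repair has to run regularly once you delete.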

  18. MAINTENANCE TASKS • Full and minor compactions • Snapshot your disks if using AWS/EBS

  19. THE DARK SIDE, PART 1 • DataStax maintains three parallel release branches, with vastly different feature sets • New releases are always unstable; never accept an n.0, n.1 or n.2 release

  20. THE DARK SIDE, PART 2 • C* uses schema-less design • Requires knowledge of slice predicates rather than SQL • DataStax decided to adopt schema and CQL to gain market share at the expense of their soul. • You can now pretend C* is relational (except no indexes and mostly no WHERE clauses)
