OVERVIEW OF CASSANDRA WHY WOULD YOU NAME A DATABASE AFTER A GREEK MYTH OF NOT BEING LISTENED TO?

AGENDA • Bio • Basic C* Data Model • Replication vs. Quorum • Failure Recovery • AWS Implications • Monitoring • Version issues and tombstoning • Maintenance Tasks • The Dark Side

QUICK BIO • Programming since 1981 • Four patents • 2010 JavaOne Rock Star and Duke’s Choice winner • Frequent contributor to Pragmatic Programmer magazine, SearchAws.com and LinkedIn Pulse News • briantarbox.org, log4jfugue.org, BrianTarbox@gmail.com

C* DATA MODEL, COMPARISON • “When all you have is a hammer, everything looks like a thumb.” - Morgan • Relational (tabular) model: PostgreSQL • Relationship (graph) model: Neo4j • Document model: MongoDB • Time-series model: C*

THE REAL DIFFERENCE BETWEEN SQL AND “NO-SQL” • In SQL we’re trained to design based on storing the data, ideally in 3rd normal form. Queries are bolted on later. • In no-SQL we design based on the queries we’ll perform. “Table” structure falls out of that. • Queries should get top billing, because if you just store the data, who cares?

C* DATA MODEL, EXAMPLE • Wide rows; wide columns, heterogeneous columns • For example, a row per stock, with each column being all we know about that stock for that day. • Designed to be easy to “select” a row and then read thousands of columns sequentially • Not designed to randomly select specific columns
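
The wide-row idea above can be sketched in a few lines of Python. This is only an in-memory illustration, not how C* stores data on disk: the `WideRow` class, its `columns_between` method, and the sample prices are all invented for the example. It assumes column names are ISO dates, so sorting them as strings also sorts them by time.

```python
from bisect import bisect_left, bisect_right


class WideRow:
    """Hypothetical stand-in for one wide row: columns kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names (here: ISO dates)
        self._values = {}  # column name -> value; values need not share a shape

    def put(self, name, value):
        # Insert new names in sorted position so range reads stay sequential.
        if name not in self._values:
            self._names.insert(bisect_left(self._names, name), name)
        self._values[name] = value

    def columns_between(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in order."""
        lo = bisect_left(self._names, start)
        hi = bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# One row per stock, one column per day (made-up values, heterogeneous columns).
table = {"IBM": WideRow()}
table["IBM"].put("2014-01-03", {"close": 101.0})
table["IBM"].put("2014-01-02", {"close": 100.0, "volume": 5000})
table["IBM"].put("2014-01-06", {"close": 102.0})

# "Select" the row by key, then read a contiguous run of columns.
first_week = table["IBM"].columns_between("2014-01-01", "2014-01-05")
print([name for name, _ in first_week])  # ['2014-01-02', '2014-01-03']
```

Finding the row is a single key lookup and the range read walks adjacent columns, which is the access pattern the slide describes. Picking out scattered individual columns would not benefit from the ordering.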

CQL, SLICE PREDICATES • In PostgreSQL you might say “select * from stock where ticker='IBM' and price > 100” • You simply cannot do that with C* • SQL uses indexes to speed up access to rows; indexes are very problematic in C* • Often the C* answer is denormalization

SLICE PREDICATES • Columns have names (e.g. “date”), but a column can also contain many (hundreds of) values. • Slice predicates let you specify which columns to select

CLIQUE, INC.: C* ANTIPATTERN • My last company folded, but not before providing a C* antipattern • Collaboration software; many ad-hoc queries (who’s in what context, where was “x” said, etc.) • We ended up with 14 copies of the main data, each in its own column family. • Bad Dog.

REPLICATION VS. READ/WRITE LEVEL • Replication refers to how many distinct copies of the data there are • Read/Write Level refers to how many of the replicas must respond/agree before proceeding
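
The interaction between the two settings is plain arithmetic, sketched below. The helper names `quorum` and `reads_see_latest_write` are made up for this illustration. The rule they encode is the standard one: if the read level plus the write level exceeds the replication factor, every read set overlaps every write set in at least one replica.

```python
def quorum(replication_factor: int) -> int:
    """Smallest majority of the replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(replication_factor: int,
                           write_level: int,
                           read_level: int) -> bool:
    """True when any read set must share a replica with any write set."""
    return read_level + write_level > replication_factor


rf = 3
q = quorum(rf)
print(q)                                   # 2
print(reads_see_latest_write(rf, q, q))    # True: quorum writes + quorum reads
print(reads_see_latest_write(rf, 1, 1))    # False: a read may miss the write
print(reads_see_latest_write(rf, rf, 1))   # True, but one down node blocks writes
```

With three replicas, quorum reads and writes keep working with one replica down, while level "all" on either side trades that tolerance away.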

THE WRITE PATH • (diagram) The client picks a C* node at random; that node becomes the coordinator • The coordinator sends the write to as many nodes as there are replicas • It waits until ‘n’ of them respond before returning

THE READ PATH • (diagram) The coordinator sends the read to the nodes that hold the data and waits for ‘n’ to respond • Read repair
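
The write and read paths can be sketched as a toy simulation. Everything here is hypothetical (`Replica`, `coordinator_write`, `coordinator_read`), and it simplifies heavily: the coordinator contacts every live replica synchronously, and conflicts are resolved by keeping the cell with the highest timestamp. Real C* does considerably more, so treat this as a picture of the idea rather than the actual protocol.

```python
class Replica:
    """Toy replica holding key -> (timestamp, value)."""

    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}

    def write(self, key, timestamp, value):
        current = self.data.get(key)
        if current is None or timestamp > current[0]:
            self.data[key] = (timestamp, value)  # newest timestamp wins

    def read(self, key):
        return self.data.get(key)


def coordinator_write(replicas, key, timestamp, value, write_level):
    """Send the write to every live replica; fail if too few acknowledge."""
    acks = 0
    for replica in replicas:
        if replica.up:
            replica.write(key, timestamp, value)
            acks += 1
    if acks < write_level:
        raise RuntimeError(f"only {acks} of {write_level} replicas responded")
    return acks


def coordinator_read(replicas, key, read_level):
    """Return the newest value seen and repair replicas holding stale copies."""
    responses = [(r, r.read(key)) for r in replicas if r.up]
    if len(responses) < read_level:
        raise RuntimeError(f"only {len(responses)} of {read_level} replicas responded")
    cells = [cell for _, cell in responses if cell is not None]
    if not cells:
        return None
    newest = max(cells, key=lambda cell: cell[0])
    for replica, cell in responses:
        if cell != newest:
            replica.write(key, *newest)  # read repair
    return newest[1]


replicas = [Replica("a"), Replica("b"), Replica("c")]
replicas[2].up = False                              # node c misses the write
coordinator_write(replicas, "IBM", 1, "v1", write_level=2)
replicas[2].up = True                               # node c comes back, stale
value = coordinator_read(replicas, "IBM", read_level=2)
print(value)                    # v1
print(replicas[2].read("IBM"))  # (1, 'v1'), filled in by read repair
```

The write succeeds with one node down because only two acknowledgements were required, and the later read brings the returning node up to date as a side effect.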

FAILURE RECOVERY - WHERE C* REALLY SHINES • What happens when a node fails? • How many nodes can fail without data loss depends on the number of nodes and the number of replicas • Auto-recovery vs. backup and restore • With the usual caveats… C* recovery “just works”

RUNNING C* ON AWS • Scale out not up • More spindles is better • Log dir vs. data dir • Selecting the right instance type • You must run with NTP (not an AWS standard)

CONFIGURATION • The main C* config file (cassandra.yaml) is 700 lines long • You really need to deeply understand most of it. • cluster_name, listen_address, commitlog_directory, endpoint_snitch, seed_provider, compaction_throughput_mb_per_sec, concurrent_reads, snapshot_before_compaction, phi_convict_threshold, commitlog_sync, partitioner, key_cache_size_in_mb, row_cache_save_period, tombstone_warn_threshold, read_request_timeout_in_ms, cross_node_timeout, internode_compression, inter_dc_tcp_nodelay, dynamic_snitch_badness_threshold, dynamic_snitch_update_interval, hinted_handoff_enabled, max_hints_delivery_threads, …

MONITORING YOUR C* CLUSTER

VERSION ISSUES AND TOMBSTONES • Life is better if you never delete records • If you delete you can end up with tombstones • To deal with tombstones you need to run Repair… and that is a whole nasty can of worms.
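
Why deletes are awkward can be shown with a small sketch. It assumes the usual last-write-wins reconciliation by timestamp. The `merge` helper and the `TOMBSTONE` marker are invented for this illustration and are not C* APIs.

```python
TOMBSTONE = object()  # marker meaning "this value was deleted"


def merge(a, b):
    """Reconcile two replica cells of the form (timestamp, value); newest wins."""
    if a is None:
        return b
    if b is None:
        return a
    return a if a[0] >= b[0] else b


# Both replicas hold the same value, written at timestamp 1.
replica_1 = (1, "v1")
replica_2 = (1, "v1")

# A delete at timestamp 2 reaches replica_1 only; it is stored as a tombstone.
replica_1 = (2, TOMBSTONE)

# If repair runs while the tombstone still exists, the delete wins everywhere.
repaired = merge(replica_1, replica_2)
print(repaired[1] is TOMBSTONE)  # True

# If the tombstone is purged before repair, nothing remembers the delete,
# and the stale copy on replica_2 looks like the only (valid) value.
purged = None
resurrected = merge(purged, replica_2)
print(resurrected)  # (1, 'v1')
```

The second case is the reason a tombstone has to outlive the repair cycle: a replica that missed the delete would otherwise bring the record back.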

MAINTENANCE TASKS • Full and minor compactions • Snapshot your disks if using AWS/EBS

THE DARK SIDE, PART 1 • DataStax maintains three parallel release branches, with vastly different feature sets • New releases are always unstable; never accept an n.0, n.1 or n.2 release

THE DARK SIDE, PART 2 • C* uses a schema-less design • Requires knowledge of slice predicates rather than SQL • DataStax decided to adopt schema and CQL to gain market share at the expense of their soul. • You can now pretend C* is relational (except no indexes and mostly no WHERE clauses)
