Apache Cassandra
STL Java Users Group
Cliff Gilmore
DataStax Solutions Architect / Engineer
Aug 14, 2014

Agenda
• Cassandra Overview
• Cassandra Architecture
• Cassandra Query Language
• Interacting with Cassandra using Java
• About DataStax

CASSANDRA OVERVIEW

Who is using DataStax?
• Collections / Playlists
• Recommendation / Personalization
• Fraud detection
• Internet of Things / Sensor data
• Messaging

What is Apache Cassandra?
Apache Cassandra™ is a massively scalable NoSQL database.
• Continuous availability
• High performing writes and reads
• Linear scalability
• Multi-data center support

The NoSQL Performance Leader
"In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput."
Source: Solving Big Data Challenges for Enterprise Application Performance Management, benchmark paper presented at the Very Large Database Conference, 2013.
• Netflix Cloud Benchmark…
• End Point Independent NoSQL Benchmark: Highest in throughput… Lowest in latency…
Source: Netflix Tech Blog

Cassandra is Fault Tolerant
Token  Order_id  Qty  Sale
70     1001      10   100
44     1002      5    50
15     1003      30   200
[Diagram: a client writing to a ring of nodes with tokens 10 through 80]
• A node fails or goes down temporarily
• We could still retrieve the data from the other 2 nodes
• Replication Factor = 3
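
The fault-tolerance idea on this slide can be sketched in a few lines of plain Java. This is a toy model under stated assumptions, not Cassandra's placement code: the class `TokenRingSketch` and its methods are hypothetical, each node is identified by its token, and replicas are picked by walking clockwise around the ring from the node that owns the row's token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy token ring (hypothetical names); illustrative only. */
final class TokenRingSketch {
    private final TreeSet<Integer> nodeTokens = new TreeSet<>();
    private final int replicationFactor;

    TokenRingSketch(List<Integer> tokens, int replicationFactor) {
        this.nodeTokens.addAll(tokens);
        this.replicationFactor = replicationFactor;
    }

    /** Starts at the first node whose token is >= the row token, then walks clockwise. */
    List<Integer> replicasFor(int rowToken) {
        List<Integer> replicas = new ArrayList<>();
        Integer node = nodeTokens.ceiling(rowToken);
        if (node == null) {
            node = nodeTokens.first(); // wrap around the ring
        }
        int wanted = Math.min(replicationFactor, nodeTokens.size());
        while (replicas.size() < wanted) {
            replicas.add(node);
            node = nodeTokens.higher(node);
            if (node == null) {
                node = nodeTokens.first();
            }
        }
        return replicas;
    }

    /** Replicas that can still serve the row while the given nodes are down. */
    List<Integer> liveReplicasFor(int rowToken, Set<Integer> downNodes) {
        List<Integer> live = new ArrayList<>(replicasFor(rowToken));
        live.removeAll(downNodes);
        return live;
    }

    public static void main(String[] args) {
        TokenRingSketch ring =
            new TokenRingSketch(Arrays.asList(10, 20, 30, 40, 50, 60, 70, 80), 3);
        System.out.println(ring.replicasFor(70));                                // [70, 80, 10]
        System.out.println(ring.liveReplicasFor(70, Collections.singleton(70))); // [80, 10]
    }
}
```

With Replication Factor = 3, losing node 70 still leaves two replicas for the row with token 70, which is the situation the slide describes.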

Multi Data Center Support
• Data center outage occurs
• No interruption to the business
[Diagram: clients connected to an East Data Center ring and a West Data Center ring]

Writes in Cassandra
[Diagram: Client, Commit Log, Memory, Flush to Disk, SSTables]
Data is organized into Partitions
1. Data is written to a Commit Log for a node (durability)
2. Data is written to a MemTable (in memory)
3. MemTables are flushed to disk in an SSTable based on size
SSTables are immutable
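
The three write steps can be mimicked in plain Java to make the ordering concrete. This is a minimal sketch, not Cassandra's storage engine: `WritePathSketch` and its methods are hypothetical, the flush threshold counts entries rather than bytes, and the "SSTables" are just unmodifiable in-memory maps. Unlike a real node, the sketch never recycles its commit log.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Toy write path (hypothetical names); illustrative only. */
final class WritePathSketch {
    private final List<String> commitLog = new ArrayList<>();      // append-only record of writes
    private TreeMap<String, String> memTable = new TreeMap<>();    // in-memory, sorted by key
    private final List<SortedMap<String, String>> ssTables = new ArrayList<>();
    private final int flushThreshold;                              // assumed: number of entries

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value);        // 1. append to the commit log
        memTable.put(key, value);                // 2. update the memtable
        if (memTable.size() >= flushThreshold) { // 3. flush once the memtable is "full"
            flush();
        }
    }

    private void flush() {
        // The flushed table is wrapped as unmodifiable and a fresh memtable takes over.
        ssTables.add(Collections.unmodifiableSortedMap(memTable));
        memTable = new TreeMap<>();
    }

    int commitLogSize() { return commitLog.size(); }
    int memTableSize()  { return memTable.size(); }
    int ssTableCount()  { return ssTables.size(); }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("a", "1");
        node.write("b", "2"); // second entry reaches the threshold and triggers a flush
        node.write("c", "3");
        System.out.println(node.ssTableCount() + " SSTable(s), "
            + node.memTableSize() + " entry in the memtable, "
            + node.commitLogSize() + " commit log records");
    }
}
```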

Tunable Data Consistency
Writes:
• Any
• One
• Quorum
• Local_Quorum
• Each_Quorum
• All
Reads:
• One
• Quorum
• Local_Quorum
• Each_Quorum
• All
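
The arithmetic behind these levels is short enough to write down. A quorum is a majority of the replicas, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas written plus the number of replicas read exceeds the replication factor. The class and method names below are hypothetical helpers, not part of any driver.

```java
/** Consistency-level arithmetic (hypothetical helper names). */
final class ConsistencySketch {

    /** A quorum is a majority of the replicas: floor(RF / 2) + 1. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set must intersect every write set. */
    static boolean readsSeeLatestWrite(int writeReplicas, int readReplicas, int replicationFactor) {
        return writeReplicas + readReplicas > replicationFactor;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println(q);                              // 2
        System.out.println(readsSeeLatestWrite(q, q, rf));  // true: Quorum writes + Quorum reads
        System.out.println(readsSeeLatestWrite(1, 1, rf));  // false: One + One can miss a write
    }
}
```

With a replication factor of 3, Quorum for both reads and writes tolerates one replica being down while still returning the latest write.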

Built for Modern Online Applications
Architected for today's needs
• Linear scalability at lowest cost
• 100% uptime
• Operationally simple

Cassandra Query Language

CQL - DevCenter
• CQL: a SQL-like query language for communicating with Cassandra
• DataStax DevCenter: a free, visual query tool for creating and running CQL statements against Cassandra and DataStax Enterprise

CQL - Create Keyspace
    CREATE KEYSPACE demo
      WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'EastCoast': 3, 'WestCoast': 2};
[Diagram: two five-node rings. DC: EastCoast holds the 1st, 2nd, and 3rd copies; DC: WestCoast holds the 1st and 2nd copies]

CQL - Basics
    CREATE TABLE users (
      username text,
      password text,
      create_date timestamp,
      PRIMARY KEY (username, create_date)
    ) WITH CLUSTERING ORDER BY (create_date DESC);
    INSERT INTO users (username, password, create_date)
      VALUES ('caroline', 'password1234', '2014-06-01 07:01:00');
    SELECT * FROM users
      WHERE username = 'caroline' AND create_date = '2014-06-01 07:01:00';
Predicates
• On the partition key: = and IN
• On the clustering columns: <, <=, =, >=, >, IN

Collection Data Types
CQL supports columns that contain collections of data.
The collection types include: Set, List and Map.
    CREATE TABLE users (
      username text,
      set_example set<text>,
      list_example list<text>,
      map_example map<int,text>,
      PRIMARY KEY (username)
    );
Favor sets over lists: better performance

Plus much more …
Lightweight Transactions
    INSERT INTO customer_account (customerID, customer_email)
      VALUES ('LauraS', 'lauras@gmail.com') IF NOT EXISTS;
    UPDATE customer_account SET customer_email = 'laurass@gmail.com'
      WHERE customerID = 'LauraS'
      IF customer_email = 'lauras@gmail.com';
Counters
    UPDATE UserActions SET total = total + 2
      WHERE user = 123 AND action = 'xyz';
Time to live (TTL)
    INSERT INTO users (id, first, last)
      VALUES ('abc123', 'abe', 'lincoln') USING TTL 3600;
Batch Statements
    BEGIN BATCH
      INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
      UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2';
      INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c');
      DELETE name FROM users WHERE userID = 'user2';
    APPLY BATCH;

JAVA CODE EXAMPLES

DataStax Java Driver
• Written for CQL 3.0
• Uses the binary protocol introduced in Cassandra 1.2
• Uses Netty to provide an asynchronous architecture
• Can do asynchronous or synchronous queries
• Has connection pooling
• Has node discovery and load balancing
http://www.datastax.com/download

Add .JAR Files to Project
The easiest way to do this is with Maven, a software project management tool.

Add .JAR Files to Project
1. In the pom.xml file, select the Dependencies tab
2. Click the Add… button in the left column
3. Enter the DataStax Java driver info

Connect & Write
    Cluster cluster = Cluster.builder()
        .addContactPoints("10.158.02.40", "10.158.02.44")
        .build();
    Session session = cluster.connect("demo");
    session.execute(
        "INSERT INTO users (username, password) " +
        "VALUES ('caroline', 'password1234')");
Note: Cluster and Session objects should be long-lived and re-used

Read from Table
    ResultSet rs = session.execute("SELECT * FROM users");
    List<Row> rows = rs.all();
    for (Row row : rows) {
        String userName = row.getString("username");
        String password = row.getString("password");
    }

Asynchronous Read
    ResultSetFuture future = session.executeAsync("SELECT * FROM users");
    // getUninterruptibly() blocks until the result arrives; unlike get(), it declares no checked exceptions
    for (Row row : future.getUninterruptibly()) {
        String userName = row.getString("username");
        String password = row.getString("password");
    }
Note: The future returned implements Guava's ListenableFuture interface. This means you can use all of Guava's Futures methods.
See: http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/Futures.html

Read with Callbacks
    final ResultSetFuture future = session.executeAsync("SELECT * FROM users");
    future.addListener(new Runnable() {
        public void run() {
            // The listener runs once the future has completed, so this call does not block
            for (Row row : future.getUninterruptibly()) {
                String userName = row.getString("username");
                String password = row.getString("password");
            }
        }
    }, executor); // executor: any java.util.concurrent.Executor

Parallelize Calls
    int queryCount = 99;
    List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();
    // Send every query first, without waiting for any result
    for (int i = 0; i < queryCount; i++) {
        futures.add(
            session.executeAsync("SELECT * FROM users "
                + "WHERE username = '" + i + "'"));
    }
    // Then collect the results in submission order
    for (ResultSetFuture future : futures) {
        for (Row row : future.getUninterruptibly()) {
            // do something
        }
    }

Prepared Statements
    PreparedStatement statement = session.prepare(
        "INSERT INTO users (username, password) " +
        "VALUES (?, ?)");
    BoundStatement bs = statement.bind();
    bs.setString("username", "caroline");
    bs.setString("password", "password1234");
    session.execute(bs);

Query Builder
    // eq(...) is a static method of QueryBuilder, assumed here to be statically imported
    Query query = QueryBuilder
        .select()
        .all()
        .from("demo", "users")
        .where(eq("username", "caroline"));
    ResultSet rs = session.execute(query);

Load Balancing
Determines which node will be contacted next once a connection to a cluster has been established.
    Cluster cluster = Cluster.builder()
        .addContactPoints("10.158.02.40", "10.158.02.44")
        .withLoadBalancingPolicy(
            new DCAwareRoundRobinPolicy("DC1")) // "DC1" is the name of the local DC
        .build();
Policies are:
• RoundRobinPolicy
• DCAwareRoundRobinPolicy (default)
• TokenAwarePolicy

RoundRobinPolicy
• Not data-center aware
• Each subsequent request after the initial connection to the cluster goes to the next node in the cluster
• If the node that is serving as the coordinator fails during a request, the next node is used
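
The round-robin behavior can be sketched without the driver. The code below is an illustration under stated assumptions, not the driver's RoundRobinPolicy: `RoundRobinSketch` is a hypothetical class, and a "query plan" is simply the node list rotated by one position per request, so a caller that tries the plan in order falls through to the next node when the first one fails.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy round-robin policy (hypothetical names); illustrative only. */
final class RoundRobinSketch {
    private final List<String> nodes;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinSketch(List<String> nodes) {
        this.nodes = new ArrayList<>(nodes);
    }

    /** Nodes to try, in order, for the next request; the start rotates on every call. */
    List<String> queryPlan() {
        int start = Math.floorMod(counter.getAndIncrement(), nodes.size());
        List<String> plan = new ArrayList<>(nodes.size());
        for (int i = 0; i < nodes.size(); i++) {
            plan.add(nodes.get((start + i) % nodes.size()));
        }
        return plan;
    }

    public static void main(String[] args) {
        RoundRobinSketch policy = new RoundRobinSketch(Arrays.asList("n1", "n2", "n3"));
        System.out.println(policy.queryPlan()); // [n1, n2, n3]
        System.out.println(policy.queryPlan()); // [n2, n3, n1]
        System.out.println(policy.queryPlan()); // [n3, n1, n2]
    }
}
```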

DCAwareRoundRobinPolicy
• Is data center aware
• Does a round robin within the local data center
• Only goes to another data center if there is not a node available to be coordinator in the local data center
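
A data-center-aware variant only needs to order the candidates differently: rotate through the local nodes first and append remote nodes as a last resort. This is a minimal sketch, not the driver's DCAwareRoundRobinPolicy: `DcAwareSketch`, its constructor, and the node names are all hypothetical, and node liveness is passed in explicitly as a set of down nodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

/** Toy DC-aware round-robin policy (hypothetical names); illustrative only. */
final class DcAwareSketch {
    private final String localDc;
    private final Map<String, List<String>> nodesByDc;
    private final AtomicInteger counter = new AtomicInteger();

    DcAwareSketch(String localDc, Map<String, List<String>> nodesByDc) {
        this.localDc = localDc;
        this.nodesByDc = new LinkedHashMap<>(nodesByDc);
    }

    /** Local nodes in rotating order, followed by the nodes of every other data center. */
    List<String> queryPlan() {
        List<String> local = nodesByDc.getOrDefault(localDc, Collections.<String>emptyList());
        List<String> plan = new ArrayList<>();
        if (!local.isEmpty()) {
            int start = Math.floorMod(counter.getAndIncrement(), local.size());
            for (int i = 0; i < local.size(); i++) {
                plan.add(local.get((start + i) % local.size()));
            }
        }
        for (Map.Entry<String, List<String>> dc : nodesByDc.entrySet()) {
            if (!dc.getKey().equals(localDc)) {
                plan.addAll(dc.getValue());
            }
        }
        return plan;
    }

    /** First node in the plan that is not down; remote only if every local node is down. */
    Optional<String> coordinator(Set<String> downNodes) {
        for (String node : queryPlan()) {
            if (!downNodes.contains(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        Map<String, List<String>> dcs = new LinkedHashMap<>();
        dcs.put("DC1", Arrays.asList("a1", "a2"));
        dcs.put("DC2", Arrays.asList("b1"));
        DcAwareSketch policy = new DcAwareSketch("DC1", dcs);
        System.out.println(policy.coordinator(Collections.<String>emptySet()));            // Optional[a1]
        System.out.println(policy.coordinator(new HashSet<>(Arrays.asList("a1", "a2")))); // Optional[b1]
    }
}
```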

TokenAwarePolicy
• Is aware of where the replicas for a given token live
• Instead of round robin, the client chooses the node that holds the primary replica as the coordinator
• Avoids the extra hop of sending the request to an arbitrary coordinator, which would then have to contact the nodes holding the replicas
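
Token awareness amounts to putting the replicas for the request's token at the front of the plan. The sketch below is a toy model, not the driver's TokenAwarePolicy: `TokenAwareSketch` is hypothetical, the ring maps integer tokens to node names, and replicas are found by walking clockwise from the token, with the primary replica first.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Toy token-aware policy (hypothetical names); illustrative only. Assumes a non-empty ring. */
final class TokenAwareSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>();
    private final int replicationFactor;

    TokenAwareSketch(Map<Integer, String> nodesByToken, int replicationFactor) {
        this.ring.putAll(nodesByToken);
        this.replicationFactor = replicationFactor;
    }

    /** Primary replica first, then the next nodes clockwise on the ring. */
    List<String> replicasFor(int partitionToken) {
        List<String> replicas = new ArrayList<>();
        Integer token = ring.ceilingKey(partitionToken);
        if (token == null) {
            token = ring.firstKey(); // wrap around the ring
        }
        int wanted = Math.min(replicationFactor, ring.size());
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    /** Replicas first, then every remaining node as a fallback. */
    List<String> queryPlan(int partitionToken) {
        Set<String> plan = new LinkedHashSet<>(replicasFor(partitionToken));
        plan.addAll(ring.values());
        return new ArrayList<>(plan);
    }

    public static void main(String[] args) {
        Map<Integer, String> nodes = new TreeMap<>();
        nodes.put(10, "n10");
        nodes.put(20, "n20");
        nodes.put(30, "n30");
        nodes.put(40, "n40");
        TokenAwareSketch policy = new TokenAwareSketch(nodes, 2);
        System.out.println(policy.queryPlan(25)); // [n30, n40, n10, n20]
    }
}
```

For token 25 the primary replica is the node with token 30, so the request goes straight to a node that holds the data.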
