Apache Cassandra: STL Java Users Group, Cliff Gilmore, DataStax




slide-1
SLIDE 1

Apache Cassandra

STL Java Users Group

Cliff Gilmore DataStax Solutions Architect / Engineer

  • Aug 14, 2014

1

slide-2
SLIDE 2

Agenda

2

  • Cassandra Overview
  • Cassandra Architecture
  • Cassandra Query Language
  • Interacting with Cassandra using Java
  • About DataStax
slide-3
SLIDE 3

CASSANDRA OVERVIEW

3

slide-4
SLIDE 4

Who is using DataStax?

4

  • Collections / Playlists
  • Recommendation / Personalization
  • Fraud detection
  • Messaging
  • Internet of Things / Sensor data

slide-5
SLIDE 5

What is Apache Cassandra?

Apache Cassandra™ is a massively scalable NoSQL database.

  • Continuous availability
  • High performing writes and reads
  • Linear scalability
  • Multi-data center support
slide-6
SLIDE 6

The NoSQL Performance Leader

Netflix Cloud Benchmark…

  • Source: Netflix Tech Blog

End Point Independent NoSQL Benchmark

  • Highest in throughput… Lowest in latency…

“In terms of scalability, there is a clear winner throughout our experiments. Cassandra achieves the highest throughput for the maximum number of nodes in all experiments with a linear increasing throughput.”

  • Source: Solving Big Data Challenges for Enterprise Application Performance Management, benchmark paper presented at the Very Large Database Conference, 2013.

slide-7
SLIDE 7

Cassandra is Fault Tolerant

[Figure: two clients connected to an eight-node ring with tokens 10, 20, 30, 40, 50, 60, 70, 80]

  • Replication Factor = 3
  • Node failure or it goes down temporarily
  • We could still retrieve the data from the other 2 nodes

Token   Order_id   Qty   Sale
70      1001       10    100
44      1002       5     50
15      1003       30    200
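The fault-tolerance idea on this slide can be sketched in a few lines of Java. This is a toy model, not Cassandra's partitioner: it assumes a row belongs to the first node whose token is greater than or equal to the row's token (wrapping around the ring), and that the remaining replicas are simply the next nodes clockwise. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model of the ring on this slide; not the real Cassandra partitioner.
class TokenRingSketch {
    // Node tokens taken from the slide's ring diagram.
    static final TreeSet<Integer> RING =
            new TreeSet<>(List.of(10, 20, 30, 40, 50, 60, 70, 80));

    // Assumption: the owner is the first node with token >= rowToken (wrapping
    // to the lowest token); further replicas are the next nodes clockwise.
    static List<Integer> replicasFor(int rowToken, int replicationFactor) {
        List<Integer> replicas = new ArrayList<>();
        Integer node = RING.ceiling(rowToken);
        if (node == null) node = RING.first();
        while (replicas.size() < Math.min(replicationFactor, RING.size())) {
            replicas.add(node);
            node = RING.higher(node);
            if (node == null) node = RING.first();
        }
        return replicas;
    }

    // Replicas that can still serve the row when some nodes are down.
    static List<Integer> liveReplicas(int rowToken, int rf, Set<Integer> down) {
        List<Integer> live = new ArrayList<>(replicasFor(rowToken, rf));
        live.removeAll(down);
        return live;
    }

    public static void main(String[] args) {
        System.out.println(replicasFor(44, 3));              // [50, 60, 70]
        System.out.println(liveReplicas(44, 3, Set.of(50))); // [60, 70]
    }
}
```

With a replication factor of 3, losing the node at token 50 still leaves two replicas for the row with token 44, which is the situation the slide describes.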

slide-8
SLIDE 8

Multi Data Center Support

[Figure: clients connected to two eight-node rings, one per data center (West Data Center and East Data Center), with tokens 10 to 80 and 15 to 85]

  • Data Center Outage Occurs
  • No interruption to the business

slide-9
SLIDE 9

Writes in Cassandra

Data is organized into Partitions

  1. Data is written to a Commit Log for a node (durability)
  2. Data is written to MemTable (in memory)
  3. MemTables are flushed to disk in an SSTable based on size.

  • SSTables are immutable

[Figure: write path with labels Client, Memory, Commit Log, Flush to Disk, SSTables]
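The three write steps can be mimicked with plain Java collections. This is a minimal sketch under stated assumptions, not Cassandra's storage engine: the "commit log" and "SSTables" are in-memory lists, the flush trigger is an entry count (the hypothetical `flushThreshold`) rather than a size in bytes, and all names are illustrative.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy write path: commit log, then MemTable, then flush to an immutable SSTable.
class WritePathSketch {
    final List<String> commitLog = new ArrayList<>();                       // step 1
    NavigableMap<String, String> memTable = new TreeMap<>();                // step 2
    final List<NavigableMap<String, String>> ssTables = new ArrayList<>();  // step 3
    final int flushThreshold; // hypothetical: flush after this many entries

    WritePathSketch(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void write(String key, String value) {
        commitLog.add(key + "=" + value); // append first, for durability
        memTable.put(key, value);         // then update the sorted in-memory table
        if (memTable.size() >= flushThreshold) {
            flush();
        }
    }

    void flush() {
        // The flushed table is wrapped read-only, mirroring SSTable immutability.
        ssTables.add(Collections.unmodifiableNavigableMap(memTable));
        memTable = new TreeMap<>();
    }

    public static void main(String[] args) {
        WritePathSketch node = new WritePathSketch(2);
        node.write("1001", "qty=10");
        node.write("1002", "qty=5");              // reaches the threshold, triggers a flush
        System.out.println(node.ssTables.size()); // 1
        System.out.println(node.memTable.size()); // 0
    }
}
```

The sketch leaves out everything that happens after a flush, such as reclaiming commit log space or merging SSTables.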

slide-10
SLIDE 10

Tunable Data Consistency

Writes

  • Any
  • One
  • Quorum
  • Local_Quorum
  • Each_Quorum
  • All

Reads

  • One
  • Quorum
  • Local_Quorum
  • Each_Quorum
  • All
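The arithmetic behind these levels is short enough to show directly. A quorum is a majority of the replicas, floor(RF / 2) + 1, and a read is guaranteed to overlap the latest acknowledged write when the number of replicas read plus the number written exceeds the replication factor. The helper names below are illustrative and not part of any driver API.

```java
// Quorum arithmetic for a single data center; helper names are illustrative.
class ConsistencySketch {
    // Quorum is a majority of replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Read and write replica sets must overlap when R + W > RF.
    static boolean readsSeeLatestWrite(int readReplicas, int writeReplicas, int rf) {
        return readReplicas + writeReplicas > rf;
    }

    public static void main(String[] args) {
        int rf = 3;
        int q = quorum(rf);
        System.out.println(q);                              // 2
        System.out.println(readsSeeLatestWrite(q, q, rf));  // true  (Quorum writes, Quorum reads)
        System.out.println(readsSeeLatestWrite(1, 1, rf));  // false (One writes, One reads)
        System.out.println(readsSeeLatestWrite(1, rf, rf)); // true  (All writes, One reads)
    }
}
```

Lower levels such as One trade this overlap guarantee for lower latency and higher availability, which is the tuning the slide refers to.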

slide-11
SLIDE 11

11

Built for Modern Online Applications

  • Architected for today’s needs
  • Linear scalability at lowest cost
  • 100% uptime
  • Operationally simple
slide-12
SLIDE 12

Cassandra Query Language

12

slide-13
SLIDE 13

CQL - DevCenter

A SQL-like query language for communicating with Cassandra

  • DataStax DevCenter – a free, visual query tool for creating and running CQL statements against Cassandra and DataStax Enterprise.

slide-14
SLIDE 14

CQL - Create Keyspace

CREATE KEYSPACE demo
  WITH REPLICATION = {'class' : 'NetworkTopologyStrategy', 'EastCoast' : 3, 'WestCoast' : 2};

[Figure: two five-node rings (Node 1 to Node 5), one per data center. Matching the replication settings, DC: EastCoast holds the 1st, 2nd, and 3rd copies and DC: WestCoast holds the 1st and 2nd copies]

slide-15
SLIDE 15

CQL - Basics

CREATE TABLE users (
  username text,
  password text,
  create_date timestamp,
  PRIMARY KEY (username, create_date)
) WITH CLUSTERING ORDER BY (create_date DESC);

INSERT INTO users (username, password, create_date)
  VALUES ('caroline', 'password1234', '2014-06-01 07:01:00');

SELECT * FROM users
  WHERE username = 'caroline' AND create_date = '2014-06-01 07:01:00';

Predicates

  • On the partition key: = and IN
  • On the clustering columns: <, <=, =, >=, >, IN

slide-16
SLIDE 16

Collection Data Types

CQL supports having columns that contain collections of data.

  • The collection types include: Set, List and Map.
  • Favor sets over lists – better performance

CREATE TABLE users (
  username text,
  set_example set<text>,
  list_example list<text>,
  map_example map<int,text>,
  PRIMARY KEY (username)
);

slide-17
SLIDE 17

Plus much more…

Lightweight Transactions

INSERT INTO customer_account (customerID, customer_email)
  VALUES ('LauraS', 'lauras@gmail.com') IF NOT EXISTS;

UPDATE customer_account SET customer_email = 'laurass@gmail.com'
  IF customer_email = 'lauras@gmail.com';

Counters

UPDATE UserActions SET total = total + 2
  WHERE user = 123 AND action = 'xyz';

Time to live (TTL)

INSERT INTO users (id, first, last)
  VALUES ('abc123', 'abe', 'lincoln') USING TTL 3600;

Batch Statements

BEGIN BATCH
  INSERT INTO users (userID, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
  UPDATE users SET password = 'ps22dhds' WHERE userID = 'user2';
  INSERT INTO users (userID, password) VALUES ('user3', 'ch@ngem3c');
  DELETE name FROM users WHERE userID = 'user2';
APPLY BATCH;

slide-18
SLIDE 18

JAVA CODE EXAMPLES

18

slide-19
SLIDE 19

DataStax Java Driver

  • Written for CQL 3.0
  • Uses the binary protocol introduced in Cassandra 1.2
  • Uses Netty to provide an asynchronous architecture
  • Can do asynchronous or synchronous queries
  • Has connection pooling
  • Has node discovery and load balancing
  • http://www.datastax.com/download
slide-20
SLIDE 20

Add .JAR Files to Project

20

The easiest way to do this is with Maven, a software project management tool.

slide-21
SLIDE 21

Add .JAR Files to Project

21

  • In the pom.xml file, select the Dependencies tab

  • Click the Add… button in the left column
  • Enter the DataStax Java driver info
slide-22
SLIDE 22

Connect & Write

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

Cluster cluster = Cluster.builder()
    .addContactPoints("10.158.02.40", "10.158.02.44")
    .build();

// Connect to the "demo" keyspace
Session session = cluster.connect("demo");

session.execute(
    "INSERT INTO users (username, password) "
    + "VALUES ('caroline', 'password1234')");

  • Note: Cluster and Session objects should be long-lived and re-used
slide-23
SLIDE 23

Read from Table

ResultSet rs = session.execute("SELECT * FROM users");

List<Row> rows = rs.all();
for (Row row : rows) {
    String userName = row.getString("username");
    String password = row.getString("password");
}

slide-24
SLIDE 24

Asynchronous Read

ResultSetFuture future = session.executeAsync(
    "SELECT * FROM users");

// get() blocks until the result is ready; it throws checked exceptions
// (InterruptedException, ExecutionException) that the caller must handle
for (Row row : future.get()) {
    String userName = row.getString("username");
    String password = row.getString("password");
}

  • Note: The future returned implements Guava's ListenableFuture interface. This means you can use all of Guava's Futures methods [1].

[1] http://docs.guava-libraries.googlecode.com/git/javadoc/com/google/common/util/concurrent/Futures.html

slide-25
SLIDE 25

Read with Callbacks

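final ResultSetFuture future = session.executeAsync("SELECT * FROM users");

// The listener runs on the supplied executor once the result is available;
// "executor" is any java.util.concurrent.Executor provided by the caller
future.addListener(new Runnable() {
    public void run() {
        // run() cannot throw checked exceptions, so use getUninterruptibly()
        for (Row row : future.getUninterruptibly()) {
            String userName = row.getString("username");
            String password = row.getString("password");
        }
    }
}, executor);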

slide-26
SLIDE 26

Parallelize Calls

int queryCount = 99;
List<ResultSetFuture> futures = new ArrayList<ResultSetFuture>();

// Fire all queries without waiting for any of them
for (int i = 0; i < queryCount; i++) {
    futures.add(
        session.executeAsync("SELECT * FROM users "
            + "WHERE username = '" + i + "'"));
}

// Then collect the results, blocking on each future in turn
for (ResultSetFuture future : futures) {
    for (Row row : future.getUninterruptibly()) {
        // do something
    }
}

slide-27
SLIDE 27

Prepared Statements

PreparedStatement statement = session.prepare(
    "INSERT INTO users (username, password) "
    + "VALUES (?, ?)");

BoundStatement bs = statement.bind();
bs.setString("username", "caroline");
bs.setString("password", "password1234");

session.execute(bs);
slide-28
SLIDE 28

Query Builder

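import static com.datastax.driver.core.querybuilder.QueryBuilder.eq;

// Note: in driver 2.0 and later the Query type is named Statement
Query query = QueryBuilder
    .select()
    .all()
    .from("demo", "users")
    .where(eq("username", "caroline"));

ResultSet rs = session.execute(query);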
slide-29
SLIDE 29

Load Balancing

Determines which node will be contacted next once a connection to a cluster has been established

Cluster cluster = Cluster.builder()
    .addContactPoints("10.158.02.40", "10.158.02.44")
    .withLoadBalancingPolicy(
        new DCAwareRoundRobinPolicy("DC1"))  // "DC1" is the name of the local DC
    .build();

Policies are:

  • RoundRobinPolicy
  • DCAwareRoundRobinPolicy (default)
  • TokenAwarePolicy

slide-30
SLIDE 30

RoundRobinPolicy

  • Not data-center aware
  • Each subsequent request after initial connection to the cluster goes to the next node in the cluster
  • If the node that is serving as the coordinator fails during a request, the next node is used
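The behaviour described on this slide can be imitated without a cluster. The following is a stand-alone sketch, not the driver's RoundRobinPolicy: node names are plain strings, and "down" nodes are passed in as a set rather than discovered by the driver.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-alone imitation of round-robin coordinator selection.
class RoundRobinSketch {
    private final List<String> nodes;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinSketch(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    // Every node once, starting one position later on each call.
    List<String> queryPlan() {
        int start = Math.floorMod(next.getAndIncrement(), nodes.size());
        List<String> plan = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i++) {
            plan.add(nodes.get((start + i) % nodes.size()));
        }
        return plan;
    }

    // The first node in the plan that is up acts as coordinator.
    String coordinator(Set<String> down) {
        for (String node : queryPlan()) {
            if (!down.contains(node)) return node;
        }
        throw new IllegalStateException("no host available");
    }

    public static void main(String[] args) {
        RoundRobinSketch policy = new RoundRobinSketch(List.of("n1", "n2", "n3"));
        System.out.println(policy.coordinator(Set.of()));     // n1
        System.out.println(policy.coordinator(Set.of()));     // n2
        System.out.println(policy.coordinator(Set.of("n3"))); // n1 (n3 is skipped)
    }
}
```

Each call advances the starting position by one, so requests spread evenly over the nodes, and a node that is down is skipped in favour of the next one in the plan.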

slide-31
SLIDE 31

DCAwareRoundRobinPolicy

  • Is data center aware
  • Does a round robin within the local data center
  • Only goes to another data center if there is not a node available to be coordinator in the local data center
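A data-center-aware variant only changes how the query plan is ordered: local nodes first, in round-robin order, and remote nodes only as a fallback. The sketch below follows the slide's description rather than the driver's actual DCAwareRoundRobinPolicy, and it ignores any limits the real policy may place on how many remote hosts are tried. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-alone imitation of DC-aware coordinator selection.
class DcAwareSketch {
    private final List<String> localNodes;  // assumed non-empty
    private final List<String> remoteNodes;
    private final AtomicInteger next = new AtomicInteger();

    DcAwareSketch(List<String> localNodes, List<String> remoteNodes) {
        this.localNodes = List.copyOf(localNodes);
        this.remoteNodes = List.copyOf(remoteNodes);
    }

    // Local nodes in round-robin order, followed by all remote nodes.
    List<String> queryPlan() {
        int start = Math.floorMod(next.getAndIncrement(), localNodes.size());
        List<String> plan = new ArrayList<>();
        for (int i = 0; i < localNodes.size(); i++) {
            plan.add(localNodes.get((start + i) % localNodes.size()));
        }
        plan.addAll(remoteNodes);
        return plan;
    }

    String coordinator(Set<String> down) {
        for (String node : queryPlan()) {
            if (!down.contains(node)) return node;
        }
        throw new IllegalStateException("no host available");
    }

    public static void main(String[] args) {
        DcAwareSketch policy = new DcAwareSketch(List.of("east1", "east2"), List.of("west1"));
        System.out.println(policy.coordinator(Set.of()));                 // east1
        System.out.println(policy.coordinator(Set.of()));                 // east2
        System.out.println(policy.coordinator(Set.of("east1", "east2"))); // west1
    }
}
```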

slide-32
SLIDE 32

TokenAwarePolicy

  • Is aware of where the replicas for a given token live
  • Instead of round robin, the client chooses the node that contains the primary replica to be the chosen coordinator
  • Avoids unnecessary time taken to go to any node to have it serve as coordinator to then contact the nodes with the replicas
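Token awareness can be sketched the same way: the client works out which nodes hold replicas for a key's token and tries those first. This is an illustration only, not the driver's TokenAwarePolicy. It assumes the first node with a token greater than or equal to the key's token holds the primary replica, that further replicas are the next nodes clockwise, and that non-replica nodes are tried in ring order as a last resort. Node names such as `node-50` are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;

// Stand-alone imitation of token-aware coordinator selection.
class TokenAwareSketch {
    private final TreeMap<Integer, String> ring = new TreeMap<>(); // token -> node name
    private final int replicationFactor;

    TokenAwareSketch(List<Integer> tokens, int replicationFactor) {
        for (int token : tokens) {
            ring.put(token, "node-" + token); // illustrative node names
        }
        this.replicationFactor = replicationFactor;
    }

    // Primary replica first, then the next nodes clockwise (assumed placement).
    List<String> replicas(int keyToken) {
        List<String> result = new ArrayList<>();
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) token = ring.firstKey();
        while (result.size() < Math.min(replicationFactor, ring.size())) {
            result.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) token = ring.firstKey();
        }
        return result;
    }

    // Prefer replicas of the key; fall back to the remaining nodes in ring order.
    String coordinator(int keyToken, Set<String> down) {
        List<String> plan = new ArrayList<>(replicas(keyToken));
        for (String node : ring.values()) {
            if (!plan.contains(node)) plan.add(node);
        }
        for (String node : plan) {
            if (!down.contains(node)) return node;
        }
        throw new IllegalStateException("no host available");
    }

    public static void main(String[] args) {
        TokenAwareSketch policy =
                new TokenAwareSketch(List.of(10, 20, 30, 40, 50, 60, 70, 80), 3);
        System.out.println(policy.coordinator(44, Set.of()));          // node-50
        System.out.println(policy.coordinator(44, Set.of("node-50"))); // node-60
    }
}
```

Because the coordinator already holds the data, the request avoids the extra hop from an arbitrary coordinator to a replica, which is the saving the slide describes.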

slide-33
SLIDE 33

Additional Information & Support

  • Community Site (http://planetcassandra.org)
  • Documentation (http://www.datastax.com/docs)
  • Downloads (http://www.datastax.com/download)
  • Getting Started (http://www.datastax.com/documentation/gettingstarted/index.html)
  • DataStax (http://www.datastax.com)

slide-34
SLIDE 34

ABOUT DATASTAX

34

slide-35
SLIDE 35

About DataStax

  • Founded in April 2010
  • Santa Clara, Austin, New York, London
  • 500+ Customers
  • 300+ Employees
  • 30 Percent

slide-36
SLIDE 36

DataStax delivers Apache Cassandra to the Enterprise

  • Certified / Enterprise-ready Cassandra
  • Visual Management & Monitoring Tools
  • 24x7 Support & Training

slide-37
SLIDE 37

37

slide-38
SLIDE 38

DSE 4.5

38

slide-39
SLIDE 39

Thank You!

cgilmore@datastax.com