Hands-on Cassandra OSCON July 20, 2010 Eric Evans - - PowerPoint PPT Presentation

SLIDE 1

Hands-on Cassandra

OSCON July 20, 2010

Eric Evans eevans@rackspace.com @jericevans http://blog.sym-link.com

SLIDE 2

Background

SLIDE 3

Influential Papers

  • Bigtable
      • Strong consistency
      • Sparse map data model
      • GFS, Chubby, et al.
  • Dynamo
      • O(1) distributed hash table (DHT)
      • BASE (aka eventual consistency)
      • Client tunable consistency/availability

SLIDE 4

NoSQL

  • HBase
  • MongoDB
  • Riak
  • Voldemort
  • Neo4J
  • Cassandra
  • Hypertable
  • HyperGraphDB
  • Memcached
  • Tokyo Cabinet
  • Redis
  • CouchDB

SLIDE 5

NoSQL Big data

  • HBase
  • MongoDB
  • Riak
  • Voldemort
  • Neo4J
  • Cassandra
  • Hypertable
  • HyperGraphDB
  • Memcached
  • Tokyo Cabinet
  • Redis
  • CouchDB

SLIDE 6

Bigtable / Dynamo

Bigtable

  • HBase
  • Hypertable

Dynamo

  • Riak
  • Voldemort

Cassandra ??

SLIDE 7

Dynamo-Bigtable Lovechild

SLIDE 8

CAP Theorem “Pick Two”

  • CP
      • Bigtable
      • Hypertable
      • HBase
  • AP
      • Dynamo
      • Voldemort
      • Cassandra

SLIDE 9

CAP Theorem “Pick Two”

  • Consistency
  • Availability
  • Partition Tolerance

SLIDE 10

History

  • Facebook (2007-2008)
      • Avinash, former Dynamo engineer
      • Motivated by “Inbox Search”
  • Google Code (2008-2009)
      • Dark times
  • Apache (2009-Present)
      • Digg, Twitter, Rackspace, Others
      • Rapidly growing community
      • Fast-paced development

SLIDE 11

Hands-on

Setup

SLIDE 12

$ TUT_ROOT=$HOME
$ cd $TUT_ROOT
$ tar xfz apache-cassandra-xxxx-bin.tar.gz
$ tar xfz twissandra-xxxx.tar.gz
$ tar xfz pycassa-xxxx.tar.gz

“Installation”

SLIDE 13

$ cp twissandra/cassandra.yaml \
     apache-cassandra-xxxx/conf
$ mkdir $TUT_ROOT/log
$ mkdir -p $TUT_ROOT/lib/data
$ mkdir -p $TUT_ROOT/lib/commitlog

Setup

SLIDE 14

…
# Where data is stored on disk
data_file_directories:
    - TUT_ROOT/lib/data
…
# Commit log
commitlog_directory: TUT_ROOT/lib/commitlog
…

Setup (continued)

conf/cassandra.yaml

SLIDE 15

…
log4j.rootLogger=DEBUG,stdout,R
…
log4j.appender.R.File=TUT_ROOT/log/system.log
…

Setup (continued)

conf/log4j-server.properties

SLIDE 16

$ cd $TUT_ROOT/apache-cassandra-xxxx
$ bin/cassandra -f

# In a new terminal
$ cd $TUT_ROOT/apache-cassandra-xxxx
$ bin/loadSchemaFromYAML localhost 8080

Starting up / Initializing

SLIDE 17

$ cd $TUT_ROOT/pycassa
$ sudo python setup.py -cassandra install \
      [--prefix=/usr/local]
…
$ cd $TUT_ROOT/twissandra
$ python manage.py runserver 0.0.0.0:8000

Pycassa / Twissandra

SLIDE 18

Data Model

SLIDE 19

Users

CREATE TABLE user (
    id INTEGER PRIMARY KEY,
    username VARCHAR(64),
    password VARCHAR(64)
);

SLIDE 20

Friends and Followers

CREATE TABLE followers (
    user INTEGER REFERENCES user(id),
    follower INTEGER REFERENCES user(id)
);

CREATE TABLE following (
    user INTEGER REFERENCES user(id),
    followee INTEGER REFERENCES user(id)
);

SLIDE 21

Tweets

CREATE TABLE tweets (
    id INTEGER PRIMARY KEY,
    user INTEGER REFERENCES user(id),
    body VARCHAR(140),
    timestamp TIMESTAMP
);

SLIDE 22

Overview

  • Keyspace
      • Uppermost namespace
      • Typically one per application
  • ColumnFamily
      • Associates records of a similar kind
      • Record-level atomicity
      • Indexed
  • Column
      • Basic unit of storage
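One way to picture this hierarchy is as nested maps: keyspace → column family → row key → columns ordered by name. The sketch below is an in-memory illustration only; the `Keyspace` class and its `insert`/`get` methods are hypothetical and are not part of Cassandra or Pycassa.

```python
from collections import defaultdict


class Keyspace:
    """Hypothetical in-memory stand-in: keyspace -> column family -> row key -> columns."""

    def __init__(self, name):
        self.name = name
        # column family name -> row key -> {column name: value}
        self.cfs = defaultdict(lambda: defaultdict(dict))

    def insert(self, cf, key, columns):
        # All columns for one row key are merged in a single call.
        self.cfs[cf][key].update(columns)

    def get(self, cf, key):
        # Return the row's columns ordered by column name, the way a
        # bytes comparator would order them.
        row = self.cfs[cf].get(key, {})
        return dict(sorted(row.items()))


ks = Keyspace('Twissandra')
ks.insert('User', b'1234', {b'username': b'jericevans', b'password': b'secret'})
print(list(ks.get('User', b'1234')))  # [b'password', b'username']
```

Rows in the same column family need not share column names, which is what makes the table "sparse".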

SLIDE 23

Sparse Table

SLIDE 24

Column

  • name
      • byte[]
      • Queried against (predicates)
      • Determines sort order
  • value
      • byte[]
      • Opaque to Cassandra
  • timestamp
      • long
      • Conflict resolution (Last Write Wins)
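A minimal sketch of the column triple and of last-write-wins reconciliation: when two versions of the same column meet, the one with the higher client-supplied timestamp survives. The `Column` tuple and `reconcile` function are illustrative names, and the tie-break on value is an assumption made here only to keep the result deterministic.

```python
from typing import NamedTuple


class Column(NamedTuple):
    name: bytes       # compared by the column family's comparator
    value: bytes      # opaque to Cassandra
    timestamp: int    # supplied by the client, e.g. microseconds since epoch


def reconcile(a: Column, b: Column) -> Column:
    """Pick the surviving version of one column: last write (highest timestamp) wins."""
    if a.name != b.name:
        raise ValueError('can only reconcile versions of the same column')
    # Assumption for this sketch: equal timestamps are broken by comparing values.
    return max(a, b, key=lambda c: (c.timestamp, c.value))


old = Column(b'password', b'hunter2', 1279600000000000)
new = Column(b'password', b's3cret', 1279600000500000)
print(reconcile(old, new).value)  # b's3cret'
print(reconcile(new, old).value)  # b's3cret'
```

Because the decision depends only on the timestamps, replicas can apply the two writes in either order and still converge on the same value.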

SLIDE 25

Column Comparators

  • Bytes
  • UTF8
  • TimeUUID
  • Long
  • LexicalUUID
  • Composite (third-party)

http://github.com/edanuff/CassandraCompositeType
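The comparator decides how column names sort, so identical bytes can come back in different orders. This sketch assumes Long names are stored as 8-byte big-endian signed integers and contrasts a plain byte-wise comparison with a signed-long comparison; the helper functions are hypothetical.

```python
import struct

# Column names encoded as 8-byte big-endian signed longs (the encoding
# assumed here for the Long comparator).
names = [struct.pack('>q', n) for n in (5, -1, 300)]


def bytes_order(names):
    # Bytes comparator: plain unsigned byte-by-byte comparison.
    return sorted(names)


def long_order(names):
    # Long comparator: decode each name and compare as a signed integer.
    return sorted(names, key=lambda b: struct.unpack('>q', b)[0])


def decode(names):
    return [struct.unpack('>q', b)[0] for b in names]


print(decode(bytes_order(names)))  # [5, 300, -1]
print(decode(long_order(names)))   # [-1, 5, 300]
```

The negative value's sign bit (0xff...) pushes it to the end under a byte-wise comparison, which is why picking the right comparator matters for range slices.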

SLIDE 26

Column Families

  • User
  • Username
  • Friends
  • Followers
  • Tweet
  • Timeline
  • Userline

SLIDE 27

User / Username

  • User
      • Stores users
      • Keyed on a unique ID (UUID)
      • Columns for username and password
  • Username
      • Indexes User
      • Keyed on username
      • One column, the unique UUID for the user

SLIDE 28

Friends and Followers

  • Friends
      • Maps a user to the users they follow
      • Keyed on user ID
      • Columns for each user being followed
  • Followers
      • Maps a user to those following them
      • Keyed on user ID
      • Columns for each user following

SLIDE 29

Tweets

  • Keyed on a unique identifier
  • Columns:
      • Unique identifier
      • User ID
      • Body of the tweet
      • Timestamp

SLIDE 30

Timeline / Userline

  • Timeline
      • Keyed on user ID
      • Columns that map timestamps to Tweet ID
      • The materialized view of Tweets for a user
  • Userline
      • Keyed on user ID
      • Columns that map timestamps to Tweet ID
      • The collection of Tweets attributed to a user

SLIDE 31

Pycassa

SLIDE 32

Pycassa – Python Client API

  • connect() → Thrift proxy
  • cf = ColumnFamily(proxy, ksp, cfname)
  • cf.insert() → long
  • cf.get() → dict
  • cf.get_range() → dict

http://github.com/vomjom/pycassa

SLIDE 33

Adding a User

cass.save_user()

username = 'jericevans'
password = '**********'
useruuid = str(uuid())

columns = {
    'id': useruuid,
    'username': username,
    'password': password
}
USER.insert(useruuid, columns)
USERNAME.insert(username, {'id': useruuid})

SLIDE 34

Following a Friend

cass.add_friends()

FRIENDS.insert(userid, {friendid: time()})
FOLLOWERS.insert(friendid, {userid: time()})

SLIDE 35

Tweeting

cass.save_tweet()

columns = {
    'id': tweetid,
    'user_id': useruuid,
    'body': body,
    '_ts': timestamp
}
TWEET.insert(tweetid, columns)

columns = {pack('>d', timestamp): tweetid}
USERLINE.insert(useruuid, columns)
TIMELINE.insert(useruuid, columns)
# Fan out to followers; the count must be passed as column_count,
# since the second positional argument of get() is a column list
for otheruuid in FOLLOWERS.get(useruuid, column_count=5000):
    TIMELINE.insert(otheruuid, columns)
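save_tweet() uses `pack('>d', timestamp)` as the column name in USERLINE and TIMELINE. Assuming those column families compare names byte-wise, this works because big-endian IEEE 754 doubles sort in numeric order as long as the values are non-negative, which holds for Unix timestamps. The sketch checks that property on random sample values.

```python
import random
from struct import pack

# Sample non-negative Unix timestamps (seconds, with fractional part).
random.seed(42)
timestamps = [random.uniform(0, 2e9) for _ in range(1000)]

# Column names as built in save_tweet(): big-endian IEEE 754 doubles.
names = [pack('>d', ts) for ts in timestamps]

# Byte-wise order of the packed names matches numeric order of the timestamps.
assert sorted(names) == [pack('>d', ts) for ts in sorted(timestamps)]

# The property breaks for negative values: the sign bit sorts them last.
print(sorted([pack('>d', -1.0), pack('>d', 1.0)])[0] == pack('>d', 1.0))  # True
```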

SLIDE 36

Getting a Timeline

cass.get_timeline()

start = request.GET.get('start')
limit = NUM_PER_PAGE

timeline = TIMELINE.get(
    userid,
    column_start=start,
    column_count=limit,
    column_reversed=True
)
tweets = TWEET.multiget(timeline.values())
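get_timeline() asks for a reversed slice: newest columns first, beginning at `start` when one is given, and at most `limit` columns. The in-memory emulation below illustrates those semantics without a running cluster. It assumes the start bound is inclusive and, in a reversed slice, acts as the upper bound; `reversed_slice` and the sample row are hypothetical.

```python
from struct import pack


def reversed_slice(row, start=None, count=100):
    """Emulate a reversed column slice over one row.

    row   -- dict of column name (bytes) -> value
    start -- optional inclusive upper bound on column names
    count -- maximum number of columns to return
    """
    names = sorted(row, reverse=True)  # newest first for packed timestamps
    if start:
        names = [n for n in names if n <= start]
    return {n: row[n] for n in names[:count]}


# A hypothetical timeline row: packed timestamps -> tweet IDs.
timeline = {pack('>d', float(ts)): 'tweet-%d' % ts for ts in range(1, 8)}

page1 = reversed_slice(timeline, count=3)
print(list(page1.values()))  # ['tweet-7', 'tweet-6', 'tweet-5']

# Next page: start from the last column seen. The start bound is inclusive,
# so fetch one extra column and drop the duplicate.
last = list(page1)[-1]
page2 = list(reversed_slice(timeline, start=last, count=4).values())[1:]
print(page2)  # ['tweet-4', 'tweet-3', 'tweet-2']
```

Fetching one extra column per page is one common way to handle the inclusive start bound; another is to request `limit + 1` columns and keep the extra one as the next page's start.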

SLIDE 37

Hands-on

pycassaShell

SLIDE 38

Retweet

SLIDE 39

$ cd $TUT_ROOT/twissandra
$ patch -p1 < ../django.patch
$ patch -p1 < ../retweet.patch

Adding Retweet

SLIDE 40

Retweet

cass.save_retweet()

ts = _long(int(time() * 1e6))
for follower_id in get_follower_ids(userid):
    TIMELINE.insert(follower_id, {ts: tweet_id})

SLIDE 41

Clustering

Concepts

SLIDE 42

P2P Routing

SLIDE 43

P2P Routing

SLIDE 44

Partitioning

(see partitioner)

  • Random
      • 128-bit namespace (MD5)
      • Good distribution
  • Order Preserving
      • Tokens determine namespace
      • Natural order (lexicographical)
      • Range / cover queries
  • Yours ??
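With the random partitioner a row key's position on the ring comes from its MD5 hash, and the key belongs to the first node whose token is greater than or equal to the key's token, wrapping around past the end of the ring. This is a sketch under stated assumptions: the token is modelled as the MD5 digest read as a signed big-endian integer and made non-negative, the node tokens are invented, and `token_for`/`primary_node` are hypothetical helpers.

```python
import hashlib
from bisect import bisect_left


def token_for(key: bytes) -> int:
    """Token for a row key: MD5 digest read as a signed big-endian integer,
    then made non-negative (an approximation of the random partitioner)."""
    digest = hashlib.md5(key).digest()
    return abs(int.from_bytes(digest, 'big', signed=True))


def primary_node(ring, key: bytes):
    """ring: list of (token, node) sorted by token.
    The owner is the first node whose token is >= the key's token,
    wrapping around to the first node past the end of the ring."""
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, token_for(key))
    return ring[i % len(ring)][1]


# Hypothetical four-node ring with evenly spaced tokens over [0, 2**127].
ring = [(i * 2**125, 'node%d' % i) for i in range(1, 5)]

for key in (b'jericevans', b'ericflo', b'vomjom'):
    print(key, primary_node(ring, key))
```

Hashing spreads keys evenly across nodes but destroys key order, which is why range queries over keys need an order-preserving partitioner instead.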

SLIDE 45

Replica Placement

(see endpoint_snitch)

  • SimpleSnitch
      • Default
      • N-1 successive nodes
  • RackInferringSnitch
      • Infers DC/rack from IP
  • PropertyFileSnitch
      • Configured w/ a properties file
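With the default placement, a key's replicas are its primary node plus the N-1 nodes that follow it clockwise around the ring. The sketch below shows only that walk; the `replicas` function and the integer tokens are hypothetical, and rack- or data-center-aware placement is not modelled.

```python
from bisect import bisect_left


def replicas(ring, token, n):
    """Return n distinct nodes for a token: the primary owner plus the
    next n - 1 nodes clockwise around the ring.

    ring -- list of (token, node) sorted by token
    """
    if n > len(ring):
        raise ValueError('replication factor exceeds number of nodes')
    tokens = [t for t, _ in ring]
    start = bisect_left(tokens, token)
    return [ring[(start + k) % len(ring)][1] for k in range(n)]


ring = [(0, 'A'), (25, 'B'), (50, 'C'), (75, 'D')]
print(replicas(ring, 30, 3))  # ['C', 'D', 'A']
print(replicas(ring, 90, 3))  # ['A', 'B', 'C'] (wraps around)
```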

SLIDE 46

Bootstrap

(see auto_bootstrap)

SLIDE 47

Bootstrap

SLIDE 48

Bootstrap

SLIDE 49

Remember CAP?

  • Consistency
  • Availability
  • Partition Tolerance

SLIDE 50

Choosing Consistency

Read Level    Description
ZERO          N/A
ANY           N/A
ONE           1 replica
QUORUM        (N / 2) + 1
ALL           All replicas

Write Level   Description
ZERO          Hail Mary
ANY           1 replica (HH = hinted handoff)
ONE           1 replica
QUORUM        (N / 2) + 1
ALL           All replicas

R + W > N
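Why R + W > N matters: if every set of R replicas read shares at least one replica with every set of W replicas written, a read always reaches at least one replica holding the latest write. This sketch brute-forces that overlap for small N and checks that QUORUM on both sides, computed as (N / 2) + 1 with integer division, always satisfies it. The helper names are illustrative.

```python
from itertools import combinations


def quorum(n: int) -> int:
    # (N / 2) + 1 using integer division
    return n // 2 + 1


def always_overlap(n: int, r: int, w: int) -> bool:
    """True if every set of r replicas shares at least one replica
    with every set of w replicas, out of n replicas total."""
    replicas = range(n)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))


for n in (1, 2, 3, 4, 5):
    q = quorum(n)
    # QUORUM reads + QUORUM writes always satisfy R + W > N ...
    assert q + q > n
    # ... and so every read quorum intersects every write quorum.
    assert always_overlap(n, q, q)

# ONE + ONE with N = 3 gives R + W = 2, not > 3: a read can miss the write.
print(always_overlap(3, 1, 1))  # False
# ONE read + ALL write: 1 + 3 > 3, so the read always sees the write.
print(always_overlap(3, 1, 3))  # True
```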

SLIDE 51

Quorum ((N/2) + 1)

SLIDE 52

Quorum ((N/2) + 1)

SLIDE 53

Operations

SLIDE 54

Cluster sizing

  • Data size and throughput
  • Fault tolerance (replication)
  • Data-center / hosting costs

SLIDE 55

Nodes

  • Go commodity!
  • Cores (more is better)
  • Memory (more is better)
  • Disks
      • Commitlog
      • Storage
      • Double-up for working space

SLIDE 56

Writes

SLIDE 57

Reads

SLIDE 58

Tuning (heap size)

bin/cassandra.in.sh

# Arguments to pass to the JVM
JVM_OPTS=" \
        …
        -Xmx1G \

SLIDE 59

Tuning (memtable)

conf/cassandra.yaml

# Amount of data written
memtable_throughput_in_mb: 64
# Number of objects written
memtable_operations_in_millions: 0.3
# Time elapsed
memtable_flush_after_mins: 60
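A simplified reading of these three settings: a memtable is flushed to disk once any one of the limits is reached. The sketch below encodes only that reading, treating MB as 2^20 bytes; `MemtableStats` and `should_flush` are hypothetical and do not reflect Cassandra's actual flush code.

```python
from dataclasses import dataclass

# Values from the cassandra.yaml fragment above.
MEMTABLE_THROUGHPUT_IN_MB = 64
MEMTABLE_OPERATIONS_IN_MILLIONS = 0.3
MEMTABLE_FLUSH_AFTER_MINS = 60


@dataclass
class MemtableStats:
    bytes_written: int
    operations: int
    age_minutes: float


def should_flush(stats: MemtableStats) -> bool:
    """Hypothetical check: flush once any one of the three limits is reached."""
    return (stats.bytes_written >= MEMTABLE_THROUGHPUT_IN_MB * 1024 * 1024
            or stats.operations >= MEMTABLE_OPERATIONS_IN_MILLIONS * 1_000_000
            or stats.age_minutes >= MEMTABLE_FLUSH_AFTER_MINS)


print(should_flush(MemtableStats(10 * 1024 * 1024, 50_000, 5)))   # False
print(should_flush(MemtableStats(10 * 1024 * 1024, 400_000, 5)))  # True
```

Larger limits mean fewer, bigger flushes at the cost of more heap, which is why these settings are tuned together with the heap size.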

SLIDE 60

Tuning (column families)

conf/cassandra.yaml

keyspaces:
    - name: Twissandra
      …
      column_families:
        - name: User
          keys_cached: 100
          preload_row_cache: true
          rows_cached: 1000
          …

SLIDE 61

Tuning (mmap)

conf/cassandra.yaml

# Choices are auto, standard, mmap, and
# mmap_index_only.
disk_access_mode: auto

SLIDE 62

Nodetool

bin/nodetool --host <arg> command

  • ring
  • info
  • cfstats
  • tpstats

SLIDE 63

Nodetool (cont.)

bin/nodetool --host <arg> command

  • compact
  • snapshot [name]
  • flush
  • drain
  • repair
  • decommission
  • move
  • loadbalance

SLIDE 64

Clustertool

bin/clustertool --host <arg> command

  • get_endpoints <keyspace> <key>
  • global_snapshot [name]
  • clear_global_snapshot
  • truncate <keyspace> <cfname>

SLIDE 65

Wrapping Up

SLIDE 66

When Things Go Wrong

Where to look

  • Logs
      • ERRORs, stack traces
      • Enable DEBUG
      • Isolate if possible
  • Crash files (java_pid*.hprof, hs_err_pid*.log)
  • nodetool / jconsole / etc
      • Thread pool stats
      • Column family stats

SLIDE 67

When Things Go Wrong

What to do

  • user@cassandra.apache.org
  • user-subscribe@cassandra.apache.org
  • http://www.mail-archive.com/user@cassandra.apache.org/
  • http://wiki.apache.org/cassandra
  • https://issues.apache.org/jira/browse/CASSANDRA
  • #cassandra on irc.freenode.net

SLIDE 68

Further Reading

Bigtable: A Distributed Storage System for Structured Data

Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber

http://labs.google.com/papers/bigtable-osdi06.pdf

Dynamo: Amazon’s Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels

http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

SLIDE 69

Thanks

  • Apache Cassandra:
      • http://cassandra.apache.org
  • Twissandra: Eric Florenzano (and others)
      • http://github.com/ericflo/twissandra
  • Pycassa: Jonathan Hseu (and others)
      • http://github.com/vomjom/pycassa

SLIDE 70

Fin