The NoSQL Ecosystem



SLIDE 1

7-21-10

The NoSQL Ecosystem

Jonathan Ellis @spyced jbellis@riptano.com

Wednesday, July 21, 2010

SLIDE 2

Executive summary

✤ NoSQL is about using the right tool for the job

SLIDE 3

My bias

✤ Started working on Cassandra in 2009 after looking at the alternatives

✤ Co-founded Riptano in April 2010

SLIDE 4

NoSQL at OSCON

✤ Introduction to MongoDB

✤ Scaling Sourceforge with MongoDB

✤ Hadoop, Pig, and Twitter*

✤ (Plus the Neo4J and Cassandra tutorials Monday and Tuesday)

SLIDE 5

Why NoSQL? 1

✤ Relational databases don’t scale

SLIDE 21

(“The eBay Architecture,” Randy Shoup and Dan Pritchett)

SLIDE 23

Why NoSQL? 2

✤ The relational model maps poorly to some problems

✤ Sub-category: almost all NoSQL databases are schema-free or schema-optional to some degree

SLIDE 25

Why NoSQL? 3

✤ Relational databases are slow

SLIDE 27

Myth 1

✤ “NoSQL is for people who don’t understand {SQL, denormalization, query tuning, ...}”

✤ Similarly: “Only users of [database X] are turning to NoSQL databases, because X sucks.”

SLIDE 28

eBay: NoSQL pioneer

✤ “BASE is diametrically opposed to ACID. Where ACID is pessimistic and forces consistency at the end of every operation, BASE is optimistic and accepts that the database consistency will be in a state of flux. Although this sounds impossible to cope with, in reality it is quite manageable and leads to levels of scalability that cannot be obtained with ACID.”

“BASE: An Acid Alternative,” Dan Pritchett, eBay

SLIDE 29

Scale forces tradeoffs

SLIDE 30

Myth 2

✤ “NoSQL is nothing new because we had key/value databases like bdb years ago.”

SLIDE 31

Myth 3

✤ “Only huge sites like Facebook and Twitter need to care about scalability.”

SLIDE 32

The downside to NoSQL-as-identifier

SLIDE 33

Evaluating NoSQL databases

✤ Data model / query language

✤ Scalability / availability

✤ Persistence

SLIDE 34

Data model

✤ Document

    ✤ CouchDB, MongoDB, Riak

✤ ColumnFamily

    ✤ Cassandra, HBase

✤ Graph

    ✤ Neo4j, AllegroGraph, Objectivity InfiniteGraph

✤ Collections

    ✤ Redis

✤ Key/value

    ✤ bdb, bitcask, Memcached, Tokyo Cabinet

SLIDE 35

Document queries

✤ CouchDB

    ✤ js map/reduce creates [materialized] views that may be queried

✤ MongoDB

    ✤ b-tree indexes allow querying documents by field

✤ Riak

    ✤ link-walking or [runtime] js map/reduce
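
The difference between a materialized map/reduce view and a per-field index can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample documents, `build_view`, `posts_by_tag`, and `build_field_index` are hypothetical names, not the CouchDB or MongoDB API, and a plain dict stands in for a real b-tree.

```python
from collections import defaultdict

# Hypothetical sample documents, keyed by document id.
docs = {
    "d1": {"type": "post", "author": "ann", "tags": ["nosql", "oscon"]},
    "d2": {"type": "post", "author": "bob", "tags": ["nosql"]},
    "d3": {"type": "comment", "author": "ann", "tags": []},
}


def build_view(documents, map_fn):
    """Materialized-view style: run map_fn over every document up front
    and keep the emitted rows sorted by key."""
    rows = []
    for doc_id, doc in documents.items():
        for key, value in map_fn(doc):
            rows.append((key, doc_id, value))
    return sorted(rows)


def posts_by_tag(doc):
    """Map function: emit one (tag, author) pair per tag of each post."""
    if doc["type"] == "post":
        for tag in doc["tags"]:
            yield tag, doc["author"]


def build_field_index(documents, field):
    """Field-index style: map each value of one field to matching doc ids."""
    index = defaultdict(list)
    for doc_id, doc in documents.items():
        index[doc[field]].append(doc_id)
    return index


view = build_view(docs, posts_by_tag)
print([row for row in view if row[0] == "nosql"])

by_author = build_field_index(docs, "author")
print(by_author["ann"])
```

In the view style the query shape is fixed when the map function is written; in the index style any indexed field can be queried directly.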

SLIDE 36

ColumnFamily queries

SELECT * FROM tweets
WHERE user_id IN (SELECT follower FROM followers WHERE user_id = ?)

[Diagram labels: followers, tweets, timeline]
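
A query of this shape does its join at read time. One common way to serve it from a ColumnFamily store is to denormalize at write time into a per-user timeline row. The sketch below is an assumption-laden illustration, not Cassandra's or HBase's API: plain Python dicts stand in for the three column families, and `follow`, `post_tweet`, and `read_timeline` are hypothetical helper names.

```python
from collections import defaultdict

# Plain dicts stand in for three column families (an assumption for illustration).
followers = defaultdict(set)    # row key: user_id -> ids of that user's followers
tweets = {}                     # row key: tweet_id -> (author_id, text)
timeline = defaultdict(list)    # row key: user_id -> tweet ids, newest last


def follow(follower_id, followee_id):
    """Record that follower_id follows followee_id."""
    followers[followee_id].add(follower_id)


def post_tweet(tweet_id, author_id, text):
    """Write the tweet once, then fan it out to each follower's timeline row."""
    tweets[tweet_id] = (author_id, text)
    for follower_id in followers[author_id]:
        timeline[follower_id].append(tweet_id)


def read_timeline(user_id):
    """Single-row read: no join or subquery is needed at read time."""
    return [tweets[tweet_id] for tweet_id in timeline[user_id]]


follow("bob", "alice")
follow("carol", "alice")
post_tweet(1, "alice", "hello")
post_tweet(2, "bob", "hi")
print(read_timeline("bob"))    # tweets written by the users bob follows
```

The tradeoff is more work per write (one timeline update per follower) in exchange for a read that touches a single row.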

SLIDE 37

Persistence

✤ Classic B-tree

    ✤ bdb, TC, MongoDB

✤ Append-only B-tree

    ✤ CouchDB

✤ On-disk linked lists

    ✤ Neo4J

✤ Pluggable

    ✤ Riak, Voldemort

✤ SSTable

    ✤ Cassandra, HBase

✤ Memory-only

    ✤ Memcached, VoltDB

✤ Memory w/checkpoint

    ✤ Membase, Redis

SLIDE 38

Durable

✤ bdb

✤ Cassandra

✤ CouchDB

✤ Neo4J

✤ Riak*, Voldemort*

SLIDE 40

pathExists(a, b, 4)

1 000        2 000 ms
1 000        2 ms
1 000 000    2 ms
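
The slide names `pathExists(a, b, 4)` without showing an implementation. One straightforward reading is "is b reachable from a in at most 4 hops", which a depth-bounded breadth-first search answers. The sketch below assumes an adjacency-list dict as the graph representation; `path_exists` and the sample `friends` graph are hypothetical and are not the Neo4j traversal API.

```python
from collections import deque


def path_exists(graph, start, goal, max_depth):
    """Return True if goal is reachable from start in at most max_depth hops."""
    if start == goal:
        return True
    visited = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(node, ()):
            if neighbor == goal:
                return True
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return False


# Hypothetical sample graph: a -> c -> d -> e -> b (four hops from a to b).
friends = {
    "a": ["c"],
    "c": ["d"],
    "d": ["e"],
    "e": ["b"],
}
print(path_exists(friends, "a", "b", 4))
print(path_exists(friends, "a", "b", 3))
```

The search only touches nodes within the hop limit of the start node, so its cost depends on the local neighborhood rather than on the total size of the graph.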

SLIDE 41

[Diagram labels: Commitlog, Memtable, Writer, Reader]

References: “The Log-Structured Merge-Tree”; “Bigtable: A Distributed Storage System for Structured Data”
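
The commit log and memtable named on this slide can be sketched as a toy write and read path in the log-structured style. This is a minimal illustration under stated assumptions, not the implementation used by Cassandra or HBase: `TinyLSM` and its methods are hypothetical names, flushed runs are kept in memory rather than written to disk as SSTable files, and compaction and commit log replay are omitted.

```python
import os
import tempfile


class TinyLSM:
    """Toy store: append to a commit log, update a memtable, flush when full."""

    def __init__(self, directory, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.sstables = []  # sorted (key, value) runs, oldest first
        self.commitlog = open(os.path.join(directory, "commit.log"), "a")

    def write(self, key, value):
        # 1. Sequential append for durability; no random I/O on the write path.
        self.commitlog.write(f"{key}\t{value}\n")
        self.commitlog.flush()
        # 2. Apply the write in memory.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Freeze the memtable as an immutable sorted run (kept in memory here;
        # a real store would write it to disk as an SSTable file).
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def read(self, key):
        # Newest data wins: check the memtable, then runs from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in reversed(self.sstables):
            for k, v in run:
                if k == key:
                    return v
        return None

    def close(self):
        self.commitlog.close()


with tempfile.TemporaryDirectory() as directory:
    db = TinyLSM(directory)
    db.write("k1", "v1")
    db.write("k2", "v2")   # reaches the limit, so the memtable is flushed
    db.write("k1", "v3")   # newer value shadows the flushed one
    print(db.read("k1"), db.read("k2"), db.read("missing"))
    db.close()
```

Writes never modify data in place, which is why reads must consult the memtable first and then the flushed runs from newest to oldest.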

SLIDE 42

Scalability

✤ Master-driven vs distributed replicas

SLIDE 43

Lock manager

SLIDE 46

CAP

✤ Consistency

✤ Availability

✤ Partition tolerance

SLIDE 47

Multi-DC with distributed replicas

[Diagram labels: nodes A, L, T, W, F, P, Y, U; Key K]

SLIDE 48

CA

✤ Scalaris

✤ VoltDB

SLIDE 49

Conclusion

✤ “If you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL data store”
