1 Wednesday, October 6, 2010 Image credit: - - PowerPoint PPT Presentation



slide-1
SLIDE 1 1 Wednesday, October 6, 2010
slide-2
SLIDE 2 2 Image credit: http://browsertoolkit.com/fault-tolerance.png Wednesday, October 6, 2010
slide-5
SLIDE 5

NOSQL

- an overview -

goto; con 2010

Emil Eifrem

CEO, Neo Technology @emileifrem emil@neotechnology.com

Wednesday, October 6, 2010
slide-6
SLIDE 6

So what’s the plan?

๏Why NOSQL?
๏The NOSQL landscape
๏NOSQL challenges
๏Conclusion

6 Wednesday, October 6, 2010
slide-7
SLIDE 7

First off: the name

7

๏WE ALL HATES IT, M’KAY?

Wednesday, October 6, 2010
slide-10
SLIDE 10

NOSQL is NOT...

๏ NO to SQL
๏ NEVER SQL

8 Wednesday, October 6, 2010
slide-11
SLIDE 11

NOSQL is simply

Not Only SQL

Wednesday, October 6, 2010
slide-12
SLIDE 12 Wednesday, October 6, 2010
slide-13
SLIDE 13

NOSQL - Why now?

Four trends

Wednesday, October 6, 2010
slide-15
SLIDE 15

Trend 1: data set size

2007: 40
2010: 988

Source: IDC 2007

Wednesday, October 6, 2010
slide-21
SLIDE 21

Trend 2: Connectedness

[Chart: information connectivity over time (1990, 2000, 2010, 2020), spanning web 1.0, web 2.0 and “web 3.0”. Labels along the curve, in the order they were added: Text documents; Hypertext; Folksonomies, Tagging, User-generated content, Wikis, RSS, Blogs; Ontologies, RDF, Giant Global Graph (GGG)]
slide-22
SLIDE 22

Trend 3: Semi-structure

20

๏Individualization of content

  • In the salary lists of the 1970s, all elements had exactly one job
  • In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?

๏All encompassing “entire world views”

  • Store more data about each entity

๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)

Wednesday, October 6, 2010
slide-27
SLIDE 27

Aside: RDBMS performance

25

[Chart: axes Data complexity and Performance; curves “Relational database” and “Requirement of application”. Labels: Salary List, Majority of Webapps, Social network, Semantic Trading, and a bracket marked “custom”]
slide-28
SLIDE 28

Trend 4: Architecture

26

1980s: Application (<-- note lack of s)

[Diagram: Application, DB]

Wednesday, October 6, 2010
slide-29
SLIDE 29

Trend 4: Architecture

27

1990s: Database as integration hub

[Diagram: Application, Application, Application, DB]

Wednesday, October 6, 2010
slide-30
SLIDE 30

Trend 4: Architecture

2000s: (moving towards) Decoupled services with their own backend

[Diagram: Service, Service, Service; DB, DB, DB]

Wednesday, October 6, 2010
slide-31
SLIDE 31

Why NOSQL Now?

๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture

29 Wednesday, October 6, 2010
slide-32
SLIDE 32

Four NOSQL categories

30 Wednesday, October 6, 2010
slide-33
SLIDE 33

Category I: Key-Value stores

31

๏Lineage:

  • “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)

๏Data model:

  • Global key-value mapping
  • Think: Globally available HashMap/Dict/etc

๏Examples:

  • Project Voldemort

  • Tokyo {Cabinet, Tyrant, etc}
Wednesday, October 6, 2010
slide-34
SLIDE 34

Category I: Key-Value stores

32

๏Strengths

  • Simple data model
  • Great at scaling out horizontally

๏Weaknesses:

  • Simplistic data model
  • Poor for complex data
Wednesday, October 6, 2010
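
To make the "globally available HashMap" picture concrete, here is a minimal in-memory sketch in Java. The class `KeyValueSketch` and its methods are hypothetical stand-ins, not the client API of Project Voldemort or Tokyo Cabinet, and nothing here is distributed. It only illustrates the trade-off listed above: lookups by key are trivial, while anything else (here, finding keys by value) means scanning every entry.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** In-memory stand-in for a key-value store (hypothetical, single process). */
public class KeyValueSketch {
    private final Map<String, String> data = new HashMap<>();

    public void put(String key, String value) {
        data.put(key, value);
    }

    /** Lookup by key is the one operation the model is built for. */
    public Optional<String> get(String key) {
        return Optional.ofNullable(data.get(key));
    }

    public void delete(String key) {
        data.remove(key);
    }

    /** Lookup by value: the model offers no help, so every entry is scanned. */
    public List<String> keysWithValue(String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> entry : data.entrySet()) {
            if (entry.getValue().equals(value)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        KeyValueSketch store = new KeyValueSketch();
        store.put("user:1:name", "James");
        store.put("user:2:name", "Mary");
        System.out.println(store.get("user:1:name").orElse("<missing>"));
        System.out.println(store.keysWithValue("Mary"));
    }
}
```

The key naming scheme (`user:1:name`) is also an assumption; in this model, any structure beyond a flat key has to be encoded by the application.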
slide-35
SLIDE 35

Category II: ColumnFamily (BigTable) stores

33

๏Lineage:

  • “Bigtable: A Distributed Storage System for Structured Data” (2006)

๏Data model:

  • A big table, with column families

๏Examples:

  • HBase
  • HyperTable
  • Cassandra
Wednesday, October 6, 2010
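
One way to picture "a big table, with column families" is as nested sorted maps: row key, then column family, then column name, then value. The Java sketch below is a single-process illustration under that assumption. `ColumnFamilySketch` and the sample row keys are hypothetical, and real stores such as HBase or Cassandra add versioning, distribution and persistence that are left out here.

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Nested-map picture of a column-family table (hypothetical, in memory only). */
public class ColumnFamilySketch {
    // row key -> column family -> column name -> value
    private final SortedMap<String, Map<String, SortedMap<String, String>>> rows = new TreeMap<>();

    public void put(String rowKey, String family, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>())
            .computeIfAbsent(family, f -> new TreeMap<>())
            .put(column, value);
    }

    /** Returns the stored value, or null when the row, family or column is absent. */
    public String get(String rowKey, String family, String column) {
        Map<String, SortedMap<String, String>> families = rows.get(rowKey);
        if (families == null) {
            return null;
        }
        SortedMap<String, String> columns = families.get(family);
        return columns == null ? null : columns.get(column);
    }

    /** Rows are kept sorted by key here, so a key range is one contiguous slice. */
    public SortedMap<String, Map<String, SortedMap<String, String>>> scan(
            String fromKey, String toKeyExclusive) {
        return rows.subMap(fromKey, toKeyExclusive);
    }

    public static void main(String[] args) {
        ColumnFamilySketch table = new ColumnFamilySketch();
        table.put("com.example/a", "contents", "html", "<html>a</html>");
        table.put("com.example/b", "anchor", "cnn.com", "Example");
        table.put("org.sample/x", "contents", "html", "<html>x</html>");
        System.out.println(table.get("com.example/b", "anchor", "cnn.com"));
        System.out.println(table.scan("com.", "com/").keySet());
    }
}
```

Rows need not share columns: each row stores only the columns it actually has, which is what makes the table sparse.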
slide-36
SLIDE 36

Category III: Document databases

34

๏Lineage:

  • Lotus Notes

๏Data model:

  • Collections of documents
  • A document is a key-value collection

๏Examples:

  • CouchDB
  • MongoDB
Wednesday, October 6, 2010
slide-37
SLIDE 37

Document db: An example

35

๏How would we model blogging software?
๏One stab:

  • Represent each Blog as a Collection of Post documents
  • Represent Comments as nested documents in the Post documents

Wednesday, October 6, 2010
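
The modelling idea above can be sketched without any database, using plain Java maps and lists: a post is a key-value collection, and its comments are nested documents inside it. The class `BlogDocumentSketch` and the field names `author` and `text` are assumptions made for illustration; the other field names follow the slides' blog post example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java picture of a Post document with nested Comment documents. */
public class BlogDocumentSketch {

    /** Builds a post document; comments start as an empty nested list. */
    public static Map<String, Object> newPost(String title, String body, String... tags) {
        Map<String, Object> post = new LinkedHashMap<>();
        post.put("title", title);
        post.put("pub_date", new Date());
        post.put("body", body);
        post.put("tags", Arrays.asList(tags));
        post.put("comments", new ArrayList<Map<String, Object>>());
        return post;
    }

    /** Appends a comment as a nested document inside the post document. */
    @SuppressWarnings("unchecked")
    public static void addComment(Map<String, Object> post, String author, String text) {
        Map<String, Object> comment = new LinkedHashMap<>();
        comment.put("author", author); // hypothetical field name
        comment.put("text", text);     // hypothetical field name
        ((List<Map<String, Object>>) post.get("comments")).add(comment);
    }

    public static void main(String[] args) {
        // A blog is a collection of post documents.
        List<Map<String, Object>> blog = new ArrayList<>();
        Map<String, Object> post = newPost("Goto; con 2010", "Hello from the conference",
                "conference", "names");
        addComment(post, "Joe", "Nice post!");
        blog.add(post);
        System.out.println(blog);
    }
}
```

Because comments live inside the post, reading a post with all its comments is a single lookup; the cost is that comments cannot easily be queried on their own.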
slide-38
SLIDE 38

Document db: Creating a blog post

36

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;

import com.mongodb.BasicDBObject;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBObject;
import com.mongodb.Mongo;

// ...
Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB
// ...
DB blogs = mongo.getDB( "blogs" ); // Access the blogs database
DBCollection myBlog = blogs.getCollection( "Thobe’s blog" );

// A post is a single document: a key-value collection
DBObject blogPost = new BasicDBObject();
blogPost.put( "title", "JAOO^H^H^H^HGoto; con 2010" );
blogPost.put( "pub_date", new Date() );
blogPost.put( "body", "Publishing a post about JA...Goto con in my MongoDB blog!" );
blogPost.put( "tags", Arrays.asList( "conference", "names" ) );
blogPost.put( "comments", new ArrayList<DBObject>() ); // comments are nested documents
myBlog.insert( blogPost );

Wednesday, October 6, 2010
slide-39
SLIDE 39

Retrieving posts

// ...
import com.mongodb.DBCursor;
// ...

public Object getAllPosts( String blogName ) {
    DBCollection blog = db.getCollection( blogName );
    return renderPosts( blog.find() );
}

private Object renderPosts( DBCursor cursor ) {
    // order by publication date (descending)
    cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );
    // ...
}

37 Wednesday, October 6, 2010
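
As a rough in-memory analogue of sorting the cursor on `pub_date` with `-1` (descending), the Java sketch below orders plain map-based post documents newest first. It is only an illustration of the ordering, not of how MongoDB executes the sort; `SortPostsSketch` and its helper `post` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory analogue of "order by publication date (descending)". */
public class SortPostsSketch {

    /** Returns a new list with the most recently published post first. */
    public static List<Map<String, Object>> newestFirst(List<Map<String, Object>> posts) {
        List<Map<String, Object>> sorted = new ArrayList<>(posts);
        sorted.sort(Comparator.comparing(
                (Map<String, Object> post) -> (Date) post.get("pub_date")).reversed());
        return sorted;
    }

    /** Hypothetical helper that builds a minimal post document. */
    public static Map<String, Object> post(String title, long publishedAtMillis) {
        Map<String, Object> post = new HashMap<>();
        post.put("title", title);
        post.put("pub_date", new Date(publishedAtMillis));
        return post;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> posts = new ArrayList<>();
        posts.add(post("older", 1_000L));
        posts.add(post("newest", 3_000L));
        posts.add(post("middle", 2_000L));
        for (Map<String, Object> p : newestFirst(posts)) {
            System.out.println(p.get("title"));
        }
    }
}
```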
slide-40
SLIDE 40

Category IV: Graph databases

38

๏Lineage:

  • Euler and graph theory

๏Data model:

  • Nodes with properties
  • Typed relationships with properties

๏Examples:

  • Sones GraphDB
  • InfiniteGraph
  • Neo4j
Wednesday, October 6, 2010
slide-43
SLIDE 43

Property Graph model

41

[Diagram: a property graph]

Relationship types: LIVES WITH, LOVES, OWNS, DRIVES, LOVES

Properties shown:

  • name: “James”, age: 32, twitter: “@spam”
  • name: “Mary”, age: 35
  • brand: “Volvo”, model: “V70”
  • property type: “car”

Wednesday, October 6, 2010
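
The property graph model (nodes with properties, typed relationships with properties) can be sketched in a few lines of plain Java. This is not the Neo4j API; `PropertyGraphSketch`, `Node` and `Relationship` are hypothetical classes. The flattened slide does not say which nodes each relationship connects, so the wiring below, including placing “property type” on the OWNS relationship, is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java picture of the property graph model (hypothetical classes). */
public class PropertyGraphSketch {

    /** A node is a bag of properties plus its outgoing relationships. */
    public static class Node {
        public final Map<String, Object> properties = new LinkedHashMap<>();
        public final List<Relationship> outgoing = new ArrayList<>();

        public Node set(String key, Object value) {
            properties.put(key, value);
            return this;
        }

        public Relationship relateTo(Node end, String type) {
            Relationship relationship = new Relationship(this, end, type);
            outgoing.add(relationship);
            return relationship;
        }
    }

    /** A relationship has a type, a direction (start to end) and its own properties. */
    public static class Relationship {
        public final Node start;
        public final Node end;
        public final String type;
        public final Map<String, Object> properties = new LinkedHashMap<>();

        Relationship(Node start, Node end, String type) {
            this.start = start;
            this.end = end;
            this.type = type;
        }
    }

    /** Builds the example graph and returns the "James" node; the wiring is assumed. */
    public static Node buildJames() {
        Node james = new Node().set("name", "James").set("age", 32).set("twitter", "@spam");
        Node mary = new Node().set("name", "Mary").set("age", 35);
        Node car = new Node().set("brand", "Volvo").set("model", "V70");
        james.relateTo(mary, "LIVES_WITH");
        james.relateTo(mary, "LOVES");
        mary.relateTo(james, "LOVES");
        mary.relateTo(car, "OWNS").properties.put("property type", "car");
        james.relateTo(car, "DRIVES");
        return james;
    }

    public static void main(String[] args) {
        for (Relationship r : buildJames().outgoing) {
            System.out.println("James " + r.type + " " + r.end.properties);
        }
    }
}
```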
slide-44
SLIDE 44

Graphs are whiteboard friendly

Image credits: Tobias Ivarsson

An application domain model outlined on a whiteboard or piece of paper would be translated to an ER diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.
slide-45
SLIDE 45

Graphs are whiteboard friendly

43

thobe Wardrobe Strength Joe project blog Hello Joe Neo4j performance analysis Modularizing Jython

Image credits: Tobias Ivarsson

An application domain model outlined on a whiteboard or piece of paper would be translated to an ER diagram, then normalized to fit a Relational Database. With a Graph Database the model from the whiteboard is implemented directly.
slide-46
SLIDE 46

Graph db: Creating a social graph

44

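import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Relationship;
import org.neo4j.graphdb.Transaction;
import org.neo4j.kernel.EmbeddedGraphDatabase;

// ...
GraphDatabaseService graphDb = new EmbeddedGraphDatabase( GRAPH_STORAGE_LOCATION );

// All writes happen inside a transaction
Transaction tx = graphDb.beginTx();
try {
    Node mrAnderson = graphDb.createNode();
    mrAnderson.setProperty( "name", "Thomas Anderson" );
    mrAnderson.setProperty( "age", 29 );

    Node morpheus = graphDb.createNode();
    morpheus.setProperty( "name", "Morpheus" );
    morpheus.setProperty( "rank", "Captain" );

    // A typed relationship between the two nodes
    Relationship friendship = mrAnderson.createRelationshipTo(
        morpheus, SocialGraphTypes.FRIENDSHIP );

    tx.success();
} finally {
    tx.finish();
}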

Wednesday, October 6, 2010
slide-47
SLIDE 47

Graph db: How do I know this person?

Node me = ...
Node you = ...

PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath(
    Traversals.expanderForTypes(
        SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),
    /* maximum depth: */ 4 );

Path shortestPath = shortestPathFinder.findSinglePath(me, you);

for ( Node friend : shortestPath.nodes() ) {
    System.out.println( friend.getProperty( "name" ) );
}

45 Wednesday, October 6, 2010
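
Conceptually, "how do I know this person?" is a shortest-path search over friendship relationships with a depth limit. The plain-Java sketch below uses breadth-first search over an adjacency map to show the idea. It is not how Neo4j's `GraphAlgoFactory.shortestPath` is implemented, and `ShortestPathSketch`, `addFriendship` and the sample names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Breadth-first shortest path over undirected friendships, with a depth limit. */
public class ShortestPathSketch {

    /** Friendships are followed in both directions, so both ends are recorded. */
    public static void addFriendship(Map<String, Set<String>> friends, String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Returns the people on a shortest path, or an empty list if none within maxDepth hops. */
    public static List<String> shortestPath(
            Map<String, Set<String>> friends, String from, String to, int maxDepth) {
        Map<String, String> cameFrom = new HashMap<>();
        Map<String, Integer> depth = new HashMap<>();
        Deque<String> queue = new ArrayDeque<>();
        depth.put(from, 0);
        queue.add(from);
        while (!queue.isEmpty()) {
            String current = queue.remove();
            if (current.equals(to)) {
                // Walk the predecessor links back to the start.
                LinkedList<String> path = new LinkedList<>();
                for (String n = to; n != null; n = cameFrom.get(n)) {
                    path.addFirst(n);
                }
                return path;
            }
            if (depth.get(current) >= maxDepth) {
                continue; // do not expand beyond the depth limit
            }
            for (String next : friends.getOrDefault(current, Collections.emptySet())) {
                if (!depth.containsKey(next)) {
                    depth.put(next, depth.get(current) + 1);
                    cameFrom.put(next, current);
                    queue.add(next);
                }
            }
        }
        return Collections.emptyList();
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = new HashMap<>();
        addFriendship(friends, "Thomas Anderson", "Morpheus");
        addFriendship(friends, "Morpheus", "Trinity");
        System.out.println(shortestPath(friends, "Thomas Anderson", "Trinity", 4));
    }
}
```

Breadth-first order guarantees that the first time the target is reached, it is reached by a path with the fewest hops.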
slide-48
SLIDE 48

Graph db: Recommend new friends

Node person = ...

TraversalDescription friendsOfFriends = Traversal.description()
    .expand( Traversals.expanderForTypes(
        SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) )
    .prune( Traversal.pruneAfterDepth( 2 ) )
    .breadthFirst() // Visit my friends before their friends.
    // Visit a node at most once (don’t recommend direct friends)
    .uniqueness( Uniqueness.NODE_GLOBAL )
    .filter( new Predicate<Path>() {
        // Only return friends of friends
        public boolean accept( Path traversalPos ) {
            return traversalPos.length() == 2;
        }
    } );

for ( Node recommendation : friendsOfFriends.traverse( person ).nodes() ) {
    System.out.println( recommendation.getProperty( "name" ) );
}

46 Wednesday, October 6, 2010
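
The same recommendation rule (people exactly two friendship hops away who are not already direct friends) can be sketched in plain Java over an adjacency map. This only illustrates the logic of the traversal, not the Neo4j traversal framework; `FriendRecommendationSketch`, `addFriendship` and the sample names are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Recommends friends of friends who are not already direct friends. */
public class FriendRecommendationSketch {

    /** Friendships are undirected, so both ends are recorded. */
    public static void addFriendship(Map<String, Set<String>> friends, String a, String b) {
        friends.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        friends.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    /** Returns people two hops from the given person, excluding the person and direct friends. */
    public static Set<String> recommend(Map<String, Set<String>> friends, String person) {
        Set<String> direct = friends.getOrDefault(person, Collections.emptySet());
        Set<String> recommendations = new TreeSet<>(); // sorted for stable output
        for (String friend : direct) {
            for (String candidate : friends.getOrDefault(friend, Collections.emptySet())) {
                if (!candidate.equals(person) && !direct.contains(candidate)) {
                    recommendations.add(candidate);
                }
            }
        }
        return recommendations;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> friends = new HashMap<>();
        addFriendship(friends, "me", "Alice");
        addFriendship(friends, "me", "Bob");
        addFriendship(friends, "Alice", "Bob");
        addFriendship(friends, "Alice", "Carol");
        addFriendship(friends, "Bob", "Dave");
        System.out.println(recommend(friends, "me"));
    }
}
```

Excluding direct friends plays the role that visiting each node at most once plays in the traversal above: a node already reached at depth 1 is never reported at depth 2.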
slide-49
SLIDE 49

Four emerging NOSQL categories

๏Key-Value stores
๏ColumnFamily stores
๏Document databases
๏Graph databases

47 Wednesday, October 6, 2010
slide-54
SLIDE 54

Scaling to size vs. Scaling to complexity

48

[Chart: Size against Complexity, plotting Key/Value stores, ColumnFamily stores, Document databases and Graph databases]

My subjective view: > 90% of use cases

Billions of nodes and relationships
slide-58
SLIDE 58

NOSQL challenges?

๏Mindshare

  • But that’s also product usability (“how do you query it?”)

๏Tool support

  • Both devtime tools and runtime ops tools
  • Standards may help?
  • ... or maybe just time

๏Middleware support

49 Wednesday, October 6, 2010
slide-59
SLIDE 59

Middleware support?

๏Let me tell you the story about Mike

50 Wednesday, October 6, 2010
slide-60
SLIDE 60

Step I: Building a web site

MySQL PHP n stuff One box

Wednesday, October 6, 2010
slide-62
SLIDE 62

Step II: Whoa, ppl are actually using it?

52

MySQL PHP n stuff Two boxes

Wednesday, October 6, 2010
slide-63
SLIDE 63

Step III: That’s a LOT of pages served...

53

n boxes: PHP n stuff, PHP n stuff, PHP n stuff
1 box: MySQL

Wednesday, October 6, 2010
slide-64
SLIDE 64

Step IV: Our DB is completely overwhelmed...

54

n boxes: PHP n stuff, PHP n stuff, PHP n stuff
MySQL (m)
n boxes: MySQL (s)

Wednesday, October 6, 2010
slide-65
SLIDE 65

Step V: Our DBs are STILL overwhelmed

?

55 Wednesday, October 6, 2010
slide-66
SLIDE 66

Step V: Our DBs are STILL overwhelmed

๏Turns out the problem is due to joins
๏A while back we introduced a new feature

  • Recommend restaurants based on the user’s friends (and friends of friends)
  • It’s killing us with joins

๏What about sharding?
๏What about SSDs?

56 Wednesday, October 6, 2010
slide-67
SLIDE 67

Polyglot persistence (Not Only SQL)

๏Data sets are increasingly less uniform
๏Parts of Mike’s data fit well in an RDBMS
๏But parts of it are graph-shaped

  • It fits much better in a graph database like Neo4j!

๏But what does the code look like?

57 Wednesday, October 6, 2010
slide-68
SLIDE 68

An intervention!

There shall be code.

58 Wednesday, October 6, 2010
slide-69
SLIDE 69

Conclusion

๏There’s an explosion of ‘nosql’ databases out there

  • Some are immature and experimental
  • Some are coming out of years of battle-hardened production

๏NOSQL is about finding the right tool for the job

  • Frequently that’s an RDBMS
  • But increasingly commonly an RDBMS is not the perfect fit

๏We will have heterogeneous data backends in the future

  • Now the rest of the stack needs to step up and help developers cope with that

59 Wednesday, October 6, 2010
slide-70
SLIDE 70

Key takeaway

Not Only SQL

Wednesday, October 6, 2010
slide-71
SLIDE 71

http://neotechnology.com

Wednesday, October 6, 2010