Scaling for Humongous amounts of data with MongoDB, Alvin Richards



slide-1
SLIDE 1

Scaling for Humongous amounts of data with MongoDB

Alvin Richards

Technical Director alvin@10gen.com @jonnyeight alvinonmongodb.com

slide-2
SLIDE 2

2

http://bit.ly/OT71M4

Getting from here to there...

slide-3
SLIDE 3

3

...probably using one of these

http://bit.ly/QDUIUF

slide-4
SLIDE 4

4

Why NoSQL at all?

slide-5
SLIDE 5

5

[Chart: Indexed Pages. Y axis: Pages Indexed (Million), 5,000 to 30,000. X axis: 1998, 2000, 2008, 2012.]

http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD

Growth? Scaling? Cost? Flexibility?

slide-6
SLIDE 6

6

Need a Database that...

  • Build a database for scaleout
      – Run on clusters of 100s of commodity machines
  • … that enables agile development
  • … and is usable for a broad variety of applications

slide-7
SLIDE 7

7

Is Scaleout Mission Impossible?

  • Partitioning of Data
      – Hashes (Dynamo) vs Ranges (Big Table)
      – Physical vs Logical segments
  • Consistency
      – Eventually
          • Multi Master updates, resolve conflicts later
      – Immediately
          • Single Master updates, always consistent
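
As an illustrative aside on the hash-versus-range choice, here is a minimal JavaScript sketch that routes a key both ways. The shard names, the `ranges` table, and the helpers `rangeShard`, `simpleHash`, and `hashShard` are hypothetical and are not taken from Dynamo, Bigtable, or MongoDB.

```javascript
// Hypothetical illustration only; not code from any real database.

// Range partitioning: each shard owns a contiguous slice of the sorted key space.
const ranges = [
  { shard: "shard01", min: "a", max: "j" }, // min inclusive, max exclusive
  { shard: "shard02", min: "j", max: "s" },
  { shard: "shard03", min: "s", max: "{" }, // "{" is the character right after "z"
];

function rangeShard(key) {
  const hit = ranges.find((r) => key >= r.min && key < r.max);
  if (!hit) throw new Error("no range covers key " + key);
  return hit.shard;
}

// Hash partitioning: a hash of the key picks the shard, so neighbouring keys scatter.
function simpleHash(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.charCodeAt(0)) >>> 0; // keep it in 32 bits
  return h;
}

// Assumes fewer than ten shards so the "shard0N" naming stays valid.
function hashShard(key, shardCount) {
  return "shard0" + ((simpleHash(key) % shardCount) + 1);
}
```

Range partitioning keeps adjacent keys together, which suits range scans; hashing spreads adjacent keys out, which tends to even out write load.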

slide-8
SLIDE 8

8

NoSQL and MongoDB

slide-9
SLIDE 9

9

Tradeoff: Scale vs Functionality

[Chart: scalability & performance plotted against depth of functionality, positioning memcached, key/value stores, and RDBMS]

slide-10
SLIDE 10

10

What MongoDB solves

Agility, Flexibility, Cost

  • Cost-effectively operationalize abundant data (clickstreams, logs, tweets, ...)
  • Relaxed transactional semantics enable easy scale out
  • Auto Sharding for scale down and scale up
  • Applications store complex data that is easier to model as documents
  • Schemaless DB enables faster development cycles

slide-11
SLIDE 11

11

How does MongoDB shape up?

  • Build a database for scaleout
      – Run on clusters of 100s of commodity machines
  • … that enables agile development
  • … and is usable for a broad variety of applications

slide-12
SLIDE 12

12

Purpose:

  • Aggregate system resources horizontally
  • Scaling writes
  • Scaling consistent reads

Goals:

  • Data location transparent to your code
  • Data distribution is automatic
  • No code changes required

Data Distribution across nodes - Sharding

slide-13
SLIDE 13

13

Sharding - Range distribution

shard01 shard02 shard03

sh.shardCollection("test.tweets", {_id: 1}, false)

slide-14
SLIDE 14

14

Sharding - Range distribution

shard01: a-i
shard02: j-r
shard03: s-z

slide-15
SLIDE 15

15

Sharding - Splits

shard01: a-i
shard02: ja-jz, k-r
shard03: s-z

slide-16
SLIDE 16

16

Sharding - Splits

shard01: a-i
shard02: ja-ji, ji-js, js-jw, jz-r
shard03: s-z
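
The split step on these slides can be sketched as a pure metadata operation. This is a simplified model, not MongoDB's implementation: the chunk shape (`min`, `max`, `shard`) and the function `splitChunk` are hypothetical names chosen for illustration.

```javascript
// Hypothetical sketch: a chunk owns the keys in [min, max) and lives on one shard.
function splitChunk(chunk, splitKey) {
  if (!(splitKey > chunk.min && splitKey < chunk.max)) {
    throw new Error("split key must fall strictly inside the chunk");
  }
  // Both halves stay on the original shard; a split moves no data.
  return [
    { min: chunk.min, max: splitKey, shard: chunk.shard },
    { min: splitKey, max: chunk.max, shard: chunk.shard },
  ];
}
```

For example, `splitChunk({ min: "j", max: "s", shard: "shard02" }, "k")` yields two chunks that are both still on shard02, which is why a separate balancing step is needed afterwards.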

slide-17
SLIDE 17

17

Sharding - Auto Balancing

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r js-jw jz-r

slide-18
SLIDE 18

18

Sharding - Goal Equilibrium

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r
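
One way to picture the move from the unbalanced layout to equilibrium is a loop that keeps moving a chunk from the fullest shard to the emptiest one. This is a deliberately naive sketch, not MongoDB's balancer: the function `balance`, its `threshold` parameter, and the starting layout are assumptions made for illustration.

```javascript
// Hypothetical sketch: assignment maps shard name -> array of chunk labels.
function balance(assignment, threshold = 1) {
  if (threshold < 1) throw new Error("threshold must be at least 1");
  const shards = Object.keys(assignment);
  const moves = [];
  for (;;) {
    // Order shards from most chunks to fewest.
    shards.sort((a, b) => assignment[b].length - assignment[a].length);
    const most = shards[0];
    const least = shards[shards.length - 1];
    if (assignment[most].length - assignment[least].length <= threshold) break;
    const chunk = assignment[most].pop();
    assignment[least].push(chunk);
    moves.push({ chunk, from: most, to: least });
  }
  return moves;
}
```

Starting from one chunk on shard01, four on shard02, and one on shard03, two moves are enough to leave every shard with two chunks.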

slide-19
SLIDE 19

19

Sharding - Find by Key

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

find({_id: "alvin"})

slide-20
SLIDE 20

20

Sharding - Find by Key

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

find({_id: "alvin"})

slide-21
SLIDE 21

21

Sharding - Find by Attribute

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

find({email: "alvin@10gen.com"})

slide-22
SLIDE 22

22

Sharding - Find by Attribute

shard01 shard02 shard03

a-i ja-ji s-z ji-js js-jw jz-r

find({email: "alvin@10gen.com"})
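
The difference between finding by key and finding by attribute can be sketched as a routing decision: a query that includes the shard key can be sent to one shard, while a query without it has to go to every shard. The chunk table and the function `targetShards` below are hypothetical and only model the idea.

```javascript
// Hypothetical chunk table: each chunk owns shard-key values in [min, max).
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: "{", shard: "shard03" },
];

// Returns the list of shards that must be asked to answer the query.
function targetShards(query, shardKey, chunkTable) {
  const allShards = [...new Set(chunkTable.map((c) => c.shard))];
  // No shard key in the query: scatter to all shards and gather the results.
  if (!(shardKey in query)) return allShards;
  const value = query[shardKey];
  const owner = chunkTable.find((c) => value >= c.min && value < c.max);
  return owner ? [owner.shard] : [];
}
```

With `_id` as the shard key, `targetShards({ _id: "alvin" }, "_id", chunks)` touches one shard, whereas the email lookup touches all three.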

slide-23
SLIDE 23

23

Sharding - Caching

shard01

a-i j-r s-z

300 GB Data, 96 GB Mem, 3:1 Data/Mem

slide-24
SLIDE 24

24

Aggregate Horizontal Resources

300 GB Data in total

shard01: a-i, 100 GB, 96 GB Mem, 1:1 Data/Mem
shard02: j-r, 100 GB, 96 GB Mem, 1:1 Data/Mem
shard03: s-z, 100 GB, 96 GB Mem, 1:1 Data/Mem
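
The arithmetic behind the two caching slides is short enough to write down. The helper `dataToMemRatio` is a hypothetical name; it assumes data is spread evenly over the shards and that every shard has the same amount of memory.

```javascript
// Hypothetical helper: how much data each shard holds relative to its memory.
function dataToMemRatio(totalDataGb, shardCount, memPerShardGb) {
  const dataPerShardGb = totalDataGb / shardCount; // assumes an even spread
  return { dataPerShardGb, ratio: dataPerShardGb / memPerShardGb };
}

const oneShard = dataToMemRatio(300, 1, 96);    // 300 / 96, roughly 3:1
const threeShards = dataToMemRatio(300, 3, 96); // 100 / 96, roughly 1:1
```

Adding shards adds memory as well as disk, so the same 300 GB goes from about three times the available memory to about the same size as it.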

slide-25
SLIDE 25

25

Sharding

  • Partitions data across many nodes
      – Scales Read & Writes
  • What happens if a node fails?
      – Data in that partition is lost
  • Must have copies of partition across
      – Nodes
      – Data Centers
      – Geographic regions

slide-26
SLIDE 26

26

Replica Sets

Primary Secondary Secondary

Read Write

App

Asynchronous Replication

slide-27
SLIDE 27

27

Replica Sets

Primary Secondary Secondary App

Read Write

slide-28
SLIDE 28

28

Replica Sets

Primary Primary Secondary

Automatic Election of new Primary

App

Read Write

slide-29
SLIDE 29

29

Replica Sets

Recovering Primary Secondary

Read Write New primary serves data

App

slide-30
SLIDE 30

30

Replica Sets

Secondary Primary Secondary

Read Write

App
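
The failover sequence on the preceding Replica Sets slides can be approximated with a much-simplified election rule. This is not MongoDB's election protocol: the member shape (`name`, `healthy`, `optime`) and the function `electPrimary` are assumptions, and real elections weigh more factors than this sketch does.

```javascript
// Hypothetical sketch: choose a new primary among the members that are still up.
function electPrimary(members) {
  const healthy = members.filter((m) => m.healthy);
  // Without a strict majority of the whole set reachable, nobody is elected.
  if (healthy.length * 2 <= members.length) return null;
  // Prefer the member that has replicated the most operations.
  return healthy.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}
```

With three members and the old primary down, the more up-to-date secondary wins; with two members down, the set stays without a primary.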

slide-31
SLIDE 31

31

Scale Eventually Consistent Reads

Secondary Primary Secondary

Read Write Read Read

App

slide-32
SLIDE 32

32

Eventual Consistency Using Replicas for Read Scaling

  • Read Preferences
      – PRIMARY, PRIMARY PREFERRED
      – SECONDARY, SECONDARY PREFERRED
      – NEAREST

Java example

    ReadPreference pref = ReadPreference.primaryPreferred();
    DBCursor cur = new DBCursor(collection, query,
                                null, pref);
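
To make the five read preference modes concrete, here is a rough sketch of how a client might pick a member for a read. It is not driver code: the member fields (`role`, `healthy`, `pingMs`) and the function `pickMember` are hypothetical, and "nearest" is reduced to "lowest ping", which is a simplification.

```javascript
// Hypothetical sketch of read preference routing; not the driver's algorithm.
function pickMember(members, mode) {
  const up = members.filter((m) => m.healthy);
  const primary = up.find((m) => m.role === "primary") || null;
  const secondaries = up.filter((m) => m.role === "secondary");
  // Lowest ping wins; returns null for an empty list.
  const nearest = (list) =>
    list.length === 0 ? null : list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a));

  switch (mode) {
    case "primary":            return primary;
    case "primaryPreferred":   return primary || nearest(secondaries);
    case "secondary":          return nearest(secondaries);
    case "secondaryPreferred": return nearest(secondaries) || primary;
    case "nearest":            return nearest(up);
    default: throw new Error("unknown read preference: " + mode);
  }
}
```

The "preferred" modes are the ones that keep reads working during a failover, at the price of possibly reading from a member you did not ask for first.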

slide-33
SLIDE 33

33

Immediate Consistency

[Diagram: Thread #1 against the Primary: Insert, Update, Read, Read. The document goes from v1 to v2 and both reads succeed (✔ ✔).]

slide-34
SLIDE 34

34

Eventual Consistency

[Diagram: Thread #1 inserts v1 on the Primary (✔). Thread #2 reads from the Secondary: before replication, v1 does not exist (✖); after replication, it reads v1 (✔).]

slide-35
SLIDE 35

35

Eventual Consistency

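[Diagram: Thread #1 inserts v1 and then updates it to v2 on the Primary (✔ ✔). Thread #2 reads from the Secondary and, depending on how far replication has progressed, gets one of: v1 does not exist (✖), reads v1 (✖ or ✔ depending on timing), reads v2 (✔).]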
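
The stale and missing reads on the eventual consistency slides fall out of asynchronous replication, which a small in-memory model can show. The classes `PrimaryNode` and `SecondaryNode` are hypothetical and only mimic the idea of a secondary replaying the primary's operation log some time later.

```javascript
// Hypothetical in-memory model of asynchronous replication.
class PrimaryNode {
  constructor() {
    this.data = new Map();
    this.oplog = []; // ordered list of applied writes
  }
  write(id, doc) {
    this.data.set(id, doc);
    this.oplog.push({ id, doc });
  }
  read(id) {
    return this.data.get(id);
  }
}

class SecondaryNode {
  constructor(primary) {
    this.primary = primary;
    this.data = new Map();
    this.applied = 0; // how many oplog entries have been replayed so far
  }
  // Replay up to maxOps pending entries; this call stands in for "time passing".
  replicate(maxOps = Infinity) {
    let done = 0;
    while (this.applied < this.primary.oplog.length && done < maxOps) {
      const op = this.primary.oplog[this.applied];
      this.data.set(op.id, op.doc);
      this.applied += 1;
      done += 1;
    }
  }
  read(id) {
    return this.data.get(id);
  }
}
```

A read on the secondary before `replicate()` runs returns `undefined` (the document does not exist yet) or an older version, while the primary always returns the latest write.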

slide-36
SLIDE 36

36

Tunable Data Durability

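Durability targets, from less to more: Memory, Journal, Secondary, Other Data Center

Write concern options, from async to sync: fire & forget; w=1; w=1, j=true; w="majority"; w=n; w="myTag"

RDBMS is marked on the same scale for comparison.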
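
As a rough model of the tunable durability scale, the check below decides whether a write has met a given write concern. It is a sketch under stated assumptions: `requiredAcks` and `isSatisfied` are hypothetical helpers, `w: 0` stands in for fire & forget, and tag-based concerns such as w="myTag" are not modelled.

```javascript
// Hypothetical sketch: how many members must acknowledge a write for a given w.
function requiredAcks(w, votingMembers) {
  if (w === "majority") return Math.floor(votingMembers / 2) + 1;
  if (Number.isInteger(w) && w >= 0) return w; // w = 0 models fire & forget
  throw new Error("w value not covered by this sketch: " + w);
}

// state: { ackedBy: number of members holding the write, journaledOnPrimary: boolean }
// concern: { w: number | "majority", j?: boolean }
function isSatisfied(state, concern, votingMembers) {
  const enoughMembers = state.ackedBy >= requiredAcks(concern.w, votingMembers);
  const journaled = !concern.j || state.journaledOnPrimary;
  return enoughMembers && journaled;
}
```

Moving right along the scale means the application waits for more of these conditions to hold before the write call returns.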

slide-37
SLIDE 37

37

Other MongoDB features

  • Capped Collections
      – Limit data by size, acts as a circular buffer / FIFO
      – Use cases: Audit, history, logs
  • Time To Live (TTL) collections
      – Expire data based on timestamp
      – Use cases: Archiving, purging, sessions
  • Text Search
      – Search by word, phrase, stemming, stop words
      – Use cases: Consistent text search
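
The first two features can be mimicked in a few lines to show the behaviour they provide. This is an analogy, not how MongoDB stores data: `CappedBuffer` caps by document count rather than by size in bytes, and `expire` is a hypothetical stand-in for TTL expiry.

```javascript
// Hypothetical sketch: a capped collection behaves like a FIFO with a fixed capacity.
class CappedBuffer {
  constructor(maxDocs) {
    this.maxDocs = maxDocs; // assumption: capped by count here, not by bytes
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift(); // oldest entry falls out
  }
}

// Hypothetical sketch: keep only documents whose timestamp field is younger than the TTL.
function expire(docs, field, ttlSeconds, nowMs) {
  return docs.filter((d) => nowMs - d[field].getTime() < ttlSeconds * 1000);
}
```

Both give bounded storage without application-side cleanup: the capped buffer bounds by volume, the TTL rule bounds by age.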

slide-38
SLIDE 38

38

How does MongoDB shape up?

  ✓ Build a database for scaleout
      – Run on clusters of 100s of commodity machines
  • … that enables agile development
  • … and is usable for a broad variety of applications

slide-39
SLIDE 39

39

Data Model

  • Why JSON?
      – Simple, well understood encapsulation of data
      – Maps simply to objects in your OO language
      – Linking & Embedding to describe relationships

slide-40
SLIDE 40

40

Why Mess with the Data Model?

slide-41
SLIDE 41

41

Mapping Objects to RDBMS

Tables: posts, authors, comments

    select *
    from posts p,
         authors a,
         comments c
    where p.author_id = a.id
      and p.id = c.post_id
      and p.id = 123

    start transaction
      insert into comments (...)
      update posts
         set comment_count = comment_count + 1
       where id = 123
    commit

slide-42
SLIDE 42

42

Mapping Objects to Distributed RDBMS

Tables: posts, authors, comments
Servers: server a, server b, server c

    select *
    from posts p,
         authors a,
         comments c
    where p.author_id = a.id
      and p.id = c.post_id
      and p.id = 123

    start transaction
      insert into comments (...)
      update posts
         set comment_count = comment_count + 1
       where id = 123
    commit

slide-43
SLIDE 43

43

Same Schema in MongoDB

embedding linking

slide-44
SLIDE 44

44

Mapping Object with MongoDB

[Diagram: posts documents, each embedding author and comments, spread across server a, server b, server c]

    db.posts.find({_id: 123})

    db.posts.update(
      {_id: 123},
      {"$push": {comments: new_comment},
       "$inc":  {comment_count: 1}}
    )

slide-45
SLIDE 45

45

Schemas in MongoDB

  • Design documents that simply map to your application

    post = {author: "Hergé",
            date: new Date(),
            text: "Destination Moon",
            tags: ["comic", "adventure"]}

    > db.posts.save(post)

slide-46
SLIDE 46

46

Examples

    // Find the object
    > db.blogs.find( { text: "Destination Moon" } )

    // Find posts with tags
    > db.blogs.find( { tags: { $exists: true } } )

    // Regular expressions: posts where author starts with h
    > db.blogs.find( { author: /^h/i } )

    // Counting: number of posts written by Hergé
    > db.blogs.find( { author: "Hergé" } ).count()

slide-47
SLIDE 47

47

Data Manipulation

  • Conditional Query Operators
      – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
      – Vector: $in, $nin, $all, $size
  • Atomic Update Operators
      – Scalar: $inc, $set, $unset
      – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
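
To show what the conditional query operators mean, here is a small in-memory matcher covering a subset of them. It is a sketch, not MongoDB's query engine: `matches` and the `operators` table are hypothetical, only top-level fields are handled, and edge cases such as array fields under `$in` follow plain JavaScript rules rather than the server's.

```javascript
// Hypothetical subset of the query operators, applied to plain JavaScript values.
const operators = {
  $ne: (v, arg) => v !== arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $exists: (v, arg) => (v !== undefined) === arg,
  $in: (v, arg) => arg.includes(v),
  $nin: (v, arg) => !arg.includes(v),
  $all: (v, arg) => Array.isArray(v) && arg.every((x) => v.includes(x)),
  $size: (v, arg) => Array.isArray(v) && v.length === arg,
};

// Returns true when every condition in the query holds for the document.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
    if (cond !== null && typeof cond === "object" && !Array.isArray(cond)) {
      return Object.entries(cond).every(([op, arg]) => {
        if (!(op in operators)) throw new Error("operator not covered by this sketch: " + op);
        return operators[op](value, arg);
      });
    }
    return value === cond; // plain equality on scalars
  });
}
```

Several operators on one field combine with AND, so `{ comment_count: { $gte: 1, $lt: 5 } }` expresses a range.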

slide-48
SLIDE 48

48

Extending the schema

    > db.blogs.update(
          { text: "Destination Moon" },
          { "$push": { comments: new_comment },
            "$inc":  { comment_count: 1 } } )

    { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
      text: "Destination Moon",
      comments: [
        {
          author: "Kyle",
          date: ISODate("2011-09-19T09:56:06.298Z"),
          text: "great book"
        }
      ],
      comment_count: 1
    }
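
The effect of the `$push` and `$inc` update can be reproduced on a plain object to show how a document gains fields it never declared. The function `applyUpdate` is a hypothetical helper that handles only these two operators and works on a copy; it says nothing about how the server applies updates.

```javascript
// Hypothetical sketch: apply $push and $inc to a plain JavaScript document.
function applyUpdate(doc, update) {
  const result = JSON.parse(JSON.stringify(doc)); // simple deep copy for plain data
  for (const [field, value] of Object.entries(update.$push || {})) {
    if (result[field] === undefined) result[field] = []; // field appears on first use
    if (!Array.isArray(result[field])) throw new Error(field + " is not an array");
    result[field].push(value);
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    result[field] = (result[field] || 0) + amount; // a missing counter starts at 0
  }
  return result;
}
```

Applied to a post that has no `comments` array and no counter yet, one call produces both fields; no schema change is needed beforehand.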

slide-49
SLIDE 49

49

How does MongoDB shape up?

  ✓ Build a database for scaleout
      – Run on clusters of 100s of commodity machines
  ✓ … that enables agile development
  • … and is usable for a broad variety of applications

slide-50
SLIDE 50

50

Big Data = MongoDB = Solved

  • Data Hub
  • User Data Management
  • Big Data
  • Content Mgmt & Delivery
  • Mobile & Social

slide-51
SLIDE 51

51

How does MongoDB shape up?

  ✓ Build a database for scaleout
      – Run on clusters of 100s of commodity machines
  ✓ … that enables agile development
  ✓ … and is usable for a broad variety of applications

slide-52
SLIDE 52

52

10gen is the organization behind MongoDB

  • 190+ employees
  • 500+ customers
  • Over $81 million in funding
  • Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney

slide-53
SLIDE 53

53

10gen Products and Services

Consulting

Expert Resources for All Phases of MongoDB Implementations

Training

Online and In-Person for Developers and Administrators

MongoDB Monitoring Service (MMS)

Free, Cloud-Based Service for Monitoring and Alerts

Subscriptions

Professional Support, Subscriber Edition and Commercial License

slide-54
SLIDE 54

54

MongoDB is the Leading NoSQL Database

Indeed.com Trends: Top Job Trends

  1. HTML 5
  2. MongoDB
  3. iOS
  4. Android
  5. Mobile Apps
  6. Puppet
  7. Hadoop
  8. jQuery
  9. PaaS
  10. Social Media

[Charts: LinkedIn Job Skills (MongoDB, Competitors 1 to 5, All Others); Google Search (MongoDB, Competitors 1 to 4); Jaspersoft Big Data Index, Direct Real-Time Downloads (MongoDB, Competitors 1 to 3)]

slide-55
SLIDE 55

55

The Evolution of MongoDB

  • 1.8 (March 11)
      – Journaling
      – Sharding and Replica set enhancements
      – Spherical geo search
  • 2.0 (Sept 11)
      – Index enhancements to improve size and performance
      – Authentication with sharded clusters
      – Replica Set Enhancements
      – Concurrency improvements
  • 2.2 (Aug 12)
      – Aggregation Framework
      – Multi-Data Center Deployments
      – Improved Performance and Concurrency
  • 2.4 (March 13)
      – Kerberos/SASL
      – Hash Shard Key
      – V8
      – Intersecting polygons
      – Aggregation enhancements
      – Text Search

slide-56
SLIDE 56

@mongodb

Drop by on the 5th floor and meet an Engineer and get a discount code for MongoDB London April 9th

http://bit.ly/mongoC

Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org