
Scaling for Humongous amounts of data with MongoDB Alvin Richards - PowerPoint PPT Presentation



  1. Scaling for Humongous amounts of data with MongoDB
     Alvin Richards, Technical Director
     alvin@10gen.com, @jonnyeight, alvinonmongodb.com

  2. Getting from here to there... http://bit.ly/OT71M4

  3. ...probably using one of these http://bit.ly/QDUIUF

  4. Why NoSQL at all?

  5. Growth? Scaling? Cost? Flexibility?
     [Chart: Indexed Pages, "Pages Index (Million)" from 0 to 30,000, for 1998, 2000, 2008 and 2012]
     http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD

  6. Need a Database that...
     • Build a database for scaleout
       – Run on clusters of 100s of commodity machines
     • … that enables agile development
     • … and is usable for a broad variety of applications

  7. Is Scaleout Mission Impossible?
     • Partitioning of Data
       – Hashes (Dynamo) vs Ranges (Big Table)
       – Physical vs Logical segments
     • Consistency
       – Eventually
         • Multi Master updates, resolve conflicts later
       – Immediately
         • Single Master updates, always consistent
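The "Hashes vs Ranges" choice on slide 7 can be sketched in a few lines. This is a toy illustration, not how Dynamo, Bigtable or MongoDB implement partitioning: the shard names, the first-letter ranges (a-i, j-r, s-z) and `toyHash` are assumptions made for the example, and keys are assumed to start with a letter.

```javascript
// Hypothetical shard names; three shards as in the later sharding slides.
const shards = ["shard01", "shard02", "shard03"];

// Range partitioning: each shard owns a contiguous key range.
// Upper bounds are inclusive first letters: a-i, j-r, s-z.
const ranges = [
  { upTo: "i", shard: "shard01" },
  { upTo: "r", shard: "shard02" },
  { upTo: "z", shard: "shard03" },
];

function rangeShard(key) {
  const first = key[0].toLowerCase();
  return ranges.find((r) => first <= r.upTo).shard;
}

// Hash partitioning: a hash of the key picks the shard, so neighbouring
// keys are deliberately scattered. toyHash is a simple illustrative hash.
function toyHash(key) {
  let h = 0;
  for (const ch of key) h = (h * 31 + ch.codePointAt(0)) >>> 0;
  return h;
}

function hashShard(key) {
  return shards[toyHash(key) % shards.length];
}

for (const key of ["alvin", "alice", "jonny", "zoe"]) {
  console.log(key, "range:", rangeShard(key), "hash:", hashShard(key));
}
```

With ranges, "alvin" and "alice" land on the same shard, which helps range scans but can concentrate load; with hashes, placement depends only on the hash value.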

  8. NoSQL and MongoDB

  9. Tradeoff: Scale vs Functionality
     [Chart: "scalability & performance" against "depth of functionality", with memcached, key/value and RDBMS plotted]

  10. What MongoDB solves
      Agility
      • Applications store complex data that is easier to model as documents
      • Schemaless DB enables faster development cycles
      Flexibility
      • Relaxed transactional semantics enable easy scale out
      • Auto Sharding for scale down and scale up
      Cost
      • Cost-effective way to operationalize abundant data (clickstreams, logs, tweets, ...)

  11. How does MongoDB shape up?
      • Build a database for scaleout
        – Run on clusters of 100s of commodity machines
      • … that enables agile development
      • … and is usable for a broad variety of applications

  12. Data Distribution across nodes - Sharding
      Purpose:
      • Aggregate system resources horizontally
      • Scaling writes
      • Scaling consistent reads
      Goals:
      • Data location transparent to your code
      • Data distribution is automatic
      • No code changes required

  13. Sharding - Range distribution
          sh.shardCollection("test.tweets", {_id: 1}, false)
      [Diagram: shard01, shard02, shard03]

  14. Sharding - Range distribution
      shard01: a-i
      shard02: j-r
      shard03: s-z

  15. Sharding - Splits
      shard01: a-i
      shard02: ja-jz, k-r
      shard03: s-z

  16. Sharding - Splits
      shard01: a-i
      shard02: ja-ji, ji-js, js-jw, jz-r
      shard03: s-z

  17. Sharding - Auto Balancing
      [Diagram: shard01, shard02, shard03 with chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z; chunks js-jw and jz-r appear twice, in transit between shards]

  18. Sharding - Goal Equilibrium
      [Diagram: chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z spread across shard01, shard02, shard03]
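The split, balance and equilibrium sequence on slides 15 to 18 can be imitated with a toy model. Everything here is an assumption made for illustration: the document counts, the `MAX_DOCS_PER_CHUNK` threshold, the "/1" and "/2" chunk labels and the one-chunk-at-a-time policy. MongoDB itself splits chunks by data size and has its own balancer thresholds, so this shows only the idea.

```javascript
// Toy model: a chunk is a key range plus a document count (made-up numbers).
const MAX_DOCS_PER_CHUNK = 100; // hypothetical split threshold

const cluster = {
  shard01: [{ range: "a-i", docs: 40 }],
  shard02: [{ range: "j-r", docs: 400 }],
  shard03: [{ range: "s-z", docs: 30 }],
};

// Split: replace any oversized chunk with two halves until all chunks fit.
function splitChunks(chunks) {
  const out = [];
  const todo = [...chunks];
  while (todo.length > 0) {
    const c = todo.pop();
    if (c.docs > MAX_DOCS_PER_CHUNK) {
      const half = Math.floor(c.docs / 2);
      todo.push({ range: c.range + "/1", docs: half });
      todo.push({ range: c.range + "/2", docs: c.docs - half });
    } else {
      out.push(c);
    }
  }
  return out;
}

// Balance: move one chunk at a time from the shard with the most chunks to
// the shard with the fewest, until chunk counts differ by at most one.
function balance(cl) {
  for (;;) {
    const names = Object.keys(cl).sort((a, b) => cl[a].length - cl[b].length);
    const min = names[0];
    const max = names[names.length - 1];
    if (cl[max].length - cl[min].length <= 1) return cl;
    cl[min].push(cl[max].pop());
  }
}

cluster.shard02 = splitChunks(cluster.shard02);
console.log("shard02 chunks after splitting:", cluster.shard02.length);

balance(cluster);
for (const [name, chunks] of Object.entries(cluster)) {
  console.log(name, chunks.map((c) => c.range).join(", "));
}
```

In this run the hot shard02 chunk splits into four, and balancing leaves two chunks on every shard, which mirrors the six-chunk equilibrium on slide 18.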

  19. Sharding - Find by Key
          find({_id: "alvin"})
      [Diagram: shard01, shard02, shard03 with chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z]

  20. Sharding - Find by Key
          find({_id: "alvin"})
      [Diagram: shard01, shard02, shard03 with chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z]

  21. Sharding - Find by Attribute
          find({email: "alvin@10gen.com"})
      [Diagram: shard01, shard02, shard03 with chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z]

  22. Sharding - Find by Attribute
          find({email: "alvin@10gen.com"})
      [Diagram: shard01, shard02, shard03 with chunks a-i, ja-ji, ji-js, js-jw, jz-r, s-z]
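The difference between the two kinds of find on slides 19 to 22 comes down to whether the query contains the shard key. A minimal sketch, assuming a simplified chunk table (a-i, j-r, s-z on the first letter of `_id`) and a hypothetical `targetShards` helper that is not part of any MongoDB API:

```javascript
// Chunk table as a router might cache it, simplified to three ranges.
const chunkTable = [
  { min: "a", max: "i", shard: "shard01" },
  { min: "j", max: "r", shard: "shard02" },
  { min: "s", max: "z", shard: "shard03" },
];

// Returns the shards a query has to visit.
function targetShards(query, shardKey = "_id") {
  if (typeof query[shardKey] === "string") {
    // Shard key present: only the chunk owning that key is consulted.
    const first = query[shardKey][0].toLowerCase();
    return chunkTable
      .filter((c) => c.min <= first && first <= c.max)
      .map((c) => c.shard);
  }
  // No shard key in the query: every shard must be asked (scatter-gather).
  return [...new Set(chunkTable.map((c) => c.shard))];
}

console.log(targetShards({ _id: "alvin" }));              // [ 'shard01' ]
console.log(targetShards({ email: "alvin@10gen.com" }));  // all three shards
```

A query by `_id` touches one shard, while a query by `email` fans out to all of them, which is why the choice of shard key matters for read scaling.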

  23. Sharding - Caching
      shard01: 96 GB Mem, 300 GB Data (a-i, j-r, s-z), 3:1 Data/Mem

  24. Aggregate Horizontal Resources
      shard01: 96 GB Mem, 100 GB Data (a-i), 1:1 Data/Mem
      shard02: 96 GB Mem, 100 GB Data (j-r), 1:1 Data/Mem
      shard03: 96 GB Mem, 100 GB Data (s-z), 1:1 Data/Mem
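The ratios on slides 23 and 24 follow from dividing the data evenly across shards. A small worked calculation, assuming an even split and using a hypothetical helper name:

```javascript
// Data-to-memory ratio per shard, assuming data is spread evenly.
function dataToMemoryRatio(totalDataGB, shardCount, memPerShardGB) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

console.log(dataToMemoryRatio(300, 1, 96).toFixed(1)); // roughly 3:1 on one shard
console.log(dataToMemoryRatio(300, 3, 96).toFixed(1)); // roughly 1:1 on three shards
```

The same 300 GB that is about three times the memory of one machine fits almost entirely in the combined memory of three.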

  25. Sharding
      • Partitions data across many nodes
        – Scales Read & Writes
      • What happens if a node fails?
        – Data in that partition is lost
      • Must have copies of partition across
        – Nodes
        – Data Centers
        – Geographic regions

  26. Replica Sets
      [Diagram: App with Write and Read arrows; Primary with asynchronous replication to two Secondaries]

  27. Replica Sets
      [Diagram: App with Write and Read arrows; Primary; two Secondaries]

  28. Replica Sets
      [Diagram: automatic election of a new Primary; the App's Write and Read arrows move to it; one Secondary remains]

  29. Replica Sets
      [Diagram: the former Primary is Recovering; the new Primary serves data; one Secondary]

  30. Replica Sets
      [Diagram: the recovered node is now a Secondary; App with Write and Read arrows; Primary; Secondary]

  31. Scale Eventually Consistent Reads
      [Diagram: App writes to the Primary; Read arrows go to the Primary and to both Secondaries]

  32. Eventual Consistency: Using Replicas for Read Scaling
      • Read Preferences
        – PRIMARY, PRIMARY PREFERRED
        – SECONDARY, SECONDARY PREFERRED
        – NEAREST
      Java example
          ReadPreference pref = ReadPreference.primaryPreferred();
          DBCursor cur = new DBCursor(collection, query,
                                      null, pref);

  33. Immediate Consistency
      Thread #1 (Primary)
      Insert v1 ✔
      Read
      Update v2 ✔
      Read

  34. Eventual Consistency
      Thread #1 (Primary)     Thread #2 (Secondary)
      Insert v1 ✔             v1 does not exist ✖
      Read v1                 reads v1 ✔

  35. Eventual Consistency
      Thread #1 (Primary)     Thread #2 (Secondary)
      Insert v1 ✔             v1 does not exist ✖
      Read v1                 reads v1 ✔
      Update v2 ✔             reads v1 ✖
      Read v2                 reads v2 ✔
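The timeline on slides 33 to 35 can be replayed with a toy replica pair. This is a sketch under stated assumptions, not a model of MongoDB's replication protocol: `ToyReplicaSet` is a hypothetical class, and calling `replicate()` by hand stands in for asynchronous replication lag.

```javascript
// Toy replica pair: the secondary applies pending operations only when
// replicate() is called.
class ToyReplicaSet {
  constructor() {
    this.primary = new Map();
    this.secondary = new Map();
    this.pending = []; // operations not yet applied on the secondary
  }
  write(key, value) {
    this.primary.set(key, value);
    this.pending.push([key, value]);
  }
  replicate() {
    for (const [key, value] of this.pending) this.secondary.set(key, value);
    this.pending = [];
  }
  read(key, from = "primary") {
    return this[from].get(key);
  }
}

const rs = new ToyReplicaSet();
rs.write("doc", "v1");
console.log(rs.read("doc", "primary"));   // v1: reads from the primary are immediate
console.log(rs.read("doc", "secondary")); // undefined: v1 has not arrived yet
rs.replicate();
console.log(rs.read("doc", "secondary")); // v1
rs.write("doc", "v2");
console.log(rs.read("doc", "secondary")); // v1: stale until the next replicate()
rs.replicate();
console.log(rs.read("doc", "secondary")); // v2
```

Reads from the primary always see the latest write, while reads from the secondary can lag by however long replication takes.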

  36. Tunable Data Durability
      Less → More
      Write reaches         RDBMS     MongoDB
      Memory                async     fire & forget, w=1
      Journal               sync      w=1, j=true
      Secondary                       w="majority", w=n
      Other Data Center               w="myTag"
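One way to read the durability scale on slide 36 is as a predicate: given how far a write has propagated, is a given write concern satisfied yet? The sketch below is an illustration only. `isAcknowledged` and the fields of the `state` object are hypothetical names, `w: 0` stands in for "fire & forget", and "myTag" is assumed to be a custom rule defined elsewhere in the replica set configuration.

```javascript
// state describes how far one write has propagated (hypothetical fields).
function isAcknowledged(concern, state) {
  const { w = 1, j = false } = concern;
  if (j && !state.journaledOnPrimary) return false;
  if (w === "majority") return state.membersWithWrite > state.totalMembers / 2;
  if (typeof w === "number") return state.membersWithWrite >= w;
  // Any other string is treated as the name of a tag rule, e.g. "myTag".
  return state.satisfiedTagRules.includes(w);
}

// Three snapshots of the same write on a three-member replica set.
const justWritten = { journaledOnPrimary: false, membersWithWrite: 1, totalMembers: 3, satisfiedTagRules: [] };
const replicated = { journaledOnPrimary: true, membersWithWrite: 2, totalMembers: 3, satisfiedTagRules: [] };
const everywhere = { journaledOnPrimary: true, membersWithWrite: 3, totalMembers: 3, satisfiedTagRules: ["myTag"] };

const concerns = [{ w: 0 }, { w: 1 }, { w: 1, j: true }, { w: "majority" }, { w: 3 }, { w: "myTag" }];
for (const c of concerns) {
  console.log(
    JSON.stringify(c),
    isAcknowledged(c, justWritten),
    isAcknowledged(c, replicated),
    isAcknowledged(c, everywhere)
  );
}
```

The further right a setting sits on the slide, the later in the write's life this predicate turns true, which is the latency cost paid for the extra durability.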

  37. Other MongoDB features
      • Capped Collections
        – Limit data by size, acts as a circular buffer / FIFO
        – Use cases: Audit, history, logs
      • Time To Live (TTL) collections
        – Expire data based on timestamp
        – Use cases: Archiving, purging, sessions
      • Text Search
        – Search by word, phrase, stemming, stop words
        – Use cases: Consistent text search
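The behaviour of capped and TTL collections from slide 37 can be mimicked in memory. This is a sketch of the semantics only: `ToyCappedCollection` and `expire` are hypothetical names, the cap is a document count rather than a byte size to keep the example short, and the dates are fixed so the run is reproducible.

```javascript
// Capped collection sketch: once full, each insert pushes out the oldest
// document (FIFO). Capped by count here; the slide describes a size limit.
class ToyCappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }
}

// TTL sketch: keep only documents whose timestamp field is younger than
// ttlSeconds at the moment `now` (milliseconds since the epoch).
function expire(docs, field, ttlSeconds, now = Date.now()) {
  return docs.filter((d) => now - d[field].getTime() < ttlSeconds * 1000);
}

const log = new ToyCappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ event: i });
console.log(log.docs.map((d) => d.event)); // [ 3, 4, 5 ]

const now = new Date("2013-01-01T12:00:00Z").getTime();
const sessions = [
  { id: 1, createdAt: new Date("2013-01-01T10:00:00Z") }, // two hours old
  { id: 2, createdAt: new Date("2013-01-01T11:30:00Z") }, // thirty minutes old
];
console.log(expire(sessions, "createdAt", 3600, now).map((s) => s.id)); // [ 2 ]
```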

  38. How does MongoDB shape up?
      ✓ Build a database for scaleout
        – Run on clusters of 100s of commodity machines
      • … that enables agile development
      • … and is usable for a broad variety of applications

  39. Data Model
      • Why JSON?
        – Simple, well understood encapsulation of data
        – Maps simply to objects in your OO language
        – Linking & Embedding to describe relationships

  40. Why Mess with the Data Model?

  41. Mapping Objects to RDBMS
      Read:
          select *
          from posts p, authors a, comments c
          where p.author_id = a.id
            and p.id = c.post_id
            and p.id = 123
      Write:
          start transaction
          insert into comments (...)
          update posts
          set comment_count = comment_count + 1
          where id = 123
          commit
      [Diagram: tables posts, comments, authors]

  42. Mapping Objects to Distributed RDBMS
      Read:
          select *
          from posts p, authors a, comments c
          where p.author_id = a.id
            and p.id = c.post_id
            and p.id = 123
      Write:
          start transaction
          insert into comments (...)
          update posts
          set comment_count = comment_count + 1
          where id = 123
          commit
      [Diagram: tables posts, comments, authors spread over server a, server b, server c]

  43. Same Schema in MongoDB
      [Diagram: embedding and linking]

  44. Mapping Object with MongoDB
          db.posts.find({_id: 123})
          db.posts.update(
            {_id: 123},
            {"$push": {comments: new_comment},
             "$inc":  {comments_count: 1}}
          )
      [Diagram: posts, author, comments; server a, server b, server c]

  45. Schemas in MongoDB
      • Design documents that simply map to your application
          post = {author: "Hergé",
                  date: new Date(),
                  text: "Destination Moon",
                  tags: ["comic", "adventure"]}

          > db.posts.save(post)

  46. Examples
          // Find the object
          > db.blogs.find( { text: "Destination Moon" } )
          // Find posts with tags
          > db.blogs.find( { tags: { $exists: true } } )
          // Regular expressions: posts where author starts with h
          > db.blogs.find( { author: /^h/i } )
          // Counting: number of posts written by Hergé
          > db.blogs.find( { author: "Hergé" } ).count()

  47. Data Manipulation
      • Conditional Query Operators
        – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
        – Vector: $in, $nin, $all, $size
      • Atomic Update Operators
        – Scalar: $inc, $set, $unset
        – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
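To make the query operators on slide 47 concrete, here is a minimal in-memory matcher for a handful of them. It is a simplified sketch, not MongoDB's query engine: `matches` and `ops` are hypothetical names, only top-level fields are supported, `$type`, regular expressions and dates are left out, and array semantics are reduced to the common cases.

```javascript
// Minimal matcher for a subset of the conditional operators listed above.
const ops = {
  $ne: (v, arg) => v !== arg,
  $gt: (v, arg) => v > arg,
  $gte: (v, arg) => v >= arg,
  $lt: (v, arg) => v < arg,
  $lte: (v, arg) => v <= arg,
  $mod: (v, [divisor, remainder]) => v % divisor === remainder,
  $exists: (v, arg) => (v !== undefined) === arg,
  $in: (v, arg) => arg.some((a) => (Array.isArray(v) ? v.includes(a) : v === a)),
  $nin: (v, arg) => !ops.$in(v, arg),
  $all: (v, arg) => Array.isArray(v) && arg.every((a) => v.includes(a)),
  $size: (v, arg) => Array.isArray(v) && v.length === arg,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    const value = doc[field];
    const isOperatorDoc =
      cond !== null &&
      typeof cond === "object" &&
      !Array.isArray(cond) &&
      Object.keys(cond).every((k) => k.startsWith("$"));
    if (!isOperatorDoc) {
      // Plain equality; a scalar also matches an element of an array field.
      return Array.isArray(value) ? value.includes(cond) : value === cond;
    }
    return Object.entries(cond).every(([op, arg]) => ops[op](value, arg));
  });
}

const post = { author: "Hergé", tags: ["comic", "adventure"], comments_count: 1 };
console.log(matches(post, { tags: "comic" }));                        // true
console.log(matches(post, { tags: { $all: ["comic", "adventure"] } })); // true
console.log(matches(post, { comments_count: { $gte: 1, $lt: 5 } }));  // true
console.log(matches(post, { author: { $in: ["Kyle"] } }));            // false
```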

  48. Extending the schema
          > db.blogs.update(
                { text: "Destination Moon" },
                { "$push": { comments: new_comment },
                  "$inc":  { comments_count: 1 } } )

            { _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
              text: "Destination Moon",
              comments: [
                { author: "Kyle",
                  date: ISODate("2011-09-19T09:56:06.298Z"),
                  text: "great book" }
              ],
              comments_count: 1
            }
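The update on slide 48 works on a document that has neither a comments array nor a counter yet, because `$push` creates the array and `$inc` starts a missing field at the increment. A small in-memory sketch of that behaviour, where `applyUpdate` is a hypothetical helper covering only `$set`, `$inc` and `$push` on top-level fields:

```javascript
// Applies a subset of the update operators to a plain object, in place.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, by] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + by; // a missing field starts at 0
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = []; // a missing field becomes an array
    doc[field].push(item);
  }
  return doc;
}

const blogPost = { text: "Destination Moon" };
const newComment = { author: "Kyle", text: "great book" };

applyUpdate(blogPost, {
  $push: { comments: newComment },
  $inc: { comments_count: 1 },
});
console.log(JSON.stringify(blogPost, null, 2));
```

No schema migration step appears anywhere: the document simply grows the new fields the first time they are written.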

  49. How does MongoDB shape up?
      ✓ Build a database for scaleout
        – Run on clusters of 100s of commodity machines
      ✓ … that enables agile development
      • … and is usable for a broad variety of applications

  50. Big Data = MongoDB = Solved
      • Big Data
      • Content Mgmt & Delivery
      • Mobile & Social
      • User Data Management
      • Data Hub

  51. How does MongoDB shape up?
      ✓ Build a database for scaleout
        – Run on clusters of 100s of commodity machines
      ✓ … that enables agile development
      ✓ … and is usable for a broad variety of applications

  52. 10gen is the organization behind MongoDB
      • 190+ employees
      • 500+ customers
      • Over $81 million in funding
      • Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
