Scaling for Humongous amounts of data with MongoDB Alvin Richards - PowerPoint PPT Presentation

Scaling for Humongous amounts of data with MongoDB
Alvin Richards, Technical Director
alvin@10gen.com, @jonnyeight, alvinonmongodb.com

Getting from here to there... http://bit.ly/OT71M4

...probably using one of these http://bit.ly/QDUIUF

Why NoSQL at all?

Growth? Scaling? Cost? Flexibility?
[Chart: Indexed Pages (Million), 1998 to 2012]
http://bit.ly/VDkDN2 http://bit.ly/108jTHN http://bit.ly/Wt3fl7 http://bit.ly/Qmg8YD

Need a Database that...
• Build a database for scaleout
  – Run on clusters of 100s of commodity machines
• … that enables agile development
• … and is usable for a broad variety of applications

Is Scaleout Mission Impossible?
• Partitioning of Data
  – Hashes (Dynamo) vs Ranges (Bigtable)
  – Physical vs Logical segments
• Consistency
  – Eventually: Multi Master updates, resolve conflicts later
  – Immediately: Single Master updates, always consistent
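A minimal sketch of the two partitioning styles in plain JavaScript, with no MongoDB involved. The FNV-1a hash, the three partitions and the "j"/"s" boundaries are illustrative assumptions: hashing scatters neighbouring keys across partitions, while ranges keep them together, which is what makes range scans cheap.

```javascript
// FNV-1a 32-bit hash of a string; stands in for any well-mixed hash function.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // unsigned 32-bit result
}

// Hash partitioning (Dynamo style): neighbouring keys scatter across partitions.
function hashPartition(key, numPartitions) {
  return fnv1a(key) % numPartitions;
}

// Range partitioning (Bigtable style): partition i holds keys below upperBounds[i]
// and at or above upperBounds[i - 1]; the last partition is unbounded.
function rangePartition(key, upperBounds) {
  for (let i = 0; i < upperBounds.length; i++) {
    if (key < upperBounds[i]) return i;
  }
  return upperBounds.length;
}

// Hypothetical boundaries giving the a-i / j-r / s-z split used in the sharding slides.
const bounds = ["j", "s"];

for (const key of ["alvin", "anna", "kyle", "zoe"]) {
  console.log(key, "range:", rangePartition(key, bounds), "hash:", hashPartition(key, 3));
}
```

With ranges, "alvin" and "anna" always share partition 0; with hashing, whether they do is down to the hash.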

NoSQL and MongoDB

Tradeoff: Scale vs Functionality
[Diagram: memcached, key/value stores and RDBMS placed on two axes, scalability & performance vs depth of functionality]

What MongoDB solves
Agility
• Applications store complex data that is easier to model as documents
• Schemaless DB enables faster development cycles
Flexibility
• Relaxed transactional semantics enable easy scale out
• Auto Sharding for scale down and scale up
Cost
• Cost-effectively operationalize abundant data (clickstreams, logs, tweets, ...)

How does MongoDB shape up?
• Build a database for scaleout
  – Run on clusters of 100s of commodity machines
• … that enables agile development
• … and is usable for a broad variety of applications

Data Distribution across nodes - Sharding
Purpose:
• Aggregate system resources horizontally
• Scaling writes
• Scaling consistent reads
Goals:
• Data location transparent to your code
• Data distribution is automatic
• No code changes required

Sharding - Range distribution
// shard test.tweets on _id (ascending ranges); false = the shard key is not unique
sh.shardCollection("test.tweets", {_id: 1}, false)
[Diagram: shard01, shard02, shard03]

Sharding - Range distribution
[Diagram: shard01 holds a-i, shard02 holds j-r, shard03 holds s-z]

Sharding - Splits
[Diagram: shard01 holds a-i; shard02 now holds two chunks, ja-jz and k-r; shard03 holds s-z]

Sharding - Splits
[Diagram: shard01 holds a-i; shard02 now holds ja-ji, ji-js, js-jw and jz-r; shard03 holds s-z]
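A simplified sketch of a chunk split, assuming a chunk is split at its median key once it holds more than maxKeys keys. MongoDB actually splits on data size (a configurable chunk size in MB), so the key-count threshold, the chunk object shape and the name splitIfNeeded are hypothetical stand-ins.

```javascript
// A chunk owns the half-open key range [min, max) on one shard. For this sketch it
// also carries its keys; MongoDB tracks data size instead.
function splitIfNeeded(chunk, maxKeys) {
  if (chunk.keys.length <= maxKeys) return [chunk];
  const sorted = [...chunk.keys].sort();
  const mid = sorted[Math.floor(sorted.length / 2)]; // split point: the median key
  const leftKeys = sorted.filter((k) => k < mid);
  const rightKeys = sorted.filter((k) => k >= mid);
  // If every key is equal there is no split point: the chunk stays whole ("jumbo").
  if (leftKeys.length === 0) return [chunk];
  const left = { min: chunk.min, max: mid, shard: chunk.shard, keys: leftKeys };
  const right = { min: mid, max: chunk.max, shard: chunk.shard, keys: rightKeys };
  // Split again if either half is still over the limit.
  return [...splitIfNeeded(left, maxKeys), ...splitIfNeeded(right, maxKeys)];
}

// Hypothetical keys in shard02's j-r chunk, with a limit of 4 keys per chunk.
const chunk = {
  min: "j",
  max: "s",
  shard: "shard02",
  keys: ["jack", "jane", "jill", "jim", "joe", "kate", "liam", "mike"],
};
for (const c of splitIfNeeded(chunk, 4)) {
  console.log(`${c.min}-${c.max} on ${c.shard}: ${c.keys.join(", ")}`);
}
```

Both halves stay on shard02: a split only changes metadata, and moving data is the balancer's job.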

Sharding - Auto Balancing
[Diagram: shard01 holds a-i; shard02 holds ja-ji, ji-js, js-jw and jz-r; shard03 holds s-z; chunks js-jw and jz-r are shown migrating off shard02]

Sharding - Goal Equilibrium
[Diagram: the six chunks a-i, ja-ji, ji-js, js-jw, jz-r and s-z spread evenly, two per shard, across shard01, shard02 and shard03]
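A hypothetical balancer loop: move one chunk at a time from the shard with the most chunks to the one with the fewest, until their counts differ by at most threshold. MongoDB's balancer applies its own migration thresholds and chooses chunks differently; the starting layout is the one from the Auto Balancing slide.

```javascript
// shards maps a shard name to its list of chunk ranges. Moves one chunk at a time
// from the most loaded shard to the least loaded one until their chunk counts differ
// by at most `threshold`. Mutates `shards` and returns the migrations performed.
function balance(shards, threshold = 1) {
  const limit = Math.max(1, threshold); // a limit of 0 could swap one chunk back and forth forever
  const moves = [];
  for (;;) {
    const names = Object.keys(shards);
    let most = names[0];
    let least = names[0];
    for (const name of names) {
      if (shards[name].length > shards[most].length) most = name;
      if (shards[name].length < shards[least].length) least = name;
    }
    if (shards[most].length - shards[least].length <= limit) return moves;
    const chunk = shards[most].pop(); // migrate the donor's last chunk
    shards[least].push(chunk);
    moves.push({ chunk, from: most, to: least });
  }
}

// Layout from the Auto Balancing slide.
const layout = {
  shard01: ["a-i"],
  shard02: ["ja-ji", "ji-js", "js-jw", "jz-r"],
  shard03: ["s-z"],
};
for (const m of balance(layout)) console.log(`move ${m.chunk}: ${m.from} -> ${m.to}`);
console.log(layout);
```

Two migrations take the layout from 1/4/1 chunks to 2/2/2, the equilibrium on the slide.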

Sharding - Find by Key
find({_id: "alvin"})
[Diagram: shard01 holds a-i; shard02 holds ja-ji, ji-js, js-jw and jz-r; shard03 holds s-z]


Sharding - Find by Attribute
find({email: "alvin@10gen.com"})
[Diagram: shard01 holds a-i; shard02 holds ja-ji, ji-js, js-jw and jz-r; shard03 holds s-z]
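A sketch of the routing decision behind the two slides, assuming the simple three-range chunk map from the Range distribution slide and only string equality on the shard key. A query on _id goes to exactly one shard; a query on email, which is not the shard key, has to be broadcast to every shard. The names shardForKey and targetShards are hypothetical, not mongos internals.

```javascript
// Hypothetical chunk map for a collection sharded on _id, using the slide's three
// ranges; min is inclusive, max exclusive, and null means unbounded.
const chunks = [
  { min: null, max: "j", shard: "shard01" },
  { min: "j", max: "s", shard: "shard02" },
  { min: "s", max: null, shard: "shard03" },
];

// Find the shard whose chunk range contains the key.
function shardForKey(key) {
  const chunk = chunks.find(
    (c) => (c.min === null || key >= c.min) && (c.max === null || key < c.max)
  );
  return chunk.shard;
}

// Decide which shards a query must be sent to.
function targetShards(query, shardKey = "_id") {
  if (typeof query[shardKey] === "string") {
    return [shardForKey(query[shardKey])]; // equality on the shard key: one shard
  }
  // Shard key absent: scatter to every shard, then gather and merge the results.
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ _id: "alvin" }));
console.log(targetShards({ email: "alvin@10gen.com" }));
```

This is why the choice of shard key matters: queries that include it scale with the number of shards, queries that do not touch all of them.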


Sharding - Caching
[Diagram: a single shard, shard01, with 96 GB of memory holding all 300 GB of data (a-i, j-r, s-z): a 3:1 data-to-memory ratio]

Aggregate Horizontal Resources
[Diagram: shard01 (a-i), shard02 (j-r) and shard03 (s-z), each with 96 GB of memory and 100 GB of the 300 GB of data: a 1:1 data-to-memory ratio per shard]
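The ratios on the two caching slides are simple arithmetic, assuming the 300 GB of data is spread evenly across the shards; dataToMemoryRatio is a hypothetical helper.

```javascript
// Data-to-memory ratio per shard, assuming data is spread evenly across shards.
function dataToMemoryRatio(totalDataGB, memPerShardGB, numShards) {
  const dataPerShardGB = totalDataGB / numShards;
  return dataPerShardGB / memPerShardGB;
}

// Figures from the slides: 300 GB of data, 96 GB of memory per shard.
const oneShard = dataToMemoryRatio(300, 96, 1);    // 300 / 96, about 3.1 (the "3:1" slide)
const threeShards = dataToMemoryRatio(300, 96, 3); // 100 / 96, about 1.04 (the "1:1" slide)
console.log(oneShard.toFixed(2), threeShards.toFixed(2));
```

Adding shards adds memory, so a larger share of the data can stay cached.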

Sharding
• Partitions data across many nodes
  – Scales Reads & Writes
• What happens if a node fails?
  – Data in that partition is lost
• Must have copies of each partition across
  – Nodes
  – Data Centers
  – Geographic regions

Replica Sets
[Diagram: the App writes to and reads from the Primary, which replicates asynchronously to two Secondaries]

Replica Sets
[Diagram: App, Primary and two Secondaries; the App writes to and reads from the Primary]

Replica Sets
[Diagram: the original Primary is down; Automatic Election of a new Primary from the Secondaries, which now takes the App's writes and reads]

Replica Sets
[Diagram: the failed node is Recovering; the new Primary serves data, taking the App's writes and reads; one Secondary]

Replica Sets
[Diagram: the recovered node has rejoined as a Secondary; the App writes to and reads from the Primary]
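A deliberately simplified election rule, for illustration only: a new primary needs a majority of the set's members to be up, and the most up-to-date healthy member wins. MongoDB's real election protocol also weighs priorities, terms and votes; electPrimary, the member names and the optime values are all hypothetical.

```javascript
// members: [{ name, up, optime }], where optime is the position of the member's
// last applied write. Simplified rule: a primary needs a majority of all members
// to be up, and the most up-to-date healthy member is chosen.
function electPrimary(members) {
  const majority = Math.floor(members.length / 2) + 1;
  const healthy = members.filter((m) => m.up);
  if (healthy.length < majority) return null; // no majority: no primary, no writes
  return healthy.reduce((best, m) => (m.optime > best.optime ? m : best)).name;
}

// A three-member set after the original primary "A" has failed.
const set = [
  { name: "A", up: false, optime: 10 },
  { name: "B", up: true, optime: 9 },
  { name: "C", up: true, optime: 8 },
];
console.log(electPrimary(set));
```

The majority rule is why replica sets usually have an odd number of members: with three, any two can still elect a primary.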

Scale Eventually Consistent Reads
[Diagram: the App writes to the Primary and reads from the Primary and both Secondaries]

Eventual Consistency: Using Replicas for Read Scaling
• Read Preferences
  – PRIMARY, PRIMARY PREFERRED
  – SECONDARY, SECONDARY PREFERRED
  – NEAREST
Java example
// collection: a DBCollection; query: a DBObject filter; null = return all fields
ReadPreference pref = ReadPreference.primaryPreferred();
DBCursor cur = new DBCursor(collection, query, null, pref);
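A sketch of how the five read preference modes pick a member, assuming each member reports its role, health and ping time. Real drivers choose randomly among eligible members inside a latency window rather than always taking the single fastest one; selectMember and the member list are hypothetical.

```javascript
// Each member reports its role, health and measured ping time.
function selectMember(mode, members) {
  const up = members.filter((m) => m.up);
  const primary = up.find((m) => m.role === "primary");
  const secondaries = up.filter((m) => m.role === "secondary");
  // Lowest ping wins; returns undefined for an empty list.
  const fastest = (list) => list.reduce((a, b) => (b.pingMs < a.pingMs ? b : a), list[0]);
  switch (mode) {
    case "primary": return primary;
    case "primaryPreferred": return primary ?? fastest(secondaries);
    case "secondary": return fastest(secondaries);
    case "secondaryPreferred": return secondaries.length > 0 ? fastest(secondaries) : primary;
    case "nearest": return fastest(up);
    default: throw new Error(`unknown read preference: ${mode}`);
  }
}

// Hypothetical three-member set.
const members = [
  { name: "P", role: "primary", up: true, pingMs: 20 },
  { name: "S1", role: "secondary", up: true, pingMs: 5 },
  { name: "S2", role: "secondary", up: true, pingMs: 30 },
];
for (const mode of ["primary", "primaryPreferred", "secondary", "secondaryPreferred", "nearest"]) {
  console.log(mode, "->", selectMember(mode, members)?.name);
}
```

Every mode other than primary can return a secondary, and therefore data that may lag the primary.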

Immediate Consistency
Thread #1 writes to and reads from the Primary:
• Insert v1 ✔
• Read (returns v1)
• Update v2 ✔
• Read (returns v2)

Eventual Consistency
Thread #1 writes to the Primary; Thread #2 reads from a Secondary:
• Thread #1: Insert v1 ✔
• Thread #2: Read ✖ v1 does not exist yet
• Thread #2: Read ✔ reads v1

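Eventual Consistency
Thread #1 writes to the Primary; Thread #2 reads from a Secondary:
• Thread #1: Insert v1 ✔
• Thread #2: Read ✖ v1 does not exist yet
• Thread #2: Read ✔ reads v1
• Thread #1: Update v2 ✔
• Thread #2: Read ✖ reads v1
• Thread #2: Read ✔ reads v2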
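The timeline above can be replayed with a toy primary that logs each write and a secondary that applies the log only when told to. Both classes are hypothetical and model nothing of a real oplog beyond ordering.

```javascript
// Toy primary: applies writes immediately and appends them to an oplog.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = [];
  }
  write(key, value) {
    this.data.set(key, value);
    this.oplog.push({ key, value }); // replicated to secondaries later
  }
  read(key) {
    return this.data.get(key);
  }
}

// Toy secondary: copies oplog entries only when replicate() is called,
// standing in for asynchronous replication lag.
class Secondary {
  constructor(primary) {
    this.primary = primary;
    this.data = new Map();
    this.applied = 0; // number of oplog entries applied so far
  }
  replicate() {
    const log = this.primary.oplog;
    while (this.applied < log.length) {
      const { key, value } = log[this.applied++];
      this.data.set(key, value);
    }
  }
  read(key) {
    return this.data.get(key);
  }
}

const primary = new Primary();
const secondary = new Secondary(primary);
primary.write("doc", "v1");                              // Thread #1: Insert v1
console.log(secondary.read("doc"));                      // Thread #2: v1 does not exist yet
secondary.replicate();
console.log(secondary.read("doc"));                      // Thread #2: reads v1
primary.write("doc", "v2");                              // Thread #1: Update v2
console.log(primary.read("doc"), secondary.read("doc")); // primary has v2, secondary still v1
secondary.replicate();
console.log(secondary.read("doc"));                      // Thread #2: reads v2
```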

Tunable Data Durability
From less to more durable:
• fire & forget
• Memory: w=1
• Journal: w=1, j=true
• Secondary: w="majority", w=n
• Other Data Center: w="myTag"
RDBMS async and sync commits are marked on the same scale for comparison.
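A sketch of when a write counts as acknowledged under w and j, assuming the client knows which members have applied the write. It covers w: 0, numeric w and "majority"; tag-based modes such as w: "myTag" are defined in the replica set configuration and are not modelled, and j is only checked on the primary here. writeConcernSatisfied is a hypothetical name, and w defaults to 1 in this sketch.

```javascript
// acks lists the members that have applied the write so far, e.g.
// [{ name: "P", primary: true, journaled: true }]; setSize is the member count.
function writeConcernSatisfied(wc, acks, setSize) {
  const w = wc.w ?? 1;
  if (w === 0) return true; // fire & forget: the client does not wait
  const primaryAck = acks.find((a) => a.primary);
  if (!primaryAck) return false;
  if (wc.j && !primaryAck.journaled) return false; // j=true: wait for the journal (primary only here)
  if (w === "majority") return acks.length >= Math.floor(setSize / 2) + 1;
  if (typeof w === "number") return acks.length >= w;
  throw new Error(`tag-based write concern not modelled: ${w}`);
}

// Hypothetical acknowledgements for one write in a three-member set.
const onlyPrimary = [{ name: "P", primary: true, journaled: false }];
const plusSecondary = [...onlyPrimary, { name: "S1", primary: false, journaled: true }];

console.log(writeConcernSatisfied({ w: 1 }, onlyPrimary, 3));
console.log(writeConcernSatisfied({ w: 1, j: true }, onlyPrimary, 3));
console.log(writeConcernSatisfied({ w: "majority" }, onlyPrimary, 3));
console.log(writeConcernSatisfied({ w: "majority" }, plusSecondary, 3));
```

Each step right on the durability scale means the client waits for more members, trading latency for safety.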

Other MongoDB features
• Capped Collections
  – Limit data by size; acts as a circular buffer / FIFO
  – Use cases: audit, history, logs
• Time To Live (TTL) collections
  – Expire data based on a timestamp
  – Use cases: archiving, purging, sessions
• Text Search
  – Search by word or phrase, with stemming and stop words
  – Use cases: consistent text search
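Behavioural sketches of capped and TTL collections. The capped collection here is limited by document count, whereas MongoDB caps by size in bytes (optionally also by count); purgeExpired runs on demand, whereas MongoDB's TTL monitor deletes expired documents in the background roughly once a minute. CappedCollection and purgeExpired are hypothetical names.

```javascript
// Capped collection sketch: keeps insertion order and, once full, drops the
// oldest document (FIFO). Capped by document count here, not by bytes.
class CappedCollection {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    if (this.docs.length > this.maxDocs) this.docs.shift(); // evict the oldest
  }
  find() {
    return [...this.docs]; // natural order = insertion order
  }
}

// TTL sketch: keep only documents whose timestamp field is younger than
// expireAfterSeconds at time `now`.
function purgeExpired(docs, field, expireAfterSeconds, now = new Date()) {
  return docs.filter((d) => now - d[field] < expireAfterSeconds * 1000);
}

const log = new CappedCollection(3);
for (let n = 1; n <= 5; n++) log.insert({ n });
console.log(log.find().map((d) => d.n)); // the three newest entries

const sessions = [
  { user: "a", lastSeen: new Date("2013-01-01T00:00:00Z") },
  { user: "b", lastSeen: new Date("2013-01-01T00:00:50Z") },
];
console.log(purgeExpired(sessions, "lastSeen", 30, new Date("2013-01-01T00:01:00Z")));
```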

How does MongoDB shape up?
✓ Build a database for scaleout
  – Run on clusters of 100s of commodity machines
• … that enables agile development
• … and is usable for a broad variety of applications

Data Model
• Why JSON?
  – Simple, well understood encapsulation of data
  – Maps simply to objects in your OO language
  – Linking & Embedding to describe relationships

Why Mess with the Data Model?

Mapping Objects to RDBMS
select *
from posts p,
     authors a,
     comments c
where p.author_id = a.id
  and p.id = c.post_id
  and p.id = 123

start transaction
insert into comments (...)
update posts
   set comment_count = comment_count + 1
 where id = 123
commit

[Diagram: tables posts, comments, authors]

Mapping Objects to Distributed RDBMS
The same join and transaction, but the posts, comments and authors tables now live on server a, server b and server c, so both the join and the transaction have to span servers.

Same Schema in MongoDB
[Diagram: the same schema expressed with embedding and linking]
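A sketch of the embedding/linking split, with in-memory maps standing in for collections: comments are embedded in the post, while the author is linked through a hypothetical author_id field and resolved with a second lookup. loadPost is a hypothetical helper.

```javascript
// Hypothetical in-memory "collections", keyed by _id.
const authors = new Map([["herge", { _id: "herge", name: "Hergé" }]]);
const posts = new Map([
  [123, {
    _id: 123,
    author_id: "herge", // linking: a reference to a document in another collection
    text: "Destination Moon",
    comments: [{ author: "Kyle", text: "great book" }], // embedding: stored inside the post
    comment_count: 1,
  }],
]);

// One lookup returns the post together with its comments; the linked author
// takes a second lookup, done by the application rather than a server-side join.
function loadPost(id) {
  const post = posts.get(id);
  if (!post) return null;
  return { ...post, author: authors.get(post.author_id) ?? null };
}

console.log(loadPost(123));
```

Embedding suits data read together with its parent (comments); linking suits data shared across many documents (authors).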

Mapping Object with MongoDB
db.posts.find({_id: 123})
db.posts.update(
  {_id: 123},
  {"$push": {comments: new_comment},
   "$inc": {comment_count: 1}}
)
[Diagram: a post document with its author and comments; posts spread across server a, server b and server c]

Schemas in MongoDB
• Design documents that simply map to your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}

> db.posts.save(post)

Examples
// Find the object
> db.posts.find( { text: "Destination Moon" } )
// Find posts with tags
> db.posts.find( { tags: { $exists: true } } )
// Regular expressions: posts where author starts with h
> db.posts.find( { author: /^h/i } )
// Counting: number of posts written by Hergé
> db.posts.find( { author: "Hergé" } ).count()

Data Manipulation
• Conditional Query Operators
  – Scalar: $ne, $mod, $exists, $type, $lt, $lte, $gt, $gte
  – Vector: $in, $nin, $all, $size
• Atomic Update Operators
  – Scalar: $inc, $set, $unset
  – Vector: $push, $pop, $pull, $pushAll, $pullAll, $addToSet
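A tiny in-memory matcher for a subset of the query operators above, enough to run the Examples slide's queries against sample documents. It is a simplification, not MongoDB's query engine: $mod and $type are not handled, nested documents and dates are compared by reference only, and range operators do not look inside arrays as MongoDB does. matchValue, find and the sample posts are hypothetical.

```javascript
// Does a single field value satisfy a condition (literal, RegExp or operator object)?
function matchValue(value, cond) {
  if (cond instanceof RegExp) return typeof value === "string" && cond.test(value);
  const isOperatorObject =
    cond !== null && typeof cond === "object" && !Array.isArray(cond) &&
    Object.keys(cond).length > 0 && Object.keys(cond).every((k) => k.startsWith("$"));
  if (isOperatorObject) {
    return Object.entries(cond).every(([op, arg]) => {
      switch (op) {
        case "$exists": return (value !== undefined) === arg;
        case "$ne": return !matchValue(value, arg);
        case "$gt": return value > arg;
        case "$gte": return value >= arg;
        case "$lt": return value < arg;
        case "$lte": return value <= arg;
        case "$in": return arg.some((a) => matchValue(value, a));
        case "$nin": return !arg.some((a) => matchValue(value, a));
        case "$all": return Array.isArray(value) && arg.every((a) => value.includes(a));
        case "$size": return Array.isArray(value) && value.length === arg;
        default: throw new Error(`operator not supported in this sketch: ${op}`);
      }
    });
  }
  // Plain equality; an array field also matches when any element equals the value.
  return Array.isArray(value) ? value.includes(cond) : value === cond;
}

// A document matches when every field condition in the query holds.
function find(docs, query) {
  return docs.filter((doc) =>
    Object.entries(query).every(([field, cond]) => matchValue(doc[field], cond))
  );
}

// Hypothetical sample posts.
const posts = [
  { author: "Hergé", text: "Destination Moon", tags: ["comic", "adventure"] },
  { author: "Hergé", text: "Second post" },
  { author: "Kyle", text: "great book", tags: ["review"] },
];
console.log(find(posts, { text: "Destination Moon" }).length);
console.log(find(posts, { tags: { $exists: true } }).length);
console.log(find(posts, { author: /^h/i }).length);
console.log(find(posts, { author: "Hergé" }).length);
```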

Extending the schema
> db.posts.update(
    { text: "Destination Moon" },
    { "$push": { comments: new_comment },
      "$inc": { comment_count: 1 } } )

{ _id: ObjectId("4c4ba5c0672c685e5e8aabf3"),
  text: "Destination Moon",
  comments: [
    { author: "Kyle",
      date: ISODate("2011-09-19T09:56:06.298Z"),
      text: "great book" }
  ],
  comment_count: 1
}
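A sketch of what the $push and $inc update above does to the stored document, covering only top-level fields. As in MongoDB, $push creates the array when it is missing and $inc treats a missing field as 0; on the server the whole update is applied atomically to the one document. applyUpdate is a hypothetical helper, and the string _id stands in for an ObjectId.

```javascript
// Apply a subset of update operators to top-level fields, returning a new object.
function applyUpdate(doc, update) {
  const out = { ...doc };
  for (const [op, fields] of Object.entries(update)) {
    for (const [field, arg] of Object.entries(fields)) {
      switch (op) {
        case "$push": out[field] = [...(out[field] ?? []), arg]; break; // creates the array if absent
        case "$inc": out[field] = (out[field] ?? 0) + arg; break;      // absent field starts at 0
        case "$set": out[field] = arg; break;
        case "$unset": delete out[field]; break;
        default: throw new Error(`operator not supported in this sketch: ${op}`);
      }
    }
  }
  return out;
}

const post = { _id: "4c4ba5c0672c685e5e8aabf3", text: "Destination Moon" };
const newComment = {
  author: "Kyle",
  date: new Date("2011-09-19T09:56:06.298Z"),
  text: "great book",
};
const updated = applyUpdate(post, {
  $push: { comments: newComment },
  $inc: { comment_count: 1 },
});
console.log(updated);
```

No schema migration is needed: the comments array and counter appear the first time the update runs.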

How does MongoDB shape up?
✓ Build a database for scaleout
  – Run on clusters of 100s of commodity machines
✓ … that enables agile development
• … and is usable for a broad variety of applications

Big Data = MongoDB = Solved
• Big Data
• Content Mgmt & Delivery
• Mobile & Social
• User Data Management
• Data Hub

How does MongoDB shape up?
✓ Build a database for scaleout
  – Run on clusters of 100s of commodity machines
✓ … that enables agile development
✓ … and is usable for a broad variety of applications

10gen is the organization behind MongoDB
• 190+ employees
• 500+ customers
• Offices in New York, Palo Alto, Washington DC, London, Dublin, Barcelona and Sydney
• Over $81 million in funding
