Managing Data and Operation Distribution in MongoDB
Antonios Giannopoulos and Jason Terpko
DBAs @ Rackspace/ObjectRocket
linkedin.com/in/antonis/ | linkedin.com/in/jterpko/
Introduction
Antonios Giannopoulos
Jason Terpko
www.objectrocket.com
Overview
• Sharded Cluster
• Shard Key Selection
• Shard Key Operations
• Chunk Management
• Data Distribution
• Orphaned Documents
• Q&A
Sharded Cluster
• Cluster Metadata
• Data Layer
• Query Routing
• Cluster Communication
Cluster Metadata
Data Layer (diagram: shards s1, s2, …, sN)
Replication
Data redundancy relies on an idempotent log of operations (the oplog).
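To illustrate why idempotency matters, here is a small sketch (an assumption for illustration, not MongoDB's actual implementation): MongoDB records the *result* of an update in the oplog (e.g. a `$set` of the new value) rather than the original operation (e.g. a `$inc`), so replaying an entry more than once converges to the same state.

```javascript
// A $inc is NOT idempotent: applying it twice changes the result.
function applyInc(doc, field, by) {
  return { ...doc, [field]: (doc[field] || 0) + by };
}

// The oplog instead records the outcome as a $set, which IS idempotent.
function applySet(doc, field, value) {
  return { ...doc, [field]: value };
}

let doc = { _id: 1, count: 5 };
const afterInc = applyInc(doc, "count", 1);                 // { count: 6 }
const oplogEntry = { op: "u", set: { count: afterInc.count } };

// Replaying the logged $set any number of times leaves the same state.
let replayed = applySet(doc, "count", oplogEntry.set.count);
replayed = applySet(replayed, "count", oplogEntry.set.count);
```

Replaying the `$inc` twice would yield 7; replaying the logged `$set` twice still yields 6, which is what makes recovery and initial sync safe.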
Query Routing (diagram: mongos routing queries to shards s1, s2, …, sN)
Sharded Cluster (diagram: shards s1, s2, …, sN)
Cluster Communication
How do independent components become a cluster and communicate?
● Replica Set
○ Replica Set Monitor
○ Replica Set Configuration
○ NetworkInterfaceASIO-Replication / NetworkInterfaceASIO-ShardRegistry
○ Misc: replSetName, keyFile, clusterRole
● Mongos Configuration
○ configDB parameter
○ NetworkInterfaceASIO-ShardRegistry
○ Replica Set Monitor
○ Task Executor
● Post Add Shard
○ Collection config.shards
○ Replica Set Monitor
○ Task Executor Pool
○ config.system.sessions
Primary Shard (diagram: database <foo> resides on one of shards s1, s2, …, sN)
Collection UUID
With featureCompatibilityVersion 3.6 all collections are assigned an immutable UUID, recorded in config.collections in the cluster metadata and on the data layer (mongod).
Important
• UUIDs for a namespace must match across the cluster
• Use 4.0+ tools for a sharded cluster restore
Shard Key - Selection
• Profiling
• Identify shard key candidates
• Pick a shard key
• Challenges
Sharding
Shards are physical partitions; chunks are logical partitions.
(diagram: database <foo>, collection <foo> divided into chunks across shards s1, s2, …, sN)
What is a Chunk?
Chunks are the logical partitions your collection is divided into; the mission of the shard key is to create chunks and determine how data is distributed across the cluster.
● Maximum size is defined in config.settings
○ Default 64MB
● Before 3.4.11: hardcoded maximum of 250,000 documents per chunk
● Version 3.4.11 and higher: 1.3 × the configured chunk size divided by the average document size
● Chunk map is stored in config.chunks
○ Continuous range from MinKey to MaxKey
● Chunk map is cached at both the mongos and mongod
○ Query Routing
○ Sharding Filter
● Chunks distributed by the Balancer
○ Using moveChunk
○ Up to maxSize
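The document-count limit above can be sketched as a small helper (a simplification for illustration, not the actual server code):

```javascript
// Maximum documents per chunk, per the rules on this slide:
// before 3.4.11 a hardcoded 250,000; from 3.4.11 onward,
// 1.3 x the configured chunk size divided by the average document size.
function maxDocsPerChunk(chunkSizeBytes, avgDocSizeBytes, preV3411) {
  if (preV3411) {
    return 250000;                 // hardcoded pre-3.4.11 limit
  }
  return Math.floor(1.3 * chunkSizeBytes / avgDocSizeBytes);
}

// Default 64MB chunks with 512-byte documents:
const limit = maxDocsPerChunk(64 * 1024 * 1024, 512, false);  // 170393
```

With small documents the 3.4.11+ rule allows far fewer documents than a naive size-only calculation would suggest, which is why average document size matters when picking a chunk size.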
Shard Key Selection – Profiling
Helps identify your workload
Requires level 2: db.setProfilingLevel(2)
May need to increase the profiler (system.profile) size
Shard Key Selection – Candidates
Export statement types with frequency
Export statement patterns with frequency
Produces a list of shard key candidates
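One way to extract statement patterns, sketched below (an assumed helper for illustration, not a built-in tool): collapse system.profile entries into query "shapes" by replacing literal values with a placeholder, then count how often each shape appears. The most frequent shapes point at shard key candidates.

```javascript
// Replace literals with 1 so {email: "a@x.com"} and {email: "b@y.org"}
// collapse into the same shape {email: 1}.
function queryShape(filter) {
  const shape = {};
  for (const key of Object.keys(filter).sort()) {
    const v = filter[key];
    shape[key] = (v !== null && typeof v === "object" && !Array.isArray(v))
      ? queryShape(v)   // recurse into operators like {$gt: ...}
      : 1;              // literal -> placeholder
  }
  return shape;
}

// Count how often each shape occurs across profiler documents.
function shapeFrequencies(profileDocs) {
  const counts = {};
  for (const doc of profileDocs) {
    const key = JSON.stringify(queryShape(doc.filter || {}));
    counts[key] = (counts[key] || 0) + 1;
  }
  return counts;
}

// Example with three fake profiler entries:
const freq = shapeFrequencies([
  { filter: { email: "jdoe@gmail.com" } },
  { filter: { email: "mary@foo.org" } },
  { filter: { created: { $gt: 1000 } } },
]);
// freq: { '{"email":1}': 2, '{"created":{"$gt":1}}': 1 }
```

In a real run you would feed it documents from db.system.profile.find() and sort the resulting shapes by count.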
Shard Key Selection – Built-in Constraints
Key and value are immutable
Must not contain NULLs
Update and findAndModify operations must contain the shard key
Unique constraints must be maintained by a prefix of the shard key
A shard key cannot contain special index types (i.e. text)
Potentially reduces the list of candidates
Shard Key Selection – Schema Constraints
Cardinality
Monotonically increasing keys
Data hotspots
Operational hotspots
Targeted vs scatter-gather operations
Shard Key Selection – Future Constraints
Poor cardinality
Growth and data hotspots
Data pruning & TTL indexes
Schema changes
Try to simulate the dataset in 3, 6, and 12 months
Shard Key - Operations
• Apply a shard key
• Revert a shard key
Apply a shard key
Create the associated index
Make sure the balancer is stopped:
sh.stopBalancer()
sh.getBalancerState()
Apply the shard key:
sh.shardCollection("foo.col", {field1:1, ..., fieldN:1})
Allow a burn period
Start the balancer
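The steps above as one mongo shell session sketch (the namespace foo.col and field1 are placeholders; this must run against a live mongos, so it is not runnable standalone):

```javascript
// Run on a mongos.
sh.stopBalancer();                                       // pause chunk migrations
sh.getBalancerState();                                   // verify: should be false
db.getSiblingDB("foo").col.createIndex({ field1: 1 });   // index backing the key
sh.enableSharding("foo");                                // enable sharding on the db
sh.shardCollection("foo.col", { field1: 1 });            // apply the shard key
// ...burn period: watch query targeting and write behavior...
sh.startBalancer();                                      // let chunks distribute
```

Keeping the balancer stopped through the burn period makes a revert far cheaper, since all chunks are still on the primary shard.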
Sharding
sh.shardCollection("foo.foo", <key>)
Burn period
sh.startBalancer()
(diagram: database <foo>, collection <foo> divided into chunks across shards s1, s2, …, sN)
Revert a shard key
Two categories:
o Affects functionality (exceptions, inconsistent data, …)
o Affects performance (operational hotspots, …)
Dump/Restore
o Requires downtime – writes and in some cases reads
o Time-consuming operation
o You may restore into a sharded or unsharded collection
o Better to pre-create indexes
o Same or new cluster can be used
o Streaming dump/restore is an option
o In special cases, like time-series data, it can be fast
Revert a shard key
Dual writes
o Mongo-to-Mongo connector or change streams
o No downtime
o Requires extra capacity
o May increase latency
o Same or new cluster can be used
o Adds complexity
Alter the config database
o Requires downtime – but minimal
o Easy during the burn period
o Time consuming if chunks are distributed
o Has overhead during chunk moves
Revert a shard key
Process:
1) Disable the balancer – sh.stopBalancer()
2) Move all chunks to the primary shard (skip during the burn period)
3) Stop one secondary from the config server replica set (for rollback)
4) Stop all mongos and all shards
5) On the config server replica set primary execute:
db.getSiblingDB('config').chunks.remove({ns: <collection name>})
db.getSiblingDB('config').collections.remove({_id: <collection name>})
6) Start all mongos and shards
7) Start the secondary from the config server replica set
Rollback:
• After step 6, stop all mongos and shards
• Stop the running members of the config server replica set and wipe their data directories
• Start all config server replica set members
• Start all mongos and shards
Revert a shard key
Online option requested in SERVER-4000 – may be supported in 4.2
Further reading – Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems
http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf
Special use cases:
Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})
o Possible (and easier) if b's max and min (per a) are predefined
o For example, {year:1, month:1} extended to {year:1, month:1, day:1}
Reduce the fields of a shard key ({a:1, b:1} to {a:1})
o Possible (and easier) if all distinct "a" values are in the same shard
o Chunks sharing the same "a" min value add complexity
Revert a shard key
Always perform a dry run
Balancer/autosplit must be disabled
You must take downtime during the change
*There might be a more optimal code path, but the above one worked like a charm
Chunk Splitting and Merging
• Pre-splitting
• Auto splits
• Manual intervention
Distribution Goal
Database size: 200G, primary shard: s1
(diagram: database <foo> spread evenly across shards s1*, s2, …, s4 – each holding 50G, 25% of the data)
Pre-Split – Hashed Keys
Shard keys using MongoDB's hashed index allow the use of numInitialChunks.
Hashing mechanism:
Value: jdoe@gmail.com
MD5: 694ea0904ceaf766c6738166ed89bafb
64 bits of the MD5 as a 64-bit integer: NumberLong("7588178963792066406")
Estimation:
Size = collection size (in MB) / 32 → 1,600 = 51,200 / 32
Count = number of documents / 125,000 → 800 = 100,000,000 / 125,000
Limit = number of shards * 8192 → 32,768 = 4 * 8192
numInitialChunks = Min(Max(Size, Count), Limit) → 1,600 = Min(Max(1600, 800), 32768)
Command:
db.runCommand({ shardCollection: "foo.users", key: { email: "hashed" }, numInitialChunks: 1600 });
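The estimation above can be written as a small helper (a sketch mirroring the slide's formula; the function name is ours, not a MongoDB API):

```javascript
// numInitialChunks = min(max(size, count), limit), per the slide.
function estimateNumInitialChunks(collSizeMB, docCount, numShards) {
  const size  = Math.ceil(collSizeMB / 32);     // target roughly 32MB per chunk
  const count = Math.ceil(docCount / 125000);   // stay under the doc-count limit
  const limit = numShards * 8192;               // cap on initial chunks
  return Math.min(Math.max(size, count), limit);
}

// The slide's numbers: 51,200MB, 100M documents, 4 shards -> 1600 chunks.
const n = estimateNumInitialChunks(51200, 100000000, 4);
```

The result is what you would pass as numInitialChunks to the shardCollection command shown above.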
Pre-Split – Deterministic
Use case: collection containing user profiles with email as the unique key.
Prerequisites:
1. Shard key analysis complete
2. Understanding of access patterns
3. Knowledge of the data
4. Unique key constraint
Pre-Split – Deterministic (diagrams): initial chunk splits on the primary shard, then balancing the chunks across shards, then further splits as the data grows.
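The deterministic approach above can be sketched as follows (an assumed approach built on the listed prerequisites, not an official tool): with knowledge of the data, pick split points from a sorted sample of the shard key so each initial range holds roughly the same number of documents.

```javascript
// Choose numChunks-1 evenly spaced split points from a sorted key sample.
function pickSplitPoints(sortedKeys, numChunks) {
  const points = [];
  const step = sortedKeys.length / numChunks;
  for (let i = 1; i < numChunks; i++) {
    points.push(sortedKeys[Math.floor(i * step)]);
  }
  return points;
}

// 8 sampled emails split into 4 ranges -> 3 split points.
const sample = ["a@x.com", "c@x.com", "f@x.com", "h@x.com",
                "k@x.com", "n@x.com", "r@x.com", "w@x.com"];
const splits = pickSplitPoints(sample, 4);

// Each point would then be applied on a mongos with:
//   sh.splitAt("foo.users", { email: <point> })
```

After the splits, distributing the empty chunks with moveChunk before loading data avoids migrations during the load.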
Automatic Splitting
Controlling auto-split:
• sh.enableAutoSplit()
• sh.disableAutoSplit()
Mongos:
• The component responsible for tracking split statistics
• Bytes-written statistics trigger split attempts
• Multiple mongos servers for HA
Sub-Optimal Distribution
Database size: 200G, primary shard: s1, chunks: balanced
(diagram: database <foo> – s1* holds 40% of the data while the other shards hold 20% each, despite a balanced chunk count)
Maintenance – Splitting
Five helpful resources:
• collStats
• config.chunks
• dataSize
• oplog.rs
• system.profile
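A mongo shell session sketch tying the five resources together (foo.users, the email key, and the range bound are placeholders; this requires a live cluster, so it is not runnable standalone):

```javascript
// collStats: average document size and totals for the collection.
db.getSiblingDB("foo").users.stats();

// config.chunks: the chunk map for the namespace.
db.getSiblingDB("config").chunks.find({ ns: "foo.users" });

// dataSize: measure how much data one chunk's range actually holds.
db.adminCommand({
  dataSize: "foo.users",
  keyPattern: { email: 1 },
  min: { email: MinKey },
  max: { email: "f@x.com" }
});
```

Comparing dataSize results across ranges shows which chunks are oversized and worth splitting manually; oplog.rs and system.profile reveal which ranges are taking the writes.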