Managing Data and Operation Distribution In MongoDB

  1. Managing Data and Operation Distribution In MongoDB: Antonios Giannopoulos and Jason Terpko, DBAs @ Rackspace/ObjectRocket, linkedin.com/in/antonis/ | linkedin.com/in/jterpko/

  2. Introduction: Antonios Giannopoulos, Jason Terpko (www.objectrocket.com)

  3. Overview • Sharded Cluster • Shard Key Selection • Shard Key Operations • Chunk Management • Data Distribution • Orphaned Documents • Q&A

  4. Sharded Cluster • Cluster Metadata • Data Layer • Query Routing • Cluster Communication

  5. Cluster Metadata

  6. Data Layer [Diagram: shards s1, s2, … sN]

  7. Replication Data redundancy relies on an idempotent log of operations.
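To make "idempotent" concrete, here is a toy model in plain JavaScript (runnable in Node.js). The entry shape (`op`, `o`, `o2`) is a deliberately simplified assumption loosely modeled on oplog entries; real entries carry more fields. In the versions this talk covers, an `$inc` is recorded in the oplog as a `$set` of the resulting value, which is what lets a secondary re-apply an entry without changing the outcome.

```javascript
// Toy model of idempotent oplog application. The entry shape is simplified:
// { op: "i" | "u" | "d", o: <document, update or key>, o2: { _id } for updates }.

function applyEntry(collection, entry) {
  switch (entry.op) {
    case "i": // insert: write the full document, replacing any earlier copy
      collection.set(entry.o._id, { ...entry.o });
      break;
    case "u": { // update: $set of final values, so applying it again changes nothing
      // (creating a missing document here is a simplification of this sketch)
      const current = collection.get(entry.o2._id) || { _id: entry.o2._id };
      collection.set(entry.o2._id, { ...current, ...entry.o.$set });
      break;
    }
    case "d": // delete: removing an already-removed document is a no-op
      collection.delete(entry.o._id);
      break;
    default:
      throw new Error(`unknown op: ${entry.op}`);
  }
}

// Apply the whole log `times` times to an empty collection (a Map keyed by _id).
function replay(log, times) {
  const collection = new Map();
  for (let i = 0; i < times; i++) {
    for (const entry of log) applyEntry(collection, entry);
  }
  return collection;
}

// The client ran insert({_id: 1, count: 5}) and then an update with {$inc: {count: 1}};
// the log records the result of the $inc as a $set.
const log = [
  { op: "i", o: { _id: 1, count: 5 } },
  { op: "u", o2: { _id: 1 }, o: { $set: { count: 6 } } },
];

console.log(replay(log, 1).get(1).count); // 6
console.log(replay(log, 3).get(1).count); // 6
```

Had the log stored `$inc` itself, each replay would add one more and the copies would diverge.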

  8. Query Routing [Diagram: shards s1, s2, … sN]

  9. Sharded Cluster [Diagram: shards s1, s2, … sN]

  10. Cluster Communication
      How do independent components become a cluster and communicate?
      ● Replica Set
        ○ Replica Set Monitor
        ○ Replica Set Configuration
        ○ Network Interface ASIO Replication / Network Interface ASIO Shard Registry
        ○ Misc: replSetName, keyFile, clusterRole
      ● Mongos Configuration
        ○ configDB Parameter
        ○ Network Interface ASIO Shard Registry
        ○ Replica Set Monitor
        ○ Task Executor
      ● Post Add Shard
        ○ Collection config.shards
        ○ Replica Set Monitor
        ○ Task Executor Pool
        ○ config.system.sessions

  11. Primary Shard [Diagram: database <foo>; shards s1, s2, … sN]

  12. Collection UUID
      With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID.
      • Cluster Metadata: config.collections
      • Data Layer (mongod): config.collections

  13. Collection UUID
      With featureCompatibilityVersion 3.6, all collections are assigned an immutable UUID.
      • Cluster Metadata: config.collections
      • Data Layer (mongod): config.collections
      Important
      • UUIDs for a namespace must match
      • Use 4.0+ tools for a sharded cluster restore
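The "UUIDs must match" rule can be checked mechanically. A minimal plain-JavaScript sketch, assuming you have already collected the cluster-side UUID (for example from `db.getSiblingDB("config").collections.findOne({_id: ns}).uuid` on a mongos) and each shard's UUID (for example from `db.getCollectionInfos({name: <collection>})[0].info.uuid` on each shard primary), converted to strings the same way on both sides. The function name and input shape are hypothetical.

```javascript
// Compare a collection's UUID in the cluster metadata with the UUID each shard reports.
// clusterUUID: string; shardUUIDs: { <shardName>: <string or undefined> }.
function findUUIDMismatches(clusterUUID, shardUUIDs) {
  const mismatches = [];
  for (const [shard, uuid] of Object.entries(shardUUIDs)) {
    // Shards that report no UUID (collection absent) are skipped in this sketch.
    if (uuid !== undefined && uuid !== clusterUUID) {
      mismatches.push({ shard, expected: clusterUUID, found: uuid });
    }
  }
  return mismatches;
}

// Hypothetical values standing in for real UUID strings.
console.log(findUUIDMismatches("uuid-A", { s1: "uuid-A", s2: "uuid-B", s3: undefined }));
```

A non-empty result after a restore is the situation the slide warns about.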

  14. Shard Key Selection • Profiling • Identify shard key candidates • Pick a shard key • Challenges

  15. Sharding: shards are physical partitions; chunks are logical partitions. [Diagram: collection <foo> in database <foo>, divided into chunks spread across shards s1, s2, … sN]

  16. What is a Chunk?
      The mission of the shard key is to create chunks: the logical partitions your collection is divided into, which determine how data is distributed across the cluster.
      ● Maximum size is defined in config.settings
        ○ Default 64MB
      ● Before 3.4.11: hardcoded maximum document count of 250,000
      ● Version 3.4.11 and higher: maximum document count is 1.3 × (configured chunk size / average document size)
      ● Chunk map is stored in config.chunks
        ○ Continuous range from MinKey to MaxKey
      ● Chunk map is cached at both the mongos and mongod
        ○ Query Routing
        ○ Sharding Filter
      ● Chunks distributed by the Balancer
        ○ Using moveChunk
        ○ Up to maxSize
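Two of these points, the 3.4.11+ document-count limit and the contiguous MinKey-to-MaxKey chunk map used for routing, can be sketched in plain JavaScript. This is a simplified model, not MongoDB's implementation: MinKey/MaxKey are stood in for by -Infinity/Infinity and chunk bounds are plain numbers rather than BSON documents.

```javascript
// Two ideas from this slide as plain JavaScript. MinKey/MaxKey are modeled as
// -Infinity/Infinity and bounds as numbers; real chunk bounds are BSON documents.

// Document-count limit for a chunk in 3.4.11+: 1.3 * (configured chunk size / average document size).
function maxDocsPerChunk(chunkSizeMB, avgDocSizeBytes) {
  return Math.floor((1.3 * chunkSizeMB * 1024 * 1024) / avgDocSizeBytes);
}

// Routing lookup over a chunk map: contiguous [min, max) ranges sorted by min
// that together cover MinKey..MaxKey, as in config.chunks.
function findChunk(chunks, key) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (key < chunks[mid].min) hi = mid - 1;
    else if (key >= chunks[mid].max) lo = mid + 1;
    else return chunks[mid]; // min <= key < max
  }
  return null; // not reached for finite keys on a gap-free map
}

const chunkMap = [
  { min: -Infinity, max: 100, shard: "s1" },
  { min: 100, max: 500, shard: "s2" },
  { min: 500, max: Infinity, shard: "s3" },
];

console.log(maxDocsPerChunk(64, 1024));      // 85196
console.log(findChunk(chunkMap, 250).shard); // s2
```

Because the ranges are sorted and gap-free, a binary search is enough to route a shard key value to exactly one chunk, and therefore one shard.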

  17. Shard Key Selection: Profiling • Helps identify your workload • Requires level 2: db.setProfilingLevel(2) • May need to increase the profiler size
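Increasing the profiler size means recreating the capped `system.profile` collection while profiling is off. A minimal mongo shell sketch, wrapped in a hypothetical function so the order of steps is explicit; the profiler runs on each mongod, so this would be repeated on every shard member you want to profile, and the 100 MB size in the example is an arbitrary choice.

```javascript
// Recreate system.profile with a larger capped size, then enable full profiling (level 2).
// `db` is a mongo shell database handle connected to a shard mongod (not mongos).
function resizeProfiler(db, sizeBytes) {
  db.setProfilingLevel(0);   // profiling must be off before system.profile can be dropped
  db.system.profile.drop();  // remove the existing capped collection
  db.createCollection("system.profile", { capped: true, size: sizeBytes });
  db.setProfilingLevel(2);   // level 2 profiles every operation
}

// Example (mongo shell): resizeProfiler(db.getSiblingDB("foo"), 100 * 1024 * 1024);
```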

  18. Shard Key Selection: Profiling → Candidates • Export statement types with frequency • Export statement patterns with frequency • Produces a list of shard key candidates
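Exporting statement patterns amounts to reducing each profiled operation to its shape and counting. A plain-JavaScript sketch over profiler documents that were already exported (for example with `db.system.profile.find().toArray()`); the entry shape used here (`op`, `ns`, `command.filter`) is a simplifying assumption, since real profiler documents differ by operation type and server version.

```javascript
// Reduce exported profiler entries to patterns (op + namespace + filtered field names)
// and count them, most frequent first. Entry shape is simplified.
function filterShape(filter) {
  // Keep only top-level field names, sorted, so {email: "a"} and {email: "b"} match.
  return Object.keys(filter || {}).sort().join(",");
}

function patternFrequency(entries) {
  const counts = new Map();
  for (const e of entries) {
    const filter = e.command ? e.command.filter : undefined;
    const pattern = `${e.op} ${e.ns} {${filterShape(filter)}}`;
    counts.set(pattern, (counts.get(pattern) || 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const sample = [
  { op: "query", ns: "foo.users", command: { filter: { email: "a@x.com" } } },
  { op: "query", ns: "foo.users", command: { filter: { email: "b@x.com" } } },
  { op: "query", ns: "foo.users", command: { filter: { country: "GR", age: 30 } } },
];
console.log(patternFrequency(sample));
```

The fields at the top of this list are the natural shard key candidates for the next step.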

  19. Shard Key Selection: Profiling → Candidates → Built-in Constraints
      • Key and value are immutable
      • Must not contain NULLs
      • Update and findAndModify operations must contain the shard key
      • Unique constraints can only be enforced by indexes that have the shard key as a prefix
      • A shard key cannot contain special index types (e.g. text)
      Potentially reduces the list of candidates
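The unique-constraint rule is easy to apply per candidate. A plain-JavaScript sketch over key-pattern objects (field order is the insertion order, matching how shell key patterns are written); the function names are hypothetical, and treating hashed shard keys as unable to back a unique index is a conservative simplification of this sketch.

```javascript
// Can a unique index with key pattern `indexKey` be enforced under `shardKey`?
// Rule sketched here: the shard key fields must be a prefix of the index fields.
function canEnforceUnique(shardKey, indexKey) {
  // Conservative simplification: treat hashed shard keys as unable to back a unique index.
  if (Object.values(shardKey).includes("hashed")) return false;
  const skFields = Object.keys(shardKey);
  const idxFields = Object.keys(indexKey);
  return skFields.length <= idxFields.length &&
    skFields.every((field, i) => idxFields[i] === field);
}

// Keep the candidates that are compatible with every unique index the application needs.
function filterCandidates(candidates, uniqueIndexes) {
  return candidates.filter((sk) => uniqueIndexes.every((idx) => canEnforceUnique(sk, idx)));
}

const candidates = [{ email: 1 }, { country: 1 }, { email: "hashed" }, { email: 1, country: 1 }];
console.log(filterCandidates(candidates, [{ email: 1 }])); // [ { email: 1 } ]
```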

  20. Shard Key Selection: Profiling → Candidates → Built-in Constraints → Schema Constraints • Cardinality • Monotonically increasing values • Data hotspots • Operational hotspots • Targeted vs scatter-gather operations
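Some schema constraints can be estimated from a sample of candidate values taken in insertion order. The three metrics below (distinct count, share of strictly increasing consecutive values, share of the most common value) are illustrative heuristics chosen for this sketch, not thresholds MongoDB defines.

```javascript
// Rough heuristics over a sample of candidate shard key values, in insertion order.
// The metrics are illustrative, not thresholds MongoDB defines.
function keyStats(values) {
  if (values.length < 2) throw new Error("need at least two sample values");
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);

  let increasing = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i] > values[i - 1]) increasing++;
  }

  return {
    cardinality: counts.size,                                   // distinct values cap the number of chunks
    monotonicRatio: increasing / (values.length - 1),           // near 1: inserts pile onto the last chunk
    topValueShare: Math.max(...counts.values()) / values.length, // high: likely data hotspot
  };
}

console.log(keyStats([1, 2, 3, 4, 5])); // { cardinality: 5, monotonicRatio: 1, topValueShare: 0.2 }
console.log(keyStats(["GR", "GR", "GR", "US"]));
```

A timestamp or ObjectId-like key scores a monotonic ratio near 1, while a country code shows low cardinality and a large top-value share.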

  21. Shard Key Selection: Profiling → Candidates → Built-in Constraints → Schema Constraints → Future Constraints • Poor cardinality • Growth and data hotspots • Data pruning & TTL indexes • Schema changes • Try to simulate the dataset in 3, 6 and 12 months
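A first rough cut at the 3, 6 and 12 month simulation is to project the collection size and the chunk count it implies. This sketch assumes linear monthly growth and the default 64MB chunk size; both are placeholders for your own measurements, and TTL pruning could be modeled as negative growth.

```javascript
// Project collection size and a lower bound on chunk count, assuming linear growth.
function projectGrowth(currentMB, monthlyGrowthMB, months, chunkSizeMB = 64) {
  return months.map((m) => {
    const sizeMB = currentMB + monthlyGrowthMB * m;
    // Chunks are rarely full after splitting, so real counts tend to be higher.
    return { months: m, sizeMB, minChunks: Math.ceil(sizeMB / chunkSizeMB) };
  });
}

console.log(projectGrowth(51200, 10240, [3, 6, 12]));
```

If a candidate key has fewer distinct values than the projected chunk count, some chunks will not be splittable, which is the "poor cardinality" problem in another form.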

  22. Shard Key Operations • Apply a shard key • Revert a shard key

  23. Apply a shard key
      • Create the associated index
      • Make sure the balancer is stopped: sh.stopBalancer(), then verify with sh.getBalancerState()
      • Apply the shard key: sh.shardCollection("foo.col", {field1: 1, ..., fieldN: 1})
      • Allow a burn period
      • Start the balancer
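The steps above can be wrapped in one mongo shell function so they always run in order. A minimal sketch with a hypothetical function name; it assumes `sh.enableSharding(<database>)` has already been run, and it deliberately leaves `sh.startBalancer()` to you after the burn period.

```javascript
// The slide's steps as one mongo shell function; `db` and `sh` are the shell globals
// on a mongos. Assumes sh.enableSharding(<database>) has already been run.
function applyShardKey(db, sh, ns, key) {
  const dot = ns.indexOf(".");
  const dbName = ns.slice(0, dot);
  const collName = ns.slice(dot + 1);

  // 1) Create the index that backs the shard key.
  db.getSiblingDB(dbName).getCollection(collName).createIndex(key);

  // 2) Stop the balancer and verify that it is off before sharding.
  sh.stopBalancer();
  if (sh.getBalancerState()) throw new Error("balancer is still enabled");

  // 3) Apply the shard key.
  sh.shardCollection(ns, key);

  // 4) Burn period: observe the cluster, then run sh.startBalancer() by hand.
}

// Example: applyShardKey(db, sh, "foo.col", { field1: 1, field2: 1 });
```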

  24. Sharding: sh.shardCollection("foo.foo", <key>) → Burn Period → sh.startBalancer() [Diagram: collection <foo> in database <foo>, chunks spread across shards s1, s2, … sN]

  25. Revert a shard key
      Two categories:
        o Affects functionality (exceptions, inconsistent data, …)
        o Affects performance (operational hotspots, …)
      Dump/Restore
        o Requires downtime: writes and, in some cases, reads
        o Time-consuming operation
        o You may restore on a sharded or unsharded collection
        o Better to pre-create indexes
        o Same or new cluster can be used
        o Streaming dump/restore is an option
        o In special cases, like time-series data, it can be fast

  26. Revert a shard key
      Dual writes
        o Mongo-to-Mongo connector or change streams
        o No downtime
        o Requires extra capacity
        o May increase latency
        o Same or new cluster can be used
        o Adds complexity
      Alter the config database
        o Requires downtime, but minimal
        o Easy during the burn period
        o Time-consuming if chunks are distributed
        o Has overhead during chunk moves
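At the core of the change-stream flavor of dual writes is a step that replays each change event on the new collection. A plain-JavaScript sketch of that step, using the change event fields `operationType`, `documentKey`, `fullDocument` and `updateDescription` against an in-memory `Map` standing in for the target collection. Opening the stream (for example `collection.watch()` in the Node.js driver), resume tokens, and nested (dotted) update paths are left out; the function name is hypothetical.

```javascript
// Apply one change event to an in-memory target keyed by _id.
function applyChange(target, event) {
  const id = event.documentKey._id;
  switch (event.operationType) {
    case "insert":
    case "replace":
      target.set(id, { ...event.fullDocument });
      break;
    case "update": {
      const doc = { ...(target.get(id) || { _id: id }) };
      const { updatedFields = {}, removedFields = [] } = event.updateDescription;
      Object.assign(doc, updatedFields);            // dotted paths are not expanded in this sketch
      for (const f of removedFields) delete doc[f];
      target.set(id, doc);
      break;
    }
    case "delete":
      target.delete(id);
      break;
    default:
      // e.g. "drop", "rename", "invalidate": a real mover would stop and alert here.
      throw new Error(`unhandled operationType: ${event.operationType}`);
  }
}
```

In a real mover the target writes would go to the collection sharded with the new key, which is where the extra capacity and latency listed above come from.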

  27. Revert a shard key
      Process:
      1) Disable the balancer: sh.stopBalancer()
      2) Move all chunks to the primary shard (skip during the burn period)
      3) Stop one secondary of the config server replica set (for rollback)
      4) Stop all mongos and all shards
      5) On the config server replica set primary, execute:
           db.getSiblingDB("config").chunks.remove({ns: <collection name>})
           db.getSiblingDB("config").collections.remove({_id: <collection name>})
      6) Start all mongos and shards
      7) Start the secondary of the config server replica set
      Rollback:
      • After step 6, stop all mongos and shards
      • Stop the running members of the config server replica set and wipe their data directories
      • Start all config server replica set members
      • Start all mongos and shards
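Since step 5 deletes metadata, it is worth verifying step 2 first. A read-only mongo shell sketch with a hypothetical function name, assuming the 4.0-era `config` schema in which `config.chunks` documents carry an `ns` field and `config.databases` records each database's `primary` shard.

```javascript
// Read-only pre-check for step 5: every chunk of `ns` must already live on the primary shard.
// `configDB` is db.getSiblingDB("config") on the config server replica set primary.
function checkChunksOnPrimary(configDB, ns) {
  const dbName = ns.slice(0, ns.indexOf("."));
  const primary = configDB.databases.findOne({ _id: dbName }).primary;
  const shards = configDB.chunks.distinct("shard", { ns: ns });
  const ok = shards.length === 1 && shards[0] === primary;
  return { ok: ok, primary: primary, shardsWithChunks: shards };
}

// Example (mongo shell):
// const r = checkChunksOnPrimary(db.getSiblingDB("config"), "foo.col");
// if (!r.ok) print("move remaining chunks first: " + tojson(r.shardsWithChunks));
```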

  28. Revert a shard key
      Online option requested in SERVER-4000; may be supported in 4.2.
      Further reading: Morphus: Supporting Online Reconfigurations in Sharded NoSQL Systems, http://dprg.cs.uiuc.edu/docs/ICAC2015/Conference.pdf
      Special use cases:
      • Extend a shard key by adding field(s) ({a:1} to {a:1, b:1})
        o Possible (and easier) if b's min and max (per a) are predefined
        o For example, {year:1, month:1} extended to {year:1, month:1, day:1}
      • Reduce the elements of a shard key ({a:1, b:1} to {a:1})
        o Possible (and easier) if all documents for a given "a" value are on the same shard
        o Easier if no two chunks share the same "a" value in their min bound (otherwise it adds complexity)

  29. Revert a shard key • Always perform a dry run • Balancer/autosplit must be disabled • You must take downtime during the change
      *There might be a more optimal code path, but the one above worked like a charm

  30. Chunk Splitting and Merging • Pre-splitting • Auto Splits • Manual Intervention

  31. Distribution Goal: database size 200G; primary shard s1 (*). [Diagram: database <foo> across shards s1*, s2, … s4, each holding 25% (50G)]

  32. Pre-Split – Hashed Keys
      Shard keys using MongoDB's hashed index allow the use of numInitialChunks.
      Hashing Mechanism: value → 64 bits of MD5 → 64-bit integer
        jdoe@gmail.com → 694ea0904ceaf766c6738166ed89bafb → NumberLong("7588178963792066406")
      Estimation
        Size  = Collection size (in MB) / 32                1,600 = 51,200 / 32
        Count = Number of documents / 125,000               800 = 100,000,000 / 125,000
        Limit = Number of shards * 8192                     32,768 = 4 * 8192
        numInitialChunks = Min(Max(Size, Count), Limit)     1,600 = Min(Max(1600, 800), 32768)
      Command
        db.runCommand({ shardCollection: "foo.users", key: { email: "hashed" }, numInitialChunks: 1600 });
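The estimation translates directly into a small helper. The `/32`, `/125000` and `8192` constants are the slide's; the function name is hypothetical, and rounding up to an integer is an addition of this sketch for inputs that do not divide evenly.

```javascript
// numInitialChunks estimate for a hashed shard key, using the slide's formula.
// Rounding up to an integer is an addition for inputs that do not divide evenly.
function estimateNumInitialChunks(collectionSizeMB, documentCount, shardCount) {
  const size = collectionSizeMB / 32;    // size-based estimate
  const count = documentCount / 125000;  // count-based estimate
  const limit = shardCount * 8192;       // cap: 8192 chunks per shard
  return Math.ceil(Math.min(Math.max(size, count), limit));
}

console.log(estimateNumInitialChunks(51200, 100000000, 4)); // 1600
```

The result is the value to pass as `numInitialChunks` in the `shardCollection` command.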

  33. Pre-Split – Deterministic: Prerequisites
      Use Case: Collection containing user profiles with email as the unique key.
      1. Shard key analysis complete
      2. Understanding of access patterns
      3. Knowledge of the data
      4. Unique key constraint

  34. Pre-Split – Deterministic: Prerequisites → Split [Diagram: the initial chunk divided by splits]

  35. Pre-Split – Deterministic: Prerequisites → Split → Balance

  36. Pre-Split – Deterministic: Prerequisites → Split → Balance
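The Split and Balance steps can be scripted once split points are chosen. This sketch picks evenly spaced values from a sorted sample of existing shard key values (one possible approach, assuming such a sample is available), splits with `sh.splitAt`, and places chunks round-robin with `sh.moveChunk`. Function names are hypothetical; it assumes the balancer is stopped and that every chunk starts on the primary shard, passed as `shards[0]`.

```javascript
// Pick (numChunks - 1) split points at evenly spaced positions of a sorted key sample.
function chooseSplitPoints(sortedSample, numChunks) {
  const points = [];
  for (let i = 1; i < numChunks; i++) {
    const v = sortedSample[Math.floor((i * sortedSample.length) / numChunks)];
    if (points.length === 0 || v > points[points.length - 1]) points.push(v); // skip repeats
  }
  return points;
}

// Split, then spread chunks round-robin. Assumes every chunk starts on shards[0]
// (the primary shard, as after shardCollection on an empty collection with a ranged key)
// and the balancer is stopped. Chunk i (i >= 1) starts at points[i - 1].
function presplit(sh, ns, field, points, shards) {
  for (const p of points) sh.splitAt(ns, { [field]: p });
  for (let i = 1; i <= points.length; i++) {
    const target = shards[i % shards.length];
    if (target !== shards[0]) sh.moveChunk(ns, { [field]: points[i - 1] }, target);
  }
}

// Example (mongo shell):
// presplit(sh, "foo.users", "email", chooseSplitPoints(sample, 4), ["s1", "s2", "s3", "s4"]);
```

Moving chunks while the collection is still empty is cheap, which is the main reason to balance before loading data.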

  37. Automatic Splitting
      Controlling Auto-Split
      • sh.enableAutoSplit()
      • sh.disableAutoSplit()
      • Alternatively
      Mongos
      • The component responsible for tracking statistics
      • Bytes written statistics
      • Multiple mongos servers for HA
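A toy model of why it matters that mongos tracks the statistics: each mongos only counts bytes written through it, so when several mongos share the write load, more data reaches a chunk before any one of them triggers a split check. The class, function names and threshold are assumptions for illustration; the real threshold and bookkeeping are internal to mongos and vary by version.

```javascript
// Toy model: each mongos keeps its own bytes-written counter per chunk and triggers
// a split check when the counter crosses `threshold` (an illustrative parameter).
class MongosModel {
  constructor(threshold) {
    this.threshold = threshold;
    this.bytesWritten = new Map(); // chunkId -> bytes seen by this mongos only
    this.splitChecks = 0;
  }
  write(chunkId, bytes) {
    const total = (this.bytesWritten.get(chunkId) || 0) + bytes;
    if (total >= this.threshold) {
      this.splitChecks++; // a real mongos would ask the shard for split points here
      this.bytesWritten.set(chunkId, 0);
    } else {
      this.bytesWritten.set(chunkId, total);
    }
  }
}

// Round-robin writes over n mongos; return total bytes written before the first split check.
function bytesUntilFirstCheck(nMongos, bytesPerWrite, threshold) {
  if (bytesPerWrite <= 0) throw new Error("bytesPerWrite must be positive");
  const routers = Array.from({ length: nMongos }, () => new MongosModel(threshold));
  let written = 0;
  for (let i = 0; ; i++) {
    const router = routers[i % nMongos];
    router.write("chunk-1", bytesPerWrite);
    written += bytesPerWrite;
    if (router.splitChecks > 0) return written;
  }
}

console.log(bytesUntilFirstCheck(1, 10, 100)); // 100
console.log(bytesUntilFirstCheck(4, 10, 100)); // 370
```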

  38. Sub-Optimal Distribution: database size 200G; primary shard s1 (*); chunks balanced. [Diagram: database <foo> across shards s1*, s2, … s4 holding 40%, 20%, 20%, … of the data]
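"Chunks: Balanced" and an uneven data split are compatible because the balancer evens out chunk counts, not bytes. A plain-JavaScript sketch with hypothetical chunk sizes chosen so the proportions mirror the slide (in practice sizes would come from `dataSize` or `collStats`).

```javascript
// Compare each shard's share of chunks with its share of data.
// chunks: [{ shard, sizeMB }]; sizes would come from dataSize or collStats in practice.
function distributionReport(chunks) {
  const byShard = {};
  let totalMB = 0;
  for (const c of chunks) {
    const s = (byShard[c.shard] = byShard[c.shard] || { chunks: 0, sizeMB: 0 });
    s.chunks++;
    s.sizeMB += c.sizeMB;
    totalMB += c.sizeMB;
  }
  for (const s of Object.values(byShard)) {
    s.chunkPct = (100 * s.chunks) / chunks.length;
    s.dataPct = (100 * s.sizeMB) / totalMB;
  }
  return byShard;
}

// Two chunks per shard (balanced counts), but s1's chunks are twice as large.
const report = distributionReport([
  { shard: "s1", sizeMB: 40 }, { shard: "s1", sizeMB: 40 },
  { shard: "s2", sizeMB: 20 }, { shard: "s2", sizeMB: 20 },
  { shard: "s3", sizeMB: 20 }, { shard: "s3", sizeMB: 20 },
  { shard: "s4", sizeMB: 20 }, { shard: "s4", sizeMB: 20 },
]);
console.log(report.s1); // { chunks: 2, sizeMB: 80, chunkPct: 25, dataPct: 40 }
```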

  39. Maintenance – Splitting: Five Helpful Resources • collStats • config.chunks • Profiler • Oplog • dataSize

  40. Maintenance – Splitting: Five Helpful Resources • collStats • config.chunks • dataSize • oplog.rs • system.profile
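Combining `config.chunks` with the `dataSize` command gives a list of oversized chunks to split by hand. A minimal sketch: `runOnShard(shardName, command)` is a hypothetical caller-supplied helper (for example one that connects to that shard's primary and runs the command on the collection's database), and the chunk list would come from something like `db.getSiblingDB("config").chunks.find({ns: "foo.users"}).toArray()`. `dataSize` reads the documents in each range, so it can be expensive on a busy cluster.

```javascript
// Flag chunks whose measured size exceeds a threshold, using the dataSize command.
// runOnShard(shardName, command) is a hypothetical helper supplied by the caller.
function findLargeChunks(chunks, ns, keyPattern, runOnShard, thresholdBytes) {
  const large = [];
  for (const chunk of chunks) {
    const res = runOnShard(chunk.shard, {
      dataSize: ns,
      keyPattern: keyPattern,
      min: chunk.min,
      max: chunk.max,
    });
    if (res.ok && res.size > thresholdBytes) {
      large.push({ shard: chunk.shard, min: chunk.min, max: chunk.max,
                   size: res.size, numObjects: res.numObjects });
    }
  }
  return large.sort((a, b) => b.size - a.size); // largest first
}
```

Each flagged range can then be split, for example with `sh.splitAt`, at a point taken from the data or from the profiler and oplog access patterns.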
