
Running MongoDB in Production
Tim Vaillancourt, Sr. Technical Operations Architect, Percona

`whoami` { name: "tim", lastname: "vaillancourt", employer: "percona", techs: [ "mongodb", "mysql", "cassandra", … ] }


  1. Security: External Authentication
  ● LDAP Authentication
    ○ Supported in PSMDB and MongoDB Enterprise
    ○ The following components are necessary for external authentication to work:
      ■ LDAP Server: remotely stores all user credentials (i.e. user name and associated password)
      ■ SASL Daemon: used as a MongoDB server-local proxy for the remote LDAP service
      ■ SASL Library: used by the MongoDB client and server to create authentication mechanism-specific data
    ○ Creating a User:
      db.getSiblingDB("$external").createUser({ user: "christian", roles: [{ role: "read", db: "test" }] });
    ○ Authenticating as a User:
      db.getSiblingDB("$external").auth({ mechanism: "PLAIN", user: "christian", pwd: "secret", digestPassword: false });
    ○ Other auth methods are possible with MongoDB Enterprise

  2. Security: SSL Connections and Auth
  ● SSL / TLS Connections
    ○ Supported since MongoDB 2.6.x
      ■ May need to compile it in yourself on older binaries
      ■ Supported 100% in Percona Server for MongoDB
    ○ Minimum of 128-bit key length for security
    ○ Relaxed and strict (requireSSL) modes
    ○ System (default) or custom Certificate Authorities are accepted
  ● SSL Client Authentication (x509)
    ○ MongoDB supports x.509 certificate authentication for use with a secure TLS/SSL connection as of 2.6.x
    ○ x.509 client authentication allows clients to authenticate to servers with certificates rather than with a username and password
    ○ Enabled with: security.clusterAuthMode: x509
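  A minimal mongod.conf sketch for strict TLS with x.509 cluster auth, assuming hypothetical certificate paths (/etc/ssl/mongodb.pem and /etc/ssl/ca.pem):

      net:
        ssl:
          mode: requireSSL
          PEMKeyFile: /etc/ssl/mongodb.pem
          CAFile: /etc/ssl/ca.pem
      security:
        clusterAuthMode: x509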

  3. Security: Encryption at Rest
  ● MongoDB Enterprise
    ○ Encryption supported in Enterprise binaries ($$$)
  ● Percona Server for MongoDB
    ○ Use a CryptFS/LUKS block device for encryption of the data volume
    ○ Documentation published (or coming soon)
    ○ Completely open-source / free
  ● Application-Level
    ○ Selectively encrypt only the required fields in the application
    ○ Benefits:
      ■ The data is only readable by the application (reduced touch points)
      ■ The resource cost of encryption is lower when it is applied selectively
      ■ Encryption overhead is offloaded from the database

  4. Security: Network Firewall
  ● MongoDB only requires a single TCP port to be reachable (on all nodes)
    ○ Default port 27017
    ○ This does not include monitoring tools, etc
      ■ Percona PMM requires inbound connectivity to 1-2 TCP ports
  ● Restrict TCP port access to the nodes that require it! (see the sketch below)
  ● Sharded Cluster
    ○ Application servers only need access to 'mongos'
    ○ Block direct TCP access from application -> shard/mongod instances
      ■ Unless 'mongos' is bound to localhost!
  ● Advanced
    ○ Move inter-node replication to its own network fabric, VLAN, etc
    ○ Accept client connections on a public interface
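  A minimal iptables sketch of the "restrict port access" advice, assuming a hypothetical application subnet of 10.0.1.0/24:

      # allow the application subnet to reach mongos/mongod, drop everyone else
      iptables -A INPUT -p tcp --dport 27017 -s 10.0.1.0/24 -j ACCEPT
      iptables -A INPUT -p tcp --dport 27017 -j DROP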

  5. More on Security (some overlap)
  Room: Field Suite #2
  Time: Tuesday, 17:25 to 17:50

  6. Monitoring

  7. Monitoring: Methodology
  ● Monitor often
    ○ Polling every 60-300 seconds is not enough!
    ○ Problems can begin/end in seconds
  ● Correlate Database and Operating System metrics together!
  ● Monitor a lot
    ○ Store more than you graph
    ○ Example: PMM gathers 700-900 metrics per polling interval
  ● Process
    ○ Use monitoring to troubleshoot Production events / incidents
    ○ Iterate and improve monitoring
      ■ Add graphing for whatever made you SSH to a host
      ■ Blind QA with someone unfamiliar with the problem

  8. Monitoring: Important Metrics
  ● Database
    ○ Operation counters
    ○ Cache traffic and capacity
    ○ Checkpoint / compaction performance
    ○ Concurrency tickets (WiredTiger and RocksDB)
    ○ Document and index scanning
    ○ Various engine-specific details
  ● Operating System
    ○ CPU
    ○ Disk
      ■ Bandwidth / utilisation
      ■ Average wait time
    ○ Memory and Network

  9. Monitoring: Percona PMM
  ● Open-source monitoring from Percona!
  ● Based on open-source technology:
    ○ Prometheus
    ○ Grafana
    ○ Go language
  ● Simple deployment
  ● Examples in this demo are from PMM!
  ● Correlation of OS and DB metrics
  ● 800+ metrics per polling interval

  10. Architecture and High-Availability

  11. High Availability
  ● Replication
    ○ Asynchronous
      ■ Write Concerns can provide pseudo-synchronous replication
      ■ Changelog-based, using the "Oplog"
    ○ Maximum 50 members
    ○ Maximum 7 voting members
      ■ Use "votes: 0" for members $gt 7 (see the sketch below)
    ○ Oplog
      ■ The "oplog.rs" capped collection in the "local" database, storing changes to data
      ■ Read by secondary members for replication
      ■ Written to by the local node after "apply" of an operation
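  A minimal mongo-shell sketch of demoting a member to a non-voting role, assuming a hypothetical 8-member set where members[7] is the member beyond the 7-vote limit:

      cfg = rs.conf();
      cfg.members[7].votes = 0;     // remove its vote
      cfg.members[7].priority = 0;  // non-voting members must also have priority 0
      rs.reconfig(cfg);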

  12. Architecture
  ● Datacenter Recommendations
    ○ Minimum of 3 x physical servers required for High Availability
    ○ Ensure only 1 x member per Replica Set is on a single physical server!!!
  ● EC2 / Cloud Recommendations
    ○ Place Replica Set members in an odd number of Availability Zones, same region
    ○ Use a hidden secondary node for Backup and Disaster Recovery in another region
    ○ Entire Availability Zones have been lost before!

  13. Hardware

  14. Hardware: Mainframe vs Commodity
  ● Databases: The Past
    ○ Buy some really amazing, expensive hardware
    ○ Buy some crazy expensive license
      ■ Don't run a lot of servers due to the above
    ○ Scale up:
      ■ Buy even more amazing hardware for the monolithic host
      ■ Hardware came on a truck
    ○ HA: when it rains, it pours
  ● Databases: A New Era
    ○ Everything fails, nothing is precious
    ○ Elastic infrastructures ("The cloud", Mesos, etc)
    ○ Scale out: add more cheap, commodity servers
    ○ HA: lots of cheap, commodity servers - still up!

  15. Hardware: Block Devices
  ● Isolation
    ○ Run mongod dbPaths on a separate volume
    ○ Optionally, run the mongod journal on a separate volume
  ● RAID Level
    ○ RAID 10 == the performance/durability sweet spot
    ○ RAID 0 == fast and dangerous
  ● SSDs
    ○ Benefit MMAPv1 a lot
    ○ Benefit WT and RocksDB a bit less
    ○ Keep about 30% free for internal GC on the SSD

  16. Hardware: Block Devices
  ● EBS / NFS / iSCSI Risks / Drawbacks
    ○ Exponentially more things to break
    ○ Block device requests wrapped in TCP are extremely slow
    ○ You probably already paid for some fast local disks
    ○ More difficult (sometimes nearly impossible) to troubleshoot
    ○ MongoDB doesn't really benefit from remote storage features/flexibility:
      ■ Built-in High Availability of data via replication
      ■ MongoDB replication can bootstrap new members
      ■ Strong write concerns can be specified for critical data

  17. Hardware: CPUs
  ● Cores vs Core Speed
    ○ Lots of cores > faster cores (4 CPUs minimum recommended)
    ○ Thread-per-connection model
  ● CPU Frequency Scaling
    ○ 'cpufreq': a daemon for dynamic scaling of the CPU frequency
    ○ A terrible idea for databases or any predictability!
    ○ Disable it, or set the governor to 100% frequency always, i.e. mode: 'performance'
    ○ Disable any BIOS-level performance/efficiency tunable
    ○ ENERGY_PERF_BIAS
      ■ A CentOS/RedHat tuning for the energy vs performance balance
      ■ RHEL 6 = 'performance'
      ■ RHEL 7 = 'normal' (!)
      ■ My advice: use 'tuned' to set it to 'performance' (see the sketch below)
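  A minimal sketch of the 'tuned' advice, assuming RHEL/CentOS 7 with the tuned package installed; the latency-performance profile sets the CPU governor and ENERGY_PERF_BIAS to 'performance':

      tuned-adm active                       # show the current profile
      tuned-adm profile latency-performance  # switch to a performance-biased profile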

  18. Hardware: Network Infrastructure
  ● Datacenter Tiers
    ○ Network Edge
    ○ Public Server VLAN
      ■ Servers with Public NAT and/or port forwards from the Network Edge
      ■ Examples: Proxies, Static Content, etc
      ■ Calls backends in the Backend Server VLAN
    ○ Backend Server VLAN
      ■ Servers with port forwarding from the Public Server VLAN (w/ Source IP ACLs)
      ■ Optional load balancer for stateless backends
      ■ Examples: Webserver, Application Server/Worker, etc
      ■ Calls data stores in the Data VLAN
    ○ Data VLAN
      ■ Servers, filers, etc with port forwarding from the Backend Server VLAN (w/ Source IP ACLs)
      ■ Examples: Databases, Queues, Filers, Caches, HDFS, etc

  19. Hardware: Network Infrastructure
  ● Network Fabric
    ○ Try to use 10GbE for low latency
    ○ Use Jumbo Frames for efficiency
    ○ Try to keep all MongoDB nodes on the same segment
      ■ Goal: few or no network hops between nodes
      ■ Check with 'traceroute'
  ● Outbound / Public Access
    ○ Databases don't need to talk to the internet*
      ■ Store a copy of your Yum, DockerHub, etc repos locally
      ■ Deny any access to the public internet, or have no route to it
      ■ Hackers will try to upload a dump of your data out of the network!!
  ● Cloud?
    ○ Try to replicate the above with the features of your provider

  20. Hardware: Why So Quick?
  ● MongoDB allows you to scale reads and writes with more nodes
    ○ Single-instance performance is important, but not a deal-breaker
  ● You are the most expensive resource!
    ○ Not hardware anymore

  21. Tuning MongoDB

  22. Tuning MongoDB: MMAPv1
  ● A kernel-level function to map file blocks to memory
  ● MMAPv1 syncs data to disk once per 60 seconds (default)
    ○ Override with the --syncDelay <seconds> flag
    ○ If a server with no journal crashes, it can lose 1 min of data!!!
  ● In-memory buffering of the Journal
    ○ Synced every 30ms if the 'journal' is on a different disk
    ○ Or every 100ms otherwise
    ○ Or 1/3rd of the above if a change uses the Journaled write concern (explained later)

  23. Tuning MongoDB: MMAPv1
  ● Fragmentation
    ○ Can cause serious slowdowns on scans, range queries, etc
    ○ db.<collection>.stats()
      ■ Shows various storage info for a collection
      ■ Fragmentation can be computed by dividing 'storageSize' by 'size'
      ■ Any value > 1 indicates fragmentation
    ○ Compact when you near a value of 2, by rebuilding secondaries or using the 'compact' command
    ○ WiredTiger and RocksDB have little to no fragmentation due to checkpoints / compaction
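  A minimal mongo-shell sketch of the fragmentation calculation above; 'items' is a hypothetical collection name:

      var s = db.items.stats();
      var ratio = s.storageSize / s.size;  // > 1 indicates fragmentation, ~2 => time to compact
      print("fragmentation ratio: " + ratio.toFixed(2));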

  24. Tuning MongoDB: WiredTiger
  ● WT syncs data to disk in a process called "Checkpointing":
    ○ Every 60 seconds, or after >= 2GB of data changes
  ● In-memory buffering of the Journal
    ○ Journal buffer size 128kb
    ○ Synced every 50ms (as of 3.2)
    ○ Or on every change with the Journaled write concern (explained later)
    ○ While journal records remain in the buffer between write operations, updates can be lost following a hard shutdown!

  25. Tuning MongoDB: RocksDB
  ● Level-based strategy using immutable data level files
    ○ Built-in compression
    ○ Block and filesystem caches
  ● RocksDB uses "compaction" to apply changes to data files
    ○ Tiered level compaction
    ○ Follows the same logic as MMAPv1 for journal buffering
  ● MongoRocks
    ○ A layer between RocksDB and MongoDB's storage engine API
    ○ Developed in partnership with Facebook

  26. Tuning MongoDB: Storage Engine Caches
  ● WiredTiger
    ○ In-heap
      ■ 50% of available system memory
      ■ Uncompressed WT pages
    ○ Filesystem Cache
      ■ 50% of available system memory
      ■ Compressed pages
  ● RocksDB
    ○ Internal testing planned by Percona in the future
    ○ 30% in-heap cache recommended by Facebook / Parse
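  A minimal mongod.conf sketch for overriding the WiredTiger in-heap cache size; 4GB is a hypothetical value for illustration:

      storage:
        wiredTiger:
          engineConfig:
            cacheSizeGB: 4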

  27. Tuning MongoDB: Durability
  ● storage.journal.enabled = <true/false>
    ○ Default since 2.0 on 64-bit builds
    ○ Always enable unless data is transient
    ○ Always enable on cluster config servers
  ● storage.journal.commitIntervalMs = <ms>
    ○ Max time between journal syncs
  ● storage.syncPeriodSecs = <secs>
    ○ Max time between data file flushes
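  The three settings collected into one mongod.conf sketch; the values shown are the usual server defaults, as an assumption for illustration:

      storage:
        journal:
          enabled: true
          commitIntervalMs: 100
        syncPeriodSecs: 60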

  28. Tuning MongoDB: Don't Enable!
  ● "cpu"
    ○ External monitoring is recommended
  ● "rest"
    ○ Will be deprecated in 3.6+
  ● "smallfiles"
    ○ In most situations this is not necessary, unless:
      ■ You use MMAPv1, and
      ■ It is a Development / Test environment, or
      ■ You have 100s-1000s of databases with very little data inside (unlikely)
  ● Profiling mode '2'
    ○ Unless troubleshooting an issue / intentional

  29. Tuning Linux

  30. Tuning Linux: The Linux Kernel
  ● Linux 2.6.x?
  ● Avoid Linux earlier than 3.10.x - 3.12.x
  ● Large improvements in parallel efficiency in 3.10+ (for free!)
  ● More: https://blog.2ndquadrant.com/postgresql-vs-kernel-versions/

  31. Tuning Linux: NUMA
  ● A memory architecture that takes into account the locality of memory, caches and CPUs, for lower latency
    ○ But no databases want to use it :(
  ● The MongoDB codebase is not NUMA "aware", causing unbalanced memory allocations on NUMA systems
  ● Disable NUMA:
    ○ In the server BIOS, or
    ○ Using 'numactl' in init scripts BEFORE the 'mongod' command (recommended for future compatibility):
      numactl --interleave=all /usr/bin/mongod <other flags>

  32. Tuning Linux: Transparent HugePages
  ● Introduced in RHEL/CentOS 6, Linux 2.6.38+
  ● Merges memory pages in the background (the khugepaged process)
  ● Decreases overall performance when used with MongoDB!
  ● "AnonHugePages" in /proc/meminfo shows usage
  ● Disable Transparent HugePages!
    ○ Add "transparent_hugepage=never" to the kernel command line (GRUB)
    ○ Reboot the system
      ■ Disabling online does not clear previously-merged pages
      ■ Rebooting tests that your system will come back up!
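  A quick sketch for checking the THP state before and after the change:

      cat /sys/kernel/mm/transparent_hugepage/enabled  # [never] should be selected
      grep AnonHugePages /proc/meminfo                 # usage should be 0 kB after a reboot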

  33. Tuning Linux: Time Source
  ● Replication and Clustering need consistent clocks
    ○ mongodb_consistent_backup relies on time sync, for example!
  ● Use a consistent time source/server
    ○ "It's ok if everyone is equally wrong"
  ● Non-Virtualised
    ○ Run an NTP daemon on all MongoDB and monitoring hosts
    ○ Enable the service so it starts on reboot
  ● Virtualised
    ○ Check if your VM platform has an "agent" syncing time
    ○ VMWare and Xen are known to have their own time sync
    ○ If no time sync is provided, install an NTP daemon
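  A minimal sketch for RHEL/CentOS 7, assuming the 'ntp' package is the chosen daemon:

      yum install -y ntp
      systemctl enable ntpd  # start on reboot
      systemctl start ntpd
      ntpq -p                # verify peers are reachable and syncing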

  34. Tuning Linux: I/O Scheduler
  ● The algorithm the kernel uses to commit reads and writes to disk
  ● CFQ: "Completely Fair Queue"
    ○ Default scheduler in 2.6-era Linux distributions
    ○ Perhaps too clever/inefficient for database workloads
    ○ Probably good for a laptop
  ● Deadline
    ○ Best general default, IMHO
    ○ Predictable I/O request latencies
  ● Noop
    ○ Use with virtualised servers
    ○ Use with real-hardware BBU RAID controllers

  35. Tuning Linux: Filesystems
  ● Filesystem Types
    ○ Use XFS or EXT4, not EXT3
      ■ EXT3 has very poor pre-allocation performance
      ■ Use only XFS with WiredTiger
      ■ EXT4 "data=ordered" mode recommended
    ○ Btrfs not tested, yet!
  ● Filesystem Options
    ○ Set 'noatime' on MongoDB data volumes in '/etc/fstab' (example below)
    ○ Remount the filesystem after an options change, or reboot
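  A hypothetical /etc/fstab line for the 'noatime' advice, assuming an XFS data volume on /dev/sdb1 mounted at /var/lib/mongodb:

      /dev/sdb1  /var/lib/mongodb  xfs  defaults,noatime  0 0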

  36. Tuning Linux: Block Device Readahead
  ● A tuning that causes data ahead of a block on disk to be read and then cached
  ● Assumption: there is a sequential read pattern
    ○ Something will benefit from the extra cached blocks
  ● Risk
    ○ Too high a value wastes cache space
    ○ Increases eviction work
    ○ MongoDB tends to have very random disk patterns
  ● A good start for MongoDB volumes is a '32' (16kb) read-ahead
    ○ Let MongoDB worry about optimising the pattern

  37. Tuning Linux: Block Device Readahead
  ● Change Readahead
    ○ Add a file to '/etc/udev/rules.d'
    ○ /etc/udev/rules.d/60-mongodb-disk.rules:
      # set deadline scheduler and 32/16kb read-ahead for /dev/sda
      ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16"
    ○ Reboot (or use CLI tools to apply)
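  A sketch of applying the same settings online with CLI tools (blockdev readahead is in 512-byte sectors, so 32 == 16kb):

      blockdev --setra 32 /dev/sda                    # set readahead to 32 sectors (16kb)
      echo deadline > /sys/block/sda/queue/scheduler  # set the I/O scheduler
      blockdev --getra /dev/sda                       # verify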

  38. Tuning Linux: Virtual Memory Dirty Pages
  ● Dirty Pages
    ○ Pages stored in-cache, but needing to be written to storage
  ● Dirty Ratio
    ○ Max percent of total memory that can be dirty
    ○ The VM stalls and flushes when this limit is reached
    ○ Start with '10'; the default (30) is too high
  ● Dirty Background Ratio
    ○ Separate threshold for background dirty page flushing
    ○ Flushes without pauses
    ○ Start with '3'; the default (15) is too high
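  The suggested starting points as sysctl settings, e.g. in /etc/sysctl.conf:

      vm.dirty_ratio = 10
      vm.dirty_background_ratio = 3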

  39. Tuning Linux: Swappiness
  ● A Linux kernel sysctl setting for preferring RAM or disk for swap
    ○ Linux default: 60
    ○ To avoid disk-based swap: 1 (not zero!)
    ○ To allow some disk-based swap: 10
    ○ '0' can cause more swapping than '1' on recent kernels
      ■ More on this here: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
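  A sketch of setting it at runtime and persisting it across reboots:

      sysctl -w vm.swappiness=1
      echo 'vm.swappiness = 1' >> /etc/sysctl.conf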

  40. Tuning Linux: Ulimit
  ● Allows per-Linux-user resource constraints:
    ○ Number of user-level processes
    ○ Number of open files
    ○ CPU seconds
    ○ Scheduling priority
    ○ And others…
  ● MongoDB
    ○ Should probably have a dedicated VM, container or server
    ○ Creates a new process:
      ■ For every new connection to the database
      ■ Plus various background tasks / threads
    ○ Creates an open file for each active data file on disk
    ○ 64,000 open files and 64,000 max processes is a good start

  41. Tuning Linux: Ulimit
  ● Setting ulimits
    ○ /etc/security/limits.d file
    ○ Systemd service
    ○ Init script
  ● Ulimits are set by the Percona and MongoDB packages!
    ○ Example: the PSMDB RPM (Systemd), sketched below
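  A hypothetical systemd drop-in matching the "good start" values from the previous slide (e.g. /etc/systemd/system/mongod.service.d/limits.conf); run 'systemctl daemon-reload' and restart mongod to apply:

      [Service]
      LimitNOFILE=64000
      LimitNPROC=64000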

  42. Tuning Linux: Network Stack
  ● Defaults are not good for > 100mbps Ethernet
  ● Set Network Tunings (a suggested starting point is sketched below):
    ○ Add the sysctl tunings to /etc/sysctl.conf
    ○ Run "/sbin/sysctl -p" as root to set the tunings
    ○ Run "/sbin/sysctl -a" to verify the changes
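  A hypothetical starting point for /etc/sysctl.conf; the values are illustrative assumptions, tune them for your environment:

      net.core.somaxconn = 4096
      net.core.netdev_max_backlog = 4096
      net.core.rmem_max = 16777216
      net.core.wmem_max = 16777216
      net.ipv4.tcp_rmem = 4096 87380 16777216
      net.ipv4.tcp_wmem = 4096 65536 16777216
      net.ipv4.tcp_fin_timeout = 30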

  43. Tuning Linux: More on this...
  https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/

  44. Tuning Linux: "Tuned"
  ● Tuned
    ○ A "framework" for applying tunings to Linux
    ○ RedHat/CentOS 7 only, for now
      ■ Debian added tuned; not sure if compatible yet
    ○ Cannot tune NUMA, filesystem type or fs mount opts
    ○ Can tune sysctls, THP, I/O scheduler, etc
    ○ My apology to the community for writing "Tuning Linux for MongoDB":
      ■ https://github.com/Percona-Lab/tuned-percona-mongodb

  45. Troubleshooting “The problem with troubleshooting is trouble shoots back” ~ Unknown

  46. Troubleshooting: Usual Suspects
  ● Locking
    ○ Collection-level locks
    ○ Document-level locks
    ○ Software mutexes/semaphores
  ● Limits
    ○ Max connections
    ○ Operation rate limits
    ○ Resource limits
  ● Resources
    ○ Lack of IOPS, RAM, CPU, network, etc

  47. Troubleshooting: MongoDB Resources
  ● Memory
  ● CPU
    ○ System CPU
      ■ FS cache
      ■ Networking
      ■ Disk I/O
      ■ Threading
    ○ User CPU (MongoDB)
      ■ Compression (WiredTiger and RocksDB)
      ■ Session Management
      ■ BSON (de)serialisation
      ■ Filtering / scanning / sorting

  48. Troubleshooting: MongoDB Resources
  ● User CPU (MongoDB)
    ○ Optimiser
  ● Disk
    ○ Data file read/writes
    ○ Journaling
    ○ Error logging
  ● Network
    ○ Query request/response
    ○ Replication
  ● Disk I/O
    ○ Journaling
    ○ Oplog reads / writes
    ○ Background flushing / compactions / etc

  49. Troubleshooting: MongoDB Resources
  ● Disk I/O
    ○ Page faults (data not in cache)
    ○ Swapping
  ● Network
    ○ Client API
    ○ Replication
    ○ Sharding
      ■ Chunk moves
      ■ Mongos -> Shards

  50. Troubleshooting: db.currentOp()
  ● A function that dumps status info about running operations and various lock/execution details
  ● Only queries currently in progress are shown
  ● The provided query ID number can be used to kill long-running queries
  ● Includes:
    ○ Original query
    ○ Parsed query
    ○ Query runtime
    ○ Locking details
  ● Filter documents
    ○ { "$ownOps": true } == only show operations for the current user
    ○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
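  A minimal mongo-shell sketch: find this user's operations running longer than 5 seconds, then kill one by its opid (12345 is a hypothetical opid taken from the output):

      db.currentOp({ "$ownOps": true, "secs_running": { $gt: 5 } })
      db.killOp(12345)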

  51. Troubleshooting: db.stats()
  ● Returns:
    ○ Document-data size (dataSize)
    ○ Index-data size (indexSize)
    ○ Real storage size (storageSize)
    ○ Average object size
    ○ Number of indexes
    ○ Number of objects

  52. Troubleshooting: db.currentOp() (screenshot of example output)

  53. Troubleshooting: Log File
  ● Interesting details are logged to the mongod/mongos log files:
    ○ Slow queries
    ○ Storage engine details (sometimes)
    ○ Index operations
    ○ Sharding
      ■ Chunk moves
    ○ Elections / Replication
    ○ Authentication
    ○ Network
      ■ Connections
      ■ Errors
      ■ Client / inter-node connections

  54. Troubleshooting: Log File - Slow Query

      2017-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined: 1 docsExamined: 1 nMatched: 1 nModified: 1 keysInserted: 1 keysDeleted: 1 numYields: 0 reslen: 604 locks: { Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol: op_command 106ms

  55. Troubleshooting: Operation Profiler
  ● Writes slow database operations to a new MongoDB collection for analysis
    ○ Capped collection "system.profile" in each database, default 1mb
    ○ The collection is capped, i.e. profile data doesn't last forever
  ● Support for operationProfiling data in Percona Monitoring and Management is among current future goals
  ● Enable operationProfiling in "slowOp" mode
    ○ Start with a very high threshold and decrease it in steps
    ○ Usually 50-100ms is a good threshold
    ○ Enable in mongod.conf:
      operationProfiling:
        mode: slowOp
        slowOpThresholdMs: 100

  56. Troubleshooting: Operation Profiler
  ● Useful profile metrics:
    ○ op/ns/query: type, namespace and query of a profile
    ○ keysExamined: # of index keys examined
    ○ docsExamined: # of docs examined to achieve the result
    ○ writeConflicts: # of write conflict exceptions encountered during update
    ○ numYields: # of times the operation yielded for others
    ○ locks: detailed lock statistics

  57. Troubleshooting: .explain()
  ● Shows the query explain plan for query cursors
  ● This will include:
    ○ Winning Plan
      ■ Query stages
        ● Query stages may include sharding info in clusters
      ■ Index chosen by the optimiser
    ○ Rejected Plans
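  A minimal mongo-shell sketch; 'items' and the query are hypothetical. The "executionStats" verbosity also reports the keysExamined/docsExamined counts used for the efficiency ratios later in this deck:

      db.items.find({ itemId: 123456 }).explain("executionStats")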

  58. Troubleshooting: .explain() and Profiler (screenshot of example output)

  59. Troubleshooting: Cluster Metadata
  ● The "config" database on Cluster Config servers
    ○ Use .find() queries to view Cluster Metadata
  ● Contains:
    ○ actionlog (3.0+)
    ○ changelog
    ○ databases
    ○ collections
    ○ shards
    ○ chunks
    ○ settings
    ○ mongos
    ○ locks
    ○ lockpings
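  A minimal mongo-shell sketch of browsing the metadata; "mydb.items" is a hypothetical sharded namespace:

      use config
      db.shards.find()                              // list shards
      db.chunks.find({ ns: "mydb.items" }).count()  // chunk count for one namespace
      db.settings.find()                            // balancer settings, chunk size, etc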

  60. Troubleshooting: Percona PMM QAN
  ● The Query Analytics tool enables DBAs and developers to analyse queries over periods of time and find performance problems
  ● Helps you optimise database performance by making sure that queries are executed as expected and within the shortest time possible
  ● Central, web-based location for visualising data
  ● Data is collected from the MongoDB Profiler (required) by the PMM agent
  ● Great for reducing access to systems while providing valuable data to development teams!
  ● Query Normalisation
    ○ i.e.: "{ item: 123456 }" -> "{ item: ##### }"
  ● Command-line equivalent: the pt-mongodb-query-digest tool

  61. Troubleshooting: Percona PMM QAN (screenshot)

  62. Troubleshooting: mlogfilter
  ● A useful tool for processing mongod.log files
  ● A log-aware replacement for 'grep', 'awk' and friends
  ● Generally focus on:
    ○ mlogfilter --scan <file>
      ■ Shows all collection scan queries
    ○ mlogfilter --slow <ms> <file>
      ■ Shows all queries that are slower than X milliseconds
    ○ mlogfilter --op <op-type> <file>
      ■ Shows all queries of operation type X (eg: find, aggregate, etc)
  ● More on this tool here: https://github.com/rueckstiess/mtools/wiki/mlogfilter

  63. Troubleshooting: Common Problems
  ● Sharding
    ○ removeShard doesn't complete
      ■ Check the 'dbsToMove' array of the removeShard response:
        mongos> db.adminCommand({removeShard: "test2"})
        {
          "msg" : "draining started successfully",
          "state" : "started",
          "shard" : "test2",
          "note" : "you need to drop or movePrimary these databases",
          "dbsToMove" : [ "wikipedia" ],
          "ok" : 1
        }
      ■ Why?
        mongos> use config
        switched to db config
        mongos> db.databases.find()
        { "_id" : "wikipedia", "primary" : "test2", "partitioned" : true }

  64. Troubleshooting: Common Problems
  ● Sharding
    ○ removeShard doesn't complete
      ■ Try:
        ● Use movePrimary to move the Primary role for the database(s) to other shards
        ● Run the removeShard command once the shard being removed is NOT primary for any database
          ○ This starts the draining of the shard
        ● Run the same removeShard command to check on progress
          ○ If the draining and removal are complete, this will respond with success
    ○ Jumbo Chunks
      ■ Will prevent balancing from occurring
      ■ The config.chunks collection document will contain jumbo: true as a key/value pair
      ■ Sharding 'split' commands can be used to reduce the chunk size (sh.splitAt, etc)
      ■ https://www.percona.com/blog/2016/04/11/dealing-with-jumbo-chunks-in-mongodb/
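  A minimal mongos sketch of the removeShard sequence above, continuing the "wikipedia" example; "test1" is a hypothetical target shard:

      db.adminCommand({ movePrimary: "wikipedia", to: "test1" })  // move the primary role off the draining shard
      db.adminCommand({ removeShard: "test2" })                   // re-run to start/continue draining
      db.adminCommand({ removeShard: "test2" })                   // run again later to check progress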

  65. Schema Design & Workflow

  66. Schema Design: Data Types
  ● Strings
    ○ Only use strings if required
    ○ Do not store numbers as strings!
    ○ Look for { field: "123456" } instead of { field: 123456 }
      ■ "12345678" moved to an integer uses 25% less space
      ■ Range queries on proper integers are more efficient
    ○ Example JavaScript to convert a field in an entire collection:
      db.items.find().forEach(function(x) {
        var newItemId = parseInt(x.itemId);
        db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
      });

  67. Schema Design: Data Types
  ● Strings
    ○ Do not store dates as strings!
      ■ The field "2017-08-17 10:00:04 CEST" stores in 52.5% less space as a date!
    ○ Do not store booleans as strings!
      ■ "true" -> true = 47% less space wasted
  ● DBRefs
    ○ DBRefs provide pointers to another document
    ○ DBRefs can be cross-collection
  ● NumberDecimal (3.4+)
    ○ Higher-precision (Decimal128) type for floating-point numbers

  68. Schema Design: Indexes
  ● MongoDB supports BTree, text and geo indexes
  ● Default behaviour: collection lock until indexing completes
  ● {background: true}
    ○ Runs indexing in the background, avoiding pauses
    ○ Hard to monitor and troubleshoot progress
    ○ Unpredictable performance impact
  ● Avoid drivers that auto-create indexes
    ○ Use real performance data to make indexing decisions; find out before Production!
  ● Too many indexes hurts write performance for an entire collection
  ● Indexes have a forward or backward direction
    ○ Try to cover .sort() with an index and match its direction! (see the sketch below)
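  A minimal mongo-shell sketch: a background index build whose direction matches a descending .sort() on 'date'; 'items' and the fields are hypothetical:

      db.items.createIndex({ username: 1, date: -1 }, { background: true })
      db.items.find({ username: "christian" }).sort({ date: -1 })  // sort covered by the index direction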

  69. Schema Design: Indexes
  ● Compound Indexes
    ○ Several fields supported
    ○ Fields can be in forward or backward direction
      ■ Consider any .sort() query options and match the sort direction!
    ○ Composite keys are read Left -> Right
      ■ The index can be partially read
      ■ Left-most fields do not need to be duplicated!
      ■ All indexes below are duplicates of the first:
        ● { username: 1, status: 1, date: 1, count: -1 }
        ● { username: 1, status: 1, date: 1 }
        ● { username: 1, status: 1 }
        ● { username: 1 }
  ● Use db.collection.getIndexes() to view current indexes

  70. Schema Design: Query Efficiency
  ● Query Efficiency Ratios
    ○ Index: keysExamined / nreturned
    ○ Document: docsExamined / nreturned
  ● End goal: examine only as many index keys/docs as you return!
    ○ A ratio of 1 is ideal; e.g. a query examining 10 documents to return 1 has a document ratio of 10
    ○ Tip: when using covered indexes, zero documents are fetched (docsExamined: 0)!
    ○ Scanning zero docs is possible if using a covered index!
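  A minimal mongo-shell sketch computing the ratios from explain output; 'items' and the query are hypothetical:

      var e = db.items.find({ itemId: 123456 }).explain("executionStats").executionStats;
      print("index ratio:    " + e.totalKeysExamined / e.nReturned);  // ideal: 1
      print("document ratio: " + e.totalDocsExamined / e.nReturned);  // ideal: 1 (0 examined if covered)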

  71. Schema Workflow
  ● MongoDB is optimised for single-document operations
  ● Single Document / Centralised
    ○ Great cache/disk-footprint efficiency
    ○ Centralised schemas may create a hotspot for write locking
  ● Multi Document / Decentralised
    ○ MongoDB rarely stores data sequentially on disk
    ○ Multi-document operations are less efficient
    ○ Less potential for hotspots/write locking
    ○ Increased overhead due to fan-out of updates
    ○ Example: social media status update, graph relationships, etc
    ○ More on this later..

  72. Schema Workflow
  ● Read-Heavy Workflow
    ○ Read-heavy apps benefit from pre-computed results
    ○ Consider moving expensive read computation to insert/update/delete time
    ○ Example 1: an app that does 'count' queries often
      ■ Move the .count() read query to a summary document with counters
      ■ Increment/decrement a single count value at write time (see the sketch below)
    ○ Example 2: an app that does groupings of data
      ■ Move the .aggregate() read query that is in-line to the user to a backend summary worker
      ■ Read from a summary collection, like a view
  ● Write-Heavy Workflow
    ○ Reduce indexing as much as possible
    ○ Consider batching, or a decentralised model with lazy updating (eg: social media graph)
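  A minimal mongo-shell sketch of Example 1; 'items' and 'summaries' are hypothetical collections:

      // at write time: insert the item and bump its category counter in one summary doc
      db.items.insert({ itemId: 123456, category: "books" });
      db.summaries.update({ _id: "books" }, { $inc: { itemCount: 1 } }, { upsert: true });
      // at read time: a single-document fetch replaces a collection-wide .count()
      db.summaries.findOne({ _id: "books" }).itemCount;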

  73. Schema Workflow
  ● Batching Inserts/Updates
    ○ Requires fewer network commands
    ○ Allows the server to do some internal batching
    ○ Operations will be slower overall
    ○ Suited to queue-worker scenarios batching many changes
    ○ Traditional user-facing database traffic should aim to operate on a single (or few) document(s)
  ● Thread-per-connection model
    ○ 1 x DB operation = 1 x CPU core only
    ○ Executing Parallel Reads
      ■ Large batch queries benefit from several parallel sessions
      ■ Break the query range or conditions into several client->server threads
      ■ Not recommended for Primary nodes, or Secondaries with heavy reads
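  A minimal mongo-shell sketch of batching inserts into a single command (insertMany is available in 3.2+; 'items' is a hypothetical collection):

      db.items.insertMany([
        { itemId: 1 },
        { itemId: 2 },
        { itemId: 3 }
      ]);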

  74. Schema Workflow
  ● No list of fields specified in .find()
    ○ MongoDB returns entire documents unless fields are specified
    ○ Only return the fields required for an application operation! (see the sketch below)
    ○ Covered-index operations require that only the index fields be specified
  ● Using $where operators
    ○ This executes JavaScript with a global lock
  ● Many $and or $or conditions
    ○ MongoDB (or any RDBMS) doesn't handle large lists of $and or $or efficiently
    ○ Try to avoid this sort of model with:
      ■ Data locality
      ■ Background summaries / views
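  A minimal mongo-shell sketch of a projection; 'items' and its fields are hypothetical:

      // return only itemId and price, and suppress _id
      db.items.find({ itemId: 123456 }, { itemId: 1, price: 1, _id: 0 })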

  75. Fan-Out / Fan-In
  ● Fan-Out Systems
    ○ Decentralised
    ○ Data is eventually written in many locations
    ○ Complex write path (several updates)
      ■ Good use case for a Queue/Worker model
      ■ Batching possible
    ○ Simple read path (data locality)
  ● Fan-In
    ○ Centralised
    ○ Simple write path
      ■ Possible write locking
    ○ Complex read path
      ■ Potential for latency due to network

  76. Data Integrity

  77. Data Integrity: `whoami` (continued)
  ● Very paranoid
  ● Previous RDBMS work:
    ○ Online Marketing / Publishing
      ■ Paid for clicks coming in
      ■ Downtime = revenue + (paid-for) traffic loss
    ○ Warehousing / Pricing SaaS
      ■ Stores real items in warehouses/stores/etc
      ■ Downtime = many businesses (customers)/warehouses/etc at a stand-still
      ■ Integrity problems =
        ● Orders shipped but not paid for
        ● Orders paid for but not shipped, etc
    ○ Moved on to Gaming, Percona
  ● So why MongoDB?

  78. Data Integrity: Storage and Journaling
  ● The Journal provides durability in the event of a failure of the server
  ● Changes are written ahead to the journal for each write operation
  ● On crash recovery, the server:
    ○ Finds the last point of consistency to disk
    ○ Searches the journal file(s) for the record matching the checkpoint
    ○ Applies all changes in the journal since the last point of consistency
  ● Journal data is stored in the 'journal' subdirectory of the server data path (dbPath)
  ● Dedicated disks for data (random I/O) and journal (sequential I/O) improve performance

  79. Data Integrity: Write Concern
  ● MongoDB Replication is asynchronous
  ● Write Concerns
    ○ Allow control of the data integrity of a write to a Replica Set
    ○ Write Concern Modes:
      ■ "w: <num>" - writes must be acknowledged by the defined number of nodes
      ■ "majority" - writes must be acknowledged by a majority of nodes
      ■ "<replica set tag>" - writes must be acknowledged by a member with the specified replica set tags
    ○ Durability
      ■ By default, write concerns are NOT durable
      ■ "j: true" - optionally, wait for node(s) to acknowledge journaling of the operation
      ■ In 3.4+, "writeConcernMajorityJournalDefault" allows enforcement of "j: true" via the replica set configuration!
        ● Must specify "j: false" or alter "writeConcernMajorityJournalDefault" to disable
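  A minimal mongo-shell sketch: an insert that waits for majority acknowledgement and journaling, with a hypothetical 5-second timeout; 'items' is a hypothetical collection:

      db.items.insert(
        { itemId: 123456 },
        { writeConcern: { w: "majority", j: true, wtimeout: 5000 } }
      );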
