Deploying MongoDB in Production
Monday, November 5, 2018 9:00 AM - 12:00 PM Bull
Agenda
○ Hardware and OS configuration
○ MongoDB in Production
○ Backups and Monitoring
○ Q&A

Terminology: Data
○ Document: a single *SON object, often nested
○ Field: a single field in a document
○ Collection: a grouping of documents
○ Database: a grouping of collections
○ Capped Collection: a fixed-size FIFO collection
Terminology: Replication
○ Oplog: a special capped collection for replication
○ Primary: a replica set node that can receive writes
○ Secondary: a read-only replica of the Primary
○ Voting: the process used to elect a Primary node
○ Hidden-Secondary: a replica that cannot become Primary
Magnetic disks vs SSD/NVMe
RAID levels compared:
○ Performance: High / Redundancy: None / Overhead (parity): None
○ Performance: Low / Redundancy: Yes / Overhead (parity): Yes
○ Performance: High / Redundancy: None / Overhead (parity): Yes
○ Performance: High / Redundancy: Yes / Overhead (parity): Yes
Source: http://www.icc-usa.com/raid-calculator.html
Network-attached storage is generally slower than local/fiber-connected disks
Linux I/O schedulers: NOOP, DEADLINE, CFQ
○ Use XFS or EXT4
  ■ Use XFS only with WiredTiger
  ■ EXT4 “data=ordered” mode recommended
○ Btrfs not tested, yet!
○ Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’
○ Remount the filesystem after an options change, or reboot
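A minimal ‘/etc/fstab’ entry with ‘noatime’ might look like the following sketch (the device, mount point and filesystem type here are hypothetical examples):

```
# hypothetical MongoDB data volume - adjust device, mount point and fs type
/dev/sdb1  /var/lib/mongo  xfs  defaults,noatime  0  0
```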
/etc/udev/rules.d/60-mongodb-disk.rules:
# set deadline scheduler and 32-sector/16KB read-ahead for /dev/sda
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16"
Check or change read-ahead at runtime with ‘sudo blockdev --getra <device>’ and ‘sudo blockdev --setra <sectors> <device>’ (values are in 512-byte sectors, so 32 sectors = 16KB)
○ Faster cores don't necessarily mean a faster database.
○ Almost all databases take advantage of multiple cores for good performance.
Interleave memory allocations on NUMA systems:
○ In the server BIOS
○ Using ‘numactl’ in init scripts BEFORE the ‘mongod’ command (recommended for future compatibility):
numactl --interleave=all /usr/bin/mongod <other flags>
More: https://blog.2ndquadrant.com/postgresql-vs-kernel-versions/
○ Number of user-level processes
○ Number of open files
○ CPU seconds
○ Scheduling priority

○ MongoDB should probably have a dedicated VM, container or server
○ Creates a new process
○ Creates an open file for each active data file on disk
vm.swappiness:
○ Linux default: 60
○ To avoid disk-based swap: 1 (not zero!)
○ To allow some disk-based swap: 10
○ ‘0’ can cause more swapping than ‘1’ on recent kernels
More on this here: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
○ Set limits via a /etc/security/limits.d file, a systemd service, or an init script
○ Example: the PSMDB RPM (systemd)
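As a sketch, a limits.d drop-in could look like this (the file name and the 64000 values are illustrative, mirroring common MongoDB guidance; tune them to your workload):

```
# /etc/security/limits.d/99-mongodb.conf - example values only
mongod  soft  nofile  64000
mongod  hard  nofile  64000
mongod  soft  nproc   64000
mongod  hard  nproc   64000
```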
○ mongodb_consistent_backup relies on time sync, for example!
○ “It’s ok if everyone is equally wrong”
○ Run an NTP daemon on all MongoDB and monitoring hosts
○ Enable the service so it starts on reboot
○ Check whether your VM platform has an “agent” syncing time
○ VMware and Xen are known to have their own time sync
○ If no time sync is provided, install an NTP daemon
○ Add the sysctl tunings to /etc/sysctl.conf
○ Run “/sbin/sysctl -p” as root to apply them
○ Run “/sbin/sysctl -a” to verify the changes
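The tunings themselves were shown on a slide that is not reproduced here; as a hedged example, a ‘/etc/sysctl.conf’ fragment covering the swappiness advice above might read:

```
# example values only - see the swappiness discussion above
vm.swappiness = 1          # avoid disk-based swap (not zero!)
vm.zone_reclaim_mode = 0   # commonly recommended on NUMA database hosts
```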
○ Network Edge
○ Public Server VLAN
○ Backend Server VLAN
○ Data VLAN
○ Try to use 10GbE for low latency
○ Use jumbo frames for efficiency
○ Try to keep all MongoDB nodes on the same segment
○ Databases don’t need to talk to the internet*
○ In the cloud, try to replicate the above with your provider’s features
https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/
○ Every 60 seconds or >= 2GB of data changes
○ Journal buffer size: 128KB
○ Synced every 50 ms (as of 3.2)
○ Or on every change with the Journaled write concern
○ Between write operations, while journal records remain in the buffer, updates can be lost following a hard shutdown!
○ In heap: ~50% of available system memory, uncompressed WT pages
○ Filesystem cache: ~50% of available system memory, compressed pages
○ Journal: always enable unless data is transient - default true
○ Always enable the journal on cluster config servers
○ Max time between journal syncs - default 100ms
○ Max time between data file flushes - default 60 seconds
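In mongod.conf these settings live under the ‘storage’ section; a sketch with the default values described above (option names per the MongoDB configuration reference):

```yaml
storage:
  syncPeriodSecs: 60        # max time between data file flushes
  journal:
    enabled: true           # always enable unless data is transient
    commitIntervalMs: 100   # max time between journal syncs
```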
“Think of the network like a public place” ~ Unknown
○ Run as the ‘mongod’ or ‘mongodb’ user on most systems
○ Ensure the data path, log file and key file(s) are owned by this user+group
○ Data path mode: 0750
○ Log file mode: 0640 - it contains real queries and their fields!
○ Key files (keyFile and SSL certificates or keys) mode: 0600
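The modes above can be sketched as follows, demonstrated on throwaway paths created with ‘mktemp’ (a real deployment would target the actual data path, log file and keyFile, and ‘chown’ them to the mongod user):

```shell
# stand-in paths - substitute your real data path, log file and key file
demo=$(mktemp -d)
touch "$demo/mongod.log" "$demo/keyfile"
chmod 0750 "$demo"             # data path: 0750
chmod 0640 "$demo/mongod.log"  # log file: 0640 (contains real queries!)
chmod 0600 "$demo/keyfile"     # keyFile / SSL keys: 0600
stat -c '%a' "$demo" "$demo/mongod.log" "$demo/keyfile"
```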
○ Mainly port 27017
○ Supported in PSMDB and MongoDB Enterprise
3.6.8-20 does have encryption at rest using a keyfile (BETA)
○ Asynchronous
  ■ Write Concerns can provide pseudo-synchronous replication
  ■ Changelog-based, using the “Oplog”
○ Maximum of 50 members
○ Maximum of 7 voting members
  ■ Use “votes: 0” for members beyond the 7th
○ The “oplog.rs” capped collection in the “local” database, storing changes to data
○ Read by secondary members for replication
○ Written to by the local node after “apply” of each operation
○ Events in the oplog are idempotent
○ Each event in the oplog represents a single document inserted, updated or deleted
○ The oplog has a default size depending on the OS and the storage engine
  ■ From 3.6 the size can be changed at runtime using the replSetResizeOplog admin command
Reads can be configured to run on secondaries while the primary is offline
○ Minimum of 3 physical servers required for High Availability
○ Ensure only 1 member per Replica Set is on a single physical server!!!
○ Place Replica Set members in an odd number of Availability Zones, in the same region
○ Use a hidden-secondary node for Backup and Disaster Recovery in another region
○ Entire Availability Zones have been lost before!
○ Reflects an earlier state of the dataset
○ Useful for recovering from unsuccessful application upgrades and operator errors, and for backups
○ Must be priority = 0: cannot be elected as Primary, but votes during elections
MongoDB HA, what can go wrong? Igor Donchovski Wed 7th 12:20PM 1:10PM @Bull
“The problem with troubleshooting is trouble shoots back” ~ Unknown
○ Original Query
○ Parsed Query
○ Query Runtime
○ Locking details
○ { "$ownOps": true } == only show operations for the current user
○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
○ Document-data size (dataSize)
○ Index-data size (indexSize)
○ Real-storage size (storageSize)
○ Average Object Size
○ Number of Indexes
○ Number of Objects
○ https://docs.mongodb.com/manual/reference/method/db.stats/
○ Slow queries
○ Storage engine details (sometimes)
○ Index operations
○ Sharding
  ■ Chunk moves
○ Elections / Replication
○ Authentication
○ Network
  ■ Connections
○ Verbosity can be controlled using db.setLogLevel()
○ https://docs.mongodb.com/manual/reference/method/db.setLogLevel/
2018-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms
○ Capped Collection “system.profile” in each database, default 1MB
○ The collection is capped, ie: profile data doesn’t last forever
○ Start with a very high threshold and decrease it in steps
○ Usually 50-100ms is a good threshold
○ Enable in mongod.conf:
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
○ keysExamined: # of index keys examined
○ docsExamined: # of docs examined to achieve the result
○ writeConflicts: # of write conflicts encountered during updates
○ numYields: # of times the operation yielded for others
○ locks: detailed lock statistics
○ Winning Plan
  ■ Query stages
  ■ Index chosen by the optimiser
○ Rejected Plans
○ mlogfilter --scan <file>
  ■ Shows all collection scan queries
○ mlogfilter --slow <ms> <file>
  ■ Shows all queries that are slower than X milliseconds
○ Shows all queries of a given operation type (eg: find, aggregate, etc)
https://github.com/rueckstiess/mtools
○ Number of inserted/updated/deleted/read documents
○ Percentage of WiredTiger cache in use/dirty
○ Number of flushes to disk
○ Inbound/outbound traffic
○ Only use strings if required
○ Do not store numbers as strings!
○ Look for {field: "123456"} instead of {field: 123456}
  ■ "12345678" moved to an integer uses 25% less space
  ■ Range queries on proper integers are more efficient
○ Example JavaScript to convert a field in an entire collection:
db.items.find().forEach(function(x) {
  var newItemId = parseInt(x.itemId);
  db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
});
○ Do not store dates as strings!
  ■ The field "2017-08-17 10:00:04 CEST" stored as a date takes 52.5% less space!
○ Do not store booleans as strings!
  ■ "true" -> true = 47% less space wasted
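A rough back-of-the-envelope illustration of the string overhead (per the BSON specification a string element carries a 4-byte int32 length prefix plus a trailing NUL byte, while an int32 value is a fixed 4 bytes; the shared field-name overhead, which narrows the gap toward the slide's 25% figure, is ignored here):

```javascript
// BSON value sizes, ignoring the field-name overhead (identical either way)
function bsonStringSize(s) {
  return 4 + Buffer.byteLength(s, "utf8") + 1; // length prefix + bytes + NUL
}

const asString = bsonStringSize("12345678"); // 13 bytes
const asInt32 = 4;                           // fixed-width BSON int32
console.log(asString, asInt32);              // prints: 13 4
```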
○ Index creation is a really heavy task
○ Use real performance data to make indexing decisions - find out before Production!
○ Index entries must be maintained on every insert/update/delete
○ Try to cover .sort() with an index, and match its direction!
Background index creation:
○ Doesn’t lock the collection
○ The collection can be used by other queries
○ Takes longer than foreground creation
○ Unpredictable performance!
Rolling foreground index creation:
○ Detach a SECONDARY from the Replica Set, create the index in the foreground, then reconnect it to the Replica Set
○ Repeat for all the SECONDARY nodes
○ Finally, detach the PRIMARY:
  ■ Wait for the election and detach the node once it is a SECONDARY
  ■ Create the foreground index
  ■ Reconnect the node to the Replica Set
○ Several fields supported
○ Fields can be in forward or backward direction
  ■ Consider any .sort() query options and match the sort direction!
○ Composite Keys are read Left -> Right
  ■ The index can be partially read
  ■ Left-most fields do not need to be duplicated!
  ■ Duplicate indexes must be dropped
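For example, with hypothetical fields, the first two index specifications below are left-most prefixes of the third, so they are duplicates that should be dropped:

```
{ country: 1 }                   <- duplicate: prefix of the index below
{ country: 1, city: 1 }          <- duplicate: prefix of the index below
{ country: 1, city: 1, zip: 1 }  <- keep: it serves all three prefixes
```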
○ Read-heavy apps benefit from pre-computed results
○ Consider moving expensive read computation to insert/update/delete time
○ Example 1: an app that does ‘count’ queries often
  ■ Move the .count() read query to a summary document with counters
  ■ Increment/decrement a single count value at write time
○ Example 2: an app that does groupings of data
  ■ Move the .aggregate() read query that is in-line to the user to a backend summary worker
  ■ Read from a summary collection, like a view
○ Reduce indexing as much as possible
○ Consider batching or a decentralised model with lazy updating (eg: social media graph)
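Example 1 above can be sketched with plain JavaScript objects standing in for collections (no MongoDB required; the collection and field names are hypothetical):

```javascript
// Write-time counter pattern: pay the cost on insert, not on every read
const orders = [];                        // stands in for the orders collection
const summary = { orders: { count: 0 } }; // stands in for a summary document

function insertOrder(doc) {
  orders.push(doc);
  summary.orders.count += 1; // the $inc that replaces repeated .count() reads
}

for (let i = 0; i < 3; i++) insertOrder({ _id: i });
console.log(summary.orders.count); // prints: 3
```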
○ MongoDB returns entire documents unless fields are specified
○ Only return the fields required for an application operation!
○ Covered-index operations require only the index fields to be specified
○ MongoDB (or any RDBMS) doesn’t handle large lists of $and or $or efficiently
○ Try to avoid this sort of model with:
  ■ Data locality
  ■ Background Summaries / Views
○ Atomicity
○ Consistency
○ Isolation
○ Durability

○ To use transactions on a standalone server, you need to start it as a Replica Set
○ Transaction support for sharded clusters is scheduled for 4.2
Some commands are not allowed inside transactions, like createUser, getParameter, etc.
○ Use transactions when you have a lot of 1:N and N:N relationships between different collections and you care about data consistency
○ Use transactions when you manage commercial/financial and/or really sensitive data
○ Use transactions when your app needs guaranteed data consistency
○ Transactions incur a greater performance cost than single-document writes
○ Transactions should not be a replacement for effective schema design
  ■ Embed documents as much as possible
  ■ A denormalised data model continues to be optimal
  ■ Single-document writes are always atomic
Use multi-document ACID transactions in MongoDB 4.0 - Corrado Pandiani - Wed 7th 2:20PM-3:10PM @Bull
What’s new in MongoDB 4.0 - Vinicius Gripps - Tue 6th 11:20AM-12:10PM @Bull
○ Testing a different storage engine
○ Testing different hardware
○ Testing a different OS configuration
○ mongoreplay record -i eth0 -e "port 27017" -p ~/recordings/playback
○ mongoreplay play -p ~/recordings/playback --report ~/reports/replay_stats.json --host mongodb://192.168.0.4:27018
○ mongoreplay monitor -i eth0 -e 'port 27017' --report ~/reports/monitor-live.json --collect json
○ setProfilingLevel set to 2
○ The replayer can send these ops to databases as fast as possible, to test limits
○ Or replay ops in accordance with their original timestamps, which imitates regular traffic
○ 60-300 seconds of data is not enough!
○ Problems can begin/end in seconds
○ Store more than you graph
○ Example: PMM gathers 700-900 metrics per polling interval
○ Use monitoring to troubleshoot Production events / incidents
○ Iterate and improve monitoring
  ■ Add graphing for whatever made you SSH to a host
  ■ Blind QA with someone unfamiliar with the problem
○ Operation counters
○ Cache Traffic and Capacity
○ Checkpoints
○ Concurrency Tickets (WiredTiger)
○ Document and Index scanning

○ CPU
○ Disk
  ■ Bandwidth / Util
  ■ Average Wait Time
○ Memory and Network
○ Prometheus
○ Grafana
○ Go Language
Monitoring MongoDB with Percona Monitoring and Management (PMM)