Deploying MongoDB in Production - Monday, November 5, 2018, 9:00 AM - 12:00 PM (PowerPoint PPT Presentation)


slide-1
SLIDE 1

Deploying MongoDB in Production

Monday, November 5, 2018 9:00 AM - 12:00 PM Bull

slide-2
SLIDE 2

4

About us

slide-3
SLIDE 3

Agenda

  • Hardware and OS configuration
  • MongoDB in Production
  • Backups and Monitoring
  • Q&A

5

slide-4
SLIDE 4

6

Terminology

  • Data
    ○ Document: a single *SON (JSON/BSON) object, often nested
    ○ Field: a single field in a document
    ○ Collection: a grouping of documents
    ○ Database: a grouping of collections
    ○ Capped Collection: a fixed-size FIFO collection
  • Replication
    ○ Oplog: a special capped collection used for replication
    ○ Primary: the replica set node that can receive writes
    ○ Secondary: a read-only replica of the Primary
    ○ Voting: the process used to elect a Primary node
    ○ Hidden Secondary: a replica that cannot become Primary

slide-5
SLIDE 5

Common database architecture

slide-6
SLIDE 6

On site (local DC)

  • Buy the biggest machine possible, with a lot of fancy disks and memory.
  • Retire the equipment 5 years later because it is outdated.
  • Open source was available, but proprietary software was also very present in companies.
  • Buy fiber links, more than one for safety.
  • Huge and heavy UPS units
  • Router configuration

8

slide-7
SLIDE 7

On the public cloud

  • Rent or reserve a machine according to your needs.
  • No upfront investment
  • No huge hydro bills
  • No need to set up cables or buy links

In the end, someone else does all of that for you.

9

slide-8
SLIDE 8

On the private cloud

  • Considerable upfront investment
  • Still need to configure the hardware and buy links and disks.
  • Abstraction layer to configure virtual machines, usually through a platform.

10

slide-9
SLIDE 9

Beyond the cloud

  • Use the database as a service.
  • Scale up and down with a few commands
  • It doesn't matter where the service is running anymore
  • Docker/Mesosphere/Kubernetes and a lot more...

11

slide-10
SLIDE 10

Hardware configuration

slide-11
SLIDE 11

Hardware configuration

  • Not all of these options are available on public clouds, private clouds, and managed services.

13

slide-12
SLIDE 12

Disks

slide-13
SLIDE 13

Disks

  • A crucial resource for any database or system.
  • Databases need disks to persist data.
  • Options available:
    ○ Magnetic disks
    ○ SSD/NVMe

15

slide-14
SLIDE 14
  • What is RAID?
  • If using RAID, prefer RAID 1+0 (RAID 10)
  • Avoid RAID 5 and RAID 6 (write performance penalty)

Disk Configuration

16

slide-15
SLIDE 15

Disks - Configuration

Performance: High
Redundancy: None
Overhead/parity: None

17

slide-16
SLIDE 16

Disks - Configuration

Performance: Low
Redundancy: Yes
Overhead/parity: Yes

18

slide-17
SLIDE 17

Disks - Configuration

Performance: High
Redundancy: None
Overhead/parity: Yes

19

slide-18
SLIDE 18

Disks - Configuration

Performance: High
Redundancy: Yes
Overhead/parity: Yes

source: http://www.icc-usa.com/raid-calculator.html

20

slide-19
SLIDE 19

Disks - Configuration

For cloud services, local (ephemeral) storage offers the best throughput per price, but remember: once the machine is restarted, all the data is erased. It is not common to see RAID 10 in cloud environments; replica sets keep the same data across different nodes, and in case of failure all we need to do is start a new box.

21

slide-20
SLIDE 20

Good Practices:

  • Use a different disk for the data folder.
  • If possible, move the journal to a different disk.
  • SSDs will give better performance than spinning disks.
  • If using EBS, consider the io2 family to guarantee IOPS

Disks - Configuration

22

slide-21
SLIDE 21

Warnings:

  • EBS without PIOPS may hit its IOPS limits in the middle of the business day, slowing down the application.
  • Do not share the same storage-array disks among replica-set members.
  • NFS/remote disks will slow down the database and tend to have more issues than local or fiber-attached disks.

Disks - Configuration

23

slide-22
SLIDE 22

Disk Scheduler

  • The disk scheduler may affect database performance.
  • The most common disk schedulers are:
    ○ NOOP
    ○ DEADLINE
    ○ CFQ

24

slide-23
SLIDE 23

Disk Scheduler - NOOP

First Come, First Served

25

slide-24
SLIDE 24

Disk Scheduler - Deadline

Reads have priority. Writes are queued, as they can happen asynchronously. This scheduler tries to speed up reads, since the application may need the data to return results to the client.

26

slide-25
SLIDE 25

Disk Scheduler - Completely Fair Queueing (CFQ)

Time slice per process

27

slide-26
SLIDE 26

Disk Scheduler

28

slide-27
SLIDE 27

Disk Filesystems

  • Filesystem Types
    ○ Use XFS or EXT4
    ○ Use XFS only with WiredTiger
    ○ EXT4 “data=ordered” mode recommended
    ○ Btrfs not tested, yet!
  • Filesystem Options
    ○ Set ‘noatime’ on MongoDB data volumes in ‘/etc/fstab’
    ○ Remount the filesystem after an options change, or reboot
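As a sketch, such an ‘/etc/fstab’ entry could look like the following; the device name, filesystem and mount point are illustrative assumptions, not values from the deck:

```
# hypothetical fstab entry for a dedicated MongoDB data volume
/dev/sdb1  /var/lib/mongodb  xfs  defaults,noatime  0 0
```

After editing, the change can be applied without a reboot: sudo mount -o remount /var/lib/mongodb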

29

slide-28
SLIDE 28

Disk Readahead

  • Spinning disks may be slow to read data
  • Setting readahead may improve read performance, at the cost of possibly loading data into memory that will never be used
  • We recommend 32 blocks of read-ahead (16KB)

30

slide-29
SLIDE 29

Disk Readahead

  • Change the read-ahead by adding a file to ‘/etc/udev/rules.d’

/etc/udev/rules.d/60-mongodb-disk.rules:

# set deadline scheduler and 32-block/16kb read-ahead for /dev/sda
ACTION=="add|change", KERNEL=="sda", ATTR{queue/scheduler}="deadline", ATTR{bdi/read_ahead_kb}="16"

  • Or set it at runtime with ‘sudo blockdev --getra /dev/sda’ and ‘sudo blockdev --setra 32 /dev/sda’

31

slide-30
SLIDE 30

Processor

slide-31
SLIDE 31

Hardware: CPUs

  • Cores vs Core Speed
    ○ Faster cores don't necessarily mean a faster database.
    ○ Almost all databases take advantage of multiple cores for good performance.

33

slide-32
SLIDE 32
  • CPU Frequency Scaling - Power Save
  • Check that the database server is not configured with a power-saving profile.
  • This configuration may change the processor's performance, and with it the database's performance.

Hardware: CPUs - Checks

34

slide-33
SLIDE 33

Hardware: CPUs - vCPU

  • If using virtual machines, check the vCPU speed.
  • Some public clouds set the maximum speed to something like 1.2GHz
  • Avoid overcommitting resources, both memory and CPU

35

slide-34
SLIDE 34

Memory

slide-35
SLIDE 35

Tuning Linux: NUMA

  • A memory architecture that takes into account the locality of memory, caches and CPUs for lower latency
  • The MongoDB codebase is not NUMA-aware, causing unbalanced memory allocations on NUMA systems
  • Disable NUMA
    ○ In the server BIOS
    ○ Using ‘numactl’ in init scripts BEFORE the ‘mongod’ command (recommended for future compatibility):

numactl --interleave=all /usr/bin/mongod <other flags>

37

slide-36
SLIDE 36

Tuning Linux: NUMA

38

slide-37
SLIDE 37

Tuning Linux: Transparent HugePages

  • Introduced in RHEL/CentOS 6, Linux 2.6.38+
  • Merges memory pages in the background (the khugepaged process)
  • Decreases overall performance when used with MongoDB!
  • “AnonHugePages” in /proc/meminfo shows usage
  • Disable Transparent HugePages!
  • Add “transparent_hugepage=never” to the kernel command line (GRUB)
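On a GRUB2 system this can be sketched as follows; file paths and the regeneration command vary by distribution, so treat this as an illustrative fragment:

```
# /etc/default/grub - append the setting to the kernel command line
GRUB_CMDLINE_LINUX="... transparent_hugepage=never"

# regenerate the GRUB config, then reboot
# (RHEL/CentOS shown; Debian/Ubuntu uses update-grub instead)
sudo grub2-mkconfig -o /boot/grub2/grub.cfg
```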

39

slide-38
SLIDE 38

Kernel

slide-39
SLIDE 39

Tuning Linux: The Linux Kernel

  • Linux 2.6.x?
  • Avoid Linux earlier than 3.10.x - 3.12.x
  • Large improvements in parallel efficiency in 3.10+ (for Free!)

More: https://blog.2ndquadrant.com/postgresql-vs-kernel-versions/

41

slide-40
SLIDE 40
Tuning Linux: Ulimit

  • Allows per-Linux-user resource constraints
    ○ Number of user-level processes
    ○ Number of open files
    ○ CPU seconds
    ○ Scheduling priority
  • MongoDB
    ○ Should probably have a dedicated VM, container or server
    ○ Creates a new process
    ○ Creates an open file for each active data file on disk

42

slide-41
SLIDE 41

Tuning Linux: Swappiness

  • A Linux kernel sysctl setting for preferring RAM or disk for swap
    ○ Linux default: 60
    ○ To avoid disk-based swap: 1 (not zero!)
    ○ To allow some disk-based swap: 10
    ○ ‘0’ can cause more swapping than ‘1’ on recent kernels

More on this here: https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
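Applied persistently, the recommended value is a one-line sysctl fragment; a minimal sketch:

```
# /etc/sysctl.conf (or a file under /etc/sysctl.d/) - avoid disk-based swap
vm.swappiness = 1
```

Apply with ‘sudo sysctl -p’ and verify with ‘cat /proc/sys/vm/swappiness’.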

43

slide-42
SLIDE 42

Tuning Linux: Ulimit

  • Setting ulimits
    ○ A file under /etc/security/limits.d
    ○ Systemd service
    ○ Init script
  • Ulimits are set by Percona and MongoDB packages!
    ○ Example on left: PSMDB RPM (Systemd)

44

slide-43
SLIDE 43

Tuning Linux: Time Source

  • Replication and clustering need consistent clocks (before 3.6)
    ○ mongodb_consistent_backup relies on time sync, for example!
  • Use a consistent time source/server
    ○ “It’s ok if everyone is equally wrong”
  • Non-virtualised
    ○ Run an NTP daemon on all MongoDB and monitoring hosts
    ○ Enable the service so it starts on reboot
  • Virtualised
    ○ Check if your VM platform has an “agent” syncing time
    ○ VMware and Xen are known to have their own time sync
    ○ If no time sync is provided, install an NTP daemon

45

slide-44
SLIDE 44

Tuning Linux: Time Source

46

slide-45
SLIDE 45

Network

slide-46
SLIDE 46

Tuning Linux: Network Stack

  • Defaults are not good for > 100mbps Ethernet
  • Suggested starting point:
  • Set network tunings:
    ○ Add the suggested sysctl tunings to /etc/sysctl.conf
    ○ Run “/sbin/sysctl -p” as root to set the tunings
    ○ Run “/sbin/sysctl -a” to verify the changes
  • Listen Backlog = 128 (Mongo parameter)
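The slide's concrete tunings are not preserved in this extraction; the fragment below is an illustrative starting point only (every value is an assumption in the spirit of the Percona tuning guide linked later, so benchmark before adopting):

```
# /etc/sysctl.conf - illustrative network tunings for a busy database host
net.core.somaxconn = 4096
net.ipv4.tcp_fin_timeout = 30
net.ipv4.tcp_keepalive_time = 120
net.ipv4.tcp_keepalive_intvl = 30
net.ipv4.tcp_max_syn_backlog = 4096
```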

48

slide-47
SLIDE 47

Hardware: Network Infrastructure

  • Datacenter Tiers
    ○ Network Edge
    ○ Public Server VLAN
    ○ Backend Server VLAN
    ○ Data VLAN

49

slide-48
SLIDE 48

Hardware: Network Infrastructure

  • Network Fabric
    ○ Try to use 10GbE for low latency
    ○ Use Jumbo Frames for efficiency
    ○ Try to keep all MongoDB nodes on the same segment
  • Outbound / Public Access
    ○ Databases don’t need to talk to the internet*
  • Cloud?
    ○ Try to replicate the above with features of your provider

50

slide-49
SLIDE 49

Tuning Linux: More on this...

https://www.percona.com/blog/2016/08/12/tuning-linux-for-mongodb/

51

slide-50
SLIDE 50

Storage Engine and Installation

slide-51
SLIDE 51

Tuning MongoDB: WiredTiger

  • WT syncs data to disk in a process called “checkpointing”:
    ○ Every 60 seconds or >= 2GB of data changes
  • In-memory buffering of the journal
    ○ Journal buffer size: 128KB
    ○ Synced every 50 ms (as of 3.2)
    ○ Or on every change with the “journaled” write concern
    ○ While journal records remain in the buffer between syncs, those updates can be lost after a hard shutdown!

53

slide-52
SLIDE 52

Tuning MongoDB: Storage Engine Caches

  • WiredTiger
    ○ In heap
      ■ ~50% of available system memory
      ■ Uncompressed WT pages
    ○ Filesystem cache
      ■ ~50% of available system memory
      ■ Compressed pages

54

slide-53
SLIDE 53

Tuning MongoDB: Durability

55

  • WiredTiger - default since 3.2
  • storage.journal.enabled = <true/false>
    ○ Always enable unless data is transient - default true
    ○ Always enable on cluster config servers
  • storage.journal.commitIntervalMs = <ms>
    ○ Max time between journal syncs - default 100ms
  • storage.syncPeriodSecs = <secs>
    ○ Max time between data file flushes - default 60 seconds
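Spelled out as a mongod.conf YAML fragment, with the defaults above made explicit (the dbPath is an illustrative assumption):

```yaml
# mongod.conf - durability settings (values are the stated defaults)
storage:
  dbPath: /var/lib/mongodb   # illustrative path, adjust to your install
  journal:
    enabled: true            # always enable unless data is transient
    commitIntervalMs: 100    # max time between journal syncs
  syncPeriodSecs: 60         # max time between data file flushes
```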

slide-54
SLIDE 54

Security

“Think of the network like a public place” ~ Unknown

slide-55
SLIDE 55

Security: Authorization

  • Always enable auth on Production Installs!
  • Do not use weak passwords
  • Minimum access policy
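A minimal sketch of enabling authorization in mongod.conf; the database, user name and role below are placeholders:

```yaml
# mongod.conf - enable role-based access control
security:
  authorization: enabled
```

Then create a least-privilege user from the mongo shell, e.g. db.getSiblingDB("appdb").createUser({ user: "appuser", pwd: "<strong password>", roles: [ { role: "readWrite", db: "appdb" } ] }).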

57

slide-56
SLIDE 56

Default Roles

  • read
  • readWrite
  • dbAdmin
  • dbOwner
  • userAdmin
  • clusterAdmin
  • clusterMonitor
  • clusterManager
  • hostManager
  • backup
  • restore
  • readAnyDatabase
  • readWriteAnyDatabase
  • userAdminAnyDatabase
  • dbAdminAnyDatabase
  • root
  • __system

58

slide-57
SLIDE 57

Security: Filesystem Access

  • Use a service user+group
    ○ ‘mongod’ or ‘mongodb’ on most systems
    ○ Ensure data path, log file and key file(s) are owned by this user+group
  • Data Path
    ○ Mode: 0750
  • Log File
    ○ Mode: 0640
    ○ Contains real queries and their fields!
  • Key File(s)
    ○ Files include: keyFile and SSL certificates or keys
    ○ Mode: 0600

59

slide-58
SLIDE 58

Security: Network Access

  • Firewall
    ○ Mainly port 27017
  • Creating a dedicated network segment for databases is recommended!
  • Do NOT allow MongoDB to talk to the internet, under any circumstances!!!

60

slide-59
SLIDE 59

Security: System Access

  • Recommended to restrict system access to Database Administrators
  • A “shell” on a system can be enough to take the system over!

61

slide-60
SLIDE 60

Security: External Authentication

  • LDAP Authentication

○ Supported in PSMDB and MongoDB Enterprise

62

slide-61
SLIDE 61

Security: SSL Connections and Auth

  • SSL / TLS Connections
  • Intra-cluster authentication with x509

63

slide-62
SLIDE 62

Security: SSL Connections and Auth

64

slide-63
SLIDE 63

Security: Encryption at Rest

  • MongoDB Enterprise
  • Percona Server for MongoDB
    ○ 3.6.8-20 has encryption at rest using a keyfile, in BETA
  • Application-level

65

slide-64
SLIDE 64

High-Availability

slide-65
SLIDE 65

High Availability - Replica Set

  • Replication
    ○ Asynchronous
      ■ Write concerns can provide pseudo-synchronous replication
      ■ Changelog-based, using the “Oplog”
    ○ Maximum 50 members
    ○ Maximum 7 voting members
      ■ Use “votes: 0” for members $gt 7

67

slide-66
SLIDE 66

High Availability - Oplog

  • Oplog

○ The “oplog.rs” capped-collection in “local” storing changes to data ○ Read by secondary members for replication ○ Written to by local node after “apply” of operation ○ Events in the oplog are idempotent

  • perations produce the same results whether applied once or multiple times to the target

dataset

○ Each event in the oplog represent a single document inserted, updated, deleted ○ Oplog has a default size depending on the OS and the storage engine

■ from 3.6 the size can be change at runtime using replSetResizeOplog admin command
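For example, resizing from the mongo shell (the 16000 MB target is an arbitrary example; the command requires MongoDB 3.6+ with WiredTiger):

```javascript
// run on the replica set member whose oplog should be resized;
// the size is given in megabytes
db.adminCommand({ replSetResizeOplog: 1, size: 16000 })
```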

68

slide-67
SLIDE 67

What is a Replica Set

  • A group of mongod processes that maintain the same dataset
  • Provides redundancy and HA
  • Suggested for all production environments
  • Can provide increased read capacity
    ○ clients can send read operations to different servers
  • Automatic failover
  • Internals are similar (more or less) to MySQL replication
    ○ asynchronous
    ○ events are replicated by reading a primary node’s collection: the oplog
    ○ Primary = Master
    ○ Secondary = Slave

69

slide-68
SLIDE 68

Replica Set: how it works

70

slide-69
SLIDE 69

Replica Set: how it works

71

slide-70
SLIDE 70

Replica Set: automatic failover

72

slide-71
SLIDE 71

Automatic failover

  • When a primary does not communicate with the other members for the electionTimeoutMillis period (10 seconds by default), the cluster attempts to elect a new primary and resume normal operations
  • The RS cannot process write operations until the election completes successfully
  • The RS can continue to serve read queries if such queries are configured to run on secondaries while the primary is offline
  • An eligible secondary calls for an election to nominate itself as the new primary

73

slide-72
SLIDE 72

Architecture

  • Datacenter Recommendations
    ○ Minimum of 3 physical servers required for High Availability
    ○ Ensure only 1 member per replica set is on a single physical server!!!
  • EC2 / Cloud Recommendations
    ○ Place replica set members in an odd number of Availability Zones, same region
    ○ Use a hidden secondary node for Backup and Disaster Recovery in another region
    ○ Entire Availability Zones have been lost before!

74

slide-73
SLIDE 73

Arbiter node

  • A node with no data
  • Usually used to have an odd number of nodes
  • Cannot be elected during failover
  • Can vote during the election

75

slide-74
SLIDE 74

Priority

  • Priority: weight of each single node
  • Defines which nodes can be elected as Primary
  • Here is a typical architecture deployed on 3 data centers

76

slide-75
SLIDE 75

Hidden/Delayed secondary nodes

  • HIDDEN SECONDARY
    ○ Maintains a copy of the primary’s data
    ○ Invisible to client applications
    ○ Runs backups, statistics or special tasks
    ○ Must have priority = 0: cannot be elected as Primary, but votes during elections
  • DELAYED SECONDARY
    ○ Reflects an earlier state of the dataset
    ○ Recovers from unsuccessful application upgrades and operator errors; backups
    ○ Must have priority = 0: cannot be elected as Primary, but votes during elections

77

slide-76
SLIDE 76

More details in other sessions

MongoDB HA, what can go wrong? - Igor Donchovski - Wed 7th 12:20PM-1:10PM @Bull

78

slide-77
SLIDE 77

Quick Break and QA

15 minutes

slide-78
SLIDE 78

Troubleshooting

“The problem with troubleshooting is trouble shoots back” ~ Unknown

slide-79
SLIDE 79

Troubleshooting: db.currentOp()

  • A function that dumps status info about running operations and various lock/execution details
  • Only queries currently in progress are shown.
  • The provided query ID (opid) can be used to kill long-running queries with db.killOp()
  • Includes
    ○ Original query
    ○ Parsed query
    ○ Query runtime
    ○ Locking details
  • Filter documents
    ○ { "$ownOps": true } == only show operations for the current user
    ○ https://docs.mongodb.com/manual/reference/method/db.currentOp/#examples
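A sketch of combining db.currentOp() and db.killOp() from the mongo shell; the 60-second threshold is an arbitrary example:

```javascript
// list operations that have been running for more than 60 seconds
var slow = db.currentOp({ secs_running: { $gt: 60 } }).inprog;

// kill each by opid - double-check what you are killing in production!
slow.forEach(function (op) { db.killOp(op.opid); });
```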

81

slide-80
SLIDE 80

Troubleshooting: db.currentOp()

82

slide-81
SLIDE 81

Troubleshooting: db.stats()

  • Returns
    ○ Document-data size (dataSize)
    ○ Index-data size (indexSize)
    ○ Real storage size (storageSize)
    ○ Average object size
    ○ Number of indexes
    ○ Number of objects
    ○ https://docs.mongodb.com/manual/reference/method/db.stats/

83

slide-82
SLIDE 82

Troubleshooting: db.stats()

84

slide-83
SLIDE 83

Troubleshooting: Log File

  • Interesting details are logged to the mongod/mongos log files
    ○ Slow queries
    ○ Storage engine details (sometimes)
    ○ Index operations
    ○ Sharding (chunk moves)
    ○ Elections / replication
    ○ Authentication
    ○ Network connections
      ■ Errors
      ■ Client / inter-node connections
  • The log can be really verbose
    ○ verbosity can be controlled using db.setLogLevel()
    ○ https://docs.mongodb.com/manual/reference/method/db.setLogLevel/

85

slide-84
SLIDE 84

Troubleshooting: Log File - Slow Query

2018-09-19T20:58:03.896+0200 I COMMAND [conn175] command config.locks appName: "MongoDB Shell" command: findAndModify { findAndModify: "locks", query: { ts: ObjectId('59c168239586572394ae37ba') }, update: { $set: { state: 0 } }, writeConcern: { w: "majority", wtimeout: 15000 }, maxTimeMS: 30000 } planSummary: IXSCAN { ts: 1 } update: { $set: { state: 0 } } keysExamined:1 docsExamined:1 nMatched:1 nModified:1 keysInserted:1 keysDeleted:1 numYields:0 reslen:604 locks:{ Global: { acquireCount: { r: 2, w: 2 } }, Database: { acquireCount: { w: 2 } }, Collection: { acquireCount: { w: 1 } }, Metadata: { acquireCount: { w: 1 } }, oplog: { acquireCount: { w: 1 } } } protocol:op_command 106ms

86

slide-85
SLIDE 85

Troubleshooting: Operation Profiler

  • Writes slow database operations to a new MongoDB collection for analysis
    ○ Capped collection “system.profile” in each database, 1MB by default
    ○ The collection is capped, i.e. profile data doesn’t last forever
  • Support for operationProfiling data in PMM
  • Enable operationProfiling in “slowOp” mode
    ○ Start with a very high threshold and decrease it in steps
    ○ Usually 50-100ms is a good threshold
    ○ Enable in mongod.conf:

operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100

87

slide-86
SLIDE 86

Troubleshooting: Operation Profiler

  • Useful profile metrics
    ○ op/ns/query: type, namespace and query of a profile
    ○ keysExamined: # of index keys examined
    ○ docsExamined: # of docs examined to achieve the result
    ○ writeConflicts: # of write conflicts encountered during updates
    ○ numYields: # of times the operation yielded for others
    ○ locks: detailed lock statistics

88

slide-87
SLIDE 87

Troubleshooting: .explain()

  • Shows the query explain plan for query cursors
  • This will include
    ○ Winning plan
      ■ Query stages
        • Query stages may include sharding info in clusters
      ■ Index chosen by the optimiser
    ○ Rejected plans

89

slide-88
SLIDE 88

Troubleshooting: mlogfilter

  • A useful tool for processing mongod.log files
  • A log-aware replacement for ‘grep’, ‘awk’ and friends
  • Generally focus on
    ○ mlogfilter --scan <file>
      ■ Shows all collection-scan queries
    ○ mlogfilter --slow <ms> <file>
      ■ Shows all queries that are slower than X milliseconds
    ○ mlogfilter --op <op-type> <file>
      ■ Shows all queries of operation type X (eg: find, aggregate, etc)
  • More on this tool here: https://github.com/rueckstiess/mtools

91

slide-89
SLIDE 89

Troubleshooting: mongostat

  • Shows the current workload of a running mongod instance
  • Useful after delivering a new application or for investigating ongoing unusual behavior
  • By default it provides metrics every 1 second
    ○ number of inserted/updated/deleted/read documents
    ○ percentage of WiredTiger cache in use/dirty
    ○ number of flushes to disk
    ○ inbound/outbound traffic

https://docs.mongodb.com/manual/reference/program/mongostat/

92

slide-90
SLIDE 90

Troubleshooting: mongotop

  • Tracks the time spent reading and writing data
  • Statistics at a per-collection level
  • Useful to get a good idea of which collections take the most time to execute reads and writes
  • By default it provides metrics every 1 second

https://docs.mongodb.com/manual/reference/program/mongotop/

93

slide-91
SLIDE 91

Schema Design


slide-92
SLIDE 92

Schema Design: Data Types

  • Strings
    ○ Only use strings if required
    ○ Do not store numbers as strings!
      ■ Look for {field: “123456”} instead of {field: 123456}
      ■ “12345678” moved to an integer uses 25% less space
      ■ Range queries on proper integers are more efficient
    ○ Example JavaScript to convert a field in an entire collection:

db.items.find().forEach(function(x) {
  var newItemId = parseInt(x.itemId);
  db.items.update({ _id: x._id }, { $set: { itemId: newItemId } });
});

    ○ Do not store dates as strings!
      ■ The field "2017-08-17 10:00:04 CEST" stored as a date uses 52.5% less space!
    ○ Do not store booleans as strings!
      ■ “true” -> true = 47% less space wasted

95

slide-93
SLIDE 93

Schema Design: Indexes

  • MongoDB supports B-tree, text and geo indexes
  • The collection is locked until indexing completes
    ○ index creation is a really heavy task
  • Avoid drivers that auto-create indexes
    ○ Use real performance data to make indexing decisions; find out before Production!
  • Too many indexes hurt write performance for the entire collection
    ○ Index entries must be maintained on every insert/update/delete
  • Indexes have a forward or backward direction
    ○ Try to cover .sort() with the index, and match its direction!

96

slide-94
SLIDE 94

Non-blocking index creation

  • db.collection.createIndex supports the {background: true} option
    ○ index creation doesn’t lock the collection
    ○ the collection can be used by other queries
    ○ index creation takes longer than a foreground build
    ○ unpredictable performance!
  • Using a replica set, an index can be built with the following rolling procedure
    ○ detach a SECONDARY from the RS, create the index in the foreground, then reconnect it to the RS
    ○ repeat for all the SECONDARY nodes
    ○ at last, detach the PRIMARY
      ■ wait for the election and detach the node once it is SECONDARY
      ■ create the foreground index
      ■ reconnect the node to the RS
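For comparison, the single-node non-blocking build is a one-option change; a sketch with placeholder collection and field names:

```javascript
// build in the background so the collection stays usable (pre-4.2 option)
db.items.createIndex({ itemId: 1 }, { background: true });

// verify the index exists
db.items.getIndexes();
```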

97

slide-95
SLIDE 95

Schema Design: Indexes

  • Compound Indexes
    ○ Several fields supported
    ○ Fields can be in forward or backward direction
      ■ Consider any .sort() query options and match the sort direction!
    ○ Composite keys are read left -> right
      ■ An index can be partially read
      ■ Left-most fields do not need to be duplicated!
      ■ All the indexes below are duplicates of the first:
        • {username: 1, status: 1, date: 1, count: -1}
        • {username: 1, status: 1, date: 1}
        • {username: 1, status: 1}
        • {username: 1}
      ■ Duplicate indexes should be dropped
  • Use db.collection.getIndexes() to view current indexes

98

slide-96
SLIDE 96

Schema Workflow

  • Read-Heavy Workflow
    ○ Read-heavy apps benefit from pre-computed results
    ○ Consider moving expensive read computation to insert/update/delete time
    ○ Example 1: an app that does ‘count’ queries often
      ■ Move the .count() read query to a summary document with counters
      ■ Increment/decrement a single count value at write time
    ○ Example 2: an app that does groupings of data
      ■ Move the in-line .aggregate() read query to a backend summary worker
      ■ Read from a summary collection, like a view
  • Write-Heavy Workflow
    ○ Reduce indexing as much as possible
    ○ Consider batching, or a decentralised model with lazy updating (eg: a social media graph)
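Example 1's counter pattern can be sketched in the mongo shell like this; the collection and key names are placeholders:

```javascript
// write time: keep a running counter up to date
db.summary.update(
  { _id: "orders_total" },
  { $inc: { count: 1 } },
  { upsert: true }
);

// read time: one document fetch replaces an expensive .count()
db.summary.findOne({ _id: "orders_total" }).count;
```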

101

slide-97
SLIDE 97

Schema Workflow

  • No list of fields specified in .find()
    ○ MongoDB returns entire documents unless fields are specified
    ○ Only return the fields required for an application operation!
    ○ Covered-index operations require that only the index fields be specified
  • Many $and or $or conditions
    ○ MongoDB (or any RDBMS) doesn’t handle large lists of $and or $or efficiently
    ○ Try to avoid this sort of model with
      ■ Data locality
      ■ Background summaries / views

102

slide-98
SLIDE 98

More details in other sessions

MongoDB Sharding 101 - Adamo Tonete - Tuesday 6th Nov 4:30PM-5:20PM @Bull

124

slide-99
SLIDE 99

Multi-document ACID Transactions

slide-100
SLIDE 100

Multi-document ACID transactions

  • New in 4.0
  • Writes on multiple documents in different collections can be included in a single

transaction

  • ACID properties are supported

○ atomicity ○ consistency ○ isolation ○ durability

  • Through snapshot isolation, transactions provide a consistent view of data, and

enforce all-or-nothing execution to maintain data integrity

  • Available for Replica Set and WiredTiger storage engine only

○ in order to use transactions on a standalone server you need to start Replica Set ○ transaction support for sharded cluster is scheduled for 4.2

128

slide-101
SLIDE 101

Limitations

  • A collection MUST exist in order to use transactions on it
  • A collection cannot be created or dropped inside a transaction
  • An index cannot be created or dropped inside a transaction
  • Non-CRUD operations cannot be used inside a transaction; for example, things like createUser, getParameter, etc.
  • Cannot read/write the config, admin and local databases
  • Cannot write to system.* collections
  • Before using a transaction, a session must be created
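A sketch of a multi-document transaction in the 4.0 mongo shell; the database, collection and document names are placeholders:

```javascript
// a session must be created before the transaction (replica set required)
var session = db.getMongo().startSession();
var accounts = session.getDatabase("bank").accounts;

session.startTransaction();
try {
  // both writes commit or abort together
  accounts.update({ _id: "alice" }, { $inc: { balance: -100 } });
  accounts.update({ _id: "bob" },   { $inc: { balance:  100 } });
  session.commitTransaction();
} catch (err) {
  session.abortTransaction();
  throw err;
} finally {
  session.endSession();
}
```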

129

slide-102
SLIDE 102

Is my app good for transactions?

  • Yes, it is if
    ○ you have a lot of 1:N and N:N relationships between different collections and you care about data consistency
    ○ you manage commercial/financial and/or really sensitive data
    ○ your app needs to be aware of data consistency
  • In general, remember the following
    ○ transactions incur a greater performance cost than single-document writes
    ○ transactions should not be a replacement for effective schema design
      ■ embed documents as much as possible
      ■ a denormalized data model continues to be optimal
      ■ single-document writes are always atomic

130

slide-103
SLIDE 103

More details in other sessions

Use multi-document ACID transactions in MongoDB 4.0 - Corrado Pandiani - Wed 7th 2:20PM-3:10PM @Bull
What’s new in MongoDB 4.0 - Vinicius Gripps - Tue 6th 11:20AM-12:10PM @Bull

131

slide-104
SLIDE 104

Benchmark and replay tools

slide-105
SLIDE 105

133

mongoreplay

  • available in 3.4+
  • captures traffic sent to a mongod instance
  • replays the captured traffic later against a different mongod
  • provides feedback on the replayed traffic
  • useful to test a new MongoDB deployment using the real workload
    ○ testing a different storage engine
    ○ testing different hardware
    ○ testing a different OS configuration

slide-106
SLIDE 106

134

mongoreplay usage

  • capture traffic with the record command and create the playback file
    ○ mongoreplay record -i eth0 -e "port 27017" -p ~/recordings/playback
  • replay the recorded playback file using the play command
    ○ mongoreplay play -p ~/recordings/playback --report ~/reports/replay_stats.json --host mongodb://192.168.0.4:27018
  • inspect a live mongod instance using the monitor command
    ○ mongoreplay monitor -i eth0 -e 'port 27017' --report ~/reports/monitor-live.json --collect json

https://docs.mongodb.com/manual/reference/program/mongoreplay/

slide-107
SLIDE 107

135

flashback

  • a third-party tool
  • records queries from the profiler
    ○ setProfilingLevel set to 2
  • replays captured ops
    ○ the replayer can send these ops to databases as fast as possible, to test limits
    ○ or replay ops in accordance with their original timestamps, which allows us to imitate regular traffic

https://github.com/facebookarchive/flashback

slide-108
SLIDE 108

Monitoring

slide-109
SLIDE 109

Monitoring: Methodology

  • Monitor often
    ○ Every 60-300 seconds is not enough!
    ○ Problems can begin/end in seconds
  • Correlate database and operating system metrics together!
  • Monitor a lot
    ○ Store more than you graph
    ○ Example: PMM gathers 700-900 metrics per polling
  • Process
    ○ Use monitoring to troubleshoot Production events / incidents
    ○ Iterate and improve monitoring
      ■ Add graphing for whatever made you SSH to a host
      ■ Blind QA with someone unfamiliar with the problem

150

slide-110
SLIDE 110

Monitoring: Important Metrics

  • Database
    ○ Operation counters
    ○ Cache traffic and capacity
    ○ Checkpoints
    ○ Concurrency tickets (WiredTiger)
    ○ Document and index scanning
  • Operating System
    ○ CPU
    ○ Disk
    ○ Bandwidth / utilisation
    ○ Average wait time
    ○ Memory and network

151

slide-111
SLIDE 111

Monitoring: Percona PMM

  • Open-source monitoring from Percona!
  • Based on open-source technology
    ○ Prometheus
    ○ Grafana
    ○ Go language
  • Simple deployment
  • Examples in this demo are from PMM!
  • Correlation of OS and DB metrics
  • 800+ metrics per ping

152

slide-112
SLIDE 112

Monitoring: Percona PMM

https://pmmdemo.percona.com

153

slide-113
SLIDE 113

More details in other sessions

Monitoring MongoDB with Percona Monitoring and Management (PMM)

Michael Coburn - Tue 5:25PM-5:50PM @Bull

154

slide-114
SLIDE 114

Questions

slide-115
SLIDE 115

Questions