What's New in Percona Server for MongoDB? 2019 Q3: Enterprise Enhancements and v4.2
4:00 PM - 4:50 PM - Room B

About Adamo and Akira
Two of the most experienced MongoDB field experts in the world.
Adamo w/ MongoDB: 2013 ~
Akira w/ MongoDB: 2014 ~
AdamoMongoDB + AkiraMongoDB > MongoDB!
Talk Overview
- New in Percona Server for MongoDB (PSMDB) 4.0+
  ○ Encryption at rest with Hashicorp Vault
  ○ All about PSMDB
- MongoDB Community / PSMDB 4.2
  ○ New primary throttle (a.k.a. "flow control")
  ○ New index build process
  ○ Cursors and user info added to $currentOp
  ○ Wildcard indexes
  ○ Modularization-friendly config files
Encryption at Rest with Hashicorp Vault
What is Encryption and How Does it Work?
Encryption is the process of hiding data in such a way that only those who hold the key (the decryption key) are able to read it. Any data - files, emails, network traffic, individual fields - can be encrypted. For MongoDB data-at-rest encryption, the collection documents and index key entries are encrypted within the WiredTiger B-tree file format saved in the *.wt files.
PSMDB has two optional encryption features:
- Encryption at Rest
- Encryption in Transit (SSL/TLS)
We'll discuss both, but focus a bit more on encryption at rest, as it is a free feature only in PSMDB.
Types of Encryption
WiredTiger Keyfile-Encryption
This is the basic encryption: a key file acts as the encryption and decryption key for the database. The encryption key sits in the filesystem and can be read by any root user or by mongod. The database is encrypted; however, the secret is not that secret.
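As a sketch of the keyfile approach (paths are illustrative; the flag names follow the PSMDB docs, so verify them against your version), the key really is just a file sitting next to the data:

```shell
# Generate a 256-bit key, base64-encoded, to act as the encryption keyfile.
# The path /tmp/mongodb-keyfile is illustrative only.
openssl rand -base64 32 > /tmp/mongodb-keyfile
chmod 600 /tmp/mongodb-keyfile

# mongod would then be started pointing at it, e.g.:
#   mongod --enableEncryption --encryptionKeyFile /tmp/mongodb-keyfile
# Note: anyone who can read this file (root, or the mongod user) can decrypt the data.
```

This is exactly the weakness the slide describes: the secret lives on the same box, readable by root.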
WiredTiger Keyfile-Encryption
(diagram: the keyfile lives on the same instance as the encrypted data files)
WiredTiger Vault Encryption
A vault is an external process capable of keeping secrets and answering API-like calls to reveal a secret to a client. PSMDB is fully integrated with Hashicorp Vault.
(diagram: the instance holds only a token, e.g. a8abc37456e; the token can be changed, and the secret is not on the same disk as the data)
What is Hashicorp Vault?
Vault is a tool for securely accessing secrets. A secret is anything that you want to tightly control access to, such as API keys, passwords, or certificates. Vault provides a unified interface to any secret, while providing tight access control and recording a detailed audit log.
How Does Hashicorp Vault Work?
Vault only speaks TLS, so we need to configure the CAFile; otherwise the client won't be able to understand the reply. This is also a security feature, as any request without TLS will fail. Clients request the secret from the vault using a previously created token. The request is logged in the audit log, and the server replies to the client with the secret. The secret is used to decrypt the db.key key database, which in turn opens the master key used to encrypt/decrypt the database.
Parameters to Enable Vault in PSMDB
- --enableEncryption
- --encryptionCipherMode AES256-CBC
- --vaultServerName <vault ip>
- --vaultPort 8200
- --vaultToken <machine token>
- --vaultSecret secret/data/psmdb-node1
Parameters to Enable Vault in PSMDB
security:
  enableEncryption: true
  vault:
    serverName: 127.0.0.1
    port: 8200
    tokenFile: /home/user/path/token
    secret: secret/data/hello
Backups?
- Logical backups (mongodump) work the same as before; all the data is decrypted.
- Binary copies (Percona hot backup) will need to access the vault in order to get the secret to open the key db; otherwise the database will fail to start.
Always encrypt your logical backups; the easiest way is using GPG.
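A minimal sketch of encrypting a dump before it leaves the machine. File names are made up, and `openssl enc` stands in here for the GPG command the slide recommends (`gpg --symmetric backup.archive` would be the direct equivalent):

```shell
# Pretend this file is a mongodump archive (produced by: mongodump --archive=backup.archive).
echo "fake dump data" > /tmp/backup.archive

# Encrypt it with a symmetric cipher before storing it off-box.
openssl enc -aes-256-cbc -pbkdf2 -salt \
  -in /tmp/backup.archive -out /tmp/backup.archive.enc -pass pass:example-passphrase

# Decrypting it again for a restore:
openssl enc -d -aes-256-cbc -pbkdf2 \
  -in /tmp/backup.archive.enc -out /tmp/backup.restored -pass pass:example-passphrase
```

In practice the passphrase (or GPG key) must itself be stored somewhere safer than the backup, which is the same key-management problem the vault solves for data at rest.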
What About the Data in Transit?
- Because the encryption is at rest, all the data transits over the wire without encryption. That makes it really easy to intercept.
- Encryption at rest doesn't remove the necessity of using TLS/SSL (we will talk about that shortly).
Everything Else
Percona Server for MongoDB comes with additional features such as:
- LDAP Authentication
- Auditing
- Log Redaction
- In-Memory Storage Engine
- Hot Backup
All free and open source!
LDAP authentication
LDAP stands for Lightweight Directory Access Protocol; it is a common protocol used in companies to centralise their users in one piece of software. All the other connected software can validate a user and password through the LDAP client. Microsoft's version of LDAP is Active Directory. PSMDB features LDAP authentication, not authorization.
Auditing
For enhanced security and compliance, awareness of the operations the database is performing can be critical. With auditing, it's possible to track operations - such as user and index creation - at the database level.
Log Redaction
Logs can have sensitive data. Depending on regulations, certain information may not be allowed to be saved in a log file. Log redaction hides sensitive information, replacing the values with placeholder characters.
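As a config sketch, redaction is typically switched on with a single option in mongod.conf (option name as in the upstream docs; verify it against your PSMDB version):

```yaml
security:
  redactClientLogData: true
```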
In Memory Storage Engine
Low-latency storage engine that doesn't interact with the disk subsystem. Completely ephemeral: once the database stops, all the data is gone. Sub-millisecond latency; only for specific use cases.
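A hedged config sketch for selecting the engine (the size parameter name follows Percona's documentation; the 2GB value is illustrative):

```yaml
storage:
  engine: inMemory
  inMemory:
    engineConfig:
      inMemorySizeGB: 2
```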
PSMDB-Only Features
Hot Backup: This is a backup command that will generate an exact copy of the database (binary copy) in a different folder, in a very lightweight fashion.
> use admin
switched to db admin
> db.runCommand({createBackup: 1, backupDir: "/my/backup/data/path"})
{ "ok" : 1 }
Migration to PSMDB
PSMDB is compatible with MongoDB Enterprise and Community. Just replace the MongoDB Community binaries with PSMDB and you'll be all set. PSMDB can replace MongoDB Enterprise in place, except when the config file has enabled:
- security.kmip.* options (PSMDB uses security.vault.* instead)
- security.ldap.* options (PSMDB supports only saslauthd LDAP authentication as of v4.0.12-6)
PMM
PMM is an open-source platform for managing and monitoring MySQL and MongoDB performance and metrics. It is distributed as a Docker image, virtual appliances and an AWS AMI, and it is self-hosted.
https://www.percona.com/blog/2018/07/05/configuring-pmm-monitoring-mongodb-cluster/
PMM 2 is GA!
https://pmmdemo.percona.com/graph/
New in 4.2
Primary Throttle
"Flow control"
Primary Throttle
What happens when a replica set has:
- Uneven hardware?
- 'Noisy neighbours' in VM servers?
Primary server's capacity > secondary's capacity: no problem so long as load < secondary's capacity, but replication lag will grow when load goes above the secondary's capacity.
Long Replication Lag OK? Not OK?
Yes, if:
- You want fastest-possible writes at any time. (Use w:1)
But:
- Recognize that high-load, capacity-saturating times are the most likely times to have failovers.
- Accept that writes will be rolled back in those failovers. A lot of replication lag == a lot of rolled-back documents.
- If you use secondary reads: very stale data.
Long Replication Lag OK? Not OK?
"And if I say: 'No thanks'?"
- A. Use w:majority write concern
  - Latency for the client increases
  - More connections open simultaneously, awaiting client ⇄ P ⇄ S confirmations.
  - Capacity effectively throttled to the weakest server in the w:majority subset of the replica set.
MVCC, Transactions, Replication
MVCC Update
WiredTiger is an MVCC architecture. It supports transactions.
- Make a new version of the document
- Pin the old version until no client needs it.
Clean-up is performed asynchronously by the storage engine.
MVCC, Transactions, Replication
MongoDB 4.0 added multi-document, user-level transactions.
- Multiple clients can attempt to read or modify the same doc.
- The old document version is pinned until the slowest/latest client request referencing it finishes.
MongoDB 4.2 enables cluster-wide transactions.
Long transactions == long pin time == larger active cache.
When app client requests have conflicts: double work (or worse).
MVCC, Transactions, Replication
A secondary reading from the primary == another client. It pins old versions whilst its replication optime is older. At checkpoint time the newest committed docs are saved to the primary's disk. Unreplicated old document versions go into a separate "cache overflow" data structure (a.k.a. the lookaside table).
Cost: lift-and-move (+ disk flush) on a fraction of the per-minute write volume.
Impact of Secondary Lag on the Primary
Costs:
- Amount of active cache increases proportionally to pin time.
- Transaction pinning:
  ○ Best case: linear increase.
  ○ High doc conflicts: exponential increase.
- Replication lag factor at checkpoint:
  ○ Old doc versions go into "cache overflow."
  ○ Linear but high cost for a proportion of writes each minute.
  ○ Proportion ~= replication lag / flush interval (default 60s).
Impact of Secondary Lag on the Primary
If a checkpoint takes too long, latency suffers.
- No software 'locks'.
- But hardware resources are heavily utilized.
E.g. a five-second checkpoint could easily cause 1.0+ sec latency for clients.
Primary Throttle
A software throttle to prevent replication lag. Capping "cache overflow" work == capping the worst-case contention for I/O resources during checkpoints.
Primary Throttle
SERVER-37865: "Add mechanism to throttle writes on the primary"
Master branch commit: SERVER-39673
Tweaks: SERVER-40367, SERVER-39616, SERVER-39867
Code reading:
- mongo/db/storage/flow_control.cpp
- mongo/db/concurrency/flow_control_ticketholder.cpp
- mongo/db/storage/flow_control_parameters.idl
Primary Throttle
Default settings (new as of 4.2)
- enableFlowControl: true (SERVER-41340)
- flowControlTargetLagSeconds: 10
- flowControlThresholdLagPercentage: 0.5 (50%, i.e. 5 secs)
- flowControlMaxSamples: 1000000
- flowControlSamplePeriod: 1000 (ops)
- flowControlMinTicketsPerSecond: 100
(N.b. look for "Server Parameters", not "Configuration options", in the docs.)
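Since these are server parameters rather than top-level configuration options, overriding them in mongod.conf goes under setParameter. A sketch (values are the defaults listed above; verify parameter names against your version's docs):

```yaml
setParameter:
  enableFlowControl: true
  flowControlTargetLagSeconds: 10
```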
Primary Throttle
Observing: see the serverStatus flowControl section (on primaries only)

"flowControl" : {
    "enabled" : true/false,
    "targetRateLimit" : xxx,
    "timeAcquiringMicros" : xxx,   <- total time client writes were forced to wait for flow control
    "locksPerOp" : xxx,
    "sustainerRate" : xxx,
    "isLagged" : true/false,       <- true = in use this moment
    "isLaggedCount" : xxx,         <- 0 = not used since last restart
    "isLaggedTimeMicros" : xxx,    <- low = more or less unused
},
'Middle-Ground' Index Builds
Background Index Builds
In MongoDB <= 4.0 there was a "background" option for createIndex:

db.collection.createIndex({"x": 1}, {"background": true})

It took longer than the default mode ('foreground'), but didn't block other reads and writes. As of 4.2 the "background" option is deprecated and ignored.
4.2+ Index Build Summary
- Mechanism closer to the <= 4.0 background index build
  ○ But without a non-optimal index structure as a side effect
- Performance approx. as good as a 'foreground' index build if you guarantee writes to that collection are paused.
- Only brief blocking of other clients: once at the start and once at the end.
So what is removed?
- Forced blocking of other readers and writers (which let the index build complete as fast as possible)
CurrentOp Enhancements
Where is 'listConnections' Command?
A simple feature request ticket: 'Please give us a list-connections command'
Where is 'listConnections' Command?
No ball! Denied! Closed 11 mins after it was opened.
Seems like a fair request ... but listConnections isn't the right idea for a distributed database like MongoDB.
Egress User conns != Ingress DB conns
(diagram: applications → mongos nodes → shard replica-set nodes (P/S);
the sum of egress TCP socket counts at the applications >= the ingress TCP sockets at any single mongos or mongod node)
Egress User conns != Ingress DB conns
A mongod or mongos node does not receive information on all potential client connections. That would not be scalable. It only observes which TCP connections are currently open to it. A single TCP connection might be used for hundreds of different ops per second.
- Client's perspective: its connections to a shard mongod always appear open.
- Shard node's perspective: at any given moment only some fraction of clients are running ops on it.
So What is the Best / Right Command?
List running operations, not connections:

db.currentOp()
db.aggregate([{$currentOp: {}}])

But idle cursors awaiting the next "getMore" command were unlistable in <= 4.0. Fixed in 4.2:
Idle Cursor?
Client code:

var cursor = db.collection.find({ <many doc query> });
while (cursor.hasNext()) {
    print(cursor.next()._id)
}

Server ops:
- find: returns the first 101 documents + a server-side cursorId (doing nothing while the client iterates those 101 docs)
- getMore: returns the next 16MB of docs (doing nothing while the client iterates that next ~16MB of docs)
- getMore: returns the next 16MB of docs
- ...
Idle Cursor?
Client: has a query object that is still open and being read.
Server: finished running the initial find command and sent the first batch, or one of the subsequent getMore commands' batches. The server-side cursor is inactive/unused until the next "getMore" command.

db.aggregate([{$currentOp: {idleCursors: true}}])
User-auth'ed conns → __system conns
(diagram: applications authenticate to the mongos nodes as user-x, user-y, user-z; the mongos → shard connections run as the __system user)
Trust __system user
'Impersonated' user/role info is attached to op details for the auditing subsystem. v4.2 (SERVER-5261): the 'impersonated user' is now displayed as "runBy" in currentOp.
User-auth'ed conns → __system conns
Summary for currentOp Enhancements
DBAs occasionally need to answer: "What client apps are putting the load on this mongod node?" Don't analyze by connections: TCP sockets are not 1-to-1 with client sessions, and user info is not attached 1-to-1 there either. List by using currentOp:
- Operations (both currently running and yielding).
- Idle cursors (queries inactive server-side between the first find batch and successive getMore commands).
- Includes user and role info, as well as the client IP address.
Wildcard Indexes
Arbitrary Attributes? Prior Workaround
{
  "_id": ...,
  "sn": ...,
  "modelName": ...,
  "attributes": [
    { "k": "color", "v": "blue" },
    { "k": "capacity", "v": "3200mAh" }
  ]
}

// Make a compound index on (key name, value):
db.collection.createIndex({"attributes.k": 1, "attributes.v": 1})
db.collection.find({"attributes.k": "color", "attributes.v": "blue"})
db.collection.aggregate([
  {$match: {"attributes.k": "color"}},
  {$group: {"_id": "$attributes.v"}},
  {$project: {"color": "$_id"}}
])
Arbitrary Attributes? Prior Workaround
Despite using a document DB, you need to add a mini-ORM in the app:

// Read: unpack the attributes array into real properties
collection.find(...)
for item in myObj.attributes:
    myObj.setProperty(item.k, item.v)
delete myObj.attributes

// Write: pack non-schema properties back into the attributes array
for k in myObj.properties():
    if k not in fixedSchemaFields:
        myObj.attributes.append({"k": k, "v": myObj[k]})
        delete myObj[k]
collection.insert(myObj)
4.2 Wildcard Indexes
The query engine applies a facade over the same mechanism. In reality there is only one index structure, i.e. it is another special type like:
- Text indexes
- Multikey indexes (i.e. nested array fields)
- Geo indexes
4.2 Wildcard Indexes
createIndex specification:
- Match all fields: { "$**": 1 }
- Match only one nested object's fields: { "attributes.$**": 1 }
- Match/filter a subset: { "$**": 1, "wildcardProjection": {....} }
4.2 Wildcard Indexes
Not a standard index structure, so limitations apply:
- Cannot be used in a compound index
- Cannot be combined with hashed, TTL, 2d options
- Query or sort by only one of the wildcard fields.
- Can't query k: $exists
- Can't query k: {$ne: null}
- Can't test equality to non-scalar values.
If you need the above: continue using the mini-ORM workaround.
Modularization-Friendly Config Files
__rest, __exec External Config
Don't want to update config files? Container image lockdown, etc.? Add --configExpand [exec,rest] as a command-line arg, then:
What an exciting time to be alive! (especially for the security team ^_^; )

Before expansion:

storage:
  dbPath:
    __exec: "echo ${CURRENT_TESTDB_ROOT}"
    type: "string"
    trim: "whitespace"

After expansion:

storage:
  dbPath: /mnt/cvol_test20191002
Any Questions?
Rate My Session
We’re Hiring!