SLIDE 1

Percona Backup for MongoDB

Akira Kurogane Percona

SLIDE 2

2

3 - 2 - 1

MongoDB Community Edition, Percona Server for MongoDB, MongoDB Enterprise Edition

Replica Set Cluster Percona Backup for MongoDB

SLIDE 3

3

Elements of MongoDB Backups

SLIDE 4

MongoDB oplog

  • MongoDB has logical (not physical) replication.
  • Visible to db users in "local" db's oplog.rs collection.
  • User writes will be transformed to idempotent operations:

○ A write modifying n docs becomes n docs in the oplog, each with the "_id" value of the affected doc.
○ Relative modifications become absolute, e.g. {"x": {$inc: 1}} → {"$set": {"x": <newX>}} (see the example below).
○ Nested arrays are usually $set as a whole on every modification.

  • Transactions pack several ops together for a single apply time.
  • Secondaries apply oplog ops with the general-purpose "applyOps" command.
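
For example, a relative update in the "test" db and, roughly and with fields abridged, the absolute oplog entry it produces on a 4.0-era replica set (assuming x was 4 beforehand):

> db.coll.updateOne({"_id": 1}, {"$inc": {"x": 1}})
// resulting entry in local.oplog.rs (illustrative, abridged):
{ "ts": Timestamp(...), "op": "u", "ns": "test.coll", "o2": { "_id": 1 }, "o": { "$set": { "x": 5 } } }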

4

SLIDE 5

MongoDB oplog - Extra Use in Backups

A database dump has a phase of copying all collection documents. Let's say this takes m minutes.

  • The last dumped doc is as-of time (T).
  • The first dumped doc is as-of (T - m) mins.

Inconsistent! But easy fix to make all docs match time (T).

  • Get oplog slice for those m mins.
  • Replay the (idempotent) oplog on the dump.
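
With the stock tools, mongodump's --oplog option captures that oplog slice during the dump and mongorestore's --oplogReplay replays it afterwards, e.g.:

mongodump --oplog --out /backups/dump_T
mongorestore --oplogReplay /backups/dump_T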

5

SLIDE 6

Consistency (Replica Set)

All methods below provide consistent snapshots for replica sets:

  • Filesystem snapshot method

Storage engine's natural consistency

  • Stopped secondary

Storage engine's natural consistency

  • Dump method + oplog slice during copy

= reconstructable consistency as of the finish time.

All the DIY scripts and tools use one of the above. (But don't forget mongodump's --oplog option, and mongorestore's --oplogReplay, if scripting it yourself!)

6

SLIDE 7

Consistency (Cluster)

As for a replica set, but synchronized for all the replica sets in the cluster:

Config server replica set, as of tx
Shard 1 replica set, as of tx
Shard 2 replica set, as of tx
...

7

SLIDE 8

Consistency (Cluster)

Concept 'gotcha': simultaneous-for-everyone consistency is impossible. Network latencies to the shards create a relativity-like effect. Example: 2 clients; Far shards with 2ms RTT latency, Near shards with 0.2ms RTT.

  • -1.5 ms: reads initiated to the Far shards
  • -0.5 ms: reads happen on the Far shards
  • -0.1 ms: writes initiated on the Near shards
  • 0 ms: writes happen on the Near shards
  • +0.1 ms: writes confirmed by response
  • +0.5 ms: read results returned in response

Both observe the Near write before Far read. Asymmetric.

8

SLIDE 9

Consistency (Cluster)

Minimal client latency relativity effect per different point-in-time definitions:

  • Same wall-clock time by oplog

Clock skew + RTT.

  • Same time according to one client

RTT latency.

  • Single client's 'checkpoint' write

Perfect for that client; RTT to the others.

All are approximately the same accuracy, on the scale of milliseconds:

  • Very accurate by human response times.
  • Crude by storage engine op execution time.

9

SLIDE 10

Consistency (Cluster)

Minimal client latency relativity effect by point-in-time definitions:

  • Parallel filesystem snapshots

Snapshot op time + RTT. ("lvcreate -s ..." ~= several hundred milliseconds, in my experience.)

  • Hidden secondary snapshots

Shutdown time + RTT. (Node shutdown: typically several seconds, in my experience.)

10

SLIDE 11

Point-in-time Restores

A backup snapshot at time ts1, plus a copy of the oplog from <= ts1 to tx, allows restore to any point in time between ts1 and tx. Daily snapshots plus 24/7 oplog history allow PITR from the oldest snapshot time to now. Note:

  • Large write churn may be too much to stream to the backup store; in that case, give up PITR.
  • Since v3.6 need to skip some system cache collections:

config.system.sessions, config.transactions, etc.

11


SLIDE 12

Transactions - Restore Method

MongoDB 4.0 replica set transactions.

  • Appear as one composite oplog doc when the transaction completes.

Just replay it as soon as it is encountered when restoring.

MongoDB 4.2 distributed transactions:

  • In most situations the same as above (w/out 16MB limit).

Just replay as soon as encountered when restoring.

  • Only multi-shard transactions use new oplog format.
  • Distributed transaction oplog has separate docs for each op.
  • Buffer these and don't replay them until the "commitTransaction" doc is found.

12

SLIDE 13

13

Existing MongoDB Backup Tools

SLIDE 14

MongoDB Backup Methods (DIY)

mongodump / mongorestore: Simple ☑ Sharding ☒ Easy restore ☑ PITR ☒ S3 store ☒ HW cost $

r …: Simple ☒ Sharding ☑ Easy restore ☒ PITR ☒ S3 store ☒ HW cost $

Filesystem snapshots: Simple ☒ Sharding ☑ Easy restore ☒ PITR ☒ S3 store ☑ HW cost $

Hidden secondary: Simple ☑ Sharding ☑ Easy restore ☒ PITR ☒ S3 store ☑ HW cost $

14

SLIDE 15

MongoDB Backup Methods (PSMDB HB)

Percona Server for MongoDB has command for hot backup:

> use admin
> db.runCommand({createBackup: 1, <local dir or S3 store options>})
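
For example, backing up to a local directory (backupDir is the documented parameter for a local destination; the S3 store takes different options):

> db.runCommand({createBackup: 1, backupDir: "/data/backups/psmdb_hotbackup"})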

PSMDB Hot Backup (Non-sharded replica set): Simple ☑ Sharding ☒ Easy restore ☒ PITR ☒ S3 store ☑ HW cost $

PSMDB Hot Backup (Cluster): Simple ☒ Sharding ☑ Easy restore ☒ PITR ☒ S3 store ☑ HW cost $

(similar to a filesystem snapshot, but avoids the extra Unix admin work for LVM etc.)

15

New in v4.0.12-6

SLIDE 16

MongoDB Backup Methods (Tools)

MongoDB OpsManager (Paid license; closed source): Simple ☒ Sharding ☑ Easy restore ☑ PITR ☑ S3 store ☑ HW cost $$

mongodb-consistent-backup (Percona-Labs repo): Simple ☑ Sharding ☑ Easy restore ☑ PITR ☒ S3 store ☑ HW cost $

percona-backup-mongodb v0.5: Simple ☒ Sharding ☑ Easy restore ☑ PITR ☒ S3 store ☑ HW cost $

16

SLIDE 17

MCB; PBM v0.5

mongodb-consistent-backup

  • single script
  • single-server bottleneck

Not suitable for many-shard clusters.

percona-backup-mongodb v0.5:

  • pbm-agent

1-to-1 to mongod (copy bottleneck gone)

  • pbm-coordinator: Coordinator daemon to the agents
  • pbm CLI

"Simple ☒" because coordinator-to-agents is an extra topology

17

SLIDE 18

percona-backup-mongodb v1.0

percona-backup-mongodb v1.0

  • pbm-agent

1-to-1 to mongod

  • pbm-coordinator: Coordinator daemon to the agents
  • pbm CLI

18

Simple ☑ Sharding ☑ Easy restore ☑ PITR ☒ S3 etc. ☑ HW cost $

Now: manual PITR on a restored snapshot is OK. Fully automatic PITR is the next major feature on the dev roadmap.

SLIDE 19

19

Percona Backup for MongoDB v0.5 --> v1.0

SLIDE 20

pbm-coordinator (R.I.P.)

percona-backup-mongodb v0.5

  • pbm-agent

1-to-1 to mongod

  • pbm-coordinator: Coordinator daemon to the agents
  • pbm

20

Why kill the coordinator ...?

SLIDE 21

"Let's Have a Coordinator Daemon"

The backup oplog slices from the cluster's shards and configsvr must reach the same end time -> coordination is needed between the agents.

21

"So let's have a coordinator daemon. We just need:"

  • One or two more setup steps.
  • Extra authentication subsystem for agent <-> coordinators.
  • A few more ports open (== firewall reconfig).
  • New pbm commands to list/add/remove agents.
  • Users must first understand the coordinator-agent topology, which makes troubleshooting hard.
SLIDE 22

"New Idea: Let's Not!"

But how do we coordinate?

REQUIRED: Some sort of distributed server:

  • Already present on the MongoDB servers.
  • Where we can store and update config data.
  • Agents can listen for messages as a stream.
  • Has an authentication and authorization system.
  • Agents can communicate without firewall issues.
  • Automatic failover would be a nice-to-have.
  • ...

22

SLIDE 23

Coordination Channel = MongoDB

pbm sends a message by updating a pbm command collection; pbm-agents update their status likewise.

  • Already present on the MongoDB servers (duh!)
  • Store and update config data in admin.pbm* collections.
  • Agents listen for commands using a MongoDB change stream (see the sketch below).
  • Use the MongoDB authentication and role-based access control.
  • Agents connect only to mongod hosts so no firewall reconfig needed.
  • Automatic failover provided by MongoDB's replication.
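
As a sketch of that change-stream listening (illustrative mongo shell only, not PBM's actual code; the command document shape here is hypothetical):

// pbm CLI side: write a command document (hypothetical shape)
db.getSiblingDB("admin").pbmCmd.insertOne({ cmd: "backup", requestedAt: new Date() })
// pbm-agent side: pick up new commands via a change stream
var watchCursor = db.getSiblingDB("admin").pbmCmd.watch();
while (!watchCursor.isExhausted()) {
  if (watchCursor.hasNext()) { printjson(watchCursor.next()); }   // react to the command here
}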

23

SLIDE 24

PBM's Collections (as of v1.0)

  • admin database:

○ pbmCmd: the trigger (and state) of a backup or restore
○ pbmConfig: remote store location and access credentials
○ pbmBackups: backup status
○ pbmOp: coordination locks
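
To peek at these from the mongo shell (collection names as above; the document shapes are PBM-internal and may change between versions):

> use admin
> db.pbmConfig.find().pretty()
> db.pbmBackups.find().pretty()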

24

SLIDE 25

Lose DB cluster, Lose Backup System?

Q: If the cluster (or non-sharded replica set) is gone, how can the pbm command-line tool communicate with the agents?

A: It can't. In the event of a complete loss / rebuild of the servers:

  • Start a fresh, empty cluster with same RS names.
  • Create the pbm mongodb user with backup/restore role.
  • Re-insert the remote-store config (S3 URL, bucket, etc).
  • "pbm list" --> backups listed by timestamp.
  • Restart the pbm-agent processes.
  • "pbm restore <yyyymmdd_hhmmss>".

25

SLIDE 26

26

Demonstration

SLIDE 27

Demonstration

27

pbm --help
pbm [--mongodb-uri ...] set store --config <S3_config.yaml>
pbm-agent --mongodb-uri mongodb://user:pwd@localhost:port/
pbm [--mongodb-uri ...] backup
(aws s3 ls s3://bucket/...)
pbm [--mongodb-uri ...] list
pbm [--mongodb-uri ...] restore <yyyymmdd_hhmmss>
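
An illustrative <S3_config.yaml> (assumed key names for a v1.x-era config; the exact schema varies by PBM version, so check the PBM documentation for your release):

storage:
  type: s3
  s3:
    region: us-east-1
    bucket: pbm-backups-bucket
    credentials:
      access-key-id: "<AWS_ACCESS_KEY_ID>"
      secret-access-key: "<AWS_SECRET_ACCESS_KEY>"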

SLIDE 28

28

Looking Ahead

SLIDE 29

Coming Features

29

  • Point-in-time restore.
  • pbm status, pbm log.
  • Distributed transaction oplog handling.
SLIDE 30

Point-in-time Restore

Agents already copy a variable-length slice of the oplog for cluster snapshots.

30

"Snapshot" time == min(oplog slice finish times) == 0 ~ few secs after slowest data-copy end time

  • Agents replay oplog slices only to that snapshot time.
  • (Parallel application in each shard and configsvr RS).

[Diagram: per-replica-set timelines (configsvr, shard2, shard3) of data copy and oplog slice, with the common snapshot time marked.]

SLIDE 31

Point-in-time Restore

31

Let's use the same oplog capture and replay functionality. To come as next main feature in PBM:

  • Option to add oplog capture 24/7 to enable PITR.
  • After restoring a backup snapshot taken at time ts, replay the oplog from ts to tx.
  • (Parallel application in each shard and configsvr RS).

[Diagram: data copy plus 24/7 oplog copy, with the oplog replayed from ts to tx after restore.]

SLIDE 32

Point-in-time Restore

32

Manual PITR is already possible on top of a PBM v1.0-restored backup if:

  • The cluster isn't already erased, and
  • The oplog(s) start before that backup's time.

Method:

  • 1. Dump the oplog(s) elsewhere before doing "pbm restore"
  • 2. Use mongorestore --oplogReplay --oplogFile ....
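
For example (an illustrative sketch only; hosts, paths and the oplog cut-off are placeholders, it must be done per replica set, and the blog post linked below walks through the details):

mongodump --host <rs-member> -d local -c oplog.rs -o /safe/place        (step 1, before "pbm restore")
mongorestore --host <rs-member> --oplogReplay --oplogFile /safe/place/local/oplog.rs.bson --oplogLimit <timestamp> /an/empty/dir        (step 2, replays only the oplog up to the target time)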

https://www.percona.com/blog/2019/07/05/mongodb-disaster-snapshot-restore-and-point-in-time-replay/

SLIDE 33

User Interface

33

pbm status: Show the progress of running backups
pbm log: Centralized agent log display

SLIDE 34

Transaction Consistency Now

34

Transaction consistency supported by PBM so far (v0.5, v1.0):

  • 4.0 Replica set transactions.
  • 4.2 Single shard-affecting transactions.

Mechanism for these transactions:

  • MongoDB creates single oplog doc at commit time.
  • Transaction's write ops wrapped in a nested "applyOps" array.
  • Just apply as the next op, like classic oplog mechanism.

Not unique to PBM. mongorestore can restore these too.

SLIDE 35

35

{ "ts" : Timestamp(1567058020, 1), ... "op" : "c", "ns" : "admin.$cmd", ... "txnNumber" : NumberLong(2), ... "o" : { "applyOps" : [ { "op" : "i", "ns" : "test.baz", "ui" : UUID("54b05710-ee45-4cca-9bd1-63b749ed6557"), "o" : { "_id" : ObjectId("5d676859138f17a8d8a27bb8") } }, { "op" : "i", "ns" : "test.bar", "ui" : UUID("5c65df08-da5e-4ef8-8bb0-27bfa3b50c80"), "o" : { "_id" : ObjectId("5d67685f138f17a8d8a27bb9") } } ] } }

SLIDE 36

4.2 Distributed Transactions

36

Transactions not supported so far (<= v1.0)

  • 4.2 Multiple shard-affecting transactions.

Mechanism:

  • Transaction ops written separately ({..., "txnNumber": ..., "o": {..., "prepare": true}}).
  • Don't apply immediately. Buffer in chain for that txn.
  • Apply all when the 'commitTransaction' doc is reached (see the sketch below).
  • Discard buffered ops if 'abortTransaction', or if replay simply finishes.
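
A minimal sketch of that buffering logic (illustrative pseudo-JavaScript, not PBM's implementation; oplogSlice, key() and applyOp() are hypothetical helpers):

var pending = {};                                  // txn key (lsid + txnNumber) -> buffered entries
oplogSlice.forEach(function (entry) {
  var k = key(entry);                              // hypothetical helper
  if (entry.o && entry.o.prepare) {
    (pending[k] = pending[k] || []).push(entry);   // hold prepared ops, don't apply yet
  } else if (entry.o && entry.o.commitTransaction) {
    (pending[k] || []).forEach(applyOp);           // apply the whole buffered chain now
    delete pending[k];
  } else if (entry.o && entry.o.abortTransaction) {
    delete pending[k];                             // discard buffered ops
  } else {
    applyOp(entry);                                // ordinary op: apply immediately
  }
});
// anything still buffered when the replay finishes is simply discarded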
SLIDE 37

37

{ "ts" : Timestamp(1567134752, 2), ... "op" : "i", "ns" : "config.transaction_coordinators ", ..., "o" : { "_id" : { "lsid" : { "id" : UUID("995ad9a8-9d95-43c5-acbe-1a987df4fc95"), "uid" : BinData(0,"kanlvzjTP1bYGUTMfQK71txdM8LpbSXTMtQ+b8M4WTA=") }, "txnNumber" : NumberLong(0) }, "participants" : [ "s2rs", "testrs" ] } } { "ts" : Timestamp(1567134752, 3), ... "op" : "c", "ns" : "admin.$cmd", ... "txnNumber" : NumberLong(0), ... "o" : { "applyOps" : [ { "op" : "i", "ns" : "test.baz", "ui" : UUID("e68e7aba-46e2-4ecd-818a-5c8e5a1b8ef4"), "o" : { "_id" : ObjectId("5d689411858632a838de0861") } } ], "prepare" : true } } { //On OTHER SHARD "ts" : Timestamp(1567134752, 3), ... "op" : "c", "ns" : "admin.$cmd", ... "txnNumber" : NumberLong(0), ... "o" : { "applyOps" : [ { "op" : "i", "ns" : "test.bar", "ui" : UUID("fa769194-1b8c-4704-a50b-56bef326e341"), "o" : { "_id" : ObjectId("5d68941b858632a838de0862") } } ], "prepare" : true } } { "ts" : Timestamp(1567134752, 4), ... "op" : "u", "ns" : "config.transaction_coordinators ", ... "o2" : {...}, "o" : { "_id" : { "lsid" : { "id" : UUID("995ad9a8-9d95-43c5-acbe-1a987df4fc95"), "uid" : BinData(0,"kanlvzjTP1bYGUTMfQK71txdM8LpbSXTMtQ+b8M4WTA=") }, "txnNumber" : NumberLong(0) }, "participants" : [ "s2rs", "testrs" ], "decision" : { "decision" : "commit", "commitTimestamp" : Timestamp(1567134752, 3) } } } { //On BOTH SHARDS "ts" : Timestamp(1567134752, 5), ... "op" : "c", "ns" : "admin.$cmd", ... "txnNumber" : NumberLong(0), ... "o" : { "commitTransaction" : 1, "commitTimestamp" : Timestamp(1567134752, 3) } } { "ts" : Timestamp(1567134752, 6), ... "op" : "d", "ns" : "config.transaction_coordinators ", ... "o" : { "_id" : { "lsid" : { "id" : UUID("995ad9a8-9d95-43c5-acbe-1a987df4fc95"), "uid" : BinData(0,"kanlvzjTP1bYGUTMfQK71txdM8LpbSXTMtQ+b8M4WTA=") }, "txnNumber" : NumberLong(0) } } }

SLIDE 38

4.2 Distributed Transactions

38

Backup tools supporting 4.2 distributed transactions as of now (needed only if your backup snapshot time bisects multi-shard transactions):

  • MongoDB Ops Manager v4.2 ☑
  • mongodump + mongorestore

  • Filesystem snapshot method

  • Percona Backup for MongoDB v1.0 ☒

Roadmap: Percona Backup for MongoDB to be PITR ☑ in v1.2.