Continuous MySQL Restores Divij Rajkumar (divij@fb.com) Production - - PowerPoint PPT Presentation

continuous mysql restores
SMART_READER_LITE
LIVE PREVIEW

Continuous MySQL Restores Divij Rajkumar (divij@fb.com) Production - - PowerPoint PPT Presentation

Continuous MySQL Restores Divij Rajkumar (divij@fb.com) Production Engineer, MySQL Infrastructure, Facebook Continuous Restores Why? Verify backup integrity Havent tested your backups? You dont have them Understand resource


slide-1
SLIDE 1
slide-2
SLIDE 2

Divij Rajkumar (divij@fb.com)

Production Engineer, MySQL Infrastructure, Facebook

Continuous MySQL Restores

slide-3
SLIDE 3
  • Verify backup integrity
  • Haven’t tested your backups? You don’t have them
  • Understand resource requirements for restores
  • Time, server capacity, network capacity
  • Test restore orchestration
  • Don’t panic during a disaster event

Continuous Restores – Why?

slide-4
SLIDE 4

Backups

slide-5
SLIDE 5
  • Everything, every day

Backups

slide-6
SLIDE 6

server

shar d shar d shar d shar d shar d shar d

mysq l

shar d shar d shar d shar d shar d shar d

mysq l

Shard 86 User 1 User 2 . . User 1000

slide-7
SLIDE 7
  • Everything, every day
  • Tens of thousands of servers, many, many shards
  • Stored in HDFS
  • 3 types:
  • Full dumps
  • Differential backups
  • Binary Logs

Backups

slide-8
SLIDE 8
  • mysqldump
  • --single-transaction
  • --set-gtid-purged=COMMENTED, stored in metadata
  • github.com/facebook/mysql-5.6
  • Per shard, not per instance
  • Shards aren’t pinned to an instance
  • Easier to locate

Backups – Full dumps

slide-9
SLIDE 9
  • Logical, not physical
  • Easy single-table restore
  • No indexes backed up, reduced size
  • Easy to debug
  • Better compression
  • Every 5 days

Backups – Full dumps

slide-10
SLIDE 10
  • Diff between 2 full dumps
  • 2 files generated - Rows deleted, Rows inserted
  • Based off of existing dump in HDFS
  • Take full dump -> compute diff -> upload diff

Backups – Differential backups

slide-11
SLIDE 11

Backups – Differential backups

slide-12
SLIDE 12
  • Diff between 2 full dumps
  • 2 files generated - Rows deleted, Rows inserted
  • Based off of existing dump in HDFS
  • Take full dump -> compute diff -> upload diff
  • Possible because of logical full dumps

Backups – Differential backups

slide-13
SLIDE 13
  • Raw binlogs from mysql
  • Record Previous-GTIDs in metadata
  • Using mysqlbinlog:
  • --start-position=4
  • --stop-position=121
  • Continuously uploaded
  • Binlog Server to interface

Backups – Binary Logs

slide-14
SLIDE 14
  • Everything, every day
  • Tens of thousands of servers, many, many shards
  • Stored in HDFS
  • 3 types:
  • Full dumps
  • Differential backups
  • Binary Logs

Backups

slide-15
SLIDE 15

Restore Infrastructure

slide-16
SLIDE 16

Warchief

OR C

ORC DB Peon

MySQ L

Syn c Schedul e

CRT

Reque st

Restore Infrastructure

Peon

MySQ L

Peon

MySQ L

slide-17
SLIDE 17

ORC - Peons

slide-18
SLIDE 18

Restore Infrastructure - Peons

server

restored db restored db restored db

mysq l

restored db restored db restored db restored db

mysq l

peo n peo n

  • Python process
  • Exposes Thrift

interface

  • Manages MySQL

Instance

  • Multiple peons per host
  • Work on restore jobs
  • 1 database == 1 job
slide-19
SLIDE 19

Restore Jobs – State Machine

  • 1. SELECT – Select backup to restore
  • 2. DOWNLOAD – Download backup to disk
  • 3. LOAD – Load tables in parallel
  • 4. VERIFY – Verify loaded full/diff dump
  • 5. REPLAY – Replay binlogs

+ FAILED states

slide-20
SLIDE 20

Restore Jobs – mysqldump index

  • Custom index added to mysqldump output
  • Locate tables using per-table byte offsets
  • Preserved across diff backups
slide-21
SLIDE 21

Restore Jobs – SELECT

  • Pre-defined search range per job
  • 1 day == restore most recent backup only
  • Use metadata to ignore “bad” backups
  • Only full/diff backups selected here
slide-22
SLIDE 22

Restore Jobs – DOWNLOAD

  • For full dumps:
  • Download mysqldump from HDFS to disk
  • For diff backups:
  • Stream diffs + full dump to perform 3-way merge
  • Store recreated mysqldump to disk
  • Single mysqldump output at the end
slide-23
SLIDE 23

Restore Jobs – LOAD

  • Split backup into per-table streams using index
  • Load tables in parallel
slide-24
SLIDE 24

Restore Jobs – VERIFY

  • Perform sanity checks on loaded data
  • Possible verification methods:
  • Compare checksums
  • Compare number of tables loaded
  • Best verification: replay binlogs
slide-25
SLIDE 25

Restore Jobs – REPLAY

  • Replay X seconds/minutes/hours worth of binlogs
  • Enables point in time restores
  • Verify loaded data can be written to
slide-26
SLIDE 26

Restore Jobs – State Machine

SELECT DOWNLOAD LOAD TABLE LOAD TABLE LOAD TABLE VERIFY REPLAY LOAD TABLE LOAD TABLE

slide-27
SLIDE 27

Restore Jobs – Pipeline

  • Allow processing many jobs concurrently
  • Peon has 1 handler per job state
  • Limit per-state concurrency
slide-28
SLIDE 28

Restore Jobs – Pipeline

DOWNLOA D LOA D LOAD TABLE VERIF Y REPLA Y SELEC T

SLOT 1 SLOT 2 SLOT 3 SLOT 4 SLOT 5

slide-29
SLIDE 29

db1340 db2362 db142 db7 – t1 db7 – t2 db982 – t3 db982 – t9 db7 – t5 db7 db982

DOWNLOA D LOA D LOAD TABLE VERIF Y REPLA Y

db1963

SELEC T

db3794 db165 SLOT 1 SLOT 2 SLOT 3 SLOT 4 SLOT 5 db651 db1000

Restore Jobs – Pipeline

slide-30
SLIDE 30

Restore Jobs – Pipeline

  • Allow processing many jobs concurrently
  • Peon has 1 handler per job state
  • LOAD -> LOAD + LOAD_TABLE
  • Limit per-state concurrency
  • Slot capacity can be tuned
slide-31
SLIDE 31

Restore Jobs – Selecting Binlogs

  • Full/diff backups can be taken from slaves
  • Binlogs always taken from master
  • Need to uniquely identify transactions…
  • GTIDs!
slide-32
SLIDE 32

Restore Jobs – Selecting Binlogs

  • Use GTIDs purged from mysqldump:
  • --set-gtid-purged=COMMENTED
  • Use GTIDs purged from mysqlbinlog:
  • --start-position=4
  • --stop-position=121
  • These are GTIDs purged, not GTIDs contained
slide-33
SLIDE 33

Restore Jobs – Selecting Binlogs

  • GTIDs purged from LOAD state (dump_gtids):
  • GTIDs purged in binlogs (binlog_gtids):

Contains 451-529 Contains 530-774 Contains 775-?

First superset First binlog to replay

slide-34
SLIDE 34

Restore Jobs – Replaying Binlogs

  • Filter binlog events with mysqlbinlog:
  • --database
  • --skip-gtids
  • --skip-empty-trans
  • github.com/facebook/mysql-5.6
  • Last transaction to replay:
  • --stop-datetime
slide-35
SLIDE 35

db100 – t1 db100 – t2 db100 –t3

Restore Job – Lifecycle

DOWNLOA D LOA D LOAD TABLE VERIF Y REPLA Y SELEC T

SLOT 1 SLOT 2 SLOT 3 SLOT 4 SLOT 5 db100

slide-36
SLIDE 36

CRT

slide-37
SLIDE 37

Warchief

OR C

ORC DB Peon

MySQ L

Syn c Schedul e

CRT

Reque st

Restore Infrastructure

Peon

MySQ L

Peon

MySQ L

slide-38
SLIDE 38

CRT – Continuous Restore Tier

  • Create restore jobs for shards
  • Periodically poll backup metadata to find backups
  • Thrift call to Warchief to issue restore job
  • Monitor restore progress
  • Peons expose job and health stats via Thrift
  • Detailed stats written to Scuba
slide-39
SLIDE 39

CRT – Continuous Restore Tier

  • Failure handling
  • Retry on transient errors
  • Mark “unstable”
  • Notify oncall
  • Orchestration component
  • Resolve dependency graph
  • Request restores in order of dependency
slide-40
SLIDE 40

On-demand Restores

slide-41
SLIDE 41

“oops, I accidentally ran UPDATE without WHERE”

slide-42
SLIDE 42

“oops, I thought I was on my dev database”

slide-43
SLIDE 43

“oops”

slide-44
SLIDE 44

Warchief

OR C

ORC DB Peon

MySQ L

Syn c Schedul e

CRT

Reque st

Restore Infrastructure

Peon

MySQ L

Peon

MySQ L

UI/CLI

Reque st

slide-45
SLIDE 45

Discover Edge Cases

slide-46
SLIDE 46

War Stories

slide-47
SLIDE 47

War Stories – Collation Change

  • COLLATE=latin1_bin
  • Required for MyRocks
  • Schema change deployed
  • Most diff backup restores fail overnight
  • Thanks CRT Monitoring!
slide-48
SLIDE 48

200x increase in DOWNLOAD_FAILED !

slide-49
SLIDE 49

War Stories – Collation Change

  • DifferentialBackup used default collation
  • Sorting order broken in diff files
  • Consistent sorting is required for 3-way merge
  • 3-way merges failed for affected backups
  • Wouldn’t have been caught without continuous

restores

  • DifferentialBackup fixed to understand table collation
slide-50
SLIDE 50

War Stories – RBR Binlogs

  • Row-based replication deployed
  • Lots of REPLAY failures overnight

“The database used for the current transaction

has changed since BEGIN. This is not supported!”

  • 2 common patterns:
  • Shards belonged to same database tier
  • Failure only happened after master promotion
slide-51
SLIDE 51

War Stories – RBR Binlogs

  • SBR binlogs:
  • Each BEGIN contains session database
  • RBR binlogs:
  • On master, each BEGIN contains session database
  • On slave, BEGIN has no session database
slide-52
SLIDE 52

War Stories – RBR Binlogs

  • With mysqlbinlog --skip-gtids --skip-empty-trans
  • Expected: session databases don’t change across events
  • Binlogs not rotated after master promotion
  • For some part of binlog, instance is a slave
  • For some part of binlog, session database is empty
  • For some part of binlog, session databases don’t match

“The database used for the current transaction has changed since BEGIN. This is not supported!”

slide-53
SLIDE 53

War Stories – RBR Binlogs

  • Possible solutions:
  • Rotate binlogs during promotion
  • Fix MySQL server behaviour
  • Patch mysqlbinlog
  • ???
slide-54
SLIDE 54

Divij Rajkumar (divij@fb.com)

Thank s!

slide-55
SLIDE 55