Backup and Recovery Strategy About Stacy 10+ years of experience on - - PowerPoint PPT Presentation
Backup and Recovery Strategy About Stacy 10+ years of experience on - - PowerPoint PPT Presentation
Backup and Recovery Strategy About Stacy 10+ years of experience on various flavors of relational databases. Focus on performance tuning, code reviews, database deployment and infrastructure management for MySQL In her spare time, she
About Stacy
▪ 10+ years of experience on various flavors of relational databases. ▪ Focus on performance tuning, code reviews, database deployment and infrastructure management for MySQL ▪ In her spare time, she enjoys reading books and doing some volunteer work.
About Kushal
▪ 4+ years of experience on relational databases. ▪ Focus on infrastructure management for MySQL and Oracle RAC, code reviews, troubleshooting and capacity planning ▪ In spare time, like to play indoor and outdoor games
MySQL at Yahoo
▪ MySQL powers many mission-critical products at Yahoo ▪ Different configurations based on product requirements ▪ Products: Yahoo Sports, Flickr, Finance, Tumblr, Daily Fantasy, Ad System, YDS ▪ MySQL Flavor: Percona Server 5.5 to 5.6 including XtraDB Cluster ▪ Operating systems : RHEL 6.x
What are the next 50 minutes about?
§ Backups
- XtraBackup and its options
§ Recovery
- File based
- GTID enabled
- MTS without GTID
§ Backup Monitoring § Multi Source Replication Setup
Backups
§ Why we need backup?
MySQL Database Backup Method
- Hot Backup
§ mysqldump § MySQL Enterprise Backup § Xtrabackup § File System
- Cold Backup
§ File System
Why go for XtraBackup
XtraBackup is an open-source hot backup utility
§ Optimized
- Uses posix_fadvise() not to cache data in memory
- Perform more aggressive read-ahead optimizations on the source files
§ Reliable
- Checks for page corruption on each page with InnoDB’s buf_page_is_corrupted() function.
§ No interruption for writes
Xtrabackup Improvements
Percona MySQL 5.6.16-64+ use backup locks instead of “FLUSH TABLES WITH READ LOCK “ It is a light-weight lock.
- Block updates to non-transactional tables and
DDL statements for all tables
- New backup lock will not flush tables
Storage engines are not forced to close tables and tables are not expelled from the table cache.
- Only waits for conflicting statements to complete (i.e. DDL and updates to non-
transactional tables). It never waits for SELECTs, or UPDATEs to InnoDB tables to complete
https://www.percona.com/doc/percona-server/5.6/management/backup_locks.html
XtraBackup
PRESENTED BY Brian Koch ⎪ March 2014
- Records the log sequence number(LSN) when it starts
- Copies all InnoDB files which is internally in inconsistent state
- Read transaction log files to create its own change log file
- Backup locks
- Use LOCK TABLES FOR BACKUP to copy non-InnoDB files
- Use Lock BINLOG FOR BACKUP to block operation and gets binary log coordinates
- Apply xtrabackup logs to make data files consistent state
Yahoo MySQL Database Architecture
▪Dual masters single writer setup ▪Each master carries one or more slave
VIP for Write Prod master BCP master Prod slave BCP slave VIP for Read-only Backup server
Current Backup Strategy
PRESENTED BY Brian Koch ⎪ March 2014
- Full backup on Sunday and incremental backup from Monday to Saturday
- The backups are stored in external storage volume
- The backup job runs on dedicated slave or prod slave
- The binary logs are saved periodically
Prod master Dedicated slave Xtrabackup job Backup filer Copy binary logs
§ xtrabackup.addl_args: --slave-info § xtrabackup.backup_dir: /mnt/backup/ § xtrabackup.cron_bkup_ret_hour: 0 § xtrabackup.cron_bkup_ret_min: 0 § xtrabackup.cron_bkup_ret_wday: 1 § xtrabackup.retain_extra_days: 14 § xtrabackup.cron_day: * § xtrabackup.cron_hour: 7 § xtrabackup.cron_min: 0 § xtrabackup.cron_wday: 0 § xtrabackup.incremental: --incremental § xtrabackup.cron_incremental_day: * § xtrabackup.cron_incremental_hour: 7 § xtrabackup.cron_incremental_min: 0 § xtrabackup.cron_incremental_wday: 1-6 § xtrabackup.crontab: on § xtrabackup.email: page-oncall-email § xtrabackup.backup_vol_threshold: 80 § xtrabackup.backup_user: root § xtrabackup.backup_password: <path to pw file>
Xtrabackup Wrapper
Xtrabackup Options
Options Comments slave-info
Use it when taking backup is from the slave. Record its master's binary log file name and its position in the file xtrabackup_slave_info
parallel
It specifies the number of threads the xtrabackup child process should use to back up files concurrently, speed up backup process
compress
It instructs xtrabackup to compress backup copies of InnoDB data files.
use-memory
Speed up the apply log process
safe-slave- backup
stop slave SQL thread before running FLUSH TABLES WITH READ LOCK and wait to start backup until Slave_open_temp_tables in SHOW STATUS is zero.
incremental
Instructs xtrabackup to take incremental backup instead of fullbackup
Xtrabackup with Network Streaming
PRESENTED BY Brian Koch ⎪ March 2014
Target mkdir /home/tmp/backup
screen -S backup nc -l 9999 | tar -ixf - -C /home/tmp/backup
Source screen -S backup
sudo innobackupex --user=root –password=** --slave-info --stream=tar ./ | nc <target> 9999
Target The backups will be in target:/home/tmp/backup
sudo innobackupex --apply-log --use-memory=16G --parallel=8 /home/tmp/backup sudo innobackupex --move-back /home/tmp/backup sudo chown -R mysql:mysql /home/y/…/mysql start mysql_server
Backup Option: --compress
PRESENTED BY Brian Koch ⎪ March 2014
§ Saved backup space 2/3 § Take more time to complete backup § Take less time if multiple compress-threads are used
Database Server Full backup size Compresse d size Full bkup time Compressed bkup time Compress- thread db1 1.3 TB 360 GB 220 mins 255 mins 1 db2 1.3 TB 360 GB 220 mins 180 mins 8
Database Restore Comparison
PRESENTED BY Brian Koch ⎪ March 2014
DB Backup size Transfer backup to target server Decompress Apply log Total time Space saved Time saved
Full backup 1.3TB 210 mins 15 mins 225 mins Full compressed backup 360GB 75 mins 65 mins 15 mins 155 mins ~70% ~30%
The Issues Could Happen
PRESENTED BY Brian Koch ⎪ March 2014
- The database that has a lot of partitions could run into
Error number 24 means 'Too many open files’ Added “ulimit -n 65535“ in the backup script
- Transaction logs could be small
xtrabackup: error: log block numbers mismatch: xtrabackup: error: expected log block no. 98032273, but got no. 102226569 from the log file. xtrabackup: error: it looks like InnoDB log has wrapped around before xtrabackup could process all records due to either log copying being too slow, or log files being too small. xtrabackup: Error: xtrabackup_copy_logfile() failed.
Increase transaction log size or take backup from off-peak time.
CAVEATS
PRESENTED BY Brian Koch ⎪ March 2014
- The backup folder should be dedicated.
If it contains other folders or other files, the incremental backup may not be right.
- The backup purging time and backup time shall be different.
- Information_schema is not backed up. So the table create time stamp could
be lost.
CAVEATS --- continued
▪ event master : status is ENABLED slave: status is SLAVESIDE_DISABLED ▪ Master and slave are NOT same Taking backup from master to rebuild slave, event is enabled causes issue
Create event cross all database servers
▪ Login to each server
set sql_log_bin=0; create event event_name …. BEGIN IF @@global.read_only=0 THEN call del_test_data(); end if; END
Ensures master and slave identical
Backup and Recovery Strategy -- continue
Take Database Backup
PRESENTED BY Brian Koch ⎪ March 2014
0 12 * * 1 root find /backup/dbbk -maxdepth 1 -type d -ctime +14 | xargs rm -rf 0 0 * * 0 root /home/y/bin/innobackupex_y 0 0 * * 1-6 root /home/y/bin/innobackupex_y --incremental
Point-Time-Recovery
PRESENTED BY Brian Koch ⎪ March 2014
Assume there are compressed backups:
full backup: /mnt/backup/saca/2016-04-07_05-05-01 Incremental backup: /mnt/backup/saca/2016-04-08_05-00-00
- 1. Transfer backups to the restore server
- 2. Decompress the backups first
§ sudo innobackupex /home/syuan/backup/2016-04-07_05-05-01 --use-memory=32G --decompress --parallel=16 § sudo innobackupex /home/syuan/backup/2016-04-08_05-00-00 --use-memory=8G
- -decompress --parallel=8
- 3. Apply logs
§ sudo innobackupex --apply-log --redo-only /home/syuan/backup/2016-04-07_05-05-01 --use-memory=32G -- parallel=16 § sudo innobackupex --apply-log --redo-only /home/syuan/backup/2016-04-07_05-05-01 --incremental- dir=/home/syuan/backup/2016-04-08_05-00-00 --use-memory=8G
- -parallel=8
§ sudo innobackupex --apply-log /home/syuan/backup/2016-04-07_05-05-01 --use-memory=32G --parallel=16
Point-Time-Recovery - continued
PRESENTED BY Brian Koch ⎪ March 2014
- 4. Move the backups into right directory
sudo innobackupex --move-back /home/syuan/backup/2016-04-07_05-05-01
- 5. Start database
- sudo chown -R mysql:mysql /home/y/var/mysql
- yinst start mysql_server
- 6. Update root password
- 7. mysqlbinlog
- stop-datetime
- stop-position
Slave Rebuild
PRESENTED BY Brian Koch ⎪ March 2014
Get the replication coordinates
- If the backup is taken from the master, check the file xtrabackup_binlog_info in the backup folder
- If the backup is from slave, please check the file xtrabackup_slave_info
Set up the replication
CHANGE MASTER TO MASTER_HOST='master-host-name.yahoo.com', MASTER_USER='replicant', MASTER_PASSWORD='pass-word', MASTER_LOG_FILE='mysqld-bin.004464', MASTER_LOG_POS=49402644; START SLAVE; SHOW SLAVE STATUS\G
Slave Rebuild - continued
PRESENTED BY Brian Koch ⎪ March 2014
The binary logs that is needed have been purged.
- 1. Transfer the needed binary logs to the restore server
- 2. Use mysqlbinlog utility to apply them to the database.
mysqlbinlog --start-position=49402644 mysqld-bin.004464 mysqld-bin.004465 .......| mysql -u root –p
Caution: If data contains BLOB, please use binary-mode Using one db connection to process all binary logs to avoid temporary table issue.
Alternative Way to Apply Binary Logs
PRESENTED BY Brian Koch ⎪ March 2014
- Set up a fake master or binlog server
- Copy all necessary binary logs to the fake master
- Make sure the system recognize all these binary logs
Edit file names in mysqld-bin.index Restart MySQL
- Set up replications between this master and restore server
- Change master to
………
Advantages
PRESENTED BY Brian Koch ⎪ March 2014
Much safer and robust.
- Recover speed is much faster
20% faster comparing with using mysqlbinlog utility
- Best way for partial replication database to be recovered
Add ignore, replicate-do-db in my.cnf, start replication
Clone Replication Slave
PRESENTED BY Brian Koch ⎪ March 2014
- Requirements for my.cnf
relay_log_info_repository = TABLE master_info_repository = TABLE relay_log_recovery = 1
- Xtrabackup option slave-info
Take xtrabackup from the sever, restore it in target server Start slave; The replication is automatically enabled.
Start Slave Issues
PRESENTED BY Brian Koch ⎪ March 2014
- Start slave fails
Error log: Slave failed to initialize relay log info structure from the repository
Explanation: Relay log info is messed up.
reset slave all; change master to … start slave;
Start Slave Issues - GTID
Enable Replication in Slave Issue
mysql> SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533'; ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
How to fix
▪
RESET MASTER; ▪ SET GLOBAL gtid_purged='ffee1ff8-363f-11e5-af47-9cb654954cac:1-29123533’; ▪ CHANGE MASTER TO MASTER_HOST="mastername", master_user='rep_user', master_password='rep_password', MASTER_AUTO_POSITION = 1; ▪ START SLAVE;
Special backup case – MTS without GTID enabled
PRESENTED BY Brian Koch ⎪ March 2014
- Can not add --slave-info in MTS without GTID.
- How to get the consistent backup
1. Normal way: STOP SLAVE; START SLAVE UNTIL SQL_AFTER_MTS_GAP; 2. The alternative way: Backup the database as normal
Recovery:
- 1. The table slave_worker_info records each worker detail, the required relay log from backup
server or corresponding mysql binary logs can be found from its master.
- 2. START SLAVE UNTIL SQL_AFTER_MTS_GAPS; to fill all gaps,
- 3. Then direct new slave to the existing master as usual.
PITR– MTS without GTID enabled
PRESENTED BY Brian Koch ⎪ March 2014
Follow previous instructions to restore the database backup.
mysql> select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info; +----+------------------------------------------------+---------------+-------------------+----------------+ | ID | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos | +----+------------------------------------------------+---------------+-------------------+----------------+ | 1 | /home/y/var/mysql/data/mysqld-relay-bin.000187 | 946879583 | mysqld-bin.000059 | 946879419 | | 2 | /home/y/var/mysql/data/mysqld-relay-bin.000187 | 947931963 | mysqld-bin.000059 | 947931799 | | 3 | /home/y/var/mysql/data/mysqld-relay-bin.000187 | 947515423 | mysqld-bin.000059 | 947515259 | +----+------------------------------------------------+---------------+-------------------+----------------+ 3 rows in set (0.00 sec)
Please copy mysqld-relay-bin.000187 from the backup slave to the restore server.
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
If the relay log file exists
1. Edit mysqld-relay-bin.index with this new relay log file. 2. Edit file permission chown mysql:mysql mysqld-relay-bin.000187
Example: [root@bf1-trfdb-001 data]# cat mysqld-relay-bin.index /home/y/var/mysql/data/mysqld-relay-bin.000001 /home/y/var/mysql/data/mysqld-relay-bin.000187
- 3. Restart mysqld to make the system recognized the new relay log
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
Use mysql binary log which is from the master
1. Rename mysqld-bin.000059 to mysqld-relay-bin.000002 Edit mysqld-relay-bin.index with this new relay log file.
- 2. Edit file permission chown mysql:mysql mysqld-relay-bin.000002
Example: [root@bf1-trfdb-001 data]# cat mysqld-relay-bin.index /home/y/var/mysql/data/mysqld-relay-bin.000001 /home/y/var/mysql/data/mysqld-relay-bin.000002
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
- 3. update mysql.slave_worker_info
set relay_log_name= ‘…./mysqld-relay-bin.000002',relay_log_pos=master_log_pos;
mysql> select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info; +----+------------------------------------------------+---------------+-------------------+----------------+ | ID | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos | +----+------------------------------------------------+---------------+-------------------+----------------+ | 1 | /home/y/var/mysql/data/mysqld-relay-bin.000002 | 946879419 | mysqld-bin.000059 | 946879419 | | 2 | /home/y/var/mysql/data/mysqld-relay-bin.000002 | 947931799 | mysqld-bin.000059 | 947931799 | | 3 | /home/y/var/mysql/data/mysqld-relay-bin.000002 | 947515259 | mysqld-bin.000059 | 947515259 | +----+------------------------------------------------+---------------+-------------------+----------------+
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
- 4. Fill all holes.
START SLAVE UNTIL SQL_AFTER_MTS_GAPS;
mysql> select ID,Relay_log_name,Relay_log_pos,Master_log_name,Master_log_pos from mysql.slave_worker_info; +----+----------------+---------------+-----------------+----------------+ | ID | Relay_log_name | Relay_log_pos | Master_log_name | Master_log_pos | +----+----------------+---------------+-----------------+----------------+ | 1 | | 0 | | 0 | | 2 | | 0 | | 0 | | 3 | | 0 | | 0 | +----+----------------+---------------+-----------------+----------------+
3 rows in set (0.00 sec)
- 5. Change master to ….; start slave;
PITR– MTS without GTID enabled --Continued
PRESENTED BY Brian Koch ⎪ March 2014
TABLE mysql.slave_relay_log_info is NOT updated relay_log_recovery=1 starts auto recovery
Relay_Log_File: mysqld-relay-bin.000001 Relay_Log_Pos: 4
Replication is screwed!!
https://bugs.mysql.com/bug.php?id=81385
MySQL Backup Monitoring
PRESENTED BY Brian Koch ⎪ March 2014
Email alert if backup fails Backup dashboard
Multi-source Replication
PRESENTED BY Brian Koch ⎪ March 2014
- Enable replication slave to receive transactions from multiple sources simultaneously
- Merge table shards, data aggregation for reporting purpose
- Back up one database server instead of of different multitenant clusters.
- Automate database restore from prod to non-prod.
master1 master2 master3 slave Binlog channel 1 Binlog channel 2 Binlog channel 3 Binlogs
Multi-source Replication
PRESENTED BY Brian Koch ⎪ March 2014
- Pre-condition in target slave:
master-info-repository=TABLE relay-log-info-repository=TABLE slave is MySQL57
- Replication topology can be (GTID) based or binary log position-based
- Change master to master_host=‘master1’,… FOR CHANNEL 'master-1';
- Start slave FOR CHANNEL 'master-1';
Stop slave FOR CHANNEL 'master-1'; Show slave status FOR CHANNEL 'master-1’\G; SELECT * FROM performance_schema.replication_connection_status\G
Multi-source Replication -setup
PRESENTED BY Brian Koch ⎪ March 2014
- Source: MySQL 5.6 Target: MySQL 5.7
- Source: take xtrabackup using --slave-info in the master, transfer the backup to the slave
generate a file contains create database/table sql statement without data
- Target: innobackupex --apply-log --export /home/backup
Load database structure sql script in the slave Generate “ALTER TABLE TABLENAME DISCARD|IMPORT TABLESPACE;” sql scripts Target: Run drop tablespace script Copy backups(*.ibd,*.exp, *.cfg) into right database folders Run import tablespace script Get the coordinates from xtrabackup_slave_info to setup replication
Multi-source Replication -setup
PRESENTED BY Brian Koch ⎪ March 2014
Target slave: Reset master; SET GLOBAL gtid_purged=…… change master to master_host=’master1', master_port=3306, master_user='rep', master_password='pass' , MASTER_AUTO_POSITION = 1 for channel 'ch1_master1’; start slave for channel 'ch1_master1';
Multi-source Replication - Issues
PRESENTED BY Brian Koch ⎪ March 2014
Pre-check: Any temporal time columns Any antelope tables Rebuild all those tables Issue mysql> alter table application IMPORT TABLESPACE; ERROR 1808 (HY000): Schema mismatch (Table flags don't match, server table has 0x6 and the meta-data file has 0x1) Drop table and recreate table with ROW_FORMAT=compact Copy backup and import this tablespace
Multi-source Replication - restrictions
PRESENTED BY Brian Koch ⎪ March 2014
1. All tables should have the newer format for timestamp columns 2. innodb_file_per_table should be used 3. All files should be in Barracuda format 4. Database name has to be different crossing multiple masters