demystifying mysql replication crash safety
play

Demystifying MySQL Replication Crash Safety Presented at Percona - PowerPoint PPT Presentation

Demystifying MySQL Replication Crash Safety Presented at Percona Live Europe 2018 in Frankfurt by Jean-Franois Gagn Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com Introducing MessageBird 225+


  1. Demystifying MySQL Replication Crash Safety Presented at Percona Live Europe 2018 in Frankfurt by Jean-François Gagné Senior Infrastructure Engineer / System and MySQL Expert jeanfrancois AT messagebird DOT com

  2. Introducing MessageBird 225+ Direct-to-Carrier Agreements With operators from around the world MessageBird is a cloud communications platform founded in Amsterdam 2011. 15,000+ Customers Examples of our messaging and voice SaaS: In over 60+ countries SMS in and out, call in (IVR) and out (alert), SIP, WhatsApp, Facebook, Telegram, Twitter, WeChat, … 180+ Employees Omni-Channel Conversation Engineering office in Amsterdam Sales and support offices worldwide Details at www.messagebird.com We are expanding : {Software, Front-End, Infrastructure, Data, Security, Telecom, QA} Engineers {Team, Tech, Product} Leads, Product Owners, Customer Support {Commercial, Connectivity, Partnership} Managers 2 www.messagebird.com/careers

  3. Summary (Demystifying MySQL Replication Crash Safety – PLEU2018) • Helicopter view of – and then Zoom in – Replication and Crash Safety • MySQL 5.6 solution (and its problems) • Complexifying things with GTIDs and Multi-Threaded Slave ( MTS ) • Impacts of reducing / compromising durability ( sync_binlog != 1 and trx_commit != 1 ) • Overview of related subjects: Semi-Sync, MariaDB & Pseudo-GTIDs • Closing, links, bugs and questions 3

  4. Overview of MySQL Replication (Demystifying MySQL Replication Crash Safety – PLEU2018) One master with one or more slaves: • The master records transactions in a journal (binary logs); each slave: • Downloads the journal and saves it locally in the relay logs (IO thread) • Executes the relay logs on its local database (SQL thread) • Could also produce binary logs to be a master ( log-slave-updates – lsu )

  5. Replication Crash Safety (Demystifying MySQL Replication Crash Safety – PLEU2018) What do I mean by Replication Crash Safety ? • When a slave crashes, it is able to resume replication after recovery (OK if rewinds its state after recovery, as long as it is eventually consistent) • When a master crashes, slaves are able to resume replicating from it • All above without sacrificing data consistency • In other words: A C I D is not compromised by a slave or a master crash (Discussion limited to transactional SE: InnoDB, TokuDB, MyRocks; obviously not MyISAM) Intermediate masters ( IM ) qualify both as master and slave Slaves are potential master (and IM) in some failover strategy (Proving replication crash un-safety is easy, proving safety is hard) 5

  6. State of the Dolphin and of the Sea Lion (Demystifying MySQL Replication Crash Safety – PLEU2018) State of the Dolphin in Replication Crash Safety: • MySQL 5.5 is not crash safe • MySQL 5.6 can be made crash safe (it is not by default) • MySQL 5.7 is mostly the same as 5.6 (with complexity added by Logical Lock parallel replication) • MySQL 8.0 is crash safe by default (but it can be made unsafe by “ tuning ” the configuration) Quick state of the Sea Lion: • MariaDB 5.5 is not replication crash safe • MariaDB 10.x can be made crash safe 6

  7. Zoom in the details [1 of 3] (Demystifying MySQL Replication Crash Safety – PLEU2018) More details about replication: • The IO Thread stores its state in master info ( also configuration stored there ) • The SQL Thread in relay log info slave1 [localhost] {msandbox} ((none)) > show slave status\G *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event [...] Master_Log_File: mysql-bin.000001 <-------+-- master info ( persisted state ) Read_Master_Log_Pos: 25489 <-------+ Relay_Log_File: mysql-relay.000002 <--+ Relay_Log_Pos: 10788 <--+ Relay_Master_Log_File: mysql-bin.000001 <--+-- relay log info ( persisted state ) [...] | Exec_Master_Log_Pos: 10575 <--+ [...] 7 1 row in set (0.00 sec)

  8. Zoom in the details [2 of 3] (Demystifying MySQL Replication Crash Safety – PLEU2018) More parameters: sync_master_info, sync_relay_log and sync_relay_log_info In MySQL 5.5, master info and relay log info are files: • No atomicity of “making progress” and “state tracking” for IO & SQL Threads • Consistency of actual vs registered state is compromised after a crash Ø This is why replication is not crash-safe in MySQL 5.5 8

  9. Zoom in the details [3 of 3] (Demystifying MySQL Replication Crash Safety – PLEU2018) Even more parameters: • sync_binlog ( and innodb_flush_log_at_trx_commit – trx_commit ): • Binlogs are synchronised to disk after every N writes/transactions (default 0 in My|SQL 5.5 and 5.6; and in 5.7 and 8.0 it is 1 which is full A C I D ) • trx_commit = 1: logs written and flushed each trx ( full A C I D and default ) = 0: written and flushed once per second (not crash safe) = 2: written after each trx and flushed once per second (mysqld crash safe, not OS crash safe) 9

  10. MySQL 5.6 solution [1 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) Reminder: problems making MySQL 5.5 Replication Crash Un-Safe: • The position of the SQL Thread cannot be trusted • The position of the IO Thread cannot be trusted • The content of the Relay Logs cannot be trusted 10

  11. MySQL 5.6 solution [2 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) The MySQL 5.6 solution • Atomicity for SQL Thread: relay - log - info - repository = TABLE ( default = FILE ) • Useless for crash safety: a parameter to store master info in a table: • master-info-repository = TABLE (default = FILE ) • Providing a way to “fix” the relay logs: relay - log - recovery = 1 ( default = 0 )

  12. MySQL 5.6 solution [3 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) More details about Relay Log Recovery : • relay-log-recovery is only used on mysqld startup ( dynamic would be useless ) • If relay-log-recovery = 0 , nothing special done ( and a new relay log is created ) • If relay-log-recovery = 1 : • The position of the IO Thread is set to the position of the SQL Thread • The position of the SQL Thread is set to the newly created relay log • If relay-log-purge = 1: the old relay logs will be deleted on SQL Thread startup ( relay-log-recovery does not delete anything: easy to test with skip-slave-start ) Ø Said otherwise, the previous relay logs are skipped ! (those relay logs are considered improper for SQL Thread consumption) • This will happen even if MySQL (or the IO Thread) did not crash OK for 1 st implementation but a waste of perfectly good relay logs 12

  13. MySQL 5.6 solution [4 of 4] (Demystifying MySQL Replication Crash Safety – PLEU2018) In MySQL 5.7: • No change of defaults ( for replication crash safety ) • Relay log recovery still simplistic K In MySQL 8.0: • Still simplistic relay log recovery L • New defaults: • relay-log-info-repository = TABLE J • relay-log-recovery = 1 J • master-info-repository = TABLE ( not sure this is very useful ) Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave Bug#74321: Execute relay-log-recovery only when needed 13

  14. Adding complexity with GTIDs [1 of 2] (Demystifying MySQL Replication Crash Safety – PLEU2018) Not only MySQL 5.6 introduces replication crash safety, it also introduced Global Transaction IDs ( GTIDs ) • This tags every transaction with an ID when writing to the binlogs • The GTID state of the master and slaves are tracked in the binlogs Ø IO and SQL Thread states are now partially in the binlogs (and relay logs) • Optionally, slaves can use GTID to replicate ( instead of file+position ) • This allows easier repointing of slaves to a new master ( including fail over ) • This heavily relies on precise tracking of GTID states on master and slaves Ø As this tracking is in the binlogs, this is impeded when sync_binlog != 1 Bug#70659: Make crash safe slave with gtid + less durable settings 14

  15. Adding complexity with GTIDs [2 of 2] (Demystifying MySQL Replication Crash Safety – PLEU2018) To make replication crash safe with GTIDs in MySQL 5.6: • relay-log-info-repository = TABLE (default = FILE ) • relay-log-recovery = 1 (default = 0) – (Bug#92093) • sync_binlog = 1 (default = 0) • In 5.7, the default is sync_binlog = 1 J ( two other unchanged K ) • In 8.0, all the defaults are good for crash safe replication with GTID J J • MySQL 5.7 adds a table for storing the GTID state of slaves: • Allows GTIDS slaves without log-slave-updates ( lsu ) • With lsu, this table ( mysql.gtid_executed ) is not updated after each trx Ø Missed opportunity for OS crash safety with sync_binlog != 1 L L L Bug#92109: Make GTID replication crash safe with less durable setting 15

  16. Master Replication Crash Safety [1 of 5] (Demystifying MySQL Replication Crash Safety – PLEU2018) Relaxing durability of the binlogs implies losing GTID state ( after an OS crash ) • What about the consequence on the master ? With and without GTID ? • If sync_binlog != 1 on the master, an OS crash will lose binlogs • With sync_binlog != 1 , usually trx_commit != 1 ( normally 2, but can be 0 ) • trx_commit = 2 preserves data on mysqld crashes, 0 does not ( à 2 is better ) Ø InnoDB will also lose transactions on an OS crash Ø After an OS crash, InnoDB will be out-of-sync with the binlogs Ø And we cannot trust the binlogs on such master (trx gap or ghost trx) The failure mode will be different depending on the configuration 16

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend