SLIDE 1

Demystifying MySQL Replication Crash Safety

Presented at Percona Live Europe 2018 in Frankfurt by Jean-François Gagné
Senior Infrastructure Engineer / System and MySQL Expert
jeanfrancois AT messagebird DOT com

SLIDE 2

Introducing MessageBird

MessageBird is a cloud communications platform founded in Amsterdam in 2011. Examples of our messaging and voice SaaS:

SMS in and out, call in (IVR) and out (alert), SIP, WhatsApp, Facebook, Telegram, Twitter, WeChat, … Omni-Channel Conversation

Details at www.messagebird.com

225+ Direct-to-Carrier Agreements

With operators from around the world

15,000+ Customers

In 60+ countries

180+ Employees

Engineering office in Amsterdam
Sales and support offices worldwide

We are expanding:

{Software, Front-End, Infrastructure, Data, Security, Telecom, QA} Engineers
{Team, Tech, Product} Leads, Product Owners, Customer Support
{Commercial, Connectivity, Partnership} Managers
www.messagebird.com/careers

SLIDE 3

Summary

(Demystifying MySQL Replication Crash Safety – PLEU2018)

Helicopter view of Replication and Crash Safety, then zoom in on the details
MySQL 5.6 solution (and its problems)
Adding complexity with GTIDs and Multi-Threaded Slave (MTS)
Impacts of reducing / compromising durability (sync_binlog != 1 and trx_commit != 1)

Overview of related subjects: Semi-Sync, MariaDB & Pseudo-GTIDs
Closing, links, bugs and questions

SLIDE 4

Overview of MySQL Replication

One master with one or more slaves:

The master records transactions in a journal (binary logs); each slave:
Downloads the journal and saves it locally in the relay logs (IO thread)
Executes the relay logs on its local database (SQL thread)
Could also produce binary logs to be a master (log-slave-updates – lsu)

SLIDE 5

Replication Crash Safety

What do I mean by Replication Crash Safety?

When a slave crashes, it is able to resume replication after recovery

(OK if it rewinds its state after recovery, as long as it is eventually consistent)

When a master crashes, slaves are able to resume replicating from it
All of the above without sacrificing data consistency
In other words: ACID is not compromised by a slave or a master crash

(Discussion limited to transactional SE: InnoDB, TokuDB, MyRocks; obviously not MyISAM)

Intermediate masters (IM) qualify both as master and slave
Slaves are potential masters (and IM) in some failover strategies

(Proving replication crash un-safety is easy, proving safety is hard)

SLIDE 6

State of the Dolphin and of the Sea Lion

State of the Dolphin in Replication Crash Safety:

MySQL 5.5 is not crash safe
MySQL 5.6 can be made crash safe (it is not by default)
MySQL 5.7 is mostly the same as 5.6

(with complexity added by LOGICAL_CLOCK parallel replication)

MySQL 8.0 is crash safe by default

(but it can be made unsafe by “tuning” the configuration)

Quick state of the Sea Lion:

MariaDB 5.5 is not replication crash safe
MariaDB 10.x can be made crash safe

SLIDE 7

Zoom in the details [1 of 3]

More details about replication:

The IO Thread stores its state in master info (also configuration stored there)
The SQL Thread stores its state in relay log info

    slave1 [localhost] {msandbox} ((none)) > show slave status\G
    *************************** 1. row ***************************
                   Slave_IO_State: Waiting for master to send event
    [...]
                  Master_Log_File: mysql-bin.000001   <-------+-- master info (persisted state)
              Read_Master_Log_Pos: 25489              <-------+
                   Relay_Log_File: mysql-relay.000002 <--+
                    Relay_Log_Pos: 10788              <--+
            Relay_Master_Log_File: mysql-bin.000001   <--+-- relay log info (persisted state)
    [...]                                                |
              Exec_Master_Log_Pos: 10575              <--+
    [...]
    1 row in set (0.00 sec)

SLIDE 8

Zoom in the details [2 of 3]

More parameters: sync_master_info, sync_relay_log and sync_relay_log_info

In MySQL 5.5, master info and relay log info are files:

No atomicity of “making progress” and “state tracking” for IO & SQL Threads
Consistency of actual vs registered state is compromised after a crash

→ This is why replication is not crash-safe in MySQL 5.5

SLIDE 9

Zoom in the details [3 of 3]

Even more parameters:

sync_binlog (and innodb_flush_log_at_trx_commit, abbreviated trx_commit):
Binlogs are synchronised to disk after every N writes/transactions

(default 0 in MySQL 5.5 and 5.6; in 5.7 and 8.0 the default is 1, which is full ACID)

trx_commit

= 1: logs written and flushed at each trx (full ACID and default)
= 0: logs written and flushed once per second (not crash safe)
= 2: logs written after each trx and flushed once per second (mysqld crash safe, not OS crash safe)
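
The durability rules on this slide can be summarised in a small Python sketch. It only models the statements above and is not MySQL code: the function name `crash_safety` and the keys of the returned dictionary are made up for illustration, and `trx_commit` stands for innodb_flush_log_at_trx_commit.

```python
def crash_safety(sync_binlog, trx_commit):
    """Model which crashes lose no committed transaction, per the rules above.

    sync_binlog: value of sync_binlog (1 = binlogs synced at every commit).
    trx_commit:  value of innodb_flush_log_at_trx_commit (0, 1 or 2).
    """
    if trx_commit not in (0, 1, 2):
        raise ValueError("trx_commit must be 0, 1 or 2")
    # InnoDB: 1 flushes at every commit, 2 only writes to the OS at every
    # commit (survives a mysqld crash), 0 does neither.
    innodb_mysqld_safe = trx_commit in (1, 2)
    innodb_os_safe = trx_commit == 1
    # Binlogs: only sync_binlog = 1 guarantees they survive an OS crash.
    binlog_os_safe = sync_binlog == 1
    return {
        "mysqld_crash": innodb_mysqld_safe,
        "os_crash": innodb_os_safe and binlog_os_safe,
    }


print(crash_safety(sync_binlog=1, trx_commit=1))
print(crash_safety(sync_binlog=0, trx_commit=2))
```

The second call is the usual "relaxed durability" combination: it survives a mysqld crash but not an OS crash.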

SLIDE 10

MySQL 5.6 solution [1 of 4]

Reminder: problems making MySQL 5.5 Replication Crash Un-Safe:

The position of the SQL Thread cannot be trusted
The position of the IO Thread cannot be trusted
The content of the Relay Logs cannot be trusted

SLIDE 11

MySQL 5.6 solution [2 of 4]

The MySQL 5.6 solution:

Atomicity for SQL Thread: relay-log-info-repository = TABLE (default = FILE)
Useless for crash safety: a parameter to store master info in a table:
master-info-repository = TABLE (default = FILE)
Providing a way to “fix” the relay logs: relay-log-recovery = 1 (default = 0)
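
Why relay-log-info-repository = TABLE gives atomicity can be shown with a toy model in Python. This is a sketch under simplifying assumptions, not how mysqld is implemented: the function `replay` is hypothetical, events are plain strings, and the crash is placed at the worst moment for the FILE case (after the data commit, before the position is saved).

```python
def replay(events, crash_on, repository):
    """Apply events in order and crash while applying event number `crash_on`.

    repository = "FILE":  the position is saved after the commit, in a
                          separate step, so a crash can land between the two.
    repository = "TABLE": the position is updated inside the same transaction
                          as the data, so both are committed or neither is.
    Returns (applied_events, saved_position) as seen after crash recovery.
    """
    applied, position = [], 0
    for number, event in enumerate(events, start=1):
        if repository == "TABLE":
            if number == crash_on:
                break              # crash before commit: everything rolls back
            applied.append(event)  # data and position commit together
            position = number
        else:
            applied.append(event)  # data is committed first...
            if number == crash_on:
                break              # ...crash before the file is updated
            position = number
    return applied, position


print(replay(["A", "B", "C"], crash_on=2, repository="FILE"))
print(replay(["A", "B", "C"], crash_on=2, repository="TABLE"))
```

With FILE, two events are applied but the saved position still says one, so B would be executed a second time on restart. With TABLE, the data and the position always agree.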

SLIDE 12

MySQL 5.6 solution [3 of 4]

More details about Relay Log Recovery:

relay-log-recovery is only used on mysqld startup (dynamic would be useless)
If relay-log-recovery = 0, nothing special done (and a new relay log is created)
If relay-log-recovery = 1:
The position of the IO Thread is set to the position of the SQL Thread
The position of the SQL Thread is set to the newly created relay log
If relay-log-purge = 1: the old relay logs will be deleted on SQL Thread startup

(relay-log-recovery does not delete anything: easy to test with skip-slave-start)

→ In other words, the previous relay logs are skipped! (those relay logs are considered improper for SQL Thread consumption)

This will happen even if MySQL (or the IO Thread) did not crash

OK for 1st implementation but a waste of perfectly good relay logs
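
The position changes described above can be sketched in Python using the values from the SHOW SLAVE STATUS output of slide 7. This is an illustration only: the class `SlaveState`, the function `relay_log_recovery`, the new relay log name `mysql-relay.000003` and the starting offset `first_pos=4` are assumptions made for the example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SlaveState:
    master_log_file: str        # IO Thread: Master_Log_File
    read_master_log_pos: int    # IO Thread: Read_Master_Log_Pos
    relay_master_log_file: str  # SQL Thread: Relay_Master_Log_File
    exec_master_log_pos: int    # SQL Thread: Exec_Master_Log_Pos
    relay_log_file: str         # SQL Thread: Relay_Log_File
    relay_log_pos: int          # SQL Thread: Relay_Log_Pos


def relay_log_recovery(state, new_relay_log, first_pos=4):
    """Return the state after a startup with relay-log-recovery = 1.

    first_pos is the assumed offset of the first event in a fresh relay log.
    """
    return replace(
        state,
        # The IO Thread is rewound to where the SQL Thread stopped.
        master_log_file=state.relay_master_log_file,
        read_master_log_pos=state.exec_master_log_pos,
        # The SQL Thread is pointed at the newly created relay log.
        relay_log_file=new_relay_log,
        relay_log_pos=first_pos,
    )


before = SlaveState("mysql-bin.000001", 25489,
                    "mysql-bin.000001", 10575,
                    "mysql-relay.000002", 10788)
after = relay_log_recovery(before, "mysql-relay.000003")

# Bytes of binlog that were already downloaded and must be fetched again.
refetched = before.read_master_log_pos - after.read_master_log_pos
print(after)
print(refetched)
```

The re-fetched amount is small here, but on a lagging slave it can be large, which is the waste this slide points out.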

SLIDE 13

MySQL 5.6 solution [4 of 4]

In MySQL 5.7:

No change of defaults (for replication crash safety)
Relay log recovery still simplistic :-|

In MySQL 8.0:

Still simplistic relay log recovery :-(
New defaults:
relay-log-info-repository = TABLE :-)
relay-log-recovery = 1 :-)
master-info-repository = TABLE (not sure this is very useful)

Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave
Bug#74321: Execute relay-log-recovery only when needed

SLIDE 14

Adding complexity with GTIDs [1 of 2]

MySQL 5.6 not only introduced replication crash safety, it also introduced Global Transaction IDs (GTIDs):

This tags every transaction with an ID when writing to the binlogs
The GTID state of the master and slaves is tracked in the binlogs

→ IO and SQL Thread states are now partially in the binlogs (and relay logs)

Optionally, slaves can use GTID to replicate (instead of file+position)
This allows easier repointing of slaves to a new master (including fail over)
This heavily relies on precise tracking of GTID states on master and slaves

→ As this tracking is in the binlogs, this is impeded when sync_binlog != 1
Bug#70659: Make crash safe slave with gtid + less durable settings

SLIDE 15

Adding complexity with GTIDs [2 of 2]

To make replication crash safe with GTIDs in MySQL 5.6:

relay-log-info-repository = TABLE (default = FILE)
relay-log-recovery = 1 (default = 0) – (Bug#92093)
sync_binlog = 1 (default = 0)

In 5.7, the default is sync_binlog = 1 :-) (the two others are unchanged :-|)
In 8.0, all the defaults are good for crash safe replication with GTID :-) :-)
MySQL 5.7 adds a table for storing the GTID state of slaves:
Allows GTID slaves without log-slave-updates (lsu)
With lsu, this table (mysql.gtid_executed) is not updated after each trx

→ Missed opportunity for OS crash safety with sync_binlog != 1 :-( :-( :-(
Bug#92109: Make GTID replication crash safe with less durable setting
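
The three settings and their per-version defaults, as listed on this slide, can be turned into a small Python check. The dictionaries and the function `settings_to_change` are hypothetical helpers for illustration. They use the underscore spelling of the variable names and only encode what the slide states.

```python
# Values needed for crash-safe GTID replication, as listed above.
REQUIRED = {
    "relay_log_info_repository": "TABLE",
    "relay_log_recovery": 1,
    "sync_binlog": 1,
}

# Defaults per version, as stated in this talk.
DEFAULTS = {
    "5.6": {"relay_log_info_repository": "FILE", "relay_log_recovery": 0, "sync_binlog": 0},
    "5.7": {"relay_log_info_repository": "FILE", "relay_log_recovery": 0, "sync_binlog": 1},
    "8.0": {"relay_log_info_repository": "TABLE", "relay_log_recovery": 1, "sync_binlog": 1},
}


def settings_to_change(variables):
    """Return {name: required_value} for every setting that is not crash safe."""
    return {
        name: wanted
        for name, wanted in REQUIRED.items()
        if variables.get(name) != wanted
    }


for version, defaults in DEFAULTS.items():
    print(version, settings_to_change(defaults))
```

An empty result for 8.0 matches the statement that all its defaults are good. The check says nothing about MASTER_AUTO_POSITION or about the bugs listed in this talk.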

SLIDE 16

Master Replication Crash Safety [1 of 5]

Relaxing durability of the binlogs implies losing GTID state (after an OS crash)

What about the consequences on the master? With and without GTID?
If sync_binlog != 1 on the master, an OS crash will lose binlogs
With sync_binlog != 1, usually trx_commit != 1 (normally 2, but can be 0)
trx_commit = 2 preserves data on mysqld crashes, 0 does not (→ 2 is better)

→ InnoDB will also lose transactions on an OS crash
→ After an OS crash, InnoDB will be out-of-sync with the binlogs
→ And we cannot trust the binlogs on such a master (trx gap or ghost trx)

The failure mode will be different depending on the configuration

SLIDE 18

Master Replication Crash Safety [2 of 5]

With file+position

The IO Thread of the slaves points into the vanished binlogs
So the slaves executed phantom trx

(ghosts in the binlogs, maybe not in InnoDB)

When the master is restarted:
It records trx in a new binlog file
Most slaves are broken, and they might be out-of-sync with each other
Some lagging slaves might skip the vanished binlogs

→ Broken slaves have more data than the master (→ data drift)
→ And different data drift on “lucky” lagging slaves that might not break

SLIDE 19

Master Replication Crash Safety [3 of 5]

With GTID enabled

Slaves also executed ghost trx that vanished from the binlogs
But those are in their GTID state
A recovered master reuses the GTIDs of the vanished trx

Slaves magically reconnect to the master (MASTER_AUTO_POSITION = 1)
1. If the master has not reused all ghost GTIDs, then the slave breaks
2. If it has, then the slave skips the new transactions → more data drift

(in the illustration, the slave skips the new trx 50 to 58 as it already has the old ones)
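
The two outcomes above can be reproduced with a toy model in Python. It is a sketch under assumptions: GTIDs are reduced to plain transaction numbers for a single master UUID, the function `reconnect` is hypothetical, and the numbers assume that the master binlogs kept 1 to 49 while the slave had executed 1 to 58, in line with the 50 to 58 range of the illustration.

```python
def reconnect(slave_gtids, master_gtids):
    """Model a slave reconnecting with MASTER_AUTO_POSITION = 1.

    Both arguments are sets of transaction numbers for a single master UUID.
    Returns ("breaks", set()) when the slave knows GTIDs the master does not,
    else ("replicates", gtids_the_master_sends).
    """
    if not slave_gtids <= master_gtids:
        return "breaks", set()
    return "replicates", master_gtids - slave_gtids


slave = set(range(1, 59))     # executed 1..58; 50..58 are ghost trx
survived = set(range(1, 50))  # the master binlogs only kept 1..49

# Case 1: the recovered master has only reused 50..55 so far.
partly_reused = survived | set(range(50, 56))
# Case 2: the recovered master has reused 50..58 and moved on to 59..60.
fully_reused = survived | set(range(50, 61))

print(reconnect(slave, partly_reused))
print(reconnect(slave, fully_reused))
```

In the second case only 59 and 60 are sent: the new transactions that reused 50 to 58 never reach the slave, which is the silent data drift described above.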

SLIDE 20

Master Replication Crash Safety [4 of 5]

With GTID enabled but MASTER_AUTO_POSITION = 0

Left as an exercise to the reader…

On the consequences of sync_binlog != 1 (part #1)

https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html

(more posts to be published in the series)

SLIDE 21

Master Replication Crash Safety [5 of 5]

Summary of running with sync_binlog != 1:

The binlogs – of the master or slave – cannot be trusted after an OS crash
On a master, letting mysqld restart normally after such a crash leads to data drift

→ After an OS crash, make sure no slaves reconnect to the recovered master (OFFLINE_MODE = ON in config file; failing-over to a slave is the way forward)

On slaves, letting mysqld restart after such a crash leads to truncated binlogs

→ After an OS crash, consider purging all binlogs on the recovered slave

Intermediate Masters (IM) are both masters and slaves

→ After an OS crash, make sure no slaves reconnect to the recovered IM
→ And consider purging all binary logs on it

Remember: GTID state corrupted on slaves after OS crash (Bug#92109)

SLIDE 22

Adding complexity with MTS [1 of 4]

Multi-Threaded Slave (MTS) in MySQL 5.6 commits out of order

Same for MySQL 5.7 with DATABASE and LOGICAL_CLOCK types
LOGICAL_CLOCK also has the slave_preserve_commit_order option

(OFF by default in 5.7 and 8.0 :-| and ON requires log-slave-updates :-( )
(Bug#75396: Allow slave_preserve_commit_order without log-slave-updates)

Example: transactions A, B, C, D, E on the master

On a slave, SHOW SLAVE STATUS points to B, so A is committed
C and E are also committed, B is running and D is pending scheduling

(maybe B and D are in the same schema with DATABASE type)

With out-of-order commit, a file+position in relay log info is not enough

GTID allows tracking complex position (generating temporary holes on slaves)
And there is the mysql.slave_worker_info table

(https://dev.mysql.com/worklog/task/?id=5599: for more details)
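
The A to E example can be replayed in Python to show why a single file+position is not enough with out-of-order commit. This is a toy model: the function `naive_restart` is hypothetical and transactions are reduced to letters.

```python
def naive_restart(order, committed):
    """Restart from the first uncommitted trx, as a single position implies.

    order:     transactions in master commit order.
    committed: transactions already committed on the slave before the crash.
    Returns (restart_point, transactions_that_would_run_twice).
    """
    for index, trx in enumerate(order):
        if trx not in committed:
            replayed = order[index:]
            return trx, [t for t in replayed if t in committed]
    return None, []


# A, C and E are committed; B was running and D was pending.
print(naive_restart(["A", "B", "C", "D", "E"], {"A", "C", "E"}))
```

Restarting at B would execute C and E a second time, so the slave needs extra state (GTIDs or mysql.slave_worker_info) to know which transactions after the position are already done.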

SLIDE 23

Adding complexity with MTS [2 of 4]

Without GTID, resuming replication after a crash requires filling the gaps in trx:

Manual, error-prone, and not always possible before 5.6.31 and 5.7.13 (Bug#77496)
Now, automated by doing START SLAVE UNTIL SQL_AFTER_MTS_GAPS
But this needs relay logs, which might have vanished after an OS crash (Bug#81840)

SLIDE 24

Adding complexity with MTS [3 of 4]

Bug#81840 makes MTS with File+Position OS crash unsafe (safe for mysqld crash)
Hard to accept workaround: sync_relay_log = 1 (performance killer)
Full state in mysql.slave_worker_info → recovery possible with a lot of effort
The good solution would be a better relay log recovery (Bug#93081)

SLIDE 25

Adding complexity with MTS [4 of 4]

With GTID, MTS in MySQL 5.6, 5.7 & 8.0 is replication crash safe:

But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093)
And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …)
Also needs sync_binlog = 1 (if 5.7+, also works without binlogs or lsu off)
And care with sync_binlog != 1 on the master (need to fail over if OS crash)

(sync_binlog != 1 should not be needed in 95% of cases)
(Group Commit and MTS make this optimisation almost obsolete)

Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14:

GTID executed on the slave is 1-10:12:14 before a crash
Replication resumes by fetching 11:13:15… (after relay log recovery)
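
The gaps in this example can be computed with a few lines of Python. This is a simplified sketch: real GTID sets carry a server UUID prefix, which is left out here as on the slide, and the helper names `parse_intervals` and `gaps` are made up for illustration.

```python
def parse_intervals(text):
    """Parse an interval list such as "1-10:12:14" into (first, last) pairs."""
    intervals = []
    for part in text.split(":"):
        first, _, last = part.partition("-")
        intervals.append((int(first), int(last or first)))
    return intervals


def gaps(intervals):
    """Return the transaction numbers missing between the given intervals."""
    missing = []
    ordered = sorted(intervals)
    for (_, prev_last), (next_first, _) in zip(ordered, ordered[1:]):
        missing.extend(range(prev_last + 1, next_first))
    return missing


executed = parse_intervals("1-10:12:14")
print(executed)
print(gaps(executed))
```

The gaps are 11 and 13, the transactions the slave asks for first. Everything from 15 onwards follows as normal replication.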

SLIDE 26

Adding complexity with MTS [4 of 4]

With GTID, MTS in MySQL 5.6, 5.7 & 8.0 should be crash safe:

But it needs MASTER_AUTO_POSITION = 1 (and relay log recovery Bug#92093)
And it comes with all the GTID “goodies” (rogue transactions, lsu for 5.6, …)
Also needs sync_binlog = 1 (if 5.7+, also works without binlogs or lsu off)
And care with sync_binlog != 1 on the master (need to fail over if OS crash)

Bug#92882: MTS not replication crash-safe with GTID and all the right parameters
(Only applies to Operating System crashes)

Example: A, B, C, D, E on the master with GTID 10, 11, 12, 13, 14:

GTID executed on the slave is 1-10:12:14 before Operating System crash
Relay log recovery tries to “fill the gaps” but fails because relay logs are gone

(This might be a regression from the fix of Bug#77496)
(Easy workaround: stop slave; reset slave; start slave;)

SLIDE 27

Related subjects – Semi-Sync

In this talk, we did not cover master failover explicitly
When a master crashes in an unrecoverable way, failover needs to happen
When failing-over to a slave, committed transactions can be lost

(Some transactions on the crashed master might not have reached slaves)

→ violation of durability (ACID) in the replication topology (distributed system)

Except if lossless semi-sync is used, more details in:

Question about Semi-Synchronous Replication: the Answer with All the Details

https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/

SLIDE 28

Related subjects – MariaDB

MariaDB still stores its master info and relay log info in files

But it stores GTID state of slaves in the mysql.gtid_slave_pos table

→ MariaDB is replication crash safe when using GTID slave positioning

Also, it has an interesting feature:

If using more than one storage engine, a single state table is not optimal
Having one such table per storage engine could be better

Improving replication with multiple storage engines (MariaDB 10.3)

https://kristiannielsen.livejournal.com/19223.html

SLIDE 29

Related subjects – Pseudo GTIDs

Pseudo-GTIDs:

A way to get GTID-like features without GTIDs
They work with any version of MySQL/MariaDB (even 5.5)
But they assume in-order-commit → does not work with MTS

They can provide slave replication crash safety:

With log-slave-updates and sync_binlog = 1
Even on MySQL 5.5 or MariaDB 5.5

https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md

SLIDE 30

Conclusion [1 of 5]

It is complicated and it depends…
It has many edge cases
It might still change as bugs are fixed
And hopefully improvements will be made
So sorry: there is no short version

SLIDE 31

Conclusion [2 of 5]

Some parameters never impact/improve Replication Crash Safety:

master-info-repository, sync_master_info, sync_relay_log_info

Some parameters are always needed for Replication Crash Safety:

relay-log-info-repository = TABLE
relay-log-recovery = 1

SLIDE 32

Conclusion [3 of 5]

MySQL 5.6 with GTID (with and without MTS) → crash safe slave if:

All above with sync_binlog = 1 (not default) and MASTER_AUTO_POSITION = 1

and maybe a “stop slave; reset slave; start slave;” (Bug#92882)

MySQL 5.6 without GTID and with MTS → not always crash safe slaves:

OK for MySQL crashes as relay logs are not lost
For OS crashes, losing the relay logs leads to replication breakage (Bug#81840)
Possible to recover with some voodoo and dark magic (Bug#93081)

For master and slaves, binlogs cannot be trusted if sync_binlog != 1

SLIDE 33

Conclusion [4 of 5]

MySQL 5.7 is mostly the same as 5.6:

sync_binlog = 1 is the default :-)
Will be crash safe with GTID and sync_binlog != 1 when Bug#92109 is fixed
LOGICAL_CLOCK with slave_preserve_commit_order behaves like single-threaded
Without slave_preserve_commit_order, same as MTS in 5.6

MySQL 8.0 is mostly the same as 5.7 with safer defaults:

relay-log-info-repository = TABLE :-)
relay-log-recovery = 1 :-)
But default for slave_preserve_commit_order is still 0 :-|

SLIDE 34

Conclusion [5 of 5]

Care with MTS as it has many traps

And in all cases:

Relay log recovery needs to re-download relay logs from the master
High load in case of lagging (or delayed) slaves :-(
Will fail if the binary logs were purged from the master :-(
Relay log recovery also fails for MTS and OS crashes (vanished relay logs) :-( :-( :-(

We need a better Relay Log Recovery!
Bug#74321, Bug#74323, Bug#74324, Bug#81840, Bug#92882, Bug#93081

SLIDE 35

Links [1 of 3]

Crash-Safe MySQL Replication - A Visual Guide

https://hackmongo.com/post/crash-safe-mysql-replication-a-visual-guide/
(diagrams in this talk are inspired by this post)

Jean-François’s blog posts about Replication Crash Safety:

Better Crash-safe replication for MySQL

https://medium.com/booking-com-infrastructure/better-crash-safe-replication-for-mysql-a336a69b317f

Replication crash safety with MTS in MySQL 5.6 and 5.7: reality or illusion?

https://jfg-mysql.blogspot.com/2016/01/replication-crash-safety-with-mts.html

A discussion about sync-master-info and other replication parameters

https://jfg-mysql.blogspot.com/2016/08/discussion-about-sync-master-info-and-replication-parameters.html

On the consequences of sync_binlog != 1 (part #1)

https://jfg-mysql.blogspot.com/2018/10/consequences-sync-binlog-neq-1-part-1.html

SLIDE 36

Links [2 of 3]

Directly related bugs:

Bug#70669: Slave can't continue repl. after master's recovery (old – 5.6.14, and fixed – 5.6.17)
Bug#70659: Make crash safe slave work with gtid + less durable settings
Bug#74321: Execute relay-log-recovery only when needed
Bug#74323: Avoid overloading the master NIC on relay-log-recovery of a lagging slave
Bug#74324: Make keeping relay logs (relay_log_purge = 0) crash safe
Bug#77496: Replication position lost after crash on MTS configured slave (really fixed ?)
Bug#81840: Automatic Replication Recovery Does Not Handle Lost Relay Log Events
Bug#92093: Replication crash safety needs relay_log_recovery even with GTID
Bug#92109: Please make replication crash safe with GITD and less durable setting (bis)
Bug#92882: MTS not replication crash-safe with GTID and all the right parameters
Bug#93081: Please implement a better relay log recovery

Somewhat related bugs:

Bug#75396: Allow slave_preserve_commit_order without log-slave-updates
Bug#92891: Please make relay_log_space_limit dynamic

SLIDE 37

Links [3 of 3]

Question about Semi-Synchronous Replication: the Answer with All the Details

https://www.percona.com/community-blog/2018/08/23/question-about-semi-synchronous-replication-answer-with-all-the-details/

Improving replication with multiple storage engines (in MariaDB 10.3)

https://kristiannielsen.livejournal.com/19223.html

Pseudo-GTID and Orchestrator:

https://github.com/github/orchestrator/blob/master/docs/pseudo-gtid.md
https://speakerdeck.com/shlominoach/pseudo-gtid-and-easy-mysql-replication-topology-management

The Full MySQL and MariaDB Parallel Replication Tutorial

https://www.slideshare.net/JeanFranoisGagn/the-full-mysql-and-mariadb-parallel-replication-tutorial

Arg: relay_log_space_limit is (still) not dynamic !

https://jfg-mysql.blogspot.com/2018/10/arg-relay-log-space-limit-is-still-not-dynamic.html

Evaluating MySQL Parallel Replication Part 2: Slave Group Commit

https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-2-slave-group-commit-459026a141d2

Evaluating MySQL Parallel Replication Part 4: More Benchmarks in Production

https://medium.com/booking-com-infrastructure/evaluating-mysql-parallel-replication-part-4-more-benchmarks-in-production-49ee255043ab

SLIDE 38

Thanks!

Presented at Percona Live Europe 2018 in Frankfurt by Jean-François Gagné
Senior Infrastructure Engineer / System and MySQL Expert
jeanfrancois AT messagebird DOT com