errant gtids breaking replication
play

Errant GTIDs breaking replication Dieter Adriaenssens Ghent - PowerPoint PPT Presentation

Errant GTIDs breaking replication Dieter Adriaenssens Ghent University Who am I? Dieter Adriaenssens Linux System Administrator MySQL DBA Works at Ghent University Open Source : former phpMyAdmin team member Lives in


  1. Errant GTIDs breaking replication Dieter Adriaenssens Ghent University

  2. Who am I? Dieter Adriaenssens • Linux System Administrator • MySQL DBA • Works at Ghent University • Open Source : former phpMyAdmin team member • Lives in Ghent, Belgium • Climber • E-mail : dieter.adriaenssens@ugent.be • Twitter : @dcadriaenssens • Pictures : (c) Thomas Kindermans https://en.wikipedia.org/wiki/File:Gent_Graslei_zonsondergang.JPG − (c) Rouslan Placella − 2

  3. Errant GTIDs breaking replication How to detect and avoid them

  4. Context Replicating MySQL cluster ‘[..] the master has purged binary • Primary with some replicas • Master failover logs containing GTIDs that the • Replication breaks from new primary slave requires.’ (previous replica) to remaining replicas • Where did those transactions on the new primary come from? 4

  5. Overview • Replication and GTID • Errant GTIDs on a replica • Detection • Where do they come from? • How to avoid them • How to fix them • Demo 5

  6. Introduction Replication, GTID, data consistency

  7. Replication MySQL server replication : Replic • Primary (master) a • Several replicas (slaves) • Identical state across several servers Primar y Replic a 7

  8. Replication : Why • High availability • Master failover • Disaster recovery • Scaling load • Regional distribution • ... 8

  9. Replication : How MySQL master-slave replication : • All nodes start from an identical state • Replicas apply transactions from the primary node’s binary logs • Transaction in correct order • Asynchronous : replica initiates • Maintain identical state Read/write on primary ● Replicas should be readonly ● 9

  10. Replication : How Keeping track of binlog position : • Several binlogfiles : mysql_bin.xxxxxx • Rotation (size based, restart server) • Every transaction has a unique (incremental) position • Replication start : Binary log file name ● position in the binlog ● • Problematic multiple primaries ● Master failover ● Restore from backup ● 10

  11. GTID Global Transaction identifier • Unique identifier of a transaction Server_uuid:transaction_id • GTID = Origin ID + transaction ID 3E11FA47-71CA-11E1-9E33-C80AA9429562:23 • Origin ID : server UUID • Transaction ID : incremental number SELECT @@GLOBAL.server_uuid; 3E11FA47-71CA-11E1-9E33-C80AA9429562 11

  12. GTID set Executed GTID set SHOW SLAVE STATUS; SHOW MASTER STATUS; • Set of applied GTIDs on a server SELECT @@GLOBAL.gtid_executed; • Defines a state of data consistency • Easy to compare between nodes Executed_Gtid_Set: • Replica applies GTIDs of a primary 27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, • Simplifies failovers (no need to remember binlog position) 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-80192973, c004c0eb-c84e-11e6-8efc-aa00009002fd: 1-6084195:6140951-6141015 12

  13. ROW based replication • Changes in rows are replicated • <-> transaction-based : actual queries • In combination with GTID => data consistency across the cluster nodes 13

  14. Replicating cluster with GTID orchestrator -c topology –alias=demo node1:3306 [0s,ok,5.7.25,rw,ROW,>>,GTID] MySQL Server cluster + node2:3306 [0s,ok,5.7.25,ro,ROW,>>,GTID] • Primary (R/W) + node3:3306 [0s,ok,5.7.25,ro,ROW,>>,GTID] • Replicas (R/O) Orchestrator • GTID enabled • Row based replication • ProxySQL to redirect traffic to correct cluster node Node 2 • Orchestrator managing the cluster nodes Prox Nod (automatic master failover) y e1 SQL Node 3 14

  15. Errant GTID Definition, consequences, detection, examples, how to avoid, fixes

  16. Errant GTID • Unreplicated transaction existing only on a replica • Data is not the same on all nodes • Cluster is no longer in a consistent state 16

  17. Errant GTID : consequences • Everything is fine? • Inconsistent state between nodes Split brain ● Different data when reading from ● that replica • Unexpected behaviour when a replica is ‘[..] the master has purged binary promoted to master logs containing GTIDs that the Replication might fail ● slave requires.’ If GTID is purged from binlog, on ● master failover → replication stops 17

  18. Errant GTID detection

  19. Errant GTID detection • Compare executed GTID sets between primary node and replica nodes • Replica has more GTIDs than primary => errant GTID 19

  20. GTID executed set # primary SELECT @@GLOBAL.gtid_executed; 27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-75521 20 # replica SELECT @@GLOBAL.gtid_executed; 27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 20

  21. GTID subset SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>'); SELECT GTID_SUBSET( '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113', '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120') AS is_subset; +-----------+ | is_subset | +-----------+ | 1 | +-----------+ 1 row in set (0.00 sec) Replica GTID set is a subset of primary GTID set : OK 21

  22. GTID subset SELECT GTID_SUBSET('<gtid_executed_replica>', '<gtid_executed_primary>'); SELECT GTID_SUBSET( '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 ', '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120') AS is_subset; +-----------+ | is_subset | +-----------+ | 0 | +-----------+ 1 row in set (0.00 sec) Replica GTID set is NOT a subset of primary GTID set => Errant GTID on replica 22

  23. Find errant GTIDs SELECT GTID_SUBTRACT('<gtid_executed_replica>', '<gtid_executed_primary>'); SELECT GTID_SUBTRACT( '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552113, 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 ', '27ab32d2-7f36-11e7-8bd7-aa00009002fd:1-7552120') AS errant_gtid; +------------------------------------------+ | errant_gtid | +------------------------------------------+ | 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 | +------------------------------------------+ 1 row in set (0.00 sec) Result is subset of errant GTIDs 23

  24. Errant GTID : automatic detection • Icinga monitoring check • Orchestrator • Other tools : Mysqlfailover ● Mysqlrpladmin ● 24

  25. Errant GTID : detection Monitoring check • Automate checking for errant GTID • Icinga compatible output format • Uses orchestrator for cluster info • https://github.com/UGent-DICT/check_mysql_gtid 25

  26. Errant GTID : monitoring check ./check_mysql_gtid <clustername> ./check_mysql_gtid demo MYSQL_CLUSTER_GTID OK - GTIDs on all nodes are replicated in the cluster! Cluster demo (primary : node1) : - node2 : OK - node3 : OK Everything is fine! 26

  27. Errant GTID : monitoring check ./check_mysql_gtid <clustername> ./check_mysql_gtid demo MYSQL_CLUSTER_GTID WARNING : replicas containing unreplicated GTIDs : node2 Cluster demo (primary : node1) : - node2 : GTIDs only exist on the replica : 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1-4 - node3 : OK Check reports unreplicated (errant) GTIDs on one of the nodes 27

  28. Errant GTID : detection Orchestrator • Reports errant GTIDs (>= v3.0.13) ● Provides ways to fix them (>= v3.0.14) 28

  29. Errant GTID Find transaction • Look for GTID in binary logs • Each binlog mentions the executed GTID set (initial state) • Select relevant binlog • Find transaction in that binlog 29

  30. Errant GTID : find transaction mysqlbinlog --base64-output=DECODE-ROWS --verbose node2-bin.00* | grep 50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1 -A6 SET @@SESSION.GTID_NEXT= '50d5e9eb-c5d3-11e6-b86b-aa00009002f7:1'/*!*/; # at 23890384 #180305 6:25:02 server id 590100 end_log_pos 23890464 CRC32 0x2fe8266b Query thread_id=3848059 exec_time=0 error_code=0 SET TIMESTAMP=1520227502/*!*/; SET @@session.sql_mode=1073741824/*!*/; FLUSH ERROR LOGS /*!*/; 30

  31. Errant GTID : Examples Transactions on a replica • Manual transactions (by accident on a replica) • Scripted maintenance tasks (eg. config management) User creation ● Database creation ● • Master failover gone bad : Split brain ● Writes redirected to a replica (eg. host is r/w by ● accident, or after a restart) • Log flushes 31

  32. Intermezzo : flush-logs • FLUSH LOGS; → not added to binlog (1) • FLUSH ERROR LOGS; → added to binlog (2) • FLUSH [LOCAL | NO_WRITE_TO_BINLOG] ERROR LOGS; → not added to binlog • mysqladmin flush-logs executes FLUSH LOGS; • mysqladmin flush-logs error (3) executes FLUSH ERROR LOGS; (1) expected behaviour, according to documentation : “FLUSH LOGS, FLUSH BINARY LOGS, FLUSH ● TABLES WITH READ LOCK (with or without a table list), and FLUSH TABLES tbl_name ... FOR EXPORT are not written to the binary log in any case because they would cause problems if replicated to a slave.“ https://dev.mysql.com/doc/refman/8.0/en/flush.html • (2) Related bug report : https://bugs.mysql.com/bug.php?id=88720 • (3) introduced in MySQL 5.7.4 32

  33. Avoid errant GTIDs

  34. Avoid errant GTIDs Avoid transactions on replica nodes: • Don’t execute (write) statements on a replica 34

Download Presentation
Download Policy: The content available on the website is offered to you 'AS IS' for your personal information and use only. It cannot be commercialized, licensed, or distributed on other websites without prior consent from the author. To download a presentation, simply click this link. If you encounter any difficulties during the download process, it's possible that the publisher has removed the file from their server.

Recommend


More recommend