Geographically Dispersed Percona XtraDB Cluster Deployment


Marco (the Grinch) Tusa, September 2017, Dublin


About me

Marco “The Grinch”

  • Open source enthusiast
  • Percona consulting Team Leader


Agenda

  • What is PXC
  • When nodes interact
  • Let us clarify: geo dispersed
  • What to keep in mind then
  • How to measure latency correctly
  • Use the right way (sync/async)
  • Use help, like a replication manager

What is PXC/Galera?

(Virtually) Synchronous Replication:

  • True multi-master
  • No slave lag
  • No master-slave failover or VIP
  • Multi-threaded appliers
  • Automatic node provisioning
  • Elastic scale (in – out)
  • Geographically distributed (with segments)
  • Can be mixed with async replication

[Diagram: web traffic, balancer, Galera nodes]


What PXC/Galera is NOT

  • Not a write-scalable solution
  • Not great for a high volume of parallel, small requests
  • Not great for working with foreign keys
  • Not good for sharding data (each node has the entire dataset)


What is a Node

  • Standard MySQL replication: Master, Slave, Slave
  • Galera MySQL replication: Node, Node, Node

[Diagram; UUID shown: 9cba28fa-a8be-11e4-8f41-9f963e1dbf4f]


Segments

A segment is a logical grouping of nodes.

  • Replication between segments is optimized (writesets, some level of communication)
  • Traffic and messaging are reduced
  • In case of SST, the donor is chosen by proximity


More nodes more problems

Approaches that use a two-phase commit or distributed locking are bound by the capacity formula m = n x o x t, where m is the number of messages per second, n the number of nodes, o the number of operations, and t the transaction throughput.
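To make the capacity formula concrete, here is a minimal sketch that evaluates m = n x o x t for a few cluster sizes. The workload figures (10 operations, 100 transactions per second) are made-up illustration values, not numbers from the talk.

```python
def messages_per_second(nodes: int, operations: int, throughput: float) -> float:
    """Capacity formula from the slide: m = n x o x t."""
    return nodes * operations * throughput


if __name__ == "__main__":
    # Hypothetical workload: 10 operations, 100 transactions per second.
    for n in (3, 5, 7):
        m = messages_per_second(n, operations=10, throughput=100)
        print(f"{n} nodes: {m:.0f} messages/sec")
```

The message count grows linearly with the number of nodes, which is the point behind "more nodes, more problems".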


When nodes interact

  • Keepalive and checks for cluster health
  • Writeset on writer commit
  • Certification results
  • Ack on local apply
  • Flow Control
  • IST/SST

Let us clarify, geo dispersed 1

A geo dispersed, or multi-site, cluster is a cluster configuration used to help ensure high system and application availability in the event of a site disaster. In this configuration, servers are separated geographically and the physical storage (quorum disk or DATA) is synchronously replicated between sites.

(http://www.expertglossary.com/storage/definition/geo-dispersed-cluster)


Let us clarify, geo dispersed 2

For some environments, latency is the sole focus of performance. As an example of latency, consider a network transfer, such as an HTTP GET request, with the time split into latency and data transfer components.


Geo dispersed

Geo dispersed is determined by the latency between nodes, NOT by the geographic location itself.


How to measure latency correctly 1

  • wsrep_evs_repl_latency (it measures latency from the point in time when a message is sent out to the point in time when it is received)
  • wsrep_replicated_bytes / wsrep_received_bytes
  • netperf

How to measure latency correctly 2

PING

Why?


Brief digression

Ref: https://goo.gl/kDTYnW

  • PING uses ICMP (Internet Control Message Protocol), NOT TCP over IP
  • Default data size is 56 bytes plus the ICMP header (8 bytes)

ping -M do -s 1473 -c 3 192.168.0.34
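The payload size in that command is easier to read with the header arithmetic spelled out. The sketch below assumes an IPv4 header without options (20 bytes) and a path MTU of 1500 bytes; the MTU is an assumption for illustration, not a value given in the talk.

```python
IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
ICMP_HEADER = 8   # bytes, ICMP echo header, as noted above


def max_ping_payload(mtu: int) -> int:
    """Largest `ping -s` value that fits in one unfragmented packet."""
    return mtu - IP_HEADER - ICMP_HEADER


def fits_unfragmented(payload: int, mtu: int) -> bool:
    """True if a ping with this payload passes with fragmentation prohibited."""
    return payload <= max_ping_payload(mtu)


if __name__ == "__main__":
    mtu = 1500  # hypothetical path MTU
    print("max payload:", max_ping_payload(mtu))
    print("-s 1473 fits:", fits_unfragmented(1473, mtu))
```

Under these assumptions, a payload of 1473 bytes is one byte too large, so with `-M do` (fragmentation prohibited) the probe should be rejected rather than silently fragmented.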

Not good enough!

TCP stands for Transmission Control Protocol and, as the name says, it is designed to control the data transmission between source and destination. TCP implementations use IP encapsulation to transmit the data.

Looks the same as before, right?

WRONG!

A TCP implementation has several characteristics worth summarizing:

  • Is stream oriented
  • Establishes a connection
  • Monitors the data transfer
  • Buffered transmission
  • Unstructured stream
  • Full-duplex connection
  • Stream as a sequence of octets split into segments


The TCP dispatcher uses a dynamic sliding window. The dispatcher manages three pointers associated with each connection:

  • The first pointer indicates the start of the sliding window
  • The second pointer indicates the highest octet that can be dispatched
  • The third pointer indicates the window limit
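A toy model can show how a sender stalls once the window is full and resumes when acknowledgements arrive. This is a simplified sketch with a fixed window size, not how a real TCP stack is implemented. The names `window_start`, `next_to_send` and `window_limit` are my own labels, and mapping them onto the three pointers above is an interpretation.

```python
from dataclasses import dataclass


@dataclass
class SlidingWindowSender:
    window_size: int        # octets allowed in flight (fixed here for simplicity)
    window_start: int = 0   # first octet not yet acknowledged
    next_to_send: int = 0   # next octet the sender may dispatch

    @property
    def window_limit(self) -> int:
        """Octets at or past this position must wait for acknowledgements."""
        return self.window_start + self.window_size

    def send(self, octets: int) -> int:
        """Dispatch up to `octets`; return how many fit in the window."""
        sendable = max(0, min(octets, self.window_limit - self.next_to_send))
        self.next_to_send += sendable
        return sendable

    def ack(self, up_to: int) -> None:
        """Slide the window forward; octets before `up_to` are acknowledged."""
        self.window_start = max(self.window_start, min(up_to, self.next_to_send))


if __name__ == "__main__":
    sender = SlidingWindowSender(window_size=4)
    print(sender.send(10))  # only the window's worth is dispatched
    print(sender.send(1))   # window full: nothing goes out until an ack arrives
    sender.ack(2)
    print(sender.send(10))  # the window has slid, so more octets can go
```

The longer an acknowledgement takes to come back, the longer the sender sits idle with a full window. That is why a higher round-trip time can reduce throughput even when bandwidth is plentiful.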

How to measure latency correctly 3

Back to us

  • Check for gateway MTU cut:
    ping -M do -s 1473 -c 3 192.168.0.34
  • Consider sent and received messages (e.g. wsrep_replicated_bytes and wsrep_received_bytes)
  • Check kernel settings for:
    • Buffering
    • Congestion control
    • Frame utilization
  • Test with netperf, e.g.:
    netperf -H 192.168.1.51 -t TCP_RR -v 2 -l 60 -- -b 2 -r 250K -R 1M -s 250K,10M -S 10K,256K
  • Check the wsrep_evs_repl_latency value in SHOW GLOBAL STATUS LIKE 'wsrep%';
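The wsrep_evs_repl_latency value is a single slash-separated string, so a small parser makes it easier to track over time. The sketch below assumes the field order is minimum / average / maximum / standard deviation / sample size, in seconds, which is how I understand the Galera documentation. The sample value is invented for illustration.

```python
from typing import NamedTuple


class EvsLatency(NamedTuple):
    minimum: float
    average: float
    maximum: float
    stddev: float
    samples: int


def parse_evs_repl_latency(value: str) -> EvsLatency:
    """Split a wsrep_evs_repl_latency string into named fields (seconds)."""
    parts = value.strip().split("/")
    if len(parts) != 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {value!r}")
    minimum, average, maximum, stddev = (float(p) for p in parts[:4])
    return EvsLatency(minimum, average, maximum, stddev, int(parts[4]))


if __name__ == "__main__":
    # Hypothetical value as it might appear in SHOW GLOBAL STATUS output.
    latency = parse_evs_repl_latency("0.000441/0.000512/0.000849/0.000093/19")
    print(f"avg {latency.average * 1000:.3f} ms over {latency.samples} samples")
```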

What is the right limit?

  • It depends on the usage
  • Balance incoming writes/s against consistent reads


Last chance for (virtually) Synchronous

WAN settings:

evs.inactive_check_period = PT30S; evs.inactive_timeout = PT1M; evs.suspect_timeout = PT40S; evs.stats_report_period = PT3M; evs.join_retrans_period = PT0.5S

Don't use PING.

Master_Slave is not a very good option, though:

wsrep_provider_options = "gcs.fc_limit = 256; gcs.fc_factor = 0.99; gcs.fc_master_slave = YES"
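The evs.* timeouts above are written as ISO 8601 durations (PT30S, PT1M, PT0.5S), which are easy to misread when comparing them. A minimal sketch that splits an option string and converts the durations to seconds could look like this. The helper names are hypothetical, and only the hours/minutes/seconds forms are handled.

```python
import re

_NUM = r"(\d+(?:\.\d+)?)"
_DURATION = re.compile(rf"^PT(?:{_NUM}H)?(?:{_NUM}M)?(?:{_NUM}S)?$")


def duration_seconds(text: str) -> float:
    """Convert an ISO 8601 time duration such as PT1M or PT0.5S to seconds."""
    match = _DURATION.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unsupported duration: {text!r}")
    hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds


def parse_options(options: str) -> dict:
    """Split a 'key = value; key = value' provider option string."""
    pairs = (item.split("=", 1) for item in options.split(";") if item.strip())
    return {key.strip(): value.strip() for key, value in pairs}


if __name__ == "__main__":
    wan = ("evs.inactive_check_period = PT30S; evs.inactive_timeout = PT1M; "
           "evs.suspect_timeout = PT40S; evs.stats_report_period = PT3M; "
           "evs.join_retrans_period = PT0.5S")
    for name, value in parse_options(wan).items():
        print(f"{name}: {duration_seconds(value):g} s")
```

Printed in seconds, the values show that a node is suspected after 40 s and declared inactive after 60 s, with checks every 30 s.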


Async replication kicks in

  • We can use almost the same models we used
  • The challenge is to shift from one master node to a new one
  • Or from one slave to another


Async replication ways

Standard binlog position, using XID and wsrep_last_committed

+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| wsrep_last_committed | 3282552 |
+----------------------+---------+

Binlog

# at 544
#170920 19:26:56 server id 3306 end_log_pos 575 CRC32 0x3ae1edcd Xid = 3282552

  • Simple to install/setup
  • Nightmare to maintain
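The lookup implied here, finding the binlog event whose Xid equals wsrep_last_committed, can be sketched as a scan over decoded binlog text. This is an illustrative helper with a hypothetical name (`find_position_for_seqno`), not a tool from the talk. It only locates the event; repointing replication is a separate step.

```python
import re
from typing import Iterable, Optional

# Matches decoded Xid event lines such as the one shown above.
XID_EVENT = re.compile(r"end_log_pos (\d+).*?Xid = (\d+)")


def find_position_for_seqno(binlog_lines: Iterable[str], seqno: int) -> Optional[int]:
    """Return end_log_pos of the event whose Xid equals `seqno`, if present."""
    for line in binlog_lines:
        match = XID_EVENT.search(line)
        if match and int(match.group(2)) == seqno:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    decoded = [
        "# at 544",
        "#170920 19:26:56 server id 3306 end_log_pos 575 CRC32 0x3ae1edcd Xid = 3282552",
    ]
    print(find_position_for_seqno(decoded, 3282552))
```

Doing this by hand on every failover is what makes the approach a nightmare to maintain.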

Using GTID

  • All nodes in a cluster will have the same GTID
  • Moving a slave from one node to another can be automated
  • Existing: Yves Trudeau's solution: https://github.com/y-trudeau/Mysql-tools/tree/master/PXC
  • Single link: DC1 -> DC2
  • Multiple links: DC1 -> DC2 -> DC3


Conclusions

  • Plan your network and DC-DC connectivity carefully
  • Keep the number of nodes inside a PXC cluster to a minimum
  • Test the latency on the network properly (not with ping)
  • Use PXC/Galera replication between geo-distributed sites only if it is safe
  • Do not hesitate to shift to async replication
  • Use existing solutions to help you manage async replication between PXCs


Q&A


Contacts

To contact me:

  • Marco.tusa@percona.com
  • marcotusa@tusacentral.net

To follow me:

  • http://www.tusacentral.net/
  • http://www.percona.com/blog/
  • https://www.facebook.com/marco.tusa.94
  • @marcotusa
  • http://it.linkedin.com/in/marcotusa/

“Consulting = No mission refused!”