Split your database Nicolai Plum Booking.com Database Engineering - - PowerPoint PPT Presentation

split your database
SMART_READER_LITE
LIVE PREVIEW

Split your database Nicolai Plum Booking.com Database Engineering - - PowerPoint PPT Presentation

Split your database Nicolai Plum Booking.com Database Engineering Before 2 After 3 Why? Size Query time, query latency conflicting workloads Business or architecture reasons Regulatory compliance Easier PCI DSS,


slide-1
SLIDE 1

Split your database

Nicolai Plum Booking.com Database Engineering

slide-2
SLIDE 2

Before

2

slide-3
SLIDE 3

After

3

slide-4
SLIDE 4

Why?

  • Size
  • Query time, query latency
  • … conflicting workloads
  • Business or architecture reasons
  • Regulatory compliance
  • Easier PCI DSS, GDPR compliance, PII...

4

slide-5
SLIDE 5

Choosing tables to split out

  • Minimise joins
  • Maximise gains
  • Split out biggest, fastest-growing, busiest, most-

regulated, …

  • Size and activity: ps-top
  • https://github.com/sjmudd/ps-top
  • Design requirements, current system structure
  • Based on code analysis, application knowledge

5

slide-6
SLIDE 6

ps-top

ps-top 0.7.5 - 18:44:30 db-101 / 5.7.19-log, up 31d 9h 39m 2 Table (table_io_waits_summary_by_table) 44 rows Ops %| Fetch Insert Update Delete|Table Name 13.49 k 50.6%| 99.8% 0.1% 0.1%|bdb.Inventory 6.97 k 26.1%| 2.6% 94.8% 2.5% 0.1%|bdb.Catalogue 1.36 k 5.1%| 50.9% 24.5% 24.5%|bdb.Transaction 963 3.5%| 66.7% 33.3% |bdb.Delivery 597 2.2%| 51.1% 25.3% 23.6% |bdb.Customer 396 1.4%| 4.5% 94.4% 1.0%|bdb.Reviews

6

slide-7
SLIDE 7

Application and code

  • Application queries must go to different master for some

tables after split

  • Code change
  • Proxy magic (untested!)
  • (almost certainly) Need to change your code
  • Try to separate read and write database handles in code
  • Try to use application feature switch, A/B experiment

framework, or other runtime new-old switch mechanism

  • Implement in your test environment first

7

slide-8
SLIDE 8

Joins & code changes

  • Writes:
  • Scan (statement) binlog
  • Preferably get rid of update clauses with joins altogether
  • Code analysis
  • Reads
  • Code analysis
  • Query sampling
  • Performance_Schema, proxies
  • Perfection is difficult, you may have fixes after split

8

slide-9
SLIDE 9

Foreign keys, triggers, stored procedures

  • Foreign Keys between the schemas will not work
  • Remove them and refactor code
  • Move relational integrity checks and enforcement to application

layers

  • Reflect upon your life and give up using FKs
  • Check stored procedures and triggers for use of both

schemas

  • Implement triggers and stored procedures on product

schema as necessary

  • No Percona Online Schema change during split

9

slide-10
SLIDE 10

Replication topology

10

slide-11
SLIDE 11

Translation Master

  • Schema translation
  • replicate-rewrite-db=’Sales->Product'
  • Replication filter
  • replicate-wild-do-table = Product.Catalogue
  • replicate-wild-do-table = Product.Inventory
  • replicate-wild-do-table = Product.Reviews

11

slide-12
SLIDE 12

Schema name translation

12

slide-13
SLIDE 13

Scheme name translation

13

slide-14
SLIDE 14

Preparing Translation master

  • Make a slave, with binlog enabled
  • Shut down MySQL
  • Configuration: Add
  • skip-slave-start
  • replicate-rewrite-db=’sales->product’
  • replicate-wild-do-table…

14

slide-15
SLIDE 15

Preparing Translation master

  • Create new schema (product)
  • Rename all tables that are being split to the new

schema

  • Drop old schema (sales) from translation master
  • Start replication slave
  • Remove skip-slave-start
  • If replication stops with missing table errors, find the

bad query, fix up (skip) or make a new translation master again

15

slide-16
SLIDE 16

Preparing new replication chain

  • Clone the translation master to new

(product) master

  • Remove the database translate and

replication filter statements

  • Clone more slave databases as needed

16

slide-17
SLIDE 17

Client connections

  • Sales clients
  • Write connection to

sales master

  • Read connection to

sales slaves

  • Product clients
  • Write connection to

sales master

  • Read connetion to

product slaves

17

slide-18
SLIDE 18

Clients

18

slide-19
SLIDE 19

So, how do clients see it?

  • Sales clients
  • Write sales schema in

sales master

  • Read sales schema in

sales slaves

  • Product clients
  • Write sales schema in

sales master

  • Read product schema

in product slaves

19

slide-20
SLIDE 20

Better client access

  • Create a view on the sales master of the

tables in a product schema

  • Simple view is updatable, so UPDATE/

INSERT/DELETE succeed

  • product client code can be changed to use

new schema name before split

  • Change product clients to use product on

sales master before split

20

slide-21
SLIDE 21

Schemas with views

21

slide-22
SLIDE 22

Creating views

  • Sys schema tool

use sys; call create_synonym_db(’sales', ’product');

  • Cannot handle views in sales
  • View of view is not possible
  • product views must be built on the same real sales tables

as sales views

  • Drop views of tables not moving
  • Or create database product then

create view…

22

slide-23
SLIDE 23

Views on slaves also

  • Updates are executed properly in SBR
  • Slave can be promoted to master
  • Using Orchestrator or similar

23

slide-24
SLIDE 24

So, how do clients see it now?

  • Sales clients
  • Write sales schema in

sales master

  • Read sales schema in

sales slaves

  • Product clients
  • Write product schema

in sales master

  • Read product schema

in product slaves

24

slide-25
SLIDE 25

Decreased redundancy

  • Failover of the translation master is hard
  • Keep the intermediate stage short-lived
  • Probably best to prepare to repoint slaves

in a hurry if translation master dies

  • Keep sufficient capacity in parent chain for

total load

25

slide-26
SLIDE 26

Grants

  • MySQL users of the sales tables will need

privileges on product tables on the sales master

  • Separate grants for sales and product

makes the split easier and safer

  • Grants (mysql schema) do not replicate

through translation master

  • product master will need matching grants

26

slide-27
SLIDE 27

Client database server configuration

  • Need a way to change server name

quickly

  • DNS CNAME or A record
  • Preferably without application restart

27

slide-28
SLIDE 28

Client access while split is ongoing

28

?

slide-29
SLIDE 29

Choice

  • Disconnect ALL

product clients from sales master BEFORE connecting to product master

  • More downtime
  • More data integrity
  • Allow some product

clients to connect to product master while

  • thers still use sales

master

  • Less downtime
  • Less data integrity

29

slide-30
SLIDE 30

Choice

  • Disconnect first
  • No chance of conflicting

transactions

  • Auto-increment counter

columns

  • Complex less-

understood application

  • Low risk appetite
  • Lower cost of downtime
  • Disconnect later
  • Some chance of

conflicting transactions

  • Fixed or external counter

and id sources

  • More-understood

application

  • Higher risk appetite
  • High cost of downtime

30

slide-31
SLIDE 31

Disconnect first

  • Disable product client access to sales master – use

separate user grants if possible, lock the user

  • Kill all connections from product clients
  • Wait for translation master replication pointer to catch

up past the point where product clients were blocked

  • Stop replication on the translation master
  • Change product client configuration to use product

master

  • (going back is now hard)

31

slide-32
SLIDE 32

Disconnect first

32

slide-33
SLIDE 33

Disconnect first

33

slide-34
SLIDE 34

Disconnect first

34

slide-35
SLIDE 35

Disconnect later

  • Change product client configuration to use product master
  • Wait for product clients to make new connections to product

master

  • Disable product client access to sales master – use separate

user grants if possible, lock the user

  • Kill all connections from product clients
  • Wait for translation master replication pointer to catch up
  • Stop replication on the translation master
  • (going back is now hard)

35

slide-36
SLIDE 36

Disconnect later

36

slide-37
SLIDE 37

Disconnect later

37

slide-38
SLIDE 38

Disconnect later

38

slide-39
SLIDE 39

After split

39

slide-40
SLIDE 40

Cleanup

  • Remove translation master
  • Drop views and grants for product on the

sales master

  • Rename, then drop tables from the sales

master

40

slide-41
SLIDE 41

Final state

41

slide-42
SLIDE 42

42

?

nicolai.plum@booking.com