Immutable Database Infrastructure with PXC Satoshi Mitani | @mita2 - - PowerPoint PPT Presentation


slide-1
SLIDE 1

Immutable Database Infrastructure with PXC

Satoshi Mitani | @mita2 Yahoo Japan Corporation

slide-2
SLIDE 2

Agenda

  • Yahoo! JAPAN Introduction
  • Demo
  • What is Immutable Infrastructure
  • Architecture
  • Why Percona XtraDB Cluster
  • Disadvantages of our method
  • Q&A
slide-3
SLIDE 3

Yahoo! JAPAN Introduction

slide-4
SLIDE 4

Yahoo! JAPAN Introduction

slide-5
SLIDE 5

Yahoo! JAPAN Introduction

  • Daily unique browsers: 90+ million
  • Daily unique browsers (smartphone only): 60+ million
  • Monthly page views: 70+ billion
  • Number of services: 100+

slide-6
SLIDE 6

Demo

slide-7
SLIDE 7

Demo

  • Our steps to release new software
  • 1. Take the node offline
  • 2. Rebuild the node from an image that includes the new software
  • 3. Bring the node back online
slide-8
SLIDE 8
slide-9
SLIDE 9

What is Immutable Infrastructure?

slide-10
SLIDE 10

Legacy infrastructure (Mutable)

  • Accumulated changes
  • Long life-span
  • Advantages
      • Existing infrastructure
      • Persistent
  • Disadvantages
      • Need to track state
      • Need to upgrade perfectly
      • Difficult to test all combinations

[Figure: Server A and Server B each run SoftA and SoftB, upgraded in place over time (SoftA v.1.0, v.1.1, v.1.2; SoftB v.2.0, v.2.1, v.2.2)]

slide-11
SLIDE 11

Immutable Infrastructure

  • Does not change after creation
  • Disposable
      • Replace servers to release new features
      • Short life-span
  • Advantages
      • Always fresh
      • Fewer combinations
  • Disadvantage
      • Volatile

[Figure: three disposable servers, each built with one fixed pairing: SoftA v.1.0 + SoftB v.2.0, SoftA v.1.1 + SoftB v.2.1, SoftA v.1.2 + SoftB v.2.2]
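
The "fewer combinations" point can be illustrated with a small count. This is only a sketch using the SoftA and SoftB version labels from the diagrams. It assumes that in-place upgrades can leave a mutable server on any pairing of versions, while each immutable image ships exactly one pairing. The variable names are illustrative.

```python
from itertools import product

# Version labels taken from the slide diagrams.
soft_a = ["v.1.0", "v.1.1", "v.1.2"]
soft_b = ["v.2.0", "v.2.1", "v.2.2"]

# Mutable servers are upgraded piecemeal, so (by assumption) any pairing
# of SoftA and SoftB versions may exist somewhere in the fleet.
mutable_states = list(product(soft_a, soft_b))

# Immutable images are built as a unit: one fixed pairing per release.
immutable_states = list(zip(soft_a, soft_b))

print(len(mutable_states))    # 9
print(len(immutable_states))  # 3
```

With more software packages per server, the mutable count grows multiplicatively, while the immutable count stays at one state per released image.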

slide-12
SLIDE 12

Why do we need Immutable Infrastructure?

  • Huge number of DBs
  • Hard to track state
  • Hard to test all combinations
slide-13
SLIDE 13

Architecture

slide-14
SLIDE 14

Architecture overview

[Diagram: Image factory (GitHub Enterprise, Chef recipes, Screwdriver.cd CI system, VM, Golden Image, IaaS API, Image Repo) and Databases on IaaS (IaaS API, Config Backup Storage holding my.cnf etc.)]
slide-15
SLIDE 15

Architecture – Image factory

  • Golden Image
      • Includes all software
          • PXC
          • Prometheus
          • Fluentd
          • etc.
slide-16
SLIDE 16

Architecture – Image factory

  • 1. Update Chef recipe

    # Install the PXC packages at a pinned version and lock them
    yum_package ['Percona-XtraDB-Cluster-' + pxc_pkg_version, 'Percona-XtraDB-Cluster-shared-' + pxc_pkg_version] do
      version [pxc_version, pxc_version]
      action [:install, :lock]
      options '--enablerepo="percona-release"'
    end

    # Ship the systemd override file for the MySQL service
    cookbook_file "/etc/systemd/system/mysql.service.d/override.conf" do
      source 'etc/systemd/system/mysqld.service.d/override.conf'
      mode 00444
      owner 'root'
      group 'root'
    end

slide-17
SLIDE 17

Architecture – Image factory

  • 2. Boot new VM
  • 3. Run chef-client
      • chef-client local mode
      • No workstation
      • No server

    $ sudo chef-client -z -r "role[some-role]"

slide-18
SLIDE 18

Architecture – Image factory

  • 4. Create Snapshot
      • Snapshot VM
slide-19
SLIDE 19

Architecture – Image factory

  • 5. Tests
  • Tests run against the new Golden Image
      • Creating a new database cluster
      • Monitoring process
      • Load balancing
      • etc.
  • Tests are covered by our own Python scripts
      • Fabric
slide-20
SLIDE 20

Architecture - Database

[Diagram: Database on IaaS, with IaaS API, Image Repo, and Config Backup Storage (my.cnf etc.)]

  • Re-imaging clears all data
      • MySQL configuration
      • OS configuration
      • MySQL data
      • etc.
  • MySQL configuration files
  • Other OS configuration files
      • network-scripts/ifcfg-*, /etc/hosts, etc.
      • Generated automatically by IaaS
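
Because re-imaging wipes the node, the configuration has to be captured before the rebuild. The following is a minimal sketch of that idea, not the tooling used in the talk. It packs a list of config files into an in-memory tarball that could then be uploaded to a backup store. The function name `backup_configs`, the file list, and the tarball format are all assumptions made for illustration.

```python
import io
import tarfile
from pathlib import Path


def backup_configs(paths, root="/"):
    """Pack existing files from `paths` into a gzip tarball held in memory.

    Archive member names are stored relative to `root`, so the files can be
    restored under the same layout on the rebuilt node. Missing files are
    skipped rather than treated as errors.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            p = Path(p)
            if p.is_file():
                tar.add(str(p), arcname=str(p.relative_to(root)))
    return buf.getvalue()
```

A call such as `backup_configs(["/etc/my.cnf"])` returns the tarball as bytes. Where those bytes are stored, and how they are handed back to the node at rebuild time, depends on the IaaS and is not covered here.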
slide-21
SLIDE 21

Architecture - Database

  • 6. Rebuild
  • Database consists of 3 nodes
  • Re-image the nodes one by one
      • To avoid downtime
  • Pass the backed-up config files (my.cnf etc.) to the OpenStack IaaS rebuild API
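
The one-by-one rebuild can be sketched as a small loop. This is an illustrative outline under stated assumptions, not the speaker's implementation. The four callables (`take_offline`, `rebuild`, `wait_synced`, `bring_online`) are hypothetical hooks standing in for the load balancer, the IaaS rebuild call, and the cluster health check.

```python
def rolling_rebuild(nodes, take_offline, rebuild, wait_synced, bring_online):
    """Rebuild nodes strictly one at a time and stop at the first failure.

    Only one node is out of the cluster at any moment, so the remaining
    nodes keep serving traffic while it is re-imaged.
    """
    done = []
    for node in nodes:
        take_offline(node)   # stop routing traffic and let connections drain
        rebuild(node)        # re-image from the new golden image
        if not wait_synced(node):
            # Do not touch the next node while this one is still out.
            raise RuntimeError(
                f"{node} did not rejoin; stopping after {len(done)} node(s)"
            )
        bring_online(node)
        done.append(node)
    return done
```

Stopping at the first node that fails to rejoin is a deliberate choice in this sketch: with 3 nodes, continuing would risk taking a second node out of the cluster at the same time.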

slide-22
SLIDE 22

Why Percona XtraDB Cluster

slide-23
SLIDE 23

Our maintenance requirements

  • No downtime
  • Anytime, without scheduling
slide-24
SLIDE 24

Percona XtraDB Cluster (PXC)

  • MySQL-compatible high-availability solution
  • Multi-writer
      • Galera replication
  • Automatic data recovery
      • State Snapshot Transfer (SST)
slide-25
SLIDE 25

Zero-downtime maintenance

  • Take the node offline before re-imaging
      • Wait for all client connections to move to other nodes
  • Writes are possible on any node
      • PXC supports multi-writer

slide-26
SLIDE 26

SST - Automatic data recovery

  • All data is cleared by re-imaging
  • State Snapshot Transfer
      • Full data copy from one node to the joining node
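
After a rebuild, the joining node is not ready until SST has finished. Galera exposes the node state through the `wsrep_local_state_comment` status variable, which reads `Synced` once the node has caught up. The sketch below polls for that value. The `get_state` callable is hypothetical (for example, a wrapper around `SHOW STATUS LIKE 'wsrep_local_state_comment'`), and the timeout and interval defaults are arbitrary assumptions.

```python
import time


def wait_until_synced(get_state, timeout_s=3600.0, interval_s=10.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `get_state()` until it returns "Synced" or the timeout expires.

    Returns True if the node reached "Synced", False on timeout.
    `sleep` and `clock` are injectable so the loop can be tested
    without real waiting.
    """
    deadline = clock() + timeout_s
    while True:
        if get_state() == "Synced":
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The timeout should be sized to the expected SST duration, which grows with the amount of data on the donor node.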

slide-27
SLIDE 27

Disadvantages of our method

slide-28
SLIDE 28

SST Problem (1)

  • 1. SST compatibility issue between 5.7.22 and 5.7.21 or earlier
      • If you have TDE tables (ENCRYPTION=Y)
      • Need to upgrade all nodes before SST
  • 2. SST fails with TDE and compressed tables
      • ENCRYPTION=Y, ROW_FORMAT=COMPRESSED
      • Will be fixed in the next Percona XtraBackup release, 2.4.15
      • https://jira.percona.com/browse/PXB-1867
slide-29
SLIDE 29

SST Problem (2)

  • 3. SST blocks DDL
      • Not a bug!
      • xtrabackup runs with --lock-ddl for safety
      • Apps with frequent DDL face this problem
slide-30
SLIDE 30

Disadvantages of our method

  • PXC has some limitations
  • Deployment takes a long time
      • Emergency releases are done by manual operation
  • Limited volume
      • Large data sets cause long SSTs
      • We limit data size to < 500 GB
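
A rough estimate shows why data size drives SST time. This is back-of-the-envelope arithmetic only. The 100 MB/s effective transfer rate is an assumed figure for illustration and does not come from the talk; real throughput depends on network, disks, compression, and encryption.

```python
def sst_duration_minutes(data_gb, throughput_mb_per_s):
    """Estimate the time for a full data copy, in minutes.

    Treats 1 GB as 1024 MB and assumes a constant transfer rate.
    """
    return data_gb * 1024 / throughput_mb_per_s / 60


# 500 GB at an assumed 100 MB/s
print(f"{sst_duration_minutes(500, 100):.0f} min")  # 85 min
```

At that assumed rate, each node at the 500 GB limit spends well over an hour in SST, and a 3-node cluster rebuilt one node at a time takes roughly three times as long.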
slide-31
SLIDE 31

Q&A

slide-32
SLIDE 32

Thank you