SLIDE 1

Herd of Containers

SLIDE 2

Saâd DIF

Database Engineer

SLIDE 3

Herd of Containers: PostgreSQL in containers at BlaBlaCar

pgDay Paris, Mar 15, 2018

SLIDE 4

Today’s agenda

BlaBlaCar overview
PostgreSQL usage at BlaBlaCar
Switching to a new implementation

SLIDE 5

BlaBlaCar Overview

SLIDE 6

Facts and Figures

60 million members
Founded in 2006
1 million tonnes less CO2 in the past year
30 million mobile app downloads (iPhone and Android)
15 million travellers
Currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.

SLIDE 7

Core Data Ecosystem

1. MySQL: main database, MariaDB 10.0+, Galera Cluster
2. Cassandra: column-oriented, distributed
3. Redis: in-memory, key-value, optional durability

SLIDE 8

Core Data Ecosystem

4. ElasticSearch: JSON documents, full-text search, distributed
5. PostgreSQL: ORDBMS, extensibility, stability

SLIDE 9

Containers

Why containers?
Resource allocation
Deployment speed

On premise
Skills already there
Cost

SLIDE 10

Containers

rkt: why rkt over Docker?
CoreOS Container Linux: Linux distribution, simple and secure, only runs containers
Fleet: orchestration, by default with CoreOS

SLIDE 11

Containers

GGN: generate systemd units
Dgr: build and configure App Container Images
Pods: aggregate images in one shared environment

SLIDE 12

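Containers

[Diagram: platform overview. Bare-metal servers (1 type of hardware, 3 disk profiles) form the hardware layer. On top, a fleet cluster runs CoreOS with fleet and etcd as a "distributed init system". ggn and dgr work from the service codebase, a container registry holds the images, and rkt runs the pods. Arrow labels: build, run, store, host, create. Example pods: pgsql-main1 (pgsql, monitoring, nerve) and front1 (php, nginx, nerve, monitoring, synapse). Service discovery: nerve, Zookeeper, synapse.]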

SLIDE 13

Service Discovery

1. Why?
Get rid of DNS internally
Adapt to change

SLIDE 14

Service Discovery

2. Zookeeper
Key-value store
Reliable, fast, scalable

SLIDE 15

Service Discovery

3. Report
Go-Nerve
Health checks
Ephemeral keys
Present on each pod

SLIDE 16

Service Discovery

4. Discover
Go-Synapse
Watch Zookeeper
Update HAProxy configuration

SLIDE 17

Service Discovery

[Diagram: a backend pod (node1, running go-nerve) and a client pod (running go-synapse and HAProxy), linked through Zookeeper by the service key /database/node1.]

go-nerve does health checks and reports to Zookeeper in service keys.
go-synapse watches Zookeeper service keys and reloads HAProxy if changes are detected.
Applications hit their local HAProxy to access backends.
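The report and discover loop above can be illustrated with a small simulation. This is only a sketch: the `Registry` class is an in-memory stand-in for Zookeeper, and the function names, node addresses and generated backend layout are assumptions made for illustration, not the actual go-nerve or go-synapse code.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """In-memory stand-in for Zookeeper: service path -> {node name: address}."""
    keys: dict = field(default_factory=dict)

    def report(self, service: str, node: str, address: str, healthy: bool) -> None:
        """What a nerve-like agent does after each health check (hypothetical API)."""
        nodes = self.keys.setdefault(service, {})
        if healthy:
            nodes[node] = address   # the key exists only while the check passes
        else:
            nodes.pop(node, None)   # a failing node disappears from the service keys


def render_backend(registry: Registry, service: str) -> str:
    """What a synapse-like watcher does: turn service keys into an HAProxy backend."""
    name = service.strip("/").replace("/", "_")
    lines = [f"backend {name}"]
    for node, address in sorted(registry.keys.get(service, {}).items()):
        lines.append(f"    server {node} {address} check")
    return "\n".join(lines)


registry = Registry()
registry.report("/database", "node1", "10.0.0.1:5432", healthy=True)
registry.report("/database", "node2", "10.0.0.2:5432", healthy=True)
before = render_backend(registry, "/database")

# node2 fails its health check: its key goes away and the backend changes.
registry.report("/database", "node2", "10.0.0.2:5432", healthy=False)
after = render_backend(registry, "/database")

if before != after:
    print("service keys changed, HAProxy would be reloaded with:")
    print(after)
```

Comparing the rendered configuration before and after mirrors the "reload only if changes are detected" behaviour described on the slide.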

SLIDE 18

PostgreSQL usage at BlaBlaCar

SLIDE 19

Usage

Prerequisite: third-party applications
PostGIS: spatial
Confidence: home-made tools

SLIDE 20

PostGIS

Travel company
Corridoring
Point to point

[Map labels: Rambouillet, Paris, Lyon, Le Creusot]

SLIDE 21

3,685 rides passed through Amiens last month
1M meeting points
50k rows read per minute

SLIDE 22

Operate

Streaming replication
Manual interventions
Not friendly
Painful failover recovery

Change!

SLIDE 23

Target

Scale writes
Ease deployments
Maximum availability
Slaves
Failovers
Expandable resources

SLIDE 24

Possibilities

Postgres-XC (x2)
Postgres-XL
PgLogical
Bucardo
Slony
Londiste

SLIDE 25

Switching to a new implementation

SLIDE 26

BDR

Bi-Directional Replication
Open-source project by 2ndQuadrant
Multi-master asynchronous replication
2 to 48 nodes
Optimal for geo-distributed databases

SLIDE 27

BDR : The Confirmation

All nodes support reads and writes
No failovers
No other processes or nodes needed
Partition tolerant

SLIDE 28

BDR : Caveats

Modified version of PostgreSQL 9.4

BDR 2.0 with PostgreSQL 9.6 for 2ndQuadrant support customers

DDL lock
Replication lag
Conflicts
Some statements not supported yet
Some statements not replicated

SLIDE 29

Implementation

Stages: Check, Init, Run

Check: if the node has entries in the bdr_nodes table, skip init.

[~/build-tools/aci/aci-postgresql-bdr] $ tree
.
├── Jenkinsfile
├── aci-manifest.yml
├── attributes
│   ├── base.yml
│   └── postgresql.yml
├── files
│   └── tmp
│       └── postgresql
│           ├── environment
│           ├── pg_ctl.conf
│           ├── pg_ident.conf
│           └── start.conf
├── runlevels
│   ├── build
│   │   └── 00.install.sh
│   └── build-late
│       └── 00.clean.sh
└── templates
    └── dgr
        └── runlevels
            └── prestart-late
                ├── 00.init-instance.sh.tmpl
                └── 01.init-database.sh.tmpl
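The check stage can be sketched as a guard in front of the init stage. The contents of the prestart-late templates are not shown in the deck, so everything here is an assumption for illustration: `prestart`, `count_rows` and `init` are hypothetical names, and the query simply counts rows in bdr.bdr_nodes, the table named on the slide.

```python
from typing import Callable

# Hypothetical query: counts the rows of the table the slide refers to.
COUNT_QUERY = "select count(*) from bdr.bdr_nodes;"


def prestart(count_rows: Callable[[str], int], init: Callable[[], None]) -> str:
    """Run init only when the node has no entry in bdr_nodes yet."""
    if count_rows(COUNT_QUERY) > 0:
        return "skip init"      # the node already belongs to a BDR group
    init()
    return "init done"


# Fake callables stand in for the real database access and init script.
calls = []
print(prestart(lambda query: 2, lambda: calls.append("init")))
print(prestart(lambda query: 0, lambda: calls.append("init")))
```

Passing the database access in as a callable keeps the decision testable without a running PostgreSQL instance.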

SLIDE 30

Implementation (init)

If the node has no “donor” attributes:
1. Init as new group

When the node has “donor” attributes:

New fresh node
1. Retrieve user definitions on donor (pg_dumpall -g)
2. Join BDR group
3. Create minimum objects if not present

Node already referenced but changed host or has lost its data
1. Part local node on donor
2. Delete entries on donor (bdr_nodes and bdr_connections)
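The branching above can be written down as a small planning function. This is a sketch under assumptions: `plan_init` and its parameters are hypothetical names, and the idea that a stale node runs the fresh-node steps after being cleaned up on the donor is a guess, since the slide does not say what follows the cleanup.

```python
from typing import List, Optional


def plan_init(donor: Optional[str], referenced_on_donor: bool = False) -> List[str]:
    """Return the ordered init steps for a node, following the slide's cases."""
    if donor is None:
        return ["init as new group"]

    steps: List[str] = []
    if referenced_on_donor:
        # Node changed host or lost its data: clean up its old identity first.
        steps += [
            "part local node on donor",
            "delete entries on donor (bdr_nodes and bdr_connections)",
        ]
    # Assumption: once cleaned up, the node joins like a new fresh node.
    steps += [
        "retrieve user definitions on donor (pg_dumpall -g)",
        "join BDR group",
        "create minimum objects if not present",
    ]
    return steps


for step in plan_init("pgsql-main1", referenced_on_donor=True):
    print(step)
```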

SLIDE 31

Monitoring and Alerting

PagerDuty: incident manager
Grafana: beautiful visualizations
Prometheus: smart monitoring
Exporter: expose metrics

SLIDE 32

Monitoring

Key principles: usage, saturation

SLIDE 33

BDR exporter specifics

$ cat aci-prometheus-postgresql-exporter/templates/queries.tmpl.yaml
{{ if .use_bdr }}
pg_replication_bdr_count:
  query: "select (select count(*) from bdr.bdr_nodes) as bdr_nodes, (select count(*) from bdr.bdr_connections) as bdr_connections;"
  metrics:
    - bdr_nodes:
        usage: "GAUGE"
        description: "Number of rows in the bdr_nodes table"
    - bdr_connections:
        usage: "GAUGE"
        description: "Number of rows in the bdr_connections table"
{{ end }}
pg_replication_count:
  query: "select (select count(*) from pg_stat_replication) as stat_repli, (select count(*) from pg_replication_slots where active=true) as rep_slots;"
  metrics:
    - stat_repli:
        usage: "GAUGE"
        description: "Number of rows in the pg_stat_replication table"
    - rep_slots:
        usage: "GAUGE"
        description: "Number of rows in the pg_replication_slots table with the active status"
[...]

Template values for BDR specifics
Extend metrics to all PostgreSQL needs

SLIDE 34

Backup and Recovery

1. Retrieve dumps (pg_dump)
2. Alter structure dump
3. Load structure and data dump
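The three steps can be sketched as command builders. This is an illustration under stated assumptions, not the team's actual tooling: it supposes that structure and data are dumped separately with pg_dump's --schema-only and --data-only options and loaded with psql, and the file names, database name and the `rewrite` function are hypothetical. The deck does not say what the alteration of the structure dump consists of, so it is left to the caller.

```python
from typing import Callable, List


def dump_commands(dbname: str, schema_file: str, data_file: str) -> List[List[str]]:
    """Step 1: one pg_dump for the structure, one for the data."""
    return [
        ["pg_dump", "--schema-only", "--file", schema_file, dbname],
        ["pg_dump", "--data-only", "--file", data_file, dbname],
    ]


def alter_structure(sql: str, rewrite: Callable[[str], str]) -> str:
    """Step 2: apply a caller-supplied rewrite to the structure dump text."""
    return rewrite(sql)


def load_commands(dbname: str, schema_file: str, data_file: str) -> List[List[str]]:
    """Step 3: load the structure first, then the data."""
    return [
        ["psql", "--dbname", dbname, "--file", schema_file],
        ["psql", "--dbname", dbname, "--file", data_file],
    ]


# Hypothetical names; the commands are only printed, not executed.
for command in dump_commands("geo", "structure.sql", "data.sql"):
    print(" ".join(command))
for command in load_commands("geo", "structure.sql", "data.sql"):
    print(" ".join(command))
```

Each list can be handed to `subprocess.run` as is, which avoids shell quoting issues with database or file names.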

SLIDE 35

Backup and Recovery

$ cat pod-mysql-backup/aci-backup/templates/opt/backup-main.tmpl.sh
function startbackup {
  begin_unixtime=$(date +%s)
  cat <<EOF | curl --data-binary @- http://prometheus-gw:9091/metrics/job/backup_{{.env}}/target/$node/service/$service/type/{{.backup.type}}
# HELP backup_begin_unixtime
# TYPE backup_begin_unixtime counter
backup_begin_unixtime $begin_unixtime
EOF
}
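To make the pushed payload easier to see, the URL and body that the function above sends to the Prometheus gateway can be rebuilt in a few lines. This is a sketch: `pushgateway_url` and `metric_payload` are hypothetical helpers, and the `prod`, `node1`, `postgresql` and `full` values are made up. The `backup_end_unixtime` metric is the one the alerting rule in this deck relies on, so a matching end-of-backup push is assumed to exist.

```python
def pushgateway_url(gateway: str, env: str, node: str, service: str, backup_type: str) -> str:
    """Build the grouping-key URL used by the backup script."""
    return (
        f"{gateway}/metrics/job/backup_{env}"
        f"/target/{node}/service/{service}/type/{backup_type}"
    )


def metric_payload(name: str, value: int, metric_type: str = "counter") -> str:
    """Build the text-format body, mirroring the heredoc in the shell function."""
    return f"# HELP {name}\n# TYPE {name} {metric_type}\n{name} {value}\n"


url = pushgateway_url("http://prometheus-gw:9091", "prod", "node1", "postgresql", "full")
print(url)
print(metric_payload("backup_begin_unixtime", 1521100800))
print(metric_payload("backup_end_unixtime", 1521104400))
```

The path segments after `/metrics/job/...` become labels on the pushed series, which is what lets alert annotations refer to the target, service and type.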

SLIDE 36

Alerting

$ cat prometheus-rules/alert.postgresql.rules
# Alert: There is less replication active than bdr nodes
ALERT BackupsTooOld
  IF ( time() - backup_end_unixtime{exported_service=~".*postgresql.*"} ) > ( 3600 * 24 )
  LABELS { severity="warning", stack="backups", team="data_infrastructure" }
  ANNOTATIONS {
    summary="Backup {{ $labels.type }} on {{ $labels.exported_service }}.{{ $labels.target }} is too old.",
    dashboard="https://grafana.blabla.car/dashboard/db/db-backups",
  }

PromQL to find unhealthy services
Labels for routing to Slack and PagerDuty
Annotations with templating to give clear descriptions and URLs to dashboards and ops runbooks
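The condition in the BackupsTooOld rule can be restated outside PromQL to show what it selects. This is a sketch, not how Prometheus evaluates rules: `backups_too_old` is a hypothetical function, and the service names and timestamps are made up.

```python
import re
from typing import Dict, List

DAY_SECONDS = 3600 * 24
SERVICE_PATTERN = re.compile(r".*postgresql.*")  # same pattern as in the rule


def backups_too_old(now: float, backup_end_unixtime: Dict[str, float],
                    max_age: int = DAY_SECONDS) -> List[str]:
    """Return the services whose last backup ended more than max_age seconds ago."""
    return sorted(
        service
        for service, ended in backup_end_unixtime.items()
        if SERVICE_PATTERN.fullmatch(service) and now - ended > max_age
    )


# Made-up samples: last backup end time per exported_service.
samples = {
    "postgresql-main": 100_000,   # 100000 s old at now=200000: too old
    "postgresql-geo": 150_000,    # 50000 s old: fine
    "mysql-main": 0,              # not matched by the service pattern
}
print(backups_too_old(now=200_000, backup_end_unixtime=samples))
```

`fullmatch` is used because PromQL regex matchers are anchored at both ends.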

SLIDE 37

Feedback

Clearly satisfied with availability
Reactive community
Know what your needs are
Sanity checks
BDR 3.0 coming soon!

SLIDE 38

What’s next?
