100% Containers Powered Carpooling - Maxime Fouilleul, Database Reliability Engineer, BlaBlaCar


SLIDE 1

100% Containers Powered Carpooling

SLIDE 2

Maxime Fouilleul

Database Reliability Engineer

SLIDE 3

Today’s agenda

BlaBlaCar - Facts & Figures
Infrastructure Ecosystem - 100% containers powered carpooling
Stateful Services into containers - MariaDB as an example
Next challenges - Kubernetes, the Cloud

SLIDE 4

BlaBlaCar

Facts & Figures

SLIDE 5

Facts and Figures

Founded in 2006
60 million members
30 million mobile app downloads (iPhone and Android)
15 million travellers per quarter
1 million tonnes less CO2 in the past year
Currently in 22 countries: France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.

SLIDE 6

Our prod data ecosystem

MariaDB - Transactional
Redis - Volatile
PostgreSQL - Spatial
Cassandra - Distributed
Kafka - Stream
ElasticSearch - Search

SLIDE 7

Infrastructure Ecosystem

100% containers powered carpooling

SLIDE 8

Why containers?

SLIDE 9

Homogeneous Hardware

From this

(diagram: fourteen dedicated servers, srv_001 to srv_014, each running a single service, svc_001 to svc_014)

SLIDE 10

Homogeneous Hardware

To that

(diagram: the same fourteen services, svc_001 to svc_014, packed onto eight shared servers, srv_001 to srv_008)

SLIDE 11

Homogeneous Hardware - “Pets vs Cattle”

Easier to replace broken hardware
Cost effective
Easier to manage

SLIDE 12

Homogeneous Deployment

The trip-meeting-point application and its redis service are described the same way:

cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml

    containers:
      - aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
      - aci.blbl.cr/aci-go-synapse:15-40
      - aci.blbl.cr/aci-go-nerve:21-27
      - aci.blbl.cr/aci-logshipper:27
    nodes:
      - hostname: trip-meeting-point1
        gelf:
          level: INFO
        fleet:
          - MachineMetadata=rack=110
          - Conflicts=*trip-meeting-point*
      - hostname: trip-meeting-point2
        fleet:
          - MachineMetadata=rack=210
          - Conflicts=*trip-meeting-point*
      - hostname: trip-meeting-point3
        fleet:
          - MachineMetadata=rack=310
          - Conflicts=*trip-meeting-point*

cat ./prod-dc1/services/redis-meeting-point/service-manifest.yml

    containers:
      - aci.blbl.cr/aci-redis:4.0.2-1
      - aci.blbl.cr/aci-redis-dictator:20
      - aci.blbl.cr/aci-go-nerve:21-27
      - aci.blbl.cr/aci-prometheus-redis-exporter:0.12.2-1
    nodes:
      - hostname: redis-meeting-point1
        fleet:
          - MachineMetadata=rack=110
          - Conflicts=*redis-meeting-point*
      - hostname: redis-meeting-point2
        fleet:
          - MachineMetadata=rack=210
          - Conflicts=*redis-meeting-point*
      - hostname: redis-meeting-point3
        fleet:
          - MachineMetadata=rack=310
          - Conflicts=*redis-meeting-point*

ggn prod-dc1 trip-meeting-point update -y
ggn prod-dc1 redis-meeting-point update -y

SLIDE 13

Volatile by design

trip-meeting-point dependencies

cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml

    containers:
      - aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
      - aci.blbl.cr/aci-go-synapse:15-41
      - aci.blbl.cr/aci-go-nerve:21-27
      - aci.blbl.cr/aci-logshipper:27
    [...]

cat ./aci-trip-meeting-point/aci-manifest.yml

    name: aci.blbl.cr/aci-trip-meeting-point:{{.version}}
    aci:
      dependencies:
        - aci.blbl.cr/aci-java:1.8.181-2
    [...]

cat ./aci-java/aci-manifest.yml

    name: aci.blbl.cr/aci-java:1.8.181-2
    aci:
      dependencies:
        - aci.blbl.cr/aci-debian:9.5-9
        - aci.blbl.cr/aci-common:7

(diagram: the trip-meeting-point pod is built from aci-trip-meeting-point, aci-go-synapse, aci-go-nerve, aci-logshipper and aci-hindsight; aci-trip-meeting-point depends on aci-java, which depends on aci-debian and aci-common)

SLIDE 14

Volatile - When should I redeploy?

A change in my own app/container: “immutable”
Noisy neighbours: “mutualization”
A change on a sidecar container or its dependencies
When you are ready for instability, you are HA.

SLIDE 15

How?

SLIDE 16

Infrastructure Ecosystem

(diagram) Hardware: bare-metal servers, 1 type of hardware, 3 disk profiles, running CoreOS with a fleet cluster as a “distributed init system”, coordinated through etcd. dgr builds container images and stores them in the Container Registry; ggn creates rkt PODs from the Service Codebase and runs them on the hosts. Example PODs: mysql-main1 (mysqld, monitoring, nerve) and front1 (php, nginx, synapse, nerve, monitoring). Service Discovery is backed by zookeeper.

SLIDE 17

Infrastructure Ecosystem

(same diagram as the previous slide, with kubernetes replacing the fleet cluster and helm replacing ggn)

SLIDE 18

Service Discovery

(diagram: a backend pod and a client pod)

go-nerve does health checks and reports to zookeeper in service keys (e.g. /database/node1).
go-synapse watches zookeeper service keys and reloads haproxy if changes are detected.
Applications hit their local haproxy to access backends.

Components: go-nerve, Zookeeper, go-synapse, HAProxy.
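The loop above can be sketched in a few lines. This is an illustrative model only: a plain dict stands in for zookeeper, and the function names (`nerve_report`, `synapse_backends`) are hypothetical, not go-nerve's or go-synapse's real APIs.

```python
# Sketch of the nerve/synapse service-discovery pattern.
# A dict stands in for zookeeper; names here are illustrative.

registry = {}  # zookeeper stand-in: service path -> {node: report}

def nerve_report(path, node, available, host, port, weight=255):
    """go-nerve side: publish the result of a local health check."""
    registry.setdefault(path, {})[node] = {
        "available": available, "host": host, "port": port, "weight": weight,
    }

def synapse_backends(path):
    """go-synapse side: derive the haproxy backend list from service keys."""
    return sorted(
        f'{r["host"]}:{r["port"]}'
        for r in registry.get(path, {}).values()
        if r["available"] and r["weight"] > 0
    )

nerve_report("/services/mysql/main", "mysql-main1", True, "192.168.1.2", 3306)
nerve_report("/services/mysql/main", "mysql-main2", False, "192.168.1.3", 3306)
print(synapse_backends("/services/mysql/main"))  # only the healthy node remains
```

The key property is that the unhealthy node simply disappears from the backend list, so clients hitting their local haproxy never see it.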

SLIDE 19

Stateful Services into containers

MariaDB as an example

SLIDE 20

“Stateful” and “volatile by design”?

SLIDE 21

The recipe/prereqs/pillars to succeed:

Be Quiet! “A node should be able to restart without impacting the app”
Abolish Slavery “For a given service, every node has the same role”
Build Smart “Services can be operated by any SRE”

SLIDE 22

MariaDB as an example

SLIDE 23

Abolish Slavery

“For a given service, every node has the same role”

SLIDE 24

Asynchronous vs. Synchronous

(diagram: asynchronous Master/Slave replication on one side, a MariaDB Cluster with every node linked by wsrep on the other)

MariaDB Cluster means:

No Single Point of Failure
No Replication Lag
Auto State Transfers
As fast as the slowest node

SLIDE 25

The Target

(diagram: a MariaDB Cluster running in containers, nodes linked by wsrep)

Writes go on one node.
Reads are balanced on the others.

SLIDE 26

How to hit the target?

Service Discovery

SLIDE 27

# zookeepercli -c lsr /services/mysql/main
mysql-main1_192.168.1.2_ba0f1f8b
mysql-main2_192.168.1.3_734d63da
mysql-main3_192.168.1.4_dde45787

# zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b
{
  "available": true,
  "host": "192.168.1.2",
  "port": 3306,
  "name": "mysql-main1",
  "weight": 255,
  "labels": {
    "host": "r10-srv4"
  }
}

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"

Nerve - Track and report service status

SLIDE 28

# cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
override:
  tripsearch:
    database:
      read:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_rd
        port: 3307
      write:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_wr
        port: 3308

Synapse - Service discovery router

# cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
override:
  synapse:
    services:
      - name: mysql-main_read
        path: /services/mysql/main
        port: 3307
      - name: mysql-main_write
        path: /services/mysql/main
        port: 3308
        serverOptions: backup
        serverSort: date
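To see why the same zookeeper path can serve both reads and writes, here is a sketch of the kind of haproxy config the two services above could translate to. `render_backend` is a hypothetical helper for illustration, not go-synapse's real output format; the idea it demonstrates is that `serverOptions: backup` plus `serverSort: date` leaves exactly one active write node.

```python
# Illustrative: turn a discovered server list into haproxy-style config.
# With server_options="backup" and server_sort="date", only the oldest
# registered node takes traffic; the rest are spares (single write node).

def render_backend(name, local_port, servers, server_options="", server_sort=None):
    if server_sort == "date":
        servers = sorted(servers, key=lambda s: s["registered"])
    lines = [f"listen {name}", f"  bind 127.0.0.1:{local_port}"]
    for i, s in enumerate(servers):
        opts = f" {server_options}" if server_options and i > 0 else ""
        lines.append(f"  server {s['name']} {s['host']}:{s['port']}{opts}")
    return "\n".join(lines)

nodes = [
    {"name": "mysql-main2", "host": "192.168.1.3", "port": 3306, "registered": 2},
    {"name": "mysql-main1", "host": "192.168.1.2", "port": 3306, "registered": 1},
]
# Reads (local port 3307): all nodes active, traffic is balanced.
print(render_backend("mysql-main_read", 3307, nodes))
# Writes (local port 3308): oldest node active, the others marked backup.
print(render_backend("mysql-main_write", 3308, nodes, "backup", "date"))
```

The application never needs to know which node is the write node; it just connects to localhost on 3307 or 3308.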

SLIDE 29

Be Quiet!

“A node should be able to restart without impacting the app”

SLIDE 30

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            request: "SELECT 1"
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"

Nerve - “Readiness Probe”

mysql -h 127.0.0.1 -ulocal_mon -plocal_mon -P3306 -e 'SELECT 1;'

Starting Pod mysql-main1: Nerve check is KO
Starting MySQL: Nerve check is KO
MySQL is syncing (IST/SST): Nerve check is KO
MySQL is ready: Nerve check is OK
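The state sequence above can be modeled as a tiny function: the check only flips to OK once mysqld both accepts connections and has finished its state transfer. The state names and `sql_check` function are illustrative stand-ins for nerve's real `SELECT 1` probe.

```python
# Minimal model of the readiness sequence: nerve's SQL check reports OK
# only once mysqld is up AND done syncing (IST/SST). Illustrative only.

def sql_check(state):
    """Stand-in for nerve's `SELECT 1` probe against 127.0.0.1:3306."""
    if state in ("starting-pod", "starting-mysql"):
        return "KO"        # nothing is listening yet
    if state == "syncing":
        return "KO"        # IST/SST in progress, node not consistent
    if state == "ready":
        return "OK"        # SELECT 1 succeeds, node joins the pool
    raise ValueError(state)

for state in ("starting-pod", "starting-mysql", "syncing", "ready"):
    print(state, "->", sql_check(state))
```

Because the node is only published to zookeeper on OK, a restarting node never receives traffic before it is consistent.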

SLIDE 31

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
            disableCommand: "/report_remaining_processes.sh"
            disableMaxDurationInMilli: 180000

Nerve - “Grace Period”

Stop Pod: call /disable on Nerve’s API. The weight is set to 0, so no more new sessions will go into the service.

Wait: the remaining sessions finish their job.

SELECT COUNT(1) FROM processlist WHERE user LIKE 'app_%';

Pod Stopped: the service can be shut down without risk.
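The three steps above amount to a drain loop: zero the weight, then poll the remaining application sessions until they finish or the 180 s cap (`disableMaxDurationInMilli`) expires. This is a sketch under stated assumptions: `graceful_disable` and `count_app_sessions` are hypothetical names, and the session counter stands in for the `SELECT COUNT(1) ... processlist` query.

```python
# Sketch of nerve's grace-period logic. count_app_sessions is a stand-in
# for the processlist query; in real life each poll would sleep first.

def graceful_disable(count_app_sessions, max_duration_ms=180_000, poll_ms=1_000):
    weight = 0   # /disable: no new sessions are routed here
    waited = 0
    while count_app_sessions() > 0 and waited < max_duration_ms:
        waited += poll_ms  # real implementation: sleep poll_ms, re-run query
    return weight, waited, count_app_sessions() == 0

# Simulated processlist that drains after three polls.
samples = iter([3, 2, 1, 0, 0])
weight, waited, drained = graceful_disable(lambda: next(samples))
print(weight, waited, drained)  # -> 0 3000 True
```

The cap matters: a stuck long-running session must not block the rolling restart forever, so after 180 s the pod stops regardless.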

SLIDE 32

Build Smart

“Services can be operated by any SRE”

SLIDE 33

Use Service Discovery to find peers

Example:

SLIDE 34

Use Service Discovery to find peers

Eg: the wsrep_cluster_address attribute in Galera Cluster

Description: The addresses of cluster nodes to connect to when starting up. Good practice is to specify all possible cluster nodes, in the form gcomm://<node1 or ip:port>,<node2 or ip2:port>,<node3 or ip3:port>. Specifying an empty ip (gcomm://) will cause the node to start a new cluster.

(diagram: bootstrapping a mysql-main cluster node by node)

node1 asks the Service Discovery for mysql-main peers. No peer found, so:
wsrep_cluster_address = gcomm://

node2 finds node1:
wsrep_cluster_address = gcomm://node1

node3 finds node1 and node2:
wsrep_cluster_address = gcomm://node1,node2
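The bootstrap rule above reduces to one line of logic: build the `gcomm://` URL from whatever peers service discovery returns. The `wsrep_cluster_address` helper below is an illustrative sketch; the only subtlety, as the Galera description warns, is that an empty peer list bootstraps a brand-new cluster, so it must only ever happen for the very first node.

```python
# Sketch: derive wsrep_cluster_address from discovered mysql-main peers.
# An empty list yields gcomm://, which starts a NEW cluster - this must
# only happen for the first node, never for a node rejoining.

def wsrep_cluster_address(peers):
    return "gcomm://" + ",".join(peers)

print(wsrep_cluster_address([]))                  # node1 -> gcomm://
print(wsrep_cluster_address(["node1"]))           # node2 -> gcomm://node1
print(wsrep_cluster_address(["node1", "node2"]))  # node3 -> gcomm://node1,node2
```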

SLIDE 35

Next challenges

Kubernetes, the Cloud

SLIDE 36

Kubernetes, the Cloud, why now?

SLIDE 37

Kubernetes, the Cloud, why now?

Fleet is deprecated

Fleet is no longer developed and maintained by CoreOS.

Kubernetes

From a simple “distributed init system” to the standard for container orchestration.

Docker

The rkt-based implementation of Kubernetes has poor adoption.

SLIDE 38

Kubernetes, the Cloud, why now?

Service Oriented Architecture: delegated ownership.

Google Kubernetes Engine & Managed Services: allows us to focus on services.

3-year-old servers: we need to renew our hardware.

SLIDE 39

Kubernetes and stateful services?

SLIDE 40

Kubernetes Statefulsets

Stable, unique network identifiers.
Stable, persistent storage.
Ordered, graceful deployment, scaling and rolling updates.
StatefulSets control Pods that are based on an identical spec.

SLIDE 41

Google Kubernetes Engine...

SLIDE 42

Why are we excited about GKE?

Native support of Liveness and Readiness probes
Release granularity, from Pod to Deployment/StatefulSet
Native Service Discovery (kube-proxy and Services)
GCEPersistentDisk provisioner to manage Persistent Volumes
All this, plus resource limits, makes for powerful orchestration.

SLIDE 43

See you next year for 100% GKE Powered Carpooling!

SLIDE 44

SLIDE 45