100% Containers Powered Carpooling
Maxime Fouilleul
Database Reliability Engineer, BlaBlaCar
Today’s agenda
BlaBlaCar - Facts & Figures
Infrastructure Ecosystem - 100% containers powered carpooling
Stateful Services into containers - MariaDB as an example
Next challenges - Kubernetes, the Cloud
BlaBlaCar
Facts & Figures
Founded in 2006
60 million members
1 million tonnes less CO2 in the past year
30 million mobile app downloads
iPhone and Android
15 million travellers per quarter
Currently in 22 countries
France, Spain, UK, Italy, Poland, Hungary, Croatia, Serbia, Romania, Germany, Belgium, India, Mexico, The Netherlands, Luxembourg, Portugal, Ukraine, Czech Republic, Slovakia, Russia, Brazil and Turkey.
Our prod data ecosystem
- MariaDB - Transactional
- Redis - Volatile
- PostgreSQL - Spatial
- Cassandra - Distributed
- Kafka - Stream
- ElasticSearch - Search
Infrastructure Ecosystem
100% containers powered carpooling
Why containers?
Homogeneous Hardware
From this
[Diagram: 14 servers, each dedicated to a single service (srv_001→svc_001 … srv_014→svc_014)]
Homogeneous Hardware
To that
[Diagram: the same 14 services packed onto 8 servers (srv_001–srv_008)]
Homogeneous Hardware - “Pets vs Cattle”
- Easier to replace broken hardware
- Cost effective
- Easier to manage
Homogeneous Deployment
trip-meeting-point application
cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml
containers:
  - aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
  - aci.blbl.cr/aci-go-synapse:15-40
  - aci.blbl.cr/aci-go-nerve:21-27
  - aci.blbl.cr/aci-logshipper:27
nodes:
  - hostname: trip-meeting-point1
    gelf:
      level: INFO
    fleet:
      - MachineMetadata=rack=110
      - Conflicts=*trip-meeting-point*
  - hostname: trip-meeting-point2
    fleet:
      - MachineMetadata=rack=210
      - Conflicts=*trip-meeting-point*
  - hostname: trip-meeting-point3
    fleet:
      - MachineMetadata=rack=310
      - Conflicts=*trip-meeting-point*
cat ./prod-dc1/services/redis-meeting-point/service-manifest.yml
containers:
  - aci.blbl.cr/aci-redis:4.0.2-1
  - aci.blbl.cr/aci-redis-dictator:20
  - aci.blbl.cr/aci-go-nerve:21-27
  - aci.blbl.cr/aci-prometheus-redis-exporter:0.12.2-1
nodes:
  - hostname: redis-meeting-point1
    fleet:
      - MachineMetadata=rack=110
      - Conflicts=*redis-meeting-point*
  - hostname: redis-meeting-point2
    fleet:
      - MachineMetadata=rack=210
      - Conflicts=*redis-meeting-point*
  - hostname: redis-meeting-point3
    fleet:
      - MachineMetadata=rack=310
      - Conflicts=*redis-meeting-point*
ggn prod-dc1 trip-meeting-point update -y
ggn prod-dc1 redis-meeting-point update -y
Volatile by design
trip-meeting-point dependencies
cat ./prod-dc1/services/trip-meeting-point/service-manifest.yml
containers:
  - aci.blbl.cr/aci-trip-meeting-point:20180928.145115-v-979da34
  - aci.blbl.cr/aci-go-synapse:15-41
  - aci.blbl.cr/aci-go-nerve:21-27
  - aci.blbl.cr/aci-logshipper:27
[...]

cat ./aci-trip-meeting-point/aci-manifest.yml
name: aci.blbl.cr/aci-trip-meeting-point:{{.version}}
aci:
  dependencies:
    - aci.blbl.cr/aci-java:1.8.181-2
[...]

cat ./aci-java/aci-manifest.yml
name: aci.blbl.cr/aci-java:1.8.181-2
aci:
  dependencies:
    - aci.blbl.cr/aci-debian:9.5-9
    - aci.blbl.cr/aci-common:7
[Diagram: trip-meeting-point pod dependency tree - aci-trip-meeting-point depends on aci-java, which depends on aci-debian and aci-common; sidecars: aci-go-synapse, aci-go-nerve, aci-logshipper, aci-hindsight]
Volatile - When should I redeploy?
- A change in my own app/container: "immutable"
- Noisy neighbours: "mutualization"
- A change on a sidecar container or its dependencies
When you are ready for instability, you are HA.
How?
Infrastructure Ecosystem
[Diagram: bare-metal servers (1 type of hardware, 3 disk profiles) run CoreOS; fleet, a "distributed init system" backed by etcd, schedules services across the hardware; dgr builds containers and stores them in the Container Registry; ggn creates services from the codebase and runs them as rkt pods, e.g. mysql-main1 (mysqld, monitoring, nerve) and front1 (php, nginx, nerve, monitoring, synapse); nerve, synapse and zookeeper provide Service Discovery]
Infrastructure Ecosystem - the target
[Same diagram, with fleet replaced by kubernetes, ggn by helm, and Service Discovery handled natively between backend and client pods]
Service Discovery
- go-nerve runs health checks and reports node status to Zookeeper under service keys (e.g. /database/node1)
- go-synapse watches the Zookeeper service keys and reloads HAProxy when changes are detected
- Applications hit their local HAProxy to reach backends
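To make the flow above concrete, here is a small Python sketch (hypothetical names and shapes, not BlaBlaCar's actual code) of the go-synapse side: render an HAProxy backend section from the nodes that go-nerve registered, and trigger a reload only when the rendered config actually changes.

```python
import hashlib

def render_backend(service, nodes):
    """Render an HAProxy backend stanza from discovered nodes.

    `nodes` mimics what go-nerve publishes to Zookeeper:
    dicts with host, port, name and weight.
    """
    lines = [f"backend {service}"]
    for n in sorted(nodes, key=lambda n: n["name"]):
        lines.append(f"    server {n['name']} {n['host']}:{n['port']} "
                     f"weight {n['weight']} check")
    return "\n".join(lines)

def needs_reload(previous_hash, config):
    """Reload HAProxy only if the rendered config actually changed."""
    new_hash = hashlib.sha256(config.encode()).hexdigest()
    return new_hash != previous_hash, new_hash

# Two nodes as reported under /services/mysql/main
nodes = [
    {"name": "mysql-main2", "host": "192.168.1.3", "port": 3306, "weight": 255},
    {"name": "mysql-main1", "host": "192.168.1.2", "port": 3306, "weight": 255},
]
cfg = render_backend("mysql-main_read", nodes)
changed, h = needs_reload(None, cfg)     # first render: reload
changed_again, _ = needs_reload(h, cfg)  # same membership: no reload
```

The hash comparison is what keeps HAProxy quiet between membership changes: a node flapping in and out of Zookeeper causes a reload, a periodic re-render of identical state does not.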
Stateful Services into containers
MariaDB as an example
“Stateful” and “volatile by design”?
The recipe to succeed - three pillars:
- Be Quiet! "A node should be able to restart without impacting the app"
- Abolish Slavery "For a given service, every node has the same role"
- Build Smart "Services can be operated by any SRE"
MariaDB as an example
Abolish Slavery
“For a given service, every node has the same role”
Asynchronous vs. Synchronous
[Diagram: asynchronous Master/Slave replication vs. a synchronous MariaDB Cluster where every node replicates to every other via wsrep]
MariaDB Cluster means:
- No Single Point of Failure
- No Replication Lag
- Auto State Transfers
- As fast as the slowest node
The Target
[Diagram: a MariaDB Cluster (wsrep) running in containers]
- Writes go to one node
- Reads are balanced on the others
How to hit the target?
Service Discovery
# zookeepercli -c lsr /services/mysql/main
mysql-main1_192.168.1.2_ba0f1f8b
mysql-main2_192.168.1.3_734d63da
mysql-main3_192.168.1.4_dde45787

# zookeepercli -c get /services/mysql/main/mysql-main1_192.168.1.2_ba0f1f8b
{
  "available": true,
  "host": "192.168.1.2",
  "port": 3306,
  "name": "mysql-main1",
  "weight": 255,
  "labels": {
    "host": "r10-srv4"
  }
}

# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
Nerve - Track and report service status
# cat env/prod-dc1/services/tripsearch/attributes/tripsearch.yml
---
override:
  tripsearch:
    database:
      read:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_rd
        port: 3307
      write:
        host: localhaproxy
        database: tripsearch
        user: tripsearch_wr
        port: 3308
Synapse - Service discovery router
# cat env/prod-dc1/services/tripsearch/attributes/synapse.yml
override:
  synapse:
    services:
      - name: mysql-main_read
        path: /services/mysql/main
        port: 3307
      - name: mysql-main_write
        path: /services/mysql/main
        port: 3308
        serverOptions: backup
        serverSort: date
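The serverSort: date / serverOptions: backup pair is what pins writes to a single node. A hypothetical Python sketch of that idea (illustrative field names, not the real synapse code): order the discovered backends by registration date and mark all but the oldest as HAProxy backup servers.

```python
def write_backends(nodes):
    """Order backends by registration time; all but the first are backups.

    `nodes`: dicts with name, host, port and a `registered` timestamp
    (hypothetical field standing in for the Zookeeper node creation date).
    """
    ordered = sorted(nodes, key=lambda n: n["registered"])
    lines = []
    for i, n in enumerate(ordered):
        suffix = "" if i == 0 else " backup"
        lines.append(f"server {n['name']} {n['host']}:{n['port']}{suffix}")
    return lines

nodes = [
    {"name": "mysql-main2", "host": "192.168.1.3", "port": 3306, "registered": 200},
    {"name": "mysql-main1", "host": "192.168.1.2", "port": 3306, "registered": 100},
]
lines = write_backends(nodes)
```

HAProxy only sends traffic to a backup server when every non-backup server is down, so writes follow the oldest registered node and fail over automatically when it disappears.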
Be Quiet!
“A node should be able to restart without impacting the app”
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            request: "SELECT 1"
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
Nerve - “Readiness Probe”
mysql -h 127.0.0.1 -P 3306 -ulocal_mon -plocal_mon -e 'SELECT 1;'
Starting Pod mysql-main1 → Nerve check is KO
Starting MySQL → Nerve check is KO
MySQL is syncing (IST/SST) → Nerve check is KO
MySQL is ready → Nerve check is OK
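The sequence above can be sketched as a readiness function (hypothetical, simplified): the node is only reported available once the SQL check succeeds, so a pod that is still starting or syncing receives no traffic.

```python
def readiness(check):
    """Report available only when the SQL check (`SELECT 1`) succeeds."""
    try:
        return check() == 1
    except Exception:
        return False  # MySQL starting or syncing (IST/SST): report KO

def syncing():
    # Stand-in for a node still in state transfer: queries fail.
    raise RuntimeError("node not yet prepared for application use")

def ready():
    return 1  # `SELECT 1` succeeded

probe_while_syncing = readiness(syncing)  # node stays out of rotation
probe_when_ready = readiness(ready)       # node starts taking traffic
```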
# cat env/prod-dc1/services/mysql-main/attributes/nerve.yml
override:
  nerve:
    services:
      - name: "mysql-main"
        port: 3306
        reporters:
          - {type: zookeeper, path: /services/mysql/main}
        checks:
          - type: sql
            driver: mysql
            datasource: "local_mon:local_mon@tcp(127.0.0.1:3306)/"
            disableCommand: "/report_remaining_processes.sh"
            disableMaxDurationInMilli: 180000
Nerve - “Grace Period”
Stop Pod: call /disable on Nerve's API.
Set weight to 0: no more new sessions will go to the service.
Wait: the remaining sessions finish their job.
Pod Stopped: the service can be shut down without risk.
SELECT COUNT(1) FROM processlist WHERE user LIKE 'app_%';
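A hypothetical sketch of that grace period: set the weight to 0 so no new sessions arrive, then poll the remaining application sessions (the COUNT(1) query above) until they reach zero or the disableMaxDurationInMilli budget (180 s) is spent.

```python
def drain(set_weight, count_sessions, max_ticks=180):
    """Drain a node: stop new sessions, wait for existing ones to finish."""
    set_weight(0)                  # weight 0: no new sessions are routed here
    for _ in range(max_ticks):     # one poll per second, 180 s budget
        if count_sessions() == 0:
            return True            # safe to stop the pod
    return False                   # budget exhausted; stop anyway

# Simulated run: the processlist count drops to zero over three polls.
sessions = [3, 1, 0]
weights = []
drained = drain(weights.append, lambda: sessions.pop(0))
```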
Build Smart
“Services can be operated by any SRE”
Use Service Discovery to find peers
Example: the wsrep_cluster_address attribute in Galera Cluster
Description: The addresses of cluster nodes to connect to when starting up. Good practice is to specify all possible cluster nodes, in the form gcomm://<node1 or ip:port>,<node2 or ip2:port>,<node3 or ip3:port>. Specifying an empty ip (gcomm://) will cause the node to start a new cluster.
- node1 starts: asks the Service Discovery for mysql-main peers, no peer found, so wsrep_cluster_address = gcomm:// (bootstrap a new cluster)
- node2 starts: discovery returns node1, so wsrep_cluster_address = gcomm://node1
- node3 starts: discovery returns node1 and node2, so wsrep_cluster_address = gcomm://node1,node2
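A minimal Python sketch of this bootstrap logic (hypothetical helper, not the actual entrypoint script): build wsrep_cluster_address from whatever peers the service discovery currently returns; an empty list yields gcomm://, which bootstraps a new cluster.

```python
def cluster_address(peers):
    """Build wsrep_cluster_address from discovered mysql-main peers."""
    return "gcomm://" + ",".join(peers)

bootstrap = cluster_address([])                 # first node: new cluster
join_one = cluster_address(["node1"])           # second node joins node1
join_two = cluster_address(["node1", "node2"])  # third node joins both
```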
Next challenges
Kubernetes, the Cloud
Kubernetes, the Cloud, why now?
Fleet is deprecated
Fleet is no longer developed and maintained by CoreOS.
Kubernetes
From a simple “Distributed init system” to the standard for container orchestration.
Docker
The rkt-based implementation of Kubernetes has poor adoption.
Service Oriented Architecture
Delegated Ownership.
Google Kubernetes Engine & Managed Services
Allows us to focus on services.
3-year-old servers
We need to renew our hardware.
Kubernetes and stateful services?
Kubernetes Statefulsets
- Stable, unique network identifiers
- Stable, persistent storage
- Ordered, graceful deployment, scaling and rolling updates
StatefulSets control Pods that are based on an identical spec.
Google Kubernetes Engine...
Why are we excited about GKE?
- Native support of Liveness and Readiness probes
- Release granularity, from Pod to Deployment/StatefulSet
- Native Service Discovery (kube-proxy and Services)
- GCEPersistentDisk provisioner to manage Persistent Volumes
All this, plus resource limits, makes for powerful orchestration.