Kubernetes the Very Hard Way
Laurent Bernaille Staff Engineer, Infrastructure @lbernail
Datadog
- Over 350 integrations; over 1,200 employees; over 8,000 customers
- Runs on millions of hosts; trillions of data points per day
- 10,000s of hosts in our infrastructure
- 10s of Kubernetes clusters with 50-2,500 nodes
- Multi-cloud; very fast growth
- Dogfooding: improve our Kubernetes integrations
- Immutable infrastructure: move away from Chef
- Multi-cloud: a common API
- Community: large and dynamic
“Of course, you will need an HA master setup.”
“Oh, and yes, you will have to manage your certificates.”
“By the way, networking is slightly more complicated: look into CNI and ingress controllers.”
[Diagram: a single master running etcd, the apiserver, the controllers, and the scheduler; kubelet and kubectl talk to the apiserver, and in-cluster apps reach it through a Service]
[Diagram: three masters, each running etcd, the apiserver, the controllers, and the scheduler, behind a LoadBalancer; kubelet, kubectl, and in-cluster apps (through a Service) go through the LoadBalancer]
[Diagram: the same HA setup with etcd moved to dedicated nodes, outside the masters]
Single active controller-manager and scheduler
[Diagram: same setup; the controllers and scheduler run on every master, but leader election keeps only one of each active]
Split scheduler/controllers
[Diagram: the schedulers and controllers now run on their own nodes, separate from the apiservers]
Split etcd
[Diagram: a second etcd cluster dedicated to events, alongside the main etcd cluster]
Sizing the control plane
- etcd: 2 x (3 or 5 nodes); bound by disk and network I/Os
- apiservers: X nodes; bound by RAM and network I/Os
- controllers: 2 nodes; bound by CPU
- schedulers: 2 nodes; bound by CPU
[Diagram: Vault hosts an "etcd" PKI, which issues the etcd peer/server certificates and the apiserver's etcd client certificate]
[Diagram: in addition to the "etcd" PKI, a "kube" PKI in Vault issues the apiserver/kubelet client certificate, the controller and scheduler client certificates, and the kubelet client/server certificate]
[Diagram: same as before, plus a "kube" key/value store in Vault holding the service-account public and private keys; in-cluster apps authenticate to the apiserver with service-account (SA) tokens]
[Diagram: same as before, plus an "apiservice" PKI issuing the certificates used by API services (aggregation proxy and webhooks)]
[Diagram: the full picture: etcd PKI, apiservice PKI, kube PKI, and the kube key/value store with the SA keys; kubectl users authenticate through an OIDC provider, in-cluster apps use SA tokens, and kubelets use their client/server certificates]
Kubelet bootstrap: preparation
1. Create a bootstrap token (admin)
2. Add the bootstrap token to Vault (admin)
3. Get the signing key from Vault (controllers)
Kubelet bootstrap: node registration
1. Get the bootstrap token from Vault (kubelet)
2. Authenticate to the apiserver with the token (kubelet)
3. Verify the token and map groups (apiserver)
4. Create a CSR (kubelet)
5. Verify RBAC for the CSR creator (controllers)
6. Sign the certificate (controllers)
7. Download the certificate (kubelet)
8. Authenticate with the certificate (kubelet)
9. Register the node (kubelet)
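The token handed to the kubelet in steps 1-2 is typically wired in through a bootstrap kubeconfig. A minimal sketch, in which the server URL, CA path, and token value are illustrative placeholders (not Datadog's actual configuration):

```yaml
# Hypothetical bootstrap kubeconfig for the kubelet
apiVersion: v1
kind: Config
clusters:
- name: kubernetes
  cluster:
    server: https://apiserver.example.internal:443   # placeholder
    certificate-authority: /etc/kubernetes/pki/ca.crt
users:
- name: kubelet-bootstrap
  user:
    token: "07401b.f395accd246ae52d"   # bootstrap token, format <id>.<secret>
contexts:
- name: bootstrap
  context:
    cluster: kubernetes
    user: kubelet-bootstrap
current-context: bootstrap
```

The kubelet consumes this file through its --bootstrap-kubeconfig flag and writes its permanent kubeconfig once the signed certificate is retrieved (step 7).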
[Chart: number of CSR resources in the cluster over time; lower is better]
Kubelet authentication: required RBAC permissions

Group                  CSR creation    CSR auto-approval
system:bootstrappers   OK              OK
system:nodes           OK
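These permissions map onto the upstream TLS-bootstrapping ClusterRoles. A sketch of the corresponding bindings, assuming the standard upstream role names (the binding names themselves are arbitrary):

```yaml
# Bootstrap tokens may create CSRs and have them auto-approved;
# registered nodes may renew their own client certificates.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kubelet-bootstrap-csr
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:node-bootstrapper
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auto-approve-bootstrap-csr
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:nodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:bootstrappers
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: auto-approve-renewals
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:certificates.k8s.io:certificatesigningrequests:selfnodeclient
subjects:
- apiGroup: rbac.authorization.k8s.io
  kind: Group
  name: system:nodes
```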
Webhook certificates
1. Create the webhook with a self-signed certificate as CA (admin)
2. Add the self-signed certificate and key to Vault (admin)
3. Get the certificate and key from Vault (apiserver)
One day, after ~1 year: the certificate expired.
But, “if it’s hard, do it often”: rotating certificates frequently means no expiration issues anymore.
[Charts: apiserver restarts correlated with etcd slow queries and etcd traffic]
We run multiple apiservers and restart each one daily. Restarts have a significant impact on etcd traffic (caches are repopulated) and on etcd performance.

[Chart: apiserver restarts correlated with the ELB surge queue]
Restarts also have a significant impact on the load balancer, as all connections are re-established. Mitigation: increase the accept queues on the apiservers (net.ipv4.tcp_max_syn_backlog, net.core.somaxconn).
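Those two sysctls enlarge the kernel's listen backlog so that a reconnect storm gets queued rather than dropped. A sketch of the setting; the values here are illustrative, not a tuned recommendation:

```
# /etc/sysctl.d/90-apiserver-backlog.conf (illustrative values)
# Pending TCP handshakes the kernel will queue per listening socket
net.ipv4.tcp_max_syn_backlog = 8192
# Upper bound on the accept-queue length passed to listen()
net.core.somaxconn = 8192
```

Applied with `sysctl --system`; note the application must also pass a matching backlog to listen(), since somaxconn only caps it.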
[Chart: apiserver restarts correlated with coredns memory usage]
> Memory spike for impacted apps; no real mitigation today
Connections and traffic are very unbalanced across apiservers because connections are very long-lived, and more clients mean a bigger cluster-wide impact. [Chart: one apiserver handling 15 MB/s and 2,300 connections while another handles 2.5 MB/s and 300 connections]
[Chart: a 48-hour simulation]
Cause: the pod deletion flow
[Diagram: an admin or controller deletes a pod through the apiserver; the kubelet asks containerd to stop the container with the “terminationGracePeriodSeconds” timeout; containerd sends SIGTERM to the container and, after the timeout, SIGKILL]
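The timeout in that flow comes straight from the pod spec. A minimal sketch (pod name and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: graceful-demo            # hypothetical pod for illustration
spec:
  # containerd waits this long between SIGTERM and SIGKILL (default: 30)
  terminationGracePeriodSeconds: 60
  containers:
  - name: app
    image: example.com/app:latest   # placeholder image
```

A process that ignores SIGTERM, or a shell PID 1 that does not forward it, is killed at the deadline regardless.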
Restarts impact graceful termination
[Diagram: if the kubelet restarts during a deletion, the containerd context is cancelled and the container receives SIGKILL immediately, cutting graceful termination short]
Fixed upstream: “Do not SIGKILL container if container stop is cancelled” https://github.com/containerd/cri/pull/1099
Issue upstream: “pod with readinessProbe will be not ready when kubelet restart” https://github.com/kubernetes/kubernetes/issues/78733
When kubelets restart on “system” nodes (running coredns and other services), the coredns endpoints become NotReady.
Restarting components is not transparent. It would be great if:
○ Components could transparently reload certificates (server and client)
○ Clients could wait 0-Xs before reconnecting, to avoid a thundering herd
○ Reconnections did not trigger memory spikes
○ Cloud TCP load balancers supported a least-connections algorithm
○ Connections were rebalanced (kill them after a while?)
- Throughput: trillions of data points daily
- Scale: clusters of 1,000-2,000 nodes
- Latency: end-to-end pipeline
- Topology: multiple clusters; access from standard VMs
[Diagram: each node has a node IP and a pod CIDR assigned to that node]
[Diagram: Node 1 (IP 192.168.0.1, pod CIDR 10.0.1.0/24) and Node 2 (IP 192.168.0.2, pod CIDR 10.0.2.0/24); routes, local or cloud provider: 10.0.1.0/24 => 192.168.0.1 and 10.0.2.0/24 => 192.168.0.2]
Limits: with local routes, nodes must be in the same subnet; with cloud provider routes, the number of routes.
[Diagram: Node 1 (IP 192.168.0.1, pod CIDR 10.0.1.0/24) and Node 2 (IP 192.168.0.2, pod CIDR 10.0.2.0/24) connected by a VXLAN tunnel]
Tunnel traffic between hosts. Examples: Calico, Flannel.
Limits: overhead of the overlay; scaling route distribution (control plane).
Performance: no datapath overhead, and a simpler control plane.
Addressing: pod IPs are accessible from outside the cluster.
- On premise: BGP (Calico, kube-router), macvlan
- AWS: additional IPs on ENIs (AWS EKS CNI plugin, Lyft CNI plugin, Cilium ENI IPAM)
- GCP: IP aliases
[Diagram: an agent attaches an ENI (eth1) and allocates IPs on it; the kubelet calls containerd over CRI, which calls the CNI plugin to create a veth pair per pod; each pod gets an IP from the ENI and a routing rule “from IP1, use eth1”]
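The “from IP1, use eth1” rule in the diagram is Linux policy routing. A sketch of what such a CNI plugin might program; the pod IP, gateway, device name, and table number are all assumed for the example:

```
# Route traffic sourced from a pod IP out of the ENI it was allocated on.
# 10.0.32.15 = pod IP on the secondary ENI; table 101 is arbitrary.
ip rule add from 10.0.32.15 lookup 101
ip route add default via 10.0.32.1 dev eth1 table 101
```

The per-pod rule selects a dedicated routing table, so return traffic leaves through the ENI that owns the address instead of the node's primary interface.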
Pod CIDR per node: /24 (8 bits) > up to 255 pods per node, simple addressing
Node prefix: 12 bits > up to 4,096 nodes
4 bits left > up to 16 clusters
Also needed:
○ Address space for node IPs (another /20 per cluster for 4,096 nodes)
○ Service IP range (a /20 would make sense for such a cluster)
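The bit arithmetic above can be checked with Python's ipaddress module; the 10.0.0.0/8 supernet below is an assumed example, not Datadog's actual range:

```python
import ipaddress

# Assumed layout: within a /8, spend 4 bits on the cluster,
# 12 bits on the node, and 8 bits (a /24) on pods per node.
supernet = ipaddress.ip_network("10.0.0.0/8")

clusters = list(supernet.subnets(prefixlen_diff=4))          # a /12 per cluster
nodes_in_cluster = list(clusters[0].subnets(new_prefix=24))  # a /24 per node

print(len(clusters))                      # 16 clusters
print(len(nodes_in_cluster))              # 4096 nodes per cluster
print(nodes_in_cluster[0].num_addresses)  # 256 addresses per node
```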
Ingress: cross-cluster, and VM-to-cluster
[Diagram: Cluster 1 (services A and B), Cluster 2 (services C and D), and classic VMs; clients need to reach services B and C across these boundaries]
[Diagram: an external client reaches a cloud load balancer, which health-checks the nodes and forwards to a NodePort on each one; kube-proxy routes the traffic to the pods. The service-controller configures the load balancer by watching LoadBalancer services on the apiservers]
Inefficient datapath and cross-application impacts
[Diagram: web traffic from the load balancer can land on any node, including one running only kafka pods, and is then forwarded by kube-proxy to a web pod on another node]
ExternalTrafficPolicy: Local?
[Diagram: same setup; with ExternalTrafficPolicy: Local, only nodes running a local web pod pass the load balancer health checks, so traffic is delivered without the extra kube-proxy hop]
[Diagram: the external client reaches the load balancer, which forwards to l7proxy pods through NodePorts; the ingress-controller, watching ingresses and endpoints on the apiservers, creates the l7proxy deployments and updates their backends from service endpoints; the service-controller, watching LoadBalancer services, configures the load balancer and its health checks]
Limits
- All nodes as backends (1,000+)
- Inefficient datapath
- Cross-application impacts
Alternatives?
- ExternalTrafficPolicy: Local? > the number of backend nodes remains the same, and some CNI plugins have issues with it
- K8s ingress > still load-balancer based, need to scale the ingress pods, and still an inefficient datapath
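For reference, the option discussed above is a single field on the Service; a minimal sketch with placeholder names and ports:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                      # hypothetical service
spec:
  type: LoadBalancer
  # Local: only nodes with a ready local endpoint pass the LB health
  # check, and the client source IP is preserved (no second hop).
  externalTrafficPolicy: Local
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```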
[Diagram: the external client reaches an AWS ALB whose targets are the pods themselves; the alb-ingress-controller, watching ingresses and endpoints on the apiservers, configures the ALB and its health checks]
Limited to HTTP ingresses: no support for TCP/UDP; Ingress v2 should address this.
Registration delay: registration with the load balancer is slow, while pod rolling updates are much faster, so mitigations are needed.
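One common mitigation for the registration delay (a general pattern, not necessarily what Datadog deployed) is to delay pod shutdown with a preStop hook, so the load balancer can deregister the old target while it is still serving:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: slow-drain-demo              # hypothetical pod for illustration
spec:
  terminationGracePeriodSeconds: 90  # must exceed the preStop sleep
  containers:
  - name: app
    image: example.com/app:latest    # placeholder image
    lifecycle:
      preStop:
        exec:
          # Keep serving until the LB has stopped sending new traffic
          command: ["sleep", "60"]
```

Combined with a rolling-update strategy that limits surge, this keeps old targets alive long enough for new ones to finish registering.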
When TCP support or the registration delay cannot be managed > dedicated gateways
[Diagram: external client > load balancer (not managed by Kubernetes, with its own health checker) > l7proxy pods running in host network on dedicated nodes > application pods]
“Deep Dive into Kubernetes Internals for Builders and Operators”, Jérôme Petazzoni, LISA 2019: a minimal cluster showing the interactions between the main components. https://lisa-2019-10.container.training/talk.yml.html
“Kubernetes the Hard Way”, Kelsey Hightower: an HA control plane with encryption. https://github.com/kelseyhightower/kubernetes-the-hard-way
“Kubernetes the very hard way at Datadog”: https://www.youtube.com/watch?v=2dsCwp_j0yQ
“10 ways to shoot yourself in the foot with Kubernetes”: https://www.youtube.com/watch?v=QKI-JRs2RIE
“Kubernetes Failure Stories”: https://k8s.af
Self-managed Kubernetes is hard
> If you can, use a managed service
Networking is not easy, especially at scale.
The main challenge is not technical:
> Build a team
> Transforming practices and training users is very important
We’re hiring! https://www.datadoghq.com/careers/ laurent@datadoghq.com @lbernail