🐦 do.co/doks @maybeawg
20,000 Upgrades Later
Lessons From a Year of Managed Kubernetes Upgrades
Adam Wolfe Gordon
DigitalOcean
This Talk Started One(ish) Year Ago...
Me, in Barcelona
DO, in Barcelona
UPGRADES!
● You might upgrade differently!
● Your workloads might be different!
a. Update any resources that aren't supported in the target version.
b. Upgrade etcd (if needed).
c. Upgrade kube-apiserver.
d. Upgrade kube-controller-manager.
e. Upgrade kube-scheduler.
f. Upgrade your CNI plugin (if needed).
g. Upgrade provider-specific components (e.g. cloud-controller-manager, CSI controller).
h. Upgrade kubelet and kubectl.

a. Cordon and drain a worker node.
b. Update kubelet configuration (if needed).
c. Upgrade the kubelet.
d. Uncordon the node.
e. Repeat for each node in the cluster.
a. Update any resources that aren't supported in the target version.
b. Upgrade etcd (if needed).
c. Upgrade kube-apiserver.
d. Upgrade kube-controller-manager.
e. Upgrade kube-scheduler.
f. Upgrade your CNI plugin (if needed).
g. Upgrade provider-specific components (e.g. cloud-controller-manager, CSI controller).
h. Upgrade kubelet and kubectl.

a. Cordon and drain a worker node.
b. Update kubelet configuration (if needed).
b. Destroy the node.
c. Upgrade the kubelet.
d. Uncordon the node.
e. Repeat for each node in the cluster.
● (Mostly)
● DaemonSets
● Init containers
● Might be drained to a node that's about to be deleted.
● Making replacement even slower.
● This usually requires make-before-break.
● Safely: Use PodDisruptionBudgets.
● Quickly: Respond to signals.
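As a minimal sketch of the first point, a PodDisruptionBudget for a hypothetical workload might look like the following. The name, the `app: web` label, and the `minAvailable` value are illustrative assumptions, not taken from the talk. `policy/v1` assumes Kubernetes 1.21 or later; older clusters use `policy/v1beta1`.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb          # hypothetical name
spec:
  minAvailable: 2        # assumed: keep at least 2 matching pods running during a drain
  selector:
    matchLabels:
      app: web           # hypothetical label carried by the workload's pods
```

A node drain evicts pods through the eviction API, which honors this budget, so the workload needs more replicas than `minAvailable` or the drain cannot make progress. On the second point, the kubelet sends SIGTERM to a pod's containers and follows up with SIGKILL once `terminationGracePeriodSeconds` (30 seconds by default) has elapsed, so a process that exits promptly on SIGTERM lets the drain finish sooner.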
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: webhook.example.com
webhooks:
- rules:
  - apiVersions: ["v1"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: "webhook-namespace"
      name: "webhook-service"
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 30
  failurePolicy: Ignore

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: webhook.example.com
webhooks:
- rules:
  - apiVersions: ["v1"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: "webhook-namespace"
      name: "webhook-service"
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 30
  failurePolicy: Fail
● Usually in the kube-system namespace.
webhook-service kube-proxy cilium
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: webhook.example.com
webhooks:
- rules:
  - apiVersions: ["v1"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: "webhook-namespace"
      name: "webhook-service"
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 30
  failurePolicy: Fail
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: webhook.example.com
webhooks:
- rules:
  - apiVersions: ["v1"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: "webhook-namespace"
      name: "webhook-service"
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 30
  failurePolicy: Ignore
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: webhook.example.com
webhooks:
- rules:
  - apiVersions: ["v1"]
    resources: ["pods"]
    scope: "Namespaced"
  clientConfig:
    service:
      namespace: "webhook-namespace"
      name: "webhook-service"
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 5
  failurePolicy: Ignore
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: webhook.example.com
webhooks:
- namespaceSelector:
    matchExpressions:
    ...
  clientConfig:
    service:
      namespace: "webhook-namespace"
      name: "webhook-service"
  admissionReviewVersions: ["v1", "v1beta1"]
  sideEffects: None
  timeoutSeconds: 5
  failurePolicy: Fail
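The body of the `namespaceSelector` above is elided on the slide. One common form, which is not necessarily the one shown in the talk, keeps the webhook away from pods in kube-system by matching on the `kubernetes.io/metadata.name` label. This sketch assumes a cluster where Kubernetes sets that label on every namespace automatically (1.21 and later); on older clusters you would have to label the namespaces yourself.

```yaml
# Fragment of a single entry under `webhooks:` (illustrative, not from the slides)
namespaceSelector:
  matchExpressions:
  - key: kubernetes.io/metadata.name   # assumed: label set automatically on each namespace
    operator: NotIn
    values: ["kube-system"]            # never call the webhook for pods in this namespace
```

With a selector like this, a `failurePolicy: Fail` webhook that is down can no longer block pods in kube-system, such as kube-proxy or the CNI plugin, from being created.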
● Or run the webhook service outside the cluster.
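For a webhook hosted outside the cluster, `clientConfig` takes a `url` in place of a `service` reference. A minimal sketch, with a hypothetical endpoint:

```yaml
# Fragment of a single entry under `webhooks:` (illustrative, not from the slides)
clientConfig:
  url: "https://webhook.example.org/validate"   # hypothetical endpoint; the scheme must be https
```

The API server then reaches the webhook without depending on in-cluster networking, so the webhook stays reachable while nodes are being replaced.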
● Retain node names and IP addresses if you can.
● Workloads should assume that nodes will go away.
● Create new nodes before destroying old ones, if possible.
● Especially if you avoid alpha features.
● Check your targets.
● Check your failure policies.
● Check your timeouts.
Adam Wolfe Gordon awg@do.co