Running Kafka on Kubernetes with Strimzi Sean Glover, Lightbend - - PowerPoint PPT Presentation



slide-1
SLIDE 1

Running Kafka on Kubernetes with Strimzi

Sean Glover, Lightbend @seg1o

slide-2
SLIDE 2

Who am I?

I’m Sean Glover

  • Principal Engineer at Lightbend
  • Member of the Lightbend Pipelines team
  • Organizer of Scala Toronto (scalator)
  • Author and contributor to various projects in the Kafka ecosystem, including Kafka, Alpakka Kafka (reactive-kafka), Strimzi, Kafka Lag Exporter, and DC/OS Commons SDK

https://seanglover.com/ sean@seanglover.com

slide-3
SLIDE 3

Operations Is Hard

“Technology will make our lives easier”

  • Technology makes running other technology easier
  • Automate as much operations work as we can

slide-4
SLIDE 4

Motivating Example: Zero-downtime Kafka Upgrade

slide-5
SLIDE 5

Motivating Example: Upgrading Kafka

High-level steps to upgrade Kafka:

  1. Rolling update to explicitly define the broker properties inter.broker.protocol.version and log.message.format.version
  2. Download the new Kafka distribution and perform a rolling upgrade, one broker at a time
  3. Rolling update to upgrade inter.broker.protocol.version to the new version
  4. Upgrade Kafka clients
  5. Rolling update to upgrade log.message.format.version to the new version
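The five upgrade steps can be read as a sequence of target states for every broker. The following is a minimal Python sketch of that reading, assuming a hypothetical upgrade from version 2.1 to 2.2 (the version numbers and the `upgrade_phases` helper are illustrative and not from the talk). It lists the installed binary and the two pinned properties after each step.

```python
# Sketch only: the version numbers are hypothetical examples, not from the talk.
def upgrade_phases(old, new):
    """Return (step, broker_state) pairs describing each broker after every step."""
    state = {
        "binary": old,
        "inter.broker.protocol.version": old,
        "log.message.format.version": old,
    }
    phases = [("1. pin protocol and message format versions", dict(state))]
    state["binary"] = new
    phases.append(("2. install the new Kafka distribution", dict(state)))
    state["inter.broker.protocol.version"] = new
    phases.append(("3. bump inter.broker.protocol.version", dict(state)))
    # Step 4 only touches clients, so the broker state is unchanged.
    phases.append(("4. upgrade Kafka clients", dict(state)))
    state["log.message.format.version"] = new
    phases.append(("5. bump log.message.format.version", dict(state)))
    return phases


phases = upgrade_phases("2.1", "2.2")
for step, broker_state in phases:
    print(step, broker_state)
```

Pinning both properties first means that brokers already running the new binaries keep using the old protocol and message format until every broker has been upgraded.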

slide-6
SLIDE 6

Motivating Example: Upgrading Kafka

Any update to the Kafka cluster must be performed as a serial “rolling update”. The complete Kafka upgrade process requires 3 “rolling updates”.

Each broker update requires:

  • Secure login
  • Configuration linting - Any change to a broker requires a rolling broker update
  • Graceful shutdown - Send SIGINT signal to broker
  • Broker initialization - Wait for the broker to join the cluster and signal that it’s ready

This operation is error-prone to do manually and difficult to model declaratively using generalized infrastructure automation tools.

slide-7
SLIDE 7

Automation

“If it hurts, do it more frequently, and bring the pain forward.”

Jez Humble, Continuous Delivery

slide-8
SLIDE 8

Automation of Operations

Upgrading Kafka is just one of many complex operational concerns. For example:

  • Initial deployment
  • Managing ZooKeeper
  • Replacing brokers
  • Topic partition rebalancing
  • Decommissioning or adding brokers

How do we automate complex operational workflows in a reliable way?

slide-9
SLIDE 9

Container Orchestrated Clusters

slide-10
SLIDE 10

Cluster Resource Managers

slide-11
SLIDE 11

Task Isolation with Containers

  • Cluster resource managers use Linux containers to constrain resources and provide isolation
  • cgroups constrain resources
  • Namespaces isolate file system/process trees
  • Docker is just a project to describe and share containers efficiently (others: rkt, LXC, Mesos)
  • Containers are available for several platforms (Jail, Linux Container, Windows Container)

[Diagram: Linux Containers (LXC). A physical or virtual machine runs the Linux kernel (namespaces, cgroups, modules, drivers) in kernel space; the cluster resource manager, container engine, and containers run in user space.]

slide-12
SLIDE 12

Kubernetes and the Operator Pattern

slide-13
SLIDE 13


slide-14
SLIDE 14

The Operator Pattern

1. Controller/Operator

// Active Reconciliation Loop
for {
  desired := getDesiredState()
  current := getCurrentState()
  makeChanges(desired, current)
}

[Diagram: the controller/operator watches CRUD changes to the custom resource and deploys a reconciliation plan to the Kafka cluster.]

2. Configuration State

“Kafka” Custom Resource

apiVersion: kafka.strimzi.io/v1alpha1
kind: Kafka
metadata:
  name: simple-strimzi
spec:
  kafka:
    config: ...
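The reconciliation loop can be made concrete with a small simulation. This is a minimal Python sketch, assuming a toy `ClusterState` with only a replica count and a config map; the class, the action names, and the sample config value are hypothetical and do not reflect how Strimzi models a cluster. It shows only the compare-then-act shape of the pattern.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    """Toy stand-in for the desired (custom resource) or current (cluster) state."""
    replicas: int
    config: dict


def plan_changes(desired, current):
    """Compare desired and current state and return the actions needed."""
    actions = []
    if desired.replicas > current.replicas:
        actions.append(("scale_up", desired.replicas - current.replicas))
    elif desired.replicas < current.replicas:
        actions.append(("scale_down", current.replicas - desired.replicas))
    if desired.config != current.config:
        actions.append(("rolling_update", desired.config))
    return actions


def reconcile(desired, current, max_iterations=10):
    """Apply planned actions until the current state matches the desired state."""
    history = []
    for _ in range(max_iterations):
        actions = plan_changes(desired, current)
        if not actions:
            break  # nothing left to do: states have converged
        for kind, arg in actions:
            if kind in ("scale_up", "scale_down"):
                current.replicas = desired.replicas
            elif kind == "rolling_update":
                current.config = dict(arg)
            history.append(kind)
    return history


desired = ClusterState(replicas=3, config={"auto.create.topics.enable": "false"})
current = ClusterState(replicas=1, config={})
history = reconcile(desired, current)
print(history, current)
```

Running `reconcile` a second time with the same inputs returns an empty history, which is the property that makes the loop safe to run continuously.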

slide-15
SLIDE 15

Stateful Services in Kubernetes

  • StatefulSet, name: kafka-brokers
  • Pod, name: kafka-brokers-0
  • PersistentVolumeClaim, name: data-kafka-brokers-0
  • PersistentVolume, name: pvc-2a4f8bcb-45cd

StatefulSets provide:

  • Stable pod & network identity
  • Stable persistent storage
  • Ordered deployment and updates
  • Ordered graceful deletion and termination
  • Ordered automated rolling updates
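The stable identity comes from predictable names. A minimal Python sketch of the Kubernetes naming rules: a pod is named after its StatefulSet plus an ordinal, and a claim is named after its volume claim template plus the pod name. The template name `data` is an assumption inferred from the claim name shown on the slide.

```python
def pod_name(statefulset, ordinal):
    """Pods are named <statefulset>-<ordinal>."""
    return f"{statefulset}-{ordinal}"


def pvc_name(claim_template, statefulset, ordinal):
    """Claims are named <volumeClaimTemplate>-<pod name>."""
    return f"{claim_template}-{pod_name(statefulset, ordinal)}"


def ordinal_of(pod):
    """Recover the ordinal from a pod name such as kafka-brokers-2."""
    return int(pod.rsplit("-", 1)[1])


# "data" is an assumed template name, inferred from data-kafka-brokers-0.
names = [
    (pod_name("kafka-brokers", i), pvc_name("data", "kafka-brokers", i))
    for i in range(3)
]
for pod, claim in names:
    print(pod, claim)
```

Because the names are derived rather than random, a recreated pod with the same ordinal binds to the same claim and so to the same data.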
slide-16
SLIDE 16

Abstracting Persistence

PersistentVolumeClaim
  name: data-kafka-brokers-0
  size: 10GB
  storage class: aws-ebs

PersistentVolume
  name: pvc-2a4f8bcb-45cd

StorageClass
  name: aws-ebs
  provisioner: kubernetes.io/aws-ebs

Provisioner (aws-ebs)

AWS EBS Volume

slide-17
SLIDE 17

Strimzi

An operator-based Kafka on Kubernetes project

slide-18
SLIDE 18

Strimzi

Strimzi is an open source operator-based Apache Kafka project for Kubernetes and OpenShift

  • Announced Feb 25th, 2018
  • Evolved from a non-operator project known as Barnabas by Paolo Patierno, Red Hat
  • Part of Red Hat Developer Program
  • “Streams” component of Red Hat AMQ, a commercial product of messaging technologies by Red Hat

slide-19
SLIDE 19

Cluster Operator

[Diagram: the Cluster Operator watches the “Kafka” CRD and deploys the Kafka StatefulSet (broker pods), the ZooKeeper StatefulSet (ZK pods), and the Entity Operators (User and Topic Operator).]

Demo: ./resources/simple-strimzi.yaml

slide-20
SLIDE 20

Entity Operator (User and Topic Operators)

[Diagram: the Entity Operators (Topic Operator and User Operator) watch the “KafkaTopic” and “KafkaUser” CRDs and synchronize with the Kafka and ZooKeeper StatefulSets.]

Demo: ./resources/simple-topic.yaml

slide-21
SLIDE 21

Strimzi Storage Modes

  1. Ephemeral: Broker Pod with an emptyDir volume (transient)
  2. Persistent: Broker Pod with a PersistentVolume (PV) (persistent)
  2 (b). Persistent JBOD: Broker Pod with multiple PVs (persistent)

Broker config

log.dirs = [PV1, PV2, PV3]

slide-22
SLIDE 22

Operational Concerns

slide-23
SLIDE 23

Install Strimzi

Installing and running a Strimzi Kafka cluster is a two-step process:

  1. Install the Strimzi Helm Chart
  2. Create a Kafka Kubernetes resource

Helm Chart Install:

helm repo add strimzi http://strimzi.io/charts/
helm install strimzi/strimzi-kafka-operator

Demo: ./demo/01-create-simple-strimzi-cluster.sh

slide-24
SLIDE 24

Connecting Clients

Fully qualified service hostname:

simple-strimzi-kafka-bootstrap.strimzi.svc.cluster.local:9092

  • simple-strimzi: Kafka resource metadata.name
  • kafka-bootstrap: Broker load balancer name
  • strimzi: Namespace
  • svc.cluster.local: K8s Service

Ports:

  • “Plain”: 9092
  • TLS: 9093
  • Interbroker: 9094
  • Prometheus: 9404

Demo: ./demo/02-connecting-clients.sh run-kafka-perf-producer.sh
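The bootstrap address is assembled from the Kafka resource name, the namespace, and the listener port. A minimal Python sketch that rebuilds the hostname shown on this slide; the `bootstrap_address` helper and the default `cluster.local` domain parameter are illustrative assumptions, and only the plain and TLS ports from the slide are included.

```python
# Client-facing ports taken from the slide.
PORTS = {"plain": 9092, "tls": 9093}


def bootstrap_address(cluster_name, namespace, listener="plain",
                      cluster_domain="cluster.local"):
    """Build <name>-kafka-bootstrap.<namespace>.svc.<domain>:<port>."""
    service = f"{cluster_name}-kafka-bootstrap"
    return f"{service}.{namespace}.svc.{cluster_domain}:{PORTS[listener]}"


address = bootstrap_address("simple-strimzi", "strimzi")
print(address)
```

A client running in the same namespace could use the shorter `simple-strimzi-kafka-bootstrap:9092`, since Kubernetes DNS resolves service names relative to the pod's namespace.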

slide-25
SLIDE 25

Rolling Configuration Updates

Rolling Configuration Process:

  1. Watched Kafka resource changes
  2. Apply new config to the Kafka StatefulSet spec
  3. Starting from pod 0, delete the pod and allow the StatefulSet to recreate it
  4. Kafka pod will generate new broker.config
  5. Kafka is started
  6. Wait until the readiness check is good
  7. Repeat from step 3 for the next pod

Demo: ./demo/03-broker-config-update.sh
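The serial nature of the process (one pod at a time, gated on readiness) can be simulated in a few lines. This is a minimal Python sketch, not Strimzi's implementation: pods are plain dictionaries, `is_ready` is a hypothetical readiness probe supplied by the caller, and `num.io.threads` is used only as an example broker property.

```python
def rolling_update(pods, new_config, is_ready, max_checks=100):
    """Restart pods one at a time in ordinal order, waiting for readiness each time."""
    events = []
    for ordinal in sorted(pods):
        events.append(("delete", ordinal))            # step 3: delete the pod
        pods[ordinal] = {"config": dict(new_config)}  # step 4: recreated with new config
        events.append(("start", ordinal))             # step 5: Kafka is started
        for check in range(max_checks):               # step 6: poll the readiness check
            if is_ready(ordinal, check):
                break
        else:
            raise RuntimeError(f"pod {ordinal} never became ready")
        events.append(("ready", ordinal))             # step 7: move on to the next pod
    return events


pods = {i: {"config": {"num.io.threads": "8"}} for i in range(3)}
# Hypothetical probe: every pod reports ready on its third check.
events = rolling_update(pods, {"num.io.threads": "16"},
                        lambda ordinal, check: check >= 2)
for event in events:
    print(event)
```

The important property is the ordering: pod 1 is never deleted before pod 0 has reported ready, so at most one broker is unavailable at any time.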

slide-26
SLIDE 26

Scaling Brokers Up

  1. Increase replica count spec.kafka.replicas
  2. Reassign partitions: ./bin/kafka-reassign-partitions.sh

[Diagram: before scaling, kafka-0 holds partitions P0, P1, and P2; after scaling and reassignment, P0, P1, and P2 are spread across kafka-0, kafka-1, and kafka-2.]

Demo: ./demo/04-scale-brokers.sh ./partition-reassignment/generate-plan-output.json
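New brokers receive no existing partitions until a reassignment plan moves them. A minimal Python sketch that produces a plan in the JSON layout accepted by the reassignment tool's `--reassignment-json-file` option; the round-robin placement and the topic name `my-topic` are illustrative assumptions, not the algorithm the tool's `--generate` mode uses.

```python
import json


def round_robin_plan(topic, partitions, brokers, replication_factor=1):
    """Spread partitions over brokers round-robin, in reassignment JSON layout."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor exceeds the number of brokers")
    plan = {"version": 1, "partitions": []}
    for p in range(partitions):
        replicas = [brokers[(p + r) % len(brokers)]
                    for r in range(replication_factor)]
        plan["partitions"].append(
            {"topic": topic, "partition": p, "replicas": replicas})
    return plan


# Hypothetical topic with three partitions, spread over brokers 0, 1 and 2.
plan = round_robin_plan("my-topic", partitions=3, brokers=[0, 1, 2])
print(json.dumps(plan, indent=2))
```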

slide-27
SLIDE 27

Rolling Broker Upgrades

Rolling Broker Upgrade Process:

  1. Upgrade Strimzi Cluster Operator
  2. Update config (rolling updates: 1-2x):
     a. (Optional) Set log.message.format.version broker config
     b. Set desired Kafka release version
  3. (Optional) Upgrade clients using cluster
  4. (Optional) Set log.message.format.version broker config (rolling update: 0-1x)

slide-28
SLIDE 28

Broker Replacement & Movement

Replacing brokers is common with large, busy clusters

$ kubectl delete pod kafka-1

Broker replacement is also useful to facilitate broker movement across the cluster

  1. Research the max bitrate per partition for your cluster
  2. Move partitions from broker to replace
  3. Replace broker
  4. Rebalance/move partitions to new broker

slide-29
SLIDE 29

Broker Replacement & Movement

1. Research the max bitrate per partition for your cluster

Run a controlled test

  • Bitrate depends on message size, producer batch, and consumer fetch size
  • Create a standalone cluster with 1 broker, 1 topic, and 1 partition
  • Run producer and consumer perf tests using average message/client properties
  • Measure broker metric for average bitrate

kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec
kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec

slide-30
SLIDE 30

Broker Replacement & Movement

2. Move partitions from broker to replace

Use Kafka partition reassignment tool

  • Generate an assignment plan without old broker 1
  • Pick a fraction of the measured max bitrate found in step 1 (e.g. 75%, 80%)
  • Apply plan with bitrate throttle
  • Wait till complete

[Diagram: Brokers 0, 1, and 2 start with four partitions each; as partitions move off Broker 1, Broker 0 holds four, Broker 1 two, and Broker 2 six.]

kafka-reassign-partitions … --topics-to-move-json-file topics.json --broker-list "0,2" --generate
kafka-reassign-partitions … --reassignment-json-file reassignment.json --execute --throttle 10000000
kafka-reassign-partitions … --topics-to-move-json-file topics.json --reassignment-json-file reassignment.json --verify
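The throttle value is simple arithmetic on the bitrate measured in step 1, and the topics file has a small fixed layout. A minimal Python sketch, assuming a hypothetical measured maximum of 40,000,000 bytes per second and a hypothetical topic name; the `--throttle` option takes bytes per second.

```python
import json


def throttle_bytes_per_sec(max_bitrate, fraction=0.75):
    """Return the throttle as a fraction of the measured maximum bitrate."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(max_bitrate * fraction)


def topics_to_move(topics):
    """Build the document passed via --topics-to-move-json-file."""
    return {"version": 1, "topics": [{"topic": t} for t in topics]}


measured_max = 40_000_000  # hypothetical result of the controlled test, bytes/sec
throttle = throttle_bytes_per_sec(measured_max, fraction=0.75)
print("--throttle", throttle)
print(json.dumps(topics_to_move(["my-topic"])))
```

Leaving headroom below the measured maximum keeps replication traffic from the reassignment from starving regular producers and consumers.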

slide-31
SLIDE 31

Broker Replacement & Movement

3. Replace broker

Replace broker pod instance with kubectl

$ kubectl delete pod kafka-1

  • Old broker 1 instance is shut down and its resources are deallocated
  • Deploy plan provisions a new broker 1 instance
  • New broker 1 is assigned the same id as old broker 1: 1

[Diagram: Broker 0 holds four partitions and Broker 2 holds six; the old Broker 1 (two partitions) is removed and replaced by a new Broker 1.]

slide-32
SLIDE 32

Broker Replacement & Movement

4. Rebalance/move partitions to new broker

Use Kafka partition reassignment tool

  • Generate an assignment plan with new broker 1
  • Pick a fraction of the measured max bitrate found in step 1 (e.g. 75%, 80%)
  • Apply plan with bitrate throttle
  • Wait till complete

[Diagram: partition counts per broker before and after the rebalance. One panel shows Broker 0 with four partitions, Broker 1 with two, and Broker 2 with six; the other shows four partitions on each broker.]

slide-33
SLIDE 33

MirrorMaker

Synchronize Kafka topics between clusters

  • Disaster Recovery
  • Multi Data Center
    ○ Active / Passive cluster
    ○ Active / Active cluster

[Diagram: the Cluster Operator watches the “KafkaMirrorMaker” CRD and deploys MirrorMaker, which consumes from one cluster and produces to the other. The clusters are the Kafka StatefulSet and an “Other Kafka”, spread across Data Center A and Data Center B.]

Demo: resources/kafka-mirror-maker.yaml

slide-34
SLIDE 34

Monitoring

Kubernetes + Prometheus + Grafana

slide-35
SLIDE 35

Monitoring

Strimzi exposes a Prometheus Health Endpoint with Prometheus JMX Exporter

[Diagram: inside the broker container, the Kafka broker process runs the Prometheus JMX Exporter Java agent, which listens on 0.0.0.0:9404/health; the Prometheus server scrapes it.]

Demo: “Production” Strimzi resource: ./resources/pipelines-strimzi.yaml
Grafana Dashboard
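What a scrape of the exporter returns can be illustrated with a small parser. A minimal Python sketch, assuming the endpoint responds in the Prometheus text exposition format; the sample metric names and values are hypothetical (real names depend on the JMX Exporter rules configured for the cluster), and the parser ignores the optional trailing timestamp that the format allows.

```python
# Hypothetical scrape output; real metric names depend on the exporter rules.
SAMPLE = """\
# HELP kafka_server_bytes_in_total Hypothetical counter
# TYPE kafka_server_bytes_in_total counter
kafka_server_bytes_in_total{topic="my-topic"} 1024.0
kafka_server_bytes_out_total{topic="my-topic"} 2048.0
"""


def parse_samples(text):
    """Map each 'name{labels}' series to its float value, skipping comment lines."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE metadata and blank lines carry no sample
        series, value = line.rsplit(" ", 1)
        samples[series] = float(value)
    return samples


samples = parse_samples(SAMPLE)
for series, value in samples.items():
    print(series, value)
```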

slide-36
SLIDE 36

Conclusion

slide-37
SLIDE 37

Is running Kafka on Kubernetes safe?

slide-38
SLIDE 38

Is running Kafka on Kubernetes safe?

Pros

  • Confluent Cloud runs on Kubernetes clusters on Google and Amazon
  • Strimzi is an open source component of a commercial product: Red Hat AMQ
  • Kafka data is usually transient

Cons

⚠ Beware of the risks of running PersistentVolumes and StatefulSets ⚠

  • Still need SREs and operations knowledge in production
  • More abstractions -> Harder to reason about
  • Simplistic update strategies for large clusters

slide-39
SLIDE 39

Strimzi Project

  • Apache Kafka project for Kubernetes and OpenShift
  • Licensed under Apache License 2.0
  • Considered stable as of 0.8.2 release (0.11.4 current)
  • Web site: http://strimzi.io/
  • GitHub: https://github.com/strimzi/strimzi-kafka-operator
  • Slack: strimzi.slack.com
  • Mailing list: strimzi@redhat.com
  • Twitter: @strimziio

slide-40
SLIDE 40

One More Thing...

slide-41
SLIDE 41

Kafka Lag Exporter

Monitor Kafka Consumer Group Latency and Lag of Apache Kafka applications

Main features include

  • Report group and partition metadata as Prometheus metrics
  • Estimate consumer group latency in time
  • Auto-discovery of Strimzi Apache Kafka clusters
  • Installed as a Helm chart

GitHub repo: https://github.com/lightbend/kafka-lag-exporter
Blog post: https://bit.ly/2Jzvg8p
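Estimating lag in time, rather than in offsets, means answering the question "when was the message at the group's committed offset produced?". One way to do that is to keep recent (latest offset, timestamp) measurements per partition and interpolate between them. The following Python sketch illustrates that general idea under simplifying assumptions; the function name and sample points are hypothetical, and it is not presented as the project's exact algorithm.

```python
def estimate_lag_seconds(points, committed_offset):
    """Estimate time lag from (latest_offset, timestamp_seconds) points, oldest first."""
    if len(points) < 2:
        raise ValueError("need at least two measurements")
    latest_offset, latest_time = points[-1]
    if committed_offset >= latest_offset:
        return 0.0  # the group is caught up
    # Interpolate inside the window segment that contains the committed offset.
    for (o1, t1), (o2, t2) in zip(points, points[1:]):
        if o2 > o1 and o1 <= committed_offset <= o2:
            produced_at = t1 + (committed_offset - o1) * (t2 - t1) / (o2 - o1)
            return latest_time - produced_at
    # Offset is older than the window: extrapolate from the first and last points.
    (o1, t1), (o2, t2) = points[0], points[-1]
    if o2 == o1:
        raise ValueError("no offset growth in window, cannot extrapolate")
    produced_at = t1 + (committed_offset - o1) * (t2 - t1) / (o2 - o1)
    return latest_time - produced_at


# Hypothetical measurements: the latest offset sampled at t = 0 s, 10 s and 20 s.
window = [(100, 0.0), (200, 10.0), (400, 20.0)]
print(estimate_lag_seconds(window, 300))
print(estimate_lag_seconds(window, 150))
```

Expressing lag in seconds makes it comparable across partitions with very different throughput, which a raw offset difference is not.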

slide-42
SLIDE 42

Lightbend Platform

https://www.lightbend.com/lightbend-platform

slide-43
SLIDE 43
slide-44
SLIDE 44

Thank You!

Sean Glover @seg1o in/seanaglover sean.glover@lightbend.com

Free eBook! https://bit.ly/2J9xmZm