SLIDE 1

Kafka Needs No Keeper

Colin McCabe

SLIDE 2

Introduction

  • Kafka has gotten a lot of mileage out of ZooKeeper
  • But it is still a second system
  • KIP-500 has been adopted by the community
  • This is not a 1:1 replacement
  • We've been headed in this direction for years

SLIDE 3

Evolution of Apache Kafka Clients

SLIDES 4-8 (progressive build)

  • Producer: writes to topics
  • Consumer: reads from topics; offset fetch/commit; group partition assignment
  • Admin Tools: topic create/delete

SLIDE 9

Consumer Group Coordinator

SLIDES 10-20 (progressive build)

Consumer: reads from topics; offset fetch/commit and group partition assignment go through the group coordinator. Committed offsets are stored in the internal __offsets topic.

Consumer APIs:

  • Fetch
  • OffsetCommit
  • OffsetFetch
  • JoinGroup
  • SyncGroup
  • Heartbeat
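The OffsetCommit/OffsetFetch flow above can be sketched as a toy coordinator that keys committed offsets by (group, topic, partition), the same keying that lets a compacted internal topic keep only the latest commit per key. All names here are illustrative, not Kafka's actual classes:

```python
# Illustrative sketch of a group coordinator's offset store (not Kafka's
# actual implementation). Committed offsets are keyed by
# (group, topic, partition), so a log-compacted topic can retain only the
# latest commit per key.

class ToyGroupCoordinator:
    def __init__(self):
        self.offsets = {}  # (group, topic, partition) -> offset

    def offset_commit(self, group, topic, partition, offset):
        # Later commits for the same key overwrite earlier ones,
        # which is exactly what log compaction preserves.
        self.offsets[(group, topic, partition)] = offset

    def offset_fetch(self, group, topic, partition):
        # A group that never committed resumes with "no offset" (None).
        return self.offsets.get((group, topic, partition))

coord = ToyGroupCoordinator()
coord.offset_commit("my-group", "foo", 0, 42)
coord.offset_commit("my-group", "foo", 0, 100)      # overwrites 42
print(coord.offset_fetch("my-group", "foo", 0))     # -> 100
print(coord.offset_fetch("other-group", "foo", 0))  # -> None
```

Overwrite-on-commit is why moving offsets out of ZooKeeper and into a compacted Kafka topic worked so well: the storage model matches the access pattern.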

SLIDE 21

Producer, Consumer, Admin Tools (create/delete topics)

SLIDE 22

Kafka Security and the Admin Client

SLIDES 23-27 (progressive build)

Admin Tools create/delete topics by talking to ZooKeeper directly, bypassing the brokers' ACL enforcement.

With the AdminClient, Admin Tools go through the brokers instead, so ACL enforcement applies to topic create/delete.

SLIDES 28-29

AdminClient (used by Admin Tools; requests pass through ACL enforcement)

Admin APIs:

  • CreateTopics
  • DeleteTopics
  • AlterConfigs
  • ...

SLIDES 30-31

Producer, Consumer, AdminClient

Client APIs:

  • Produce
  • Fetch
  • Metadata
  • CreateTopics
  • DeleteTopics
  • ...

Benefits:

  • Encapsulation
  • Security
  • Validation
  • Compatibility
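One reason routing admin operations through broker APIs beats direct ZK writes is that the broker can validate and authorize every request before mutating anything. A hedged sketch of that idea (all names are invented for illustration; Kafka's real authorizer is pluggable and far richer):

```python
# Toy model of server-side ACL enforcement for a CreateTopics-style
# request. Purely illustrative, not Kafka's implementation.

class AuthorizationError(Exception):
    pass

class ToyBroker:
    def __init__(self, acls):
        self.acls = acls      # principal -> set of allowed operations
        self.topics = {}      # topic name -> partition count

    def create_topic(self, principal, name, partitions):
        # A direct ZooKeeper write would skip these checks entirely;
        # routing the request through the broker lets us authorize
        # and validate it first.
        if "CreateTopics" not in self.acls.get(principal, set()):
            raise AuthorizationError(f"{principal} may not create topics")
        if partitions < 1:
            raise ValueError("partition count must be >= 1")
        self.topics[name] = partitions

broker = ToyBroker({"admin": {"CreateTopics"}})
broker.create_topic("admin", "foo", 3)      # allowed
try:
    broker.create_topic("guest", "bar", 3)  # rejected
except AuthorizationError as e:
    print(e)                                # -> guest may not create topics
```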
SLIDE 32

Inter-Broker Communication


SLIDES 34-38 (progressive build)

Controller responsibilities: Broker Registration, ACL Management, Dynamic Configuration, ISR Management, Controller Election

SLIDES 39-43 (progressive build)

Controller APIs:

  • LeaderAndIsr
  • UpdateMetadata
  • StopReplica
  • AlterIsr

The controller pushes Leader/ISR changes, metadata updates, and stop/delete-replica commands to the brokers, alongside its responsibilities: Broker Registration, ACL Management, Dynamic Configuration, ISR Management, Controller Election.


SLIDE 45

  • Encapsulation
  • Compatibility
  • Ownership
SLIDE 46

Broker Liveness

SLIDE 47

ZK Session

SLIDES 48-49

/brokers/1 -> { host: 10.10.10.1:9092, rack: rack-1 }


SLIDES 51-52

Watch trigger: Broker 1 is offline
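The ZK-based liveness scheme above can be modeled in a few lines: a broker's ephemeral registration disappears when its session expires, and a watch fires to report the broker offline. This is a purely illustrative toy; real ZooKeeper sessions, znodes, and watch semantics are more involved:

```python
# Toy model of ZooKeeper-style ephemeral broker registration and watches.
# Illustrative only.

class ToyRegistry:
    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.last_heartbeat = {}   # broker id -> time of last heartbeat
        self.watchers = []         # callbacks fired on session expiry

    def register(self, broker_id, now):
        self.last_heartbeat[broker_id] = now   # ephemeral "znode"

    def heartbeat(self, broker_id, now):
        self.last_heartbeat[broker_id] = now

    def tick(self, now):
        # Expire sessions whose heartbeats are too old, firing watches,
        # like a watch trigger on /brokers/<id> going away.
        for broker_id, t in list(self.last_heartbeat.items()):
            if now - t > self.session_timeout:
                del self.last_heartbeat[broker_id]
                for cb in self.watchers:
                    cb(broker_id)

events = []
reg = ToyRegistry(session_timeout=6)
reg.watchers.append(lambda b: events.append(f"Broker {b} is offline"))
reg.register(1, now=0)
reg.tick(now=5)    # still within the session timeout
reg.tick(now=10)   # session expired -> watch fires
print(events)      # -> ['Broker 1 is offline']
```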

SLIDE 53

Network Partition Resilience


SLIDES 55-58

Case 1: Total partition
Case 2: Broker partition
Case 3: ZK partition
Case 4: Controller partition

SLIDE 59

Metadata Inconsistency


SLIDES 61-63 (progressive build)

Metadata Source of Truth (ZooKeeper)

Controller metadata cache: sync writes to the source of truth, async updates out to the brokers

Broker metadata caches: async updates

SLIDES 67-73 (progressive build)

Last Resort: > rmr /controller

  • New controller!
  • Load ALL Metadata
  • Push ALL Metadata
  • How do you know the metadata has diverged?

SLIDE 74

Performance of Controller Initialization


SLIDES 77-79

New controller!

Load ALL Metadata. Complexity: O(N), N = number of partitions

SLIDES 80-82

Push ALL Metadata. Complexity: O(N*M), N = number of partitions, M = number of brokers

SLIDE 83

Metadata as an Event Log

SLIDE 84

Metadata as an Event Log

  • Each change becomes a message
  • Changes are propagated to all brokers

  ...
  924 Create topic "foo"
  925 Delete topic "bar"
  926 Add node 4 to the cluster
  927 Create topic "baz"
  928 Alter ISR for "foo-0"
  929 Add node 5 to the cluster

SLIDE 85

Metadata as an Event Log (same event log)

  • Clear ordering
  • Can send deltas
  • Offset tracks consumer position
  • Easy to measure lag
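The event log on this slide can be replayed into cluster state: each broker just tracks its offset, deltas come for free, and lag is the difference between the log end offset and a broker's position. A minimal sketch using the slide's own events (the state model here is a deliberately tiny illustration):

```python
# Replay a metadata event log into cluster state. Events mirror the slide.

LOG = [
    (924, ("create_topic", "foo")),
    (925, ("delete_topic", "bar")),
    (926, ("add_node", 4)),
    (927, ("create_topic", "baz")),
    (928, ("alter_isr", "foo-0")),
    (929, ("add_node", 5)),
]

def replay(log, state=None, from_offset=0):
    """Apply every event at offset >= from_offset; a broker that is
    already at from_offset only needs the delta, not a full snapshot."""
    state = state or {"topics": set(), "nodes": set(), "isr": {}}
    for offset, (kind, arg) in log:
        if offset < from_offset:
            continue
        if kind == "create_topic":
            state["topics"].add(arg)
        elif kind == "delete_topic":
            state["topics"].discard(arg)   # "bar" was created before 924
        elif kind == "add_node":
            state["nodes"].add(arg)
        elif kind == "alter_isr":
            state["isr"][arg] = "updated"
    return state

state = replay(LOG)
print(sorted(state["topics"]))   # -> ['baz', 'foo']
print(sorted(state["nodes"]))    # -> [4, 5]

# Lag is easy to measure: log end offset minus a broker's position.
log_end = LOG[-1][0] + 1
broker_offset = 927
print(log_end - broker_offset)   # -> 3
```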

SLIDE 86

Consumer (offset=3), Consumer (offset=1), Consumer (offset=2)
SLIDES 87-88

Broker (offset=3), Broker (offset=1), Broker (offset=2): but who produces the metadata log they consume? The Controller.

SLIDES 89-90

Implementing the Controller Log

Can we use the existing Kafka log replication protocol?

  • How do we elect the leader?

We need a self-managed quorum. Enter Raft: leader election is by simple majority.

SLIDE 91

                         Kafka                            Raft
  Writes                 Single Leader                    Single Leader
  Fencing                Monotonically increasing epoch   Monotonically increasing term
  Log reconciliation     Offset and epoch                 Term and index
  Push/Pull              Pull                             Push
  Commit Semantics       ISR                              Majority
  Leader Election        From ISR, through ZooKeeper      Majority
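The fencing and election rows of this comparison can be illustrated with a toy monotonically increasing term: a strict majority of voters grants leadership for a term, and writes stamped with an older term are rejected. This is a sketch of the idea only, not the Raft protocol (real Raft adds log-completeness checks, randomized election timeouts, and more):

```python
# Toy illustration of majority leader election and term-based fencing.

class ToyQuorum:
    def __init__(self, voters):
        self.voters = voters
        self.term = 0               # monotonically increasing
        self.voted_in_term = {}     # term -> set of voters who voted
        self.leader = None

    def request_votes(self, candidate, term, votes):
        # A candidate wins the term iff a strict majority votes for it
        # and the term is newer than any term already decided.
        if term <= self.term:
            return False
        granted = self.voted_in_term.setdefault(term, set())
        granted.update(v for v in votes if v in self.voters)
        if len(granted) > len(self.voters) // 2:
            self.term, self.leader = term, candidate
            return True
        return False

    def append(self, leader, term):
        # Fencing: writes from a stale term (a deposed leader) are rejected.
        return term == self.term and leader == self.leader

q = ToyQuorum(voters={"c1", "c2", "c3"})
q.request_votes("c1", term=1, votes={"c1", "c2"})  # majority -> c1 leads
print(q.append("c1", term=1))   # -> True
q.request_votes("c2", term=2, votes={"c2", "c3"})  # new term, new leader
print(q.append("c1", term=1))   # -> False (fenced: stale term)
```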

SLIDE 92

The Controller Quorum

SLIDE 93

The Controller Raft Quorum

  • The leader is the active controller
  • Controls reads / writes to the log
  • Typically 3 or 5 nodes, like ZK

Brokers (at offset=1, offset=2) fetch the metadata log from the controller quorum.
SLIDE 94

Instant Failover

  • Low-latency failover via Raft election
  • Standbys contain all data in memory
  • Brokers do not need to re-fetch
SLIDE 95

Metadata Caching

  • Brokers can persist metadata to disk (/mnt/logs/kafka/metadata)
  • Only fetch what they need
  • Use snapshots if we're too far behind
SLIDES 96-98 (progressive build)

Broker Registration

  • Brokers send heartbeats to the active controller
  • The controller uses this to build a map of the cluster: what brokers exist, and how can they be reached?
  • The controller also tells brokers if they should be fenced or shut down

SLIDE 99

Fencing

  • Brokers need to be fenced if they're partitioned from the controller, or can't keep up
  • Brokers self-fence if they can't talk to the controller
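Heartbeat-based fencing can be sketched in a few lines: the controller fences any broker whose last heartbeat is too old, and a broker self-fences when its own heartbeats stop getting through. All names and the timeout are illustrative, not Kafka's actual configuration:

```python
# Toy heartbeat-based fencing. The controller fences brokers whose
# heartbeats have gone stale; brokers self-fence if they cannot reach
# the controller. Illustrative only.

FENCE_TIMEOUT = 10  # hypothetical, in arbitrary time units

class ToyController:
    def __init__(self):
        self.last_heartbeat = {}   # broker id -> time of last heartbeat

    def heartbeat(self, broker_id, now):
        self.last_heartbeat[broker_id] = now
        return "ok"   # the response could also say "fenced" or "shut down"

    def fenced_brokers(self, now):
        return {b for b, t in self.last_heartbeat.items()
                if now - t > FENCE_TIMEOUT}

class ToyBroker:
    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.fenced = False

    def try_heartbeat(self, controller, now, reachable):
        if not reachable:
            # Self-fence: stop serving clients rather than risk
            # operating on stale metadata.
            self.fenced = True
            return
        controller.heartbeat(self.broker_id, now)
        self.fenced = False

ctrl = ToyController()
b1 = ToyBroker(1)
b1.try_heartbeat(ctrl, now=0, reachable=True)
print(ctrl.fenced_brokers(now=5))    # -> set()
print(ctrl.fenced_brokers(now=20))   # -> {1} (controller-side fencing)
b1.try_heartbeat(ctrl, now=21, reachable=False)
print(b1.fenced)                     # -> True (self-fencing)
```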

SLIDE 100

Handling network partitions

SLIDES 101-105

Case 1: Total partition
Case 2: Broker partition
Case 3: Controller partition

SLIDE 106

Deployment

                          Current                                     KIP-500
  Configuration File      Kafka and ZooKeeper                         Kafka
  Metrics                 Kafka and ZK                                Kafka
  Administrative Tools    ZK Shell, Four letter words, Kafka tools    Kafka tools
  Security                Kafka and ZK                                Kafka

SLIDE 107

Shared Controller Nodes

  • Fewer resources used
  • Single node clusters (eventually)

SLIDE 108

Separate Controller Nodes

  • Better resource isolation
  • Good for big clusters

SLIDE 109

Roadmap

SLIDE 110

Remove Client-side ZK dependencies -> Remove Broker-side ZK dependencies -> Controller Quorum

SLIDE 111

Remove Client-side ZK dependencies: Incremental KIP-4 Improvements

  • Create new APIs
  • Deprecate direct ZK access

SLIDE 112

Remove Broker-side ZK dependencies: Broker-Side Fixes

  • Remove deprecated direct ZK access for tools
  • Create broker-side APIs
  • Centralize ZK access in the controller

SLIDE 113

Controller Quorum: First Release without ZooKeeper

  • Raft
  • Controller quorum
SLIDE 114

Upgrading directly from an older Kafka release to a KIP-500 release raises issues:

  • Tools using ZK
  • Brokers accessing ZK
  • State in ZK

SLIDE 115

Bridge Release (Older Kafka Release -> Bridge Release -> KIP-500 Release)

  • No ZK access from tools or brokers (except the controller)

SLIDE 116

Upgrading

  • Starting from the bridge release

SLIDE 117

Upgrading

  • Start new controller nodes (possibly combined)
  • Quorum elects leader
  • Claims leadership in ZK

SLIDE 118

Upgrading

  • Roll nodes one by one as usual
  • Controller continues sending LeaderAndIsr, etc. to old nodes
SLIDE 119

Upgrading

  • When all brokers have been rolled, decommission ZK nodes

SLIDE 120

Conclusion

SLIDE 121

Apache ZooKeeper has served us well

  • KIP-500 is not a 1:1 replacement, but a different paradigm

We have already started removing ZK from clients

  • Consumer, AdminClient
  • Improved encapsulation, security, upgradability
SLIDE 122

Metadata should be managed as a log

  • Deltas, ordering, caching
  • Controller failover, fencing
  • Improved scalability, robustness, easier deployment

The metadata log must be self-managed

  • Raft
  • Controller quorum
SLIDE 123

It will take a few releases to implement KIP-500

  • Additional KIPs for APIs, Raft, Metadata, etc.

Rolling upgrades will be supported

  • Bridge release
  • Post-ZK release

Kafka needs no Keeper

SLIDE 124

THANK YOU

Colin McCabe  cmccabe@confluent.io

cnfl.io/meetups  cnfl.io/blog  cnfl.io/slack