SLIDE 1

Implementing Raft protocol by coroutines and Ktor Framework

Andrii Rodionov @AndriiRodionov

SLIDE 2

About me

  • Devoxx Ukraine organizer
  • KNight Kyiv co-organizer
  • JUG UA Leader
  • Kyiv Kotlin User Group co-leader

SLIDE 3

Devoxx Ukraine

30% discount, code: DevoxxUAKotlin

devoxx.org.ua

SLIDE 4

Agenda

  • Consensus algorithms
  • Replicated state machine
  • Raft basics
  • Raft algorithm building blocks
  • Implementation details
  • Demo
SLIDE 5

Consensus algorithms overview

  • allow a collection of machines to work as a coherent group
  • replicated state machines
    ○ servers compute identical copies of the same state
  • can survive the failure of some of their members
  • play a key role in building reliable distributed systems (ZooKeeper, HDFS, …)

SLIDE 6

Replicated State Machine

  • Replicated log ⇒ replicated state machine
    ○ All servers execute the same commands in the same order
  • Consensus module ensures proper log replication
  • System makes progress as long as any majority of servers are up
  • Failure model: fail-stop (not Byzantine), delayed/lost messages
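
To make "same commands in same order" concrete, here is a minimal sketch (not from the talk; the Command type and key-value state are illustrative assumptions) of the deterministic state machine each server feeds with committed log entries:

// Committed entries arrive in index order; identical logs plus a
// deterministic apply() give every server an identical state.
data class Command(val key: String, val value: String)

class StateMachine {
    private val state = mutableMapOf<String, String>()
    private var lastApplied = 0

    fun apply(index: Int, command: Command) {
        require(index == lastApplied + 1) { "entries must be applied in log order" }
        state[command.key] = command.value
        lastApplied = index
    }
}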
SLIDE 11

Why Raft?

  • Paxos has dominated the discussion for 25 years
    ○ Hard to understand
    ○ Not complete enough for real implementations
  • New consensus algorithm: Raft
    ○ Raft is a consensus algorithm for managing a replicated log
    ○ Diego Ongaro and John Ousterhout, Stanford University
  • Raft adoption
    ○ Docker Swarm, Consul, Kudu, RavenDB, etc.

SLIDE 12

Raft consensus algorithm

  • Leader election
    ○ Select one of the servers to act as cluster leader
    ○ Detect crashes, choose new leader
  • Log replication
    ○ Leader takes commands from clients, appends them to its log
    ○ Leader replicates its log to other servers (overwriting inconsistencies)

SLIDE 13

Server States and RPCs
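
The original slide is a figure: the Raft state diagram. As a stand-in, the states and the paper's transitions can be sketched in Kotlin (the same three states drive the channel code on SLIDE 20):

enum class State { FOLLOWER, CANDIDATE, LEADER }

// Transitions, per the Raft paper:
//   FOLLOWER  -> CANDIDATE  on election timeout (no heartbeat from a leader)
//   CANDIDATE -> LEADER     on winning votes from a majority
//   CANDIDATE -> FOLLOWER   on discovering a current leader or newer term
//   LEADER    -> FOLLOWER   on discovering a server with a higher term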

SLIDE 14

Log replication

  • Client sends command to leader
  • Leader appends command to its log
  • Leader sends AppendEntries RPCs to all followers
  • Once new entry committed:
    ○ Leader executes command in its state machine, returns result to client
    ○ Leader notifies followers of committed entries in subsequent AppendEntries RPCs
    ○ Followers execute committed commands in their state machines
  • Crashed/slow followers?
    ○ Leader retries AppendEntries RPCs until they succeed
  • Optimal performance in common case:
    ○ One successful RPC to any majority of servers
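
The "once committed" step hinges on majority replication. A hedged sketch of the leader-side rule (names follow the Raft paper, not necessarily the talk's code):

// matchIndexes holds, per server (leader included), the highest log index
// known to be replicated there. The majoritySize-th largest of these is the
// highest index stored on at least a majority, hence safe to commit.
// (Raft additionally requires that entry to be from the leader's current term.)
fun newCommitIndex(matchIndexes: List<Int>, majoritySize: Int): Int =
    matchIndexes.sortedDescending()[majoritySize - 1]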

SLIDE 15

Raft API to implement

service Raft {
  rpc Vote (RequestVoteRPC) returns (ResponseVoteRPC);
  rpc Append (RequestAppendEntriesRPC) returns (ResponseAppendEntriesRPC);
}
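
The slide shows only the service surface. Per the Raft paper, the vote pair would carry roughly the fields below (an assumption about this project's generated types; the field names come from the paper), sketched as Kotlin data classes; RequestAppendEntriesRPC analogously carries term, leaderId, prevLogIndex, prevLogTerm, entries, and leaderCommit:

data class RequestVote(
    val term: Long,           // candidate's term
    val candidateId: String,  // candidate requesting the vote
    val lastLogIndex: Long,   // index of the candidate's last log entry
    val lastLogTerm: Long     // term of the candidate's last log entry
)

data class ResponseVote(
    val term: Long,           // currentTerm, so the candidate can update itself
    val voteGranted: Boolean  // true means the candidate received this vote
)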

SLIDE 16

Raft architecture

[As a FOLLOWER]

  • waiting for heartbeats

[As a CANDIDATE]

  • ask for votes

[As a LEADER]

  • generate heartbeats
  • replicate log

[As a Raft-node]

  • process vote requests
  • process append requests
SLIDE 17

Raft algorithm building blocks

  • RPC
  • State transition
  • Resettable countdown timer
  • Retry operation
  • Leader election
  • Heartbeat
SLIDE 18

gRPC Kotlin - Coroutine based gRPC for Kotlin

class RaftServer( … ) : RaftGrpcKt.RaftImplBase(), CoroutineScope {

    // Each RPC handler launches a coroutine via async and returns a Deferred response
    fun vote(request: RequestVoteRPC): Deferred<ResponseVoteRPC> = async {
        …
    }

    fun append(request: RequestAppendEntriesRPC): Deferred<ResponseAppendEntriesRPC> = async {
        …
    }
}

https://github.com/rouzwawi/grpc-kotlin

SLIDE 19

Server States and RPCs

SLIDE 20

Channel<State> for transition between states

val channel = Channel<State>()

init {
    val waitingForHeartbeat = waitingForHeartbeatFromLeaderTimer()
    launch {
        channel.consumeEach {
            when (it) {
                FOLLOWER -> waitingForHeartbeat.reset()
                CANDIDATE -> leaderElection()
                LEADER -> appendRequestAndLeaderHeartbeat()
            }
        }
    }
}
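
Each building block presumably closes the loop by sending the next state into this channel: the heartbeat watchdog sends CANDIDATE when the leader goes quiet, and leaderElection() sends LEADER once it wins a majority (see the timer sketch after SLIDE 21).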

SLIDE 21

Resettable Countdown Timer

import java.util.Timer
import kotlin.concurrent.schedule

class ResettableCountdownTimer(private val action: suspend () -> Unit) {

    private var timer = startTimer()

    fun reset() {
        timer.cancel()
        timer = startTimer()
    }

    private fun startTimer(): Timer {
        val newTimer = Timer()
        // randomDelay(): randomized timeout (Raft randomizes election
        // timeouts so competing candidates rarely collide)
        newTimer.schedule(randomDelay()) {
            runBlocking { action() }
        }
        return newTimer
    }
}
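
SLIDE 20 builds the follower watchdog with waitingForHeartbeatFromLeaderTimer(); its body is not shown, but a plausible wiring (an assumption, not the talk's code; it would live in the same class as the channel) is:

fun waitingForHeartbeatFromLeaderTimer() =
    ResettableCountdownTimer {
        channel.send(CANDIDATE)  // no heartbeat heard within the timeout: stand for election
    }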

SLIDE 22

retry

suspend fun <T> retry(delay: Long = 5000, block: suspend () -> T): T {
    while (true) {
        try {
            return block()
        } catch (e: Exception) {
            // swallow the failure and try again after the delay
        }
        delay(delay)
    }
}

SLIDE 23

Leader election

(figure: five-server cluster, servers 1–5)
SLIDE 24

Leader election

val countDownLatch = CountDownLatch(majority)

SLIDE 25

Leader election

val countDownLatch = CountDownLatch(majority)
val job = Job()
servers.forEach { srv ->
    launch(parent = job) {
        val responseVote = retry { srv.vote( … ) }
        countDownLatch.countDown()
        ...
    }
}
countDownLatch.await(electionTimeout, TimeUnit.SECONDS)

SLIDE 26

Leader election (with old coroutines)

val countDownLatch = CountDownLatch(majority)
val job = Job()
servers.forEach { srv ->
    launch(parent = job) {
        val responseVote = retry { srv.vote( … ) }
        countDownLatch.countDown()
        ...
    }
}
countDownLatch.await(electionTimeout, TimeUnit.SECONDS)
job.cancelAndJoin()

SLIDE 27

Leader election (with structured concurrency)

val countDownLatch = CountDownLatch(majority)
coroutineScope {
    servers.forEach {
        launch {
            val responseVote = retry { it.vote( … ) }
            countDownLatch.countDown()
            ...
        }
    }
    countDownLatch.await(electionTimeout, TimeUnit.SECONDS)
    coroutineContext.cancelChildren()
}
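
With structured concurrency the explicit Job bookkeeping disappears: every vote request is a child of the coroutineScope, so once the latch opens (majority reached) or the election timeout expires, cancelChildren() aborts the still-pending requests in one call.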

SLIDE 28

Heartbeat via fixedRateTimer

fixedRateTimer(period = 2000) {
    runBlocking {
        servers.forEach {
            launch {
                try {
                    val response = it.append( … )
                    ...
                } catch (e: Exception) {
                    // unreachable follower: skip it, the next heartbeat retries
                }
            }
        }
    }
}
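
fixedRateTimer fires on a plain java.util.Timer thread, so runBlocking bridges into coroutine land; launching one coroutine per follower means a slow or crashed peer does not delay heartbeats to the others, and failures can simply be dropped because the next tick, two seconds later, retries anyway.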

SLIDE 29

Ktor

private fun ktorServer() {
    val server = embeddedServer(Netty, port = 7000) {
        routing {
            get("/") {
                call.respondText("Server $id log ${entries()}", Text.Plain)
            }
            get("/cmd/{command}") {
                appendCommand(call.parameters["command"])
                call.respondText("Server $id log ${entries()}", Text.Plain)
            }
        }
    }
    server.start(wait = false)
}
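
Presumably this is how the demo is driven: GET /cmd/{command} appends a command through the node, and GET / prints the node's id and replicated log, so one can issue a command to the leader and then poll each follower's root endpoint to watch the logs converge.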

SLIDE 30

Demo

SLIDE 31

Thank you!

Questions? @AndriiRodionov

SLIDE 32

Literature

https://raft.github.io