SLIDE 1

CRaft: Building High-Performance Consensus Protocols with Accurate Clocks

Feiran Wang*, Balaji Prabhakar*, Mendel Rosenblum*, Gene Zhang† *Stanford University, †eBay Inc.

SLIDE 2

Overview

  • CRaft: a multi-leader extension to Raft enabled by accurate clocks

[Figure: existing protocol + synchronized clocks → better performance]

SLIDE 3

State Machines

  • Maintain internal states
  • Respond to external requests
  • Examples: databases, storage systems
  • How do we make them reliable?

[Figure: starting from state x: 2, y: 3, the command x←1 gives state x: 1, y: 3, and the command y←x then gives state x: 1, y: 1]
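To make the figure concrete, here is a toy key-value state machine. It is a minimal sketch, not code from the talk: the `apply` function and the `(target, source)` encoding of commands such as x←1 are illustrative assumptions.

```python
def apply(state: dict[str, int], command: tuple[str, str]) -> dict[str, int]:
    """Apply one assignment command and return the new state.

    A command (target, source) means "target ← source", where source is
    either an integer literal such as "1" or another variable name such as "x".
    (This encoding is a hypothetical choice for illustration.)
    """
    target, source = command
    value = int(source) if source.lstrip("-").isdigit() else state[source]
    new_state = dict(state)  # leave the previous state untouched
    new_state[target] = value
    return new_state


state = {"x": 2, "y": 3}
state = apply(state, ("x", "1"))  # x←1
print(state)  # {'x': 1, 'y': 3}
state = apply(state, ("y", "x"))  # y←x
print(state)  # {'x': 1, 'y': 1}
```

Because `apply` is deterministic, any server that applies the same commands in the same order ends in the same state; replicated state machines rely on exactly this property.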

SLIDE 4

Replicated State Machines

[Figure: clients send commands such as x←1 to a cluster of three servers; each server has a consensus log (x←1, y←1, …) that feeds its state machine (x: 2, y: 3)]

  • Consensus: ensures all servers agree on the same log
  • Continues to operate if at least a majority of servers are up


Diego Ongaro and John Ousterhout. The Raft consensus algorithm. https://raft.github.io
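The two bullets can be illustrated with a minimal sketch under simplifying assumptions: `apply_log` and `has_majority` are hypothetical helpers, commands reuse a toy `(target, source)` key-value encoding, and the consensus step itself (getting every server to hold the same log) is taken as given.

```python
def apply_log(initial: dict[str, int], log: list[tuple[str, str]]) -> dict[str, int]:
    """Replay a log of (target, source) assignments on a copy of `initial`."""
    state = dict(initial)
    for target, source in log:
        state[target] = int(source) if source.isdigit() else state[source]
    return state


def has_majority(servers_up: int, cluster_size: int) -> bool:
    """True if strictly more than half of the cluster is up."""
    return servers_up > cluster_size // 2


# Consensus gives every server the same log (here: x←1, y←1).
log = [("x", "1"), ("y", "1")]
replicas = [apply_log({"x": 2, "y": 3}, log) for _ in range(3)]
assert all(r == replicas[0] for r in replicas)  # all servers agree
print(replicas[0])  # {'x': 1, 'y': 1}

# A 3-server cluster tolerates one failure but not two.
print(has_majority(2, 3), has_majority(1, 3))  # True False
```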

SLIDE 5

The Raft Consensus Protocol

  • A widely used consensus protocol
  • Leader-based
  • Benefits: simple and efficient
  • Limitation: the leader is the bottleneck for throughput and scalability

Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference. 305–319.

[Figure: three clients send all requests to one leader; the leader and two followers each hold the log x←1, y←1, …]
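In Raft, the leader decides when an entry is committed: an entry is committed once it is stored on a majority of servers. The sketch below computes the highest such index from the leader's view of each server's log (Raft's `matchIndex`, with the leader's own last index included). It deliberately omits Raft's additional rule that a leader only commits entries from its current term by counting replicas, so treat it as an illustration of the majority rule rather than a complete commit procedure. Every entry flows through this one leader, which is the bottleneck the slide points to.

```python
def majority_commit_index(match_index: list[int]) -> int:
    """Highest log index stored on a majority of servers.

    match_index[i] is the highest index known to be replicated on server i
    (the leader's own last index is included as one of the entries).
    """
    ranked = sorted(match_index, reverse=True)
    # The value at 0-based position n // 2 of the descending list is held
    # by at least n // 2 + 1 servers, i.e. a majority.
    return ranked[len(ranked) // 2]


# Leader has entries up to 5; followers have replicated up to 3 and 1.
print(majority_commit_index([5, 3, 1]))  # 3
```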

SLIDE 6

Limitations with Single Leader

  • Single leader limits throughput and scalability

[Figures: performance degrades as load increases; throughput decreases with larger cluster sizes]
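One way to see why a single leader caps throughput is a back-of-the-envelope count of replication messages. The model below is an assumption for illustration, not an analysis from the talk: each log entry is sent from its leader to every other server, entries are spread evenly over leaders, and batching, pipelining, acknowledgments, and client traffic are ignored. The function name is hypothetical.

```python
def busiest_server_sends(cluster_size: int, entries: int, num_leaders: int) -> float:
    """Replication messages sent by the busiest server (toy model).

    Each entry is sent by its leader to the other cluster_size - 1 servers,
    and the entries are split evenly among num_leaders leaders.
    """
    per_leader_entries = entries / num_leaders
    return per_leader_entries * (cluster_size - 1)


for n in (3, 5, 7):
    single = busiest_server_sends(n, 1050, num_leaders=1)
    every_server_leads = busiest_server_sends(n, 1050, num_leaders=n)
    print(f"{n} servers: single leader {single:.0f}, "
          f"one leader per server {every_server_leads:.0f}")
```

In this model the single leader's load grows linearly with cluster size, while spreading leadership over all servers keeps the busiest server below the total number of entries.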

SLIDE 7

Challenge in a Multi-Leader Protocol

[Figure: single leader vs. multiple leaders. With a single leader, one server tells the others "Replicate my log"; with multiple leaders, every server says "I have a log"]

  • Challenge: how to coordinate leaders?
  • Solution: agreement on time => agreement on order
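If every leader stamps its entries with a synchronized clock, all servers can derive the same global order by sorting on the timestamp. The sketch below is a simplification under assumptions of our own: ties are broken by a hypothetical `leader_id`, and it ignores when it is safe to sort (an entry with an earlier timestamp might still be in flight), which is the problem that safe times address later in the protocol.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    timestamp: int  # read from the leader's synchronized clock
    leader_id: int  # hypothetical tie-breaker for equal timestamps
    command: str


def global_order(*logs: list[Entry]) -> list[Entry]:
    """Order entries from all leaders by (timestamp, leader_id).

    Every server applies the same deterministic key, so every server
    derives the same order regardless of how the logs arrive.
    """
    return sorted((e for log in logs for e in log),
                  key=lambda e: (e.timestamp, e.leader_id))


log1 = [Entry(1, 1, "x←1"), Entry(6, 1, "y←x")]
log2 = [Entry(4, 2, "y←1")]
log3 = [Entry(6, 3, "x←5")]
print([e.command for e in global_order(log1, log2, log3)])
# ['x←1', 'y←1', 'y←x', 'x←5']
```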

SLIDE 8

Clock Synchronization

Distribution of clock offsets between servers (20 machines on CloudLab):

| Percentile   | 90th | 99th  | 99.9th | max   |
|--------------|------|-------|--------|-------|
| Clock offset | 7 µs | 11 µs | 15 µs  | 26 µs |

  • Achieving agreement on time is not trivial in a distributed system
  • Huygens: a software clock synchronization system


Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In NSDI 2018. 81–94.

Huygens precision: ~20 µs; NTP precision: ~20 ms

SLIDE 9

Our Approach: CRaft

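|                      | Raft             | CRaft (Clocks + Raft)                          |
|----------------------|------------------|------------------------------------------------|
| Scalability          |                  |                                                |
| Output               | A replicated log | A replicated log                               |
| Safety & Consistency | ✓                | ✓ (same guarantee as Raft)                     |
| Practicability       | ✓                | ✓ (a simple add-on to Raft; easy to implement) |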

SLIDE 10

The CRaft Consensus Protocol

SLIDE 11

CRaft Overview

[Figure: three servers and three groups (Group 1, Group 2, Group 3); each server leads one group and follows the other two; clients send requests to different leaders, and every server builds a merged log]
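The layout in the figure can be captured in a small data structure: each server keeps one log per group, leads exactly one group, follows the rest, and combines the group logs into a merged log. This is a sketch; the class name, its fields, and the assumption that server i leads group i are illustrative choices, not details from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class CRaftServer:
    server_id: int    # 1-based; by assumption, server i leads group i
    num_servers: int  # one Raft group per server
    group_logs: dict[int, list] = field(default_factory=dict)
    merged_log: list = field(default_factory=list)

    def __post_init__(self) -> None:
        # One (initially empty) log per group, whether led or followed.
        for group in range(1, self.num_servers + 1):
            self.group_logs.setdefault(group, [])

    def role(self, group: int) -> str:
        return "leader" if group == self.server_id else "follower"


server2 = CRaftServer(server_id=2, num_servers=3)
print({g: server2.role(g) for g in server2.group_logs})
# {1: 'follower', 2: 'leader', 3: 'follower'}
```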

SLIDE 12

Life of a Request

[Figure: three servers; each leads one group and follows the other two; a client sends a request to the leader on Server 1; on every server the logs feed the state machine]

Replicate → Commit → Merge → Execute

  • Commit: the entry is replicated on a majority of servers
  • Committed entries are safe and durable
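The four stages can be summarized as conditions on a single entry. The function below is a simplified sketch, assuming that an entry counts as committed once a majority of servers (leader included) store it, and that a committed entry may be merged once its timestamp is no larger than the smallest safe time across all group logs; the function and parameter names are hypothetical.

```python
def request_status(acks: int, cluster_size: int,
                   timestamp: int, min_safe_time: int) -> str:
    """Where an entry is in the Replicate → Commit → Merge → Execute pipeline.

    acks: servers (including the leader) that have stored the entry.
    min_safe_time: smallest safe time over all group logs on this server.
    """
    if acks <= cluster_size // 2:
        return "replicating"  # not yet on a majority of servers
    if timestamp > min_safe_time:
        return "committed"    # durable, but earlier entries may still arrive
    return "mergeable"        # can be merged in timestamp order, then executed


print(request_status(acks=1, cluster_size=3, timestamp=10, min_safe_time=20))  # replicating
print(request_status(acks=2, cluster_size=3, timestamp=25, min_safe_time=20))  # committed
print(request_status(acks=2, cluster_size=3, timestamp=10, min_safe_time=20))  # mergeable
```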

SLIDE 13

Timestamp Management

  • CRaft guarantees monotonically increasing timestamps in each log
  • Safe time: indicates how up-to-date a log is

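Example log (safe time = 20):

| index     | 1   | 2   | 3   | 4   | 5   |
|-----------|-----|-----|-----|-----|-----|
| timestamp | 1   | 4   | 6   | 17  | 18  |
| command   | x←1 | y←1 | y←x | x←2 | x←5 |

[Figure: a leader server and two followers; each log entry carries a timestamp and a command, and the logs feed a merged log]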
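The slide states that timestamps increase monotonically within each log but not how this is enforced. One simple way, assumed here for illustration only, is for the leader to stamp each new entry with its clock reading, bumped past the previous timestamp if the clock has not advanced (for example, two requests in the same clock tick). `GroupLog` is a hypothetical name.

```python
class GroupLog:
    """Log of one group whose timestamps strictly increase with the index."""

    def __init__(self) -> None:
        self.entries: list[tuple[int, str]] = []  # (timestamp, command)

    def append(self, clock_now: int, command: str) -> int:
        last = self.entries[-1][0] if self.entries else 0
        # Assumption: never reuse or go below the previous timestamp.
        ts = max(clock_now, last + 1)
        self.entries.append((ts, command))
        return ts


log = GroupLog()
for clock, cmd in [(1, "x←1"), (4, "y←1"), (6, "y←x"), (17, "x←2"), (18, "x←5")]:
    log.append(clock, cmd)
print([ts for ts, _ in log.entries])  # [1, 4, 6, 17, 18]
print(log.append(18, "y←2"))          # clock did not advance, so 19
```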

SLIDE 14

Safe Times

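[Figure: a timeline of one log with entries at timestamps 1, 4, 6, 17, and 18, its safe time (20), later entries 23, …, 25, and "now" marked on the timeline]

How up-to-date is this log?

  • Current entries: timestamps <= safe time
  • No entry will come in with a timestamp smaller than the safe time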
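The safe-time rule can be expressed as an invariant check. In this sketch (hypothetical class and method names), the log refuses any append at or below its safe time, which is slightly stricter than the slide's "no smaller timestamp" wording and implies it; the safe time may only move forward and must cover the entries already in the log. How a real CRaft leader decides when to advance its safe time is not described on the slide and is left out.

```python
class SafeTimeLog:
    """Timestamps of one group log plus its safe time."""

    def __init__(self) -> None:
        self.timestamps: list[int] = []
        self.safe_time = 0

    def append(self, ts: int) -> None:
        if ts <= self.safe_time:
            raise ValueError(f"timestamp {ts} is not after safe time {self.safe_time}")
        if self.timestamps and ts <= self.timestamps[-1]:
            raise ValueError(f"timestamp {ts} is not increasing")
        self.timestamps.append(ts)

    def advance_safe_time(self, t: int) -> None:
        # The safe time never moves backwards and covers all current entries.
        if t < self.safe_time or (self.timestamps and t < self.timestamps[-1]):
            raise ValueError(f"cannot set safe time to {t}")
        self.safe_time = t


log = SafeTimeLog()
for ts in (1, 4, 6, 17, 18):
    log.append(ts)
log.advance_safe_time(20)  # promise: no future entry at or below 20

try:
    log.append(19)  # would break the promise, so it is rejected
except ValueError as err:
    print(err)      # timestamp 19 is not after safe time 20
log.append(23)      # later entries are still accepted
```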

SLIDE 15

Merging

[Figure: three logs (Log 1, Log 2, Log 3) with safe times ts = 18, ts = 12, and ts = 19 are merged into one log; the merged log so far reads 1, 2, 3, 4, 5, 6, 8, 10, 12, …]

  • Merge up to the smallest safe time
  • CRaft ensures the merged log is in monotonically increasing timestamp order
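Merging up to the smallest safe time is a k-way merge over the prefixes that are already final. The sketch below uses Python's `heapq.merge` on (timestamp, group id) pairs. Breaking ties by group id is an assumption, and the assignment of entries to the three logs is hypothetical, chosen so that the output matches the merged log on the slide.

```python
import heapq


def merge_up_to_safe_time(logs: dict[int, list[int]],
                          safe_times: dict[int, int]) -> list[tuple[int, int]]:
    """Merge group logs up to the smallest safe time.

    logs maps a group id to its (increasing) entry timestamps; the result is a
    list of (timestamp, group_id) pairs in increasing order. Entries above the
    horizon stay unmerged until every safe time has moved past them.
    """
    horizon = min(safe_times.values())
    streams = [[(ts, gid) for ts in entries if ts <= horizon]
               for gid, entries in logs.items()]
    return list(heapq.merge(*streams))


logs = {1: [1, 4, 6, 17], 2: [2, 5, 12], 3: [3, 8, 10, 15]}
safe_times = {1: 18, 2: 12, 3: 19}
merged = merge_up_to_safe_time(logs, safe_times)
print([ts for ts, _ in merged])  # [1, 2, 3, 4, 5, 6, 8, 10, 12]
```

Entries 15 and 17 are held back: with a safe time of 12, group 2 could still receive an entry with a timestamp above 12 (say 13) that must be ordered before them.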

SLIDE 16

Optimization: Fast Path

  • Fast path: respond to clients early for certain write operations

Replicate → Commit → Merge → Execute

  • Normal path: respond after execution
  • Fast path: respond before execution
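The slide does not say which writes qualify for the fast path or exactly when the early reply is sent. The sketch below assumes, purely for illustration, that a `set` whose reply does not depend on the state machine's contents can be acknowledged once it is committed, while a `get` must wait for execution to read the value. The names `PIPELINE`, `reply_after`, and `stages_before_reply` are hypothetical.

```python
PIPELINE = ("replicate", "commit", "merge", "execute")


def reply_after(op: str, fast_path: bool = True) -> str:
    """Last stage that must finish before the client gets a reply (assumed)."""
    if fast_path and op == "set":
        return "commit"   # assumption: the reply ("OK") needs no execution
    return "execute"      # normal path: reply with the executed result


def stages_before_reply(op: str, fast_path: bool = True) -> tuple[str, ...]:
    stop = PIPELINE.index(reply_after(op, fast_path))
    return PIPELINE[: stop + 1]


print(stages_before_reply("set"))                   # ('replicate', 'commit')
print(stages_before_reply("get"))                   # ('replicate', 'commit', 'merge', 'execute')
print(stages_before_reply("set", fast_path=False))  # ('replicate', 'commit', 'merge', 'execute')
```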

SLIDE 17

Evaluation

SLIDE 18

Experiment Setup

  • Implementation
    • Based on HashiCorp Raft, a popular and well-optimized implementation
  • Environment
    • CloudLab, single data center
  • Workload
    • In-memory key-value store
    • Multiple clients send get or set requests concurrently

SLIDE 19

Throughput vs Cluster Size

  • Up to ~2x read and ~2.5x write throughput compared to Raft


SLIDE 20

Latency vs Throughput

[Figures: average latency vs. throughput (3 servers); 99th percentile latency vs. throughput (3 servers); load increases along each curve, with the performance gain appearing under high load]

  • CRaft improves throughput and latency under high load

SLIDE 21

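Performance vs Number of Clients

[Figures: average latency vs. number of clients; throughput vs. number of clients, annotated "2x" at three points]

  • Latency is bounded by the clock difference between servers
  • NTP precision: ~20 ms; Huygens precision: ~20 µs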
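A toy model suggests why merge latency tracks clock differences. Assume (our simplification, not the talk's analysis) that each server's safe time follows its own clock, that clock = true time + offset, and that network delay is zero. An entry stamped by a leader whose clock runs ahead cannot be merged until the slowest server's clock, and hence its safe time, reaches that timestamp, so the extra wait is at most the spread of clock offsets. The function names and offset values are hypothetical.

```python
def merge_wait(offsets: list[float], leader: int) -> float:
    """Extra wait before an entry from `leader` can be merged (toy model).

    The entry's timestamp is the leader's clock, true_time + offsets[leader].
    Server j's safe time reaches it after offsets[leader] - offsets[j] more
    time, so the entry waits for the slowest clock (never less than zero).
    """
    return max(offsets[leader] - o for o in offsets)


def worst_case_wait(offsets: list[float]) -> float:
    """Upper bound over all leaders: the spread between fastest and slowest clock."""
    return max(offsets) - min(offsets)


# Hypothetical offsets in microseconds for a 3-server cluster.
huygens_like = [0, 7, -4]        # offsets of a few microseconds
ntp_like = [0, 7_000, -4_000]    # offsets of a few milliseconds
print(merge_wait(huygens_like, leader=1), worst_case_wait(huygens_like))  # 11 11
print(worst_case_wait(ntp_like))  # 11000
```

Under these assumptions, moving from millisecond-level (NTP) to microsecond-level (Huygens) synchronization shrinks the worst-case merge wait by the same factor.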

SLIDE 22

Conclusion

[Figure: existing systems + synchronized clocks → better performance and stronger consistency]

  • Accurate clocks enable better performance and/or consistency

SLIDE 23

Thank you!