CRaft: Building High-Performance Consensus Protocols with Accurate Clocks
Feiran Wang*, Balaji Prabhakar*, Mendel Rosenblum*, Gene Zhang
*Stanford University, eBay Inc.
Overview
- CRaft: a multi-leader extension to Raft enabled by accurate clocks
- Existing protocol + synchronized clocks => better performance
State Machines
- Maintain internal states
- Respond to external requests
- Examples: databases, storage systems
- How do we make them reliable?
Example: State (x: 2, y: 3) → x←1 → State (x: 1, y: 3) → y←x → State (x: 1, y: 1)
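The x←1, y←x transition can be sketched in a few lines of Python. This is a minimal illustration only: the `apply` function and the `(target, source)` command encoding are assumptions made for this example, not part of the talk.

```python
# Minimal sketch of a deterministic state machine over named variables.
# The (target, source) command encoding is an assumption for illustration.

def apply(state, command):
    """Return a new state after applying one assignment command.

    command is (target, source): source is either an int literal or the
    name of another variable whose current value is copied.
    """
    target, source = command
    value = state[source] if isinstance(source, str) else source
    new_state = dict(state)  # copy so the old state is left untouched
    new_state[target] = value
    return new_state


state = {"x": 2, "y": 3}
for command in [("x", 1), ("y", "x")]:  # x←1, then y←x
    state = apply(state, command)
    print(state)
```

Because `apply` is deterministic, any two copies that start from the same state and apply the same commands in the same order end in the same state, which is the property replication relies on.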
Replicated State Machines
[Figure: clients send commands such as x←1 to the servers; each server has a consensus module, a log (x←1, y←1, …), and a state machine (x: 2, y: 3)]
- Consensus: ensures all servers agree on the same log
- Continues to operate if at least a majority of servers are up
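The majority condition is simple arithmetic. A small sketch follows; the function names are hypothetical and chosen for this example.

```python
def majority(cluster_size):
    """Smallest number of servers that forms a majority."""
    return cluster_size // 2 + 1


def tolerated_failures(cluster_size):
    """Number of servers that may be down while a majority is still up."""
    return cluster_size - majority(cluster_size)


for n in (3, 5, 7):
    print(n, majority(n), tolerated_failures(n))
```

For example, a 5-server cluster needs 3 servers up and therefore keeps operating with 2 servers down.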
Diego Ongaro and John Ousterhout. The Raft consensus algorithm. https://raft.github.io
The Raft Consensus Protocol
- A widely used consensus protocol
- Leader-based
- Benefits: simple and efficient
- Limitation: the leader is the bottleneck for throughput and scalability
Diego Ongaro and John Ousterhout. 2014. In search of an understandable consensus algorithm. In USENIX Annual Technical Conference. 305–319.
[Figure: clients send requests to the leader, which replicates its log (x←1, y←1, …) to two followers]
Limitations with Single Leader
- Single leader limits throughput and scalability
[Figures: performance degrades with high load as load increases; throughput decreases with larger cluster sizes]
Challenge in a Multi-Leader Protocol
[Figure: with a single leader, one server says "Replicate my log"; with multiple leaders, every server says "I have a log"]
- Challenge: how to coordinate leaders?
- Solution: agreement on time => agreement on order
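A minimal sketch of why agreed timestamps yield an agreed order: if every entry carries a timestamp, servers that receive the same entries in different arrival orders still sort them identically. The sample commands are made up, and breaking timestamp ties by leader id is an assumption of this sketch, not something the slides state.

```python
# Entries are (timestamp, leader_id, command). All values are illustrative.
entries = [(4, 1, "y←1"), (2, 2, "x←7"), (3, 3, "z←0"), (1, 1, "x←1")]


def order(received):
    """Sort entries by timestamp, then by leader id (assumed tie-breaker)."""
    return sorted(received, key=lambda e: (e[0], e[1]))


server_a = order(entries)            # one arrival order
server_b = order(reversed(entries))  # a different arrival order
print(server_a == server_b)
print([cmd for _, _, cmd in server_a])
```

Both servers end up with the same sequence of commands without exchanging any ordering messages.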
Clock Synchronization
Percentile      90th    99th    99.9th   max
Clock offset    7us     11us    15us     26us
Distribution of clock offsets between servers (20 machines on CloudLab)
- Achieving agreement on time is not trivial in a distributed system
- Huygens: a software clock synchronization system
Yilong Geng, Shiyu Liu, Zi Yin, Ashish Naik, Balaji Prabhakar, Mendel Rosenblum, and Amin Vahdat. Exploiting a natural network effect for scalable, fine-grained clock synchronization. In NSDI 2018. 81–94.
- Huygens precision: ~20us
- NTP precision: ~20ms
Our Approach: CRaft
                       Raft               CRaft (Clocks + Raft)
Scalability
Output                 A replicated log   A replicated log
Safety & Consistency   ✓                  ✓ Same guarantee as Raft
Practicability         ✓                  ✓ A simple add-on to Raft; easy to implement
The CRaft Consensus Protocol
CRaft Overview
[Figure: three servers and three groups (Group 1, Group 2, Group 3); each server is the leader of one group and a follower in the other two. Clients send requests to the leaders, and each server maintains a merged log]
Life of a Request
[Figure: three servers, each with a state machine and a log; each server is the leader for one log and a follower for the other two, and a client sends its request to a leader]
Request stages: Replicate → Commit → Merge → Execute
- Commit: the entry is replicated on a majority of servers, so it is safe and durable
Timestamp Management
- CRaft guarantees monotonically increasing timestamps in each log
- Safe time: indicates how up-to-date a log is
Log (safe time = 20):
index       1     2     3     4     5
timestamp   1     4     6     17    18
command     x←1   y←1   y←x   x←2   x←5
Safe Times
- How up-to-date is this log?
- Log timestamps: 1 4 6 17 18, safe time = 20, later entries: 23 … 25
- Current entries: timestamps <= safe time
- No new entries come in with a timestamp smaller than the safe time
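The two timestamp rules (timestamps increase within a log, and no new entry arrives at or before the safe time) can be sketched as follows. The `TimestampedLog` class, its method names, and the rule for advancing the safe time are assumptions made for illustration; the slides do not describe how CRaft implements this. The entry at timestamp 23 uses a made-up command.

```python
class TimestampedLog:
    """Hypothetical log that keeps timestamps increasing and tracks a safe time."""

    def __init__(self):
        self.entries = []   # list of (timestamp, command)
        self.safe_time = 0

    def _last_timestamp(self):
        return self.entries[-1][0] if self.entries else 0

    def append(self, timestamp, command):
        # A new entry must be newer than both the log tail and the safe time.
        if timestamp <= max(self._last_timestamp(), self.safe_time):
            raise ValueError("timestamp is not newer than log tail and safe time")
        self.entries.append((timestamp, command))

    def advance_safe_time(self, new_safe_time):
        # Safe time never moves backwards and must cover all current entries.
        if new_safe_time < max(self._last_timestamp(), self.safe_time):
            raise ValueError("safe time must cover existing entries")
        self.safe_time = new_safe_time


log = TimestampedLog()
for ts, cmd in [(1, "x←1"), (4, "y←1"), (6, "y←x"), (17, "x←2"), (18, "x←5")]:
    log.append(ts, cmd)
log.advance_safe_time(20)

try:
    log.append(19, "x←0")      # older than the safe time: rejected
except ValueError as err:
    print("rejected:", err)

log.append(23, "x←9")          # newer than the safe time: accepted
print(log.entries[-1], log.safe_time)
```

With these rules, a reader who sees safe time 20 knows the log is complete for every timestamp up to 20.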
Merging
[Figure: Log 1, Log 2, and Log 3, each holding entries in increasing timestamp order, are merged by timestamp into a single merged log; timestamps marked on the figure: ts = 18, ts = 12, ts = 19]
- Merge up to the smallest safe time
- CRaft ensures the merged log is in monotonically increasing timestamp order
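Merging up to the smallest safe time can be sketched with a standard k-way merge. Since no log accepts a new entry with a timestamp smaller than its safe time, nothing can later slot in ahead of the merged prefix. The function names, the generated command labels, and the safe times used here are hypothetical values for illustration.

```python
import heapq
from itertools import takewhile


def merge_logs(logs, safe_times):
    """Merge per-leader logs in timestamp order, up to the smallest safe time.

    logs: list of lists of (timestamp, command), each sorted by timestamp.
    safe_times: one safe time per log.
    """
    limit = min(safe_times)
    merged = heapq.merge(*logs, key=lambda entry: entry[0])
    # The merged stream is sorted, so stop at the first entry past the limit.
    return list(takewhile(lambda entry: entry[0] <= limit, merged))


def make_log(name, timestamps):
    """Build a sample log with placeholder commands."""
    return [(ts, f"{name}@{ts}") for ts in timestamps]


logs = [
    make_log("log1", [1, 4, 6, 17]),
    make_log("log2", [2, 5, 12, 18]),
    make_log("log3", [3, 8, 10, 15]),
]
safe_times = [17, 18, 15]  # assumed safe times, one per log

merged = merge_logs(logs, safe_times)
print([ts for ts, _ in merged])
```

In this run the smallest safe time is 15, so the entries at timestamps 17 and 18 stay unmerged until the third log's safe time advances.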
Optimization: Fast Path
- Fast path: respond to clients early for certain write operations
Replicate → Commit → Merge → Execute
- Normal path: respond after execution
- Fast path: respond before execution
Evaluation
Experiment Setup
- Implementation
  - Based on HashiCorp Raft, a popular and well-optimized implementation
- Environment
  - CloudLab, single data center
- Workload
  - In-memory key-value store
  - Multiple clients send get or set requests concurrently
Throughput vs Cluster Size
- Up to ~2x read and ~2.5x write throughput compared to Raft
Latency vs Throughput
[Figures: average latency vs throughput (3 servers); 99th percentile latency vs throughput (3 servers); both annotated with the performance gain under high load as load increases]
- CRaft improves throughput and latency under high load
Performance vs Number of Clients
[Figure: throughput vs number of clients, annotated with 2x at three points]
- Latency is bounded by clock difference
- NTP precision: ~20ms, Huygens: ~20us
Conclusion
- Existing systems + synchronized clocks => better performance, stronger consistency
- Accurate clocks enable better performance and/or consistency