Fault Tolerance at Speed
Todd L. Montgomery @toddlmontgomery
StoneTor
About me
What type of Fault Tolerance?
What is Clustering?
Why Aeron?
Design for Speeding Up?
https://www.forbes.com/sites/forbestechcouncil/2017/12/15/why-energy-is-a-big-and-rapidly-growing-problem-for-data-centers/#344456665a30
https://www.datacenterdynamics.com/opinions/power-consumption-data-centers-global-problem/
https://www.nature.com/articles/d41586-018-06610-y
[Diagrams: Client and Service; multiple Clients and Services; three Services with State]
[Diagrams: State built from a log of events 1 to 6, then X; Snapshot State followed by events 5, 6, X]
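The snapshot idea in these slides can be sketched in a few lines. This is a minimal illustration, not Aeron code: the `Counter` service, its fields, and the `recover` function are hypothetical names, and the log is a plain Python list. It assumes the service is deterministic, so a snapshot taken after event 4 plus a replay of events 5 and 6 should give the same state as replaying all six events.

```python
from dataclasses import dataclass


@dataclass
class Counter:
    """Toy deterministic service: its state is a running total."""
    total: int = 0
    applied: int = 0  # number of log entries applied so far

    def apply(self, entry: int) -> None:
        self.total += entry
        self.applied += 1


def recover(snapshot: Counter, log: list) -> Counter:
    """Rebuild state from a snapshot, then replay only the log tail."""
    service = Counter(total=snapshot.total, applied=snapshot.applied)
    for entry in log[service.applied:]:
        service.apply(entry)
    return service


if __name__ == "__main__":
    log = [1, 2, 3, 4, 5, 6]

    # Reference: replay the whole log from an empty state.
    full = Counter()
    for entry in log:
        full.apply(entry)

    # Snapshot taken after the first four entries (1 + 2 + 3 + 4 = 10).
    snapshot = Counter(total=10, applied=4)
    print(recover(snapshot, log) == full)
```

The snapshot only shortens recovery; it does not change the result, which is why the log before the snapshot point no longer needs to be replayed.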
[Diagram: three Services, each with a Log and an Archive]
https://en.wikipedia.org/wiki/State_machine_replication
Replicated State Machines
[Diagram: three Services, each with an Archive; logs 1 to 6, 1 to 7, and 1 to 2]
https://raft.github.io/
Raft
[Diagram: three Services, each with an Archive and Consensus]
The Real World
[Diagram: Client and three Services, each with an Archive and Consensus]
Benefits
Finance
Beyond
Hint - a contended database is a good indicator
Aeron
https://github.com/real-logic/Aeron
"AmdahlsLaw" by Daniels220 at English Wikipedia - Own work based on: File:AmdahlsLaw.png. Licensed under CC BY-SA 3.0 via Wikimedia Commons
Universal Scalability Law
[Plot: Speedup (2 to 20) vs. Processors (1 to 1024), comparing Amdahl and USL]
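For readers who want to reproduce the shape of the two curves, here is a minimal sketch using the standard formulas for Amdahl's Law and the Universal Scalability Law. The parameter values (a 95% parallel fraction, and the USL contention and coherency coefficients) are illustrative assumptions and are not taken from the talk.

```python
def amdahl(n: int, parallel_fraction: float) -> float:
    """Amdahl's Law: speedup on n processors for a given parallel fraction."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)


def usl(n: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: adds a coherency penalty that grows with n * (n - 1)."""
    return n / (1.0 + contention * (n - 1) + coherency * n * (n - 1))


if __name__ == "__main__":
    # Assumed values for illustration only.
    p, alpha, beta = 0.95, 0.05, 0.0001
    for n in [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]:
        print(f"{n:5d}  amdahl={amdahl(n, p):6.2f}  usl={usl(n, alpha, beta):6.2f}")
```

With a zero coherency coefficient the USL reduces to Amdahl's Law. With any positive coherency coefficient, speedup peaks and then falls as processors are added, which is the difference the plot highlights.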
Ingress Message, Sequence, Disseminate
[Diagram: Client, Leader, Follower X, Follower Y. Channels: Ingress, Log (multicast or serial unicast), Member Status. Messages: Log Event]
Followers Append
[Diagram: Client, Leader, Follower X, Follower Y. Channels: Ingress, Log (multicast or serial unicast), Member Status. Messages: Append Position]
Commit Message
[Diagram: Client, Leader, Follower X, Follower Y. Channels: Ingress, Log (multicast or serial unicast), Member Status. Messages: Commit Position]
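One way to read the append and commit steps is the usual Raft-style rule: the commit position is the highest position that a majority of members have appended. The sketch below assumes that rule. The function name `quorum_position` and the example positions are hypothetical, and this is not presented as Aeron's implementation.

```python
def quorum_position(append_positions: list) -> int:
    """Highest log position that a majority of members have appended."""
    ranked = sorted(append_positions, reverse=True)
    quorum = len(ranked) // 2 + 1  # majority size, e.g. 2 of 3
    # The quorum-th highest position is held by at least `quorum` members.
    return ranked[quorum - 1]


if __name__ == "__main__":
    # Hypothetical append positions for a leader and two followers.
    positions = [8192, 6912, 4096]
    print(quorum_position(positions))
```

Under this rule a single slow follower does not hold back the commit position in a three-node cluster, because two members are enough for a majority.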
Stream Positions
[Diagram: Leader and Follower. Channels: Log (multicast or serial unicast), Member Status. Positions: Commit Position @4096, Append Position @6912, Log Event @8192, Archive Position @8096, Archive Position @7168]
Store locally, asynchronous to Position processing by Consensus and Log processing by Service
Batching: Log, Appends, Commits
Recovery Positions
[Diagram: three Followers. Archive Position @8096, @7168, @7584; Commit Position @4096, @4064, @4032; Service Position @4096, @4064, @3776]
A synchronous system doesn’t make this complexity go away!
Election still needs to assert the state of the cluster and locally catch up
[Diagram: Client, Leader, Followers. Channels: Ingress, Log (multicast or serial unicast), Member Status. Messages: Commit Position, Append Position, Log Event]
Constant Delay Network
Round-Trip Time (RTT)
[Diagram: Service A, Service Ox]
Client to Service A: 0.5 RTT
Client to Service Ox: 1 RTT
Client to Service A (on Commit): 1.5 RTT
Client to Service Ox (on Commit): 2 RTT

Limits from Constant Delay

Shared Memory RTT <100ns
Client to Service A: 50ns
Client to Service Ox: 100ns
Client to Service A (on Commit): 150ns
Client to Service Ox (on Commit): 200ns

DC RTT <100us
Client to Service A: 50us
Client to Service Ox: 100us
Client to Service A (on Commit): 150us
Client to Service Ox (on Commit): 200us

Rack (Kernel Bypass) RTT <10us
Client to Service A: 5us
Client to Service Ox: 10us
Client to Service A (on Commit): 15us
Client to Service Ox (on Commit): 20us
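The figures on these slides are the same four RTT multipliers applied to each network's round-trip time. A small sketch of that arithmetic follows. The multipliers and RTT bounds come from the slides, while the names `RTT_FACTORS` and `latency_limits` are made up for this example.

```python
# RTT multipliers as listed on the "Constant Delay Network" slide.
RTT_FACTORS = {
    "Client to Service A": 0.5,
    "Client to Service Ox": 1.0,
    "Client to Service A (on Commit)": 1.5,
    "Client to Service Ox (on Commit)": 2.0,
}


def latency_limits(rtt: float) -> dict:
    """Scale each path's RTT multiplier by the given round-trip time."""
    return {path: factor * rtt for path, factor in RTT_FACTORS.items()}


if __name__ == "__main__":
    # RTT upper bounds from the "Limits from Constant Delay" slides.
    networks = [
        ("Shared Memory", 100, "ns"),
        ("DC", 100, "us"),
        ("Rack (Kernel Bypass)", 10, "us"),
    ]
    for name, rtt, unit in networks:
        print(f"{name} (RTT <{rtt}{unit})")
        for path, value in latency_limits(rtt).items():
            print(f"  {path}: {value:g}{unit}")
```

Because the multipliers are fixed by the protocol's message flow, the only way to lower these limits is to lower the RTT of the underlying transport.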
Measured Latency at Throughput
[Plot: RTT (us, 75 to 300) by Percentile (Min, 0.50, 0.90, 0.99, 0.9999, 0.999999, Max) at 100K msgs/sec and 200K msgs/sec]
Intel Xeon Gold 5118 (2.30GHz, 12 cores)
32GB DDR4 2400 MHz ECC RAM
Intel Optane SSD 900P Series 480GB
SolarFlare X2522-PLUS 10GbE NIC
All servers are connected to an Arista 7150S
CentOS Linux 7.7, kernel 4.4.195-1.el7.elrepo.x86_64, tuned for low-latency workload
Courtesy Mark Price
Single client session, bursts of 20x 200B messages, 3-node cluster, Service(s) echo(es) the payload back.
Sponsored by
https://weareadaptive.com/
Aeron: https://github.com/real-logic/Aeron
Twitter: @toddlmontgomery
Questions?