Fault Tolerance at Speed Todd L. Montgomery @toddlmontgomery

StoneTor Fault Tolerance at Speed Todd L. Montgomery @toddlmontgomery

About me…

What type of Fault Tolerance? What is Clustering? Why Aeron? Design for Speeding Up?

What type of Fault Tolerance? What is Clustering? Why Aeron? Design for Speeding Up? Efficiency

https://www.nature.com/articles/d41586-018-06610-y https://www.forbes.com/sites/forbestechcouncil/2017/12/15/why-energy-is-a-big-and-rapidly-growing-problem-for-data-centers/#344456665a30 https://www.datacenterdynamics.com/opinions/power-consumption-data-centers-global-problem/

We seem to assume efficiency/security/quality/etc. is a “special” characteristic added … later… if at all

Fault Tolerance

Service Client

Service Service Service Client

Service Service Service Client Client Client

State Service Service Service Client Client Client

State “Storage” Service Service Service

State Service Service Service Client Client Client

Fault Tolerance of State

Partition Replication State Service Service Service

Contiguous Log with Snapshot & Replay

1 2 3 4 5 6 … X

1 2 3 4 State 5 6 … X

1 2 3 4 Snapshot State 5 6 … X

1 2 3 4 Snapshot State 5 6 … X
Snapshot State 5 6 … X
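The snapshot and replay idea in the slides above can be sketched in a few lines. This is a minimal illustration, not Aeron's API: the `Counter` state machine, `take_snapshot`, and `recover` are hypothetical names, and the log is a plain Python list standing in for a contiguous event log.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Counter:
    """Toy state machine: the state is a running total of logged amounts."""
    total: int = 0
    applied: int = 0  # number of log events already folded into the state

    def apply(self, event: int) -> None:
        self.total += event
        self.applied += 1


def take_snapshot(service: Counter) -> Tuple[int, int]:
    """Capture the state together with how far into the log it reflects."""
    return (service.total, service.applied)


def recover(snapshot: Tuple[int, int], log: List[int]) -> Counter:
    """Rebuild a service from a snapshot, then replay only the later events."""
    total, applied = snapshot
    service = Counter(total, applied)
    for event in log[applied:]:
        service.apply(event)
    return service


log = [10, 20, 30, 40, 50, 60]

live = Counter()
for event in log[:4]:
    live.apply(event)
snapshot = take_snapshot(live)  # snapshot taken after event 4
for event in log[4:]:
    live.apply(event)

recovered = recover(snapshot, log)  # replays events 5 and 6 only
```

Because the snapshot records its own position in the log, events before it no longer need to be replayed, which is what lets the snapshot "roll up" the earlier part of the log.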

Clustered Services

Service Service Service

Service Service Service Log Archive Log Archive Log Archive

Replicated State Machines https://en.wikipedia.org/wiki/State_machine_replication

Replicated State Machines Each Replicated Service Same event log Same input ordering Log replicated locally

Replicated State Machines Checkpoints / Snapshots Event in the log “Rolling” up previous log events

When should a service “consume” (or process) a log event?

[Diagram: three Services, each with its own Archive, holding logs of different lengths: 1 2; 1 2 3 4 5 6; 1 2 3 4 5 6 7]

Once processed, Event cannot be altered. Only process once event is stable.

Replicated State Machines Raft Consensus Event must be recorded at majority of Replicas before being consumed by any Replica https://raft.github.io/
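The majority rule can be expressed as a small calculation over the positions each replica has recorded. This is a sketch under assumptions: `quorum_position` and `consumable` are hypothetical helper names, and positions are plain integers rather than anything read from a real cluster.

```python
from typing import List


def quorum_position(recorded_positions: List[int]) -> int:
    """Highest log position that a majority of replicas have recorded."""
    ranked = sorted(recorded_positions, reverse=True)
    majority = len(ranked) // 2 + 1
    # The majority-th highest position is covered by at least `majority` replicas.
    return ranked[majority - 1]


def consumable(event_positions: List[int], commit_position: int) -> List[int]:
    """Events a replica may process: only those at or below the commit position."""
    return [p for p in event_positions if p <= commit_position]


# Three replicas that have recorded the log up to different positions.
commit = quorum_position([8192, 6912, 4096])
ready = consumable([2048, 4096, 6912, 8192], commit)
```

With three replicas the second-highest position is the commit point, so a single slow or failed replica does not hold back the others.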

[Diagram: three Services, each with its own Archive, holding logs of different lengths: 1 2; 1 2 3 4 5 6; 1 2 3 4 5 6 7]

Raft Strong Leader Elected member of the Cluster Orders Input Disseminates Consensus

Service Service Service Consensus Consensus Consensus Archive Archive Archive

Replicated State Machines Raft is An algorithm with formal verification

Replicated State Machines Raft is not A specification Nor A complete system

The Real World More than Raft Leader timestamps events Async, not RPC-based Timers

* Leader Service Service Service Consensus Consensus Consensus Archive Archive Archive Client

Benefits

Benefits Determinism Log is immutable Log can be played, stopped, & replayed Each event is timestamped Services restarted from snapshot & log
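One way to see why timestamped events give determinism: if a service takes time only from the log, never from the wall clock, replaying the same log always produces the same state. The sketch below assumes a hypothetical lease-style lock service; the event layout `(timestamp_ms, kind, key)` and the `LockService` class are illustrative and not part of Aeron.

```python
from typing import Dict, List, Tuple

# Hypothetical event layout: (timestamp_ms, kind, key)
Event = Tuple[int, str, str]


class LockService:
    LEASE_MS = 1000  # illustrative lease length

    def __init__(self) -> None:
        self.leases: Dict[str, int] = {}  # key -> expiry timestamp

    def on_event(self, event: Event) -> None:
        timestamp_ms, kind, key = event
        # Expire leases using the timestamp carried by the event, not a local clock.
        self.leases = {k: exp for k, exp in self.leases.items() if exp > timestamp_ms}
        if kind == "acquire" and key not in self.leases:
            self.leases[key] = timestamp_ms + self.LEASE_MS
        elif kind == "release":
            self.leases.pop(key, None)


def replay(log: List[Event]) -> Dict[str, int]:
    """Run a fresh service over the whole log and return its final state."""
    service = LockService()
    for event in log:
        service.on_event(event)
    return service.leases


log = [
    (0, "acquire", "a"),
    (500, "acquire", "b"),
    (1200, "acquire", "a"),  # the first lease on "a" expired at 1000
    (1300, "release", "b"),
]

first_replica = replay(log)
second_replica = replay(log)
```

Both replicas end in the same state no matter when or how fast they replay, because every decision depends only on the contents of the log.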

What Can You Do?

Distributed Key/Value Store Distributed Timers Distributed Locks

Finance Matching Engines Order Management Market Surveillance P&L, Risk, …

Beyond Venue Ticketing / Reservations Auctions Hint - a contended database is a good indicator

Why Aeron?

Aeron Efficient reliable UDP unicast, UDP multicast, and IPC message transport Java, C/C++, C#, Go https://github.com/real-logic/Aeron

Aeron And a little bit more… Very fast Archival & Replay https://github.com/real-logic/Aeron

The “Efficient” bit…

All communications Aeron publications & subscriptions Aeron archival & replay Aeron shared counters

Consensus based on Aeron stream position

Batching Critical to efficient operation Optimizing pipelined throughput

Flow Control Critical to correct operation

Design for Efficiency?

Cache Hit/Miss Ratios Branch Prediction Allocation Rates Garbage Collection Inlining Optimizations

Not… Yet…

Ownership, Dependency, & Coupling Complexity Layers of Abstraction (ain’t free) Resource Management

Closer… But… Still. Not. Yet.

"AmdahlsLaw" by Daniels220 at English Wikipedia - Own work based on: File:AmdahlsLaw.png. Licensed under CC BY-SA 3.0 via Wikimedia Commons

Universal Scalability Law [Chart: Speedup (0 to 20) versus Processors (1 to 1024), comparing Amdahl and USL curves]
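The two curves can be reproduced from their standard formulas. The parameter values below are assumptions chosen for illustration (a 5% serial fraction, which gives the familiar 20x Amdahl ceiling, and a small coherency penalty); the slides do not state the values used in the chart.

```python
def amdahl(n: int, serial: float) -> float:
    """Amdahl's Law: speedup on n processors with a fixed serial fraction."""
    return n / (1 + serial * (n - 1))


def usl(n: int, contention: float, coherency: float) -> float:
    """Universal Scalability Law: Amdahl plus a pairwise coherency penalty."""
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


SERIAL = 0.05       # assumed serial (contention) fraction
COHERENCY = 0.0001  # assumed coherency coefficient

for n in (1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024):
    print(f"{n:5d}  amdahl={amdahl(n, SERIAL):6.2f}  usl={usl(n, SERIAL, COHERENCY):6.2f}")
```

Under Amdahl the speedup flattens toward `1 / serial`. Under USL the coherency term grows with the square of the processor count, so speedup peaks and then declines as processors are added.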

Breakdown Interactions Fundamental Sequential Operations

Ingress Message, Sequence, Disseminate [Diagram: Client, Leader, Follower X, Follower Y; streams: Ingress, Log (multicast or serial unicast), Member Status; the Leader sends Log Events to both Followers]

Followers Append [Diagram: Client, Leader, Follower X, Follower Y; streams: Ingress, Log (multicast or serial unicast), Member Status; both Followers Append and report their Position]

Commit Message [Diagram: Client, Leader, Follower X, Follower Y; streams: Ingress, Log (multicast or serial unicast), Member Status; Commit Position is sent to both Followers]

Breakdown Interactions Pipeline-able Operation & Batching

Stream Positions [Diagram: Leader and Follower connected by Log (multicast or serial unicast) and Member Status streams. Labeled positions: Log Event @8192, Append Position @6912, Commit Position @4096, Archive Position @8096, Archive Position @7168]
Store locally asynchronous to Position processing by Consensus, & Log processing by Service
Batching: Log, Appends, Commits

Doesn’t this Complicate Recovery?

Recovery Positions

                    Follower   Follower   Follower
Archive Position    @8096      @7584      @7168
Commit Position     @4096      @4064      @4032
Service Position    @4096      @4064      @3776

A synchronous system doesn’t make this complexity go away! Election still needs to assert state of the cluster & locally catch-up
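The positions on this slide can be turned into a rough picture of what each member has to do locally. This is only a sketch: reading the three values per follower as archive, commit, and service position is taken from the slide, but the `catch_up` helper and the two gap names are hypothetical, and a real election involves more than this arithmetic.

```python
from typing import Dict, NamedTuple


class Positions(NamedTuple):
    archive: int  # how far the log is stored locally
    commit: int   # how far the log is known to be committed
    service: int  # how far the service has processed the log


def catch_up(p: Positions) -> Dict[str, int]:
    """Gaps a member has to reconcile, in stream position units."""
    return {
        # Committed log the service has not yet processed.
        "service_backlog": p.commit - p.service,
        # Log stored locally that is not yet known to be committed.
        "uncommitted_tail": p.archive - p.commit,
    }


followers = [
    Positions(archive=8096, commit=4096, service=4096),
    Positions(archive=7584, commit=4064, service=4064),
    Positions(archive=7168, commit=4032, service=3776),
]

plans = [catch_up(p) for p in followers]
```

In this reading only the third follower has committed log its service still needs to process, while all three hold stored log beyond their commit position.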

Limitations of Efficiency Throughput & Latency

Round-Trip Time (RTT) [Diagram: Client, Leader, Followers, Service A, and Service Ox on a Constant Delay Network; streams: Ingress, Log (multicast or serial unicast), Member Status; messages: Log Event, Append Position, Commit Position]
Client to Service A: 0.5 RTT
Client to Service Ox: 1 RTT
Client to Service A (on Commit): 1.5 RTT
Client to Service Ox (on Commit): 2 RTT

Limits from Constant Delay

                                    Shared Memory   Rack (Kernel Bypass)   DC
RTT                                 <100ns          <10us                  <100us
Client to Service A                 50ns            5us                    50us
Client to Service Ox                100ns           10us                   100us
Client to Service A (on Commit)     150ns           15us                   150us
Client to Service Ox (on Commit)    200ns           20us                   200us
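These limits follow directly from multiplying each path's RTT factor (0.5, 1, 1.5, 2) by the RTT of the environment. A small sketch of that arithmetic, with `latency_floor` as a hypothetical helper name and RTTs expressed in nanoseconds:

```python
from typing import Dict

# RTT multiples per path, as listed on the Round-Trip Time slide.
RTT_FACTORS: Dict[str, float] = {
    "Client to Service A": 0.5,
    "Client to Service Ox": 1.0,
    "Client to Service A (on Commit)": 1.5,
    "Client to Service Ox (on Commit)": 2.0,
}

# Upper-bound RTTs from the slide, in nanoseconds.
RTT_NS: Dict[str, int] = {
    "Shared Memory": 100,
    "Rack (Kernel Bypass)": 10_000,
    "DC": 100_000,
}


def latency_floor(rtt_ns: float) -> Dict[str, float]:
    """Constant-delay latency per path for a given round-trip time."""
    return {path: factor * rtt_ns for path, factor in RTT_FACTORS.items()}


for environment, rtt_ns in RTT_NS.items():
    for path, latency_ns in latency_floor(rtt_ns).items():
        print(f"{environment:22s} {path:34s} {latency_ns:>10.0f} ns")
```

No amount of software tuning gets below these figures for a given network, which is why they are useful as a baseline when reading measured latencies.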

Measured Latency at Throughput
[Chart: RTT (us), axis 0 to 300, by percentile (Min, 0.50, 0.90, 0.99, 0.9999, 0.999999, Max), for 100K msgs/sec and 200K msgs/sec]
Test setup: Single client session, bursts of 20x 200B messages, 3-node cluster, Service(s) echo(es) the payload back.
Hardware: Intel Xeon Gold 5118 (2.30GHz, 12 cores); 32GB DDR4 2400 MHz ECC RAM; Intel Optane SSD 900P Series 480GB; SolarFlare X2522-PLUS 10GbE NIC. All servers are connected to an Arista 7150S.
OS: CentOS Linux 7.7, kernel 4.4.195-1.el7.elrepo.x86_64 tuned for low-latency workload.
Courtesy Mark Price

Takeaways Efficiency is part of design Power of a timestamped, replicated log Replicated State Machines

Current Status Aeron Archiving - fully supported Aeron Clustering - pre-release Sponsored by https://weareadaptive.com/

Questions? StoneTor Aeron: https://github.com/real-logic/Aeron Twitter: @toddlmontgomery Thank You!
