
Keeping RAFT Afloat: Cloud Scale Distributed Consensus, by Philip Haynes



  1. Keeping RAFT Afloat: Cloud Scale Distributed Consensus. Philip Haynes, YOW! Data, September 2016. ThreatMetrix Confidential Information – Do Not Copy or Distribute Without Express Written Permission

  2. CONSENSUS?
     • To the general public, consensus is usually a good thing...

  3. The downside...
     • On the other hand, what if the robotic consensus decides that humans should be exterminated?
     • In IT reality, of course, the role of consensus is much more subtle

  4. What is a consensus algorithm then?
     • The consensus problem is fundamental in the control of a multi-agent system, e.g. multiple servers
     • A consensus problem requires agreement among a number of processes (or agents) on a single data value
     • Some of the processes (agents) may fail or be unreliable in other ways, so consensus protocols must be fault tolerant
     • One option is for all processes (agents) to agree on a majority value, e.g. one with more than half the votes
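The "majority value" option on this slide can be illustrated with a minimal sketch. It is not taken from the talk: the class name `MajorityVote`, the method `majorityValue`, and the use of strings as votes are all assumptions made for illustration. It only shows the counting rule (strictly more than half of the votes) and ignores failures, retries, and message loss.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical helper: finds the value backed by more than half of the votes.
final class MajorityVote {
    public static Optional<String> majorityValue(List<String> votes) {
        Map<String, Integer> counts = new HashMap<>();
        for (String vote : votes) {
            counts.merge(vote, 1, Integer::sum); // tally each proposed value
        }
        int quorum = votes.size() / 2 + 1; // strictly more than half
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() >= quorum) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty(); // no value reached a majority
    }

    public static void main(String[] args) {
        System.out.println(majorityValue(List.of("A", "A", "B")));
        System.out.println(majorityValue(List.of("A", "B")));
    }
}
```

With three voters, two matching votes are enough; with two voters split evenly, no value wins, which is one reason an odd cluster size is convenient.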

  5. Hard Distributed Consensus is:
     • Fundamental where a consistent view of a system in the presence of failure is required
       • Think financial records / TP (transaction processing) systems
     • Unfortunately perceived as being too difficult and costly for cloud scale, hence “eventually” consistent models
     • Critical for simplifying big data processing and its analysis where:
       • Near enough is not good enough; and
       • Systems must always be up

  6. Cloud scale at ThreatMetrix
     [Figure: Global Device Identity Recognition Rates]

  7. Cloud scale consensus requirements
     • > 100M digital identity requests per day. Internal SLA < 100 ms. Multi-data center. > 400-node Cassandra cluster.
     • New capability for operational fraud processing:
       • Capable of processing > 1B events per day
       • < 5 ms initial detection
       • Far fewer nodes than 400 (3-5); where
       • Results are evidentiary (i.e. consistency matters)
     • We also care about availability and scalability
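As a rough worked conversion of the daily volumes on this slide into per-second rates (assuming, purely for illustration, that load is spread evenly over a day of 86,400 seconds; real traffic peaks will be higher):

```latex
\[
\frac{10^{9}\ \text{events/day}}{86\,400\ \text{s/day}} \approx 11\,574\ \text{events/s},
\qquad
\frac{10^{8}\ \text{requests/day}}{86\,400\ \text{s/day}} \approx 1\,157\ \text{requests/s}
\]
```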

  8. Building blocks for cloud scale distributed consensus
     • Hardware-aware programming methods
     • Aeron messaging. Low-latency reliable transport from Martin Thompson et al.
     • RAFT. New distributed consensus algorithm designed to be understandable (compared to Paxos etc.)
       • Asynchronous replicated log system model
       • Concern existed that distributed consensus is hard to implement *
     • RocksDB. LSM database derived from Google's LevelDB, now developed at Facebook
       • Designed for write-heavy loads on SSDs
     * Our local experiment continues to support this view.

  9. RAFT: A replicated log
     • Replicated log => replicated state machine
     • All servers execute the same commands in the same order
     • Consensus module ensures proper log replication
     • System makes progress as long as a majority of servers are up
     • Failure model: fail-stop (not Byzantine), delayed/lost messages
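The step from "replicated log" to "replicated state machine" can be sketched as follows. This is an illustrative toy, not the ThreatMetrix implementation: the class `ReplicatedStateMachine`, the `SetCommand` entry type, and the key/value state are assumptions. The point it shows is that replicas applying the same committed entries in the same order end up in the same state, regardless of how the applying is batched.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical replica: a key/value state machine driven by a shared log.
final class ReplicatedStateMachine {
    // Assumed command format for illustration: "set key to value".
    static final class SetCommand {
        final String key;
        final long value;

        SetCommand(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    private final Map<String, Long> state = new HashMap<>();
    private int lastApplied = 0; // number of log entries applied so far

    // Applies committed entries strictly in log order, never skipping or reordering.
    public void applyUpTo(List<SetCommand> log, int commitIndex) {
        while (lastApplied < commitIndex) {
            SetCommand command = log.get(lastApplied);
            state.put(command.key, command.value);
            lastApplied++;
        }
    }

    // Immutable copy of the current state, for comparison between replicas.
    public Map<String, Long> snapshot() {
        return Map.copyOf(state);
    }
}
```

Two replicas fed the same three-entry log, one applying everything at once and the other one entry at a time, should return equal snapshots.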

  10. Replicate across a cluster with a leader, adding a concept of time (called a term). Odd number of servers to support voting.
      1. RequestVote RPC to elect a leader
      2. AppendEntries RPC to replicate log entries
      3. When a majority of servers (counting the leader) have appended an entry, the log entry is committed and may be applied to the state machine
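The commit rule in step 3 can be sketched as a small calculation. The names `CommitIndex` and `majorityMatchIndex` are hypothetical, and the array is assumed to hold, for every server including the leader, the highest log index known to be stored on that server. The sketch deliberately omits the additional RAFT restriction that a leader only commits by counting replicas for entries from its own current term.

```java
import java.util.Arrays;

// Hypothetical helper: highest log index stored on a majority of servers.
final class CommitIndex {
    public static long majorityMatchIndex(long[] matchIndex) {
        long[] sorted = matchIndex.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);                // ascending order
        int majority = sorted.length / 2 + 1;
        // The majority-th largest value is present on at least `majority` servers.
        return sorted[sorted.length - majority];
    }

    public static void main(String[] args) {
        // Leader at index 5, followers at 3 and 4.
        System.out.println(majorityMatchIndex(new long[] {5, 3, 4}));
    }
}
```

For a three-server cluster with indices 5, 3 and 4, two servers hold index 4 or higher, so entries up to index 4 can be considered replicated on a majority.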

  11. Initial RAFT implementation

  12. Key initial implementation failures
      • Conceptual: did not understand that communication between leader and followers must be viewed as a queue of requests and responses
      • Flow control
        • Started seeing > 7000 messages in flight
        • When this happened the system collapsed
        • Requirement for flow control not understood on raft-dev
      • State of practice
        • TMX RAFT: > 3K msgs/s @ ~300 µs latency
        • Public: 20 per second, batched to 50 ms over TCP/IP

  13. Flow control for RAFT: The hypothesis

  14. RAFT flow control: The implementation
      • Keep moving averages of round trip time and service time
      • Keep a record of messages in flight (i.e. queue size and active nodes)
      • Throttle when:
        • timeSinceLastReceived < heartBeatTimeOut; and
        • Queue size >= maxQueueSize; where
        • maxQueueSize = Math.max(1 + (int)(minRoundTripLatency / serviceTime), 10);
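The throttle rule on this slide can be sketched as below. Only the two throttle conditions and the `maxQueueSize` formula come from the slide; everything else is an assumption made so the example is self-contained: the class name `RaftFlowControl`, the exponentially weighted moving average with smoothing factor 0.125, nanosecond units, and the fallback to the floor of 10 before any sample has arrived. Intuitively, `minRoundTripLatency / serviceTime` estimates how many messages fit "in the pipe" during one round trip.

```java
// Hypothetical per-follower flow control state kept by the leader.
final class RaftFlowControl {
    private static final double ALPHA = 0.125;    // assumed smoothing factor
    private static final int MIN_QUEUE_SIZE = 10; // floor from the slide's formula

    private final long heartBeatTimeOutNanos;
    private double roundTripNanos;            // moving average of round trip time
    private double serviceTimeNanos;          // moving average of service time
    private double minRoundTripLatencyNanos;  // smallest round trip seen so far
    private boolean hasSample = false;
    private int queueSize = 0;                // messages currently in flight
    private long lastReceivedNanos;

    RaftFlowControl(long heartBeatTimeOutNanos, long nowNanos) {
        this.heartBeatTimeOutNanos = heartBeatTimeOutNanos;
        this.lastReceivedNanos = nowNanos;
    }

    public void onSend() {
        queueSize++;
    }

    public void onResponse(long roundTrip, long serviceTime, long nowNanos) {
        queueSize = Math.max(0, queueSize - 1);
        lastReceivedNanos = nowNanos;
        if (!hasSample) { // the first sample seeds the averages
            roundTripNanos = roundTrip;
            serviceTimeNanos = serviceTime;
            minRoundTripLatencyNanos = roundTrip;
            hasSample = true;
        } else {
            roundTripNanos += ALPHA * (roundTrip - roundTripNanos);
            serviceTimeNanos += ALPHA * (serviceTime - serviceTimeNanos);
            minRoundTripLatencyNanos = Math.min(minRoundTripLatencyNanos, roundTrip);
        }
    }

    public int maxQueueSize() {
        if (!hasSample || serviceTimeNanos <= 0.0) {
            return MIN_QUEUE_SIZE; // assumed fallback until measurements exist
        }
        return Math.max(1 + (int) (minRoundTripLatencyNanos / serviceTimeNanos), MIN_QUEUE_SIZE);
    }

    public boolean shouldThrottle(long nowNanos) {
        long timeSinceLastReceived = nowNanos - lastReceivedNanos;
        return timeSinceLastReceived < heartBeatTimeOutNanos && queueSize >= maxQueueSize();
    }
}
```

Note that the first condition means throttling applies only while the follower has responded recently; once the heartbeat timeout has elapsed without a response, this rule no longer holds messages back. How the real system treats that case is not stated on the slide.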

  15. Performance results: Round trip time and service time
      [Figure: Histogram of Service Time (<50 µs); frequency vs. time (µs)]
      [Figure: Round Trip Time (µs) For 2 Followers (<750 µs); density vs. time (µs)]

  16. Round trip and service time analysis
      [Figure: Round Trip Time (µs) Density Function (<750 µs), Follower 0 and Follower 2; density vs. time (µs)]

  17. Round trip final analysis
      [Figure: Round Trip Time (µs) For 2 Followers (<750 µs); density vs. time (µs)]

  18. Getting it to really work
      • Implementation attempted to utilize multicast to provide discovery across the cluster
      • Flow control interference between clusters
      • Modified for independent RAFT cluster flow control
      • RAFT message prioritization over command messages
      • Introduced flow control between the different clusters
      • DTrace to identify and remove outliers (units in nanoseconds)
      • Now processing:
        • 1,600 events per second; to process
        • Creating and closing 1,600 cases per second
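One simple way to give RAFT protocol messages priority over command messages is a pair of queues drained in strict order. The slides do not say how prioritization was actually implemented, so this is only a sketch under that assumption; the class `PrioritizedOutbox` and its method names are hypothetical, and messages are plain strings for brevity.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical outbox: RAFT protocol traffic is always sent before command traffic.
final class PrioritizedOutbox {
    private final Queue<String> raftMessages = new ArrayDeque<>();
    private final Queue<String> commandMessages = new ArrayDeque<>();

    public void offerRaft(String message) {
        raftMessages.add(message);
    }

    public void offerCommand(String message) {
        commandMessages.add(message);
    }

    // Returns the next message to send, or null when both queues are empty.
    public String poll() {
        String next = raftMessages.poll();
        return (next != null) ? next : commandMessages.poll();
    }
}
```

Strict priority keeps heartbeats and log replication from being delayed behind a backlog of commands, at the cost that command messages can be starved if protocol traffic never pauses; a real system may need a fairness bound.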

  19. Conclusion
      • Aeron and other hardware-aware programming techniques are fundamental to reducing the cost of cloud scale services
      • RAFT and DFSMs are fundamental for implementing transaction engines but are insufficient on their own
      • Cloud scale is a fundamentally different scale from that of research systems
      • Model performance and measure system processing

  20. Questions?

  21. Study limitations
      • Yet to fully optimize the system
      • Repeat on 10G hardware
      • More than 4 followers
      • Multi-data center issue
      • Flow control during failure scenarios
      [Figure: Histogram of Service Time (>100 µs); frequency vs. time (µs)]

  22. Latency Curve
      [Figure: Histogram of Service Time (>100 µs); frequency vs. time (µs)]
